R - Using str_split and unlist to create two columns

Question

I have a data set with dates and interest rates in the same column. I need to split these two values into separate columns. However, when I use the following code:

    Split <- str_split(df$Dates, "[ ]", n = 2)
    Dates <- unlist(Split)[1]
    Rates <- unlist(Split)[2]

It returns only the first value of each part, that is, "1971-04-01" for dates and "7.43" for rates. I need it to return all the values from the first part of each split line, and likewise all the values from the second part.

The following is part of the data set (518 rows in total).

    1971-04-01 7.31
    1971-05-01 7.43
    1971-06-01 7.53
    1971-07-01 7.60
    1971-08-01 7.70
    1971-09-01 7.69
    1971-10-01 7.63
    1971-11-01 7.55
    1971-12-01 7.48
    1972-01-01 7.44

thanks

+7

r strsplit

j riot Jun 30 '14 at 18:14

7 answers

Using @user2583119's data (please provide a minimal reproducible example, including the data set):

    library(qdap)
    colsplit2df(data.frame(Split), sep = " ")
    ##           X1   X2
    ## 1 1971-06-01 7.53
    ## 2 1971-05-01 7.43
    ## 3 1971-06-01 7.53

+2

Tyler rinker Jun 30 '14 at 18:43

You can use reshape2::colsplit

    library(reshape2)
    colsplit(df$Dates, ' ', names = c('Dates','Rates'))
    #         Dates Rates
    # 1  1971-04-01  7.31
    # 2  1971-05-01  7.43
    # 3  1971-06-01  7.53
    # 4  1971-07-01  7.60
    # 5  1971-08-01  7.70
    # 6  1971-09-01  7.69
    # 7  1971-10-01  7.63
    # 8  1971-11-01  7.55
    # 9  1971-12-01  7.48
    # 10 1972-01-01  7.44

+2

mnel Jun 30 '14 at 23:53

I may be biased, but I would suggest my cSplit function for this problem.

First, I assume that we start with the following single-column data.frame, where several spaces separate the "date" value from the "rate" value.

    df <- data.frame(
      Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53",
               "1971-07-01 7.60", "1971-08-01 7.70", "1971-09-01 7.69",
               "1971-10-01 7.63", "1971-11-01 7.55", "1971-12-01 7.48",
               "1972-01-01 7.44"))

Next, get the cSplit function from my GitHub Gist and use it. You can split on a regular expression (here, one or more whitespace characters).

    cSplit(df, "Date", "\\s+", fixed = FALSE)
    #         Date_1 Date_2
    #  1: 1971-04-01   7.31
    #  2: 1971-05-01   7.43
    #  3: 1971-06-01   7.53
    #  4: 1971-07-01   7.60
    #  5: 1971-08-01   7.70
    #  6: 1971-09-01   7.69
    #  7: 1971-10-01   7.63
    #  8: 1971-11-01   7.55
    #  9: 1971-12-01   7.48
    # 10: 1972-01-01   7.44

Since the function converts the data.frame to a data.table, you have access to setnames, which lets you rename the columns in place.

    setnames(cSplit(df, "Date", "\\s+", fixed = FALSE), c("Dates", "Rates"))[]
    #          Dates Rates
    #  1: 1971-04-01  7.31
    #  2: 1971-05-01  7.43
    #  3: 1971-06-01  7.53
    #  4: 1971-07-01  7.60
    #  5: 1971-08-01  7.70
    #  6: 1971-09-01  7.69
    #  7: 1971-10-01  7.63
    #  8: 1971-11-01  7.55
    #  9: 1971-12-01  7.48
    # 10: 1972-01-01  7.44

+2

A5C1D2H2I1M1N2O1R2T1 Jul 01 '14 at 4:13

also:

  Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")

Your code selects only the first observation.

    Str <- unlist(str_split(Split, "[ ]", n=2))
    Str[1]
    #[1] "1971-06-01"

If you look at the output of unlist(...), the dates alternate with the rates: each date is followed by its rate. That means you can pick out each column with a logical index.

    Str[c(T,F)]
    #[1] "1971-06-01" "1971-05-01" "1971-06-01"
    as.numeric(Str[c(F,T)])
    #[1] 7.53 7.43 7.53
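The logical index works because R recycles a short logical vector along the vector being subset. Here is a minimal sketch of that recycling, assuming the same three sample strings. Base strsplit() stands in for str_split() so the snippet runs without stringr, and the names odd_positions and even_positions are illustrative only.

```r
# Sample strings in the same "date rate" layout as above
Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")

# Assumption: base R strsplit() replaces stringr::str_split() here,
# only to keep the sketch free of package dependencies
Str <- unlist(strsplit(Split, " ", fixed = TRUE))

# c(TRUE, FALSE) is recycled to the length of Str, so it selects the odd
# positions (dates); c(FALSE, TRUE) selects the even positions (rates)
odd_positions  <- which(rep_len(c(TRUE, FALSE), length(Str)))
even_positions <- which(rep_len(c(FALSE, TRUE), length(Str)))

Dates <- Str[c(TRUE, FALSE)]
Rates <- as.numeric(Str[c(FALSE, TRUE)])
```

Spelling out TRUE and FALSE rather than T and F guards against the case where T or F has been reassigned elsewhere in the session.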

You can convert to two columns of a data frame from Split using read.table

    read.table(text=Split, header=F, sep="",stringsAsFactors=F)
    #           V1   V2
    # 1 1971-06-01 7.53
    # 2 1971-05-01 7.43
    # 3 1971-06-01 7.53

+1

akrun Jun 30 '14 at 18:43

Try the following:

    Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")
    df <- unlist(str_split(string = Split, pattern = "\\s"))
    df

0

lawyeR Jun 30 '14 at 18:24

    df <- data.frame(
      Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53",
               "1971-07-01 7.60", "1971-08-01 7.70", "1971-09-01 7.69",
               "1971-10-01 7.63", "1971-11-01 7.55", "1971-12-01 7.48",
               "1972-01-01 7.44"))
    do.call(rbind, strsplit(as.character(df$Date), split = '\\s+', fixed = FALSE))
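Note that do.call(rbind, ...) returns a character matrix rather than a data frame. Below is a sketch of one way to turn that matrix into typed columns, using a shortened version of the sample data. The names parts and result are illustrative, and stringsAsFactors = FALSE is set explicitly so the snippet behaves the same across R versions.

```r
# Shortened sample data in the same layout as above
df <- data.frame(
  Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53"),
  stringsAsFactors = FALSE
)

# Each list element has two pieces, so rbind() stacks them into a
# 3 x 2 character matrix
parts <- do.call(rbind, strsplit(df$Date, split = "\\s+"))

# Convert each matrix column to an appropriate type
result <- data.frame(
  Dates = as.Date(parts[, 1]),
  Rates = as.numeric(parts[, 2])
)
```

This relies on every row splitting into exactly two pieces; with ragged input, rbind() would recycle values (with a warning) instead of failing.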

0

Liangbo huang Mar 09 '17 at 3:34

David Arenburg · Accepted Answer · 2014-06-30T18:24:38+0000

You could do:

    Split <- strsplit(as.character(df$Dates), " ", fixed = TRUE)
    Dates <- sapply(Split, "[", 1)
    Rates <- sapply(Split, "[", 2)
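As a variation on the same idea, the extracted pieces can be written back into the data frame with proper types. The sketch below uses a shortened, hypothetical version of the question's data and swaps sapply() for vapply(), which is a substitution made here rather than part of the answer above.

```r
# Hypothetical three-row version of the question's data
df <- data.frame(
  Dates = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53"),
  stringsAsFactors = FALSE
)

# One list element per row, each holding the date and the rate as strings
Split <- strsplit(df$Dates, " ", fixed = TRUE)

# vapply() checks that every extracted piece is a single character value
df$Rates <- as.numeric(vapply(Split, "[", character(1), 2))
df$Dates <- as.Date(vapply(Split, "[", character(1), 1))
```

The key difference from the code in the question is that the extraction is applied to every list element, instead of indexing once into the flattened vector.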
