R - Using str_split and unlist to create two columns

I have a data set that has dates and interest rates in the same column. I need to split these two values into two separate columns. However, when I use the following code:

    Split <- str_split(df$Dates, "[ ]", n = 2)
    Dates <- unlist(Split)[1]
    Rates <- unlist(Split)[2]

It returns only the first value of each, that is, "1971-04-01" for dates and "7.43" for rates. I need it to return all the values from the first part of each split line, and likewise for the second part.

The following is part of the data set; it has 518 rows in total.

    1971-04-01 7.31
    1971-05-01 7.43
    1971-06-01 7.53
    1971-07-01 7.60
    1971-08-01 7.70
    1971-09-01 7.69
    1971-10-01 7.63
    1971-11-01 7.55
    1971-12-01 7.48
    1972-01-01 7.44

thanks

+7
r strsplit
7 answers

You could do:

    Split <- strsplit(as.character(df$Dates), " ", fixed = TRUE)
    Dates <- sapply(Split, "[", 1)
    Rates <- sapply(Split, "[", 2)
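As a minimal sketch of how those two vectors might then be combined into a typed data frame (the three-row sample `df` and the name `result` are assumptions for illustration, not part of the question's data):

```r
# Hypothetical sample data mirroring the question's single-column layout
df <- data.frame(
  Dates = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53"),
  stringsAsFactors = FALSE
)

# Split each row on the single space; the result is a list holding one
# character vector of length 2 per row
Split <- strsplit(as.character(df$Dates), " ", fixed = TRUE)

# Extract the first and second piece from every list element
Dates <- sapply(Split, "[", 1)
Rates <- sapply(Split, "[", 2)

# Convert to Date and numeric, then combine into two columns
result <- data.frame(
  Dates = as.Date(Dates),
  Rates = as.numeric(Rates)
)
str(result)
```

The key difference from `unlist(Split)[1]` is that `sapply` indexes inside each list element, so it returns one value per row rather than a single value overall.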
+6

Using @user2583119's data (please provide a minimal reproducible example, including the data set):

    library(qdap)
    colsplit2df(data.frame(Split), sep = " ")
    ##           X1   X2
    ## 1 1971-06-01 7.53
    ## 2 1971-05-01 7.43
    ## 3 1971-06-01 7.53
+2

You can use reshape2::colsplit

    library(reshape2)
    colsplit(df$Dates, ' ', names = c('Dates','Rates'))
    #         Dates Rates
    # 1  1971-04-01  7.31
    # 2  1971-05-01  7.43
    # 3  1971-06-01  7.53
    # 4  1971-07-01  7.60
    # 5  1971-08-01  7.70
    # 6  1971-09-01  7.69
    # 7  1971-10-01  7.63
    # 8  1971-11-01  7.55
    # 9  1971-12-01  7.48
    # 10 1972-01-01  7.44
+2

I may be biased, but I would suggest my cSplit function for this problem.

Firstly, I assume that we start with the following (single column) data.frame (where there are several spaces between the value of "date" and the value of "rate").

    df <- data.frame(
      Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53",
               "1971-07-01 7.60", "1971-08-01 7.70", "1971-09-01 7.69",
               "1971-10-01 7.63", "1971-11-01 7.55", "1971-12-01 7.48",
               "1972-01-01 7.44"))

Next, get the cSplit function from my GitHub Gist and use it. You can split on a regular expression (here, one or more whitespace characters).

    cSplit(df, "Date", "\\s+", fixed = FALSE)
    #         Date_1 Date_2
    #  1: 1971-04-01   7.31
    #  2: 1971-05-01   7.43
    #  3: 1971-06-01   7.53
    #  4: 1971-07-01   7.60
    #  5: 1971-08-01   7.70
    #  6: 1971-09-01   7.69
    #  7: 1971-10-01   7.63
    #  8: 1971-11-01   7.55
    #  9: 1971-12-01   7.48
    # 10: 1972-01-01   7.44

Since the function converts a data.frame to data.table , you have access to setnames , which will allow you to rename your columns in place.

    setnames(cSplit(df, "Date", "\\s+", fixed = FALSE), c("Dates", "Rates"))[]
    #          Dates Rates
    #  1: 1971-04-01  7.31
    #  2: 1971-05-01  7.43
    #  3: 1971-06-01  7.53
    #  4: 1971-07-01  7.60
    #  5: 1971-08-01  7.70
    #  6: 1971-09-01  7.69
    #  7: 1971-10-01  7.63
    #  8: 1971-11-01  7.55
    #  9: 1971-12-01  7.48
    # 10: 1972-01-01  7.44
+2

also:

  Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53") 

Your code selects only the first observation.

    Str <- unlist(str_split(Split, "[ ]", n=2))
    Str[1]
    #[1] "1971-06-01"

If you look at the output of unlist(..), dates and values alternate: each date is followed by its value. That means you can use a logical index, which R recycles along the vector, to pick out every other element.

    Str[c(TRUE, FALSE)]
    #[1] "1971-06-01" "1971-05-01" "1971-06-01"
    as.numeric(Str[c(FALSE, TRUE)])
    #[1] 7.53 7.43 7.53
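To make the alternating layout concrete, here is a small sketch that fills a two-column matrix row by row and checks that its columns match the logical-index selection. Base R `strsplit()` is swapped in for `str_split()` so the block needs no packages; the names `m` and `out` are assumptions for illustration.

```r
# Hypothetical sample input, same values as in the answer
Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")

# Flatten the split pieces into one character vector:
# date, rate, date, rate, ...
Str <- unlist(strsplit(Split, " ", fixed = TRUE))

# Filling by row puts the dates in column 1 and the rates in column 2
m <- matrix(Str, ncol = 2, byrow = TRUE)

# The recycled logical index selects exactly the same elements
stopifnot(identical(m[, 1], Str[c(TRUE, FALSE)]))
stopifnot(identical(m[, 2], Str[c(FALSE, TRUE)]))

# Assemble the two columns, converting the rates to numeric
out <- data.frame(Dates = m[, 1], Rates = as.numeric(m[, 2]))
out
```

Both approaches rely on every input string splitting into exactly two pieces; a row with a missing rate would shift everything after it.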

You can also convert Split directly into a two-column data frame using read.table:

    read.table(text=Split, header=F, sep="", stringsAsFactors=F)
    #           V1   V2
    # 1 1971-06-01 7.53
    # 2 1971-05-01 7.43
    # 3 1971-06-01 7.53
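If named and typed columns are wanted straight away, one possible variation is to pass `col.names` and `colClasses` to `read.table`. This is a sketch under the assumption that every line holds an ISO-formatted date followed by a number; the name `parsed` is hypothetical.

```r
# Hypothetical sample input, same values as in the answer
Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")

# sep = "" splits on any run of whitespace; colClasses converts each
# column while reading, so no separate as.Date()/as.numeric() step is needed
parsed <- read.table(
  text = Split,
  header = FALSE,
  sep = "",
  col.names = c("Dates", "Rates"),
  colClasses = c("Date", "numeric")
)
str(parsed)
```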
+1

Try the following:

    Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")
    df <- unlist(str_split(string = Split, pattern = "\\s"))
    df
0
    df <- data.frame(
      Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53",
               "1971-07-01 7.60", "1971-08-01 7.70", "1971-09-01 7.69",
               "1971-10-01 7.63", "1971-11-01 7.55", "1971-12-01 7.48",
               "1972-01-01 7.44"))
    do.call(rbind, strsplit(as.character(df$Date), split = '\\s+', fixed = FALSE))
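The `do.call(rbind, ...)` call returns a character matrix with one row per input string. A possible follow-up, sketched here on a shortened three-row sample, names the columns and converts their types; the names `parts` and `result` are assumptions for illustration.

```r
# Hypothetical shortened sample of the single-column data
df <- data.frame(
  Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53"),
  stringsAsFactors = FALSE
)

# Split on runs of whitespace and stack the pieces into a character matrix
parts <- do.call(rbind, strsplit(as.character(df$Date), split = "\\s+"))

# Turn the matrix columns into a data frame with Date and numeric columns
result <- data.frame(
  Dates = as.Date(parts[, 1]),
  Rates = as.numeric(parts[, 2])
)
result
```

If the real strings carry leading whitespace, applying `trimws()` before splitting would likely be needed, since `strsplit` otherwise yields an empty first piece.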
0
