Split a character string several times every two characters

Question


I have a character column in my data frame that looks like this:

    df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC"))
    # df
    #        a
    # 1 AaBbCC
    # 2 AABBCC
    # 3 AAbbCC

I would like to split this column every two characters, so in this case I would like to get three columns named VA, VB and VC. I tried

    library(tidyr)
    library(dplyr)
    df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC")) %>%
      separate(a, c(paste("V", LETTERS[1:3], sep = "")), sep = c(2, 2))

      VA VB   VC
    1 Aa    BbCC
    2 AA    BBCC
    3 AA    bbCC

but this is not the desired result. I would like the content that is now in VC to be split into VB (all the B letters) and VC (all the C letters). How do I get R to split every two characters? The string length is always the same for every row (6 in this example). My real data has strings of length 10.

+6

string r dataframe tidyr

user2386786 Jan 9 '16 at 15:10


2 answers

We could do it with base R:

    read.csv(text = gsub('(..)(?!$)', '\\1,', df$a, perl = TRUE),
             col.names = paste0("V", LETTERS[1:3]), header = FALSE)
    #   VA VB VC
    # 1 Aa Bb CC
    # 2 AA BB CC
    # 3 AA bb CC
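To make the regex step easier to follow, here is a minimal sketch that shows the intermediate strings before they are parsed. The pattern matches every pair of characters that is not at the end of the string and appends a comma, so each string becomes one CSV line. The names `a`, `csv_lines` and `res` are only illustrative, and `colClasses = "character"` is an addition (not part of the answer above) that keeps values from being type-converted.

```r
# Same sample strings as in the question.
a <- c("AaBbCC", "AABBCC", "AAbbCC")

# Insert a comma after every two characters, except after the last pair.
csv_lines <- gsub("(..)(?!$)", "\\1,", a, perl = TRUE)
print(csv_lines)
# "Aa,Bb,CC" "AA,BB,CC" "AA,bb,CC"

# Parse the comma-separated strings as if they were lines of a CSV file.
res <- read.csv(text = csv_lines, header = FALSE,
                col.names = paste0("V", LETTERS[1:3]),
                colClasses = "character")
print(res)
```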

If we are reading directly from a file, another option is read.fwf:

 read.fwf(file="yourfile.txt", widths=c(2,2,2), skip=1)
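As a self-contained way to try this out, the sketch below writes the sample strings to a temporary file first. The file layout (a header line `a` followed by one string per line) is an assumption about what "yourfile.txt" might look like; `col.names` and `colClasses` are additions for readability and are not part of the call above.

```r
# Write an assumed example file: one header line, then one string per line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a", "AaBbCC", "AABBCC", "AAbbCC"), tmp)

# Read three fixed-width fields of two characters each, skipping the header.
res <- read.fwf(tmp, widths = c(2, 2, 2), skip = 1,
                col.names = paste0("V", LETTERS[1:3]),
                colClasses = "character")
print(res)

# Remove the temporary file again.
unlink(tmp)
```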

+4

akrun Jan 9 '16 at 16:13


Jaap · Accepted Answer · 2016-01-09T15:40:07+0000

You were very close. You need to specify the separation positions as sep = c(2,4) instead of sep = c(2,2):

 df <- separate(df, a, c(paste0("V",LETTERS[1:3])),sep = c(2,4))

This gives:

    > df
      VA VB VC
    1 Aa Bb CC
    2 AA BB CC
    3 AA bb CC
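The question mentions strings of length 10, so the cut positions can be generated rather than typed out. The following is a sketch under the assumption that every string has exactly `n_chars` characters; `n_chars`, `width`, `cut_positions`, `col_names` and `df10` are hypothetical names, and the 10-character sample strings are made up for illustration.

```r
# Assumed example values: strings of 10 characters, split every 2 characters.
n_chars <- 10
width <- 2

# Cut after characters 2, 4, 6 and 8 (one fewer cut than resulting columns).
cut_positions <- seq(width, n_chars - width, by = width)
col_names <- paste0("V", LETTERS[seq_len(n_chars / width)])

# Only run the tidyr part if the package is installed.
if (requireNamespace("tidyr", quietly = TRUE)) {
  df10 <- data.frame(a = c("AaBbCCddEE", "AABBCCDDEE"))
  print(tidyr::separate(df10, a, into = col_names, sep = cut_positions))
}
```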

In base R, you can do (borrowed from @rawr's comment):

    # starting from the original df, which still has column a
    l <- ave(as.character(df$a), FUN = function(x) strsplit(x, '(?<=..)', perl = TRUE))
    df <- data.frame(do.call('rbind', l))

which gives:

    > df
      X1 X2 X3
    1 Aa Bb CC
    2 AA BB CC
    3 AA bb CC
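A different base R route, swapped in here for comparison and not taken from either answer, is to cut each string at computed start positions with `substring()`. This is a rough sketch that assumes all strings have the same length as the first one; `split_every` is a hypothetical helper name.

```r
# Hypothetical helper: split equal-length strings into chunks of `width` characters.
split_every <- function(x, width = 2) {
  n <- nchar(x[1])                       # assumes every string has this length
  starts <- seq(1, n, by = width)        # start position of each chunk
  parts <- lapply(x, substring, first = starts, last = starts + width - 1)
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(out) <- paste0("V", LETTERS[seq_along(starts)])
  out
}

res <- split_every(c("AaBbCC", "AABBCC", "AAbbCC"))
print(res)
```

Because the start positions are derived from the string length, the same call should also cover the 10-character case from the question without changing any arguments.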
