Tidyr split only the first n instances

I have a data.frame in R which, for simplicity, has one column that I want to split. It looks like this:

V1
Value_is_the_best_one
This_is_the_prettiest_thing_I've_ever_seen
Here_is_the_next_example_of_what_I_want

My real data is very large (millions of rows), so I would like to use tidyr's separate() function (because it is amazingly fast) to split on JUST the first few underscores. I want the result to be as follows:

V1       V2     V3     V4 
Value    is     the    best_one
This     is     the    prettiest_thing_I've_ever_seen
Here     is     the    next_example_of_what_I_want

As you can see, the separator is _, and column V4 can contain a varying number of separators. I want to keep V4 whole (not drop any of it), and I don't care how many pieces it contains. There will always be at least four pieces (i.e., none of my rows have only V1-V3).
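To make the example reproducible, the sample column can be built in base R. Counting the underscore-delimited pieces per row shows why four fixed column names cannot hold everything. The data frame name df is taken from the command below; the rest is just the sample data shown above.

```r
# Sample data from the question (df is the name used in the tidyr command)
df <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# Number of "_"-delimited pieces in each row: it varies from row to row
n_pieces <- lengths(strsplit(df$V1, "_", fixed = TRUE))
print(n_pieces)
# [1] 5 8 9
```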

Here is the tidyr command I started with:

separate(df, V1, c("V1", "V2", "V3", "V4"), sep="_")

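However, this truncates V4: everything after the fourth piece is discarded (with the default extra = "warn", separate() drops the extra pieces and emits a warning).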


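Set extra = "merge". With that option, separate() splits at most as many times as needed to fill the columns you name, and the remainder, separators included, stays in the last column.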

library(tidyr)
# Split on "_" at most 3 times; the rest of the string goes into V4
separate(df, V1, c("V1", "V2", "V3", "V4"), sep = "_", extra = "merge")

     V1 V2  V3                             V4
1 Value is the                       best_one
2  This is the prettiest_thing_I've_ever_seen
3  Here is the    next_example_of_what_I_want
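To illustrate what extra = "merge" does, here is a minimal base R sketch of the same idea: split at the first n - 1 separators and leave the remainder intact. split_first_n() is a hypothetical helper, not part of tidyr, and it assumes a fixed (non-regex) separator.

```r
# Hypothetical helper (not part of tidyr): split each string at the first
# n - 1 occurrences of a fixed separator; the remainder stays in column n.
split_first_n <- function(x, sep, n) {
  out <- matrix(NA_character_, nrow = length(x), ncol = n)
  rest <- x
  for (i in seq_len(n - 1)) {
    # Position of the next separator (-1 if none, NA if rest is NA)
    pos <- as.integer(regexpr(sep, rest, fixed = TRUE))
    has <- !is.na(pos) & pos > 0
    # Text before the separator becomes column i
    out[has, i] <- substr(rest[has], 1, pos[has] - 1)
    out[!has, i] <- rest[!has]
    # Keep only the text after the separator for the next round
    rest[has] <- substr(rest[has], pos[has] + nchar(sep), nchar(rest[has]))
    rest[!has] <- NA_character_
  }
  out[, n] <- rest
  out
}

m <- split_first_n(c("Value_is_the_best_one",
                     "This_is_the_prettiest_thing_I've_ever_seen",
                     "Here_is_the_next_example_of_what_I_want"), "_", 4)
print(m)
```

Each loop pass works on the whole vector at once, so the cost grows with the number of columns rather than with a per-row R loop.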

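Alternatively, use extract() with a regular expression: the first three groups each match a run of non-underscore characters, and the last group, (.*), captures everything that remains.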

library(tidyr)
extract(df1, V1, into = paste0("V", 1:4), "([^_]+)_([^_]+)_([^_]+)_(.*)")
#      V1 V2  V3                             V4
# 1 Value is the                       best_one
# 2  This is the prettiest_thing_I've_ever_seen
# 3  Here is the    next_example_of_what_I_want
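To see exactly what each capture group grabs, the same pattern can be applied with base R's regexec() and regmatches(). This is only a sketch for inspecting the regex; the vector x simply holds the sample values from the question.

```r
x <- c("Value_is_the_best_one",
       "This_is_the_prettiest_thing_I've_ever_seen",
       "Here_is_the_next_example_of_what_I_want")
pattern <- "([^_]+)_([^_]+)_([^_]+)_(.*)"

# regmatches() on a regexec() result gives, per string, the full match
# followed by one element per capture group
matches <- regmatches(x, regexec(pattern, x))

# Drop the full match (first element) and stack the four groups as columns
parts <- do.call(rbind, lapply(matches, `[`, -1))
colnames(parts) <- paste0("V", 1:4)
print(parts)
```

Because [^_]+ cannot cross an underscore, the first three groups are pinned to the first three fields, and the greedy (.*) takes the rest. A string that does not match yields character(0) here and would silently drop out of the rbind.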

Or use stri_split() from the stringi package, whose n argument caps the number of pieces returned:

library(stringi)
do.call(rbind, stri_split(df1$V1, fixed="_", n=4))
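With millions of rows, it may be worth letting stringi build the matrix itself via simplify = TRUE instead of calling do.call(rbind, ...) on a long list, and then converting to a data frame with the desired names. A sketch, assuming df1 holds the sample column from the question:

```r
library(stringi)

df1 <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# n = 4: at most four pieces, the last one keeps any remaining "_"
# simplify = TRUE: return a character matrix instead of a list
m <- stri_split_fixed(df1$V1, "_", n = 4, simplify = TRUE)
colnames(m) <- paste0("V", 1:4)
out <- as.data.frame(m, stringsAsFactors = FALSE)
print(out)
```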
