R - Split on "\n" or three spaces, keeping an empty field where there are three spaces

I hope I can explain this clearly. I need this because missing information in a line is marked with three spaces and, surprisingly, is not followed by a \n before the next piece of information.

Suppose I have a line like:

    string <- "abc\n   def\nghi   jkl"

I want the result of a regular expression (possibly with strsplit() or a more advanced function) to be:

    [[1]]
    [1] "abc" ""    "def" "ghi" ""    "jkl"

That is, it should split when it finds \n, and split and insert an empty string when it finds three spaces. I need the missing information to be marked as a separate value. Otherwise my script breaks, because the next piece of information would be, for example, the three spaces combined with def.

thanks

2 answers

Here are two solutions that both use strsplit but differ in what they split on:

1) Split on newline. Remove all newlines from the string, giving s1, and then insert a newline after every third character, giving s2. Split s2 on newlines and replace each element that consists of three consecutive spaces with an empty string.

    Split <- function(string) {
      s1 <- gsub("\n", "", string)        # remove the newlines
      s2 <- gsub("(.{3})", "\\1\n", s1)   # newline after every 3 characters
      spl <- strsplit(s2, "\n")
      lapply(spl, function(s) replace(s, s == "   ", ""))  # 3 spaces -> ""
    }

    # test
    string <- "abc\n   def\nghi   jkl"
    Split(string)
    ## [[1]]
    ## [1] "abc" ""    "def" "ghi" ""    "jkl"
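To make the steps of solution 1 easier to follow, here is a small sketch that prints the intermediate values. It assumes, as the question does, that every field is exactly three characters wide and that a missing field is three spaces.

```r
# Trace of the intermediate values used in solution 1.
# Assumption: fields are exactly 3 characters wide, missing ones are 3 spaces.
string <- "abc\n   def\nghi   jkl"

s1 <- gsub("\n", "", string)         # drop the newlines
s2 <- gsub("(.{3})", "\\1\n", s1)    # put a newline after every 3 characters
fields <- strsplit(s2, "\n")[[1]]    # one element per 3-character field

print(s1)      # the string with the newlines removed
print(fields)  # the fields before three spaces are replaced by ""
```

At this point the missing fields are still three-space strings. The final replace() call in Split is what turns them into "".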

2) Split on a zero-width regexp after every 3 characters. Remove the newlines and split using the zero-width lookbehind (?<=...), which matches after each run of three characters. Finally, replace each element that consists of three spaces with an empty string.

    Split2 <- function(string) {
      s1 <- gsub("\n", "", string)                    # remove the newlines
      spl <- strsplit(s1, "(?<=...)", perl = TRUE)    # cut after every 3 characters
      lapply(spl, function(s) replace(s, s == "   ", ""))  # 3 spaces -> ""
    }

    # test
    string <- "abc\n   def\nghi   jkl"
    Split2(string)
    ## [[1]]
    ## [1] "abc" ""    "def" "ghi" ""    "jkl"
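The lookbehind split can be checked against a plain positional cut. The sketch below is an illustration only: chunk3 is a hypothetical helper that is not part of this answer, and it uses substring() rather than a regular expression.

```r
# Cross-check of the lookbehind split against a fixed-width cut.
# `chunk3` is a hypothetical helper for illustration, not part of the answer.
s1 <- "abc   defghi   jkl"   # the sample string with the newlines removed

# Approach from solution 2: zero-width lookbehind after every 3 characters
by_regex <- strsplit(s1, "(?<=...)", perl = TRUE)[[1]]

# Illustration only: cut at fixed positions with substring()
chunk3 <- function(x, width = 3) {
  starts <- seq(1, nchar(x), by = width)
  substring(x, starts, starts + width - 1)
}
by_position <- chunk3(s1)

print(by_regex)
print(identical(by_regex, by_position))
```

If the two results agree on your data, either form should be usable. The regex version relies on how strsplit handles zero-width matches, so the positional version may be easier to reason about.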

Note 1: The other answers to this question do not work for the following input (which has two empty fields in a row), but the solutions here correctly recognize two consecutive empty 3-character fields after the abc field:

    string2 <- "abc\n      def\nghi   jkl"  # 6 spaces before d, 3 spaces before j

    Split(string2)
    ## [[1]]
    ## [1] "abc" ""    ""    "def" "ghi" ""    "jkl"

    Split2(string2)
    ## [[1]]
    ## [1] "abc" ""    ""    "def" "ghi" ""    "jkl"

Note 2: Both solutions above can also be expressed neatly as a magrittr pipeline:

    library(magrittr)

    string %>%
      gsub(pattern = "\n", replacement = "") %>%
      gsub(pattern = "(.{3})", replacement = "\\1\n") %>%
      strsplit("\n") %>%
      lapply(function(s) replace(s, s == "   ", ""))
    ## [[1]]
    ## [1] "abc" ""    "def" "ghi" ""    "jkl"

    library(magrittr)

    string %>%
      gsub(pattern = "\n", replacement = "") %>%
      strsplit("(?<=...)", perl = TRUE) %>%
      lapply(function(s) replace(s, s == "   ", ""))
    ## [[1]]
    ## [1] "abc" ""    "def" "ghi" ""    "jkl"
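A shorter alternative: split on every whitespace character and collapse consecutive identical results with rle:

    (string <- "abc\n   def\nghi   jkl")
    # [1] "abc\n   def\nghi   jkl"

    rle(strsplit(string, '\\s')[[1]])$values
    # [1] "abc" ""    "def" "ghi" ""    "jkl"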
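As Note 1 of the other answer points out, this approach cannot tell one missing field from two in a row, because rle collapses every run of empty strings into a single value. A minimal sketch of that limitation, using the six-space input from that note:

```r
# Input with two missing fields in a row (6 spaces before "def").
string2 <- "abc\n      def\nghi   jkl"

# Splitting on every whitespace character yields a run of "" values,
# and rle() keeps only one value per run.
rle_fields <- rle(strsplit(string2, "\\s")[[1]])$values

print(rle_fields)
print(length(rle_fields))  # 6 values, although the input holds 7 fields
```

So this one-liner fits only if two missing fields never appear next to each other.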