Inconsistent behavior between str_split and strsplit

The documentation for str_split in the stringr package says of the pattern argument:

If "", split into individual characters.

which suggests that it behaves the same way as strsplit in this regard. However,

    library(stringr)
    str_split("abcab", "")
    [[1]]
    [1] "" "a" "b" "c" "a" "b"

with a leading empty string. Compare this with

    strsplit("abcab", "")
    [[1]]
    [1] "a" "b" "c" "a" "b"

A leading empty string appears to be normal behavior when splitting on a non-empty string,

    strsplit("abcab", "ab")
    [[1]]
    [1] "" "c"

but even then str_split produces an extra trailing empty string:

    str_split("abcab", "ab")
    [[1]]
    [1] "" "c" ""

Is this discrepancy a bug, a feature, an error in the documentation, or just a different notion of "expected behavior"?

1 answer

If you use commas as delimiters, the "expected" result (your mileage may vary) is more obvious:

    # expect "" "2" "3" "4" ""
    strsplit(",2,3,4,", ",")
    # [[1]]
    # [1] "" "2" "3" "4"
    str_split(",2,3,4,", ",")
    # [[1]]
    # [1] "" "2" "3" "4" ""

If I have n commas, I expect (n+1) elements to be returned. I therefore prefer the results from str_split. However, I would not call this a bug in strsplit, since it behaves as documented:

(from ?strsplit) Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
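To see where the missing trailing element goes, here is a minimal sketch of the loop that ?strsplit describes, restricted to a single string and a fixed, non-empty separator. The helper name split_like_strsplit is made up for illustration; it is not part of base R or stringr, and the real strsplit also handles regular expressions and vectors.

```r
# Hypothetical helper (not part of base R or stringr): mimics the loop
# described in ?strsplit for one string and a fixed, non-empty separator.
split_like_strsplit <- function(x, sep) {
  out <- character(0)
  while (nchar(x) > 0) {
    # Position of the first match, or -1 if there is none.
    pos <- as.integer(regexpr(sep, x, fixed = TRUE))
    if (pos == -1L) {
      # No further match: the rest of the input is the last element.
      out <- c(out, x)
      break
    }
    # Keep everything to the left of the match ...
    out <- c(out, substr(x, 1, pos - 1))
    # ... then drop the match and everything before it from the input.
    x <- substr(x, pos + nchar(sep), nchar(x))
  }
  out
}

split_like_strsplit(",2,3,4,", ",")
# [1] ""  "2" "3" "4"
split_like_strsplit("abcab", "ab")
# [1] ""  "c"
```

Once the last separator has been consumed the input is empty, so the loop stops before it can emit a trailing "". That is why 4 commas yield 4 elements here rather than 5.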

The "" case is harder, because there is no way to count the number of "" strings in a string. Treating it as a special case therefore seems justified.

(from ?str_split) If "", split into individual characters.

Based on this, I suggest that you have found a bug and should follow the advice to report it!
