Splitting a string by a char

Scala has a standard way of splitting a string: StringOps.split.

Its behavior surprised me somewhat.

To demonstrate, using this quick convenience function:

 def sp(str: String) = str.split('.').toList 

the following expressions all evaluate to true:

 (sp("") == List(""))          // expected
 (sp(".") == List())           // I would have expected List("", "")
 (sp("a.b") == List("a", "b")) // expected
 (sp(".b") == List("", "b"))   // expected
 (sp("a.") == List("a"))       // I would have expected List("a", "")
 (sp("..") == List())          // I would have expected List("", "", "")
 (sp(".a.") == List("", "a"))  // I would have expected List("", "a", "")

So I expected split to return an array with (number of separator occurrences) + 1 elements, but this is clearly not the case.

The actual behavior is almost that, except that all trailing empty strings are removed. But even this does not hold for splitting the empty string.

I can't work out the pattern here. What rules does StringOps.split follow?

For bonus points, is there a good way (without too much copying or appending of strings) to get the split I expect?

+5
4 answers

For the curious, you can find the code here. https://github.com/scala/scala/blob/v2.12.0-M1/src/library/scala/collection/immutable/StringLike.scala

See the split function with the character as an argument (line 206).

I think the general pattern here is that all trailing empty split results are ignored.

The exception is the first case, where the logic "if the separator char is not found, just return the whole string" is applied.
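As a rough illustration of that pattern, here is a simplified model of the behavior. It is not the library's actual source, and the name modelSplit is made up for this sketch; it only reproduces the two observations above (trailing empty pieces are dropped, and a string without the separator comes back whole).

```scala
// Simplified model of the observed behavior; NOT the library's implementation.
// `modelSplit` is a hypothetical name used only for this sketch.
def modelSplit(str: String, sep: Char): List[String] =
  if (str.indexOf(sep) == -1) List(str) // separator not found: return the whole string
  else {
    // Collect every piece, including empty ones. The list is built in reverse,
    // so the last piece of the string ends up at the head.
    val reversedPieces = str.foldLeft(List("")) { (acc, c) =>
      if (c == sep) "" :: acc else (acc.head + c) :: acc.tail
    }
    // Dropping empties from the head of the reversed list drops the trailing ones.
    reversedPieces.dropWhile(_.isEmpty).reverse
  }
```

With this model, modelSplit(".a.", '.') gives List("", "a") and modelSplit("", '.') gives List(""), matching the results listed in the question.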

I am trying to find out whether there is any design documentation about this.

Also, if you use a String instead of a Char for the delimiter, it falls back to Java's regex split. As @LRLucena has already mentioned, if you provide a limit parameter with a value greater than the number of pieces, you will get your trailing empty results. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)
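To make the effect of the limit parameter concrete, here is a small sketch based on the Java documentation linked above. The helper names (spDefault, spKeepAll, spAtMost) are invented for this example; only String.split and Pattern.quote are real API.

```scala
import java.util.regex.Pattern

// limit = 0 (what the one-argument String.split uses): trailing empty strings are dropped.
// Pattern.quote makes the "." literal instead of the regex wildcard.
def spDefault(str: String): List[String] =
  str.split(Pattern.quote("."), 0).toList

// limit < 0: the pattern is applied as often as possible and trailing empties are kept.
def spKeepAll(str: String): List[String] =
  str.split(Pattern.quote("."), -1).toList

// limit > 0: at most `limit` elements; the last element holds the unsplit remainder.
def spAtMost(str: String, limit: Int): List[String] =
  str.split(Pattern.quote("."), limit).toList
```

For example, spDefault("a.") gives List("a"), spKeepAll("a.") gives List("a", ""), and spAtMost("a.b.c", 2) gives List("a", "b.c"). A negative limit avoids having to compute a "large enough" positive value.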

+3

You can use split with a regex. I'm not sure, but I assume that the second parameter is the maximum size of the resulting array.

 def sp(str: String) = str.split("\\.", str.length+1).toList 
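If you would rather avoid the regex entirely, one possible alternative is to walk the string with indexOf. This is only a sketch; splitKeepAll is a hypothetical helper, not something the standard library provides. It always returns (number of separator occurrences) + 1 elements.

```scala
// Hypothetical helper: splits on a Char and keeps every piece, including empty ones.
def splitKeepAll(str: String, sep: Char): List[String] = {
  @annotation.tailrec
  def loop(from: Int, acc: List[String]): List[String] = {
    val idx = str.indexOf(sep, from)
    if (idx == -1) (str.substring(from) :: acc).reverse // no more separators: take the rest
    else loop(idx + 1, str.substring(from, idx) :: acc) // take the piece before the separator
  }
  loop(0, Nil)
}
```

Each piece is taken with a single substring call, so no intermediate strings are built up by appending.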
+2

Seems to be consistent with these three rules:

1) Trailing empty substrings are discarded.

2) An empty substring is considered trailing before it is considered leading, if applicable.

3) The first case, with no separators, is an exception.

0

split follows the behavior of http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)

That is, it splits around the separator character, with the following exceptions:

  • Regardless of anything else, splitting the empty string always gives Array("")
  • Any trailing empty substrings are removed
  • A surrogate character only matches if the matched character is not part of a surrogate pair.
0
