Why does splitting an empty string return a non-empty array?

Splitting an empty string returns an array of size 1:

 scala> "".split(',')
 res1: Array[String] = Array("")

Note that this returns an empty array:

 scala> ",,,,".split(',')
 res2: Array[String] = Array()

Please explain:)

java scala
Feb 11 '11
9 answers

For the same reason that

 ",test" split ',' 

and

 ",test," split ',' 

will both return an array of size 2. Everything before the first match is returned as the first element.
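A quick check of both examples, written as a small sketch for the Scala REPL or a scala-cli script. It uses a `String` separator, which goes straight to `java.lang.String.split`:

```scala
// Split on a literal comma (a one-character regex) and inspect the results.
val leading  = ",test".split(",")   // the text before the first comma is ""
val trailing = ",test,".split(",")  // the trailing "" is dropped (limit 0)

println(leading.mkString("[", "|", "]"))   // [|test]
println(trailing.mkString("[", "|", "]"))  // [|test]
println(leading.length)                    // 2
```

The leading empty element survives because only trailing empty strings are removed.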

Feb 11 '11 at 1:52

If you split an orange zero times, you have exactly one piece: the orange.

Feb 11 '11 at 4:27

Splitting an empty string returns an empty string as the first element. If the delimiter is not found in the target string, you get an array of size 1 that contains the original string, even if it is empty.
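A minimal Scala sketch of the "delimiter not found" rule; note that the empty input follows the same rule as any other input without a comma:

```scala
// When the separator never occurs, split returns a one-element array
// holding the whole input, and that includes the empty input.
val word  = "a".split(",")
val empty = "".split(",")

println(word.length)      // 1
println(word(0))          // a
println(empty.length)     // 1
println(empty(0).isEmpty) // true
```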

Feb 11 '11

The split methods of Java and Scala work in two steps:

  • First, split the string by the separator. A natural consequence is that if the string does not contain the separator, a singleton array containing just the input string is returned.
  • Second, remove all trailing empty strings. This is why ",,".split(",") returns an empty array.
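The two steps can be modelled by hand. The helper `twoStepSplit` below is a hypothetical illustration, not the JDK implementation, and it only handles a literal single-character separator: step 1 cuts at every separator, step 2 drops trailing empty strings. Applied literally, the model predicts an empty array for "" as well, which is exactly the corner case the real method treats differently:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of s.split(sep) with limit 0; NOT the JDK implementation.
def twoStepSplit(s: String, sep: Char): List[String] = {
  // Step 1: cut the string at every occurrence of sep.
  val parts = ArrayBuffer.empty[String]
  var start = 0
  var i = s.indexOf(sep.toInt, start)
  while (i != -1) {
    parts += s.substring(start, i)
    start = i + 1
    i = s.indexOf(sep.toInt, start)
  }
  parts += s.substring(start) // the piece after the last separator
  // Step 2: drop trailing empty strings.
  parts.toList.reverse.dropWhile(_.isEmpty).reverse
}

println(twoStepSplit(",,", ',').length)  // 0, like ",,".split(",")
println(twoStepSplit("", ',').length)    // 0: the model predicts an empty array
println("".split(",").length)            // 1: the real method returns Array("")
```

For non-empty inputs the model agrees with `split(",")`; only the empty input breaks the pattern.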

According to this, shouldn't the result of "".split(",") be an empty array because of the second step?

It should. Unfortunately, this is an artificially introduced corner case. That is bad, but at least it is documented in java.util.regex.Pattern, if you remember to take a look at the documentation:

For n == 0, the result is the same as for n < 0, except that trailing empty strings will not be returned. (Note that the case where the input is itself an empty string is special, as described above, and the limit parameter does not apply there.)

Solution 1. Always pass -1 as the second parameter.

So I advise you to always pass n == -1 as the second parameter (this skips the second step above), unless you know exactly what you want to achieve or you are sure that the empty string is not something your program will receive as input.
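A small Scala sketch of what limit -1 does and does not change: trailing empty strings are kept, but the empty input still produces one empty element, which is why the caveat about empty input matters:

```scala
// With limit = -1 the trailing empty strings are kept...
val kept = "a,b,,".split(",", -1)
// ...but the empty-input case is unaffected: still one empty element.
val emptyKept = "".split(",", -1)

println(kept.length)      // 4
println(emptyKept.length) // 1
```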

Solution 2. Use the Guava Splitter class

If you are already using Guava in your project, you can try the Splitter class (documentation). It has a very rich API and makes your code very easy to understand.

 Splitter.on(".").split(".a.b.c.")                     // "", "a", "b", "c", ""
 Splitter.on(",").omitEmptyStrings().split("a,,b,,c")  // "a", "b", "c"
 Splitter.on(CharMatcher.anyOf(",.")).split("a,b.c")   // "a", "b", "c"
 Splitter.onPattern("=>?").split("a=b=>c")             // "a", "b", "c"
 Splitter.on(",").limit(2).split("a,b,c")              // "a", "b,c"
Jun 13 '16 at 18:13

"a".split(",") → "a", therefore "".split(",") → ""

Apr 15 '13 at 11:06

In all programming languages I know, an empty string is still a valid string. Thus, splitting it with any separator will always return a single-element array whose element is the empty string. If it were a null (not empty) String, that would be a different issue.

Feb 11 '11

This split behavior is inherited from Java, for better or worse...
Scala does not override the definition from Java's String class.

Note that you can use the limit argument to change the behavior:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero, the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive, the pattern will be applied as many times as possible and the array can have any length. If n is zero, the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

i.e. you can set limit=-1 to get the behavior of (all?) other languages:

 @ ",a,,b,,".split(",")
 res1: Array[String] = Array("", "a", "", "b")

 @ ",a,,b,,".split(",", -1) // limit=-1
 res2: Array[String] = Array("", "a", "", "b", "", "")
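The quoted documentation describes three regimes for the limit n. A short Scala sketch exercising each one on the same input (the string `s` is just an illustrative value):

```scala
val s = "a,b,c,,"

val positive = s.split(",", 2)  // at most 2 elements; the last keeps the rest
val zero     = s.split(",", 0)  // same as s.split(","): trailing "" dropped
val negative = s.split(",", -1) // applied as often as possible, nothing dropped

println(positive.length) // 2
println(positive(1))     // b,c,,
println(zero.length)     // 3
println(negative.length) // 5
```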



The Java behavior does seem rather confusing, but:

The behavior above can be observed at least from Java 5 to Java 8.

There was an attempt to change the behavior to return an empty array when splitting an empty string in JDK-6559590. However, it was soon reverted in JDK-8028321 when it caused regressions in various places. The change never made it into the initial Java 8 release.

Note: the split method was not in Java from the very beginning (it is not in 1.0.2), but it has been there since at least 1.4 (for example, see JSR51 around 2002). I'm still investigating...

It is not clear why Java chose this in the first place (my suspicion is that it was originally an oversight/bug in an "edge case"), but it is now irrevocably baked into the language, and so it remains.

Oct 20 '17 at 4:47

An empty string has no special status when splitting a string, so you have to handle it yourself, for example:

 Some(str)
   .filter(_ != "")
   .map(_.split(","))
   .getOrElse(Array.empty[String])
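The same idea wrapped in a reusable helper, as a sketch. The name `splitOrEmpty` and its default separator are hypothetical choices for illustration; `sep` is still interpreted as a regex, exactly as in `String.split`:

```scala
// Hypothetical helper: treat the empty string as "no fields" instead of one empty field.
def splitOrEmpty(str: String, sep: String = ","): Array[String] =
  Some(str)
    .filter(_.nonEmpty)                // skip split entirely for ""
    .map(_.split(sep))                 // otherwise the usual limit-0 behavior
    .getOrElse(Array.empty[String])

println(splitOrEmpty("").length)          // 0
println(splitOrEmpty("a,b").mkString("|")) // a|b
```

Passing `Array.empty[String]` keeps the result typed as `Array[String]` without relying on inference for `Array()`.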
04 Oct '18 at 10:19

You can use the following to return an empty array instead:

 String[] files = new String[0]; 
Dec 13 '18 at 12:23