Currently (including Java 8) this can be done with split() , but do not use this approach in real-world code, since it appears to rely on a bug: a look-behind in Java should have an obvious maximum length, yet this solution uses \w+ , which does not respect that restriction. Instead, use the Pattern and Matcher classes to avoid overcomplicating things and ending up in maintenance hell, as this behavior may change in future versions of Java or in Java-like environments such as Android.
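For the Pattern and Matcher route, here is a minimal sketch that collects the pairs directly instead of splitting. The names PairSplitter and pairs are mine, purely for illustration, and it assumes that "word" means \w+ and that words are separated by a single whitespace character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairSplitter {
    // One word, optionally followed by one whitespace character and a second word.
    private static final Pattern PAIR = Pattern.compile("\\w+(?:\\s\\w+)?");

    // Hypothetical helper: returns consecutive word pairs; a trailing single word stays alone.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [one two, three four, five six, seven]
        System.out.println(pairs("one two three four five six seven"));
    }
}
```

This uses no look-behind at all, so it does not depend on how a particular Java version treats \w+ inside one.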
Is this what you are looking for?
(you can replace \\w with \\S to include all non-whitespace characters, but in this example I will keep \\w , since the regular expression is easier to read with \\w\\s than with \\S\\s )
String input = "one two three four five six seven";
String[] pairs = input.split("(?<!\\G\\w+)\\s");
System.out.println(Arrays.toString(pairs));
Output:
[one two, three four, five six, seven]
\G matches at the position where the previous match ended, and (?<!regex) is a negative look-behind.
In split we try to
- find spaces → \\s
- that are not preceded → (?<!negativeLookBehind)
- by some word → \\w+
- with the previously matched (space) → \\G
- before it → \\G\\w+ .
The only thing that confused me was how this works for the first space, since we want that space to be ignored. The important point is that, at the start, \\G matches the beginning of the input, like ^ .
So, before the first iteration the look-behind effectively looks like (?<!^\\w+) , and since the first space does have ^\\w+ before it, it cannot be a match for split. The next space will not have this problem, so it will be matched, and information about it (for example, its position in the input String) will be stored in \\G and used later in the next negative look-behind.
So, for the 3rd space, the regular expression will check whether the previously matched space \\G and a word \\w+ come right before it. Since the result of this test will be positive, the negative look-behind will reject it, so this space will not be matched. The 4th space will not have this problem, because the space before it will not be the one stored in \\G (it will have a different position in the input String).
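To make the walk-through above concrete, here is a small sketch that replays the same reasoning by hand, without using a look-behind. It only mirrors the explanation and is not how the regex engine is actually implemented. The names LookBehindTrace and splitPoints are made up for this example, and it assumes words are separated by single spaces.

```java
import java.util.ArrayList;
import java.util.List;

class LookBehindTrace {
    // Returns the indexes of the spaces that would act as split points.
    static List<Integer> splitPoints(String input) {
        List<Integer> points = new ArrayList<>();
        int lastMatchEnd = 0; // plays the role of \G: start of input, then end of the previous match
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) != ' ') {
                continue;
            }
            // Emulates (?<!\G\w+): is the text between \G and this space exactly one word?
            boolean precededByGAndWord = input.substring(lastMatchEnd, i).matches("\\w+");
            if (!precededByGAndWord) {
                points.add(i);        // this space is a split point
                lastMatchEnd = i + 1; // \G now refers to the end of this match
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // prints [7, 18, 27], i.e. the 2nd, 4th and 6th spaces
        System.out.println(splitPoints("one two three four five six seven"));
    }
}
```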
Also, if someone wants to split on, let's say, every third space, you can use this form (based on @maybeWeCouldStealAVan's answer, which had been deleted by the time I posted this part of the answer):
input.split("(?<=\\G\\w{1,100}\\s\\w{1,100}\\s\\w{1,100})\\s")
Instead of 100 you can use some larger value, as long as it is at least the length of the longest word in the String.
I just noticed that we can also use + instead of {1,maxWordLength} if we want to split on every odd-numbered delimiter, such as every third, fifth, or seventh, for example:
String data = "0,0,1,2,4,5,3,4,6,1,3,3,4,5,1,1";
String[] array = data.split("(?<=\\G\\d+,\\d+,\\d+,\\d+,\\d+),");
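If you would rather not rely on \\G inside a look-behind for these "every n-th delimiter" cases, the same grouping can be sketched with Pattern and Matcher. The class NthGroupSplitter and its method splitEveryNth are hypothetical names for illustration, and the sketch assumes the input consists only of tokens and delimiters that match the two regexes passed in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthGroupSplitter {
    // Hypothetical helper: groups up to n tokens (joined by their delimiters) per result element.
    static List<String> splitEveryNth(String input, String tokenRegex, String delimRegex, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // token, then up to n-1 repetitions of (delimiter token); greedy, so full groups come first.
        Pattern group = Pattern.compile(
                "(?:" + tokenRegex + ")(?:(?:" + delimRegex + ")(?:" + tokenRegex + ")){0," + (n - 1) + "}");
        List<String> result = new ArrayList<>();
        Matcher m = group.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [0,0,1,2,4, 5,3,4,6,1, 3,3,4,5,1, 1]
        System.out.println(splitEveryNth("0,0,1,2,4,5,3,4,6,1,3,3,4,5,1,1", "\\d+", ",", 5));
        // prints [one two three, four five six, seven]
        System.out.println(splitEveryNth("one two three four five six seven", "\\w+", "\\s", 3));
    }
}
```

Here the delimiter between groups is simply skipped by find(), so no maximum word length has to be guessed.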