string1.split("(?=-)");
This works because split actually accepts a regular expression. What you are actually seeing is a "zero-width positive lookahead".
I would like to explain more, but my daughter wants to play tea. :)
Edit: Back!
To explain this, I will first show you another split operation:
"Ram-sita-laxman".split("");
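This splits your string on every zero-length string. There is a zero-length string between every pair of adjacent characters. Therefore, the result is:
["", "R", "a", "m", "-", "s", "i", "t", "a", "-", "l", "a", "x", "m", "a", "n"]
(Java 8 and later no longer produce the leading "" for a zero-width match at the start of the string, so there the result begins with "R". The same applies to the other results in this answer that begin with "".)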
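To see where those zero-length matches actually sit, here is a minimal sketch that lists every index at which a pattern matches. The class name and the matchPositions helper are made up for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: collects the start index of every match of regex in input.
    static List<Integer> matchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // The empty pattern matches before each character and once more at the end.
        System.out.println(matchPositions("", "Ram")); // [0, 1, 2, 3]
    }
}
```

Each of those indices is a place where split("") is allowed to cut, which is why every character ends up in its own element.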
Now I change my regular expression ( "" ) so that it only matches zero-length strings that are followed by a dash:
"Ram-sita-laxman".split("(?=-)");
["Ram", "-sita", "-laxman"]
In this example, ?= means "lookahead". More specifically, it means "positive lookahead". Why positive? Because you can also have a negative lookahead ( ?! ), which splits on every zero-length string that is not followed by a dash:
"Ram-sita-laxman".split("(?!-)");
["", "R", "a", "m-", "s", "i", "t", "a-", "l", "a", "x", "m", "a", "n"]
You can also have a positive lookbehind ( ?<= ), which splits on every zero-length string that is preceded by a dash:
"Ram-sita-laxman".split("(?<=-)");
["Ram-", "sita-", "laxman"]
Finally, you can also have a negative lookbehind ( ?<! ), which splits on every zero-length string that is not preceded by a dash:
"Ram-sita-laxman".split("(?<!-)");
["", "R", "a", "m", "-s", "i", "t", "a", "-l", "a", "x", "m", "a", "n"]
These four expressions are known collectively as "lookaround expressions".
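If you want to try all four side by side, the following is a small runnable sketch. The class name and the show helper are hypothetical, and the outputs in the comments assume Java 8 or later, where a zero-width match at index 0 does not produce a leading empty string.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // Hypothetical helper: splits input on regex and renders the pieces as text.
    static String show(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        String s = "Ram-sita-laxman";
        // Positive lookahead: cut just before each dash.
        System.out.println(show(s, "(?=-)"));  // [Ram, -sita, -laxman]
        // Negative lookahead: cut everywhere except just before a dash.
        System.out.println(show(s, "(?!-)"));  // [R, a, m-, s, i, t, a-, l, a, x, m, a, n]
        // Positive lookbehind: cut just after each dash.
        System.out.println(show(s, "(?<=-)")); // [Ram-, sita-, laxman]
        // Negative lookbehind: cut everywhere except just after a dash.
        System.out.println(show(s, "(?<!-)")); // [R, a, m, -s, i, t, a, -l, a, x, m, a, n]
    }
}
```

In every case the dash survives in the output, because a lookaround only tests the neighbouring characters and never consumes them.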
Bonus: combining them
I just wanted to show an example I came across recently that combines two of the lookarounds. Suppose you want to split a CapitalCase identifier into its tokens:
"MyAwesomeClass" => ["My", "Awesome", "Class"]
You can accomplish this using this regex:
"MyAwesomeClass".split("(?<=[a-z])(?=[A-Z])");
This splits on every zero-length string that is preceded by a lowercase letter ( (?<=[a-z]) ) and followed by an uppercase letter ( (?=[A-Z]) ).
This method also works with camelCase identifiers.
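Here is a minimal sketch of that wrapped in a method, assuming the character classes are the ranges [a-z] and [A-Z]. The class and method names are made up for illustration.

```java
import java.util.Arrays;

public class CamelCaseSplitDemo {
    // Hypothetical helper: cuts between a lowercase ASCII letter and the
    // uppercase ASCII letter that follows it, consuming neither.
    static String[] splitCamelCase(String identifier) {
        return identifier.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitCamelCase("MyAwesomeClass")));    // [My, Awesome, Class]
        System.out.println(Arrays.toString(splitCamelCase("myAwesomeClass")));    // [my, Awesome, Class]
        // Runs of capitals are not separated: there is no lowercase letter before "R".
        System.out.println(Arrays.toString(splitCamelCase("parseHTTPResponse"))); // [parse, HTTPResponse]
    }
}
```

Note that this pattern only looks at ASCII letters and leaves acronyms attached to the word that follows them, so it may need adjusting for identifiers like the last one.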