JavaScript: avoiding empty strings with String.split and regex

Question

I am writing a syntax highlighter and use String.split to break the input string into tokens. The first problem is that String.split produces a huge number of empty strings, which makes everything slower than it needs to be.

For example, "***".split(/(\*)/) → ["", "*", "", "*", "", "*", ""]. Is there any way to avoid this?

Another problem is the precedence of the alternatives within the regular expression itself. Say I'm trying to parse a C-style multi-line comment, that is, /* comment */, and the input string is "/****/". The following regular expression works, but it produces many extra tokens (and all those empty strings!).

/(\/\*|\*\/|\*)/

Ideally it would read the /* and the */ as tokens and then read all the remaining *'s as one token. That is, the best result for the string above is ["/*", "**", "*/"]. However, the regex that I expected to do this gives bad results. It looks like this: /(\/\*|\*\/|\*+)/.

The result of this expression, however, is ["/*", "***", "/"]. I guess this is because the last alternative is greedy, so it steals the * that the */ alternative needs.

The only solution I found is to add a negative lookahead, like this:

/(\/\*|\*\/|\*+(?!\/))/

This gives the expected result, but it is very slow compared to the other one, and the difference is noticeable on large strings.

Is there a solution to any of these problems?

javascript split regex tokenize

user2503048 Nov 11 '13 at 23:36

anubhava · Answer 1 · 2013-11-11T23:42:29+0000

Split on a lookahead instead of a capturing group, so that no empty strings are produced:

arr = "***".split(/(?=\*)/);
//=> ["*", "*", "*"]

Or use filter(Boolean) to discard the empty strings:

arr = "***".split(/(\*)/).filter(Boolean);
//=> ["*", "*", "*"]
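As a rough sketch, the same filter(Boolean) step can be applied to the comment pattern from the question (with its closing parenthesis restored). The helper name tokenize is made up for illustration and is not part of any library.

```javascript
// Hypothetical helper: split on a capturing pattern, then drop the empty strings.
function tokenize(input, pattern) {
  return input.split(pattern).filter(Boolean);
}

// Pattern from the question, with the missing ")" restored.
const commentPattern = /(\/\*|\*\/|\*+(?!\/))/;

console.log(tokenize("***", /(\*)/));            // ["*", "*", "*"]
console.log(tokenize("/****/", commentPattern)); // ["/*", "**", "*/"]
```

Note that filter(Boolean) still builds the full array first, so it tidies the result but does not avoid creating the empty strings.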

georg · Answer 2 · 2013-11-12T00:38:21+0000

Use match instead of split:

> str = "/****/"
"/****/"
> str.match(/(\/\*)(.*?)(\*\/)/)
["/****/", "/*", "**", "*/"]
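To tokenize a whole input rather than a single comment, one possible extension of this idea is match with the g flag, which returns only the matched tokens and so never produces empty strings. The sketch below is an assumption, not part of the answer: the name tokenPattern and the two fallback alternatives for ordinary text and stray characters are added here for illustration.

```javascript
// Alternatives are tried left to right, so the two-character delimiters
// come before the run of stars. The last two alternatives are an
// assumption added so that ordinary text and stray "*" or "/" characters
// are also returned as tokens.
const tokenPattern = /\/\*|\*\/|\*+(?!\/)|[^*\/]+|[*\/]/g;

function tokenize(input) {
  // With the g flag, match() returns every match, or null if there is none.
  return input.match(tokenPattern) || [];
}

console.log(tokenize("/****/"));      // ["/*", "**", "*/"]
console.log(tokenize("a /* b */ c")); // ["a ", "/*", " b ", "*/", " c"]
```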
