JavaScript automatic semicolon insertion without parsing

I am writing a JavaScript preprocessor that automatically inserts semicolons where they are necessary. Do not ask why.

I know that the usual way to solve this problem is to write a JavaScript parser and add semicolons where the specification requires them. However, I do not want to do this for the following reasons:

  • I do not want to write a full-fledged parser.
  • I want to keep comments and spaces.

I have already (correctly) implemented the second and third rules of automatic semicolon insertion using a simple scanner.

The first rule, however, is more difficult to implement. Therefore, I have three questions:

  • Is it possible to implement the first rule with a simple scanner with lookaheads and lookbehinds?
  • If so, has someone already done this?
  • If not, how can I solve this problem?

For completeness, here are the three rules:

  • When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:

    • The offending token is separated from the previous token by at least one LineTerminator.

    • The offending token is }.

  • When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.

  • When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation “[no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement, or if that semicolon would become one of the two semicolons in the header of a for statement (section 12.6.3).
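As a small illustration of how these rules behave in practice, here is a sketch (the function and variable names are made up for this example and are not taken from the specification):

```javascript
// Rule 3 (restricted production): a LineTerminator right after `return`
// causes a semicolon to be inserted after `return`, so `42` is never returned.
function restricted() {
  return
  42;
}

// Rule 1, second condition: `}` is the offending token after `1`,
// so a semicolon is inserted before it.
function beforeBrace() { return 1 }

// Rule 1 does NOT apply here: `(` is allowed by the grammar after `f`,
// so both lines are parsed as the single call f(21).
const f = (x) => x * 2;
const joined = f
(21);

console.log(restricted());  // undefined
console.log(beforeBrace()); // 1
console.log(joined);        // 42
```

The last case is the one that makes the first rule hard: the line break alone does not trigger an insertion, only a token the grammar rejects does.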

1 answer

It is not possible to achieve what you want with a scanner (tokenizer) alone. To answer the question "do we need a semicolon here?" you first have to answer "is the next token an offending token?", and to answer that you need the JavaScript grammar, because the offending token is defined as a token that the grammar does not allow at that position.
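One way to see the problem, as a hedged sketch with made-up helper names `f`, `g` and `calls`: the same tokens can surround a line break, yet whether a semicolon is inserted depends on which grammar production the preceding `)` belongs to.

```javascript
const calls = [];
const f = (x) => { calls.push("f"); return x; };
const g = () => { calls.push("g"); };

// `)` LineTerminator identifier: after a complete call expression,
// `g` is an offending token, so a semicolon is inserted before it.
f(false)
g()

// The same tokens around the line break, but here `)` closes an `if` header.
// `g` is allowed at this position, so nothing is inserted and g() is the
// body of the `if`. Inserting `;` here would create an empty statement.
if (false)
g()

console.log(calls.join(",")); // f,g
```

A scanner that only looks at the tokens next to the line break cannot tell these two cases apart; it has to know what the parenthesised group is part of.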

I have had some success with building a list of all tokens first and then processing that list in a second pass (so that I have some context). With this approach you can fix some places by writing code along these lines:

  • Iterate over the tokens backwards (starting from the last token and going towards the beginning of the file).
  • If the current token is IF, FOR, WHILE, VAR, etc.:
    • Skip the whitespace and comments before the token.
    • If the token you land on is not ; then insert one after it.
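The steps above can be sketched roughly as follows. This is a minimal illustration, not a complete solution: `tokenize`, `insertSemicolons`, `STATEMENT_KEYWORDS` and `NEVER_AFTER` are hypothetical names, the tokenizer ignores regex literals and template strings, and the two guards (requiring a line break, and never inserting after tokens such as `{` or `)`) are extra assumptions added here to avoid obviously wrong insertions.

```javascript
// One alternative per token kind; the final [\s\S] catches any single
// character the earlier alternatives did not match (punctuation).
const TOKEN_RE = /(\s+)|(\/\/[^\n]*|\/\*[\s\S]*?\*\/)|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|([A-Za-z_$][\w$]*)|(\d[\w.]*)|([\s\S])/g;

function tokenize(src) {
  const tokens = [];
  for (const m of src.matchAll(TOKEN_RE)) {
    const type = m[1] !== undefined ? "space"
      : m[2] !== undefined ? "comment"
      : m[3] !== undefined ? "string"
      : m[4] !== undefined ? "word"
      : m[5] !== undefined ? "number"
      : "punct";
    tokens.push({ type, text: m[0] });
  }
  return tokens;
}

// Keywords that normally start a statement (extend as needed).
const STATEMENT_KEYWORDS = new Set(["if", "for", "while", "var"]);
// Assumption: never insert a semicolon directly after these tokens.
const NEVER_AFTER = new Set([";", "{", "}", "(", ")", ",", ":", "else", "do"]);

function insertSemicolons(src) {
  const tokens = tokenize(src);
  // Walk backwards: an insertion only shifts whitespace and comments that
  // were already skipped, never a token that still has to be examined.
  for (let i = tokens.length - 1; i >= 0; i--) {
    const tok = tokens[i];
    if (tok.type !== "word" || !STATEMENT_KEYWORDS.has(tok.text)) continue;

    // Skip whitespace and comments before the keyword, noting line breaks.
    let j = i - 1;
    let sawNewline = false;
    while (j >= 0 && (tokens[j].type === "space" || tokens[j].type === "comment")) {
      if (tokens[j].text.includes("\n")) sawNewline = true;
      j--;
    }
    if (j < 0 || !sawNewline) continue;            // start of file or same line
    if (NEVER_AFTER.has(tokens[j].text)) continue; // already terminated or unsafe

    tokens.splice(j + 1, 0, { type: "punct", text: ";" });
  }
  // Whitespace and comments are tokens too, so joining restores the layout.
  return tokens.map((t) => t.text).join("");
}

const fixed = insertSemicolons("var a = 1\n// note\nvar b = 2\nif (a) {\n  var c = 3\n}\n");
console.log(fixed);
```

In this example a semicolon is added after `1` and after `2`, the comment and the spacing are kept, and the missing semicolon after `3` is left alone because no keyword follows it.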

This approach works because the errors are not random: people tend to make the same mistakes over and over. Most of the time they forget the ; at the end of a line, so looking for a missing ; before a keyword is a good way to find those places.

But this approach will only get you so far. If you need to find all missing semicolons reliably, you have to write a JavaScript parser (or reuse an existing one).
