It is not possible to achieve what you want with a scanner (tokenizer) alone. To answer "do we need a semicolon here?", you need to answer "is the next token an offending token?", and to answer that you need the JavaScript grammar, because an offending token is defined as one that the grammar does not allow at that position.
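To make this concrete, here is a small Python sketch of a rule that a scanner could apply on its own: add `;` at every line break that lacks one. The function name `naive_insert` is hypothetical. The rule gives the right result for `a = b` followed by `c` on the next line, but it changes the meaning of `a = b` followed by `(c)`, which JavaScript parses as the call `a = b(c)`. Both inputs look alike at the line break; only the grammar can tell them apart.

```python
def naive_insert(source: str) -> str:
    """Hypothetical scanner-only rule: end every non-empty line with ';'
    unless it already ends with ';', '{' or '}'. It never consults the
    grammar, so it cannot know whether the next line continues the statement."""
    lines = []
    for line in source.split("\n"):
        stripped = line.rstrip()
        if stripped and not stripped.endswith((";", "{", "}")):
            stripped += ";"
        lines.append(stripped)
    return "\n".join(lines)


# Right here: `c` cannot continue `a = b`, so JavaScript's own ASI agrees.
print(repr(naive_insert("a = b\nc")))    # 'a = b;\nc;'
# Wrong here: JavaScript reads this input as a = b(c), a single statement.
print(repr(naive_insert("a = b\n(c)")))  # 'a = b;\n(c);'
```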
I had some success with first creating a list of all tokens and then processing that list in a second pass (so I had some context). With this approach, you can fix some cases with an algorithm like this:
- Iterate over the tokens backwards (starting from the last one and moving towards the beginning of the file)
- If the current token is `IF`, `FOR`, `WHILE`, `VAR`, etc.:
  - Skip the whitespace and comments before the token
  - If the token you land on is not `;`, insert one
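The two-pass idea above can be sketched in Python. This is a minimal sketch under several assumptions, and all names in it are hypothetical: the regex tokenizer `tokenize` is a toy (no regex or template literals, ASCII identifiers only), `STATEMENT_KEYWORDS` is a stand-in for "`IF`, `FOR`, `WHILE`, `VAR`, etc.", and the guard set `NO_SEMICOLON_AFTER` (no insertion after `;`, `{`, `}`, `else` or `do`) is an added assumption, not part of the steps above, meant to keep the pass from breaking `else if` and `do { } while`.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # "ws", "comment", "string", "number", "name" or "punct"
    text: str


# Toy scanner (assumption): one alternative per token kind, tried in order.
TOKEN_RE = re.compile(
    r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_$][\w$]*)
  | (?P<punct>[^\s\w])
    """,
    re.VERBOSE | re.DOTALL,
)

# Hypothetical stand-in for "IF, FOR, WHILE, VAR, etc."
STATEMENT_KEYWORDS = {"if", "for", "while", "var", "let", "const"}
# Assumption: tokens after which an inserted ';' would be wrong or pointless.
NO_SEMICOLON_AFTER = {";", "{", "}", "else", "do"}


def tokenize(source: str) -> list[Token]:
    """First pass: turn the whole file into a list, whitespace and comments included."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def insert_missing_semicolons(tokens: list[Token]) -> list[Token]:
    """Second pass: put ';' before statement keywords that lack one."""
    result = list(tokens)
    # Walk from the last token towards the start. An insertion only shifts the
    # whitespace/comment tokens between the keyword and its predecessor, so
    # every earlier keyword is still visited.
    for i in range(len(result) - 1, -1, -1):
        tok = result[i]
        if tok.kind != "name" or tok.text not in STATEMENT_KEYWORDS:
            continue
        # Skip the whitespace and comments before the keyword.
        j = i - 1
        while j >= 0 and result[j].kind in ("ws", "comment"):
            j -= 1
        if j < 0:
            continue  # keyword at the very start of the file
        if result[j].text not in NO_SEMICOLON_AFTER:
            result.insert(j + 1, Token("punct", ";"))
    return result


def fix_source(source: str) -> str:
    return "".join(t.text for t in insert_missing_semicolons(tokenize(source)))
```

For example, `fix_source("foo()\nif (x) y()")` should return `"foo();\nif (x) y()"`. The same sketch also shows the limits of the heuristic: `for (;;) if (x) break` becomes `for (;;); if (x) break`, turning the loop body into an empty statement, because without the grammar the pass cannot tell that the `)` closes a `for` header.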
This approach works because the errors are not random: people tend to make the same mistakes over and over. Most of the time, people forget the `;` at the end of a line, and looking for a missing `;` before a keyword is a good way to find those cases.
But this approach will only get you so far. If you need to reliably find all missing semicolons, you have to write a JavaScript parser (or reuse an existing one).