Parsing error token recovery (lemon)

I use Lemon as a parser generator; if you don't know Lemon, its error handling is the same as in yacc and bison.

Lemon has the ability to define an error marker in a rule set to catch parsing errors. The default behavior of the generated parser is to destroy the token causing the error; is there any way to override this behavior so that I can save the token?

Here is an example to show what happens. Basically, each rule concatenates its tokens so that the input line is rebuilt. The grammar is:

 input ::= string(A). { printf("%s", A); } // Print the result
 string(A) ::= string(B) part(C). { A = append(B, C); }
 string(A) ::= part(B). { A = B; }
 part(A) ::= NUMBER(B) NAME(C). { A = append(C, B); } // Rearrange the number and name
 part(A) ::= error(B). { A = B; } // On error keep the token anyways

For the input:

 "Username 1234Joseph" 

I get the output:

 "Joseph1234" 

This is because the text "Username" is discarded by the parser in the rule part(A) ::= error(B), but what I really want is:

 "Username Joseph1234" 

as the output.

If you can solve this problem in bison or another parser generator, I would accept that as an answer :)

2 answers

With yacc/bison, a parse error puts the parser into error recovery mode whenever possible. It then discards tokens until it reaches a "clean" state.

I cannot find a reference for Lemon, so I cannot show any Lemon code to fix this, but with yacc/bison the following approach could be used.

Namely, you need to adjust your error rule to tell the parser that everything is OK, using yyerrok, so that it does not discard tokens. The parser will then try to re-read the "bad" token, so you need to clear it with yyclearin. Finally, since the action attached to your error rule needs the contents of the token, you will need to write a function that adjusts the input stack by taking the current contents of the token and creating a new (correct) token with the same contents.

As an example, if a grammar that expects MyOther MyOther instead sees MyTok MyOther, the first stack below would be turned into the second:

 stack
   MyTok: "the text"
   MyOther: "new text"

 stack
   MyOther: "the text"
   MyOther: "new text"

To do this, look into yybackup. I cannot find an alternative method, although yybackup is deprecated.


This is old, but why not...

The grammar should include whitespace. Currently, it only allows a sequence of NUMBER NAME tokens, with no spaces between tokens and no stand-alone names.
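
The effect of tokenizing whitespace and stand-alone names can be sketched in plain C. This is not Lemon output and not the asker's code: the OTHER token kind and the helpers next_token, append_token, and rewrite are hypothetical names chosen for illustration, and the hand-written loop only stands in for a grammar that has rules for those extra tokens. Under those assumptions, unmatched text passes through as ordinary tokens and never reaches error recovery:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the grammar in the question only has NUMBER and NAME. */
enum kind { NUMBER, NAME, OTHER, END };

struct token {
    enum kind kind;
    const char *text;   /* points into the input line, not NUL-terminated */
    size_t len;
};

/* Scan one token starting at *p and advance *p past it. */
static struct token next_token(const char **p) {
    const char *s = *p;
    struct token t = { END, s, 0 };
    if (*s == '\0')
        return t;
    if (isdigit((unsigned char)*s)) {
        t.kind = NUMBER;
        while (isdigit((unsigned char)*s)) s++;
    } else if (isalpha((unsigned char)*s)) {
        t.kind = NAME;
        while (isalpha((unsigned char)*s)) s++;
    } else {
        t.kind = OTHER;   /* whitespace or punctuation, one character per token */
        s++;
    }
    t.len = (size_t)(s - *p);
    *p = s;
    return t;
}

/* Append the token text to out if it fits, keeping out NUL-terminated. */
static void append_token(char *out, size_t cap, struct token t) {
    size_t used = strlen(out);
    if (used + t.len < cap) {
        memcpy(out + used, t.text, t.len);
        out[used + t.len] = '\0';
    }
}

/* Rebuild the line: a NUMBER followed by a NAME is swapped, all else is kept. */
void rewrite(const char *in, char *out, size_t cap) {
    assert(cap > 0);
    out[0] = '\0';
    struct token t = next_token(&in);
    while (t.kind != END) {
        if (t.kind == NUMBER) {
            struct token n = next_token(&in);
            if (n.kind == NAME) {          /* part ::= NUMBER NAME */
                append_token(out, cap, n);
                append_token(out, cap, t);
                t = next_token(&in);
            } else {                       /* lone NUMBER: keep it, re-examine n */
                append_token(out, cap, t);
                t = n;
            }
        } else {                           /* NAME or OTHER passes through */
            append_token(out, cap, t);
            t = next_token(&in);
        }
    }
}
```

Calling rewrite("Username 1234Joseph", buf, sizeof buf) with a 64-byte buf leaves "Username Joseph1234" in buf, which is the output the question asks for.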


Source: https://habr.com/ru/post/1316226/

