Implement "cut" in the recursive descent parser

I am implementing the PEG parser generator in Python, and so far I have been successful except with the "cut" function that the prologue should know about.

The idea is that after parsing the cut ( ! ) Character, no alternatives should be taken at the same level.

 expre = '(' ! list ')' | atom. 

Means that after ( it is clear that the parsing should fail or not execute the second option.

I use the Python exception system (very efficient) to force the return, so I tried to have a special FailedCut exception that would break the closing selection, but that didn't work.

Any pointers to how this functionality is implemented in other parser generators are useful.

Perhaps the problem I encountered is the lack of locality. The code generated for the left side of the rule will look something like this:

 cut_seen = False try: self.token('(') cut_seen = True self.call('list') self.token(')') except FailedParse as e: if cut_seen: raise FailedCut(e) raise 

Then, the code generated for the select statement ( | ) will skip the following options if it catches FailedCut . What I mean due to the lack of locality is that the choice that caught FailedCut can be deep in calls, so it’s too difficult to distinguish between the effect.

Instead of generating the code generated for the sequences, try to report on the environment of the selection of cuts, I could make the code generated for the selection, beware of them. This would make the volume of cuts very local, unlike Prolog, but good enough for what I want in the PEG parser, which should fix the parameter after a certain sequence of tokens is visible, so the error report refers to this location in the source , and not to another place where another option may be available.

It just occurred to me that if the code generated for the rule / predicate catches FailedCut and translates it into the usual FailedParse exception, then the cuts will have the correct scope.

In relation to the @false question, here is a complete example of what I want to work:

 start = expre ; expre = named | term ; named = word ':' ! term; term = word ; 

In this grammar, word can be reached through named or term , but I would like the parser to pass the named branch after I saw :

Decision

In fairness, I have published my work so far at https://bitbucket.org/apalala/grako/ .

The final decision sequences are enclosed in this context manager:

 @contextmanager def _sequence(self): self._push_cut() try: yield except FailedParse as e: if self._cut(): self.error(e, FailedCut) else: raise finally: self._pop_cut() 

And the options in the selection function are attached to this:

 @contextmanager def _option(self): p = self._pos try: self._push_ast() try: yield ast = self.ast finally: self._pop_ast() self.ast.update(ast) except FailedCut as e: self._goto(p) raise e.nested except FailedParse: self._goto(p) 

It forces you to exit the selection instead of returning to try the next option.

Cutting yourself is implemented in this way:

 def _cut(self): self._cut_stack[-1] = True 

Full source code can be found on Bitbucket .

+7
source share
3 answers

The solution suggested at the end of my question worked:

 cut_seen = False try: self.token('(') cut_seen = True self.call('list') self.token(')') except FailedParse as e: if cut_seen: raise FailedCut(e) raise 

Then, whenever a choice or optional is computed, the code looks like this:

 p = self.pos try: # code for the expression except FailedCut: raise except FailedParse: self.goto(p) 

Edit

The actual solution required the preservation of the “cut out stack”. The source code is int Bitbucket.

+2
source

In Prolog with ISO exception handling Prolog ( catch/3 and throw/1 ), a cut can be implemented as:

 cut. % Simply succeeds cut :- throw(cut). % on backtracking throws an exception 

To do this, you need to catch this exception in the appropriate places. For example, each target (i.e. nonterminal) of a user predicate can now be wrapped:

 catchcut(Goal) :- catch(Goal,cut,fail). 

This is not the most efficient way to implement the reduction, because it does not free up resources upon successful completion ! but that may be enough for your purposes. In addition, this method can now interfere with custom use of catch/3 . But you probably don't want to emulate the entire Prolog language anyway.

Also, consider using Prolog -grammars directly. There is a lot of small print that does not show up when implemented in another language.

+3
source

Just read it.

I suggested a deep cut_seen (for example, with a parser state change) and a save and restore state with local variables. This uses a thread stack as a " cut_seen stack".

But you have a different solution, and I'm sure that you are already in order.

BTW: a good compiler is just the opposite of what I do with pyPEG so that I can learn a lot; -)

+1
source

All Articles