The difference between left / right recursive, output left / right, priority, associativity, etc.

I am currently studying language processors, and a topic that arises very often is the direction in which grammar elements are consumed. Left to right or right to left. I understand the concept, but there seem to be so many ways to write these rules, and I'm not sure if they are all the same. What I have seen so far:

Vertical / left recursion, Right / left output, Decrease right / left, priority, associativity, etc.

Does all this mean the same thing?

+7
compiler-construction parsing grammar
source share
2 answers

No, they all have different meanings.

Vertical and left recursion refer to recursion in production rules. A product for a nonterminal recursive, if it can output a sequence containing this nonterminal; it is left-recursive if the nonterminal can appear at the beginning (left edge) of the derived sequence and it is right-recursive if it can appear at the end (right edge). Production can be recursive without being left or right recursive, and it can be either left or right recursive.

For example:

term: term '*' factor { /* left-recursive */ } assignment: lval '=' assignment { /* right-recursive */ } 

The above examples are direct recursion; nonterminal directly outputs a sequence containing nonterminal. Recursion can also be indirect; it's still recursion.

All common parsing algorithms are processed from left to right, which is the first L in LL and LR. Top-down parsing (LL) finds the leftmost output (second L), while bottom-up parsing (LR) finds the rightmost output (R).

Effectively, both types of analyzer start with one nonterminal (start character) and “guess” the output based on some nonterminal in the current sequence until the input text is received. In the leftmost derivation, it is always the leftmost nonterminal that expands. In the very right derivation, it is always the most right non-terminal.

Thus, the analyzer from top to bottom always guesses which production to use for the first non-terminal, after which it needs to work again on what is now the first non-terminal. (“Guess” here is informal, he can look at the input that needs to be matched - or at least the next k input tokens - to determine which production to use.) This is called top-down processing because it builds a top-down parsing tree .

It’s easier (at least for me) to visualize the action of the analyzer from the bottom up in the reverse order; he builds a parse tree from bottom to top, repeating enough of the whole input to find some kind of production that will be the last derivation in the derivation chain. Therefore, it produces the rightmost conclusion, but it brings it back to the fore.

In an LR grammar for an operator language (roughly speaking, a grammar for languages ​​that look like arithmetic expressions), left and right associativity are modeled using left and right recursive grammar rules, respectively. "Associativity" is an unofficial description of grammar, as well as "priority."

The priority is modeled using a series of grammar rules, each of which relates to the following rule (and usually this leads to recursive production for processing parentheses - '(' expr ')' , which is neither left, recursive).

There is an older bottom-up style of analysis called “parsing operator priorities” in which priority is clearly part of the language description. One common operator precedence algorithm is the so-called Shunting Yard algorithm. But if you have a LALR (1) parser generator, like a bison, you can use it instead because it is more general and more accurate.

+8
source share

(I am NOT an expert in parser and compiler theory. I happen to learn something related. And I would like to share what I have found so far.)

I highly recommend taking a look at this amazing article .

It explains and illustrates the LL and LR algorithm. You can clearly see why LL is called top to bottom and LR is called bottom to top.

Some quote:

The main difference between the operation of LL and LR parsers is that the LL parser outputs a preliminary walk of the parsing tree and the LR parser outputs a subsequent walk.

...

We converge on a very simple model of how LL and LR parsers work. Both read the input token stream and output the same token stream, inserting the rules into the appropriate places to achieve pre-order (LL) or post-order (LR) traversal of the parsing tree.

...

When you see type designations LL (1), LR (0), etc., the number in brackets indicates the number of tokens.

As for the acronyms: ( source )

The first L in LR and LL means: that the parser reads text input in one direction without backup ; this direction is usually from left to right in each line and from top to bottom along the lines the complete input file.

The remaining R and L mean: the most right and left output, respectively.

These are two different strong analysis strategies. . The parsing strategy defines the following nonterminal for rewriting. ( source )

  • For the leftmost output, it is always the leftmost non-terminal.
  • For the most correct conclusion, it is always the most faithful non-terminal.
0
source share

All Articles