FIRST and FOLLOW What do they use for parsing?

What are FIRST and FOLLOW, what are they used for parsing? Are they used for parsers from top to bottom or from bottom to top?

Can someone explain to me FIRST and FOLLOW SETS for the following set of grammar rules:

> E := E+T | T > > T := T*V | T > > V := <id> 
+4
source share
3 answers

They are typically used in LL parsers (from top to bottom) to check if a working parser has come across in any situation where there is more than one way to continue parsing.

If you have an alternative A | B A | B , and also there is FIRST(A) = {"a"} and FIRST(B) = {"b", "a"} , then you will have a FIRST / FIRST conflict, because when "a" appears at the input , you don’t know whether to extend A or B. (Assuming lookahead is 1).

On the other hand, if you have a Nonterminal that can be null, for example AOpt: ("a")? then you need to make sure that FOLLOW(AOpt) does not contain "a", because otherwise you would not know whether to extend AOpt or not like it here: S: AOpt "a" Either S or AOpt can consume "a" that gives us the FIRST / FOLLOW conflict.

FIRST sets can also be used during the parsing process for performance reasons. If you have a NULLNNN with a null value, you can expand it to find out if it can consume anything, or it can be faster to check if the FIRST(NullableNt) contains the next token, and if you don't just ignore it (reverse tracing against predictive parsing). Another performance improvement would be to provide an additional lexical scanner with the current FIRST set, so the scanner will not use all possible terminals, but only those that are currently permitted by the context. This conflicts with reserved terminals, but this is not always necessary.

There are different types of conflicts in paired analyzers, namely: reduction / decrease and shift / decrease. They also use elements to detect conflicts, not FIRST, FOLLOW.

Your grammar will not work with LL parsers because it contains left recursion. But the FIRST for E, T, and V will be {id} (assuming your T := T*V | T means T := T*V | V ).

+2
source

Answer:

E> E + T | T

left recursion

E-> TE

E '-> + TE' | eipsilon

T> T * Y | T

left recursion

T> VT

T '-> * VT' | epsilon

no recursion on the left in

In → (identifier)

Therefore, the grammar:

E-> TE

E '-> + TE' | epsilon

T> VT

T '-> * VT' | epsilon

V-> (id)

FIRST (E) = {(}

FIRST (E ') = {+, epsilon}

FIRST (T) = {(}

FIRST (T ') = {*, epsilon}

FIRST (B) = {(}

Start character = FOLLOW (E) = {$}

E-> TE 'E' -> TE '| epsilon: FOLLOW (E ') = FOLLOW (E) = {$}

E-> TE 'E' -> + TE '| epsilon: FOLLOW (T) = FIRST (E ') = {+ $}

T> VT 'T' -> * VT '| epsilon: FOLLOW (T ') = FOLLOW (T) = {+, $}

T-> VT ', T → * VT' | epsilon: FOLLOW (V) = FIRST (T) = {*, epsilon}

Rules for the first sets

 If X is a terminal then First(X) is just X! If there is a Production X → ε then add ε to first(X) If there is a Production X → Y1Y2..Yk then add first(Y1Y2..Yk) to first(X) First(Y1Y2..Yk) is either First(Y1) (if First(Y1) doesn't contain ε) OR (if First(Y1) does contain ε) then First (Y1Y2..Yk) is everything in First(Y1) except for ε as well as everything in First(Y2..Yk) If First(Y1) First(Y2)..First(Yk) all contain ε then add ε to First(Y1Y2..Yk) as well. 

Rules for subsequent sets

 First put $ (the end of input marker) in Follow(S) (S is the start symbol) If there is a production A → aBb, (where a can be a whole string) then everything in FIRST(b) except for ε is placed in FOLLOW(B). If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B) If there is a production A → aBb, where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B) 
+1
source

Wikipedia is your friend. See the discussion of LL parsers and first / next sets.

They are mainly used as the basis for the construction of the parser, for example, as part of parser generators. You can also use them to talk about the properties of grammars, but most people don't need to.

0
source

All Articles