Ambiguity resolution when creating a C++ parser

Question

I wrote an LALR(1) parser for C++17. It reports 156 conflicts; some of them I can resolve according to the standard, but others I cannot.

For example, a shift-reduce conflict occurs when parsing "operator + < ......" at the point where the less-than sign is encountered.

We can parse it as:

(1)

template-id → operator-function-id < ...... >

or

(2)

unqualified-id → operator-function-id

where (1) requires a shift, but (2) requires a reduce.

However, the standard has:

After name lookup (3.4) finds that a name is a template-name or that an operator-function-id or a literal-operator-id refers to a set of overloaded functions any member of which is a function template, if this is followed by a <, the < is always taken as the delimiter of a template-argument-list and never as the less-than operator. When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than a greater-than operator.

So we choose to shift.
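A minimal sketch of how that rule could drive the parser's choice, assuming the parser can query name lookup at the moment of the conflict. The names Scope, names_function_template and resolve_less_after_operator are hypothetical, and the set of strings is only a stand-in for a real symbol table:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical parser actions for the conflict on '<' after an
// operator-function-id.
enum class Action { Shift, Reduce };

// Hypothetical stand-in for name lookup: the operator names that have at
// least one function-template overload visible in the current scope.
struct Scope {
    std::unordered_set<std::string> template_operators;

    bool names_function_template(const std::string& op) const {
        return template_operators.count(op) != 0;
    }
};

// Called when an operator-function-id is on top of the parse stack and the
// lookahead token is '<'.
Action resolve_less_after_operator(const Scope& scope, const std::string& op) {
    assert(!op.empty());
    // Lookup found a function template: '<' opens a template-argument-list,
    // so shift (production 1). Otherwise reduce to unqualified-id
    // (production 2) and let '<' be parsed as the less-than operator.
    return scope.names_function_template(op) ? Action::Shift : Action::Reduce;
}
```

With "operator+" registered as a template in the scope, the function returns Action::Shift for it and Action::Reduce for any operator that is not registered.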

Unfortunately, there are many ambiguities for which I cannot find a resolution. Here I list some of them (for some the right choice is obvious, but I just can't find the supporting text):

Is there any part of the standard that says "shift" is the default choice when an ambiguity arises?

declarator

(1) When a noptr-declarator has been parsed and a left-paren is encountered, I have to either reduce according to:

ptr-declarator → noptr-declarator

or shift the left-paren to satisfy:

declarator → noptr-declarator parameters-and-qualifiers

parameters-and-qualifiers → left-paren parameter-declaration-clause right-paren ......

(2) When a declarator-id has been parsed and a left-bracket is encountered, I have to either reduce according to:

noptr-declarator → declarator-id

noptr-declarator → noptr-declarator left-bracket constant-expression? right-bracket attribute-specifier-seq?

or shift the left-bracket to satisfy:

noptr-declarator → declarator-id attribute-specifier-seq

(attribute-specifier-seq is [[.......]])
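For case (2), one possible way out is to look one token further ahead. This relies on the rule in [dcl.attr.grammar] that two consecutive left square bracket tokens may only introduce an attribute-specifier (or appear inside an attribute's argument clause). The sketch below assumes the parser can peek at the token after the lookahead; Tok, Action and resolve_bracket_after_declarator_id are hypothetical names, not part of any parser generator:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified token kinds.
enum class Tok { Identifier, LBracket, RBracket, Other };
enum class Action { Shift, Reduce };

// Called when a declarator-id is on top of the parse stack and toks[pos],
// the lookahead, is a left-bracket.
Action resolve_bracket_after_declarator_id(const std::vector<Tok>& toks,
                                           std::size_t pos) {
    assert(pos < toks.size() && toks[pos] == Tok::LBracket);
    // "[[" can only start an attribute-specifier-seq, so shift.
    // A single "[" starts an array bound, so reduce to noptr-declarator first.
    const bool double_bracket =
        pos + 1 < toks.size() && toks[pos + 1] == Tok::LBracket;
    return double_bracket ? Action::Shift : Action::Reduce;
}
```

An alternative with the same effect would be to have the lexer hand the parser a single combined token whenever it sees two adjacent left-brackets, which keeps the grammar itself LALR(1) at this point.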

c++ parsing lalr

ChungkingExpress Feb 24 '16 at 9:36

2 answers

Ira Baxter · Answer 1 · 2016-02-24T10:14:17+0000

Following up on TonyD's comment: see "Why can't C++ be parsed with a LR(1) parser?"

In some places you essentially have to retain the ambiguities that arise during parsing and resolve them afterwards by performing name resolution, or, equivalently, tangle name resolution into the parsing process. Either way, you have to interpret the standard to determine how each ambiguity should be resolved, and yes, this is a very hard task.
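To illustrate the first option, here is a rough sketch of keeping both readings of a statement and pruning them once names are known. The well-known case "a * b;" (a declaration of b as a pointer if a names a type, a multiplication otherwise) is used as the example; it is not taken from the question, and Reading, AmbiguousStatement and disambiguate are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// The two readings the parser keeps for a statement such as "a * b;".
enum class Reading { Declaration, Expression };

// Hypothetical ambiguity node: the parser records every alternative that
// the grammar allows instead of committing to one.
struct AmbiguousStatement {
    std::string leading_name;           // "a" in "a * b;"
    std::vector<Reading> alternatives;  // readings the grammar permits
};

// After parsing, name resolution picks the surviving reading.
// type_names is a stand-in for a real symbol table.
Reading disambiguate(const AmbiguousStatement& node,
                     const std::unordered_set<std::string>& type_names) {
    assert(!node.alternatives.empty());
    const Reading wanted = type_names.count(node.leading_name) != 0
                               ? Reading::Declaration
                               : Reading::Expression;
    for (Reading r : node.alternatives) {
        if (r == wanted) return r;
    }
    // The preferred reading was not among the parses; fall back to the first.
    return node.alternatives.front();
}
```

The second option, tangling name resolution into the parser, would make the same decision earlier, at the point where the conflicting action is taken, rather than on a finished tree.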

Then you get to find out what the compilers actually do; both GCC and MS have many extensions and variations from the standard, in syntax as well as in semantic interpretation (so a program can give different results under different compilers). Finally, you get to find out what abominations live in the system header files; these are hacks added by the compiler vendors to make their own lives convenient, and they are very poorly documented, if at all.

erip · Answer 2 · 2016-02-24T12:13:19+0000

C++ is Turing-complete to parse.

Very relevant post here.
