Our C Front End can parse code containing preprocessor elements (in most cases, at least) and still produce usable ASTs. (Yes, the parse tree has accurate file/line/column information.)
It has a number of restrictions, which still allow it to handle most code. In the few cases it cannot handle, a small, easy change to the source file that yields equivalent code often solves the problem.
Here is an approximate set of rules and restrictions:
- #includes and #defines can occur wherever a declaration or statement can occur, but not in the middle of a statement. They rarely cause problems.
- Macro calls can occur wherever function calls can occur in expressions, or can appear without a semicolon in place of statements. Macro calls that span non-well-formed chunks of code are not handled well (is anybody surprised?). These are not rare; they turn up occasionally and need manual revision. The OP's example "j (v, oid) *" is problematic, but that kind of thing is really rare in code.
- #if ... #endif must be wrapped around major language concepts (nonterminals) (constant, expression, statement, declaration, function) or sequences of such entities, or around certain non-well-formed but commonly encountered idioms, such as if (exp) { . Each arm of the conditional must contain the same kind of syntactic construct as the other arms. #if wrapped around random text, used as a bad kind of comment, is problematic, but easily fixed in the source by turning it into a real comment. Where these conditions are not met, you need to modify the source code, often by moving the #if / #elif / #else / #endif a few tokens.
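As a minimal sketch of the kind of revision the last rule describes: the first function below splits its conditional arms at an arbitrary token boundary, so neither arm is a complete expression or statement, which would likely fall outside the restrictions above. The second moves the #else and #endif a couple of tokens so that each arm is a plain expression. The macro USE_LIMIT and the function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Hypothetical configuration switch, used only for this illustration. */
#ifndef USE_LIMIT
#define USE_LIMIT 1
#endif

/* Fragmented form: each arm holds "expr) {", which is not a complete
   language construct, so the conditional cuts across the syntax tree. */
int exceeds_fragmented(int x, int limit)
{
    if (x >
#if USE_LIMIT
        limit) {
#else
        100) {
#endif
        return 1;
    }
    return 0;
}

/* Revised form: the directives moved by two tokens. Each arm is now a
   single expression, and both arms are the same kind of construct. */
int exceeds_revised(int x, int limit)
{
    if (x >
#if USE_LIMIT
        limit
#else
        100
#endif
        ) {
        return 1;
    }
    return 0;
}

/* The token stream seen by the compiler is identical in both versions,
   so the two functions must agree on every input. */
void check_equivalent(void)
{
    assert(exceeds_fragmented(5, 3) == exceeds_revised(5, 3));
    assert(exceeds_fragmented(2, 3) == exceeds_revised(2, 3));
}
```

Both versions compile to the same code for any setting of USE_LIMIT; only the placement of the directives relative to the grammar changes.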
In our experience, you can revise a code base of 50,000 lines in a few hours to get around these problems. Although this seems annoying (and it is), the alternative is not being able to parse the source code at all, which is much worse than annoying.
You also want more than just a parser. See Life After Parsing for what happens after you manage to get a parse tree. We have done additional work on building symbol tables in which declarations are recorded together with the preprocessor context in which they are embedded, which allows type checking to take the preprocessor conditionals into account.
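To illustrate why the preprocessor context matters for the symbol table, here is a small sketch in which the type of a name depends on a conditional. A conditional-aware symbol table would have to record both declarations of counter_t, each tagged with the condition under which it applies. WIDE_COUNTERS and the other names are hypothetical, and this shows only the input situation, not how the tool represents it internally.

```c
#include <assert.h>

/* WIDE_COUNTERS is a hypothetical configuration macro. */
#ifdef WIDE_COUNTERS
typedef long counter_t;   /* applies when WIDE_COUNTERS is defined     */
#else
typedef int counter_t;    /* applies when WIDE_COUNTERS is not defined */
#endif

/* The type of this variable cannot be stated without naming the
   preprocessor condition under which it is being asked. */
counter_t total;

/* Checks the type that the current configuration actually selected. */
void check_counter_type(void)
{
#ifdef WIDE_COUNTERS
    assert(sizeof(total) == sizeof(long));
#else
    assert(sizeof(total) == sizeof(int));
#endif
}
```

An ordinary compiler sees only one of the two typedefs per build; a tool that parses the unpreprocessed source has to keep both and check uses of total against each.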