What are lexical analysis and parsing in a C compiler?

What are lexical analysis and parsing during the compilation process? Is pre-processing done after lexical analysis and parsing?

+4
5 answers

Consider this code:

int a = 10;
if (a < 4) {
    printf("%d", a);
}

At the lexical analysis stage you identify each word/token and assign a value (category) to it. In the code above, the scanner reads i, then n, then t, and the following space tells it that the word int is complete and is a language keyword; it then reads 1, then 0, and the following space tells it that 10 is a number; and so on.
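
For concreteness, here is a minimal sketch of that scanning step (it is not how a production compiler is written; the token names and the is_keyword helper are made up for illustration):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: classify a word as a keyword or an identifier. */
static int is_keyword(const char *s, size_t n) {
    static const char *kw[] = { "int", "if", "else", "while", "return" };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strlen(kw[i]) == n && strncmp(kw[i], s, n) == 0)
            return 1;
    return 0;
}

int main(void) {
    const char *src = "int a = 10; if (a < 4) { printf(\"%d\", a); }";
    for (const char *p = src; *p; ) {
        if (isspace((unsigned char)*p)) { p++; continue; }   /* skip white space */
        const char *start = p;
        const char *kind;
        if (isalpha((unsigned char)*p) || *p == '_') {        /* word: int, a, printf */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            kind = is_keyword(start, (size_t)(p - start)) ? "KEYWORD" : "IDENT";
        } else if (isdigit((unsigned char)*p)) {              /* number: 10, 4 */
            while (isdigit((unsigned char)*p)) p++;
            kind = "NUMBER";
        } else if (*p == '"') {                               /* string literal: "%d" */
            p++;
            while (*p && *p != '"') p++;
            if (*p) p++;
            kind = "STRING";
        } else {                                              /* punctuator: = ; ( ) { } < , */
            p++;                                              /* (single characters only in this sketch) */
            kind = "PUNCT";
        }
        printf("%-8s %.*s\n", kind, (int)(p - start), start);
    }
    return 0;
}

Running it prints one classified token per line (KEYWORD int, IDENT a, PUNCT =, NUMBER 10, ...), which is exactly the stream the parser then consumes.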

At the parsing stage you check whether the code follows the language's syntax (grammar rules). For example, you check that there is only a single variable on the left-hand side of an assignment (in the case of C), that each statement ends with a ;, that the if condition is a valid expression, and so on.
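
For example, the following line consists only of well-formed tokens, so lexical analysis accepts it, but parsing rejects it because the grammar requires an expression between = and ;:

int a = ;   /* syntax error: no expression on the right-hand side of '=' */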

As others have said, pre-processing typically occurs before lexical analysis and parsing.

+12

Lexical analysis occurs BEFORE parsing. This makes sense: to expand a macro invocation you must first determine the boundaries of the identifier, and determining those boundaries is exactly what lexical analysis does. Parsing comes after that. Note that compilers usually do not generate the complete preprocessed source before starting to parse: they read the source one token at a time, preprocess it if necessary, and feed the result to the parser.
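
A small illustration of why identifier boundaries have to be known first (FOO is a made-up macro name):

#define FOO 1

int FOO;      /* the token FOO is macro-expanded, producing "int 1;", a syntax error */
int FOOBAR;   /* fine: FOOBAR is one identifier token, so no FOO is seen inside it   */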

In one case lexical analysis happens twice: token pasting. Take a look at this code:

#define En(x) Abcd ## x ## x
enum En(5) { a, b = 20, c, d };

This code defines an enumeration named Abcd55. When ## is processed during macro expansion, the result is placed in an internal buffer. That buffer is then scanned much like a small #include'd file: the compiler breaks its contents into tokens again. The boundaries of the rescanned tokens may not match the boundaries of the original tokens that were placed in the buffer. In the example above, three tokens go into the buffer (Abcd, 5 and 5), but only one token (Abcd55) comes out.
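
A smaller example of the same rescanning, with a made-up macro named CAT:

#define CAT(a, b) a ## b

int CAT(x, 1) = 0;   /* "x" and "1" are pasted into the single identifier token "x1" */
int y = CAT(1, 2);   /* "1" and "2" are pasted into the single number token "12"     */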

+2

Pre-processing takes place before lexical analysis, iirc. Comments are filtered out, #define and the other directives are handled, and after that the compiler produces tokens with a scanner/lexer (lexical analysis). The parser then consumes those tokens during parsing.

+1

There are exceptions, but this usually happens as follows:

  • Preprocess - convert program text to program text
  • Lexical analysis - convert program text to "tokens", which are essentially small integers with attached attributes
  • Parsing - convert tokens to abstract syntax

The definition of "abstract syntax" can vary. In single-pass compilers the abstract syntax tends to be the target code itself. But usually it is a tree or a DAG that logically represents the structure of the program.
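
As an illustration only (the node kinds and field names below are invented, not taken from any particular compiler), an abstract-syntax node could look roughly like this:

enum node_kind { N_DECL, N_IDENT, N_NUMBER, N_BINOP, N_IF, N_CALL };

struct node {
    enum node_kind kind;
    const char *text;        /* spelling: "a", "10", "<", "printf", ...      */
    struct node *child[3];   /* operands, condition/body, call arguments ... */
};

For the snippet at the top, the parser would build an N_DECL node for int a = 10; and an N_IF node whose condition child is an N_BINOP ("<") over an N_IDENT ("a") and an N_NUMBER ("4").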

0

When we talk about the C programming language we should note that there is an ISO (ANSI) standard for the language. Here is the latest public draft of C99 (ISO/IEC 9899:1999): www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

It has a section called "5.1.1.2 Translation phases" that describes how a C program must be analyzed. These are the phases:

... a few phases that handle multibyte characters, trigraphs, and backslash-newline line splicing ...

3). The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments).

This is the lexical analysis done for preprocessing. Only preprocessing-level entities are recognized at this point: preprocessor directives, punctuators, string literals, identifiers, numbers, comments, and so on.
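
One visible consequence of this phase is that each comment is replaced by a single space, so the two lines below decompose into exactly the same preprocessing tokens (int, x, ;):

int/* a comment */x;
int x;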

4). Preprocessing directives are executed and macro invocations are expanded.

This is preprocessing proper. During this phase files named in #include directives are pulled in, and afterwards the preprocessing directives themselves (for example #define, #ifdef, and the others) are removed.
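
A sketch of what phase 4 does to a small file (the macro name SIZE is made up):

#define SIZE 4

int buf[SIZE];      /* after phase 4 this line reads: int buf[4];        */
#ifdef SIZE
int have_size = 1;  /* kept, because SIZE is defined; the #define,       */
#endif              /* #ifdef and #endif lines themselves disappear      */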

... phases that handle escape sequences in character constants and string literals, and the concatenation of adjacent string literals ...

7). White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

Converting preprocessing tokens into tokens is where keywords are recognized and constants are classified. This is the final lexical analysis step, after which syntactic and semantic analysis take place.
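
This conversion can fail. A classic example: 0x1E+2 is a single preprocessing number (the pp-number grammar absorbs an E followed by a sign), but it is not a valid integer or floating constant, so it cannot be converted to a token in phase 7, even though 0x1E + 2 written with spaces is fine:

int a = 0x1E + 2;   /* fine: three tokens, 0x1E, +, 2                        */
int b = 0x1E+2;     /* rejected by typical compilers: "0x1E+2" is one        */
                    /* preprocessing number that is not a valid constant     */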

So your question is:

Is pre-processing done after lexical analysis and parsing?

Pre-processing itself requires some lexical analysis, so the order is: lexical_for_preprocessor, preprocessing, true_lexical, other_analysis.

PS: A real C compiler may be organized a little differently, but it must behave the same way as the phases described in the standard.

0
