Context setting
Syntax analysis (checking whether the input text matches a given grammar) consists of two phases:
- tokenization, performed with tools such as lex or flex through the yylex() interface, and
- parsing the token stream produced in phase 1 (according to the user-defined grammar), performed with tools such as yacc or bison through the yyparse() interface.
During phase 1, as the input stream is consumed, each call to yylex() identifies one token (a string of characters), and yytext points to the first character of that string. For example, with the input stream "int x = 10;" and lex rules that tokenize the C language, the first 5 calls to yylex() identify the 5 tokens "int", "x", "=", "10" and ";", and each time yytext points to the first character of the token just returned.
In phase 2, the parser (the yacc-generated program you mentioned) calls yylex() repeatedly to get tokens and checks whether the token sequence conforms to the grammar rules. yylex() returns each token as an integer code. In the example above, the first 5 calls to yylex() might return the following integers to the parser: TYPE, ID, EQ_OPERATOR, INTEGER, and the character code of ';' (the actual integer values of the named codes are defined in a header file generated by yacc/bison).
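Both phases can be imitated in plain C to see what the parser actually receives. The sketch below is a hand-written stand-in for a lex-generated scanner, not flex output: toy_yylex, toy_yytext, toy_input and the numeric token values are all hypothetical. Unlike flex's yytext, which points into the scanner's own input buffer, toy_yytext is a separate copy, but it plays the same role: after each call it holds the text of the token just returned. The named codes are chosen above 255 so that a single character such as ';' can be returned as its own character code, which is the usual yacc convention.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; with yacc/bison they come from the generated header. */
enum { TYPE = 258, ID, EQ_OPERATOR, INTEGER };

static const char *toy_input;   /* remaining input to scan */
static char toy_yytext[64];     /* text of the most recent token, like yytext */

/* toy_yylex: stand-in for a lex-generated yylex().
   Returns a token code, or 0 at end of input (as yylex() does). */
int toy_yylex(void)
{
    size_t len = 0;
    int code;

    while (isspace((unsigned char)*toy_input))      /* skip blanks */
        toy_input++;
    if (*toy_input == '\0')
        return 0;

    if (isalpha((unsigned char)*toy_input)) {       /* [a-zA-Z][a-zA-Z0-9]* */
        while (isalnum((unsigned char)toy_input[len]))
            len++;
        code = (len == 3 && strncmp(toy_input, "int", 3) == 0) ? TYPE : ID;
    } else if (isdigit((unsigned char)*toy_input)) { /* [0-9]+ */
        while (isdigit((unsigned char)toy_input[len]))
            len++;
        code = INTEGER;
    } else {                                         /* any single character */
        len = 1;
        code = (*toy_input == '=') ? EQ_OPERATOR : (unsigned char)*toy_input;
    }

    assert(len < sizeof toy_yytext);
    memcpy(toy_yytext, toy_input, len);              /* expose the token text */
    toy_yytext[len] = '\0';
    toy_input += len;
    return code;
}
```

With toy_input set to "int x = 10;", successive calls return TYPE, ID, EQ_OPERATOR, INTEGER and ';', then 0, and toy_yytext holds "int", "x", "=", "10" and ";" in turn. The parser only ever sees the returned integers.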
So far, these integer codes are all the parser sees, which is not always enough. In the running example, you might need to associate TYPE with int, ID with a symbol-table pointer, and INTEGER with the decimal value 10. To make this possible, each token returned by yylex() carries an associated VALUE, whose default type is int but which you can declare to be a custom type. In lex, this VALUE is available as yylval.
Continuing the running example, the lex specification might contain the following rule to recognize 10:
[0-9]+ { yylval.intval = atoi(yytext); return INTEGER; }
and the following rule to recognize x:
[a-zA-Z][a-zA-Z0-9]* {yylval.sym_tab_ptr = SYM_TABLE(yytext); return ID;}
Note that here I defined the type VALUE (that is, the type of yylval) as a union containing an int (intval) and an int * pointer (sym_tab_ptr).
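A self-contained C sketch of that union and of the two rule actions follows. In a real lex/yacc project the union is declared with %union in the yacc file and yylval is defined by the generated parser; here both are defined locally so the code compiles on its own. SYM_TABLE's implementation (a tiny fixed-size table), and the helper names on_integer and on_identifier, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The VALUE type from the text: a union, as %union would declare it. */
typedef union {
    int  intval;       /* value of an INTEGER token          */
    int *sym_tab_ptr;  /* symbol-table entry of an ID token  */
} VALUE;

VALUE yylval;          /* normally defined by the yacc-generated parser */

/* Hypothetical SYM_TABLE: finds a name in a small table, adding it if
   absent, and returns a pointer to that name's slot. */
#define MAX_SYMS 16
static char sym_names[MAX_SYMS][32];
static int  sym_slots[MAX_SYMS];
static int  sym_count;

int *SYM_TABLE(const char *name)
{
    for (int i = 0; i < sym_count; i++)
        if (strcmp(sym_names[i], name) == 0)
            return &sym_slots[i];
    assert(sym_count < MAX_SYMS && strlen(name) < sizeof sym_names[0]);
    strcpy(sym_names[sym_count], name);
    return &sym_slots[sym_count++];
}

/* The actions of the two lex rules, written as functions of yytext. */
void on_integer(const char *yytext)    { yylval.intval = atoi(yytext); }
void on_identifier(const char *yytext) { yylval.sym_tab_ptr = SYM_TABLE(yytext); }
```

on_integer("10") leaves yylval.intval equal to 10, and calling on_identifier twice with the same name yields the same pointer, so every later occurrence of x refers to one symbol-table entry.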
In the yacc world, however, this VALUE is accessed as $n, where n is the position of the symbol in the rule. For example, consider the following yacc rule for a specific assignment statement:
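assign_stmt : TYPE ID EQ_OPERATOR INTEGER ';' { /* In this action, $2 gives the symbol-table pointer associated with ID and $4 gives the decimal 10. */ } ;
For $2 and $4 to refer to the right union members, the yacc file also declares the token types, e.g. %token <sym_tab_ptr> ID and %token <intval> INTEGER.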
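Conceptually, a yacc parser keeps a value stack alongside its state stack: each shifted token pushes a copy of yylval, and when a rule with k right-hand-side symbols is reduced, $1 through $k name the top k entries, which are then replaced by the value of $$. The C sketch below models only the $n lookup; push_value, dollar and the fixed stack size are hypothetical, and VALUE is the union described in the text.

```c
#include <assert.h>

typedef union {
    int  intval;
    int *sym_tab_ptr;
} VALUE;

/* Hypothetical model of the parser's value stack: one VALUE is pushed
   for each token shifted (a copy of yylval at that moment). */
#define STACK_MAX 64
static VALUE value_stack[STACK_MAX];
static int   top;      /* number of entries currently on the stack */

void push_value(VALUE v)
{
    assert(top < STACK_MAX);
    value_stack[top++] = v;
}

/* For a rule whose right-hand side has rhs_len symbols, $n is the
   n-th of the top rhs_len stack entries (n counts from 1). */
VALUE *dollar(int n, int rhs_len)
{
    assert(1 <= n && n <= rhs_len && rhs_len <= top);
    return &value_stack[top - rhs_len + n - 1];
}
```

After the five tokens of int x = 10; have been shifted, dollar(2, 5) yields the entry carrying the symbol-table pointer for x and dollar(4, 5) the entry whose intval is 10, which is what $2 and $4 denote in a rule covering those five tokens.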
Answering your question
If you want to access the yytext of a specific token (which belongs to the lex world) from the yacc world, use our old friend VALUE, as follows:
- Extend the union type VALUE with another field: char *lex_token_str
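- In the lex rule, do yylval.lex_token_str = strdup(yytext); the copy is needed because yytext is overwritten as the scanner reads further tokens (and the parser often reads a lookahead token before it runs an action).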
- Then, in the yacc rules, access it through the corresponding $n.
- If you want to access more than one value for a token (for example, for an identifier token the parser may want both its name and its symbol-table pointer), extend the union VALUE with a struct member containing a char * (for the name) and an int * (for the symbol-table pointer).
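Putting these steps together, the extended union might look like the C sketch below. The struct member name id and the helpers fake_yytext and identifier_value are hypothetical; fake_yytext only imitates the way the scanner reuses one buffer for yytext, which is why a copy is taken. strdup is POSIX (and part of ISO C only since C23), hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L   /* declares strdup() before C23 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* VALUE extended with the fields described in the steps above. */
typedef union {
    int   intval;
    int  *sym_tab_ptr;
    char *lex_token_str;          /* private copy of yytext            */
    struct {
        char *name;               /* the identifier's spelling         */
        int  *sym_tab_ptr;        /* its symbol-table entry            */
    } id;                         /* hypothetical member name          */
} VALUE;

/* Imitates the scanner reusing its text buffer: each call overwrites buf. */
static char buf[64];
const char *fake_yytext(const char *lexeme)
{
    assert(strlen(lexeme) < sizeof buf);
    strcpy(buf, lexeme);
    return buf;
}

/* Action for an identifier rule: keep both the name and the pointer. */
VALUE identifier_value(const char *yytext, int *entry)
{
    VALUE v;
    v.id.name = strdup(yytext);   /* copy now: yytext will be overwritten */
    assert(v.id.name != NULL);
    v.id.sym_tab_ptr = entry;
    return v;
}
```

The strings made by strdup belong to the parser, so it should free them once they are no longer needed, either in the action that consumes them or, for symbols discarded during error recovery, with bison's %destructor directive.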