How to use yylval with strings in yacc

I want to pass the actual string of a token. If I have a token called ID, I want my yacc file to know which identifier was actually matched. I think I have to pass the string from the flex file to the yacc file using yylval. How do I do that?

3 answers

See the section of the Flex manual on interfacing with Yacc.

15 Interfacing with Yacc

One of the main uses of flex is as a companion to the yacc parser generator. yacc parsers expect to call a routine named yylex() to find the next input token. The routine is supposed to return the type of the next token, as well as putting any associated value in the global yylval. To use flex with yacc, you specify the -d option to yacc to instruct it to generate the file y.tab.h containing definitions of all the %tokens appearing in the yacc input. This file is then included in the flex scanner. For example, if one of the tokens is TOK_NUMBER, part of the scanner might look like this:

%{
#include "y.tab.h"
%}

%%

[0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;

The key to returning a string or any complex type via yylval is the YYSTYPE union that yacc creates in the y.tab.h file. YYSTYPE is a union with a member for each type of token value defined in the yacc source file. For example, to return the string associated with a SYMBOL token, you declare the YYSTYPE union using %union in the yacc source file:

/*** Yacc YYSTYPE Union ***/

/* The yacc parser maintains a stack (array) of token values while
   it is parsing. This union defines all the possible values tokens
   may have. Yacc creates a typedef of YYSTYPE for this union. All
   token types (see %type declarations below) are taken from
   the field names of this union. The global variable yylval which lex
   uses to return token values is declared as a YYSTYPE union.
 */

%union {
    long int4;              /* Constant integer value */
    float fp;               /* Constant floating point value */
    char *str;              /* Ptr to constant string (strings are malloc'd) */
    exprT expr;             /* Expression - constant or address */
    operatorT *operatorP;   /* Pointer to run-time expression operator */
};

%type <str> SYMBOL

Then in the lex source file there is a pattern that matches the SYMBOL token. It is the responsibility of the code associated with that rule to return the actual string that represents the SYMBOL. You cannot just pass a pointer to the yytext buffer, because it is a static buffer that is reused for every token that is matched. To return the matched text, the static yytext buffer must be duplicated on the heap using _strdup() (strdup() on POSIX systems) and a pointer to this string passed via yylval.str. It is then the responsibility of the yacc rule that matches the SYMBOL token to free the heap-allocated string when it is done with it.

[A-Za-z_][A-Za-z0-9_]*  {{
    int i;

    /*
    * condition letter followed by zero or more letters
    * digits or underscores
    *      Convert matched text to uppercase
    *      Search keyword table
    *      if found
    *          return <keyword>
    *      endif
    *
    *      set lexical value string to matched text
    *      return <SYMBOL>
    */

    /*** KEYWORDS and SYMBOLS ***/
    /* Here we match a keywords or SYMBOL as a letter
    * followed by zero or more letters, digits or
    * underscores.
    */

    /* Convert the matched input text to uppercase */
    _strupr(yytext);        /* Convert to uppercase */

    /* First we search the keyword table */
    for (i = 0; i<NITEMS(keytable); i++) {
        if (strcmp(keytable[i].name, yytext)==0)
            return (keytable[i].token);
    }

    /* Return a SYMBOL since we did not match a keyword */
    yylval.str=_strdup(yytext);
    return (SYMBOL);
}}
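The buffer-reuse problem described above can be reproduced in plain C without running flex. The following is only a sketch: `match_buf`, `fake_match`, `copy_match` and `demo_buffer_reuse` are hypothetical names standing in for yytext, the scanner's matching step and a strdup(yytext) call; they are not part of flex or yacc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's reusable match buffer (yytext). */
static char match_buf[64];

/* Hypothetical helper: pretend the scanner has just matched `text`. */
void fake_match(const char *text)
{
    assert(text != NULL);
    strncpy(match_buf, text, sizeof match_buf - 1);
    match_buf[sizeof match_buf - 1] = '\0';
}

/* Hypothetical helper: heap-copy the current match, as strdup(yytext) would. */
char *copy_match(void)
{
    size_t n = strlen(match_buf) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, match_buf, n);
    return copy;
}

/* Returns 1 if the heap copy kept the first match while the bare
   pointer into the buffer now shows the second match. */
int demo_buffer_reuse(void)
{
    const char *aliased;
    char *owned;
    int ok;

    fake_match("alpha");
    aliased = match_buf;        /* like yylval.str = yytext (wrong) */
    owned = copy_match();       /* like yylval.str = strdup(yytext) */
    if (owned == NULL)
        return 0;

    fake_match("beta");         /* the next token overwrites the buffer */

    ok = strcmp(aliased, "beta") == 0 && strcmp(owned, "alpha") == 0;
    free(owned);                /* the consumer of the copy frees it */
    return ok;
}
```

The `free(owned)` call at the end corresponds to what the yacc action does once it no longer needs the string.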

Context setting

Syntax analysis (checking whether the input text conforms to a given grammar) consists of two phases:

  • tokenization, which is done by tools such as lex or flex, with the yylex() interface, and
  • parsing of the token stream produced in phase 1 (according to the user-defined grammar), which is done by tools such as bison or yacc, with the yyparse() interface.

During phase 1, while consuming the input stream, each call to yylex() identifies one token (a string of chars), and yytext points to the first character of that string. For example, with the input stream "int x = 10;" and lex rules that tokenize according to the C language, the first 5 calls to yylex() identify the 5 tokens "int", "x", "=", "10", ";", and each time yytext points to the first char of the returned token.

In phase 2, the parser (which you referred to as yacc) is a program that calls this yylex function each time it needs a token and uses these tokens to check whether the input satisfies the grammar rules. These yylex calls return tokens as integer codes. For example, in the previous example, the first 5 calls to yylex() might return the following codes to the parser: TYPE, ID, EQ_OPERATOR, INTEGER, and a code for ';' (the actual integer values of which are defined in some header file).
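To make the two phases concrete, here is a hand-written C sketch of what a generated yylex() roughly does for the input "int x = 10;". It is not flex output. The names `scan_init`, `next_token` and `token_text` are hypothetical stand-ins for scanner setup, yylex() and yytext, and the token code values are assumptions; yacc would normally generate them in y.tab.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes. Any values above 255 would do, so that they
   cannot collide with single-character tokens such as ';'. */
enum { TYPE = 258, ID, EQ_OPERATOR, INTEGER };

static const char *cursor;      /* next unread input character */
static char token_text[32];     /* plays the role of yytext */

void scan_init(const char *input)
{
    assert(input != NULL);
    cursor = input;
}

/* Plays the role of yylex(): returns a token code, or 0 at end of input. */
int next_token(void)
{
    size_t len = 0;

    while (isspace((unsigned char)*cursor))
        cursor++;
    if (*cursor == '\0')
        return 0;

    if (isalpha((unsigned char)*cursor)) {
        while (isalnum((unsigned char)*cursor) && len < sizeof token_text - 1)
            token_text[len++] = *cursor++;
        token_text[len] = '\0';
        /* "int" is the only keyword this sketch knows about */
        return strcmp(token_text, "int") == 0 ? TYPE : ID;
    }

    if (isdigit((unsigned char)*cursor)) {
        while (isdigit((unsigned char)*cursor) && len < sizeof token_text - 1)
            token_text[len++] = *cursor++;
        token_text[len] = '\0';
        return INTEGER;
    }

    /* Any other character is a one-character token. */
    token_text[0] = *cursor++;
    token_text[1] = '\0';
    return token_text[0] == '=' ? EQ_OPERATOR : token_text[0];
}
```

After `scan_init("int x = 10;")`, successive calls to `next_token()` yield TYPE, ID, EQ_OPERATOR, INTEGER and the character code of ';', with `token_text` holding the matched text each time. The parser only sees the returned integers, which is why a separate channel for the token's value is needed.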

Now, all the parser gets to see is these integer codes, which is sometimes not enough. For example, in the running example you might want to associate TYPE with int, ID with some symbol table pointer, and INTEGER with decimal 10. To facilitate this, each token returned by yylex is associated with another VALUE, whose default type is int, but you can define custom types for it. In lex, this VALUE is available as yylval.

For example, continuing the running example, yylex may have the following rule to recognize 10

 [0-9]+ { yylval.intval = atoi(yytext); return INTEGER; } 

and the following rule to recognize x

 [a-zA-Z][a-zA-Z0-9]* {yylval.sym_tab_ptr = SYM_TABLE(yytext); return ID;} 

Note that here I defined the type of VALUE (that is, of yylval) as a union containing an int (intval) and an int * pointer (sym_tab_ptr).
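In plain C, the value type described here might look like the sketch below. The names `value_t`, `lval`, `set_integer` and `set_symbol` are hypothetical; in a real grammar the union would come from a %union declaration and the variable would be yylval.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical semantic-value type matching the two lex rules above. */
typedef union {
    int  intval;        /* value of an INTEGER token */
    int *sym_tab_ptr;   /* symbol table entry of an ID token */
} value_t;

value_t lval;           /* plays the role of yylval */

/* What the [0-9]+ action does: convert the matched text and store it. */
void set_integer(const char *matched_text)
{
    assert(matched_text != NULL);
    lval.intval = atoi(matched_text);
}

/* What the identifier action does: store a symbol table pointer. */
void set_symbol(int *entry)
{
    lval.sym_tab_ptr = entry;
}
```

Because this is a union, only the member written most recently is meaningful. The token code returned alongside the value tells the parser which member to read.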

In the yacc world, however, this VALUE is available as $n. For example, consider the following yacc rule that recognizes a particular assignment statement.

/* assign_stmt is an illustrative name for the rule */
assign_stmt: TYPE ID '=' VAL
    {
        /* In this action part of the yacc rule, use $2 to get the
           symbol table pointer associated with ID, and use $4 to
           get decimal 10. */
    }
    ;

Answering your question

If you want to access the yytext value of a particular token (which belongs to the lex world) from the yacc world, use the same VALUE mechanism, as follows:

  • Extend the union type of VALUE with another field: char *lex_token_str
  • In the lex rule, do yylval.lex_token_str = strdup(yytext)
  • Then, in the yacc world, access it using the appropriate $n.
  • If you want to access more than one value for a token (for example, for the lex token ID the parser may want both the name and the symbol table pointer), extend the union type of VALUE with a struct member containing a char * (for the name) and an int * (for the symtab pointer).
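The last point can be sketched in C as follows. This is an illustration under assumptions, not generated yacc code: `struct id_info`, `value_t`, `fill_id` and `release_id` are hypothetical names. `fill_id` shows what the lex action for ID would do before returning the token, and `release_id` shows what the yacc action would do once it is done with the corresponding $n.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct carrying both pieces of information about an ID. */
struct id_info {
    char *name;         /* heap copy of the matched text; the parser frees it */
    int  *sym_tab_ptr;  /* symbol table entry */
};

/* Hypothetical value type: the earlier union plus the new members. */
typedef union {
    int            intval;
    char          *lex_token_str;
    struct id_info id;
} value_t;

/* Lexer side: copy the matched text and record the symbol table entry.
   Returns 0 on success, -1 if the allocation fails. */
int fill_id(value_t *out, const char *matched_text, int *entry)
{
    size_t n;

    assert(out != NULL && matched_text != NULL);
    n = strlen(matched_text) + 1;
    out->id.name = malloc(n);
    if (out->id.name == NULL)
        return -1;
    memcpy(out->id.name, matched_text, n);
    out->id.sym_tab_ptr = entry;
    return 0;
}

/* Parser side: release the copied name when the action is done with it. */
void release_id(value_t *v)
{
    free(v->id.name);
    v->id.name = NULL;
}
```

With a %union declared along these lines, a grammar action would read the two fields as $n.name and $n.sym_tab_ptr for a token typed with the struct member.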
