Revised question
The type of the values on the Yacc stack is controlled by YYSTYPE or %union. Use YYSTYPE when the type information is simple; use %union when it is complex.
One of my grammars contains:
    struct Token
    {
        int   toktype;
        char *start;
        char *end;
    };
    typedef struct Token Token;
    #define YYSTYPE Token
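For comparison, here is a sketch of what the %union route amounts to. A declaration such as `%union { int ival; char *sval; }` makes YYSTYPE a C union, and Yacc picks the member from the `<tag>` you attach with %token or %type. The member names ival and sval and the two helper functions are illustrative assumptions; they are not part of my grammar.

```c
#include <assert.h>
#include <stddef.h>

/* Roughly the type Yacc generates for:
 *     %union { int ival; char *sval; }
 * The member names are hypothetical. */
typedef union {
    int   ival;   /* e.g. for  %token <ival> NUMBER */
    char *sval;   /* e.g. for  %token <sval> NAME   */
} YYSTYPE;

/* With "%token <ival> NUMBER", Yacc turns $1 into the stack slot's
 * .ival member. These helpers mimic that member selection by hand. */
int number_value(const YYSTYPE *slot)
{
    assert(slot != NULL);
    return slot->ival;
}

const char *name_value(const YYSTYPE *slot)
{
    assert(slot != NULL);
    return slot->sval;
}
```

With the struct approach used in my grammar, no tags are needed: every symbol carries the same Token, and the actions choose fields explicitly (for example, $2.start).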
For various reasons (not necessarily good ones), my grammar uses a hand-written lexical analyzer instead of Lex.
In the grammar rules, you refer to items such as NAME in your example as $1 (where the actual number depends on where the item appears in the list of terminals and non-terminals that make up the rule).
For example (from the same grammar):
    disconnect
        :   K_DISCONNECT K_CURRENT
            { conn->ctype = CONN_CURRENT; }
        |   K_DISCONNECT K_ALL
            { conn->ctype = CONN_ALL; }
        |   K_DISCONNECT K_DEFAULT
            { conn->ctype = CONN_DEFAULT; }
        |   K_DISCONNECT string
            { conn->ctype = CONN_STRING; set_connection(conn, $2.start, $2.end); }
        ;
and
    load
        :   K_LOAD K_FROM opt_file_pipe string load_opt_list K_INSERT
            {
                set_string("load file", load->file, sizeof(load->file), $4.start, $4.end);
                load->stmt = $6.start;
            }
        ;
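Because each token value is a pair of pointers into the input rather than a NUL-terminated string, helpers that receive $4.start and $4.end have to copy the range themselves. The following is a minimal sketch of such a copy, assuming that end points one past the last character of the token. The name copy_token and its truncation behavior are assumptions for illustration; the real set_string() and set_connection() are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the token text in [start, end) into buf,
 * truncating if necessary and always NUL-terminating the result.
 * Returns the number of characters copied. */
size_t copy_token(char *buf, size_t buflen, const char *start, const char *end)
{
    size_t len;

    assert(buf != NULL && start != NULL && end >= start);
    if (buflen == 0)
        return 0;
    len = (size_t)(end - start);
    if (len >= buflen)
        len = buflen - 1;          /* leave room for the terminator */
    memcpy(buf, start, len);
    buf[len] = '\0';
    return len;
}
```

An action could then call something like copy_token(load->file, sizeof(load->file), $4.start, $4.end).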
I don't know whether an outline of the hand-written yylex() helps; in the grammar, it is a function in the same file as yyparse().
    static const char *c_token;

    static int yylex(void)
    {
        char        buffer[MAX_LEXTOKENLENGTH];
        const char *start;

        if (c_token == 0)
            abort();

        if (bare_filename_ok)
            start = scan_for_filename(c_token, &c_token);
        else
            start = sqltoken(c_token, &c_token);

        yylval.start = CONST_CAST(char *, start);
        yylval.end = CONST_CAST(char *, c_token);
        if (*start == '\0')
        {
            yylval.toktype = 0;
            return yylval.toktype;
        }
        set_token(buffer, sizeof(buffer), start, c_token);
    #ifdef YYDEBUG
        if (YYDEBUGVAR > 1)
            printf("yylex(): token = %s\n", buffer);
    #endif

        if (isalpha((unsigned char)buffer[0]) || buffer[0] == '_')
        {
            Keyword  kw;
            Keyword *p;
            kw.keyword = buffer;
            p = (Keyword *)bsearch(&kw, keylist, DIM(keylist), sizeof(Keyword), kw_compare);
            if (p == 0)
                yylval.toktype = S_IDENTIFIER;
            else
                yylval.toktype = p->token;
        }
        else if (buffer[0] == '\'')
        {
            yylval.toktype = S_SQSTRING;
        }
        else if (buffer[0] == '"')
        {
            yylval.toktype = S_DQSTRING;
        }
        else if (isdigit((unsigned char)buffer[0]))
        {
            yylval.toktype = S_NUMBER;
        }
        else if (buffer[0] == '.' && isdigit((unsigned char)buffer[1]))
        {
            yylval.toktype = S_NUMBER;
        }
        /* ... various single-character tokens are recognized ... */
        else if (buffer[0] == ':')
        {
            assert(buffer[1] == '\0');
            yylval.toktype = C_COLON;
        }
        else
        {
            yylval.toktype = S_ERROR;
        }
        return yylval.toktype;
    }
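The outline uses Keyword, keylist, DIM and kw_compare without showing them. The sketch below shows one way they might be defined so that the bsearch() call works. The token code values, the three-entry table, the case-sensitive comparison and the wrapper name classify_word are all assumptions for illustration; in a real grammar the token codes come from the header that Yacc generates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; normally supplied by Yacc's generated header. */
enum { S_IDENTIFIER = 258, K_DISCONNECT, K_FROM, K_LOAD };

typedef struct Keyword {
    const char *keyword;
    int         token;
} Keyword;

#define DIM(x) (sizeof(x) / sizeof((x)[0]))

/* The table must be sorted by keyword: bsearch() requires sorted input. */
static const Keyword keylist[] = {
    { "disconnect", K_DISCONNECT },
    { "from",       K_FROM },
    { "load",       K_LOAD },
};

/* Comparator for bsearch(): orders two Keyword entries by name.
 * This sketch is case-sensitive. */
static int kw_compare(const void *v1, const void *v2)
{
    const Keyword *k1 = (const Keyword *)v1;
    const Keyword *k2 = (const Keyword *)v2;
    return strcmp(k1->keyword, k2->keyword);
}

/* Return the keyword's token code, or S_IDENTIFIER if word is not a keyword. */
int classify_word(const char *word)
{
    Keyword        kw;
    const Keyword *p;

    assert(word != NULL);
    kw.keyword = word;
    kw.token = 0;
    p = (const Keyword *)bsearch(&kw, keylist, DIM(keylist),
                                 sizeof(Keyword), kw_compare);
    return (p == NULL) ? S_IDENTIFIER : p->token;
}
```

The search key is a full Keyword with only the name filled in, which lets the same comparator serve for both the key and the table entries.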
Original question
Normally, the variable (yytext) is a global variable, and your Yacc code uses one of two possible declarations:
    extern char *yytext;
    extern char  yytext[];
Which one is correct depends on how your version of Lex defines it.
If you want to add a length (perhaps yytextlen), you can define such a variable and have every return from yylex() set yytextlen. Alternatively, you can arrange for your grammar to call wwlex(), and your wwlex() simply does:
    int wwlex(void)
    {
        int rc = yylex();
        yytextlen = strlen(yytext);
        return rc;
    }
Or you can arrange for Lex to generate its code under a different name, so that Yacc continues to call yylex(), and you provide the code above as yylex(), which calls the renamed Lex function. Either way works.
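Here is a minimal sketch of the second arrangement. The name lex_yylex is hypothetical and stands for the Lex-generated scanner after renaming; it is replaced here by a stub that hands out a fixed list of tokens so the example is self-contained. The definitions of yytext and yytextlen are likewise stand-ins; in a real build yytext comes from Lex and must be declared to match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for what Lex would provide (assumptions for this sketch). */
char  *yytext = NULL;
size_t yytextlen = 0;

static const char *const demo_tokens[] = { "select", "*", "from", "t" };
static size_t demo_next = 0;
static char   demo_buffer[32];

/* Stub for the renamed Lex scanner: returns 1 per token, 0 at end of input. */
int lex_yylex(void)
{
    if (demo_next >= sizeof(demo_tokens) / sizeof(demo_tokens[0])) {
        demo_buffer[0] = '\0';
        yytext = demo_buffer;
        return 0;
    }
    strcpy(demo_buffer, demo_tokens[demo_next++]);
    yytext = demo_buffer;
    return 1;
}

/* The function Yacc calls: delegate to the renamed scanner,
 * then record the length of the matched text. */
int yylex(void)
{
    int rc = lex_yylex();
    assert(yytext != NULL);
    yytextlen = strlen(yytext);
    return rc;
}
```

The grammar needs no changes with this arrangement, because yyparse() still calls yylex().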