I am working on a simple SQL-like query parser, and I need to be able to capture subqueries that can occur literally in certain places. I found lexer states to be the best solution and was able to do a POC using curly braces to mark the start and end. However, the subqueries will be delimited by parentheses, not curly braces, and parentheses can occur in other places too, so I can't begin the state on every open paren. This information is readily available to the parser, so I was hoping to call begin and end at the appropriate points in the parser rules. This, however, did not work, because the lexer seems to tokenize the stream all at once, and so the tokens get generated in the INITIAL state. Is there a workaround for this problem? Here is an outline of what I was trying to do:
    def p_value_subquery(p):
        """
        value : start_sub end_sub
        """
        p[0] = "( " + p[2] + " )"   # end_sub carries the captured subquery text

    def p_start_sub(p):
        """
        start_sub : OPAR
        """
        start_subquery(p.lexer)
        p[0] = p[1]

    def p_end_sub(p):
        """
        end_sub : CPAR
        """
        subquery = end_subquery(p.lexer)
        p[0] = subquery
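For completeness, OPAR and CPAR are just the parentheses tokenized in the INITIAL state; their definitions are omitted above, but they amount to something like this:

    import ply.lex as lex

    tokens = ('OPAR', 'CPAR')   # plus the rest of the SQL-like token set

    t_OPAR = r'\('
    t_CPAR = r'\)'
    t_ignore = ' \t'

    def t_error(t):
        t.lexer.skip(1)

    lexer = lex.lex()           # builds a lexer recognizing just these tokens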
The start_subquery() and end_subquery() functions are defined as follows:
    def start_subquery(lexer):
        lexer.code_start = lexer.lexpos    # record where the subquery text starts
        lexer.level = 1                    # current paren nesting depth
        lexer.begin('subquery')            # switch into the subquery lexer state

    def end_subquery(lexer):
        # slice bounds are approximate; adjust to exclude the parens as needed
        value = lexer.lexdata[lexer.code_start:lexer.lexpos]
        lexer.lineno += value.count('\n')  # keep the line counter in sync
        lexer.begin('INITIAL')             # switch back to normal tokenizing
        return value
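To make the capture concrete, here is a quick self-contained check of the kind of slice end_subquery performs (the query string and the way the indices are computed here are made up for illustration):

    data = "SELECT name FROM t WHERE id IN (SELECT id FROM u)"
    code_start = data.index('(') + 1   # what lexer.code_start would record
    close = data.rindex(')')           # where the matching close paren sits
    print(data[code_start:close])      # -> SELECT id FROM u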
The subquery-state lexer rules simply track the paren nesting so the matching close paren can be detected:
@lex.TOKEN(r"\(") def t_subquery_SUBQST(t): lexer.level += 1 @lex.TOKEN(r"\)") def t_subquery_SUBQEN(t): lexer.level -= 1 @lex.TOKEN(r".") def t_subquery_anychar(t): pass
I would be grateful for any help.
haridsv