Boost Spirit and Lex analyzer problem

I am trying to (step by step) change the example code from the documentation, but not much different from what I expect. In particular, the if statement fails when (my intention is) it should be passed (there was an "else", but this part of the analyzer was removed during debugging). Assignment job works fine. I also had a while statement that had the same problem as the if statement, so I'm sure that if I can get help to understand why no one is working, it should be easy to get another. This should be a little subtle because it is almost literally what one example has.

#include <iostream> #include <fstream> #include <string> #define BOOST_SPIRIT_DEBUG #include <boost/config/warning_disable.hpp> #include <boost/spirit/include/qi.hpp> #include <boost/spirit/include/lex_lexertl.hpp> #include <boost/spirit/include/phoenix_operator.hpp> #include <boost/spirit/include/phoenix_statement.hpp> #include <boost/spirit/include/phoenix_container.hpp> namespace qi = boost::spirit::qi; namespace lex = boost::spirit::lex; inline std::string read_from_file( const char* infile ) { std::ifstream instream( infile ); if( !instream.is_open() ) { std::cerr << "Could not open file: \"" << infile << "\"" << std::endl; exit( -1 ); } instream.unsetf( std::ios::skipws ); return( std::string( std::istreambuf_iterator< char >( instream.rdbuf() ), std::istreambuf_iterator< char >() ) ); } template< typename Lexer > struct LangLexer : lex::lexer< Lexer > { LangLexer() { identifier = "[a-zA-Z][a-zA-Z0-9_]*"; number = "[-+]?(\\d*\\.)?\\d+([eE][-+]?\\d+)?"; if_ = "if"; else_ = "else"; this->self = lex::token_def<> ( '(' ) | ')' | '{' | '}' | '=' | ';'; this->self += identifier | number | if_ | else_; this->self( "WS" ) = lex::token_def<>( "[ \\t\\n]+" ); } lex::token_def<> if_, else_; lex::token_def< std::string > identifier; lex::token_def< double > number; }; template< typename Iterator, typename Lexer > struct LangGrammar : qi::grammar< Iterator, qi::in_state_skipper< Lexer > > { template< typename TokenDef > LangGrammar( const TokenDef& tok ) : LangGrammar::base_type( program ) { using boost::phoenix::val; using boost::phoenix::ref; using boost::phoenix::size; program = +block; block = '{' >> *statement >> '}'; statement = assignment | if_stmt; assignment = ( tok.identifier >> '=' >> expression >> ';' ); if_stmt = ( tok.if_ >> '(' >> expression >> ')' >> block ); expression = ( tok.identifier[ qi::_val = qi::_1 ] | tok.number[ qi::_val = qi::_1 ] ); BOOST_SPIRIT_DEBUG_NODE( program ); BOOST_SPIRIT_DEBUG_NODE( block ); BOOST_SPIRIT_DEBUG_NODE( statement ); BOOST_SPIRIT_DEBUG_NODE( assignment ); BOOST_SPIRIT_DEBUG_NODE( if_stmt ); BOOST_SPIRIT_DEBUG_NODE( expression ); } qi::rule< Iterator, qi::in_state_skipper< Lexer > > program, block, statement; qi::rule< Iterator, qi::in_state_skipper< Lexer > > assignment, if_stmt; typedef boost::variant< double, std::string > expression_type; qi::rule< Iterator, expression_type(), qi::in_state_skipper< Lexer > > expression; }; int main( int argc, char** argv ) { typedef std::string::iterator base_iterator_type; typedef lex::lexertl::token< base_iterator_type, boost::mpl::vector< double, std::string > > token_type; typedef lex::lexertl::lexer< token_type > lexer_type; typedef LangLexer< lexer_type > LangLexer; typedef LangLexer::iterator_type iterator_type; typedef LangGrammar< iterator_type, LangLexer::lexer_def > LangGrammar; LangLexer lexer; LangGrammar grammar( lexer ); std::string str( read_from_file( 1 == argc ? "boostLexTest.dat" : argv[1] ) ); base_iterator_type strBegin = str.begin(); iterator_type tokenItor = lexer.begin( strBegin, str.end() ); iterator_type tokenItorEnd = lexer.end(); std::cout << std::setfill( '*' ) << std::setw(20) << '*' << std::endl << str << std::endl << std::setfill( '*' ) << std::setw(20) << '*' << std::endl; bool result = qi::phrase_parse( tokenItor, tokenItorEnd, grammar, qi::in_state( "WS" )[ lexer.self ] ); if( result ) { std::cout << "Parsing successful" << std::endl; } else { std::cout << "Parsing error" << std::endl; } return( 0 ); } 

Here is the result of doing this (the file read in the line is primarily downloaded first)

 ******************** { a = 5; if( a ){ b = 2; } } ******************** <program> <try>{</try> <block> <try>{</try> <statement> <try></try> <assignment> <try></try> <expression> <try></try> <success>;</success> <attributes>(5)</attributes> </expression> <success></success> <attributes>()</attributes> </assignment> <success></success> <attributes>()</attributes> </statement> <statement> <try></try> <assignment> <try></try> <fail/> </assignment> <if_stmt> <try> if(</try> <fail/> </if_stmt> <fail/> </statement> <fail/> </block> <fail/> </program> Parsing error 
+6
c ++ boost boost-spirit boost-spirit-qi
source share
2 answers

The problem is the sequence in which you added the token definitions to lexer. Your code

 this->self += identifier | number | if_ | else_; 

first adds the identifier token, which will ideally match "if" (and any other keyword). If you change it to

 this->self += if_ | else_ | identifier | number; 

everything starts to work as it should.

This is nothing special for Spirit.Lex. Any tokenizer evaluates the order in which tokens are defined to determine the priority of the match.

+8
source share

Maybe you need to change the mini_c example to use lexer? This will be a more complete example of how to use these two. Most Qi samples (compared to 2.4version) do not use a lexer at all.

Although this serves to illustrate the use of Qi, we tend to try to use dedicated lexer for production projects due to convenience reasons (I can, for example, disable lexer loading for a sub-developer).

+6
source share

All Articles