Reversing the order of sub-rules inside a rule in a Boost.Spirit grammar leads to a segfault

Question

A warning: while I tried to keep the code to a minimum, I still had to include quite a bit to make sure all the necessary information is here.

This code compiles fine and runs, resulting in a syntax error:

name = simple_name [ qi::_val = qi::_1 ] | qualified_name [ qi::_val = qi::_1 ] ;

Whereas this:

 name = qualified_name [ qi::_val = qi::_1 ] | simple_name [ qi::_val = qi::_1 ] ;

results in a SIGSEGV (segmentation fault) in:

 boost::detail::function::function_obj_invoker4<boost::spirit::qi::detail::parser_binder<boost::spirit::qi::alternative<boost::fusion::cons<boost::spirit::qi::action<boost::spirit::qi::reference<boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<char*, std::string>, boost::mpl::vector<std::string, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<false>, unsigned long>, boost::spirit::lex::lexertl::detail::data, __gnu_cxx::__normal_iterator<char*, std::string>, mpl_::bool_<true>, mpl_::bool_<false> > >, Ast::name* (), boost::spirit::unused_type, boost::spirit::unused_type, boost::spirit::unused_type> const>, boost::phoenix::actor<boost::proto::exprns_::expr<boost::proto::tagns_::tag::assign, boost::proto::argsns_::list2<boost::proto::exprns_::expr<boost::proto::tagns_::tag::terminal, boost::proto::argsns_::term<boost::spirit::attribute<0> >,0l>,boost::phoenix::actor<boost::spirit::argument<0> > >, 2l> > >,boost::fusion::cons<boost::spirit::qi::action<boost::spirit::qi::reference<boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<char*, std::string>,boost::mpl::vector<std::string, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<false>,unsigned long>, boost::spirit::lex::lexertl::detail::data, __gnu_cxx::__normal_iterator<char*,std::string>, mpl_::bool_<true>, mpl_::bool_<false> > >, Ast::name* (), ... more to come ...

Where

 simple_name = (tok.identifier) [ qi::_val = build_simple_name_(qi::_1) ];

and

 qualified_name = (name >> qi::raw_token(DOT) >> tok.identifier) [ qi::_val = build_qualified_name_(qi::_1, qi::_2) ] ;

All of these rules return Ast::name*():

 qi::rule<Iterator, Ast::name*()> name;
 qi::rule<Iterator, Ast::name*()> simple_name;
 qi::rule<Iterator, Ast::name*()> qualified_name;

The helper functions are defined as:

 Ast::name* build_simple_name(std::string str)
 {
     return (new Ast::name_simple(Ast::identifier(str)));
 }
 BOOST_PHOENIX_ADAPT_FUNCTION(Ast::name*, build_simple_name_, build_simple_name, 1)

and

 Ast::name* build_qualified_name(Ast::name* name, std::string str)
 {
     std::list<Ast::identifier> qualified_name = Ast::name_to_identifier_list(name);
     qualified_name.push_back(Ast::identifier(str));
     return (new Ast::name_qualified(qualified_name));
 }
 BOOST_PHOENIX_ADAPT_FUNCTION(Ast::name*, build_qualified_name_, build_qualified_name, 2)

The lexer definitions used are:

 lex::token_def<std::string> identifier = "{JAVA_LETTER}{JAVA_LETTER_OR_DIGIT}*";

and

 ('.', DOT)

where the patterns {JAVA_LETTER} and {JAVA_LETTER_OR_DIGIT} are defined as:

 ("DIGIT", "[0-9]")
 ("LATIN1_LETTER", "[A-Z]|[a-z]")
 ("JAVA_LETTER", "{LATIN1_LETTER}|$|_")
 ("JAVA_LETTER_OR_DIGIT", "{JAVA_LETTER}|{DIGIT}")

My input is a simple string:

 package a.D;

which the lexer turns into the tokens:

 Keywords : package
 Identifier : a
 Delimiters : .
 Identifier : D
 Delimiters : ;

The first example (with simple_name first) produces a syntax error like:

 Syntax Error at line 1:
 package a.D;
 ^^

And the second example just segfaults with the trace shown above.

Clearly, the second example is what I want, as it should try to match a complex expression before a simple one.

Does anyone see why the code crashes, or how I could go about figuring it out? Also, should this be on Code Review instead?

+2

segmentation-fault boost boost-spirit boost-spirit-qi boost-spirit-lex

Skeen Sep 04 '13 at 11:00

1 answer

llonesmiz · Accepted Answer · 2013-09-04T12:16:05+0000

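The problem is that you have a left-recursive grammar, which cannot be used with Boost.Spirit. Spirit generates recursive-descent parsers, so with qualified_name tried first, name invokes qualified_name, which immediately invokes name again without consuming any input; the recursion never ends and the stack overflows, hence the segfault. You basically have: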

 name = identifier | name >> dot >> identifier;
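To make the failure mode concrete, here is a minimal sketch of a hand-written recursive-descent parser with the same shape as the two orderings in the question. Everything in it (the Parser struct, the simple_first flag, the max_depth guard) is hypothetical and only mimics the grammar; it is not how Spirit is implemented internally. The guard stands in for the stack limit so that the demo reports the runaway recursion instead of crashing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recursive-descent parser over a list of token strings.
struct Parser {
    std::vector<std::string> tokens;
    std::size_t pos = 0;      // index of the next unread token
    bool simple_first = false; // which alternative of `name` is tried first
    int depth = 0;            // current nesting depth of name() calls
    int max_depth = 1000;     // guard: stands in for the real stack limit
    bool overflowed = false;  // set when the guard is hit

    bool identifier() {
        if (pos < tokens.size() && tokens[pos] != "." && tokens[pos] != ";") {
            ++pos;
            return true;
        }
        return false;
    }

    bool dot() {
        if (pos < tokens.size() && tokens[pos] == ".") {
            ++pos;
            return true;
        }
        return false;
    }

    // qualified_name = name >> dot >> identifier   (left recursive)
    bool qualified_name() {
        return name() && dot() && identifier();
    }

    // name = simple_name | qualified_name   (simple_first == true)
    // name = qualified_name | simple_name   (simple_first == false)
    bool name() {
        if (depth >= max_depth) {   // a real parser would overflow the stack here
            overflowed = true;
            return false;
        }
        ++depth;
        const std::size_t save = pos;
        bool ok = simple_first ? identifier() : qualified_name();
        if (!ok && !overflowed) {
            pos = save;             // backtrack and try the other alternative
            ok = simple_first ? qualified_name() : identifier();
        }
        --depth;
        return ok && !overflowed;
    }
};
```

With the tokens a, ., D and simple_first set, name() succeeds after consuming only a, which leaves . D unparsed and matches the syntax error in the question. With qualified_name first, name() re-enters itself before any token is consumed and hits the guard.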

To remove left recursion using the standard transformation, when you have something like:

 A = A >> alpha | beta;

You need to create 2 new "rules":

 A = beta >> A_tail;
 A_tail = eps | alpha >> A_tail;

In your case:

 A := name
 alpha := dot >> identifier
 beta := identifier

So your "rules" would be:

 name = identifier >> name_tail;
 name_tail = eps | dot >> identifier >> name_tail;

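If you look closely at name_tail, you will see that it literally means: either nothing, or dot >> identifier followed by either nothing, or dot >> identifier, and so on. (Written literally in Spirit, the eps alternative would have to come last, because alternatives are tried in order and eps always succeeds.) This means name_tail is: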

 name_tail = *(dot >> identifier);

So your name rule becomes:

 name = identifier >> *(dot >> identifier);

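All of this is correct as far as parsing goes, but there is a very good chance that it will not work with your attributes as-is: the rule no longer obtains an Ast::name* by recursing into itself, so the semantic actions that call build_qualified_name_ will have to be adapted.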
