How can I make this recursive rule work?

I want to parse LaTeX math (as a start, only recognize it, keeping the characters). Right now I am having trouble with superscripts and subscripts combined with curly braces (like a^{bc} and combinations thereof; a basic a^b already works fine). Minimal example (as short as humanly possible while keeping it readable):

    #include <iostream>
    using std::cout;
    #include <string>
    using std::string;

    #include <boost/spirit/home/x3.hpp>
    namespace x3 = boost::spirit::x3;
    using x3::space;
    using x3::char_;
    using x3::lit;
    using x3::repeat;

    x3::rule<struct scripts, string> scripts = "super- and subscripts";
    x3::rule<struct braced_thing, string> braced_thing = "thing optionaly surrounded by curly braces";
    x3::rule<struct superscript, string> superscript = "superscript";
    x3::rule<struct subscript, string> subscript = "subscript";

    // main rule: any number of items with or without braces
    auto const scripts_def = *braced_thing;
    // second level main rule: optional braces, and any number of characters or sub/superscripts
    auto const braced_thing_def = -lit('{') >> *(subscript | superscript | repeat(1)[(char_ - "_^{}")]) >> -lit('}');
    // superscript: things of the form a^b where a and b can be surrounded by curly braces
    auto const superscript_def = braced_thing >> '^' >> braced_thing;
    // subscript: things of the form a_b where a and b can be surrounded by curly braces
    auto const subscript_def = braced_thing >> '_' >> braced_thing;

    BOOST_SPIRIT_DEFINE(scripts)
    BOOST_SPIRIT_DEFINE(braced_thing)
    BOOST_SPIRIT_DEFINE(superscript)
    BOOST_SPIRIT_DEFINE(subscript)

    int main()
    {
        const string input = "a^{b_x y}_z {v_x}^{{x^z}_y}";
        string output; // will only contain the characters as the grammar is defined above
        auto first = input.begin();
        auto last = input.end();
        const bool result = x3::phrase_parse(first, last, scripts, space, output);
        if(first != last)
            std::cout << "partial match only:\n" << output << '\n';
        else if(!result)
            std::cout << "parse failed!\n";
        else
            std::cout << "parsing succeeded:\n" << output << '\n';
    }

It is also available on Coliru.

The problem is that this segfaults (I'm sure for obvious reasons), and I see no other way of, well, expressing this in an expression grammar...

1 answer

I haven't looked at @cv_and_he's suggestion yet; instead, I debugged your grammar myself. I came up with this:

    auto token  = lexeme [ +~char_("_^{} \t\r\n") ];
    auto simple = '{' >> sequence >> '}' | token;
    auto expr   = lexeme [ simple % char_("_^") ];
    auto sequence_def = expr % +space;

That is essentially the result of rethinking, step by step, what the grammar really looks like.

It took me two attempts to work out the correct way to parse "a b" (at first I "hacked" it in as just another operator in char_(" _^"), but I had the feeling that this would not lead to the AST you would expect. The clue was that you were using a space skipper).

There's no AST yet; we just "harvest" the matched source string using x3::raw[...].

Live coliru

    //#define BOOST_SPIRIT_X3_DEBUG
    #include <iostream>
    #include <string>
    #include <boost/spirit/home/x3.hpp>

    namespace x3 = boost::spirit::x3;

    namespace grammar {
        using namespace x3;

        rule<struct _s> sequence { "sequence" };

        auto simple = rule<struct _s> {"simple"}
            = '{' >> sequence >> '}'
            | lexeme [ +~char_("_^{} \t\r\n") ];

        auto expr = rule<struct _e> {"expr"}
            = lexeme [ simple % char_("_^") ];

        auto sequence_def = expr % +space;

        BOOST_SPIRIT_DEFINE(sequence)
    }

    int main() {
        for (const std::string input : {
                "a", "a^b", "a_b", "a b", "{a}^{b}", "{a}_{b}", "{a} {b}",
                "a^{b_x y}",
                "a^{b_x y}_z {v_x}^{{x^z}_y}"
            })
        {
            std::string output; // will only contain the characters as the grammar is defined above
            auto first = input.begin(), last = input.end();

            bool result = x3::parse(first, last, x3::raw[grammar::sequence], output);

            if (result)
                std::cout << "Parse success: '" << output << "'\n";
            else
                std::cout << "parse failed!\n";

            if (last!=first)
                std::cout << "remaining unparsed: '" << std::string(first, last) << "'\n";
        }
    }

Output:

    Parse success: 'a'
    Parse success: 'a^b'
    Parse success: 'a_b'
    Parse success: 'a b'
    Parse success: '{a}^{b}'
    Parse success: '{a}_{b}'
    Parse success: '{a} {b}'
    Parse success: 'a^{b_x y}'
    Parse success: 'a^{b_x y}_z {v_x}^{{x^z}_y}'

Output with debugging information enabled:

    <sequence> <try>a</try> <expr> <try>a</try> <simple> <try>a</try> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: 'a'
    <sequence> <try>a^b</try> <expr> <try>a^b</try> <simple> <try>a^b</try> <success>^b</success> </simple> <simple> <try>b</try> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: 'a^b'
    <sequence> <try>a_b</try> <expr> <try>a_b</try> <simple> <try>a_b</try> <success>_b</success> </simple> <simple> <try>b</try> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: 'a_b'
    <sequence> <try>a b</try> <expr> <try>a b</try> <simple> <try>a b</try> <success> b</success> </simple> <success> b</success> </expr> <expr> <try>b</try> <simple> <try>b</try> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: 'a b'
    <sequence> <try>{a}^{b}</try> <expr> <try>{a}^{b}</try> <simple> <try>{a}^{b}</try> <sequence> <try>a}^{b}</try> <expr> <try>a}^{b}</try> <simple> <try>a}^{b}</try> <success>}^{b}</success> </simple> <success>}^{b}</success> </expr> <success>}^{b}</success> </sequence> <success>^{b}</success> </simple> <simple> <try>{b}</try> <sequence> <try>b}</try> <expr> <try>b}</try> <simple> <try>b}</try> <success>}</success> </simple> <success>}</success> </expr> <success>}</success> </sequence> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: '{a}^{b}'
    <sequence> <try>{a}_{b}</try> <expr> <try>{a}_{b}</try> <simple> <try>{a}_{b}</try> <sequence> <try>a}_{b}</try> <expr> <try>a}_{b}</try> <simple> <try>a}_{b}</try> <success>}_{b}</success> </simple> <success>}_{b}</success> </expr> <success>}_{b}</success> </sequence> <success>_{b}</success> </simple> <simple> <try>{b}</try> <sequence> <try>b}</try> <expr> <try>b}</try> <simple> <try>b}</try> <success>}</success> </simple> <success>}</success> </expr> <success>}</success> </sequence> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: '{a}_{b}'
    <sequence> <try>{a} {b}</try> <expr> <try>{a} {b}</try> <simple> <try>{a} {b}</try> <sequence> <try>a} {b}</try> <expr> <try>a} {b}</try> <simple> <try>a} {b}</try> <success>} {b}</success> </simple> <success>} {b}</success> </expr> <success>} {b}</success> </sequence> <success> {b}</success> </simple> <success> {b}</success> </expr> <expr> <try>{b}</try> <simple> <try>{b}</try> <sequence> <try>b}</try> <expr> <try>b}</try> <simple> <try>b}</try> <success>}</success> </simple> <success>}</success> </expr> <success>}</success> </sequence> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: '{a} {b}'
    <sequence> <try>a^{b_x y}</try> <expr> <try>a^{b_x y}</try> <simple> <try>a^{b_x y}</try> <success>^{b_x y}</success> </simple> <simple> <try>{b_x y}</try> <sequence> <try>b_x y}</try> <expr> <try>b_x y}</try> <simple> <try>b_x y}</try> <success>_x y}</success> </simple> <simple> <try>x y}</try> <success> y}</success> </simple> <success> y}</success> </expr> <expr> <try>y}</try> <simple> <try>y}</try> <success>}</success> </simple> <success>}</success> </expr> <success>}</success> </sequence> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: 'a^{b_x y}'
    <sequence> <try>a^{b_x y}_z {v_x}^{{</try> <expr> <try>a^{b_x y}_z {v_x}^{{</try> <simple> <try>a^{b_x y}_z {v_x}^{{</try> <success>^{b_x y}_z {v_x}^{{x</success> </simple> <simple> <try>{b_x y}_z {v_x}^{{x^</try> <sequence> <try>b_x y}_z {v_x}^{{x^z</try> <expr> <try>b_x y}_z {v_x}^{{x^z</try> <simple> <try>b_x y}_z {v_x}^{{x^z</try> <success>_x y}_z {v_x}^{{x^z}</success> </simple> <simple> <try>x y}_z {v_x}^{{x^z}_</try> <success> y}_z {v_x}^{{x^z}_y</success> </simple> <success> y}_z {v_x}^{{x^z}_y</success> </expr> <expr> <try>y}_z {v_x}^{{x^z}_y}</try> <simple> <try>y}_z {v_x}^{{x^z}_y}</try> <success>}_z {v_x}^{{x^z}_y}</success> </simple> <success>}_z {v_x}^{{x^z}_y}</success> </expr> <success>}_z {v_x}^{{x^z}_y}</success> </sequence> <success>_z {v_x}^{{x^z}_y}</success> </simple> <simple> <try>z {v_x}^{{x^z}_y}</try> <success> {v_x}^{{x^z}_y}</success> </simple> <success> {v_x}^{{x^z}_y}</success> </expr> <expr> <try>{v_x}^{{x^z}_y}</try> <simple> <try>{v_x}^{{x^z}_y}</try> <sequence> <try>v_x}^{{x^z}_y}</try> <expr> <try>v_x}^{{x^z}_y}</try> <simple> <try>v_x}^{{x^z}_y}</try> <success>_x}^{{x^z}_y}</success> </simple> <simple> <try>x}^{{x^z}_y}</try> <success>}^{{x^z}_y}</success> </simple> <success>}^{{x^z}_y}</success> </expr> <success>}^{{x^z}_y}</success> </sequence> <success>^{{x^z}_y}</success> </simple> <simple> <try>{{x^z}_y}</try> <sequence> <try>{x^z}_y}</try> <expr> <try>{x^z}_y}</try> <simple> <try>{x^z}_y}</try> <sequence> <try>x^z}_y}</try> <expr> <try>x^z}_y}</try> <simple> <try>x^z}_y}</try> <success>^z}_y}</success> </simple> <simple> <try>z}_y}</try> <success>}_y}</success> </simple> <success>}_y}</success> </expr> <success>}_y}</success> </sequence> <success>_y}</success> </simple> <simple> <try>y}</try> <success>}</success> </simple> <success>}</success> </expr> <success>}</success> </sequence> <success></success> </simple> <success></success> </expr> <success></success> </sequence>
    Parse success: 'a^{b_x y}_z {v_x}^{{x^z}_y}'