How to parse this using a PEG grammar?

Question


I am trying to write a parser using PEG.js. I need to parse something like:

blah blah START Lorem ipsum dolor sit amet, consectetur adipiscing elit END foo bar etc.

I am having trouble writing a rule that captures the text from "START" to "END".

+7

parsing peg pegjs

John smith Sep 01 '12 at 19:26


1 answer

ebohlman · Accepted Answer · 2012-09-03T08:47:46+0000

Use negative lookahead predicates:

    phrase
      = (!"START" .)* "START" result:(!"END" .)* "END" .* {
          // remove empty element added by predicate matching
          for (var i = 0; i < result.length; ++i) {
            result[i] = result[i][1];
          }
          return result.join("");
        }

You need to use a negative predicate for END as well as START, because repetition in pegjs is greedy.
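To see what the predicates are doing, here is a plain JavaScript sketch of the same matching logic, written by hand rather than generated by PEG.js. The names `consumeUntil` and `parsePhrase` are hypothetical and exist only for illustration.

```javascript
// Hand-written sketch of:
//   (!"START" .)* "START" result:(!"END" .)* "END" .*
// This is NOT PEG.js output; helper names are made up for illustration.

// Emulates (!stop .)*: take one character at a time, but only while the
// input at the current position does not begin with `stop`.
function consumeUntil(input, pos, stop) {
  let text = "";
  while (pos < input.length && !input.startsWith(stop, pos)) {
    text += input[pos];
    pos += 1;
  }
  return { text, pos };
}

function parsePhrase(input) {
  // (!"START" .)* "START"
  const before = consumeUntil(input, 0, "START");
  if (!input.startsWith("START", before.pos)) {
    throw new Error('Expected "START"');
  }
  // result:(!"END" .)* "END"
  const body = consumeUntil(input, before.pos + "START".length, "END");
  if (!input.startsWith("END", body.pos)) {
    throw new Error('Expected "END"');
  }
  // The trailing .* swallows the rest; only the captured text is returned.
  return body.text;
}
```

Without the check inside the loop, the loop would run to the end of the input and the following literal could never match, which is the same problem a bare `.*` has in the grammar.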

~~Alternatively, the action can be written as~~

 {return result.join("").split(',').join("");}

However, this relies on the not necessarily documented behavior of join when applied to nested arrays (namely, that it joins each subarray with commas and then concatenates the results). It also strips any commas that appear in the captured text itself.
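The difference between the two actions can be checked in plain JavaScript, without PEG.js. The sketch below assumes each match of `(!"END" .)` arrives as a two-element array whose first slot is an empty placeholder; the exact placeholder value may depend on the PEG.js version, so the empty string used here is an assumption.

```javascript
// Assumed shape of `result` for the captured text "a,b":
// one [placeholder, character] pair per matched character.
const result = [["", "a"], ["", ","], ["", "b"]];

// join("") stringifies each subarray with commas first, giving ",a,,,b";
// splitting on "," and rejoining then removes every comma, real or not.
const viaSplit = result.join("").split(",").join("");

// Picking the character out of each pair keeps the text intact.
const viaIndex = result.map(function (pair) { return pair[1]; }).join("");

console.log(JSON.stringify(viaSplit)); // "ab"
console.log(JSON.stringify(viaIndex)); // "a,b"
```

For the sample input in the question, which contains a comma after "amet", only the index-based version returns the text unchanged.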

[UPDATE] A shorter way to handle empty elements is

    phrase
      = (!"START" .)* "START" result:(t:(!"END" .) { return t[1]; })* "END" .* {
          return result.join("");
        }
