UPDATE 2
Original question title: can I avoid using Ragel's |**| if I don't need backtracking?
Updated answer: yes, you can write a simple tokenizer with ()* if you don't need backtracking.
UPDATE 1
I realized that framing this as an XML tokenizing question was a red herring, because what I am doing is not specific to XML.
END UPDATES
I have a Ragel scanner / tokenizer that just searches for FooBarEntity elements in files like:
<ABC >
  <XYZ >
    <FooBarEntity>
      <Example >Hello world</Example >
    </FooBarEntity>
  </XYZ >
  <XYZ >
    <FooBarEntity>
      <Example >sdrastvui</Example >
    </FooBarEntity>
  </XYZ >
</ABC >
Scanner version:
%%{
  machine simple_scanner;

  action Emit {
    emit data[(ts+14)..(te-15)].pack('c*')
  }

  foo = '<FooBarEntity>' any+ :>> '</FooBarEntity>';

  main := |*
    foo => Emit;
    any;
  *|;
}%%
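The offsets in Emit (14 and 15) are the byte lengths of the opening and closing tags. Here is a minimal plain-Ruby sketch of that arithmetic, outside Ragel. It assumes ts is the index of the token's first byte and te is one past its last byte, which is how Ragel describes its scanner variables. The payload helper and its exclusive flag are hypothetical names used only for illustration. Under that assumption, the inclusive `..` range would appear to pick up the `<` of the closing tag, so it may be worth checking against the tests.

```ruby
OPEN_TAG  = '<FooBarEntity>'   # 14 bytes
CLOSE_TAG = '</FooBarEntity>'  # 15 bytes

# Hypothetical helper (not part of the Ragel machine): slices the payload
# out of one matched token.
#   data - array of byte values, as produced by String#unpack('c*')
#   ts   - index of the token's first byte
#   te   - ASSUMED to be one past the token's last byte
def payload(data, ts, te, exclusive: true)
  first = ts + OPEN_TAG.length
  last  = te - CLOSE_TAG.length   # index of the '<' that starts the closing tag
  range = exclusive ? (first...last) : (first..last)
  data[range].pack('c*')
end

input = '<FooBarEntity>hi</FooBarEntity>'
data  = input.unpack('c*')
ts    = 0
te    = input.length

puts payload(data, ts, te)                    # exclusive range: "hi"
puts payload(data, ts, te, exclusive: false)  # inclusive range: "hi<"
```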
Version without a scanner (i.e. it uses ()* instead of |**|):
%%{
  machine simple_tokenizer;

  action MyTs {
    my_ts = p
  }
  action MyTe {
    my_te = p
  }
  action Emit {
    emit data[my_ts...my_te].pack('c*')
    my_ts = nil
    my_te = nil
  }

  foo = '<FooBarEntity>' any+ >MyTs :>> '</FooBarEntity>' >MyTe %Emit;

  main := ( foo | any+ )*;
}%%
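For a quick cross-check of what either machine should emit, here is a sketch that uses a plain Ruby regex instead of Ragel. This is a swapped-in technique, not the Ragel approach: the non-greedy `.+?` stands in for `any+ :>>` (at least one character, stopping at the first closing tag). The ENTITY constant and the entities helper are hypothetical names, and the whole input is assumed to fit in memory, so no buffering is modeled.

```ruby
# Non-Ragel reference: captures everything between the opening tag and
# the first closing tag that follows it. /m lets '.' match newlines.
ENTITY = %r{<FooBarEntity>(.+?)</FooBarEntity>}m

# Hypothetical helper: returns the payload of every FooBarEntity element.
def entities(text)
  text.scan(ENTITY).flatten
end

sample = '<ABC > <XYZ > <FooBarEntity> <Example >Hello world</Example > ' \
         '</FooBarEntity> </XYZ > <XYZ > <FooBarEntity> ' \
         '<Example >sdrastvui</Example > </FooBarEntity> </XYZ > </ABC >'

entities(sample).each { |payload| puts payload.inspect }
```

Comparing this list with what emit receives from each machine is one way to confirm that the scanner and the ()* tokenizer agree on the same input.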
I implemented both versions and wrote tests for them at https://github.com/seamusabshere/ruby_ragel_examples
You can see the reading/buffering code at https://github.com/seamusabshere/ruby_ragel_examples/blob/master/lib/simple_scanner.rl and https://github.com/seamusabshere/ruby_ragel_examples/blob/master/lib/simple_tokenizer.rl