I have a text file similar to this:
section header 1:
some words can be anything
more words can be anything at all
etc etc lala
some other header:
as before could be anything
hey is not this fun
I am trying to construct a grammar with pyparsing that results in the following list structure when the parsed results are queried as a list (i.e. iterating over the elements of parsed.asList() prints the following):
['section header 1:', [['some words can be anything'], ['more words can be anything at all'], ['etc etc lala']]]
['some other header:', [['as before could be anything'], ['hey is not this fun']]]
The header names are known in advance, and any individual header may or may not appear in the file. If a header does appear, it is always followed by at least one line of content.
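To make the target concrete, here is a rough plain-Python sketch of the structure I am after. It is not a pyparsing grammar and not what I want to end up with. The names KNOWN_HEADERS and split_sections are made up for illustration, and it assumes each header sits alone on its own line and every following line is one content entry.

```python
# Plain-Python illustration of the target structure (not a pyparsing grammar).
# KNOWN_HEADERS and split_sections are hypothetical names used only in this sketch.
KNOWN_HEADERS = ["section header 1:", "some other header:"]

SAMPLE = """\
section header 1:
some words can be anything
more words can be anything at all
etc etc lala
some other header:
as before could be anything
hey is not this fun
"""


def split_sections(text, headers=KNOWN_HEADERS):
    """Return [[header, [[line], [line], ...]], ...] for the known headers."""
    sections = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line in headers:
            # A known header starts a new section with an empty content list.
            sections.append([line, []])
        elif sections:
            # Any other line belongs to the most recent header.
            sections[-1][1].append([line])
    return sections


for section in split_sections(SAMPLE):
    print(section)
```

The point is only that a header line should close the previous section and open a new one, which is the behaviour I cannot get out of my grammar.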
The problem I am facing is getting the parser to recognise where 'section header 1:' ends and 'some other header:' begins. What I end up with from parsed.asList() looks like this:
['section header 1:', [['some words can be anything'], ['more words can be anything at all'], ['etc etc lala'], ['some other header:'], ['as before could be anything'], ['hey is not this fun']]]
(i.e. 'section header 1:' is picked up correctly, but everything after it is added to that section, including the subsequent header lines, etc.)
I have tried various things, playing with leaveWhitespace() and LineEnd() in different ways, but I can't figure it out.
The basic parser I am working with is this (a contrived example; the real one lives inside a class definition, etc.):
from pyparsing import Group, Literal, OneOrMore, Word, ZeroOrMore, printables

header_1_line = Literal('section header 1:')
text_line = Group(OneOrMore(Word(printables)))
header_1_block = Group(header_1_line + Group(OneOrMore(text_line)))
header_2_line = Literal('some other header:')
header_2_block = Group(header_2_line + Group(OneOrMore(text_line)))
overall_structure = ZeroOrMore(header_1_block | header_2_block)
and it is called with:
parsed=overall_structure.parseFile()
Greetings, Matt.