Python multiline regex

Question

I have a file structured as follows:

    A: some text
    B: more text
    even more text
    on several lines
    A: and we start again
    B: more text
    more
    multiline text

I am trying to find a regex that will split my file as follows:

    >>> re.findall(regex, f.read())
    [('some text', 'more text', 'even more text\non several lines'),
     ('and we start again', 'more text', 'more\nmultiline text')]

So far I have come up with the following:

    >>> re.findall('A:(.*?)\nB:(.*?)\n(.*?)', f.read(), re.DOTALL)
    [(' some text', ' more text', ''), (' and we start again', ' more text', '')]

The multiline text is not captured. I think this is because the lazy quantifier is really lazy and matches nothing, but if I remove it, the regular expression becomes really greedy:

    >>> re.findall('A:(.*?)\nB:(.*?)\n(.*)', f.read(), re.DOTALL)
    [(' some text', ' more text', 'even more text\non several lines\nA: and we start again\nB: more text\nmore\nmultiline text')]

Does anyone have any ideas? Thanks!

python regex multiline regex-greedy

jmague Oct 9 '12 at 12:28

1 answer

Tim Pietzcker · Answer 1 · 2012-10-09T12:31:53+0000

You can tell the regex to stop matching at the next line that starts with A: (or at the end of the string):

 re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', f.read(), re.DOTALL|re.MULTILINE)
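To see why this works: a trailing lazy group with nothing after it is satisfied by the empty string, which is what produced the empty third field in the question. The lookahead gives the lazy group a stopping point, and re.MULTILINE lets ^ match at the start of each line. Below is a minimal sketch that runs the pattern on sample text reconstructed from the question's output. The inline `text` string stands in for `f.read()`, and the `strip()` call is an added assumption, used only to drop the leading spaces and trailing newlines so the result matches the desired output.

```python
import re

# Sample input reconstructed from the question; in practice this would be f.read().
text = (
    "A: some text\n"
    "B: more text\n"
    "even more text\n"
    "on several lines\n"
    "A: and we start again\n"
    "B: more text\n"
    "more\n"
    "multiline text\n"
)

# DOTALL lets '.' cross newlines; MULTILINE lets '^' match at each line start.
# The lookahead stops the last lazy group at the next "A:" line or at end of string.
pattern = re.compile(
    r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)',
    re.DOTALL | re.MULTILINE,
)

# Assumption: surrounding whitespace is not wanted, so each captured field is stripped.
records = [
    tuple(field.strip() for field in match)
    for match in pattern.findall(text)
]

for record in records:
    print(record)
```

Without the `strip()` step, the first two fields keep their leading space and the third keeps its trailing newline.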
