Python multiline regex

I have a file structured as follows:

A: some text
B: more text
even more text
on several lines
A: and we start again
B: more text
more
multiline text

I am trying to find a regex that will split my file as follows:

 >>> re.findall(regex, f.read())
 [('some text', 'more text', 'even more text\non several lines'),
  ('and we start again', 'more text', 'more\nmultiline text')]

So far I have come up with the following:

 >>> re.findall('A:(.*?)\nB:(.*?)\n(.*?)', f.read(), re.DOTALL)
 [(' some text', ' more text', ''), (' and we start again', ' more text', '')]

The multiline text is not captured. I think this is because the lazy quantifier really is lazy and matches nothing at all, but if I take the laziness out, the regular expression becomes really greedy:

 >>> re.findall('A:(.*?)\nB:(.*?)\n(.*)', f.read(), re.DOTALL)
 [(' some text', ' more text', 'even more text\non several lines\nA: and we start again\nB: more text\nmore\nmultiline text')]

Does anyone have any ideas? Thanks!

1 answer

You can tell the regex to stop matching at the next line that starts with A: (or at the end of the string):

 re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', f.read(), re.DOTALL|re.MULTILINE) 
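To check the pattern without the original file, here is a minimal sketch that runs it on an in-memory string. The sample `text` is only an assumption that mirrors the layout shown in the question, and the names `text`, `pattern` and `records` are made up for illustration. The `strip()` call is an addition, not part of the answer's regex: it drops the leading space after `A:`/`B:` and the trailing newline that the third group would otherwise keep.

```python
import re

# Stand-in for f.read(); assumed to follow the layout from the question.
text = (
    "A: some text\n"
    "B: more text\n"
    "even more text\n"
    "on several lines\n"
    "A: and we start again\n"
    "B: more text\n"
    "more\n"
    "multiline text\n"
)

# DOTALL lets "." cross newlines; MULTILINE lets "^" match at the start
# of every line, so the lookahead can see the next "A:" record.
pattern = re.compile(
    r"A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)",
    re.DOTALL | re.MULTILINE,
)

# findall returns one tuple of three groups per record; strip each group
# to remove the surrounding whitespace the raw captures include.
records = [
    tuple(part.strip() for part in groups)
    for groups in pattern.findall(text)
]

for record in records:
    print(record)
```

The lazy third group now has something to stop at: it grows one character at a time until the lookahead succeeds, either at a line beginning with `A:` or at the very end of the string.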

Source: https://habr.com/ru/post/927251/

