I struggle with regular expressions. I'm having trouble wrapping my head around matching text that is nested inside similar, larger text. Perhaps you can help me sort out my thoughts.
Here is an example of a test line:
message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }
For each message, I want to output its name (e.g. msgName, mn2) followed by everything up to the next message, to get a list like this:
msgName
{stuff {innerStuff} more stuff}
mn2
{junk}
My problem is greediness. A greedy match keeps the inner braces but does not split the top-level messages apart; a non-greedy match splits the messages but stops at the first inner closing brace.
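A minimal illustration of the dilemma, using a hypothetical toy string s rather than the real input. Python's re module treats braces as ordinary characters and keeps no count of nesting depth, so the greedy pattern runs to the last closing brace in the string while the lazy one stops at the first.

```python
import re

# Hypothetical toy input: a group with a nested pair of braces, then a second group.
s = "{a {b} c} {d}"

# Greedy: ".*" runs to the LAST "}", so both groups merge into one match.
greedy = re.findall(r"\{(.*)\}", s)
# Lazy: ".*?" stops at the FIRST "}", cutting the outer group short.
lazy = re.findall(r"\{(.*?)\}", s)

print(greedy)  # ['a {b} c} {d']
print(lazy)    # ['a {b', 'd']
```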
Here is one program:
import re
text = 'message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }'
# Capture the name, then everything from the first "{" to the LAST "}" (greedy)
messagePattern = re.compile(r'message (.*?) {(.*)}', re.DOTALL)
messageList = messagePattern.findall(text)
print("messages:\n")
count = 0
for message, msgDef in messageList:
    count = count + 1
    print(str(count))
    print(message)
    print(msgDef)
It produces:
messages:
1
msgName
stuff {innerStuff} more stuff}
message mn2 {junk
Here is my next attempt, which makes the inner group non-greedy:
import re
text = 'message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }'
# Capture the name, then everything from "{" to the FIRST following "}" (non-greedy)
messagePattern = re.compile(r'message (.*?) {(.*?)}', re.DOTALL)
messageList = messagePattern.findall(text)
print("messages:\n")
count = 0
for message, msgDef in messageList:
    count = count + 1
    print(str(count))
    print(message)
    print(msgDef)
It produces:
messages:
1
msgName
stuff {innerStuff
2
mn2
junk
So I lose the "} more stuff }" part.
I've hit a mental block. Can someone point me in the right direction? I can't figure out how to handle text inside nested braces. A working regular expression, or a simpler example of processing nested text like this, would be very helpful.
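One common way around this, sketched here under the assumption that every message has the form "message NAME { ... }" with balanced braces and an identifier-like name: let a regex find only where each message starts, then count brace depth to locate the matching closing brace. The names split_messages and HEADER are hypothetical, chosen for this sketch, and it does not handle braces inside string literals or comments.

```python
import re

# Hypothetical header pattern: matches "message NAME {" and captures NAME.
# Assumes names are identifier-like (\w+); adjust if yours differ.
HEADER = re.compile(r"message\s+(\w+)\s*\{")

def split_messages(text):
    """Return (name, body) pairs for each top-level message.

    The regex only finds where a message starts; the matching closing
    brace is found by counting brace depth. The body keeps its outer braces.
    """
    results = []
    pos = 0
    while True:
        m = HEADER.search(text, pos)
        if m is None:
            break
        start = m.end() - 1          # index of the opening "{"
        depth = 0
        for i in range(start, len(text)):
            ch = text[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:       # back to top level: message is complete
                    results.append((m.group(1), text[start:i + 1]))
                    pos = i + 1      # resume searching after this message
                    break
        else:
            raise ValueError("unbalanced braces in message %r" % m.group(1))
    return results

text = 'message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }'
for name, body in split_messages(text):
    print(name)
    print(body)
```

For the test string this prints msgName with "{ stuff { innerStuff } more stuff }" and mn2 with "{ junk }". Because the search resumes after each complete body, a "message" keyword nested inside a body is not treated as a new top-level message. Python's built-in re module has no recursion, which is why the counting loop is needed; the third-party regex package does support recursive patterns, but this sketch stays within the standard library.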