Given a text file where the text I want to match is delimited by single quotes, but may contain zero or one escaped single quote as well as zero or more tabs and newlines (not escaped), I want to match only the text. Example:
    menu_item = 'casserole';
    menu_item = 'meat loaf';
    menu_item = 'Tony\'s magic pizza';
    menu_item = 'hamburger';
    menu_item = 'Dave\'s famous pizza';
    menu_item = 'Dave\'s lesser-known gyro';
I want to capture only the text (and spaces), ignoring the tabs and newlines. I don't really care whether the escaped quote appears in the results, as long as it doesn't affect the match:
    casserole
    meat loaf
    Tonys magic pizza
    hamburger
    Daves famous pizza
    Dave\'s lesser-known gyro  # quote is okay if necessary.
I managed to create a regex that almost does this. It handles the escaped quotes, but not the newlines:
    import re

    # inFP is the already-opened input file
    menuPat = r"menu_item = \'(.*)(\\\')?(\t|\n)*(.*)\'"
    for line in inFP.readlines():
        m = re.search(menuPat, line)
        if m is not None:
            print(m.group())
There are plenty of regex questions out there, but most of them use Perl, and if there is one that does what I want, I could not figure it out. Also, since I'm using Python, I don't care if the match is spread over several groups; it is easy to recombine them.
Some answers have suggested just writing code to parse the text. Although I'm sure I could do that, I'm so close to having a working regular expression :) And it seems like this should be doable.
Update: I just realized that I was using Python's readlines() to get each line, which obviously splits up the multi-line values before they are passed to the regular expression. I am looking at rewriting that part, but any suggestions on it would also be very helpful.
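For what it's worth, here is a minimal sketch of one way the readlines() problem might be sidestepped: search the whole text at once (for example the result of inFP.read()) rather than line by line. The sample string, the name menu_pat, and the helper menu_items are made up for illustration, and the sketch assumes an escaped quote is always written as a backslash followed by a single quote.

```python
import re

# Hypothetical sample mirroring the example in the question; in practice the
# text would come from reading the whole file at once, e.g. inFP.read().
text = (
    "menu_item = 'casserole';\n"
    "menu_item = 'meat\n\t\tloaf';\n"
    "menu_item = 'Tony\\'s magic pizza';\n"
    "menu_item = 'Dave\\'s lesser-known\n\tgyro';\n"
)

# Inside the quotes, accept either a backslash followed by any character
# (an escaped quote, for instance) or any character that is neither a quote
# nor a backslash. [^'\\] also matches tabs and newlines, so values that
# span several lines are matched without the DOTALL flag.
menu_pat = re.compile(r"menu_item\s*=\s*'((?:\\.|[^'\\])*)'")


def menu_items(source):
    """Return the quoted values with escaped quotes dropped and whitespace collapsed."""
    items = []
    for raw in menu_pat.findall(source):
        unescaped = raw.replace("\\'", "")           # drop the escaped quote
        items.append(" ".join(unescaped.split()))    # tabs/newlines -> one space
    return items


print(menu_items(text))
```

Dropping the escaped quote matches the "Tonys magic pizza" style of output; replacing it with a plain apostrophe instead would be a one-line change in the replace() call.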
John c