Regular expression to match all enclosed '' (2 single quotes)

I am looking for a regex that gives me a capture group for each pair of single quotes ('') inside single-quoted strings ('string') that are part of a comma-separated list. For example, the string 'tom''s' should yield one group between the m and the s. I came close, but I keep going astray: my pattern either wrongly matches the enclosing single quotes or captures only some of the '' pairs in a string.

Input example

'11','22'',','''33','44''','''55''','6''''6'

Desired groups (7, shown in parens)

 '11','22(''),','('')33','44('')','('')55('')','6('')('')6'

For context, what I'm ultimately trying to do is replace these '' pairs in a comma-separated sequence of strings with a different value to simplify the subsequent parsing.

Note also that commas can appear inside the single-quoted strings (as in '22'',').

1 answer

You cannot match the double single quotes directly like this with Python's re module. Instead, you can match each single-quoted entry, capture its contents, and use a lambda that replaces '' inside the captured contents with a simple .replace:

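import re

# Group 1 captures the contents of one single-quoted entry; '' inside it is an escaped quote
p = re.compile(r"'([^']*(?:''[^']*)*)'")
test_str = "'11','22'',','''33','44''','''55''','6''''6'"
# Re-wrap each entry in quotes after replacing every '' inside it with &
print(p.sub(lambda m: "'{}'".format(m.group(1).replace("''", "&")), test_str))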

See the IDEONE demo. Output: '11','22&,','&33','44&','&55&','6&&6'
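Since the stated goal is to simplify later parsing, it may be worth noting an alternative that is not part of this answer: Python's standard csv module can parse this format directly, assuming every entry is single-quoted and '' inside an entry always stands for one literal quote. A minimal sketch:

```python
import csv

# Assumption: every entry is wrapped in single quotes and '' inside an entry
# represents one literal single quote (the same convention the regex relies on).
line = "'11','22'',','''33','44''','''55''','6''''6'"

# quotechar="'" makes the single quote the field quote character, and
# doublequote=True (the default) makes the reader turn '' inside a quoted
# field into a single '. Commas inside quoted fields are kept as data.
fields = next(csv.reader([line], quotechar="'", doublequote=True))
print(fields)  # ['11', "22',", "'33", "44'", "'55'", "6''6"]
```

This skips the placeholder step entirely, at the cost of depending on the input really following CSV quoting rules.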

Regular expression '([^']*(?:''[^']*)*)':

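  • ' - the opening single quote
  • ( - start of capture group #1
  • [^']* - zero or more characters other than '
  • (?:''[^']*)* - zero or more repetitions of '' followed by zero or more characters other than '
  • ) - end of capture group #1
  • ' - the closing single quote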
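A regex cannot return a variable number of capture groups, but the entry pattern above can still locate every '' the question asks for. The sketch below (the function name doubled_quote_spans is hypothetical) takes each entry's group 1 and scans it for '' with a second pattern, converting positions back to offsets in the whole string. Because group 1 is built from runs of non-quote characters separated by '', a left-to-right, non-overlapping scan finds exactly those pairs.

```python
import re

# Same entry pattern as in the answer: group 1 is the contents of one quoted entry
ENTRY = re.compile(r"'([^']*(?:''[^']*)*)'")
PAIR = re.compile(r"''")


def doubled_quote_spans(s):
    """Return (start, end) offsets in s of every '' inside a quoted entry.

    Hypothetical helper for illustration; assumes s is a comma-separated
    list of single-quoted entries as in the question.
    """
    spans = []
    for entry in ENTRY.finditer(s):
        base = entry.start(1)  # offset of the entry's contents within s
        # finditer does not overlap matches, so '''' yields two separate pairs
        for pair in PAIR.finditer(entry.group(1)):
            spans.append((base + pair.start(), base + pair.end()))
    return spans


s = "'11','22'',','''33','44''','''55''','6''''6'"
spans = doubled_quote_spans(s)
print(len(spans), spans)
```

For the question's input this finds the seven pairs marked in the desired output, and each span can then be replaced or inspected individually.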