What regex will capture multiple instances inside brackets/braces?

How can I use a regex to capture, say, each run of spaces (\ +) inside braces? For example, in the line "abc and 123 {foo-bar bar baz } bit {yummi tummie} byte." I need to find four matches inside {}, but nothing else. Assume the language is Python and the content of the line is not known in advance.

EDIT: Also assume that there are no nested braces.

+4
3 answers

A lookahead can check whether there is a } ahead with no { in between.

\s+(?=[^{]*})
  • \s is shorthand for a whitespace character, roughly [ \t\r\n\f]. The + matches one or more of them.

  • (?=[^{]*}) is a lookahead that checks that a } follows with no { in between.
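
As a quick check, here is a minimal sketch of that pattern used with re.findall from Python's standard library. The helper name spaces_inside_braces is made up for illustration, and the sample string is the one from the question with a few extra spaces added.

```python
import re

# A whitespace run followed, with no '{' in between, by a closing '}'.
INSIDE_BRACES = re.compile(r'\s+(?=[^{]*})')

def spaces_inside_braces(text):
    """Return every whitespace run that has a '}' ahead of it before any '{'."""
    return INSIDE_BRACES.findall(text)

sample = 'abc   and 123 {foo-bar     bar baz } bit {yummi tummie} byte.'
print(spaces_inside_braces(sample))  # ['     ', ' ', ' ', ' ']

# The lookahead never checks for an opening '{', so a stray '}' is
# treated as if it closed a group:
print(spaces_inside_braces('a b} c'))  # [' ']
```

So this relies on the no-nesting assumption from the question and on the braces being balanced.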

regex101

+5
>>> import re
>>> s = 'abc   and 123 {foo-bar     bar baz } bit {yummi tummie} byte.'
>>> inside_braces = re.findall(r'\{.*?\}', s)
>>> spaces_inside_braces = [re.findall(r' +', match) for match in inside_braces]
>>> [match for mlist in spaces_inside_braces for match in mlist]  # flatten list
['     ', ' ', ' ', ' ']
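  • The pattern r'\{.*?\}' non-greedily matches each brace-delimited group.
  • The second findall then collects the runs of spaces inside each group.
  • IIRC, look-behind in Python's re module must be fixed-width.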
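
If the positions of the runs are needed as well, the same two-step idea can be sketched with re.finditer, shifting each inner match by the start of its brace group. The function name space_runs_in_braces is hypothetical, and this is only one way to keep the offsets.

```python
import re

def space_runs_in_braces(text):
    """Yield (start, end) offsets in text for each run of spaces inside {...}."""
    for group in re.finditer(r'\{.*?\}', text):
        for run in re.finditer(r' +', group.group()):
            # Offsets from the inner match are relative to the brace group,
            # so shift them back into the full string.
            yield group.start() + run.start(), group.start() + run.end()

sample = 'abc   and 123 {foo-bar     bar baz } bit {yummi tummie} byte.'
spans = list(space_runs_in_braces(sample))
runs = [sample[start:end] for start, end in spans]
print(runs)  # ['     ', ' ', ' ', ' ']
```

Slicing the original string with each span gives back the same four runs as the flattened list above.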
+3

The regex module supports access to all previous matches of a capture group, which is needed for the following approach:

>>> import regex
>>> # The regex behavior version seems to make no difference in this case, so both '(?V0)...' and '(?V1)...' will work.
>>> pattern = r'(?V0)[{]   (?P<u>\s+)?  (?: (?: [^\s}]+ (?P<u>\s+) )*  [^\s}]+ (?P<u>\s+)? )?   [}]'
>>> string = 'abc   and 123 {foo-bar     bar baz } bit {yummi tummie} byte.'
>>> [s for m in regex.finditer(pattern, string, regex.VERBOSE) for s in m.captures('u')]
['     ', ' ', ' ', ' ']

Simply put, this regular expression finds matches of the form '{' blanks? ((nonblanks blanks)* nonblanks blanks?)? '}' and assigns all the blank parts to the same capture group, named u (via (?P<u>...)).
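
For comparison, here is a rough standard-library sketch of the same shape. The re module keeps only the last capture of a repeated group, so this approximation matches the whole '{' ... '}' form first and collects the blanks in a second pass; it is not the regex-module code above. The names BRACE_GROUP and blanks_in_groups are invented for the example.

```python
import re

# '{' blanks? (nonblanks (blanks nonblanks)* blanks?)? '}'
BRACE_GROUP = re.compile(r'\{\s*(?:[^\s}]+(?:\s+[^\s}]+)*\s*)?\}')

def blanks_in_groups(text):
    """Return the whitespace runs found inside each well-formed brace group."""
    found = []
    for group in BRACE_GROUP.finditer(text):
        # Second pass: every whitespace run inside the matched group.
        found.extend(re.findall(r'\s+', group.group()))
    return found

sample = 'abc   and 123 {foo-bar     bar baz } bit {yummi tummie} byte.'
print(blanks_in_groups(sample))          # ['     ', ' ', ' ', ' ']
print(blanks_in_groups('x } {a  b} {'))  # ['  ']
```

Because a group only matches when it has both an opening and a closing brace, the stray } and { in the second call contribute nothing.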

It also works with strings containing unmatched { and }:

>>> # Even works with dangling braces:
>>> badstring = '}oo} { ab  a   b}}  xy {xy  x y}cd {{   cd  } e{e }f{ f}  { }{} }{'
>>> # Fully flattened result:
>>> [s for m in regex.finditer(pattern, badstring, regex.VERBOSE) for s in m.captures('u')]
[' ', '  ', '   ', '  ', ' ', '   ', '  ', ' ', ' ', ' ']
>>> # Less flattened (e.g. for verification):
>>> [v for m in regex.finditer(pattern, badstring, regex.VERBOSE) for v in m.capturesdict().values()]
[[' ', '  ', '   '], ['  ', ' '], ['   ', '  '], [' '], [' '], [' '], []]

Tested on Python 3.5.1 x64, regex 2016.3.2.

-1