Here is a regular expression that works by being more specific. I'm not sure it's preferable to Karmel's answer, but I decided to answer the question as asked. Instead of returning None, the first two optional groups return an empty string '', which, in my opinion, is close enough.
Note the nested group structure. The first two outer groups are optional, but each one requires a <br /> tag. So if there are fewer than two <br /> tags, the last item is left unmatched until the final group:
rx = r'''\s+                 # verbose mode; escape literal spaces
         (?:                 # outer non-capturing group
            ([^<>]*)         # inner capturing group without <>
            (?:<br\ />)      # inner non-capturing group matching br
         )?                  # whole outer group is optional
         (?:
            ([^<>]*)         # all same as above
            (?:<br\ />)
         )?
         (?:                 # outer non-capturing group
            (.*?)            # non-greedy wildcard match
            (?:\s+</div>)    # inner non-capturing group matching div
         )'''                # final group is not optional
Tested:
>>> re.findall(rx, st, re.VERBOSE)
[('192 kbps', '2:41', '3.71 mb'), ('', '', '3.49 mb'), ('128 kbps', '3:31', '3.3 mb')]
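For anyone who wants to run this without the original st, here is a minimal self-contained sketch. The sample string is my own assumption about the input shape (each value block on its own line inside a bare <div>), not the st from the question. It also shows where the None versus '' difference comes from: re.findall fills groups that did not take part in the match with '', while a match object's groups() reports them as None.

```python
import re

rx = r'''\s+                 # verbose mode; escape literal spaces
         (?:                 # outer non-capturing group
            ([^<>]*)         # inner capturing group without <>
            (?:<br\ />)      # inner non-capturing group matching br
         )?                  # whole outer group is optional
         (?:
            ([^<>]*)         # all same as above
            (?:<br\ />)
         )?
         (?:                 # outer non-capturing group
            (.*?)            # non-greedy wildcard match
            (?:\s+</div>)    # inner non-capturing group matching div
         )'''                # final group is not optional

# Hypothetical input, assumed for illustration only.
sample = """<div>
    192 kbps<br />2:41<br />3.71 mb
</div>
<div>
    3.49 mb
</div>
"""

# findall substitutes '' for optional groups that did not participate.
rows = re.findall(rx, sample, re.VERBOSE)
print(rows)
# [('192 kbps', '2:41', '3.71 mb'), ('', '', '3.49 mb')]

# A match object reports the same skipped groups as None instead.
m = re.search(rx, "<div>\n    3.49 mb\n</div>", re.VERBOSE)
print(m.groups())
# (None, None, '3.49 mb')
```

Note that the pattern starts with \s+, so whitespace inside the opening tag itself (for example an attribute on the same line as the values) can change where the match begins. Check it against your real markup.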
Note the re.VERBOSE flag, which is required unless you strip all the whitespace and comments from the pattern above.
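To illustrate the point about the flag, here is a small sketch comparing the laid-out pattern with a compact one-line equivalent. The names rx_compact and rx_verbose and the sample string are mine, assumed for the demonstration. In the compact form the escaped space in <br\ /> becomes a plain space, because without re.VERBOSE whitespace is matched literally.

```python
import re

# Compact form: no comments and no layout whitespace, so no flag is needed.
rx_compact = r'\s+(?:([^<>]*)(?:<br />))?(?:([^<>]*)(?:<br />))?(?:(.*?)(?:\s+</div>))'

# Laid-out form: whitespace is only ignored when re.VERBOSE is passed.
rx_verbose = r'''\s+
    (?: ([^<>]*) (?:<br\ />) )?
    (?: ([^<>]*) (?:<br\ />) )?
    (?: (.*?) (?:\s+</div>) )'''

# Hypothetical input, assumed for illustration only.
sample = "<div>\n    128 kbps<br />3:31<br />3.3 mb\n</div>"

with_flag = re.findall(rx_verbose, sample, re.VERBOSE)
compact = re.findall(rx_compact, sample)
without_flag = re.findall(rx_verbose, sample)  # layout spaces become literal

print(with_flag)     # [('128 kbps', '3:31', '3.3 mb')]
print(compact)       # [('128 kbps', '3:31', '3.3 mb')]
print(without_flag)  # []
```

Forgetting the flag does not raise an error. The pattern just quietly stops matching, which is why it is worth calling out.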