I believe you only need to solve these two problems to un-verbosify a verbose regex:
- delete the comments at the end of each line
- remove unescaped whitespace
Try this, which handles the two tasks with separate regular expressions:
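    import re

    def unverbosify_regex_simple(verbose):
        # An even run of backslashes before the match means it is unescaped.
        WS_RX = r'(?<!\\)((\\{2})*)\s+'
        # (?m) must come first: Python 3.11+ rejects global flags elsewhere.
        CM_RX = r'(?m)(?<!\\)((\\{2})*)#.*$'
        return re.sub(WS_RX, "\\1", re.sub(CM_RX, "\\1", verbose))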
The version above is simplified: it leaves escaped spaces as they are. The result is a little harder to read, but it should work across regex platforms.
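As a quick check of the simplified version, here is a minimal sketch that runs it on a small verbose pattern and compares the result with the same pattern compiled under re.VERBOSE. The sample pattern and the names verbose and compact are made up for illustration, and the (?m) flag is placed at the start of the comment regex so the code also runs on Python 3.11+.

```python
import re

def unverbosify_regex_simple(verbose):
    WS_RX = r'(?<!\\)((\\{2})*)\s+'
    CM_RX = r'(?m)(?<!\\)((\\{2})*)#.*$'
    return re.sub(WS_RX, r'\1', re.sub(CM_RX, r'\1', verbose))

# Hypothetical sample pattern, not taken from the original answer.
verbose = r"""
    (\d{3})   # area code
    \         # an escaped, literal space
    (\d{4})   # line number
"""

compact = unverbosify_regex_simple(verbose)
print(compact)  # (\d{3})\ (\d{4})

# The verbose and compact forms should accept the same input.
print(bool(re.compile(verbose, re.VERBOSE).fullmatch("555 1234")))
print(bool(re.compile(compact).fullmatch("555 1234")))
```

Note that the escaped space survives as "\ " in the output, which is what the simplified version is meant to do.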
Alternatively, here is a slightly more complex answer that "unescapes" the spaces (i.e. '\ ' => ' ') and returns what I think most people expect:
    import re

    def unverbosify_regex(verbose):
        # (?m) must come first: Python 3.11+ rejects global flags elsewhere.
        CM1_RX = r'(?m)(?<!\\)((\\{2})*)#.*$'
        CM2_RX = r'(\\)?((\\{2})*)(#)'
        WS_RX = r'(\\)?((\\{2})*)(\s)\s*'

        def strip_escapes(match):
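The listing above stops at the header of strip_escapes. Going by the update notes (with an even number of backslashes the whitespace is dropped; with an odd number the escaping backslash is dropped and the character is kept), one possible completion is sketched below. The body of strip_escapes and the way the three substitutions are chained are assumptions, not the original code, and the sample pattern is made up.

```python
import re

def unverbosify_regex(verbose):
    # An unescaped '#' (even number of backslashes before it) starts a comment.
    CM1_RX = r'(?m)(?<!\\)((\\{2})*)#.*$'
    # Any '#' left after that is escaped; group 1 is the escaping backslash.
    CM2_RX = r'(\\)?((\\{2})*)(#)'
    # A run of whitespace; group 1 is set when its first character is escaped.
    WS_RX = r'(\\)?((\\{2})*)(\s)\s*'

    def strip_escapes(match):
        # Assumed body, reconstructed from the description in the answer.
        if match.group(1) is None:
            # Even backslash count: drop the whitespace, keep the pairs.
            return match.group(2)
        # Odd backslash count: drop the escaping backslash, keep the character.
        return match.group(2) + match.group(4)

    no_comments = re.sub(CM1_RX, r'\1', verbose)
    no_hash_escapes = re.sub(CM2_RX, strip_escapes, no_comments)
    return re.sub(WS_RX, strip_escapes, no_hash_escapes)

# Hypothetical sample pattern.
verbose = r"""
    \d+   # digits
    \#    # a literal hash
    \ x   # an escaped space, then x
"""

compact = unverbosify_regex(verbose)
print(compact)  # \d+# x
```

Unlike the simplified version, the escaped hash and the escaped space come out as plain characters, which is fine because neither is special outside verbose mode.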
UPDATE: Added comments to explain the even vs. odd backslash counts. Fixed the first group in CM_RX so that the full "comment" is kept when the number of slashes is odd.
UPDATE 2: Fixed the comment regex, which did not handle escaped hashes properly. It must handle "\# #escaped hash" as well as "# comment with \# escaped hash" and "\\# comment".
UPDATE 3: Added a simplified version that does not unescape escaped spaces.
UPDATE 4: Simplified further to eliminate the variable-length negative lookbehind (and the reverse/reverse trick).
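The points behind UPDATE 2 and UPDATE 4 can be checked with a short sketch. Python's built-in re module only accepts fixed-width lookbehind, which is why the "even number of backslashes" test is written as a one-character lookbehind plus a captured run of backslash pairs. The three inputs are the UPDATE 2 cases, read here as escaped hashes with no space between the backslash and the hash (an assumption); strip_comments is a hypothetical helper name.

```python
import re

# A lookbehind containing a '*' repetition has variable width,
# which the re module refuses to compile.
try:
    re.compile(r'(?<!(?:\\{2})*\\)#')
    variable_width_accepted = True
except re.error:
    variable_width_accepted = False

# Fixed-width form: "not preceded by a backslash", then any number of
# backslash pairs captured in group 1 so they can be put back.
CM_RX = r'(?m)(?<!\\)((\\{2})*)#.*$'

def strip_comments(text):  # hypothetical helper
    return re.sub(CM_RX, r'\1', text)

cases = [
    r'\# #escaped hash',                # escaped hash, then a comment
    r'# comment with \# escaped hash',  # the whole line is a comment
    r'\\# comment',                     # escaped backslash, then a comment
]
results = [strip_comments(case) for case in cases]

print(variable_width_accepted)
for case, result in zip(cases, results):
    print(repr(case), '->', repr(result))
```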