Python Conditional Regular Expression

This is a question about conditional regular expressions in Python:

I would like to match the string "abc" with

 match(1)="a" match(2)="b" match(3)="c" 

but also match the string "  a" (two leading spaces) with

 match(1)="a" match(2)="" match(3)="" 

The following code ALMOST does this. The problem is that in the first case match(1)="a", but in the second case match(4)="a" (not match(1) as desired).

In fact, if you iterate over all groups using `for g in re.search(myre, teststring2).groups():`, you get 6 groups (not 3 as expected).

```python
import re
import sys

teststring1 = "abc"
teststring2 = "  a"
myre = r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())'

if re.search(myre, teststring1):
    print(re.search(myre, teststring1).group(1))  # prints a
if re.search(myre, teststring2):
    # prints None: here the letter ends up in group(4), not group(1)
    print(re.search(myre, teststring2).group(1))
```

Any thoughts? (note that this is for Python 2.5)

3 answers

Maybe something like this:

```python
import re
import sys

teststring1 = "abc"
teststring2 = "  a"
myre = r'^\s{0,2}(\w)(\w?)(\w?)$'

if re.search(myre, teststring1):
    print(re.search(myre, teststring1).group(1))
if re.search(myre, teststring2):
    print(re.search(myre, teststring2).group(1))
```

This gives "a" in both cases, as you wish, but perhaps it does not match the way you want in other cases you do not show (for example, with no spaces in front, or with spaces followed by more than one letter, so that the total length of the matched string is not 3). I am just assuming that you don't want matches in such cases...?
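A quick way to see both the intended matches and the edge cases mentioned above is to run the pattern over a few inputs. This is a Python 3 sketch; the extra strings "a", "  abc" and "   a" are illustrative inputs chosen here, not cases from the question.

```python
import re

# Pattern from this answer: up to two leading spaces, one required letter,
# then two optional letters. (\w?) always participates in the match, so a
# missing letter shows up as "" rather than None.
myre = re.compile(r'^\s{0,2}(\w)(\w?)(\w?)$')

results = {}
for s in ["abc", "  a", "a", "  abc", "   a"]:
    m = myre.search(s)
    results[s] = m.groups() if m else None
    print(repr(s), results[s])
```

"a" and "  abc" are accepted as well, which is exactly the kind of case the caveat is about; three leading spaces are rejected because of the {0,2} limit.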

----

Each capture group in the expression gets its own index. Try the following:

```python
import re

r = re.compile(r"^\s*(\w)(\w)?(\w)?$")

r.search("abc").groups()  # ('a', 'b', 'c')
r.search("  a").groups()  # ('a', None, None)
```

To break it down:

```
^       // anchored at the beginning
\s*     // any number of spaces to start with
(\w)    // capture the first letter, which is required
(\w)?   // capture the second letter, which is optional
(\w)?   // capture the third letter, which is optional
$       // anchored at the end
```
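The same breakdown can live inside the pattern itself by compiling with re.VERBOSE, where whitespace is ignored and # starts a comment. This Python 3 sketch also shows one difference from the previous answer: an optional group such as (\w)? yields None when it does not take part in the match, whereas (\w?) yields "". If you prefer empty strings, Match.groups() accepts a default value for groups that did not participate.

```python
import re

r = re.compile(r"""
    ^        # anchored at the beginning
    \s*      # any number of spaces to start with
    (\w)     # first letter, required
    (\w)?    # second letter, optional (None if absent)
    (\w)?    # third letter, optional (None if absent)
    $        # anchored at the end
""", re.VERBOSE)

full = r.search("abc").groups()
short = r.search("  a").groups()
# groups("") substitutes "" for groups that did not participate
short_blank = r.search("  a").groups("")

print(full)         # ('a', 'b', 'c')
print(short)        # ('a', None, None)
print(short_blank)  # ('a', '', '')
```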

----

```python
myre = r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)'
```

This handles the two cases you describe the way you want, but it is not necessarily a general solution. It looks like you have come up with a toy problem that stands in for a real one.
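To see both the intended behaviour and the limitation, here is the lookahead pattern run on the two strings from the question plus one extra input, "  abc", added purely for illustration (Python 3 sketch). Because the whole pattern is a lookahead, the match itself is zero-length; only the groups carry the letters.

```python
import re

myre = re.compile(r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)')

full = myre.search("abc").groups()
short = myre.search("  a").groups()
# Not rejected: leading spaces *and* all three letters.
spaced_full = myre.search("  abc").groups()

print(full)         # ('a', 'b', 'c')
print(short)        # ('a', None, None)
print(spaced_full)  # ('a', 'b', 'c')
```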

A general solution is hard to find, because how the later elements are handled depends on how the earlier ones were handled, and/or the other way round. For example, there should be no leading spaces if you have the full "abc", and if there are leading spaces, you should find only "a".
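Since the question is about conditional regular expressions: Python's re module does support conditional groups of the form (?(id)yes|no), which match yes if group id participated and no otherwise. The pattern below was written for this illustration (Python 3 sketch, not taken from any of the answers) and enforces the rule just described: two leading spaces with a single letter, or no spaces with all three letters. It also shows the trade-off this answer points out: the spaces need their own group so the condition can test them, which pushes the first letter into group 2.

```python
import re

# Group 1: the two leading spaces (optional).
# Group 2: the first letter (always present).
# (?(1)...|...): if group 1 matched, the string must end right here;
# otherwise two more letters (groups 3 and 4) must follow.
myre = re.compile(r'^(\s{2})?(\w)(?(1)$|(\w)(\w)$)')

def letters(s):
    """Hypothetical helper: three letter slots ("" if missing), or None."""
    m = myre.search(s)
    if m is None:
        return None
    # Drop group 1 (the spaces) so the letters come first.
    return m.groups("")[1:]

for s in ["abc", "  a", "  abc", "a"]:
    print(repr(s), letters(s))
```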

In my opinion, the best way to handle this is with the | construct you originally used. After the match, you can write code that pulls the groups out of the tuple and arranges them to your liking.
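A minimal sketch of that post-processing step, using the original alternation pattern from the question (Python 3; the helper name three_groups is made up for this example). Groups 1 to 3 belong to the first alternative and groups 4 to 6 to the second, so the code picks whichever triple took part in the match.

```python
import re

myre = re.compile(r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())')

def three_groups(s):
    """Hypothetical helper: collapse the six groups into the three wanted ones."""
    m = myre.search(s)
    if m is None:
        return None
    g = m.groups()  # always six entries; None for the branch that did not match
    # If the first alternative matched, group 1 is set; otherwise use groups 4-6.
    return g[0:3] if g[0] is not None else g[3:6]

print(myre.search("  a").groups())  # (None, None, None, 'a', '', '')
print(three_groups("abc"))          # ('a', 'b', 'c')
print(three_groups("  a"))          # ('a', '', '')
```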

The rule for groups is that every pair of parentheses not immediately followed by ?: becomes a group (lookahead assertions such as (?=...) do not capture either). A group may hold an empty string, or None if it did not take part in the match at all, but it will still be there.
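The compiled pattern's groups attribute reports how many capturing groups it has, which makes the rule easy to check on the patterns from this thread (Python 3 sketch).

```python
import re

# .groups counts capturing groups only; (?:...) and (?=...) are not counted.
question = re.compile(r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())')
lookahead = re.compile(r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)')

print(question.groups)   # 6: the empty () groups count too
print(lookahead.groups)  # 3: (?:...) is not a group

m = lookahead.search("  a")
print(m.group(2))  # None: the group exists but did not participate
```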


Source: https://habr.com/ru/post/1314042/

