Standard regex vs. Python regex mismatch

I am reading a book that gives examples of how to match a given set of strings with regular expressions. Here is one of them:

b*(abb*)*(a|ε) - Strings of a and b with no consecutive a's.

I tried to convert it to Python as follows:

>>> import re
>>> p = re.compile(r'b*(abb*)*(a|)')    # OR
>>> p = re.compile(r'b*(abb*)*(a|\b)')

# BUT it still doesn't work: 'aa' should be rejected
>>> p.match('aa')
<_sre.SRE_Match object at 0x7fd9ad028c68>

My question is twofold:

  • What is the Python equivalent of epsilon that makes the above example work?
  • Can someone explain why the theoretical or standard way of writing regular expressions doesn't work in Python? Could this have anything to do with longest versus shortest match?

Clarification: for people asking what a standard regex is, I mean the form used in formal language theory: http://en.wikipedia.org/wiki/Regular_expression#Formal_language_theory

+5
7

Thanks for the answers. I feel that each answer had part of the solution. Here is what I was looking for.

  • The ? symbol is just shorthand for (something|ε). Thus (a|ε) can be rewritten as a?. So the example becomes:

    b*(abb*)*a?
    

    In Python we would write:

    p = re.compile(r'^b*(abb*)*a?$')
    
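  • The reason a straight translation of the textbook syntax into Python (i.e. copy and paste) does not work is that Python's match() is satisfied by a match on just a prefix of the string (when $ and ^ are absent), whereas a theoretical regular expression must describe the whole string.
    So, for example, if we had the string:

    s = 'aa'
    

    the textbook regex b*(abb*)*a? would not match it, because it contains two consecutive a's. However, if we copy the regex straight into Python:

    >>> p = re.compile(r'b*(abb*)*a?')
    >>> bool(p.match(s))
    True
    

    This is because our regex matches only the substring 'a' of our string 'aa'.
    To tell Python to match the whole string, we have to mark where the string begins and ends, with the ^ and $ symbols respectively:

    >>> p = re.compile(r'^b*(abb*)*a?$')
    >>> bool(p.match(s))
    False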
    

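    Note that Python's match() only matches at the beginning of the string, so it implicitly assumes the ^. The search() function does not, so with search() we have to keep the ^.
    For example:

    >>> s = 'aa'
    >>> p = re.compile(r'b*(abb*)*a?$')
    >>> bool(p.match(s))
    False                 # Correct
    >>> bool(p.search(s))
    True                  # Incorrect - search ignored the first 'a'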
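As a side note, here is a minimal sketch of a third option, assuming Python 3.4 or later, where the re module provides fullmatch(). It requires the pattern to cover the entire string, so the textbook pattern can be used without anchors. The helper name in_language is made up for this illustration.

```python
import re

# Textbook pattern, written without ^ or $ anchors.
PATTERN = re.compile(r'b*(abb*)*a?')

def in_language(s):
    """Return True only if the pattern matches all of s (hypothetical helper)."""
    return PATTERN.fullmatch(s) is not None

print(in_language('aa'))    # False: two consecutive a's
print(in_language('abab'))  # True
print(in_language(''))      # True: the empty string is in the language
```

This keeps the pattern identical to the one in the book and moves the "whole string" requirement into the choice of method.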
    
+5

Actually, the example works just fine, apart from a small detail. I would write:

>>> p = re.compile('b*(abb*)*a?')
>>> m = p.match('aa')
>>> print(m.group(0))
a
>>> m = p.match('abbabbabababbabbbbbaaaaa')
>>> print(m.group(0))
abbabbabababbabbbbba

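Note that group 0 returns the part of the string that was matched by the regex.

As you can see, the expression matches a sequence of a's and b's without a repeated a. If you really want to check the whole string, you need to change it slightly: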

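>>> p = re.compile('^b*(abb*)*a?$')
>>> m = p.match('aa')
>>> print(m)
None

The ^ and $ force recognition of the beginning and end of the string.

Finally, you can combine both approaches by using the first regex and testing the length of the match at the end: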

>>> len(m.group(0)) == len('aa')

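Added: regarding the second part of the question, it seems to me that there is no mismatch between standard regexes and the Python implementation. Of course, the notation is slightly different, and the Python implementation offers some extensions (as most other packages do).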
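The length test described above could be wrapped up roughly as follows. This is only a sketch: the helper name matches_entire is invented here, and it compares the end of the match with the length of the input instead of comparing two lengths, which amounts to the same check because match() always starts at position 0.

```python
import re

def matches_entire(pattern, s):
    """Hypothetical helper: unanchored match() plus an end-of-string check."""
    m = re.match(pattern, s)
    return m is not None and m.end() == len(s)

print(matches_entire(r'b*(abb*)*a?', 'aa'))    # False: only the first 'a' matched
print(matches_entire(r'b*(abb*)*a?', 'abba'))  # True

# Caveat: the check trusts the first match the engine finds.
print(re.match(r'a|ab', 'ab').end())           # 1, although 'ab' as a whole is in the language
```

For the pattern in the question the greedy match is also the longest one, so the check gives the right answer. For patterns with ordered alternatives, as in the last line, it can reject strings that belong to the language, which is one reason to prefer the anchors.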

+5

1

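  • Use bool(p.match('aa')) to check whether the regexp matches or not.

  • Use p = re.compile('b*(abb*)*a?$').

  • \b matches at a word boundary: the position between \w and \W (word and non-word characters).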

2

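Regexps are quite standard in Python. However, every language has its own flavor of them, so they are not 100% portable. There are small differences that you are expected to look up before using regexps in any particular language.

\epsilon has no special symbol in Python. It is simply the empty string.

In your example, a|\epsilon is equivalent to (a|), or just a?. After that, a $ is required to match the end of the string.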
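A quick way to convince yourself that the empty alternative and the ? form behave the same is to run both on a handful of strings. The sample strings below are chosen arbitrarily for illustration.

```python
import re

with_empty_alt = re.compile(r'b*(abb*)*(a|)$')  # epsilon written as an empty alternative
with_optional = re.compile(r'b*(abb*)*a?$')     # same thing written with ?

samples = ['', 'a', 'aa', 'ab', 'bba', 'abab', 'baab']

# Map each sample to the pair of verdicts given by the two patterns.
results = {s: (bool(with_empty_alt.match(s)), bool(with_optional.match(s)))
           for s in samples}

for s, (alt, opt) in results.items():
    print(repr(s), alt, opt)
```

Both columns agree on every sample, and only 'aa' and 'baab' are rejected.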

+3

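I'm not entirely sure how match works in Python, but I think you may need to add ^....$ to your RE. RegExp matching usually matches substrings and finds the longest match; in the case of p.match('aa') that is "a" (probably the first one). ^...$ makes sure you are matching the ENTIRE string, which I believe is what you want.

Theoretical/standard reg exps assume that you are always matching the whole string, because you use them to define a language of matching strings, not to find a substring in an input string.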
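The "regex defines a language" view can be checked by brute force. The sketch below, written for Python 3, enumerates every string over {a, b} up to an arbitrarily chosen length of 8 and compares the anchored regex with the plain-Python test for "no consecutive a's". The generator all_strings is a made-up helper.

```python
import re
from itertools import product

ANCHORED = re.compile(r'^b*(abb*)*a?$')

def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len (hypothetical helper)."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield ''.join(chars)

# Strings on which the regex and the direct definition disagree.
mismatches = [s for s in all_strings('ab', 8)
              if bool(ANCHORED.match(s)) != ('aa' not in s)]

print(mismatches)  # []
```

An empty list means that, at least up to that length, the anchored Python pattern accepts exactly the strings the textbook expression describes.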

+3

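You get a match because your regex matches a zero-width segment of any sample text. You need to anchor your regex. Here is one way to do it, using a zero-width lookahead assertion: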

re.compile(r'^(a(?!a)|b)*$')
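To see how the lookahead behaves, here is a small sketch that runs the pattern on a few arbitrarily chosen strings. The (?!a) part succeeds only when the next character is not an a, so every a must be followed by a b or by the end of the string.

```python
import re

lookahead = re.compile(r'^(a(?!a)|b)*$')

samples = ['', 'a', 'aa', 'abab', 'baab', 'bbab']
verdicts = {s: bool(lookahead.match(s)) for s in samples}

for s, ok in verdicts.items():
    print(repr(s), ok)
```

Only 'aa' and 'baab' are rejected, which is the same verdict the anchored textbook pattern gives on these strings.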
+1

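Your second re should be a workable stand-in for epsilon, as far as I understand it, although I have never seen epsilon in a re before.

The thing is, your pattern successfully matches 'a'. It matches, in order:

  • zero or more "b"s (it chooses zero)
  • zero or more "(abb*)"s (it chooses zero)
  • one "a" or a word boundary (it chooses an a).

As noted in another answer, if you want to ensure that the whole string matches, you have to anchor the beginning ('^') and the end ('$') of your regex. You should also use a raw string whenever you construct a regex in Python: r'my regex'. Otherwise, backslash escapes quickly become confusing.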
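The raw-string advice matters for \b in particular. In an ordinary Python string literal, '\b' is a single backspace character, so the regex engine never sees the word-boundary escape. A short sketch for Python 3, with variable names chosen only for this example:

```python
import re

print(len('\b'), len(r'\b'))  # 1 2

plain = re.compile('a\b')   # 'a' followed by a literal backspace character
raw = re.compile(r'a\b')    # 'a' followed by a word boundary

raw_hits_a = bool(raw.match('a'))
plain_hits_a = bool(plain.match('a'))
plain_hits_backspace = bool(plain.match('a\x08'))

print(raw_hits_a)            # True
print(plain_hits_a)          # False
print(plain_hits_backspace)  # True
```

The two patterns look the same in the source but are different regexes.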

+1

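Your expression succeeds because it can match the empty string at the start of the input, as you can see if you do: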

>>> p = re.compile('b*(abb*)*(a|)')
>>> p.match('c').group(0)
''

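re.match tries to match at the start of the string, and since your expression can match the empty string, that counts as a match. Add a $ at the end: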

>>> p = re.compile(r'b*(abb*)*(a|)$')
>>> print(p.match('c'))
None
>>> p.match('ababababab').group(0)
'ababababab'

P.S. You might notice that I used r'pattern' instead of 'pattern'; more on this here (first paragraphs).

+1