Python regex returns part of the match when used with re.findall

I am teaching myself Python and am currently on regular expressions. The tutorial text I use seems to be aimed at Perl or some other language, not Python, so I have had to adapt the expressions to Python a bit. However, I am not very experienced, and I got stuck trying to get this expression to work.

The task is to find prices written either without decimals ($500) or with decimals ($500.10).

Here is what the text recommends:

\$[0-9]+(\.[0-9][0-9])? 

Following the text, I wrote this code:

 import re

 inputstring = "$500.01"
 result = re.findall(r'\$[0-9]+(\.[0-9][0-9])?', inputstring)
 if result:
     print(result)
 else:
     print("No match.")

However, instead of $500.01, the output is:

 ['.01']

This seems strange to me. If I remove the parentheses and the ? (so the decimal part is no longer optional), it works fine. Using this:

 \$[0-9]+\.[0-9][0-9] 

I get:

 ['$500.01']

How do I get a regular expression to return prices both with and without a decimal part?

Thanks.

1 answer

Use a non-capturing group:

 result = re.findall(r'\$[0-9]+(?:\.[0-9][0-9])?', inputstring)
                                ^^

If the pattern contains capturing groups, re.findall returns the texts captured by those groups instead of the whole matches, and your pattern has one such group. You need to get rid of it by making the group non-capturing.
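A quick way to see the difference is to run both patterns over a string containing one price of each kind. The sample text below is made up for illustration:

```python
import re

text = "Prices: $500 and $500.01"

# With a capturing group, findall returns only what the group captured.
# For "$500" the optional group did not take part in the match, so it yields ''.
capturing = re.findall(r'\$[0-9]+(\.[0-9][0-9])?', text)

# With a non-capturing group (?:...), findall returns the whole match.
non_capturing = re.findall(r'\$[0-9]+(?:\.[0-9][0-9])?', text)

print(capturing)      # ['', '.01']
print(non_capturing)  # ['$500', '$500.01']
```

Note that with the capturing group, $500 contributes an empty string, because the optional group did not participate in that match.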

re.findall(pattern, string, flags=0)
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
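The tuple case from the quote can be seen with a hypothetical two-group variant of the pattern. The sketch also shows re.finditer with group(0), which is a different technique from the fix above. It is useful if you want to keep the groups and still get the full match:

```python
import re

text = "Prices: $500 and $500.01"
pattern = r'\$([0-9]+)(\.[0-9][0-9])?'  # hypothetical: two capturing groups

# Two groups: findall returns a list of (group1, group2) tuples.
pairs = re.findall(pattern, text)
print(pairs)  # [('500', ''), ('500', '.01')]

# Keep the groups, but take the whole match from each match object.
full = [m.group(0) for m in re.finditer(pattern, text)]
print(full)  # ['$500', '$500.01']
```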

Update

You can shorten your regex with the limiting quantifier {2}, which matches exactly 2 occurrences of the preceding subpattern:

 r'\$[0-9]+(?:\.[0-9]{2})?'
                     ^^^
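As a sanity check, here is a small sketch comparing the long and short forms on a few made-up inputs. It uses re.fullmatch so that each input must match the whole pattern, which makes the comparison strict:

```python
import re

long_form = r'\$[0-9]+(?:\.[0-9][0-9])?'
short_form = r'\$[0-9]+(?:\.[0-9]{2})?'

samples = ["$500", "$500.01", "$500.1", "$500.123"]

# For each input, record whether each pattern matches the entire string.
results = {
    s: (re.fullmatch(long_form, s) is not None,
        re.fullmatch(short_form, s) is not None)
    for s in samples
}
print(results)
```

Both forms agree on every input: $500 and $500.01 match, while $500.1 and $500.123 do not match as whole strings.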

Or replace [0-9] with \d:

 r'\$\d+(?:\.\d{2})?' 
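One caveat: in Python 3, \d in a str pattern matches any Unicode decimal digit, not only 0-9. If that matters, pass re.ASCII or keep [0-9]. In this sketch, the Arabic-Indic digits are just an assumed example input:

```python
import re

# "$500" written with Arabic-Indic digits (U+0665, U+0660, U+0660),
# followed by an ordinary ASCII price.
text = "$\u0665\u0660\u0660 and $500.01"

unicode_default = re.findall(r'\$\d+(?:\.\d{2})?', text)          # \d = any Unicode digit
ascii_only = re.findall(r'\$\d+(?:\.\d{2})?', text, re.ASCII)     # \d = [0-9] only
bracket_class = re.findall(r'\$[0-9]+(?:\.[0-9]{2})?', text)      # explicit ASCII range

print(len(unicode_default), len(ascii_only), len(bracket_class))  # 2 1 1
```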