Removing unwanted characters from a phone number string

I am looking for a regex that captures a phone number and removes the unnecessary characters.

    import re

    strs = 'dsds +48 124 cat cat cat245 81243!!'
    match = re.search(r'.[ 0-9\+\-\.\_]+', strs)
    if match:
        print('found', match.group())
    else:
        print('did not find')

It returns only:

 +48 124 

How can I return the whole number?

3 answers

You want to use sub(), not search():

    >>> import re
    >>> strs = 'dsds +48 124 cat cat cat245 81243!!'
    >>> re.sub(r"[^0-9+._ -]+", "", strs)
    ' +48 124   245 81243'

[^0-9+._ -] is a negated character class. The ^ sign is significant here: the expression means "match any character that is not a digit, a plus, a dot, an underscore, a space, or a dash."

+ tells the regex engine to match one or more instances of the previous token.
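
One way to see what the trailing + changes is to count the substitutions with re.subn(), which returns the new string together with the number of replacements made. This is only an illustrative sketch on the sample string from the question; the variable names are made up for the example.

```python
import re

strs = 'dsds +48 124 cat cat cat245 81243!!'

# With "+", each run of unwanted characters is removed in one substitution.
with_plus, n_runs = re.subn(r"[^0-9+._ -]+", "", strs)

# Without "+", every unwanted character is removed individually.
without_plus, n_chars = re.subn(r"[^0-9+._ -]", "", strs)

print(repr(with_plus), n_runs)
print(repr(without_plus), n_chars)
print(with_plus == without_plus)
```

Both calls produce the same string because the replacement is empty. The quantifier only changes how many separate matches the engine makes, which starts to matter once the replacement is not empty.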


The problem with re.sub() is that you get extra spaces in the resulting phone number string. A non-regex approach that returns the correct phone number (no spaces):

    >>> strs = 'dsds +48 124 cat cat cat245 81243!!'
    >>> ''.join(x for x in strs if x.isdigit() or x == '+')
    '+4812424581243'
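
If the grouping spaces should be kept and only the runs of extra spaces are a problem, the re.sub() result can be tidied by splitting on whitespace and re-joining. The sketch below shows both variants side by side; `clean_compact`, `kept` and `clean_spaced` are hypothetical names chosen for illustration.

```python
import re

strs = 'dsds +48 124 cat cat cat245 81243!!'

# Variant 1: keep digits and "+" only, dropping every space.
clean_compact = ''.join(ch for ch in strs if ch.isdigit() or ch == '+')

# Variant 2: keep digits, "+" and spaces, then collapse repeated spaces.
kept = re.sub(r"[^0-9+ ]+", "", strs)
clean_spaced = ' '.join(kept.split())

print(clean_compact)
print(clean_spaced)
```

Note that str.isdigit() is also true for some non-ASCII digit characters, so a check such as `ch in '0123456789'` may be safer if only ASCII digits should pass.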

This is what I use to replace each run of non-digit characters with a single hyphen, and it seems to work for me:

    # convert sequences of non-digits to a single hyphen
    fixed_phone = re.sub(r"[^\d]+", "-", raw_phone)
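
Applied to the sample string from the question, this substitution leaves a hyphen at each end, and the leading + is lost because it is not a digit. A minimal sketch, assuming `raw_phone` holds that sample string and that stripping the outer hyphens is wanted (`trimmed_phone` is a name made up for the example):

```python
import re

raw_phone = 'dsds +48 124 cat cat cat245 81243!!'  # assumed sample input

# \D is shorthand for [^\d]: any character that is not a digit.
fixed_phone = re.sub(r"\D+", "-", raw_phone)

# Remove the hyphens produced by leading and trailing non-digit runs.
trimmed_phone = fixed_phone.strip("-")

print(fixed_phone)
print(trimmed_phone)
```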