Removing all non-numeric characters from a string in Python

How to remove all non-numeric characters from a string in Python?

+109
python numbers
Aug 08 '09 at 17:13
8 answers
 >>> import re
 >>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
 '987978098098098'
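
If the same substitution runs many times, one option is to compile the pattern once and reuse it. A minimal sketch, assuming Python 3; the names NON_DIGITS and digits_only are illustrative and not part of the answer:

```python
import re

# Compile once; [^0-9] matches any character that is not an ASCII digit.
NON_DIGITS = re.compile(r"[^0-9]")

def digits_only(text):
    """Return text with every non-digit character removed."""
    return NON_DIGITS.sub("", text)

print(digits_only("sdkjh987978asd098as0980a98sd"))  # 987978098098098
```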
+211
Aug 08 '09 at 17:25

Not sure if this is the most efficient way, but:

 >>> ''.join(c for c in "abc123def456" if c.isdigit())
 '123456'

The ''.join part joins all the resulting characters together with nothing in between. The rest is a generator expression (written like a list comprehension) where, as you can probably guess, we keep only the characters of the string that satisfy the isdigit condition.
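
The two pieces can be looked at separately. The sketch below assumes Python 3 and the variable names are illustrative. It also shows one caveat: str.isdigit is true for some non-ASCII characters such as superscript digits, so a stricter ASCII-only filter is shown alongside.

```python
text = "abc123def456"

# Step 1: select only the characters that pass the test
# (materialized as a list here so it can be inspected).
kept = [c for c in text if c.isdigit()]
print(kept)            # ['1', '2', '3', '4', '5', '6']

# Step 2: ''.join concatenates them with an empty separator.
print("".join(kept))   # 123456

# Caveat: isdigit() also accepts characters such as superscript two (U+00B2).
mixed = "x\u00b2 = 42"
loose = "".join(c for c in mixed if c.isdigit())         # keeps the superscript
strict = "".join(c for c in mixed if c in "0123456789")  # ASCII digits only
print(strict)          # 42
```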

+75
Aug 08 '09 at 17:16

This should work for strings and Unicode objects:

 # python <3.0
 def only_numerics(seq):
     return filter(type(seq).isdigit, seq)

 # python ≥3.0
 def only_numerics(seq):
     seq_type = type(seq)
     return seq_type().join(filter(seq_type.isdigit, seq))
+13
Sep 07 '09 at 3:01

The fastest approach, if you need to perform more than one or two such removal operations (or even just one, but on a very long string!), is to rely on the translate method of strings, even though it needs some preparation:

 >>> import string
 >>> allchars = ''.join(chr(i) for i in xrange(256))
 >>> identity = string.maketrans('', '')
 >>> nondigits = allchars.translate(identity, string.digits)
 >>> s = 'abc123def456'
 >>> s.translate(identity, nondigits)
 '123456'
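
The session above is Python 2 (xrange, string.maketrans, and a two-argument translate). A rough Python 3 counterpart for byte strings, offered as a sketch rather than the answer's own code, passes the bytes to delete as the second argument of bytes.translate. DIGITS and NON_DIGITS are illustrative names:

```python
import string

# All 256 byte values except the ASCII digits; these are the bytes to delete.
DIGITS = string.digits.encode("ascii")
NON_DIGITS = bytes(b for b in range(256) if b not in DIGITS)

s = b"abc123def456"
# A table of None means "no remapping"; the second argument lists bytes to drop.
print(s.translate(None, NON_DIGITS))  # b'123456'
```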

The translate method works differently, and is perhaps a bit simpler to use, on Unicode strings than on byte strings, btw:

 >>> unondig = dict.fromkeys(xrange(65536))
 >>> for x in string.digits: del unondig[ord(x)]
 ...
 >>> s = u'abc123def456'
 >>> s.translate(unondig)
 u'123456'

You might want to use a mapping class rather than an actual dict, especially if your Unicode string can contain characters with very high ord values (which would make the dict excessively large ;-). For example:

 >>> class keeponly(object):
 ...   def __init__(self, keep):
 ...     self.keep = set(ord(c) for c in keep)
 ...   def __getitem__(self, key):
 ...     if key in self.keep:
 ...       return key
 ...     return None
 ...
 >>> s.translate(keeponly(string.digits))
 u'123456'
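
The same idea carries over to Python 3, where str.translate looks up each character's code point through __getitem__ and drops the character when the result is None. A sketch under that assumption, with KeepOnly as an illustrative rename of the class above:

```python
import string

class KeepOnly:
    """Mapping-like object for str.translate that keeps only chosen characters."""

    def __init__(self, keep):
        self.keep = {ord(c) for c in keep}

    def __getitem__(self, codepoint):
        # Returning the code point keeps the character; None deletes it.
        return codepoint if codepoint in self.keep else None

s = "abc123def456"
print(s.translate(KeepOnly(string.digits)))  # 123456
```

Because nothing is stored per code point, the object stays small no matter how large the ord values in the input are.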
+5
Aug 08 '09 at 17:35

To add another option to the mix, the string module has several useful constants. Although they are more useful in other cases, they can be used here.

 >>> from string import digits
 >>> ''.join(c for c in "abc123def456" if c in digits)
 '123456'

There are several constants in the module, including:

  • ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • hexdigits (0123456789abcdefABCDEF)

If you use these constants heavily, it may be worthwhile to convert them to a frozenset. That makes each membership test O(1) rather than O(n), where n is the length of the constant string.

 >>> digits = frozenset(digits)
 >>> ''.join(c for c in "abc123def456" if c in digits)
 '123456'
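
The same pattern works for any of the constants. A small sketch, assuming Python 3, with keep_chars as a hypothetical helper name that is not part of the string module:

```python
from string import digits, hexdigits

def keep_chars(text, allowed):
    """Return only the characters of text that appear in allowed."""
    allowed_set = frozenset(allowed)  # hash-based membership tests
    return "".join(c for c in text if c in allowed_set)

print(keep_chars("abc123def456", digits))   # 123456
print(keep_chars("0x1F, 0xZZ", hexdigits))  # 01F0
```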
+5
Sep 07 '12 at 10:37

@Ned Batchelder and @newacct gave the right answer, but ...

Just in case your string contains commas (,) and a decimal point (.):

 >>> import re
 >>> re.sub(r"[^\d.]", "", "$1,999,888.77")
 '1999888.77'
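
If the cleaned string is then needed as a number, it can be handed to Decimal. This is only a sketch: parse_amount is a hypothetical helper, and it assumes "." is the decimal separator, that it appears at most once, and that the input carries no sign.

```python
import re
from decimal import Decimal

def parse_amount(text):
    """Strip everything except digits and dots, then convert to Decimal."""
    cleaned = re.sub(r"[^\d.]", "", text)
    return Decimal(cleaned)

print(parse_amount("$1,999,888.77"))  # 1999888.77
```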
0
Nov 09 '18 at 15:49

I do not have enough reputation to comment, but I tried the solution in the second most popular answer ( https://stackoverflow.com/a/165478/ ). There was a problem with it, and I corrected it. I suppose there should be a "[]" to make it a list comprehension?

 def strip_nonnumerics(s):
     return ''.join([i for i in s if i.isdigit()])
0
May 17 '19 at 14:55
 user = (input): print ("hello") 
-5
Aug 18 '17 at 9:54


