Filtering strings containing only numbers and / or punctuation - python

I need to keep only the strings that consist solely of digits and/or punctuation characters.

I tried checking each character and summing the Boolean results to see whether the total equals len(str). Is there a more Pythonic way to do this?

    >>> import string
    >>> x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
    >>> [i for i in x if [True if j.isdigit() else False for j in i]]
    ['12,523', '3.46', 'this is not', 'foo bar 42', '23fa']
    >>> [i for i in x if sum([True if j.isdigit() or j in string.punctuation else False for j in i]) == len(i)]
    ['12,523', '3.46']
2 answers

With all and a generator expression, you don't need to count the matches and compare the count against the length:

    >>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)]
    ['12,523', '3.46']

Note that both the code above and the OP's code also accept strings that contain only punctuation:

    >>> x = [',,,', '...', '123', 'not number']
    >>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)]
    [',,,', '...', '123']

To handle this, add another condition that requires at least one digit:

    >>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i) and any(j.isdigit() for j in i)]
    ['123']

You can make this a little faster by converting string.punctuation to a set, so that each membership test is a hash lookup instead of a scan of the string:

    >>> puncs = set(string.punctuation)
    >>> [i for i in x if all(j.isdigit() or j in puncs for j in i) and any(j.isdigit() for j in i)]
    ['123']
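The checks above can be wrapped in a small helper. This is only a sketch: the names PUNCT and is_numeric_like are illustrative and do not come from the answer.

```python
import string

# Set membership is a hash lookup; the name PUNCT is illustrative.
PUNCT = set(string.punctuation)


def is_numeric_like(s):
    """Return True if s has only digits/punctuation and at least one digit."""
    return (all(ch.isdigit() or ch in PUNCT for ch in s)
            and any(ch.isdigit() for ch in s))


x = ['12,523', '3.46', 'this is not', 'foo bar 42', '23fa', ',,,', '...', '123']
result = [s for s in x if is_numeric_like(s)]
print(result)
```

Because any() is False for an empty iterable, the empty string is rejected as well.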

You can use a precompiled regex to test this.

    import re, string

    # Raw string, so that \d reaches the regex engine unchanged
    pattern = re.compile(r"[\d{}]+$".format(re.escape(string.punctuation)))
    x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
    print([item for item in x if pattern.match(item)])

Output

 ['12,523', '3.46'] 
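Like the all-only comprehension, this pattern also accepts strings made up solely of punctuation, and $ combined with match tolerates a trailing newline. A possible stricter variant, assuming Python 3 (for fullmatch), adds a lookahead that requires at least one digit. The names char_class and strict_pattern are illustrative:

```python
import re
import string

# Character class of digits plus every ASCII punctuation character.
char_class = r"[\d{}]".format(re.escape(string.punctuation))

# (?=.*\d) demands at least one digit; fullmatch anchors both ends of the string.
strict_pattern = re.compile(r"(?=.*\d){}+".format(char_class))

x = ['12,523', '3.46', 'this is not', 'foo bar 42', '23fa', ',,,', '123']
result = [item for item in x if strict_pattern.fullmatch(item)]
print(result)
```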

A small time comparison between @falsetru's solution and mine:

    import re, string
    from timeit import timeit

    punct = string.punctuation
    pattern = re.compile(r"[\d{}]+$".format(re.escape(string.punctuation)))
    x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]

    print(timeit("[item for item in x if pattern.match(item)]", "from __main__ import pattern, x"))
    print(timeit("[i for i in x if all(j.isdigit() or j in punct for j in i)]", "from __main__ import x, punct"))

Output on my machine:

    2.03506183624
    4.28856396675

Thus, on this input, the precompiled regex approach is roughly twice as fast as the all approach (the timed comprehension does not include the any check or the set).
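Timings like these depend on the machine, the Python version and the input, so it may be worth re-running the comparison locally. A Python 3 sketch that also times the set-based variant could look like this; the function names and the repetition count are illustrative, and no particular outcome is implied:

```python
import re
import string
from timeit import timeit

x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
punct = string.punctuation
punct_set = set(punct)
pattern = re.compile(r"[\d{}]+$".format(re.escape(punct)))


def with_regex():
    return [item for item in x if pattern.match(item)]


def with_all_str():
    return [i for i in x if all(j.isdigit() or j in punct for j in i)]


def with_all_set():
    return [i for i in x if all(j.isdigit() or j in punct_set for j in i)]


# number is kept small here; raise it for more stable measurements.
timings = {f.__name__: timeit(f, number=20000)
           for f in (with_regex, with_all_str, with_all_set)}
for name, seconds in timings.items():
    print(name, seconds)
```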
