Here is a sample code:
$ cat a.py a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw' for i in xrange(1000000): ''.join([c for c in a if c in '1234567890.'])
$ cat b.py import re non_decimal = re.compile(r'[^\d.]+') a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw' for i in xrange(1000000): non_decimal.sub('', a)
$ cat c.py a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw' for i in xrange(1000000): ''.join([c for c in a if c.isdigit() or c == '.'])
$ cat d.py a = '27893jkasnf8u2qrtq2ntkjh8934yt8.298222rwagasjkijw' for i in xrange(1000000): b = [] for c in a: if c.isdigit() or c == '.': continue b.append(c) ''.join(b)
And the synchronization results:
$ time python a.py real 0m24.735s user 0m21.049s sys 0m0.456s $ time python b.py real 0m10.775s user 0m9.817s sys 0m0.236s $ time python c.py real 0m38.255s user 0m32.718s sys 0m0.724s $ time python d.py real 0m46.040s user 0m41.515s sys 0m0.832s
It seems that regex is a winner.
Personally, I find regular expression as easy to read as list comprehension. If you do this just a few times, then you will probably be more affected by regular expression compilation. Do what jives with your code and coding style.
Colin Burnett Jun 03 '09 at 23:44 2009-06-03 23:44
source share