The esmre library does the trick. In your case itβs simpler, esm (part of esmre) is what you want.
https://pypi.python.org/pypi/esmre/
https://code.google.com/p/esmre/
They have good documentation and examples: From their examples:
>>> import esm >>> index = esm.Index() >>> index.enter("he") >>> index.enter("she") >>> index.enter("his") >>> index.enter("hers") >>> index.fix() >>> index.query("this here is history") [((1, 4), 'his'), ((5, 7), 'he'), ((13, 16), 'his')] >>> index.query("Those are his sheep!") [((10, 13), 'his'), ((14, 17), 'she'), ((15, 17), 'he')] >>>
I conducted several performance tests:
import random, timeit, string, esm def uz(lelist, lestring): for x in lelist: if lestring.count(x): return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x) def ab(lelist, lestring): return [e for e in lelist if e in lestring] def use_esm(index, lestring): return index.query(lestring) for TEXT_LEN in [5, 50, 1000]: for SEARCH_LEN in [5, 20]: for N in [5, 50, 1000, 10000]: if TEXT_LEN < SEARCH_LEN: continue print 'TEXT_LEN:', TEXT_LEN, 'SEARCH_LEN:', SEARCH_LEN, 'N:', N lestring = ''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(TEXT_LEN))) lelist = [''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(SEARCH_LEN))) for _ in range(N)] index = esm.Index() for i in lelist: index.enter(i) index.fix() t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab") t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz") t_esm = timeit.Timer("use_esm(index, lestring)", setup="from __main__ import index, lestring, use_esm") ab_time = t_ab.timeit(1000) uz_time = t_uz.timeit(1000) esm_time = t_esm.timeit(1000) min_time = min(ab_time, uz_time, esm_time) print ' ab%s: %f' % ('*' if ab_time == min_time else '', ab_time) print ' uz%s: %f' % ('*' if uz_time == min_time else '', uz_time) print ' esm%s %f:' % ('*' if esm_time == min_time else '', esm_time)
And the results obtained depend mainly on the number of items you are looking for (in my case, "N"):
TEXT_LEN: 1000 SEARCH_LEN: 20 N: 5 ab*: 0.001733 uz: 0.002512 esm 0.126853: TEXT_LEN: 1000 SEARCH_LEN: 20 N: 50 ab*: 0.017564 uz: 0.023701 esm 0.079925: TEXT_LEN: 1000 SEARCH_LEN: 20 N: 1000 ab: 0.370371 uz: 0.489523 esm* 0.133783: TEXT_LEN: 1000 SEARCH_LEN: 20 N: 10000 ab: 3.678790 uz: 4.883575 esm* 0.259605: