How to limit the number of matches returned by regex findall()

Question

Is there a regex equivalent of BeautifulSoup's limit=X argument for the findall method? That is, how can I find only the first X matches and then stop searching? Thanks.

+1

python python-2.7 regex

nutship Apr 26 '13 at 11:52

2 answers

Use re.finditer and itertools.islice:

    from itertools import islice
    import re

    limit = 2
    for x in islice(re.finditer(r'\d+', '1 2 33'), limit):
        print(x.group())

As a function:

    def findall_limiter(pattern, string, limit, flags=0):
        return islice(re.finditer(pattern, string, flags), limit)

For example:

    for match in findall_limiter(r'\d+', '1 2 33', 2):
        print(match.group())  # do stuff with each match object
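Note that re.finditer yields match objects, whereas re.findall returns strings (or tuples of groups when the pattern has several capturing groups). If the goal is a drop-in result in the same shape as findall, something along these lines may work. This is only a sketch: findall_limited is a hypothetical name, not part of the re module, and it assumes that mirroring findall's group handling is what is wanted.

```python
import re
from itertools import islice


def findall_limited(pattern, string, limit, flags=0):
    """Hypothetical helper: like re.findall, but stop after `limit` matches."""
    regex = re.compile(pattern, flags)
    results = []
    for match in islice(regex.finditer(string), limit):
        if regex.groups == 0:
            # No capturing groups: findall returns the whole match.
            results.append(match.group(0))
        elif regex.groups == 1:
            # One capturing group: findall returns just that group.
            results.append(match.groups(default="")[0])
        else:
            # Several groups: findall returns a tuple per match.
            results.append(match.groups(default=""))
    return results


print(findall_limited(r"\d+", "1 2 33", 2))             # ['1', '2']
print(findall_limited(r"(\w)=(\d)", "a=1 b=2 c=3", 2))  # [('a', '1'), ('b', '2')]
print(findall_limited(r"\d+", "1 2 33", 10))            # ['1', '2', '33']
```

Because islice simply stops when the underlying iterator runs out, a limit larger than the number of matches is harmless.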

+4

gatto Apr 26 '13 at 11:57

Ashwini chaudhary · Accepted Answer · 2013-04-26T11:56:08+0000

You can use re.finditer as it returns an iterator instead of generating all values at once:

    In [21]: strs = "12345678"

    In [22]: it = re.finditer("\d", strs)

    In [23]: [next(it).group(0) for _ in xrange(4)]  # returns only 4 matches
    Out[23]: ['1', '2', '3', '4']

However, this raises a StopIteration error when the limit is greater than the number of matches. A simple workaround is to catch the exception, or to use itertools.islice:

    In [26]: def limiter(strs, pattern, limit):
       ....:     it = re.finditer(pattern, strs)
       ....:     try:
       ....:         for _ in xrange(limit):
       ....:             yield next(it).group(0)
       ....:     except StopIteration:
       ....:         pass
       ....:

    In [27]: list(limiter("12345", "\d", 3))
    Out[27]: ['1', '2', '3']

    In [28]: list(limiter("12345", "\d", 6))
    Out[28]: ['1', '2', '3', '4', '5']

    In [29]: list(limiter("12345", "\d", 10))
    Out[29]: ['1', '2', '3', '4', '5']

Help on re.finditer:

    In [24]: re.finditer?
    Type:        function
    String Form: <function finditer at 0xb74114c4>
    File:        /usr/lib/python2.7/re.py
    Definition:  re.finditer(pattern, string, flags=0)
    Docstring:
    Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.

    Empty matches are included in the result.
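The last sentence of that docstring matters when limiting: empty matches count toward the limit too. The sketch below illustrates this with a pattern that can match the empty string, and shows one possible way to skip empty matches before slicing. The sample text and variable names are made up for illustration.

```python
import re
from itertools import islice

text = "a1b22"
pattern = r"\d*"  # can match the empty string

# Empty matches are yielded like any other match, so they use up the limit.
first_two = [m.group(0) for m in islice(re.finditer(pattern, text), 2)]
print(first_two)  # ['', '1']

# Filter out empty matches first, then apply the limit.
non_empty = (m for m in re.finditer(pattern, text) if m.group(0))
first_two_non_empty = [m.group(0) for m in islice(non_empty, 2)]
print(first_two_non_empty)  # ['1', '22']
```

Both steps stay lazy, so the scan still stops as soon as enough non-empty matches have been found.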
