Get a list of all words matching a regular expression

Suppose I have a string: "Lorem ipsum dolor sit amet". I need a list of all words longer than 3 characters. Can I do this with regular expressions?

For example:

    pattern = re.compile(r'some pattern')
    result = pattern.search('Lorem ipsum dolor sit amet').groups()

so that the result contains "Lorem", "ipsum", "dolor" and "amet".

Edit:

The words I mean can contain only letters and digits.

4 answers

    >>> import re
    >>> myre = re.compile(r"\w{4,}")
    >>> myre.findall('Lorem, ipsum! dolor sit? amet...')
    ['Lorem', 'ipsum', 'dolor', 'amet']

Note that in Python 3, where all strings are Unicode, this also finds words containing non-ASCII letters:

    >>> import re
    >>> myre = re.compile(r"\w{4,}")
    >>> myre.findall('Lorem, ipsum! dolör sit? amet...')
    ['Lorem', 'ipsum', 'dolör', 'amet']

In Python 2 you need to pass the re.UNICODE flag and use a Unicode string:

    >>> myre = re.compile(r"\w{4,}", re.UNICODE)
    >>> myre.findall(u'Lorem, ipsum! dolör sit? amet...')
    [u'Lorem', u'ipsum', u'dol\xf6r', u'amet']
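The question's edit restricts words to letters and digits, while `\w` also matches the underscore. A minimal sketch, assuming underscores should not count as part of a word; the names `LONG_WORD` and `long_words` are hypothetical and not from the answer above:

```python
import re

# [^\W_] reads as "a word character that is not an underscore",
# i.e. letters and digits only (Unicode-aware for str patterns in Python 3).
LONG_WORD = re.compile(r"[^\W_]{4,}")

def long_words(text):
    """Return every run of 4 or more letters/digits found in text."""
    return LONG_WORD.findall(text)

print(long_words("Lorem, ipsum! dolor sit? amet..."))  # ['Lorem', 'ipsum', 'dolor', 'amet']
print(long_words("a_b_c_d lorem"))                     # ['lorem']
```

With plain `\w{4,}` the second call would also return 'a_b_c_d', because the underscores are treated as word characters.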

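This is a typical use case for list comprehensions in Python, which can filter:

    import re
    pattern = re.compile(r'\w+')  # assumption: any pattern that matches whole words
    text = 'Lorem ipsum dolor sit amet'
    result = [word for word in pattern.findall(text) if len(word) > 3]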

    pattern = re.compile(r"\w\w\w(\w+)")
    # search() stops at the first match, so groups() here is ('em',)
    result = pattern.search('Lorem ipsum dolor sit amet').groups()
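As a hedged illustration of why `search(...).groups()` does not list every word: `search` returns only the first match, and `groups()` holds just the captured part of that match. The sketch below reuses the same pattern and, as one possible alternative, walks all matches with `finditer`; the variable names are chosen for illustration only.

```python
import re

text = "Lorem ipsum dolor sit amet"
pattern = re.compile(r"\w\w\w(\w+)")

# search() finds only the first match; groups() contains just the captured tail.
first = pattern.search(text).groups()

# finditer() visits every match; group(0) is the whole matched word.
all_words = [m.group(0) for m in pattern.finditer(text)]

print(first)      # ('em',)
print(all_words)  # ['Lorem', 'ipsum', 'dolor', 'amet']
```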

    pattern = re.compile(r'(\S{4,})')
    pattern.findall('Lorem ipsum dolor sit amet')
    # ['Lorem', 'ipsum', 'dolor', 'amet']
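One caveat worth checking: `\S` matches any non-whitespace character, so punctuation attached to a word is counted toward the length and returned with it. A quick sketch comparing `\S{4,}` with `\w{4,}` on the punctuated sample string from the first answer (variable names are illustrative):

```python
import re

text = "Lorem, ipsum! dolor sit? amet..."

# \S counts punctuation too, so 'sit?' reaches length 4 and slips in.
non_space = re.findall(r"\S{4,}", text)

# \w ignores punctuation, so only real 4+ character words remain.
word_chars = re.findall(r"\w{4,}", text)

print(non_space)   # ['Lorem,', 'ipsum!', 'dolor', 'sit?', 'amet...']
print(word_chars)  # ['Lorem', 'ipsum', 'dolor', 'amet']
```

On input without punctuation, such as the string in the question, both patterns give the same result.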