List all words matching a regular expression

Question

Suppose I have the line "Lorem ipsum dolor sit amet". I need a list of all words that are longer than 3 characters. Can I do this with regular expressions?

E.g.:

    pattern = re.compile(r'some pattern')
    result = pattern.search('Lorem ipsum dolor sit amet').groups()

The result should contain "Lorem", "ipsum", "dolor" and "amet".

Edit:

The words I mean may contain only letters and digits.

+7

szaman Jan 4 '11 at 13:33


4 answers

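This is a typical use case for list comprehensions in Python, which can filter:

    import re
    # Assumption: the answer leaves `pattern` undefined; any word-matching regex will do here.
    pattern = re.compile(r'\w+')
    text = 'Lorem ipsum dolor sit amet'
    result = [word for word in pattern.findall(text) if len(word) > 3]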

+2

jsbueno Jan 4 '11 at 13:36


    pattern = re.compile(r"\w\w\w(\w+)")
    result = pattern.search('Lorem ipsum dolor sit amet').groups()
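One caveat with this approach: search() stops at the first match, and groups() returns only what the capture group matched, so it does not yield the full word list asked for. A minimal sketch of the difference, assuming findall() with an equivalent pattern that has no capture group (the variable names are only illustrative):

```python
import re

text = 'Lorem ipsum dolor sit amet'

# search() finds only the first match; groups() holds just the captured part.
first = re.search(r"\w\w\w(\w+)", text)
whole_match = first.group(0)   # the complete first match
captured = first.groups()      # only the text matched by (\w+)

# findall() with no capture group returns every complete match.
words = re.findall(r"\w{3}\w+", text)

print(whole_match, captured, words)
```

Here the capture group holds only the tail of the first word, while findall() returns all four words.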

+2

krakover Jan 4 '11 at 13:38


    >>> pattern = re.compile(r'(\S{4,})')
    >>> pattern.findall('Lorem ipsum dolor sit amet')
    ['Lorem', 'ipsum', 'dolor', 'amet']
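Keep in mind that \S matches any non-whitespace character, so punctuation counts toward the length and ends up in the result. A small sketch comparing it with \w, assuming an input that contains punctuation (the sample text and variable names are illustrative):

```python
import re

text = 'Lorem, ipsum! dolor sit? amet...'

# \S{4,}: runs of 4+ non-whitespace characters, punctuation included.
non_space = re.findall(r'\S{4,}', text)

# \w{4,}: runs of 4+ word characters (letters, digits, underscore).
word_chars = re.findall(r'\w{4,}', text)

print(non_space)
print(word_chars)
```

With \S, "sit?" qualifies because the question mark makes it four characters long; with \w it does not.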

0

albertov Jan 4 '11 at 13:43


Tim Pietzcker · Accepted Answer · 2011-01-04T13:41:35+0000

    >>> import re
    >>> myre = re.compile(r"\w{4,}")
    >>> myre.findall('Lorem, ipsum! dolor sit? amet...')
    ['Lorem', 'ipsum', 'dolor', 'amet']

Note that in Python 3, where all strings are Unicode, this will also find words that contain non-ASCII letters:

    >>> import re
    >>> myre = re.compile(r"\w{4,}")
    >>> myre.findall('Lorem, ipsum! dolör sit? amet...')
    ['Lorem', 'ipsum', 'dolör', 'amet']
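If only ASCII letters and digits should count in Python 3, the re.ASCII flag restricts \w accordingly. A short sketch, assuming the same sample sentence (the variable names are illustrative):

```python
import re

text = 'Lorem, ipsum! dolör sit? amet...'

# Default in Python 3: \w matches Unicode word characters.
unicode_words = re.findall(r"\w{4,}", text)

# With re.ASCII, \w matches only [a-zA-Z0-9_], so "ö" splits the word.
ascii_words = re.findall(r"\w{4,}", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```

With the flag set, "dolör" breaks into "dol" and "r", and neither piece reaches four characters.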

In Python 2 you will need to use:

    >>> myre = re.compile(r"\w{4,}", re.UNICODE)
    >>> myre.findall(u'Lorem, ipsum! dolör sit? amet...')
    [u'Lorem', u'ipsum', u'dol\xf6r', u'amet']
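Since the question asks for words made of letters and digits only, it may be worth noting that \w also matches the underscore. One possible way to exclude it is the character class [^\W_]. This is a sketch under that assumption, not part of the original answer, and the sample text is made up for illustration:

```python
import re

text = 'Lorem ipsum_dolor sit amet2 x_y_z'

# \w includes "_", so underscore-joined tokens count as single words.
with_underscore = re.findall(r"\w{4,}", text)

# [^\W_] means "a word character that is not an underscore",
# i.e. letters and digits only.
letters_digits = re.findall(r"[^\W_]{4,}", text)

print(with_underscore)
print(letters_digits)
```

With the stricter class, "ipsum_dolor" is split at the underscore into two separate words, and "x_y_z" no longer matches at all.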