Regular expression in Python sentence extractor

I have a script that gives me sentences containing one of the specified keywords. A sentence is defined as anything between two periods.

Now I want it to select the whole sentence in cases like “Put 1.5 grams of powder”: if “powder” is the keyword, I should get the whole sentence, and not “5 grams of powder”.

I am trying to figure out how to express that a sentence sits between occurrences of a period followed by a space. My new filter:

def iterphrases(text):
    return ifilter(None, imap(lambda m: m.group(1), finditer(r'([^\.\s]+)', text)))

However, now I no longer get any sentences, only pieces or single words (including my keyword). I am very confused about what I am doing wrong.


If you do not need an iterator, re.split is simpler for this definition of a sentence:

re.split(r'\.\s', text)

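Note that the last sentence will still include its period, or will be an empty string (if text ends with whitespace after the last period). To fix that: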

re.split(r'\.\s', re.sub(r'\.\s*$', '', text))
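As a minimal sketch of how this behaves on the example from the question, the snippet below wraps the call in a helper. The name split_sentences and the sample text are assumptions made for illustration, not part of the answer.

```python
import re

def split_sentences(text):
    # Hypothetical helper: drop the final period (and any trailing
    # whitespace), then split wherever a period is followed by whitespace.
    return re.split(r'\.\s', re.sub(r'\.\s*$', '', text))

# Made-up sample text containing a decimal number.
sample = "Mix well. Put 1.5 grams of powder. Stir for a minute. "
sentences = split_sentences(sample)

# Keep only the sentences that mention the keyword.
keyword_hits = [s for s in sentences if 'powder' in s]
print(keyword_hits)
```

The period in "1.5" is followed by a digit rather than whitespace, so the split leaves it alone.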

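For a somewhat more general case, see also the related question on using a Python regex to split text into sentences (sentence tokenizing).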

For a fully general solution you would need a proper sentence tokenizer, such as nltk.tokenize:

nltk.tokenize.sent_tokenize(text)

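In the pattern below, a sentence starts at a word character and runs to the first period that is followed by a space or the end of a line: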

import re
# Raw string, so that \w and \. reach the regex engine unchanged.
sentence = re.compile(r"\w.*?\.(?= |$)", re.MULTILINE)
def iterphrases(text):
    return (match.group(0) for match in sentence.finditer(text))
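A short usage sketch, assuming the goal is still to keep only the sentences that contain a keyword. It repeats the definitions so that it runs on its own. The helper sentences_with and the sample text are hypothetical additions.

```python
import re

# A sentence: a word character, then as little as possible, up to a period
# that is followed by a space or the end of a line.
sentence = re.compile(r"\w.*?\.(?= |$)", re.MULTILINE)

def iterphrases(text):
    return (match.group(0) for match in sentence.finditer(text))

def sentences_with(text, keywords):
    # Hypothetical helper: keep sentences containing any of the keywords.
    return [s for s in iterphrases(text) if any(k in s for k in keywords)]

sample = "Mix well. Put 1.5 grams of powder. Stir for a minute."
found = sentences_with(sample, ["powder"])
print(found)
```

Because "." does not match a newline by default, this pattern only finds sentences that fit on a single line.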

If you are sure that "." is not used for anything other than separating sentences, and that each relevant sentence ends with a period, then the following may be useful:

import re
# Each match may start with the space that follows the previous period.
matches = re.finditer(r'([^.]*?(powder|keyword2|keyword3).*?)\.', text)
result = [m.group() for m in matches]
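The assumption about periods matters. On the example from the question, the pattern above stops at the decimal point. The sketch below first shows that, then tries one possible variant in which a period only ends a sentence when whitespace or the end of the text follows it. The helper keyword_sentences and the sample texts are assumptions for illustration and are not part of the answer.

```python
import re

# The pattern from the answer, applied to the sentence from the question.
original = [m.group() for m in re.finditer(
    r'([^.]*?(powder|keyword2|keyword3).*?)\.', "Put 1.5 grams of powder.")]

def keyword_sentences(text, keywords):
    # Hypothetical helper. Inside a sentence, allow any non-period character,
    # or a period that is NOT followed by whitespace or the end of the text.
    inner = r'(?:[^.]|\.(?!\s|$))*?'
    alternatives = '|'.join(re.escape(k) for k in keywords)
    # The closing period must be followed by whitespace or the end of the text.
    pattern = inner + '(?:' + alternatives + ')' + inner + r'\.(?=\s|$)'
    # Matches can begin with the space after the previous sentence; strip it.
    return [m.group().strip() for m in re.finditer(pattern, text)]

fixed = keyword_sentences("Mix well. Put 1.5 grams of powder. Stir.", ["powder"])
print(original)
print(fixed)
```

Abbreviations such as "e.g. " would still be treated as sentence ends, so a tokenizer remains the safer choice for general text.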