Python: extract the sentence containing a word

I am trying to extract the whole sentence containing the specified word from the text.

    import re

    txt = "I like to eat apple. Me too. Let's go buy some apples."
    txt = "." + txt
    re.findall(r"\."+".+"+"apple"+".+"+"\.", txt)

but it returns:

    [".I like to eat apple. Me too. Let's go buy some apples."]

instead of:

    [".I like to eat apple.", "Let's go buy some apples."]

Any help please?

+7
6 answers
    In [3]: re.findall(r"([^.]*?apple[^.]*\.)", txt)
    Out[3]: ['I like to eat apple.', " Let's go buy some apples."]
+9

No need for a regular expression:

    >>> txt = "I like to eat apple. Me too. Let's go buy some apples."
    >>> [sentence + '.' for sentence in txt.split('.') if 'apple' in sentence]
    ['I like to eat apple.', " Let's go buy some apples."]
+16
    In [7]: import re

    In [8]: txt = ".I like to eat apple. Me too. Let's go buy some apples."

    In [9]: re.findall(r'([^.]*apple[^.]*)', txt)
    Out[9]: ['I like to eat apple', " Let's go buy some apples"]

But note that @jamylak's split-based solution is faster:

    In [10]: %timeit re.findall(r'([^.]*apple[^.]*)', txt)
    1000000 loops, best of 3: 1.96 us per loop

    In [11]: %timeit [s + '.' for s in txt.split('.') if 'apple' in s]
    1000000 loops, best of 3: 819 ns per loop

The difference in speed is smaller, but still significant for large strings:

    In [24]: txt = txt*10000

    In [25]: %timeit re.findall(r'([^.]*apple[^.]*)', txt)
    100 loops, best of 3: 8.49 ms per loop

    In [26]: %timeit [s + '.' for s in txt.split('.') if 'apple' in s]
    100 loops, best of 3: 6.35 ms per loop
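To reproduce this comparison outside IPython, a minimal sketch using the standard timeit module could look like the following. The helper names with_regex and with_split are made up for illustration, and the timings will vary by machine and Python version.

```python
import re
import timeit

# Same text as above, repeated to simulate a large input.
txt = "I like to eat apple. Me too. Let's go buy some apples." * 10000


def with_regex(text):
    # Negated character class: runs of non-period characters around "apple".
    return re.findall(r'[^.]*apple[^.]*', text)


def with_split(text):
    # Split on periods and keep the pieces that contain "apple".
    return [s + '.' for s in text.split('.') if 'apple' in s]


for func in (with_regex, with_split):
    runs = 10
    seconds = timeit.timeit(lambda: func(txt), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1000:.2f} ms per call")
```

Both functions find the same number of sentences; they differ only in whether the trailing period is included in each result.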
+7

You can use str.split:

    >>> txt = "I like to eat apple. Me too. Let's go buy some apples."
    >>> txt.split('. ')
    ['I like to eat apple', 'Me too', "Let's go buy some apples."]
    >>> [t for t in txt.split('. ') if 'apple' in t]
    ['I like to eat apple', "Let's go buy some apples."]
+3
 r"\."+".+"+"apple"+".+"+"\." 

This line is a bit odd; why concatenate so many separate strings? You could just use r'\..+apple.+\.'.

In any case, the problem with your regular expression is its greediness. By default, x+ will match x as often as possible. So your .+ will match as many characters (any characters) as possible, including periods and apples.

What you want to use instead is a non-greedy expression; you can usually do this by adding a ? at the end: .+?

This gives you the following result:

 ['.I like to eat apple. Me too.'] 

As you can see, you no longer get both apple sentences, but you still get Me too. as well. This is because you still match the . after apple with .+?, which makes it impossible not to capture the following sentence too.

A working regular expression would be: r'\.[^.]*?apple[^.]*?\.'

Here you no longer match any character, only characters that are not periods. We also allow matching no characters at all (because after apple in the first sentence there are no non-period characters). Using this expression gives the following:

    ['.I like to eat apple.', ". Let's go buy some apples."]
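The three patterns discussed in this answer can be compared side by side. This is a small sketch on the question's text with the leading period prepended, as in the question; the variable names greedy, lazy, and negated are only illustrative.

```python
import re

txt = ".I like to eat apple. Me too. Let's go buy some apples."

# Greedy: .+ runs to the last possible period, so everything is one match.
greedy = re.findall(r'\..+apple.+\.', txt)

# Non-greedy: .+? after "apple" still needs one character, so it eats the
# period that ends the first sentence and stops at the next one.
lazy = re.findall(r'\..+?apple.+?\.', txt)

# Negated class: [^.] cannot cross a period, so each match is one sentence.
negated = re.findall(r'\.[^.]*?apple[^.]*?\.', txt)

print(greedy)   # the whole text as a single match
print(lazy)     # ['.I like to eat apple. Me too.']
print(negated)  # ['.I like to eat apple.', ". Let's go buy some apples."]
```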
+2

Note that the example in the question actually extracts sentences containing a substring, not sentences containing a word. Here is how to solve the "extract the sentence containing a word" problem in Python:

The word may be at the beginning, in the middle, or at the end of a sentence. Not limited to the example in the question, here is a general function for finding a word in a sentence:

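    import re

    def searchWordinSentence(word, sentence):
        # The word must be delimited by spaces or by the start/end of the sentence.
        pattern = re.compile(' ' + word + ' |^' + word + ' | ' + word + '$')
        return re.search(pattern, sentence) is not None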

For the example in the question, we can solve it like this:

    txt = "I like to eat apple. Me too. Let's go buy some apples."
    word = "apple"
    print([t for t in txt.split('. ') if searchWordinSentence(word, t)])

Corresponding output:

 ['I like to eat apple'] 
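As an alternative sketch, not the approach shown in this answer, the regex word boundary \b can do the whole-word matching without listing the space-delimited cases by hand. The function name sentences_with_word is hypothetical, and the sketch assumes that sentences end with a period and that the word consists of letters or digits.

```python
import re


def sentences_with_word(text, word):
    # \b matches at word boundaries, so "apple" does not match inside "apples".
    # re.escape guards against regex metacharacters in the word (assumption:
    # the word starts and ends with a letter or digit, otherwise \b misbehaves).
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    # Split on periods, drop empty pieces, and strip surrounding whitespace.
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    return [s + '.' for s in sentences if pattern.search(s)]


txt = "I like to eat apple. Me too. Let's go buy some apples."
print(sentences_with_word(txt, "apple"))   # ['I like to eat apple.']
print(sentences_with_word(txt, "apples"))  # ["Let's go buy some apples."]
```

Unlike the space-based pattern, this also matches a word that is followed by punctuation such as a comma.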
0