Open a file and read sentences

I want to open a file and get its sentences. The sentences in the file run across lines, for example:

"He said, 'I'll pay you five pounds a week if I can have it on my own
terms.'  I'm a poor woman, sir, and Mr. Warren earns little, and the
money meant much to me.  He took out a ten-pound note, and he held it
out to me then and there. 

I am currently using this code:

text = ' '.join(file_to_open.readlines())
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)

readlines breaks the sentences up at line ends. Is there a good way to solve this and get only the sentences (without NLTK)?

Thanks for your attention.

Current issue:

file_to_read = 'test.txt'

with open(file_to_read) as f:
    text = f.read()

import re
word_list = ['Mrs.', 'Mr.']     

for i in word_list:
    text = re.sub(i, i[:-1], text)

What I get back (in the test case) is that Mrs. has been changed to Mr, while Mr. simply becomes Mr. I tried a few other things, but nothing seems to work. The answer is probably simple, but I am missing it.

+4
2 answers

Your regex works on the text above if you do this:

with open(filename) as f:
    text = f.read()

sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)

However, it also splits on the period in "Mr.", which needs to be fixed.

One way to handle this is to remove the period after such titles first:

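text = re.sub(r'(M\w{1,2})\.', r'\1', text) # no for loop needed for this, like there was before

This matches "M" followed by 1 to 2 word characters (\w{1,2}), followed by a period. The parenthesized part of the pattern is captured as a group and referenced in the replacement as "\1" (group 1, since a pattern can have more than one group). So "Mr." or "Mrs." is matched, but only the "Mr" or "Mrs" part is captured, and the whole match is replaced by the captured part, which leaves out the period.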

Then:

sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)

This should then split the text the way you want.
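Putting the steps together, here is a minimal sketch rather than a definitive solution. The function name split_sentences and the whitespace-collapsing step are my own additions (the latter joins sentences that wrap across lines); the two regexes are the ones shown above, and the sample is taken from the text in the question.

```python
import re

def split_sentences(text):
    # Assumption: collapse newlines and repeated spaces so that
    # sentences wrapped across several lines become one run of text.
    text = ' '.join(text.split())
    # Drop the period after titles such as "Mr." and "Mrs.".
    text = re.sub(r'(M\w{1,2})\.', r'\1', text)
    # Split on sentence-ending punctuation plus any closing quotes/brackets.
    parts = re.split(r' *[\.\?!][\'"\)\]]* *', text)
    # re.split leaves an empty string after the final period; drop it.
    return [p for p in parts if p]

sample = ("I'm a poor woman, sir, and Mr. Warren earns little, and the\n"
          "money meant much to me.  He took out a ten-pound note, and he held it\n"
          "out to me then and there.")

for sentence in split_sentences(sample):
    print(sentence)
```

Note that the title regex is deliberately crude: any capital "M" followed by one or two word characters and a period is affected, so a sentence ending in "Max." would lose its period and not be split there. As for the loop in the question, it misbehaves because the period in each pattern is unescaped and so matches any character: after "Mrs." becomes "Mrs", the pattern 'Mr.' matches "Mrs" and cuts it to "Mr". Passing re.escape(i) as the pattern instead of i avoids that.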

+2

You can also try text-sentence.

Example:

>>> from text_sentence import Tokenizer
>>> t = Tokenizer()
>>> list(t.tokenize("This is first sentence. This is second one!And this is third, is it?"))
[T('this'/sent_start), T('is'), T('first'), T('sentence'), T('.'/sent_end),
 T('this'/sent_start), T('is'), T('second'), T('one'), T('!'/sent_end),
 T('and'/sent_start), T('this'), T('is'), T('third'), T(','/inner_sep),
 T('is'), T('it'), T('?'/sent_end)]

The alternative is NLTK/punkt.

+1
