Add a new line character at the end of a sentence

I have a string containing a fragment of a book (about one chapter), all on a single line. I would like to insert a newline at the end of each sentence.

I solved it with some fairly simple code:

text = text.replaceAll("\\.","\\.\n"); //same for ? same for ! 

Of course, this does not give very good results. I don't need it to be perfect, but I would like to make it as good as I can.

I want to at least check the following before creating a new line character:

- the word before the . is longer than 2 characters
- there are no dots before the . in the same "word"
- the character before the . is not a number
- the character after the dot (and possibly a whitespace after that dot) is not a (

Any other suggestions would be really appreciated, as well as the actual code that will do this.

Similar question: Here

Update:

This is low on my list of priorities, because my book does not contain many direct quotes or much direct speech, but a rule that handles the sentences inside quotes would also be welcome, so that sentences belonging to the same quote are not split by newlines.

+4
3 answers

The Stanford CoreNLP toolkit has a class that performs sentence segmentation. More details here.

If you call new DocumentPreprocessor(new StringReader(s)).iterator(), where s is a string containing the text, it returns an iterator over the sentences.

Note that this also tokenizes each sentence. If you want the sentences to look as they did originally, you can either use this output as a guide for splitting the original text, or run PTBTokenizer with -untok (see the same link as above) so that each tokenized sentence looks normal again.

This will almost certainly work better than your list of rules, as your rules do not take into account many important cases.
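If adding CoreNLP as a dependency is not an option, the JDK's own java.text.BreakIterator can serve as a lighter substitute. To be clear, this is a different technique from the CoreNLP class described above, and it has no abbreviation dictionary, so expect mistakes around titles and similar short forms. The class and method names SentenceLines and splitSentences are made up for this sketch, and Locale.ENGLISH assumes the book is in English:

```java
import java.text.BreakIterator;
import java.util.Locale;

public class SentenceLines {
    // Puts each sentence found by the JDK's sentence BreakIterator on its own line.
    static String splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH); // assumption: English text
        it.setText(text);
        StringBuilder out = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each [start, end) range is one sentence plus its trailing whitespace.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(sentence);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(splitSentences("It was late. Nobody came! Why? The rain had stopped."));
    }
}
```

Because the iterator only reports boundaries, the original text is never modified; the trailing whitespace of each sentence is simply trimmed and replaced by a single newline.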

+3

If I understand your requirements correctly, try something like this:

 // The capture group keeps the sentence text; the lookahead leaves the next character in place.
 text = text.replaceAll("([^.]+\\D)\\.\\s?(?=[^(])", "$1.\n");
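The four checks from the question can also be written out explicitly instead of being packed into one regular expression, which makes each rule easier to adjust. The following is only a sketch under some assumptions: a "word" is taken to mean a whitespace-delimited token, the same checks are applied to ! and ? as to the dot, and the names RuleBasedSplitter, isSentenceEnd and splitSentences are invented for illustration. It implements only the listed checks, so cases such as a closing quote or parenthesis right after the terminator are not handled.

```java
public class RuleBasedSplitter {
    // Returns true if the terminator at index i passes the four checks from the question.
    static boolean isSentenceEnd(String text, int i) {
        // Find the start of the whitespace-delimited "word" that ends just before index i.
        int start = i;
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
            start--;
        }
        String word = text.substring(start, i);
        if (word.length() <= 2) {
            return false; // rule 1: the word must be longer than 2 characters
        }
        if (word.indexOf('.') >= 0) {
            return false; // rule 2: no earlier dot in the same word
        }
        if (Character.isDigit(word.charAt(word.length() - 1))) {
            return false; // rule 3: the character before the terminator is not a digit
        }
        // Rule 4: skip optional whitespace, then reject an opening parenthesis.
        int next = i + 1;
        while (next < text.length() && Character.isWhitespace(text.charAt(next))) {
            next++;
        }
        return next >= text.length() || text.charAt(next) != '(';
    }

    // Copies the text, replacing the whitespace after each accepted terminator with a newline.
    static String splitSentences(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            out.append(c);
            i++;
            if ((c == '.' || c == '!' || c == '?') && isSentenceEnd(text, i - 1)) {
                while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                if (i < text.length()) {
                    out.append('\n'); // no trailing newline after the last sentence
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(splitSentences("The chapter ends here. It cost 3.50 dollars, e.g. a lot."));
    }
}
```

Keeping each rule as its own if statement means a new check, for example one for quoted speech, can be added without touching the others.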
+1
 String newline = System.lineSeparator(); // platform-specific line separator (Java 7+)
 yourLine = yourLine + newline; // String has no append(); use concatenation
0

Source: https://habr.com/ru/post/1413015/

