How to split a paragraph into sentences

I tried to use:

$string = "The Dr. is here!!! I am glad I'm in the USA for the Dr. quality is great!!!!!!";
preg_match_all('~.*?[?.!]~s', $string, $sentences);
print_r($sentences);

But this does not work for Dr., USA, etc.

Does anyone have any better suggestions?

+2
split php regex text-segmentation
3 answers

There is no easy solution for this. You need to do some natural language processing (NLP) in your application to recognize each sentence. There is a tool called OpenNLP, a Java-based NLP toolkit, or the Stanford NLP parser in Ruby. You can find something similar for PHP.

Here I found a set of classes for processing natural language in PHP.

+12

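Hmm, you might try something like $sentences = preg_split('/(?<=[?.!])\s+/', $string); which splits on whitespace that follows sentence-ending punctuation. It will still split after abbreviations such as Dr., though.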

+1

This is almost impossible, because, as your example clearly shows, the same punctuation characters also appear in abbreviations such as Dr., USA, etc. On their own they do not tell you where a sentence begins or ends.

You need to look at the characters that follow the punctuation to decide whether the next word starts a new sentence.
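The idea of checking what follows the punctuation can be sketched roughly as below. This is only a heuristic under stated assumptions, not a real NLP solution: split_sentences is a hypothetical helper name, the abbreviation list is an illustrative guess, and "a new sentence starts with an ASCII uppercase letter" is an assumption that will fail on plenty of real text.

```php
<?php
// Hypothetical helper (not from any library): a heuristic sentence splitter.
// Assumptions: a sentence ends with ., ! or ? followed by whitespace and an
// ASCII uppercase letter; the abbreviation list is illustrative only.
function split_sentences(string $text, array $abbreviations = ['Dr', 'Mr', 'Mrs', 'Ms', 'Prof']): array
{
    // Candidate boundaries: whitespace preceded by terminal punctuation
    // and followed by an uppercase letter.
    $parts = preg_split('/(?<=[.!?])\s+(?=[A-Z])/', $text);

    $sentences = [];
    $buffer = '';
    foreach ($parts as $part) {
        // Re-join with a single space (original whitespace is not preserved).
        $buffer = ($buffer === '') ? $part : $buffer . ' ' . $part;

        // If the text so far ends with a listed abbreviation plus a period,
        // treat the boundary as false and keep accumulating.
        $endsWithAbbreviation = false;
        foreach ($abbreviations as $abbr) {
            if (preg_match('/\b' . preg_quote($abbr, '/') . '\.$/', $buffer)) {
                $endsWithAbbreviation = true;
                break;
            }
        }

        if (!$endsWithAbbreviation) {
            $sentences[] = $buffer;
            $buffer = '';
        }
    }

    // Flush whatever is left if the text ended on an abbreviation.
    if ($buffer !== '') {
        $sentences[] = $buffer;
    }

    return $sentences;
}

$string = "The Dr. is here!!! I am glad I'm in the USA for the Dr. quality is great!!!!!!";
print_r(split_sentences($string));
```

On the string from the question this should give two sentences, because "Dr." is followed by a lowercase word both times. A sentence that genuinely ends in an abbreviation would be merged with the next one, which is the kind of case where a proper NLP library is needed.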

0
