Find some sentences

I'm looking for a good way to extract the first few (say, two) sentences from a piece of text. Which is better: a regexp or the split method? What are your ideas?

At Jeremy Stein's request, here are some examples.

Examples:

Input:

The first thing to do is create a comment model. We'll create this in the usual way, but with a slight difference. If we were just creating comments for an article, we could put an integer field called article_id in the model to hold the foreign key, but in this case we need something more abstract.

The first two sentences:

The first thing to do is create a comment model. We'll create this in the usual way, but with a slight difference.

Input:

Mr. T is one mean dude. I would not want to fight him.

The first two sentences:

Mr. T is one mean dude. I would not want to fight him.

Input:

. . 9:11 . ET.

The first two sentences:

. . 9:11 . ET.

Input:

, "... ". , , .

The first two sentences:

, "... ". , , .

, - .: (

+5
7

, , . , . Punkt NLP, Python Natural Language Toolkit, . .

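The Punkt algorithm is described in: Kiss, Tibor and Strunk, Jan (2006): Unsupervised Multilingual Sentence Boundary Detection. Computational Linguistics 32: 485-525.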

, .

+4
 your_string = "First sentence. Second sentence. Third sentence"
 # Split on every period; the periods themselves are dropped.
 sentences = your_string.split(".")
 # => ["First sentence", " Second sentence", " Third sentence"]

.

. , , , , . NLP , .

, , :

  • : dd.mm.yyyy
  • : ; " , . ...". .
  • : 138 . .

, .
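The `dd.mm.yyyy` date format in the list above is one case where splitting on every period goes wrong. Here is a minimal sketch of a slightly more careful split. It assumes (my assumption, not something the answer states) that a sentence ends only when `.`, `?` or `!` is followed by whitespace. The sample text and variable names are made up for illustration.

```ruby
# Hypothetical sample text containing a dd.mm.yyyy date.
text = "Released on 01.02.2009. It sold well. The end."

# Naive approach: every period is a boundary, so the date is torn apart.
naive = text.split(".")

# Split only on whitespace that directly follows a terminator.
# The lookbehind keeps the punctuation attached to each sentence.
careful = text.split(/(?<=[.?!])\s+/)

p naive
p careful.first(2)
```

This still breaks on abbreviations such as "Mr." that are followed by a space, so it is a heuristic and not a full sentence segmenter.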

+3
irb(main):005:0> a = "The first sentence. The second sentence. And the third"
irb(main):006:0> a.split(".")[0...2]
=> ["The first sentence", " The second sentence"]
irb(main):007:0>

EDIT: , " ...... . ..." :

irb(main):001:0> a = "This is the first sentence ....... And the second. Let not forget the third"
=> "This is the first sentence ....... And the second. Let not forget the third"
irb(main):002:0> a.split(/\.+/)
=> ["This is the first sentence ", " And the second", " Let not forget the third"]

..., 2.

+1

.

/\S(?:(?![.?!]+\s).)*[.?!]+(?=\s|$)/m
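As a rough illustration of how a pattern like this might be applied, the sketch below passes it to `String#scan` and keeps the first two matches. The helper name `first_sentences` and the sample strings are my own, not part of the answer.

```ruby
# Pattern copied from the answer; /m lets "." match across line breaks.
SENTENCE = /\S(?:(?![.?!]+\s).)*[.?!]+(?=\s|$)/m

# Hypothetical helper: return the first `count` sentence-like matches.
def first_sentences(text, count = 2)
  text.scan(SENTENCE).first(count)
end

p first_sentences("The first sentence. The second sentence. And the third")
# => ["The first sentence.", "The second sentence."]

# Abbreviations still trip it up: "Mr." is treated as a whole sentence.
p first_sentences("Mr. T arrived. He left.")
# => ["Mr.", "T arrived."]
```

Unlike `split`, `scan` keeps the terminating punctuation, and a trailing fragment with no terminator is simply not matched.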

.

+1

. .

+1

, , Regex

((YOUR SENTENCE HERE)|(YOUR OTHER SENTENCE)){1}

Split, , , , ( , ), Regex , ( , )

0

, , , , . !, ? . ( , , . , ).

, , , , - , , - . . 100% , , , 100% .

I suggest looking into the literature on sentence segmentation and at the existing natural language processing toolkits. I haven't found one for Ruby yet, but I like OpenNLP (which is written in Java).
