Regular expression to find the end of sentences

I am writing a regular expression to find the ends of sentences in a text. I assume that a sentence can end with any of . ! ? Sometimes, though, people love to write something like !!!!!! at the end of a sentence, so I want to collapse any repeated dots, exclamation marks or question marks into a single one. But I want to allow the use of "...". How can I add this exception? Please advise, thanks!

    Pattern p = null;
    try {
        // ([!?.] with optional spaces), followed by ([!?.] with optional spaces) repeated 1 or more times
        p = Pattern.compile("([!?.]\\s*)([!?.]\\s*)+");
    }
    catch (PatternSyntaxException pex) {
        pex.printStackTrace();
        System.exit(0);
    }

    // get the matcher
    Matcher m = p.matcher(this.sentence);
    int index = 0;
    while (m.find(index))
    {
        System.out.println(this.sentence);
        System.out.println(p.toString());
        String toReplace = sentence.substring(m.start(), m.end());
        toReplace = toReplace.replaceAll("\\.", "\\\\.");
        toReplace = toReplace.replaceAll("\\?", "\\\\?");
        String replacement = "" + sentence.charAt(m.start());
        this.sentence = this.sentence.replaceAll(toReplace, replacement);
        System.out.println("");
        index = m.end();
        System.out.println(this.sentence);
    }
+5
4 answers

Disclaimer: my answer is a bit off-topic, because it does not use regular expressions.

Take a look at Apache OpenNLP. It has a ready-made sentence detector.

For example:

String sentences[] = sentenceDetector.sentDetect("  First sentence. Second sentence. ");

It returns an array of Strings. The first is "First sentence.", the second is "Second sentence.".

+2

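One option is to replace every "..." with a single special char first, for example one outside ASCII that cannot occur in the text.

Then collapse the repeated marks with your regular expression, and afterwards turn the special char back into "...".

I am not a Java person, but there is surely a way to do this in Java, with split/join if nothing else.

Something like this: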

str.split("...").join("<special char>")
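In Java, `String.split` takes a regular expression, so a literal `String.replace` is the closer fit for the swap described above. Below is a minimal sketch of the placeholder idea, not a definitive implementation. The class name `EllipsisPlaceholder`, the method `collapse` and the choice of the private-use character U+E000 as the placeholder are all assumptions made for illustration, and the sketch assumes that character never appears in the input.

```java
import java.util.regex.Pattern;

final class EllipsisPlaceholder {
    // Hypothetical placeholder: a private-use character assumed absent from the input.
    private static final String PLACEHOLDER = "\uE000";

    // A terminator followed by one or more further terminators,
    // with optional whitespace between them (trailing whitespace is left alone).
    private static final Pattern REPEATED =
            Pattern.compile("([!?.])(?:\\s*[!?.])+");

    static String collapse(String text) {
        // 1. Protect each literal "..." by swapping it for the placeholder.
        String guarded = text.replace("...", PLACEHOLDER);
        // 2. Collapse every remaining run to its first character.
        String collapsed = REPEATED.matcher(guarded).replaceAll("$1");
        // 3. Restore the protected ellipses.
        return collapsed.replace(PLACEHOLDER, "...");
    }

    public static void main(String[] args) {
        // prints: Wow! Really? Well... ok.
        System.out.println(collapse("Wow!!!!!! Really?? Well... ok."));
    }
}
```

Note that a run of four dots comes back unchanged in this sketch, because its first three dots are protected as an ellipsis and the single dot left over is not a repeat.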
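0

To leave "..." alone, something like this (the lookarounds keep the one-or-two-dot branch from matching inside a longer run of dots):

someString.split("(?<!\\.)\\.{1,2}(?!\\.)|\\.{4,}|\\?+|!+");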
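One thing to watch with a split of this kind: a plain `\.{1,2}` branch will happily match the first two dots of "...", so the ellipsis still gets cut. A sketch of one way around that, assuming a lookbehind and a lookahead are added around the one-or-two-dot branch, is below. The class name `SplitKeepingEllipsis` and the trimming of the pieces are my own additions for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitKeepingEllipsis {
    // One or two dots not touching another dot, OR four or more dots,
    // OR any run of '?', OR any run of '!'. Exactly three dots never match.
    private static final Pattern TERMINATORS =
            Pattern.compile("(?<!\\.)\\.{1,2}(?!\\.)|\\.{4,}|\\?+|!+");

    static String[] split(String text) {
        return Arrays.stream(TERMINATORS.split(text))
                .map(String::trim)            // drop the spaces left around each piece
                .filter(s -> !s.isEmpty())    // drop empty pieces between adjacent terminators
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints: [Wait... what, No, Fine]
        System.out.println(Arrays.toString(split("Wait... what?? No!!! Fine.")));
    }
}
```

The terminators themselves are consumed by the split, so the pieces come back without their closing punctuation. Only the "..." survives, because it is never treated as a delimiter.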


0

[Answer text garbled in the source. The recoverable fragments concern the terminators [.?!], two cases labelled a) and b), and a closing comparison of RegEx with NLP.]

0
