I am writing a regular expression to find the ends of sentences in a text. I assume that any sentence can end with ".", "!" or "?". Sometimes, though, people like to write a run such as "!!!!!!" at the end of a sentence. So I want to replace any repeated dots, exclamation marks, or question marks with a single one, but I still want to allow "...". How can I add this exception? Please advise, thanks!
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

Pattern p = null;
try {
    // Two or more sentence terminators, each optionally followed by whitespace.
    p = Pattern.compile("([!?.]\\s*)([!?.]\\s*)+");
} catch (PatternSyntaxException pex) {
    pex.printStackTrace();
    System.exit(1);
}
Matcher m = p.matcher(this.sentence);
int index = 0;
while (m.find(index)) {
    System.out.println(this.sentence);
    System.out.println(p.toString());
    // Read the matched run from the matcher itself: its offsets refer to the
    // string it was created with, not to the already modified this.sentence.
    String toReplace = m.group();
    String replacement = "" + toReplace.charAt(0);
    // String.replace treats both arguments literally, so no manual escaping is needed.
    this.sentence = this.sentence.replace(toReplace, replacement);
    System.out.println("");
    index = m.end();
    System.out.println(this.sentence);
}
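One possible way to get the "..." exception is to put the ellipsis into the pattern as its own alternative and copy it through unchanged, while every other run is collapsed to its first character. The following is only a sketch under some assumptions: the class name `PunctuationCollapser` and the method `collapse` are made up for illustration, only a run of exactly three dots counts as an ellipsis, and whitespace between the marks is not handled, unlike the `\\s*` in the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
public class PunctuationCollapser {
    // Alternative 1: exactly three dots that are not part of a longer run (kept as-is).
    // Alternative 2: any run of two or more terminators; group 1 is its first character.
    private static final Pattern RUN = Pattern.compile(
        "(?<![!?.])\\.{3}(?![!?.])|([!?.])[!?.]+");

    static String collapse(String text) {
        Matcher m = RUN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is null when the ellipsis alternative matched.
            String keep = (m.group(1) == null) ? m.group() : m.group(1);
            m.appendReplacement(out, Matcher.quoteReplacement(keep));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("Wow!!!!!! Wait... what??"));
        // prints: Wow! Wait... what?
    }
}
```

Building the result with `appendReplacement` and `appendTail` does everything in a single pass over the original string, so the match offsets never go stale the way they can when the string is modified inside the `find` loop. Note that under these assumptions a run of four or more dots is still collapsed to a single dot.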