How to keep the separators when splitting with a regex?

I asked earlier about punctuation and regular expressions, but that question was not clear.

Suppose I have this text:

String text = "wor.d1, :word2. wo,rd3? word4!"; 

I split it like this:

 String parts[] = text.split(" "); 

And I get this:

 wor.d1, | :word2. | wo,rd3? | word4!

What do I need to do to get this instead? (Keep the punctuation at the word borders as separate tokens, but only the characters I specify, .,!?: and not all of them.)

 wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !

UPDATE

I get good results with the regex below, but it produces an empty string before every split on punctuation at the beginning of a word.

Is there a way to avoid this empty string at the beginning?

Is this regular expression good, or is there an easier way?

 public static final String PUNCTUATION_SEPARATOR =
     "("
     + "("
     + "(?=^[\"'!?.,;:(){}\\[\\]]+)"
     + "|"
     + "(?<=^[\"'!?.,;:(){}\\[\\]]+)"
     + ")"
     + "|"
     + "("
     + "(?=[\"'!?.,;:(){}\\[\\]]+($|\n))"
     + "|"
     + "(?<=[\"'!?.,;:(){}\\[\\]]+($|\n))"
     + ")"
     + ")";
5 answers
 public static final String PUNCTUATION_SEPARATOR =
     "("
     + "("
     + "(?=^[\"'!?.,;:(){}\\[\\]-]+)"
     + "|"
     + "(?<=^[\"'!?.,;:(){}\\[\\]-]+)"
     + ")"
     + "|"
     + "("
     + "(?=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
     + "|"
     + "(?<=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
     + ")"
     + ")";

Are you sure you want to use a regex? For splitting on single characters there is a faster option: StringTokenizer, which can also return the delimiters as tokens.

 import java.util.StringTokenizer;

 String str = "word1, word2. word3? word4!";
 String delim = ",.!?";
 // Third argument true: the delimiters are returned as tokens too.
 StringTokenizer st = new StringTokenizer(str, delim, true);
 while (st.hasMoreTokens()) {
     String token = st.nextToken();
     // ... token will be: "word1", ",", " word2", ".", etc.
 }
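To make the idea above concrete, here is a minimal self-contained sketch. The class name `TokenizerSplit` and the helper `tokenize` are made up for illustration, and adding the space to the delimiter set (then dropping space tokens) is my assumption about what the question wants, not part of the original snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSplit {
    // Hypothetical helper: splits on every character in delims and on spaces,
    // keeps the punctuation delimiters as tokens, and drops the space tokens.
    static List<String> tokenize(String text, String delims) {
        List<String> out = new ArrayList<>();
        // returnDelims = true makes each delimiter come back as a one-character token
        StringTokenizer st = new StringTokenizer(text, delims + " ", true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.equals(" ")) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("word1, word2. word3? word4!", ",.!?"));
        // prints [word1, ,, word2, ., word3, ?, word4, !]
    }
}
```

Note that StringTokenizer splits on every occurrence of a delimiter character, so punctuation inside a word (as in "wor.d1") is split off as well. If only punctuation at the word borders should be separated, this approach is not enough on its own.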

For simple delimiters, I recommend StringTokenizer. But here is a solution using a regex and an auxiliary delimiter:

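 import java.util.regex.Pattern;

 String s = "one,two, three four , five";
 // Wrap every run of commas/whitespace in '#' markers, then split on '#'.
 s = s.replaceAll("([,\\s]+)", "#$1#");
 Pattern p = Pattern.compile("#");
 String[] result = p.split(s);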
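The same trick as a runnable sketch, in case it helps to see what comes out. The names `MarkerSplit`, `MARKER` and `splitKeepingDelimiters` are hypothetical, and the code assumes the marker character never occurs in the input.

```java
class MarkerSplit {
    // Hypothetical marker; assumed never to appear in the text being split.
    static final String MARKER = "#";

    static String[] splitKeepingDelimiters(String s) {
        // Surround each run of commas/whitespace with markers...
        String marked = s.replaceAll("([,\\s]+)", MARKER + "$1" + MARKER);
        // ...then split on the marker, so the delimiter runs survive as elements.
        return marked.split(MARKER);
    }

    public static void main(String[] args) {
        for (String part : splitKeepingDelimiters("one,two, three four , five")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

For the sample string this yields nine elements: the five words, with the delimiter runs ",", ", ", " " and " , " between them. Whitespace is kept inside the delimiter elements, and an input that starts with a delimiter would produce a leading empty string, so some post-processing may still be needed.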

Here is a regex that I think will work:

 /\s|(?=[\.,:?!](\W|$))|(?<=\W[\.:?!])/ 
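For what it's worth, here is how that pattern might be used from Java with String.split. This is only a sketch: the class name `LookaroundSplit` and the method `split` are invented for illustration, and the pattern is the one above with the slashes removed and the backslashes doubled for a Java string literal.

```java
class LookaroundSplit {
    // The regex from above as a Java string literal:
    //   \s                   split on (and drop) whitespace
    //   (?=[\.,:?!](\W|$))   split before punctuation that ends a word
    //   (?<=\W[\.:?!])       split after punctuation that starts a word
    static final String SEPARATOR = "\\s|(?=[\\.,:?!](\\W|$))|(?<=\\W[\\.:?!])";

    static String[] split(String text) {
        return text.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String text = "wor.d1, :word2. wo,rd3? word4!";
        System.out.println(String.join(" | ", split(text)));
        // prints wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
    }
}
```

Because the lookahead and lookbehind are zero-width, the punctuation characters are not consumed by the split and stay in the result as their own elements, while punctuation inside a word is left alone.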

In my opinion, you want this. First explode your string, then in a second step use the implode function.
