Replace multiple capture groups using a regex in Java

I have this requirement - for an input string like the one below

8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs 

I would like to strip the matching pair characters at the word boundaries (where the pair character is 8 or & or %, etc.), which would result in the following:

 This is reallly a test of repl%acing %mul%tiple matched 9pairs 

The list of characters used for pairs may vary (e.g. 8, 9, %, #, etc.). Only words that begin and end with the same pair character should have those two characters stripped; any occurrence of the same character embedded inside the word stays where it is.

Using Java, I can write a pattern like \\b8([^\\s]*)8\\b and replace it with $1 to capture and replace all occurrences of 8...8, but how do I do this for all types of pairs?

I can build a pattern such as \\b8([^\\s]*)8\\b|\\b9([^\\s]*)9\\b and so on, which matches every type of pair (8, 9, ...), but how do I specify the replacement group as a 'variable'?

E.g. if the match is 9...9, the replacement should be $2.

I can, of course, run the string through several passes, each of which replaces one type of pair, but I wonder if there is a more elegant way.

Or is there a completely different way to approach this problem?

Thanks.

+7
java regex
2 answers

You can use the following regular expression and replace each match with the contents of group 2.

 (?<!\S)(\S)(\S+)\1(?=\s|$) 

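Or, if an empty body between the pair should also match (so that 88 is replaced with an empty string):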

 (?<!\S)(\S)(\S*)\1(?=\s|$) 

As a Java string literal, the regex becomes:

 (?<!\\S)(\\S)(\\S+)\\1(?=\\s|$) 

Demo

 String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
 System.out.println(s1.replaceAll("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)", "$2"));

Output:

 This is reallly a test of repl%acing %mul%tiple matched 9pairs 

Explanation:

  • (?<!\\S) Negative lookbehind: the match must not be preceded by a non-whitespace character.
  • (\\S) Captures the first non-whitespace character and stores it in group 1.
  • (\\S+) Captures one or more non-whitespace characters into group 2.
  • \\1 Backreference to the character captured by group 1.
  • (?=\\s|$) Positive lookahead: the match must be followed by whitespace or the end of the string.
  • Together these ensure that the first and last characters of the word are the same. If so, the entire match is replaced with the contents of group 2.
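
To make the roles of group 1 and group 2 concrete, here is a minimal sketch that lists what each group captures for every match. The class name PairGroupsDemo and the method describeMatches are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PairGroupsDemo {
    // Same pattern as in the answer: group 1 = pair character, group 2 = enclosed text.
    static final Pattern PAIR = Pattern.compile("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)");

    // Collects "pairChar:body" for every match, in order of appearance.
    static List<String> describeMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            found.add(m.group(1) + ":" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.println(describeMatches(s1));
    }
}
```

For the sample string this prints [8:This, &:reallly, #:test, %:repl%acing, 9:matched]. Because \\S+ is greedy, the body of %repl%acing% extends to the last %, and %mul%tiple and 9pairs are not matched at all since they do not end with their opening character.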

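In this particular case, you can restrict group 1 to the known set of pair characters, so that ordinary words which happen to begin and end with the same character (such as "that") are left alone: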

 String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
 System.out.println(s1.replaceAll("(?<!\\S)([89&#%])(\\S+)\\1(?=\\s|$)", "$2"));
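
Since the question says the list of pair characters may vary, one possible way to handle that is to build group 1 from a string of delimiters at runtime. This is only a sketch: PairStripper, patternFor and strip are hypothetical names, and it assumes every delimiter is a single char. Pattern.quote is used so that characters such as * or $ are treated literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PairStripper {
    // Builds the pattern from a string of delimiter characters, e.g. "89&#%".
    // Assumption: each delimiter is a single char; each one is quoted so that
    // regex metacharacters are matched literally.
    static Pattern patternFor(String delimiters) {
        StringBuilder alternatives = new StringBuilder();
        for (int i = 0; i < delimiters.length(); i++) {
            if (i > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(String.valueOf(delimiters.charAt(i))));
        }
        // Group 1 = one of the delimiters, group 2 = enclosed text.
        return Pattern.compile("(?<!\\S)(" + alternatives + ")(\\S+)\\1(?=\\s|$)");
    }

    // Removes the surrounding pair from every word that starts and ends
    // with the same delimiter.
    static String strip(String input, String delimiters) {
        return patternFor(delimiters).matcher(input).replaceAll("$2");
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.println(strip(s1, "89&#%"));
        System.out.println(strip("that *bold* $x$", "*$"));
    }
}
```

The first call prints the same result as the one-liner above; the second prints "that bold x", leaving "that" untouched because t is not in the delimiter list.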


+3
 (?<![a-zA-Z])[8&#%9](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[8&#%9](?![a-zA-Z]) 

Try this, replacing each match with $1 (or \1, depending on the regex flavor). See the demo.

https://regex101.com/r/qB0jV1/15

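 (?<![a-zA-Z])[^a-zA-Z\s](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[^a-zA-Z\s](?![a-zA-Z]) 

Use this if you have many delimiters. It treats any character that is neither a letter nor whitespace as a delimiter; whitespace has to be excluded, otherwise the spaces around an ordinary word would be matched as a pair and removed.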
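
A rough Java sketch of the explicit-delimiter variant from this answer, set next to the backreference pattern from the other answer for comparison. The class and method names are invented for illustration. One observable difference is that the character-class version does not require the opening and closing characters to be the same.

```java
// Hypothetical comparison class, for illustration only.
class MixedDelimiterDemo {
    // Pattern from this answer (explicit delimiter list), as a Java string literal.
    static final String CLASS_BASED =
        "(?<![a-zA-Z])[8&#%9](?=[a-zA-Z])([^\\s]*?)(?<=[a-zA-Z])[8&#%9](?![a-zA-Z])";

    // Pattern from the other answer, which uses a backreference to the opening character.
    static final String BACKREF_BASED = "(?<!\\S)([89&#%])(\\S+)\\1(?=\\s|$)";

    static String stripClassBased(String input) {
        return input.replaceAll(CLASS_BASED, "$1");
    }

    static String stripBackrefBased(String input) {
        return input.replaceAll(BACKREF_BASED, "$2");
    }

    public static void main(String[] args) {
        String mixed = "8This8 8This9";
        System.out.println(stripClassBased(mixed));   // mismatched pair is stripped too
        System.out.println(stripBackrefBased(mixed)); // mismatched pair is kept
    }
}
```

On "8This8 8This9" the first call prints "This This" and the second prints "This 8This9". On the sample string from the question both produce the same output.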

+1