Avoiding overlapping regular expression matches in Java

For some reason, this piece of Java code gives me overlapping matches:

Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL); 

Is there any way or option to avoid the overlapping detection? For example, leftContext rightContext rightContext should be 1 match instead of 2.

Here is the full code:

 public static String replaceWithContext(String input, String leftContext, String rightContext, String newString) {
     Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL);
     Matcher matcher = pat.matcher(input);
     StringBuffer buffer = new StringBuffer();
     while (matcher.find()) {
         matcher.appendReplacement(buffer, "");
         buffer.append(matcher.group(1) + newString + matcher.group(2));
     }
     matcher.appendTail(buffer);
     return buffer.toString();
 }

So, the final answer, using a negative lookahead (my bad for not being aware that * is greedy):

 Pattern pat = Pattern.compile("(" + leftContext + ")" + "(?:(?!" + rightContext + ").)*" + "(" + rightContext + ")", Pattern.DOTALL); 
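
To make the difference concrete, here is a minimal sketch that compares the greedy pattern with the negative-lookahead version on a small input. The class name GreedyVsTempered, the helper findAll, and the sample contexts "<a>" and "</a>" are assumptions made up for illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsTempered {
    // Collects every match of the regex in the input (DOTALL, so '.' also matches newlines).
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical contexts without regex metacharacters.
        String left = "<a>";
        String right = "</a>";
        String input = "<a>one</a> and <a>two</a>";

        // Greedy: .* runs to the last rightContext, so everything becomes one match.
        String greedy = "(" + left + ").*(" + right + ")";
        // Lookahead: each '.' is consumed only if rightContext does not start there.
        String tempered = "(" + left + ")(?:(?!" + right + ").)*(" + right + ")";

        System.out.println(findAll(greedy, input));   // [<a>one</a> and <a>two</a>]
        System.out.println(findAll(tempered, input)); // [<a>one</a>, <a>two</a>]
    }
}
```

The lookahead version stops at the first rightContext after each leftContext, which is why it yields two separate matches here.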

Your use of the word "overlap" is confusing. Apparently, you meant that the regex is too greedy, matching everything from the first leftContext to the last rightContext. It seems you have already figured this out and come up with a better approach, but there is still at least one potential problem.

You said that leftContext and rightContext are "simple strings"; I assume you meant that they should not be interpreted as regular expressions, but they will be. You need to escape them, or any regex metacharacters they contain will lead to incorrect results or runtime exceptions. The same goes for your replacement string, although there only $ and the backslash have special meanings. Here is an example (note the non-greedy .*? ):
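
As a quick illustration of why the escaping matters, the following sketch shows how an unquoted context string is read as a regex and how Pattern.quote and Matcher.quoteReplacement change that. The class name QuotingDemo, the helper found, and the sample strings are hypothetical and chosen only for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotingDemo {
    // True if the regex finds a match anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String context = "a.c"; // hypothetical context containing the metacharacter '.'

        // Unquoted, '.' matches any character, so "abc" is accepted by mistake.
        System.out.println(found(context, "abc"));                // true
        // Quoted, only the literal text "a.c" matches.
        System.out.println(found(Pattern.quote(context), "abc")); // false
        System.out.println(found(Pattern.quote(context), "a.c")); // true

        // In a replacement string, '$' and '\' are special; quoteReplacement escapes them.
        String safe = Matcher.quoteReplacement("$1.00");
        System.out.println(safe);                                 // \$1.00
        System.out.println("price".replaceAll("price", safe));    // $1.00
    }
}
```

Without quoteReplacement, the "$1" in the last call would be read as a reference to capturing group 1, which does not exist in that pattern.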

 public static String replaceWithContext(String input, String leftContext, String rightContext, String newString) {
     String lcRegex = Pattern.quote(leftContext);
     String rcRegex = Pattern.quote(rightContext);
     String replace = Matcher.quoteReplacement(newString);
     Pattern pat = Pattern.compile("(" + lcRegex + ").*?(" + rcRegex + ")", Pattern.DOTALL);

One more thing: if you are not doing any post-match processing on the matched text, you can use replaceAll instead of rolling your own with appendReplacement and appendTail :

 return input.replaceAll("(?s)(" + lcRegex + ")" + "(?:(?!" + rcRegex + ").)*" + "(" + rcRegex + ")", "$1" + replace + "$2"); 
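
Putting the pieces of this answer together, a complete method might look like the sketch below. It is one possible assembly rather than the definitive version: the wrapper class ContextReplacer and the sample input in main are assumptions added for illustration, and the non-greedy .*? is used in place of the lookahead form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextReplacer {
    static String replaceWithContext(String input, String leftContext,
                                     String rightContext, String newString) {
        // Treat both contexts as literal text, not as regular expressions.
        String lcRegex = Pattern.quote(leftContext);
        String rcRegex = Pattern.quote(rightContext);
        // Escape '$' and '\' so newString is inserted verbatim.
        String replace = Matcher.quoteReplacement(newString);
        // (?s) is the inline form of Pattern.DOTALL; .*? stops at the nearest rightContext.
        return input.replaceAll("(?s)(" + lcRegex + ").*?(" + rcRegex + ")",
                                "$1" + replace + "$2");
    }

    public static void main(String[] args) {
        // Contexts and replacement deliberately contain regex and replacement metacharacters.
        System.out.println(replaceWithContext("[a] x [b] ]", "[", "]", "$0"));
        // prints: [$0] x [$0] ]
    }
}
```

Each bracketed span is replaced separately, and the trailing "]" is left alone because no leftContext precedes it.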

Depending on what you really need, there are several possibilities.

You can add $ to the end of your regular expression, for example:

 "(" + leftContext + ")" + ".*" + "(" + rightContext + ")$" 

so if rightContext is not the last thing in the input, your regular expression will not match.

Alternatively, you can capture everything after rightContext :

 "(" + leftContext + ")" + ".*" + "(" + rightContext + ")(.*)" 

and then discard whatever ends up in your third capturing group.

But since we don't know what leftContext and rightContext really are, maybe your problem lies with them.
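
The two variants above can be tried out with a small sketch like the following. The class name AnchorDemo, the helper groupOf, and the placeholder contexts "L" and "R" are hypothetical. Note that in both variants the .* is still greedy, so the second group is the last rightContext in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Returns the text of the given group, or null if the regex does not match at all.
    static String groupOf(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // "L" and "R" stand in for leftContext and rightContext.
        String anchored = "(L).*(R)$";
        String withTail = "(L).*(R)(.*)";

        // The input does not end with R, so the anchored variant finds nothing.
        System.out.println(groupOf(anchored, "L x R y R z", 0)); // null
        // The input ends with R: one match spanning from L to the last R.
        System.out.println(groupOf(anchored, "L x R y R", 0));   // L x R y R
        // Group 3 holds whatever follows the last R (here " z").
        System.out.println(groupOf(withTail, "L x R y R z", 3));
    }
}
```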
