Java Regex pattern matching efficiency for long string

I have a regex that works great (500 nanoseconds) when a match is found, but takes a long time (more than 3 seconds) when there is no match. I suspect this may be due to backtracking. I tried several options, for example converting .* to (.*)? based on some documentation, but that did not help.

Input: a very long string - 5k characters in some cases.

Regular expression: .*substring1.*substring2.*

I precompile the pattern and reuse the matcher. What else can I try?

Here is my piece of code. I will call this method with millions of different input lines, but only a few regex patterns.

private static HashMap<String, Pattern> patternMap = new HashMap<String, Pattern>();
private static HashMap<String, Matcher> matcherMap = new HashMap<String, Matcher>();

Here is my method:

public static Boolean regex_match(String line, String regex) {
    if (regex == null || line == null) {
      return null;
    }
    if (!patternMap.containsKey(regex)) {
      patternMap.put(regex, Pattern.compile(regex));
      matcherMap.put(regex,patternMap.get(regex).matcher(""));
    }
    return matcherMap.get(regex).reset(line).find(0);
 }
+4

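Technically, yes, this is backtracking. The first .* consumes the whole line, then backs off until it finds substring1. The second .* then consumes the rest of the line and backs off looking for substring2. If substring2 is not found, the first .* backs off further to an earlier substring1, and the search for substring2 starts over.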

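Since you are using pattern.find(), you do not need the leading and trailing .*. Changing the middle .* to the lazy .*? may also help.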

Try: substring1.*?substring2
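A minimal sketch of how the unanchored pattern above could be used with find(). The class and method names (LazyFindDemo, containsInOrder) are hypothetical and only serve the illustration; the literal substrings stand in for whatever you are actually searching for.

```java
import java.util.regex.Pattern;

class LazyFindDemo {
    // Unanchored, lazy variant of .*substring1.*substring2.*
    // (compiled once and shared; Pattern instances are immutable)
    private static final Pattern IN_ORDER =
            Pattern.compile("substring1.*?substring2");

    // True when substring1 is followed, later on the same line, by substring2.
    // find() scans for a match anywhere in the input, so no leading
    // or trailing .* is needed.
    static boolean containsInOrder(String line) {
        return IN_ORDER.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(containsInOrder("xx substring1 yy substring2 zz")); // true
        System.out.println(containsInOrder("xx substring2 yy substring1 zz")); // false
    }
}
```

Note that . does not match line terminators by default, so this only finds both substrings when they are on the same line of the input.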

+2

Alternatively, you can first check that both substrings are present, in order, using indexOf():

int pos1 = str.indexOf("substring1");
int pos2 = str.indexOf("substring2", pos1);

if(pos1 != -1 && pos2 != -1){
  // both substrings were found in order; only now run the regex
}

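Whether the greedy .* or the lazy .*? is faster depends on the input.

For example, with input like substring1 substring2........50000 more characters......, the lazy .*? is faster. Note that (.*)? is not the same as .*?.

On the other hand, with input like substring1........50000 more characters...... substring2, the greedy .* is faster.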

+2

String.indexOf() is much faster than a regex for this kind of check. For example:

public static boolean containsStrings(String source, String string1, String string2) {
  // indexOf returns an int; -1 means "not found"
  int pos1 = source.indexOf(string1);
  if (pos1 > -1) {
    // search for string2 only after the end of the first string1
    int pos2 = source.indexOf(string2, pos1 + string1.length());
    if (pos2 > pos1) {
      return true;
    }
  }
  return false;
}

Note that this only returns true if string2 occurs after string1, which is what the regex requires.

+1

^((?!substring1).)*substring1((?!substring2).)*substring2.*?\Z

This should do it, because a line that contains one of the substrings several times, but not both in order, will not make the regex backtrack ad nauseam. You can remove the .*?\Z at the end if you do not need the matcher to finish at the end of input.
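A minimal sketch of the pattern above in use, without the optional .*?\Z tail. The class and method names (TemperedPatternDemo, matchesInOrder) are hypothetical. As far as I know, java.util.regex handles repeated groups like ((?!substring1).)* recursively, so it is worth testing this with realistic line lengths before relying on it.

```java
import java.util.regex.Pattern;

class TemperedPatternDemo {
    // Each ((?!X).)* step consumes one character at a time,
    // but only while X does not start at the current position.
    private static final Pattern TEMPERED = Pattern.compile(
            "^((?!substring1).)*substring1((?!substring2).)*substring2");

    // True when substring1 occurs and substring2 occurs somewhere after it.
    static boolean matchesInOrder(String line) {
        return TEMPERED.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesInOrder("a substring1 b substring2 c")); // true
        System.out.println(matchesInOrder("a substring2 b substring1 c")); // false
    }
}
```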
