Wildcard Matching in Java

I am writing a simple debugging program that takes as input simple strings that may contain asterisks, each standing for a wildcard that matches anything:

    *.wav   // matches <anything>.wav
    (*, a)  // matches (<anything>, a)

I thought I would just take this pattern, escape any special regular-expression characters in it, then replace every \\* with .* and use a regex.

But I cannot find any Java function that escapes a string for use in a regex. The closest I could find was Pattern.quote, which, however, just puts \Q and \E at the beginning and end of the string.

Is there anything in Java that lets you simply perform wildcard matching without implementing the algorithm from scratch?

+7
java regex wildcard
6 answers

Using a Simple Regular Expression

One of the advantages of this method is that we can easily add tokens in addition to * (see Adding tokens below).

Search: [^*]+|(\*)

  • The left side of the | matches any run of characters that are not asterisks
  • The right side captures each asterisk in group 1
  • If group 1 is not set: replace the match with \Q + match + \E
  • If group 1 is set: replace the match with .*

Here is the working code.

Input: audio*2012*.wav

Output: \Qaudio\E.*\Q2012\E.*\Q.wav\E

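    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String subject = "audio*2012*.wav";
    Pattern regex = Pattern.compile("[^*]+|(\\*)");
    Matcher m = regex.matcher(subject);
    StringBuffer b = new StringBuffer();
    while (m.find()) {
        if (m.group(1) != null) {
            // An asterisk becomes "match anything"
            m.appendReplacement(b, ".*");
        } else {
            // A literal run is wrapped in \Q...\E; quoteReplacement keeps
            // '$' and '\' in the run from being read as replacement syntax
            m.appendReplacement(b, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
        }
    }
    m.appendTail(b);
    String replaced = b.toString();
    System.out.println(replaced);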

Adding Tokens

Suppose we also want to convert the wildcard ?, which stands for exactly one character, into a dot. We simply add a capture group to the regular expression and exclude ? from the catch-all on the left:

Search: [^*?]+|(\*)|(\?)

In the replacement function, add something like:

 else if(m.group(2) != null) m.appendReplacement(b, "."); 
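
To tie the two snippets above together, here is a minimal self-contained sketch of the tokenizing approach with both * and ? handled. The class name WildcardRegex and the helpers toRegex and matches are hypothetical names chosen for illustration. As a deliberate variation, it quotes literal runs with Pattern.quote and appends to a StringBuilder instead of calling appendReplacement, which works here because the three alternatives cover every character of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer.
class WildcardRegex {
    // Literal run | '*' (group 1) | '?' (group 2), as in the search pattern above
    private static final Pattern TOKENS = Pattern.compile("[^*?]+|(\\*)|(\\?)");

    // Translates a wildcard pattern into a regular expression string.
    static String toRegex(String wildcard) {
        Matcher m = TOKENS.matcher(wildcard);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) {
                sb.append(".*");                       // '*' matches any run of characters
            } else if (m.group(2) != null) {
                sb.append('.');                        // '?' matches exactly one character
            } else {
                sb.append(Pattern.quote(m.group(0)));  // literal text, quoted as \Q...\E
            }
        }
        return sb.toString();
    }

    // Returns true if the whole text matches the wildcard pattern.
    static boolean matches(String wildcard, String text) {
        return Pattern.matches(toRegex(wildcard), text);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("audio??2012*.wav")); // \Qaudio\E..\Q2012\E.*\Q.wav\E
        System.out.println(matches("*.wav", "abcd.wav")); // true
        System.out.println(matches("(*, a)", "(foo, a)")); // true
    }
}
```

Compiling the translated pattern once and reusing it would be preferable if the same wildcard is tested against many strings.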
+8

Just escape everything; it does no harm.

    String input = "*.wav";
    String regex = ("\\Q" + input + "\\E").replace("*", "\\E.*\\Q");
    System.out.println(regex);                         // \Q\E.*\Q.wav\E
    System.out.println("abcd.wav".matches(regex));     // true

Or you can use character classes:

    String input = "*.wav";
    String regex = input.replaceAll(".", "[$0]").replace("[*]", ".*");
    System.out.println(regex);                         // .*[.][w][a][v]
    System.out.println("abcd.wav".matches(regex));     // true

It is easier to "escape" characters by placing them in a character class, since almost all characters lose their special meaning inside a character class (a few, such as ^ and \, do not). As long as you do not expect unusual file names, this will work.
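
If unusual characters such as ^ or \ can appear in the pattern, one possible alternative is to quote each literal segment with Pattern.quote instead of wrapping single characters in a class. The sketch below is an assumption-laden illustration, not part of the answer above: the class name QuotedSegments and the method toRegex are hypothetical, and only * is treated as a wildcard.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; names are illustrative only.
class QuotedSegments {
    // Splits the pattern at each '*', quotes the literal pieces, and
    // rejoins them with ".*". The limit -1 keeps trailing empty pieces,
    // so a trailing '*' is not lost.
    static String toRegex(String wildcard) {
        return Arrays.stream(wildcard.split("\\*", -1))
                .map(part -> part.isEmpty() ? "" : Pattern.quote(part))
                .collect(Collectors.joining(".*"));
    }

    public static void main(String[] args) {
        String regex = toRegex("a^b*.wav");
        System.out.println(regex);                       // \Qa^b\E.*\Q.wav\E
        System.out.println("a^bcd.wav".matches(regex));  // true
        System.out.println("axb.wav".matches(regex));    // false
    }
}
```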

+13

There is a small utility method in the Apache Commons IO library, org.apache.commons.io.FilenameUtils#wildcardMatch(), which you can use without the complexity of a regular expression.

API documentation can be found at: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/FilenameUtils.html#wildcardMatch(java.lang.String,%20java.lang.String)

+4

You can also use the quotation escapes \Q and \E (written \\Q and \\E in a Java string literal): everything between them is treated as a literal and not as part of the regular expression to be evaluated. So this code should work:

    String input = "*.wav";
    String regex = "\\Q" + input.replace("*", "\\E.*?\\Q") + "\\E";
    // regex = "\\Q\\E.*?\\Q.wav\\E"

Note that, depending on how you want your wildcard to behave, it may be better to have * match only word characters, using \w.
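
As a rough illustration of that choice, the sketch below builds the same \Q...\E expression twice, once with .*? and once with \w* for the asterisk. The class WordWildcard and its toRegex method are hypothetical, and reading the note's \w as \w* (zero or more word characters) is an assumption.

```java
// Hypothetical helper; names are illustrative only.
class WordWildcard {
    // Wraps the pattern in \Q...\E and swaps each '*' for the given
    // regex fragment, closing and reopening the quotation around it.
    static String toRegex(String wildcard, String starRegex) {
        return "\\Q" + wildcard.replace("*", "\\E" + starRegex + "\\Q") + "\\E";
    }

    public static void main(String[] args) {
        String any = toRegex("*.wav", ".*?");    // '*' matches any characters
        String word = toRegex("*.wav", "\\w*");  // '*' matches word characters only

        System.out.println("ab cd.wav".matches(any));   // true
        System.out.println("ab cd.wav".matches(word));  // false (space is not a word character)
        System.out.println("abcd.wav".matches(word));   // true
    }
}
```

Because String.matches requires the whole input to match, the lazy .*? accepts the same strings as a greedy .* here; the visible difference comes from narrowing the character set.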

+1

Lucene has classes that provide this feature, with additional support for the backslash as an escape character: ? matches one character, * matches zero or more characters, and \ escapes the next character. Unicode code points are supported. It is supposed to be fast, but I have not tested it.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.WildcardQuery;
    import org.apache.lucene.util.automaton.CharacterRunAutomaton;

    CharacterRunAutomaton characterRunAutomaton;
    boolean matches;

    // No wildcard: only the exact string matches
    characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Walmart")));
    matches = characterRunAutomaton.run("Walmart");     // true
    matches = characterRunAutomaton.run("Wal*mart");    // false
    matches = characterRunAutomaton.run("Wal\\*mart");  // false
    matches = characterRunAutomaton.run("Waldomart");   // false

    // Unescaped *: matches zero or more characters
    characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal*mart")));
    matches = characterRunAutomaton.run("Walmart");     // true
    matches = characterRunAutomaton.run("Wal*mart");    // true
    matches = characterRunAutomaton.run("Wal\\*mart");  // true
    matches = characterRunAutomaton.run("Waldomart");   // true

    // Escaped \*: matches a literal asterisk only
    characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal\\*mart")));
    matches = characterRunAutomaton.run("Walmart");     // false
    matches = characterRunAutomaton.run("Wal*mart");    // true
    matches = characterRunAutomaton.run("Wal\\*mart");  // false
    matches = characterRunAutomaton.run("Waldomart");   // false
0

Regex when handling DOS/Windows paths

Using the quotation escapes \Q and \E is probably the best approach. However, since the backslash is typically used as the DOS/Windows path separator, a "\E" sequence in the path can break the pairing of \Q and \E. While also accounting for the wildcard tokens * and ?, this backslash situation can be resolved as follows:

Search: [^*?\\]+|(\*)|(\?)|(\\)

Two new lines were added to the replacement function from the simple regular expression answer above to accommodate the new search pattern. The code still works for Linux paths. As a method, it can be written as follows:

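    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public String wildcardToRegex(String wildcardStr) {
        // Literal run | '*' (group 1) | '?' (group 2) | '\' (group 3)
        Pattern regex = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");
        Matcher m = regex.matcher(wildcardStr);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(sb, ".*");
            else if (m.group(2) != null) m.appendReplacement(sb, ".");
            // A backslash becomes an escaped backslash in the regex
            else if (m.group(3) != null) m.appendReplacement(sb, "\\\\\\\\");
            // quoteReplacement keeps '$' in the run from being read as a group reference
            else m.appendReplacement(sb, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
        }
        m.appendTail(sb);
        return sb.toString();
    }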

The code to demonstrate the implementation of this method can be written as follows:

    String s = "C:\\Temp\\Extra\\audio??2012*.wav";
    System.out.println("Input: " + s);
    System.out.println("Output: " + wildcardToRegex(s));

These will be the generated results:

    Input: C:\Temp\Extra\audio??2012*.wav
    Output: \QC:\E\\\QTemp\E\\\QExtra\E\\\Qaudio\E..\Q2012\E.*\Q.wav\E
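
To see why the "\E" sequence matters, the following sketch compares a naive \Q...\E wrapper with a version that quotes each literal piece through Pattern.quote, which is documented to handle an embedded \E. The class BackslashEPitfall and the methods naive and quoted are hypothetical names for illustration, and only * is treated as a wildcard.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class BackslashEPitfall {
    // Naive wrapper: a "\E" inside the path ends the quotation early.
    static String naive(String wildcard) {
        return "\\Q" + wildcard.replace("*", "\\E.*\\Q") + "\\E";
    }

    // Quotes each literal piece with Pattern.quote and joins with ".*".
    static String quoted(String wildcard) {
        StringBuilder sb = new StringBuilder();
        String[] parts = wildcard.split("\\*", -1); // -1 keeps trailing empty pieces
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(".*");
            if (!parts[i].isEmpty()) sb.append(Pattern.quote(parts[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wildcard = "C:\\Extra\\*.wav";   // C:\Extra\*.wav
        String path = "C:\\Extra\\song.wav";    // C:\Extra\song.wav

        System.out.println(path.matches(naive(wildcard)));   // false
        System.out.println(path.matches(quoted(wildcard)));  // true
    }
}
```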
0
