Is there an equivalent java.util.regex for glob type templates?

Is there a standard (preferably Apache Commons or similar non-virus) library for doing glob matches in Java? When I had to do this in Perl once, I just changed everything “ . ” To “ \. ”, “ * ” To “ .* ” And “ ? ” To “ . ” And the like, but I wonder if anyone did -Never work for me.

Similar question: Create regex from glob expression

+68
java glob
Aug 08 '09 at 2:13
source share
11 answers

There is nothing built-in, but it's pretty easy to convert something global to a regular expression:

 public static String createRegexFromGlob(String glob) { String out = "^"; for(int i = 0; i < glob.length(); ++i) { final char c = glob.charAt(i); switch(c) { case '*': out += ".*"; break; case '?': out += '.'; break; case '.': out += "\\."; break; case '\\': out += "\\\\"; break; default: out += c; } } out += '$'; return out; } 

this works for me, but I'm not sure if it covers the global "standard", if any :)

Update from Paul Tomblin: I found a perl program that does the glob conversion, and adapting it to Java, I get:

  private String convertGlobToRegEx(String line) { LOG.info("got line [" + line + "]"); line = line.trim(); int strLen = line.length(); StringBuilder sb = new StringBuilder(strLen); // Remove beginning and ending * globs because they're useless if (line.startsWith("*")) { line = line.substring(1); strLen--; } if (line.endsWith("*")) { line = line.substring(0, strLen-1); strLen--; } boolean escaping = false; int inCurlies = 0; for (char currentChar : line.toCharArray()) { switch (currentChar) { case '*': if (escaping) sb.append("\\*"); else sb.append(".*"); escaping = false; break; case '?': if (escaping) sb.append("\\?"); else sb.append('.'); escaping = false; break; case '.': case '(': case ')': case '+': case '|': case '^': case '$': case '@': case '%': sb.append('\\'); sb.append(currentChar); escaping = false; break; case '\\': if (escaping) { sb.append("\\\\"); escaping = false; } else escaping = true; break; case '{': if (escaping) { sb.append("\\{"); } else { sb.append('('); inCurlies++; } escaping = false; break; case '}': if (inCurlies > 0 && !escaping) { sb.append(')'); inCurlies--; } else if (escaping) sb.append("\\}"); else sb.append("}"); escaping = false; break; case ',': if (inCurlies > 0 && !escaping) { sb.append('|'); } else if (escaping) sb.append("\\,"); else sb.append(","); break; default: escaping = false; sb.append(currentChar); } } return sb.toString(); } 

I am editing this answer instead of making my own, because this answer has set me on the right path.

+33
Aug 08 '09 at 11:35
source share
— -

Globalization is also planned for , implemented in Java 7.

See FileSystem.getPathMatcher(String) and the Find Files tutorial .

+43
Aug 10 '09 at 10:08
source share

Thanks to everyone for their input. I wrote a more complete conversion than any of the previous answers:

 /** * Converts a standard POSIX Shell globbing pattern into a regular expression * pattern. The result can be used with the standard {@link java.util.regex} API to * recognize strings which match the glob pattern. * <p/> * See also, the POSIX Shell language: * http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01 * * @param pattern A glob pattern. * @return A regex pattern to recognize the given glob pattern. */ public static final String convertGlobToRegex(String pattern) { StringBuilder sb = new StringBuilder(pattern.length()); int inGroup = 0; int inClass = 0; int firstIndexInClass = -1; char[] arr = pattern.toCharArray(); for (int i = 0; i < arr.length; i++) { char ch = arr[i]; switch (ch) { case '\\': if (++i >= arr.length) { sb.append('\\'); } else { char next = arr[i]; switch (next) { case ',': // escape not needed break; case 'Q': case 'E': // extra escape needed sb.append('\\'); default: sb.append('\\'); } sb.append(next); } break; case '*': if (inClass == 0) sb.append(".*"); else sb.append('*'); break; case '?': if (inClass == 0) sb.append('.'); else sb.append('?'); break; case '[': inClass++; firstIndexInClass = i+1; sb.append('['); break; case ']': inClass--; sb.append(']'); break; case '.': case '(': case ')': case '+': case '|': case '^': case '$': case '@': case '%': if (inClass == 0 || (firstIndexInClass == i && ch == '^')) sb.append('\\'); sb.append(ch); break; case '!': if (firstIndexInClass == i) sb.append('^'); else sb.append('!'); break; case '{': inGroup++; sb.append('('); break; case '}': inGroup--; sb.append(')'); break; case ',': if (inGroup > 0) sb.append('|'); else sb.append(','); break; default: sb.append(ch); } } return sb.toString(); } 

And the module checks if this works:

 /** * @author Neil Traft */ public class StringUtils_ConvertGlobToRegex_Test { @Test public void star_becomes_dot_star() throws Exception { assertEquals("gl.*b", StringUtils.convertGlobToRegex("gl*b")); } @Test public void escaped_star_is_unchanged() throws Exception { assertEquals("gl\\*b", StringUtils.convertGlobToRegex("gl\\*b")); } @Test public void question_mark_becomes_dot() throws Exception { assertEquals("gl.b", StringUtils.convertGlobToRegex("gl?b")); } @Test public void escaped_question_mark_is_unchanged() throws Exception { assertEquals("gl\\?b", StringUtils.convertGlobToRegex("gl\\?b")); } @Test public void character_classes_dont_need_conversion() throws Exception { assertEquals("gl[-o]b", StringUtils.convertGlobToRegex("gl[-o]b")); } @Test public void escaped_classes_are_unchanged() throws Exception { assertEquals("gl\\[-o\\]b", StringUtils.convertGlobToRegex("gl\\[-o\\]b")); } @Test public void negation_in_character_classes() throws Exception { assertEquals("gl[^an!pz]b", StringUtils.convertGlobToRegex("gl[!an!pz]b")); } @Test public void nested_negation_in_character_classes() throws Exception { assertEquals("gl[[^an]!pz]b", StringUtils.convertGlobToRegex("gl[[!an]!pz]b")); } @Test public void escape_carat_if_it_is_the_first_char_in_a_character_class() throws Exception { assertEquals("gl[\\^o]b", StringUtils.convertGlobToRegex("gl[^o]b")); } @Test public void metachars_are_escaped() throws Exception { assertEquals("gl..*\\.\\(\\)\\+\\|\\^\\$\\@\\%b", StringUtils.convertGlobToRegex("gl?*.()+|^$@%b")); } @Test public void metachars_in_character_classes_dont_need_escaping() throws Exception { assertEquals("gl[?*.()+|^$@%]b", StringUtils.convertGlobToRegex("gl[?*.()+|^$@%]b")); } @Test public void escaped_backslash_is_unchanged() throws Exception { assertEquals("gl\\\\b", StringUtils.convertGlobToRegex("gl\\\\b")); } @Test public void slashQ_and_slashE_are_escaped() throws Exception { assertEquals("\\\\Qglob\\\\E", StringUtils.convertGlobToRegex("\\Qglob\\E")); } @Test public void braces_are_turned_into_groups() throws Exception { assertEquals("(glob|regex)", StringUtils.convertGlobToRegex("{glob,regex}")); } @Test public void escaped_braces_are_unchanged() throws Exception { assertEquals("\\{glob\\}", StringUtils.convertGlobToRegex("\\{glob\\}")); } @Test public void commas_dont_need_escaping() throws Exception { assertEquals("(glob,regex),", StringUtils.convertGlobToRegex("{glob\\,regex},")); } } 
+19
Jun 28 '13 at 16:58
source share

There are several libraries that perform Glob pattern matching that are more modern than the ones listed:

Theres Ants Directory Scanner And Springs AntPathMatcher

I recommend it as other solutions, as Ant Globbing Style has largely become the standard glob syntax in the Java world (Hudson, Spring, Ant and I think Maven).

+8
Oct 27 2018-10-27
source share

This is a simple Glob implementation that handles * and? in the template

 public class GlobMatch { private String text; private String pattern; public boolean match(String text, String pattern) { this.text = text; this.pattern = pattern; return matchCharacter(0, 0); } private boolean matchCharacter(int patternIndex, int textIndex) { if (patternIndex >= pattern.length()) { return false; } switch(pattern.charAt(patternIndex)) { case '?': // Match any character if (textIndex >= text.length()) { return false; } break; case '*': // * at the end of the pattern will match anything if (patternIndex + 1 >= pattern.length() || textIndex >= text.length()) { return true; } // Probe forward to see if we can get a match while (textIndex < text.length()) { if (matchCharacter(patternIndex + 1, textIndex)) { return true; } textIndex++; } return false; default: if (textIndex >= text.length()) { return false; } String textChar = text.substring(textIndex, textIndex + 1); String patternChar = pattern.substring(patternIndex, patternIndex + 1); // Note the match is case insensitive if (textChar.compareToIgnoreCase(patternChar) != 0) { return false; } } // End of pattern and text? if (patternIndex + 1 >= pattern.length() && textIndex + 1 >= text.length()) { return true; } // Go on to match the next character in the pattern return matchCharacter(patternIndex + 1, textIndex + 1); } } 
+5
Nov 13 '09 at 9:58
source share

I recently had to do this and use \Q and \E to avoid the glob pattern:

 private static Pattern getPatternFromGlob(String glob) { return Pattern.compile( "^" + Pattern.quote(glob) .replace("*", "\\E.*\\Q") .replace("?", "\\E.\\Q") + "$"); } 
+5
Sep 01 '10 at 14:25
source share

GlobCompiler / GlobEngine , from Jakarta ORO , looks promising. It is available under the Apache license.

+3
Aug 08 '09 at 2:25
source share

Like Tony Edgecombe's answer , here's a short and simple globber that supports * and ? without using regex if someone needs it.

 public static boolean matches(String text, String glob) { String rest = null; int pos = glob.indexOf('*'); if (pos != -1) { rest = glob.substring(pos + 1); glob = glob.substring(0, pos); } if (glob.length() > text.length()) return false; // handle the part up to the first * for (int i = 0; i < glob.length(); i++) if (glob.charAt(i) != '?' && !glob.substring(i, i + 1).equalsIgnoreCase(text.substring(i, i + 1))) return false; // recurse for the part after the first *, if any if (rest == null) { return glob.length() == text.length(); } else { for (int i = glob.length(); i <= text.length(); i++) { if (matches(text.substring(i), rest)) return true; } return false; } } 
+3
Sep 10 '10 at 17:57
source share

I do not know about the "standard" implementation, but I know about the sourceforge project, released under the BSD license, which implemented glob mapping for files. It implemented in a single file , perhaps you can adapt it to your requirements.

+2
Aug 08 '09 at 2:25
source share

Once upon a time I did massive globe-based text filtering, so I wrote a small piece of code (15 lines of code, no dependencies outside of the JDK). It only handles "*" (it was enough for me), but can be easily expanded to "?". It is several times faster than a precompiled regular expression, does not require preliminary compilation (in essence, this is a comparison of strings and strings with each match of the pattern).

the code:

  public static boolean miniglob(String[] pattern, String line) { if (pattern.length == 0) return line.isEmpty(); else if (pattern.length == 1) return line.equals(pattern[0]); else { if (!line.startsWith(pattern[0])) return false; int idx = pattern[0].length(); for (int i = 1; i < pattern.length - 1; ++i) { String patternTok = pattern[i]; int nextIdx = line.indexOf(patternTok, idx); if (nextIdx < 0) return false; else idx = nextIdx + patternTok.length(); } if (!line.endsWith(pattern[pattern.length - 1])) return false; return true; } } 

Using:

  public static void main(String[] args) { BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); try { // read from stdin space separated text and pattern for (String input = in.readLine(); input != null; input = in.readLine()) { String[] tokens = input.split(" "); String line = tokens[0]; String[] pattern = tokens[1].split("\\*+", -1 /* want empty trailing token if any */); // check matcher performance long tm0 = System.currentTimeMillis(); for (int i = 0; i < 1000000; ++i) { miniglob(pattern, line); } long tm1 = System.currentTimeMillis(); System.out.println("miniglob took " + (tm1-tm0) + " ms"); // check regexp performance Pattern reptn = Pattern.compile(tokens[1].replace("*", ".*")); Matcher mtchr = reptn.matcher(line); tm0 = System.currentTimeMillis(); for (int i = 0; i < 1000000; ++i) { mtchr.matches(); } tm1 = System.currentTimeMillis(); System.out.println("regexp took " + (tm1-tm0) + " ms"); // check if miniglob worked correctly if (miniglob(pattern, line)) { System.out.println("+ >" + line); } else { System.out.println("- >" + line); } } } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } } 

Copy / paste from here

0
May 03 '10 at 15:42
source share

By the way, it seems like you did it in Perl

This does the trick in Perl:

 my @files = glob("*.html") # Or, if you prefer: my @files = <*.html> 
-one
Sep 01 '09 at 1:43
source share



All Articles