How to translate this Perl regular expression in Java?

How would you translate this Perl regex into Java?

/pattern/i 

When I compile it like this, it does not match "PattErn"; the match fails:

 Pattern p = Pattern.compile("/pattern/i");
 Matcher m = p.matcher("PattErn");
 System.out.println(m.matches()); // prints "false"
3 answers

How would you translate this Perl regex into Java?

 /pattern/i 

You can't.

There are many reasons for this. Here are a few:

  • Java does not support as expressive a regular expression language as Perl does. It lacks grapheme support (for example, \X) and full property support (for example, \p{Sentence_Break=SContinue}), it is missing Unicode named characters, it has no (?|...|...|) branch-reset operator, it has no named capture groups or logical \x{...} escape before Java 7, it has no recursive regexes, and so on. I could write a book about what Java is missing here: get used to going back to a very primitive, awkward regex engine compared with what you are used to.

  • Another, even worse problem is that you have look-alike faux amis such as \w, \b, and \s, and even \p{alpha} and \p{lower}, which behave differently in Java than in Perl; in some cases the Java versions are completely unusable and buggy. That is because Perl follows UTS #18, but before Java 7, Java did not. From Java 7 on, you must add the UNICODE_CHARACTER_CLASSES flag to stop them being broken. If you can't use Java 7, give up now: Java had many, many other Unicode bugs before Java 7, and it just isn't worth the pain of dealing with them.

  • Java handles line breaks via ^ and $ and ., but Perl expects Unicode line breaks to be \R. You should look at UNIX_LINES to understand what is going on there.

  • Java by default does not apply any Unicode case folding whatsoever. Be sure to add the UNICODE_CASE flag to your compilation. Otherwise, you will not get things like the various Greek sigmas all matching one another.

  • Finally, it differs in that at best Java does only simple case folding, while Perl always does full case folding. This means that you will not get \xDF to match "SS" case-insensitively in Java, and there are similar related problems.
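
A few of the points above can be checked directly. The following is only a minimal sketch, not part of the original answer: the class name UnicodeFlagsDemo and the helper matchesWith are made up for illustration, and it assumes Java 7 or later so that UNICODE_CHARACTER_CLASSES exists. Non-ASCII characters are written as Unicode escapes (U+00E9 is a lowercase e with acute, U+00C9 its uppercase form, U+00DF the sharp s).

```java
import java.util.regex.Pattern;

public class UnicodeFlagsDemo {
    // Hypothetical helper: true if the whole input matches the regex
    // when it is compiled with the given flags.
    static boolean matchesWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        int ci = Pattern.CASE_INSENSITIVE;
        int ciUnicode = ci | Pattern.UNICODE_CASE;

        // Case folding: without UNICODE_CASE only US-ASCII letters are folded.
        System.out.println(matchesWith("\u00e9", ci, "\u00c9"));        // false
        System.out.println(matchesWith("\u00e9", ciUnicode, "\u00c9")); // true

        // Simple rather than full case folding: U+00DF does not match "SS".
        System.out.println(matchesWith("\u00df", ciUnicode, "SS"));     // false

        // The word-character class is ASCII-only unless
        // UNICODE_CHARACTER_CLASSES is set (Java 7+).
        System.out.println(matchesWith("\\w", 0, "\u00e9"));            // false
        System.out.println(matchesWith("\\w",
                Pattern.UNICODE_CHARACTER_CLASSES, "\u00e9"));          // true

        // Line terminators: "." rejects a carriage return by default,
        // but accepts it under UNIX_LINES, where only "\n" is a line end.
        System.out.println(matchesWith(".", 0, "\r"));                  // false
        System.out.println(matchesWith(".", Pattern.UNIX_LINES, "\r")); // true
    }
}
```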

In summary, the closest you can get is to compile with the flags

  CASE_INSENSITIVE | UNICODE_CASE | UNICODE_CHARACTER_CLASSES 

which is equivalent to an embedded "(?iuU)" in the pattern string.
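
As a rough illustration of that equivalence, here is a small sketch that compiles the same pattern both ways and searches the same input. The class and method names (EmbeddedFlagsDemo, viaConstants, viaEmbedded) are invented for this example, and it assumes Java 7 or later for the U flag.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagsDemo {
    static final int FLAGS = Pattern.CASE_INSENSITIVE
                           | Pattern.UNICODE_CASE
                           | Pattern.UNICODE_CHARACTER_CLASSES;

    // Pattern built with explicit flag constants.
    static boolean viaConstants(String input) {
        return Pattern.compile("pattern", FLAGS).matcher(input).find();
    }

    // Same pattern with the flags embedded in the pattern string.
    static boolean viaEmbedded(String input) {
        return Pattern.compile("(?iuU)pattern").matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "I have your PaTTerN right here";
        System.out.println(viaConstants(line)); // true
        System.out.println(viaEmbedded(line));  // true
    }
}
```

The embedded form is handy when the pattern string is the only thing you control, for example with String.matches.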

And remember that matches() in Java does not mean "match" in the Perl sense, perversely enough: it succeeds only if the pattern matches the entire input.


EDIT

And here is the rest of the story ...

When I compile it like this, it does not match "PattErn"; the match fails:

  Pattern p = Pattern.compile("/pattern/i");
  Matcher m = p.matcher("PattErn");
  System.out.println(m.matches()); // prints "false"

You shouldn't have slashes around the pattern.

The best you can do is translate

 $line = "I have your PaTTerN right here";
 if ($line =~ /pattern/i) {
     print "matched.\n";
 }

into this:

 import java.util.regex.*;

 String line    = "I have your PaTTerN right here";
 String pattern = "pattern";
 // The flag constants live on Pattern, so they are qualified here.
 Pattern regcomp = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE
                                          | Pattern.UNICODE_CASE
                 // comment next line out for legacy Java \b\w\s breakage
                                          | Pattern.UNICODE_CHARACTER_CLASSES
                                   );
 Matcher regexec = regcomp.matcher(line);
 if (regexec.find()) {
    System.out.println("matched");
 }

There, see how much easier that is? :)

The other thing you lose with Java, because Java really doesn't know a regex from a doubly-linked list from a hole in the head, is compile-time compilation of patterns. I have always found compile time the very best time for compilation, but try telling Java that. Java makes it very difficult to carry out that very simple measure of program checking, something you really need to do in every program all the time. This design flaw is a royal pain in the butt, because halfway through your program you take an exception for something that should have been caught at compile time, when the rest of your program was compiled. It is almost as annoying as coitus interruptus, because you were well on the way to doing your business and BANG, it's all ruined.

I have not shown a solution to this vexing annoyance in my code above, but you can fake it with some static initialization.
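
One way to read "static initialization" is sketched below; this is an assumption about what was meant, not code from the answer, and the names PrecompiledPatterns, WORD, and containsWord are hypothetical. The pattern is compiled once in a static final field, so a malformed pattern fails when the class is first initialized rather than halfway through the program. It is still a run-time check, just an earlier one.

```java
import java.util.regex.Pattern;

public class PrecompiledPatterns {
    // Compiled once, when the class is initialized. A malformed pattern
    // throws PatternSyntaxException here (surfacing as an
    // ExceptionInInitializerError) instead of at some later call site.
    static final Pattern WORD = Pattern.compile("pattern",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Reuses the precompiled pattern for every call.
    static boolean containsWord(String line) {
        return WORD.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("I have your PaTTerN right here")); // true
    }
}
```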


The equivalent of Perl's:

 /pattern/i 

in Java will be:

 Pattern p = Pattern.compile("(?i)pattern"); 

Or just do:

 System.out.println("PattErn".matches("(?i)pattern")); 

Note that "string".matches("pattern") matches the pattern against the entire input string. In other words, the following returns false:

 "foo pattern bar".matches("pattern") 
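
To search inside a longer string, as Perl's =~ does, Matcher.find() is the closer fit. A minimal sketch, with the class name FindVersusMatches and its two helper methods made up for illustration:

```java
import java.util.regex.Pattern;

public class FindVersusMatches {
    static final Pattern P = Pattern.compile("(?i)pattern");

    // True only if the entire string matches the pattern.
    static boolean wholeMatch(String s) {
        return P.matcher(s).matches();
    }

    // True if the pattern occurs anywhere in the string.
    static boolean search(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("PattErn"));         // true
        System.out.println(wholeMatch("foo PaTTerN bar")); // false
        System.out.println(search("foo PaTTerN bar"));     // true
    }
}
```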

Java regexes have no delimiters, and modifiers are passed as a separate argument:

  Pattern p = Pattern.compile("pattern", Pattern.CASE_INSENSITIVE); 
