Cannot match string using regex

I am working on some regex and I wonder why this regex

"(?<=(.*?id(( *)=)\\s[\"\']))g" 

does not match the string

 <input id = "g" /> 

in Java?

+7
java regex
4 answers

Not only does Java not allow unbounded lookbehind, it is supposed to throw an exception if you try. The fact that you are not seeing that exception is a bug in itself.

In any case, you should not be using lookbehind for this. If you want to match the value of a specific attribute, the easiest and least painful approach is to match the whole attribute and use a capturing group to extract the value. For example:

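 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 String source = "<input id = \"g\" />";
 // Match the whole attribute; group 1 captures the value between the double quotes.
 Pattern p = Pattern.compile("\\bid\\s*=\\s*\"([^\"]*)\"");
 Matcher m = p.matcher(source);
 if (m.find()) {
     System.out.printf("Found 'id' attribute '%s' at position %d%n", m.group(1), m.start());
 }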

Output:

 Found 'id' attribute 'g' at position 7 

Do yourself a favor and forget about lookbehinds for a while. They are tricky even when they are not buggy, and they are really not as useful as you might expect.
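The question's character class `[\"\']` suggests that both quote styles should be accepted. As a minimal sketch of the same capture-group approach extended to that case, the pattern below captures the opening quote and uses a backreference to require the same closing quote. The class name `QuotedIdExample` and the method `idValue` are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedIdExample {
    // Group 1 captures the opening quote, group 2 the attribute value;
    // \1 requires the closing quote to be the same character as the opening one.
    private static final Pattern ID_ATTR =
            Pattern.compile("\\bid\\s*=\\s*([\"'])(.*?)\\1");

    // Hypothetical helper: returns the value of the first id attribute, or null if none is found.
    static String idValue(String source) {
        Matcher m = ID_ATTR.matcher(source);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(idValue("<input id = \"g\" />"));
        System.out.println(idValue("<input id='g'/>"));
    }
}
```

No lookbehind is involved, so the unbounded `\\s*` is not a problem here.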

+2

java.util.regex does not support infinite lookbehind, as described by RegexBuddy:

The bad news is that most regex flavors do not allow you to use just any regular expression inside a lookbehind, because they cannot apply a regular expression backwards. Therefore, the regular expression engine needs to be able to figure out how many steps to step back before checking the lookbehind.

To add a little explanation from the documentation:

Therefore, many regex flavors, including those used by Perl and Python, only allow fixed-length strings. You can use any regular expression whose match length can be determined in advance. This means that you can use literal text and character classes. You cannot use repetition or optional items. You can use alternation, but only if all options in the alternation have the same length.

Some regex flavors, such as PCRE and Java, support the above, plus alternation with strings of different lengths. Each part of the alternation must still have a finite maximum length. This means that you still cannot use the star or plus, but you can use the question mark and curly braces with the maximum parameter specified. These regex flavors recognize the fact that finite repetition can be rewritten as an alternation of strings with different but fixed lengths. Unfortunately, JDK 1.4 and 1.5 have some bugs when you use alternation inside lookbehind. These were fixed in JDK 1.6.
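To illustrate the quoted rule, here is a rough sketch of the question's lookbehind rewritten so that every repetition has a finite maximum. The upper bound of 5 whitespace characters is an arbitrary assumption chosen for the example, and the class name `BoundedLookbehindExample` and the method `find` are hypothetical. This assumes JDK 1.6 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookbehindExample {
    // \s{0,5} replaces the unbounded quantifiers, so the lookbehind has a
    // known maximum length. The limit of 5 is an assumption for this sketch.
    private static final Pattern ID_VALUE =
            Pattern.compile("(?<=\\bid\\s{0,5}=\\s{0,5}[\"'])[^\"']*");

    // Hypothetical helper: returns the text following id = " (or '), or null if nothing matches.
    static String find(String source) {
        Matcher m = ID_VALUE.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("<input id = \"g\" />"));
    }
}
```

The trade-off is that input with more whitespace than the chosen bound is silently not matched, which is one reason the capture-group approach is usually simpler.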

+6

So, a few people have explained why your regex doesn't work (and it's true: Java regexes can't do what you are attempting with an unbounded lookbehind). However, you may be wondering how you should parse this now...

It looks like the string you are trying to parse is XML. Regex is really not suited to parsing XML; there is a mismatch between what can be encoded in XML and what can be matched using regular expressions. So if this is part of some XML text, consider feeding it to an XML parser, which you can then query for the different elements.

For a calm and reasonable discussion of this issue, see this classic Stack Overflow post: RegEx match open tags except XHTML self-contained tags.
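As a minimal sketch of that suggestion, assuming the input really is well-formed XML, the JDK's built-in DOM parser can read the attribute directly. The class name `XmlIdExample` and the method `idOf` are invented for this illustration; real HTML that is not well-formed XML would need an HTML parser instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlIdExample {
    // Hypothetical helper: parses the text as XML and returns the root element's id attribute.
    // Element.getAttribute returns an empty string when the attribute is absent.
    static String idOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("id");
        } catch (Exception e) {
            // Wrap parser configuration, I/O, and syntax errors in an unchecked exception.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(idOf("<input id = \"g\" />"));
    }
}
```

The parser handles whitespace around `=` and either quote style on its own, so none of the regex concerns above apply.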

Hope this helps!

+2

java.util.regex does not support unbounded repetition inside lookbehind.

0