Regular expressions: matching multiple single characters
In this string:
"<0> <<1>> <2>> <3> <4>"
I want to match all instances of "<\d{1,2}>", except for those that I have escaped with an extra set of angle brackets. That is, I want to match 0, 2, 3 and 4, but not 1, for example:
"<0> <<1>> <2>> <3> <4>"
I want to do this in one regex, but the best I can get is:
(^|[^\<])\<(?<1>\d{1,2})>([^>]|$)
Which will match 0, 3 and 4, but not 2, for example:
"<0> <<1>> <2>> <3> <4>"
Does anyone know how this can be done with a single regex?
Assuming that for the input
"<0> <<1>> <2>> <3> <4><<5>" we want to match 0, 2, 3, 4 and 5.
The problem is that you need both a zero-width lookbehind and a zero-width lookahead, but there are three cases to match (a leading '<' only, a trailing '>' only, and neither) and one case not to match (both '<' and '>'). Also, if you want to capture the matches so that you can assign them to an array, you need to avoid capturing things you don't need. So I ended up with the inelegant:
    use strict;
    use warnings;
    use Data::Dumper;

    my $input = "<0> <<1>> <2>> <3> <4><<5>";
    my $brace_pair = qr/<[^<>]+>/;
    # Match a tag unless it has both '<' before it and '>' after it.
    my @matches = $input =~ /(?:(?<!<)$brace_pair(?!>))|(?:$brace_pair(?!>))|(?:(?<!<)$brace_pair)/g;
    print Dumper(\@matches);

If you want to fold this into a single expression, you could.
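The same idea can be sketched in Java, which this thread mentions further on. This is only an illustration, not the answer's original code: the class name UnescapedTags and the method findTags are made up for the example. It also assumes that the first of the three alternatives can be dropped, since a tag with neither bracket around it already satisfies "no '<' before it".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class UnescapedTags {
    // A tag is escaped only when it has '<' before it AND '>' after it,
    // so it is unescaped when at least one of those is missing.
    private static final Pattern TAG =
        Pattern.compile("(?<!<)<[^<>]+>|<[^<>]+>(?!>)");

    // Returns every unescaped tag, in order of appearance.
    static List<String> findTags(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [<0>, <2>, <3>, <4>, <5>]
        System.out.println(findTags("<0> <<1>> <2>> <3> <4><<5>"));
    }
}
```

Because the pattern has no capturing groups, m.group() returns the whole tag, which mirrors the point above about not capturing things you don't need.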
If you use a regular expression flavor (like Java) that supports lookarounds but not conditionals, here is another approach:
(?=(<\d{1,2}>))(?!(?<=<)\1(?=>))\1
The first lookahead ensures that you are at the beginning of a tag and captures it for later use. The subexpression in the second lookahead matches the tag again, but only if it is preceded by < and followed by >. Making it a negative lookahead gives you the NOT (x AND y) semantics that you are looking for. Finally, the second \1 matches the tag again, this time for real (i.e., not inside a lookaround).
By the way, I could have used > instead of (?=>) in the second lookahead, but I think this way is easier to read and expresses my intent better.
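As a rough check of the lookaround version in Java, here is a minimal sketch. The class name LookaroundTags and the method findTags are invented for the example, and the test string is the one from the earlier answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class LookaroundTags {
    // (?=(<\d{1,2}>))     capture the tag without consuming it
    // (?!(?<=<)\1(?=>))   reject it if '<' precedes it and '>' follows it
    // \1                  consume the tag for real
    private static final Pattern TAG =
        Pattern.compile("(?=(<\\d{1,2}>))(?!(?<=<)\\1(?=>))\\1");

    // Returns every tag that is not wrapped in an extra pair of brackets.
    static List<String> findTags(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [<0>, <2>, <3>, <4>, <5>]
        System.out.println(findTags("<0> <<1>> <2>> <3> <4><<5>"));
    }
}
```

Group 1 and the overall match hold the same text here, so m.group() would work equally well in place of m.group(1).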