Playing with regular expressions, especially the balancing-group matching of the .NET flavor, I came to the conclusion that I do not understand the inner workings of the engine as well as I thought. I would appreciate any input on why my patterns behave the way they do! But first...
Disclaimer: This question is purely theoretical, and any result obtained here will never be used, in modified form or otherwise, in production code for parsing HTML. Ever. I promise. I'm scared of the pony. =)
Now to my problem. I am trying to match the letter A only if it is not preceded by a # anywhere earlier in the string. To demonstrate, I always use the string ..A..#..A.. Here only the first A should be matched. Of course, this is a fairly simple task using "A(?<!^.*#.*)", but I want to use conditionals here, as they can be used for balanced matching and other interesting things.
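For reference, here is a minimal sketch of how the simple negative-lookbehind baseline can be checked against the sample string. The class and method names (LookbehindBaseline, MatchIndexes) are made up for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class LookbehindBaseline
{
    // Returns the start index of every match of `pattern` in `input`.
    public static List<int> MatchIndexes(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()          // MatchCollection is a non-generic IEnumerable
                    .Select(m => m.Index)
                    .ToList();
    }
}
```

Calling LookbehindBaseline.MatchIndexes(@"A(?<!^.*#.*)", "..A..#..A..") should return a single index, 2, which is the first A. This is the behavior I want to reproduce with conditionals.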
What I tried
"A(?<=^(#(?<q>)|[^#])*(?(q)(?!)))"
The way I interpret this: when the engine encounters an A, it goes back to the beginning of the string and, for each character, adds an empty match to the capture group q if the character is a #. Then it should fail if q contains a match. I do not understand why this expression matches both As in my sample string.
When I just remove the lookbehind and match the entire string, this works:
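A small repro sketch for the behavior described above. The pattern is the one from this question, unchanged; the class and method names (ConditionalLookbehindRepro, Describe) are only illustrative.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical repro harness, not part of any library.
public static class ConditionalLookbehindRepro
{
    // The pattern from the question, unchanged.
    public const string Pattern = @"A(?<=^(#(?<q>)|[^#])*(?(q)(?!)))";

    // Describes every match as "index:value", joined with commas.
    public static string Describe(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            parts.Add(m.Index + ":" + m.Value);
        }
        return string.Join(",", parts);
    }
}
```

By my interpretation, ConditionalLookbehindRepro.Describe("..A..#..A..") should give "2:A", but what I actually get is "2:A,8:A".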
"^(
matches the entire string up to the first A, even though the quantifier on the first group is greedy. Inserting a "#" at the beginning also makes the match fail (as desired).
So: how do lookbehinds, named capture groups and conditionals work together?
Thanks!
Edit: The problem is easier to see with (?<=(?<q>)(?(q)(?!))). which should not match any character, but matches all of them.
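The reduced case can be checked with a sketch like the following. The names MinimalRepro and CountMatches, and the test input "abc", are assumptions chosen just for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical repro harness, not part of any library.
public static class MinimalRepro
{
    // Lookbehind that captures an empty q and then tests it, followed by any character.
    public const string Pattern = @"(?<=(?<q>)(?(q)(?!))).";

    // Counts how many characters of `input` the pattern matches.
    public static int CountMatches(string input)
    {
        return Regex.Matches(input, Pattern).Count;
    }
}
```

I would expect MinimalRepro.CountMatches("abc") to be 0, since q is always set before the conditional is reached when reading the pattern left to right. Instead it is 3.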
c# regex theory
Jens