Alternative RegEx engine for .NET supporting recursion

I ran into a parsing problem that a fairly small regular expression would solve, except that for the pattern to work it needs to be recursive. Example:

{([^{}]*(?:{(?1)})?) 

What I want it to match is a specific RTF header, but for this I need it to be recursive.

 {\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fnil\fcharset0 Tahoma;}} 

Is there any alternative regex-like engine for .NET that would allow me to find matches with these kinds of patterns (possibly even with a different syntax)?

Update:

I really appreciate everyone who pointed me to balancing groups in the .NET regex implementation, especially Qtax, who provided a very comprehensive link in a comment below that helped me understand them, and who also posted an answer for my specific example. If you are reading this and it helped you too, be sure to upvote that answer.
However, this did not answer the general question about recursion in a .NET regex-like engine. This example is, luckily (I like challenges), far from the only one I have run into. The other situations cannot be solved with this approach; they need the ability to refer not to a match but to reuse a part of the pattern itself, up to the point where recursion becomes possible.

2 answers

For your example, a balancing group will work.

You can use an expression like:

 { [^{}]* (?:({)[^{}]*)* (?'-1'})* (?(1)(?!)) } 

Example:

    using System;
    using System.Text.RegularExpressions;

    string re = @"{[^{}]*(?:({)[^{}]*)*(?'-1'})*(?(1)(?!))}";
    string str = "foo {bar} baz {foo{bar{baz}}} {f{o{o}}{bar}baz} {foo{bar}baz}";

    Console.WriteLine("Input: \"{0}\"", str);
    foreach (Match m in Regex.Matches(str, re))
    {
        Console.WriteLine("Match: \"{0}\"", m);
    }

Output:

    Input: "foo {bar} baz {foo{bar{baz}}} {f{o{o}}{bar}baz} {foo{bar}baz}"
    Match: "{bar}"
    Match: "{foo{bar{baz}}}"
    Match: "{o{o}}"
    Match: "{bar}"
    Match: "{bar}"
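
For readers who find the compact pattern hard to follow, here is a minimal sketch of the same expression spread over several lines with comments. It assumes `RegexOptions.IgnorePatternWhitespace`, which is needed for any version of the pattern that contains spaces; the variable names `annotated` and `regex` are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above, one construct per line.
// RegexOptions.IgnorePatternWhitespace makes the engine skip the
// whitespace and treat everything after '#' as a comment.
string annotated = @"
    {                   # opening brace of the outer group
    [^{}]*              # text without braces
    (?: ({) [^{}]* )*   # each nested '{' is pushed onto group 1
    (?'-1' } )*         # each '}' pops one capture from group 1
    (?(1)(?!))          # fail if group 1 still holds an unclosed '{'
    }                   # closing brace of the outer group
";

var regex = new Regex(annotated, RegexOptions.IgnorePatternWhitespace);
string str = "foo {bar} baz {foo{bar{baz}}} {f{o{o}}{bar}baz} {foo{bar}baz}";

foreach (Match m in regex.Matches(str))
{
    Console.WriteLine("Match: \"{0}\"", m);
}
```

This should report the same matches as the compact pattern. Note that the pattern only lets the nested braces open first and then close in one run, so any text or a new `{` after the first `}` stops it from matching the outer group.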

Although Qtax's example is nice and straightforward, it does not fully work for me, because it returns {o{o}} instead of {f{o{o}}{bar}baz}.

After finding some time, here is my solution (using almost the same example):

Input:

    using System;
    using System.Text.RegularExpressions;

    string re = @"{(((?<Counter>{)*[^{}]*)*((?<-Counter>})*[^{}]*)*)*(?(Counter)(?!))}";
    string str = "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}baz}";

    Console.WriteLine("Input: \"{0}\"", str);
    foreach (Match m in Regex.Matches(str, re))
    {
        Console.WriteLine("Match: \"{0}\"", m);
    }

Output:

    Input: "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}baz}"
    Match: "{bar}"
    Match: "{foo{bar{{baz}a{a{b}}}}}"
    Match: "{f{o{o}}{bar{a{b{c}}{d}}}baz}"
    Match: "{foo{bar}baz}"

A brief explanation: I increment the counter for each { and decrement it for each } (strictly speaking, Counter is a capture stack: each { pushes a capture and each } pops one). In the end, the regex matches only if the counter is empty ( (?(Counter)(?!)) ).
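
To make the counter logic concrete, here is a minimal sketch of the same idea written as a plain loop instead of a regex. `FindBalancedGroups` is a hypothetical helper name, not part of the answer or of .NET, and the sketch assumes the same rules as the pattern: only `{` and `}` matter, and a `{` that is never closed is skipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the answer): collects balanced {...} groups
// by counting braces explicitly, mirroring what the Counter group does.
static List<string> FindBalancedGroups(string input)
{
    var groups = new List<string>();
    int start = 0;
    while (start < input.Length)
    {
        if (input[start] != '{') { start++; continue; }

        int counter = 0;   // inner '{' still waiting for their '}'
        int end = -1;      // index of the '}' that closes the outer group
        for (int i = start + 1; i < input.Length; i++)
        {
            if (input[i] == '{')
            {
                counter++;                 // like (?<Counter>{)
            }
            else if (input[i] == '}')
            {
                if (counter == 0) { end = i; break; }   // like (?(Counter)(?!))}
                counter--;                 // like (?<-Counter>})
            }
        }

        // Never balanced: retry from the next character, as the regex engine would.
        if (end < 0) { start++; continue; }

        groups.Add(input.Substring(start, end - start + 1));
        start = end + 1;
    }
    return groups;
}

string sample = "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}baz}";
foreach (string group in FindBalancedGroups(sample))
{
    Console.WriteLine("Match: \"{0}\"", group);
}
```

On the sample string this finds the same four groups as the regex. The loop version is longer, but it may be easier to debug or extend, for example if escaped braces need to be ignored.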

This seems to work for deep nesting as well as for alternating brackets.

See also this site, which helped me build this regular expression.

Hope this helps.

PS: If you also want to match a string in which the closing } has been forgotten at the end, use:

    string re = @"{(((?<Counter>{)*[^{}]*)*((?<-Counter>(}|$))*[^{}]*)*)*(?(Counter)(?!))(}|$)";
    string str = "foo {bar} baz {foo{bar{{baz}a{a{b}}}}} {f{o{o}}{bar{a{b{c}}{d}}}baz} {foo{bar}b{az";