In .NET RegEx, can I get a collection of groups from a Capture object?

.NET offers a Capture collection in its regular expression implementation, so you can get every instance of a repeating group rather than just the last one. That's great, but I have a repeating group with subgroups, and I'm trying to get at the subgroups as they relate to each capture of the outer group. I cannot find a way to do this. Any suggestions?

I looked at a number of other questions, for example:

  • Select multiple items in regex
  • Regex.NET attached group .
  • How to get regular expression groups for a given capture?

but I found no answer, either affirmative ("Yes, here's how") or negative ("No, it can't be done").

For a contrived example, let's say that I have an input line:

abc d x 1 2 x 3 x 5 6 e fgh

where "abc" and "fgh" represent text that I want to ignore in a larger document, "d" and "e" wrap the region of interest, and within that region "x n [n]" can repeat any number of times. It is the number pairs in the "x" areas that I'm interested in.

So, I am parsing it with this regex pattern:

 .*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.* 

This finds exactly one match in the document, but captures the group "x" several times. These are the three pairs I would like to extract in this example:

  • 1, 2
  • 3
  • 5, 6

But how can I get them? I could do the following (in C#):

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    string input = "abc d x 1 2 x 3 x 5 6 e fgh";
    string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";

    // Show every capture of the repeating group "x"
    foreach (var x in Regex.Match(input, pattern).Groups["x"].Captures)
    {
        MessageBox.Show(x.ToString());
    }

and since I am referring to the group "x", I get the following lines:

  • x 1 2
  • x 3
  • x 5 6

But this does not get me to the numbers themselves. So I could query "fir" and "sec" independently instead of just "x":

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    string input = "abc d x 1 2 x 3 x 5 6 e fgh";
    string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
    Match m = Regex.Match(input, pattern);

    // All first numbers, then all second numbers
    foreach (var f in m.Groups["fir"].Captures)
    {
        MessageBox.Show(f.ToString());
    }
    foreach (var s in m.Groups["sec"].Captures)
    {
        MessageBox.Show(s.ToString());
    }

This yields:

  • 1
  • 3
  • 5
  • 2
  • 6

but then I have no way of knowing that it is the second pair that is missing the "4", and not one of the other pairs.

So what to do? I know I could easily parse this in C#, or even with a second regex applied to the "x" group, but since the first regex run has already done all the work and the results are known, it seems there should be a way to manipulate the Match object to get what I need.

And remember that this is a contrived example; the real-world case is somewhat more complicated, so just throwing extra C# code at it would be a pain. But if the existing .NET objects cannot do this, then I just need to know that, and I will continue on my way.

Thoughts?

+8
regex capture
4 answers

I do not know of a fully built-in solution and could not find one after a quick search, but that does not rule out that one exists.

My best suggestion is to use the Index and Length properties to find the matching captures. It doesn't seem very elegant, but you could end up with quite nice code after writing a few extension methods.

    // Requires: using System; using System.Linq; using System.Text.RegularExpressions;
    var input = "abc d x 1 2 x 3 x 5 6 e fgh";
    var pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";

    var match = Regex.Match(input, pattern);

    var xs = match.Groups["x"].Captures.Cast<Capture>();
    var firs = match.Groups["fir"].Captures.Cast<Capture>();
    var secs = match.Groups["sec"].Captures.Cast<Capture>();

    // True if the inner capture starts within the span of the outer capture
    Func<Capture, Capture, Boolean> test = (inner, outer) =>
        (inner.Index >= outer.Index) &&
        (inner.Index < outer.Index + outer.Length);

    // For each "x" capture, pick the "fir" and "sec" captures located inside it
    var result = xs.Select(x => new
                     {
                         Fir = firs.FirstOrDefault(f => test(f, x)),
                         Sec = secs.FirstOrDefault(s => test(s, x))
                     })
                   .ToList();

Here is one possible solution using the following extension method.

    internal static class Extensions
    {
        // Returns the captures of the named group that start inside the given capture
        internal static IEnumerable<Capture> GetCapturesInside(
            this Match match, Capture capture, String groupName)
        {
            var start = capture.Index;
            var end = capture.Index + capture.Length;

            return match.Groups[groupName]
                        .Captures
                        .Cast<Capture>()
                        .Where(inner => (inner.Index >= start) &&
                                        (inner.Index < end));
        }
    }

Now you can rewrite the code as follows.

    var input = "abc d x 1 2 x 3 x 5 6 e fgh";
    var pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";

    var match = Regex.Match(input, pattern);

    foreach (Capture x in match.Groups["x"].Captures)
    {
        // Either value is null when the pair has no such number
        var fir = match.GetCapturesInside(x, "fir").SingleOrDefault();
        var sec = match.GetCapturesInside(x, "sec").SingleOrDefault();
    }

+5

Will it always be either a pair or a single number? If so, you can use separate capture groups. Of course, you lose the order of the items with this method.

    var input = "abc d x 1 2 x 3 x 5 6 e fgh";
    var re = new Regex(@"d\s(?<x>x\s((?<pair>\d+\s\d+)|(?<single>\d+))\s)*e");
    var m = re.Match(input);

    // Pairs first, then singles: the original interleaving is lost
    foreach (Capture s in m.Groups["pair"].Captures)
    {
        Console.WriteLine(s.Value);
    }
    foreach (Capture s in m.Groups["single"].Captures)
    {
        Console.WriteLine(s.Value);
    }

Output:

    1 2
    5 6
    3

If you need the order preserved, I would probably go with Blam's suggestion to use a second regex.
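
As a rough sketch of that second-regex idea (not Blam's actual code): take each capture of "x" from the question's pattern and run a simple `\d+` regex over its text. The class name `SecondPass` and the method name `NumbersPerX` are made up for this illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class SecondPass
{
    // Returns one string array per "x" capture, holding the numbers found in it.
    public static List<string[]> NumbersPerX(string input)
    {
        // Pattern taken from the question
        const string outer = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
        var result = new List<string[]>();

        Match match = Regex.Match(input, outer);
        if (!match.Success)
        {
            return result;
        }

        foreach (Capture x in match.Groups["x"].Captures)
        {
            // Second pass: pull the numbers out of this one capture's text
            var numbers = new List<string>();
            foreach (Match n in Regex.Matches(x.Value, @"\d+"))
            {
                numbers.Add(n.Value);
            }
            result.Add(numbers.ToArray());
        }
        return result;
    }
}
```

Calling `SecondPass.NumbersPerX("abc d x 1 2 x 3 x 5 6 e fgh")` should give three arrays in document order, so a missing second number stays attached to the right "x".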

+3

I suggest you look into Balancing Groups, a feature unique to .NET regex.

Here is a regex that uses it to stop the match when an item (either a non-digit or an X) is found that closes the group. The matches are then accessed via the captures as needed:

    string data = "abc d x 1 2 x 3 x 5 6 e fgh";

    string pattern = @"(?xn)    # Specify options in the pattern
                                # x - to comment (IgnorePatternWhitespace)
                                # n - Explicit Capture to ignore non named matches

    (?<X>x)                     # Push the X on the balanced group
      ((\s)(?<Numbers>\d+))+    # Load up on any numbers into the capture group
    (?(Paren)(?!))              # Stop any match that has an X
                                #(the end of the balance group)";

    var results = Regex.Matches(data, pattern)
                       .OfType<Match>()
                       .Select((mt, index) => string.Format("Match {0}: {1}",
                                   index,
                                   string.Join(", ",
                                               mt.Groups["Numbers"]
                                                 .Captures
                                                 .OfType<Capture>()
                                                 .Select(cp => cp.Value))));

    results.ToList()
           .ForEach(result => Console.WriteLine(result));

    /* Output
    Match 0: 1, 2
    Match 1: 3
    Match 2: 5, 6
    */

+2

I have seen OmegaMan's answer and know that you prefer C# code over a regex solution, but I still wanted to present one alternative.

In .NET, you can reuse named groups. Every time something is captured with such a group, it is pushed onto that group's stack (this is what OmegaMan was referring to with "balancing groups"). You can use this to push an empty capture onto the stack for each x found:

 string pattern = @"d (?<x>x(?<d>) (?:(?<d>\d+) )*)*e"; 

So now, after matching x, (?<d>) pushes an empty capture onto the stack. Here is the output of Console.WriteLine, one line per capture, with each empty capture shown as <empty>:

    <empty>
    1
    2
    <empty>
    3
    <empty>
    5
    6

Therefore, when you iterate over Regex.Match(input, pattern).Groups["d"].Captures and encounter an empty capture, you know that a new group of numbers has begun.
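
A minimal sketch of that iteration, assuming the pattern above and an input with a space between "d" and the first "x". The class name `MarkerGrouping` and the method name `GroupNumbers` are invented for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class MarkerGrouping
{
    // Splits the captures of group "d" into one list per "x". A new list is
    // started whenever an empty capture (the marker from (?<d>)) is seen.
    public static List<List<string>> GroupNumbers(string input)
    {
        const string pattern = @"d (?<x>x(?<d>) (?:(?<d>\d+) )*)*e";
        var groups = new List<List<string>>();

        Match match = Regex.Match(input, pattern);
        if (!match.Success)
        {
            return groups;
        }

        foreach (Capture capture in match.Groups["d"].Captures)
        {
            if (capture.Length == 0)
            {
                // Empty marker: a new "x" section begins here
                groups.Add(new List<string>());
            }
            else
            {
                // In this pattern a number can only follow a marker,
                // so a current list always exists at this point
                groups[groups.Count - 1].Add(capture.Value);
            }
        }
        return groups;
    }
}
```

For the input "abc d x 1 2 x 3 x 5 6 e fgh" this should produce three lists: 1 and 2, then 3, then 5 and 6.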

+1
