.NET's regular expression implementation offers the Captures collection, so you can get every instance of a repeating group, not just the last one. That is great, but I have a repeating group with subgroups, and I need the subgroup values as they relate to each other within each instance of the parent group. I cannot find a way to do that. Any suggestions?
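To make the difference between the last capture and the full collection concrete, here is a minimal sketch. The input string and the group name "n" are made up purely for illustration and have nothing to do with my real data:

```csharp
using System;
using System.Text.RegularExpressions;

class CapturesDemo
{
    static void Main()
    {
        // Hypothetical input and group name, chosen only for illustration.
        Match m = Regex.Match("a1 a2 a3", @"(?:a(?<n>\d) ?)+");

        // Group.Value exposes only the last capture of a repeated group.
        Console.WriteLine(m.Groups["n"].Value);   // 3

        // Group.Captures exposes every iteration, in match order.
        foreach (Capture c in m.Groups["n"].Captures)
        {
            Console.WriteLine(c.Value);           // 1, then 2, then 3
        }
    }
}
```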
I looked at a number of other questions, for example:
- Select multiple items in regex
- Regex .NET attached group
- How to get regular expression groups for a given capture?
but none of them gives an answer, either affirmative ("Yes, here's how") or negative ("No, it can't be done").
For a contrived example, let's say that I have this input line:
abc d x 1 2 x 3 x 5 6 e fgh
where "abc" and "fgh" represent text that I want to ignore in a larger document, "d" and "e" wrap the region of interest, and within that region "x n [n]" can repeat any number of times. It is the number pairs in the "x" areas that interest me.
So, I am parsing it with this regex pattern:
.*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*
which finds exactly one match in the document but captures the group "x" several times. These are the three pairs that I would like to extract in this example:
- 1, 2
- 3
- 5, 6
But how can I get them? I could do the following (in C#):
    using System;
    using System.Text.RegularExpressions;
    using System.Windows.Forms; // needed for MessageBox

    string input = "abc d x 1 2 x 3 x 5 6 e fgh";
    string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";

    // Show every capture of the repeating group "x".
    foreach (Capture x in Regex.Match(input, pattern).Groups["x"].Captures)
    {
        MessageBox.Show(x.ToString());
    }
and since I am referring to the group "x", I get these strings:
- x 1 2
- x 3
- x 5 6
But this does not get me to the numbers themselves. So I could read "fir" and "sec" independently, instead of just "x":
    using System;
    using System.Text.RegularExpressions;
    using System.Windows.Forms; // needed for MessageBox

    string input = "abc d x 1 2 x 3 x 5 6 e fgh";
    string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
    Match m = Regex.Match(input, pattern);

    // Show every capture of the first-number subgroup.
    foreach (Capture f in m.Groups["fir"].Captures)
    {
        MessageBox.Show(f.ToString());
    }

    // Show every capture of the second-number subgroup.
    foreach (Capture s in m.Groups["sec"].Captures)
    {
        MessageBox.Show(s.ToString());
    }
That produces 1, 3, 5 for "fir" and 2, 6 for "sec",
but then I cannot tell that it is the second pair that is missing its second number (the “4”), rather than one of the other pairs.
So what to do? I know I could easily parse this in C#, or even with a second regular expression applied to each "x" capture, but since the first Regex run has already done all the work and the results are already known, it seems there should be a way to manipulate the Match object to get what I need.
And remember that this is a contrived example; the real-world case is somewhat more complicated, so just throwing extra C# code at it would be a pain. But if the existing .NET objects cannot do this, then I just need to know that, and I will move on.
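For reference, the only fallback I can think of with the existing objects is to correlate captures by position, using the Index and Length properties that every Capture exposes. This is only a sketch under the assumption that the "x" captures never overlap, so a subgroup capture belongs to whichever "x" capture contains it. The helper name Within is my own and not part of .NET:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class SubgroupDemo
{
    // Hypothetical helper: true when 'inner' lies entirely inside 'outer'.
    static bool Within(Capture inner, Capture outer)
    {
        return inner.Index >= outer.Index
            && inner.Index + inner.Length <= outer.Index + outer.Length;
    }

    static void Main()
    {
        string input = "abc d x 1 2 x 3 x 5 6 e fgh";
        string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
        Match m = Regex.Match(input, pattern);

        foreach (Capture x in m.Groups["x"].Captures)
        {
            // Pick the subgroup capture that falls inside this "x" capture, if any.
            Capture fir = m.Groups["fir"].Captures.Cast<Capture>()
                           .FirstOrDefault(c => Within(c, x));
            Capture sec = m.Groups["sec"].Captures.Cast<Capture>()
                           .FirstOrDefault(c => Within(c, x));

            // A subgroup that did not participate in this iteration is shown as "-".
            Console.WriteLine("fir=" + (fir != null ? fir.Value : "-")
                            + ", sec=" + (sec != null ? sec.Value : "-"));
        }
        // Prints:
        // fir=1, sec=2
        // fir=3, sec=-
        // fir=5, sec=6
    }
}
```

It works for this example, but it is exactly the kind of extra C# code I was hoping to avoid, and I am not sure how well it holds up once groups nest more deeply.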
Thoughts?