Regular expression with non-capturing group in C#

I am using the following regex

JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)* 

on data of the following form:

      JOINTS               DISPL.-X               DISPL.-Y               ROTATION


         1            0.000000E+00           0.975415E+01           0.616921E+01
         2            0.000000E+00           0.000000E+00           0.000000E+00

The idea is to extract two captures, each containing one row (starting with the joint number: 1, 2, etc.). The C# code is as follows:

    string jointPattern = @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";
    MatchCollection mc = Regex.Matches(outFileSection, jointPattern);
    foreach (Capture c in mc[0].Captures)
    {
        JointOutput j = new JointOutput();
        string[] vals = c.Value.Split();
        j.Joint = int.Parse(vals[0]) - 1;
        j.XDisplacement = float.Parse(vals[1]);
        j.YDisplacement = float.Parse(vals[2]);
        j.Rotation = float.Parse(vals[3]);
        joints.Add(j);
    }

However, this does not work: instead of returning two captures (one per row, from the inner group), it returns a single one: the entire block, including the column headings. Why is this happening? Does C# handle non-capturing groups differently?

Finally, are regexes the best way to do this? (I really feel like I now have two problems.)

+8
c# regex string-parsing
4 answers

mc[0].Captures is equivalent to mc[0].Groups[0].Captures. Groups[0] always refers to the entire match, so there is only ever one capture associated with it. The part you are looking for is captured in group #1, so you should use mc[0].Groups[1].Captures.
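To see the difference in isolation, here is a minimal sketch that uses a simplified pattern and a made-up three-line input (not the file from the question). The pattern has the same shape as the one in the question: a capturing group repeated by an outer non-capturing group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input: a header line followed by two rows (LF line endings).
string input = "HEADER\n1 a\n2 b\n";

// Capturing group 1 sits inside a repeated non-capturing group.
Match m = Regex.Match(input, @"HEADER\n(?:(\d \S+)\n)*");

// Match.Captures is the same collection as Groups[0].Captures:
// a single capture holding the entire match.
string[] wholeMatch = m.Captures.Cast<Capture>().Select(c => c.Value).ToArray();

// Groups[1].Captures keeps one capture per repetition of the group.
string[] perRow = m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(wholeMatch.Length);            // 1
Console.WriteLine(perRow.Length);                // 2
Console.WriteLine(string.Join(" | ", perRow));   // 1 a | 2 b

// Groups[1].Value on its own is only the LAST repetition.
Console.WriteLine(m.Groups[1].Value);            // 2 b
```

The last line is why iterating Groups[1].Captures is needed: Groups[1].Value alone would silently drop every row but the final one.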

But your regular expression is designed to match the whole input in one go, so Matches() will always return a MatchCollection with only one match (assuming the match succeeds). You can use Match() instead:

    Match m = Regex.Match(source, jointPattern);
    if (m.Success)
    {
        foreach (Capture c in m.Groups[1].Captures)
        {
            Console.WriteLine(c.Value);
        }
    }

Output:

    1            0.000000E+00           0.975415E+01           0.616921E+01
    2            0.000000E+00           0.000000E+00           0.000000E+00
+8

I wouldn't use Regex for the heavy lifting; I'd just split the text and parse it:

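    using System.Linq;
    using System.Text.RegularExpressions;

    var data = @"
      JOINTS               DISPL.-X               DISPL.-Y               ROTATION


         1            0.000000E+00           0.975415E+01           0.616921E+01
         2            0.000000E+00           0.000000E+00           0.000000E+00";

    // Split into lines and drop blank or whitespace-only ones.
    var lines = data.Split('\r', '\n').Where(s => !string.IsNullOrWhiteSpace(s));

    // Each run of non-whitespace characters is one field.
    var regex = new Regex(@"(\S+)");
    var dataItems = lines
        .Select(s => regex.Matches(s))
        .Select(m => m.Cast<Match>().Select(c => c.Value));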

+2

There are two problems: the repeating part (?:...) doesn't match correctly, and .* is greedy, so it consumes as much as it can (in .NET, by default, everything up to the next \n, including any \r) and the repeating part doesn't get the chance to match where it should.

Use this instead:

 JOINTS.*?[\r\n]+(?:\s*(\d+\s*\S*\s*\S*\s*\S*)[\r\n\s]*)* 

This makes the leading part lazy (non-greedy), ensures that the row-matching part starts on a new line (not in the middle of the header), and uses [\r\n\s]* in case the line endings are not quite what you expect.
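A quick way to check this is to run the pattern on the same rows with both Windows (\r\n) and Unix (\n) line endings. The sample lines below are an assumed layout (header, two blank lines, two rows; the column spacing is a guess), and the question's original pattern is included for comparison: because it requires a literal \r\n after each row, it finds no rows at all in LF-only text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer above, and the question's original pattern for comparison.
const string improved = @"JOINTS.*?[\r\n]+(?:\s*(\d+\s*\S*\s*\S*\s*\S*)[\r\n\s]*)*";
const string original = @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";

// Assumed file layout: header, two blank lines, one row per joint.
string[] lines =
{
    "  JOINTS               DISPL.-X               DISPL.-Y               ROTATION",
    "",
    "",
    "     1            0.000000E+00           0.975415E+01           0.616921E+01",
    "     2            0.000000E+00           0.000000E+00           0.000000E+00",
};

// Joins the lines with the given line ending and returns every capture of group 1.
string[] Rows(string pattern, string newline)
{
    string text = string.Join(newline, lines) + newline;
    Match m = Regex.Match(text, pattern);
    return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
}

string[] crlfRows = Rows(improved, "\r\n");
string[] lfRows = Rows(improved, "\n");
string[] originalLfRows = Rows(original, "\n"); // no "\r" in the text, so no rows

Console.WriteLine($"improved: {crlfRows.Length} rows (CRLF), {lfRows.Length} rows (LF)"); // 2, 2
Console.WriteLine($"original: {originalLfRows.Length} rows (LF)");                          // 0
```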

Personally, I would use regular expressions for this, but then I like regular expressions :-) If you know the structure will always be [header] [newline] [newline] [rows], it may be more straightforward (if less flexible) to simply split on newlines and process each line accordingly.
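A sketch of the split-based alternative, assuming the header is the first non-empty line and every data row has exactly four whitespace-separated fields (the sample text is an assumed layout). It uses double rather than the question's float, and parses with CultureInfo.InvariantCulture: float.Parse and double.Parse without a format provider use the current culture, which may not use '.' as the decimal separator.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Assumed input: header, blank lines, then one row per joint (CRLF line endings).
string text =
    "  JOINTS     DISPL.-X      DISPL.-Y      ROTATION\r\n\r\n\r\n" +
    "     1   0.000000E+00  0.975415E+01  0.616921E+01\r\n" +
    "     2   0.000000E+00  0.000000E+00  0.000000E+00\r\n";

var rows = text
    .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .Skip(1)                                         // drop the header line
    // An empty separator array means "split on any whitespace".
    .Select(line => line.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries))
    .Where(fields => fields.Length == 4)             // keep only well-formed rows
    .Select(f => (
        Joint: int.Parse(f[0], CultureInfo.InvariantCulture),
        X: double.Parse(f[1], CultureInfo.InvariantCulture),
        Y: double.Parse(f[2], CultureInfo.InvariantCulture),
        Rotation: double.Parse(f[3], CultureInfo.InvariantCulture)))
    .ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.Joint}: {r.X} {r.Y} {r.Rotation}");
```

The trade-off is the one described above: this relies on the header being exactly one line, whereas the regex versions locate the rows by their shape.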

Finally, you can use regex101.com or one of many other regex testing sites to help debug your regular expressions.

+1

Why not just capture the values and ignore the rest? Here is a regular expression that gets the values:

    using System.Linq;
    using System.Text.RegularExpressions;

    string data = @"JOINTS               DISPL.-X               DISPL.-Y               ROTATION


         1            0.000000E+00           0.975415E+01           0.616921E+01
         2            0.000000E+00           0.000000E+00           0.000000E+00";

    // Whitespace in the pattern is ignored thanks to IgnorePatternWhitespace.
    string pattern = @"^ \s+ (?<Joint>\d+) \s+ (?<ValX>[^\s]+) \s+ (?<ValY>[^\s]+) \s+ (?<Rotation>[^\s]+)";

    var result = Regex.Matches(data, pattern,
                     RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture)
        .OfType<Match>()
        .Select(mt => new
        {
            Joint = mt.Groups["Joint"].Value,
            ValX = mt.Groups["ValX"].Value,
            ValY = mt.Groups["ValY"].Value,
            Rotation = mt.Groups["Rotation"].Value,
        });

    /* result is IEnumerable<> (2 items)
       Joint  ValX          ValY          Rotation
       1      0.000000E+00  0.975415E+01  0.616921E+01
       2      0.000000E+00  0.000000E+00  0.000000E+00
    */
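A small check of this pattern against multi-line input with CRLF line endings (the column spacing is an assumed layout). It shows that the header line is skipped, because after its leading whitespace comes JOINTS rather than a digit. The first match actually begins on the blank line after the header, since \s+ also consumes line breaks, but the named groups still hold only the row values.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer above.
const string pattern = @"^ \s+ (?<Joint>\d+) \s+ (?<ValX>[^\s]+) \s+ (?<ValY>[^\s]+) \s+ (?<Rotation>[^\s]+)";

// Assumed layout: every data row starts with leading spaces.
string data =
    "  JOINTS     DISPL.-X      DISPL.-Y      ROTATION\r\n\r\n\r\n" +
    "     1   0.000000E+00  0.975415E+01  0.616921E+01\r\n" +
    "     2   0.000000E+00  0.000000E+00  0.000000E+00\r\n";

var options = RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture;

// Keep just the joint number and rotation to show which rows were matched.
var joints = Regex.Matches(data, pattern, options)
    .Cast<Match>()
    .Select(m => (Joint: m.Groups["Joint"].Value, Rotation: m.Groups["Rotation"].Value))
    .ToList();

foreach (var j in joints)
    Console.WriteLine($"{j.Joint} -> {j.Rotation}");
// 1 -> 0.616921E+01
// 2 -> 0.000000E+00
```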
+1
