How to match both numbers and a range of numbers in a CSV-like string with a regular expression?

I usually enjoy regular expression problems, and solving them even more.
But this time I have a case that I cannot figure out.

I have a string of values separated by semicolons, like a CSV line, which might look like this: 123-234;FOO-456;45-67;FOO-FOO;890;FOO-123;11-22;123;123;44-55;098-567;890;123-FOO;

In this string, I would like to match all integers and integer ranges so that I can extract them later. The string may also contain just a single value (with no semicolon).

After a lot of searching, I managed to write this expression:
(?:^|;)(?<range>\d+-\d+)(?:$|;)|(?:^|;)(?<integer>\d+)(?:$|;)

Test lines used:

  • 123
  • 123-234;FOO-456;45-67;FOO-FOO;890;FOO-123;11-22;123;123;44-55;098-567;890;123-FOO;
  • 123-456
  • 123-FOO
  • FOO-123
  • FOO-FOO

Lines 1 and 3 match correctly, and lines 4, 5 and 6 do not match (as intended).
In line 2, only about one value out of every two is matched correctly.

Here is a link to regex101.com that illustrates this: https://regex101.com/r/zA7uI9/5

I will also need to select integers and ranges separately (in different groups).

Note: I found a question that could help me and tried its answer (adapting it), but that didn't work: Regular expression to match numbers and number ranges

Do you have any ideas on what I am missing?

The language that will "use" this regular expression is C#, but I don't know if this is useful information for my problem.

added by barlop

Below are the matches produced by the current regex, as shown at the regex101.com link,

for this test line: 123-234;FOO-456;45-67;FOO-FOO;890;FOO-123;11-22;123;123;44-55;098-567;89

 123-234 45-67 890 11-22 123 098-567 

So the regular expression misses one of the 123s, as well as 44-55 and the 89 at the end.

+6
3 answers

C# CSV String Parsing

Use the built-in CSV parser and check each field separately:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using Microsoft.VisualBasic.FileIO;
    ....
    var str = "123-234;FOO-456;45-67;FOO-FOO;890;FOO-123;11-22;123;123;44-55;098-567;890;123-FOO;";
    var csv_parser = new TextFieldParser(new StringReader(str));
    csv_parser.HasFieldsEnclosedInQuotes = false; // Fields are not enclosed in quotes
    csv_parser.SetDelimiters(";");                // Set the delimiter
    var range_fields = new List<string>();
    var integer_fields = new List<string>();
    while (!csv_parser.EndOfData)
    {
        string[] fields = csv_parser.ReadFields();
        foreach (var field in fields)
        {
            // Skip empty fields, such as the one after the trailing ';'
            if (string.IsNullOrWhiteSpace(field))
                continue;
            if (field.All(x => Char.IsDigit(x)))
            {
                integer_fields.Add(field);
                Console.WriteLine(string.Format("Integer field: {0}", field));
            }
            // Anchored with ^ and $ so that the whole field must be a range
            else if (Regex.IsMatch(field, @"^\d+-\d+$"))
            {
                range_fields.Add(field);
                Console.WriteLine(string.Format("Range field: {0}", field));
            }
        }
    }
    csv_parser.Close();

Results:

    Range field: 123-234
    Range field: 45-67
    Integer field: 890
    Range field: 11-22
    Integer field: 123
    Integer field: 123
    Range field: 44-55
    Range field: 098-567
    Integer field: 890
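If pulling in Microsoft.VisualBasic.FileIO is not wanted, the same "check each field separately" idea can be sketched with string.Split. This swaps TextFieldParser for a plain split, which is only reasonable under the assumption that fields never contain quoted or escaped semicolons. The class name SplitClassifier and the "integer" / "range" labels are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitClassifier
{
    // Splits on ';' and labels each field; fields that are neither an
    // integer nor a range (including empty ones) are dropped.
    public static List<KeyValuePair<string, string>> Classify(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (var field in input.Split(';'))
        {
            if (Regex.IsMatch(field, @"^\d+$"))
                result.Add(new KeyValuePair<string, string>("integer", field));
            else if (Regex.IsMatch(field, @"^\d+-\d+$"))
                result.Add(new KeyValuePair<string, string>("range", field));
        }
        return result;
    }
}
```

Because each field is tested on its own, both patterns can simply be anchored with ^ and $, and no lookarounds are needed.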

Fixing the Regular Expression Approach

The reason your regular expression fails is that you actually consume the delimiters with the non-capturing groups: (?:^|;) and (?:$|;) still match text, that text is added to the match value, and the regex index advances to the position after the ; (or the start/end of the line). The ; that ends one match can therefore no longer serve as the leading ; of the next field, so that field is skipped.

What you need to use is lookarounds. They do not consume text; they only check whether text matching the lookaround pattern can or cannot be found before or after the current position. This gives you a chance to obtain overlapping matches, and this is one of the scenarios where lookarounds are very handy.

 (?<=^|;)((?<range>\d+-\d+)|(?<integer>\d+))(?=$|;) 
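To make the difference concrete, here is a minimal sketch that runs both patterns through the same helper. The class name DelimiterDemo and the method Extract are invented for this illustration; only the two patterns come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // Pattern from the question: the delimiters are consumed by each match.
    public const string Consuming =
        @"(?:^|;)(?<range>\d+-\d+)(?:$|;)|(?:^|;)(?<integer>\d+)(?:$|;)";

    // Lookaround version: the delimiters are only checked, never consumed.
    public const string Lookaround =
        @"(?<=^|;)((?<range>\d+-\d+)|(?<integer>\d+))(?=$|;)";

    // Returns the captured field (range or integer) of every match, in order.
    public static List<string> Extract(string pattern, string input)
    {
        return Regex.Matches(input, pattern, RegexOptions.ExplicitCapture)
            .Cast<Match>()
            .Select(m => m.Groups["range"].Success
                ? m.Groups["range"].Value
                : m.Groups["integer"].Value)
            .ToList();
    }
}
```

For the input 123;123;44-55, Extract with Consuming returns 123 and 44-55, because the first match swallows the ; that the second 123 would need in front of it. With Lookaround, all three fields are returned.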

See the regex demo at RegexStorm, which supports the .NET regex syntax.


Note the use of the RegexOptions.ExplicitCapture flag: this way, unnamed (i.e. numbered) groups are not captured, and we only get the named groups (just what we need).

C# demo:

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    var s = "123-234;FOO-456;45-67;FOO-FOO;890;FOO-123;11-22;123;123;44-55;098-567;890;123-FOO;";
    var rx = new Regex(@"(?<=^|;)((?<range>\d+-\d+)|(?<integer>\d+))(?=$|;)",
        RegexOptions.ExplicitCapture);
    // For each match, take whichever named group actually participated
    var result = rx.Matches(s)
        .Cast<Match>()
        .Select(x => x.Groups["range"].Success
            ? x.Groups["range"].Value
            : x.Groups["integer"].Value)
        .ToList();
    foreach (var x in result)
        Console.WriteLine(x);
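Since the question also asks for integers and ranges in different groups, the named groups can be used to route each match into its own list. The following is only a sketch: FieldSplitter and Split are hypothetical names, and the tuple return type assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FieldSplitter
{
    private static readonly Regex Rx = new Regex(
        @"(?<=^|;)((?<range>\d+-\d+)|(?<integer>\d+))(?=$|;)",
        RegexOptions.ExplicitCapture);

    // Returns integers and ranges as two separate lists, each in input order.
    public static (List<string> Integers, List<string> Ranges) Split(string input)
    {
        var integers = new List<string>();
        var ranges = new List<string>();
        foreach (Match m in Rx.Matches(input))
        {
            // Exactly one of the two named groups succeeds per match.
            if (m.Groups["range"].Success)
                ranges.Add(m.Groups["range"].Value);
            else
                integers.Add(m.Groups["integer"].Value);
        }
        return (integers, ranges);
    }
}
```

Called on the sample string from the question, Split yields the integers 890, 123, 123, 890 and the ranges 123-234, 45-67, 11-22, 44-55, 098-567.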
+6

I can't easily see the capture groups in regex101, so the grouping may need some tweaking, but this finds all the correct matches. I hope someone posts an improved answer, but in the meantime:

 (^\d+(?=;|$))|((?<=;)\d+$)|(?<=;)\d+(?=;)|\d+-\d+ 

(Regular expression visualization, added by ro yo)

Logic:

It matches if (^\d+(?=;|$)) OR ((?<=;)\d+$) OR (?<=;)\d+(?=;) OR \d+-\d+

i.e. for example 123 at the beginning (or on its own), 123 at the end, 123 in the middle, or a range anywhere.
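The four branches can be tried from C# as well. This is a rough sketch: the class name FourBranchPattern is made up, and the pattern is split across lines only so that each branch can carry a comment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FourBranchPattern
{
    // The four alternatives from the answer, concatenated into one pattern.
    private const string Pattern =
        @"(^\d+(?=;|$))" +    // integer at the start (or on its own)
        @"|((?<=;)\d+$)" +    // integer at the end
        @"|(?<=;)\d+(?=;)" +  // integer in the middle
        @"|\d+-\d+";          // range anywhere (not tied to a delimiter)

    // Returns the text of every match, in input order.
    public static List<string> Matches(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

For 1;2-3;FOO-4;5 this returns 1, 2-3 and 5. Note that the last branch is not tied to a delimiter, so an input such as A1-2B would still yield 1-2; wrapping that branch in the same lookarounds would be one possible tightening.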

I can't get regex101.com to list the matches, but the regex works:

    C:\blah>echo 123-234;FOO-456;45-67;FOO-FOO;890;FOO-123;11-22;123;123;44-55;098-567;89| grep -oP "(^\d+(?=;))|((?<=;)\d+$)|(?<=;)\d+(?=;)|\d+-\d+"
    123-234
    45-67
    890
    11-22
    123
    123
    44-55
    098-567
    89
+2

Description

 (?<=;|^)[0-9]+(?:-[0-9]+|(?=;|$)) 

Regular expression visualization

This regex will do the following:

  • match the values of a semicolon-delimited string
  • from those values, extract single integers, such as 123, or ranges of integers, such as 123-456
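A short C# sketch of this pattern, assuming the .NET regex engine (which accepts the variable-width lookbehind). CompactPattern is a hypothetical name used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CompactPattern
{
    // After a ';' or the start: digits, then either "-digits" (a range)
    // or a lookahead for ';' or the end (a single integer).
    private static readonly Regex Rx =
        new Regex(@"(?<=;|^)[0-9]+(?:-[0-9]+|(?=;|$))");

    // Returns the text of every match, in input order.
    public static List<string> Matches(string input)
    {
        return Rx.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

For 123;1-2;FOO-3;4-FOO this returns 123 and 1-2. One thing to be aware of: the range branch has no trailing check, so a field like 1-2FOO would still produce 1-2.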

Example

Live demo

https://regex101.com/r/oL1cN2/2

Sample text

    123
    123-234;FOO-456;45-67;FOO-FOO;890;FOO-123;11-22;123;123;44-55;098-567;890;123-FOO;
    123-456
    123-FOO
    FOO-123
    FOO-FOO

Matching Examples

    123
    123-234
    45-67
    890
    11-22
    123
    123
    44-55
    098-567
    890
    123-456
+1
