If you need a LINQ approach, you can add a named capture group to the regular expression, keep only the elements that match it, group them by the captured number, and take the first line of each group. I like the readability of this solution, but I won't be surprised if there is a more efficient way to eliminate duplicates; let's see if anyone has a different approach.
Something like this:
```csharp
list.Where(s => regex.IsMatch(s))
    .GroupBy(s => regex.Match(s).Groups["num"].Value)
    .Select(g => g.First())
```
You can try it with this example:
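```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    // Matches a leading number followed by a dot and captures the number as "num".
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    public static void Main()
    {
        var list = new[] { "1.one", "2. two", "no number", "2.duplicate", "300. three hundred", "4-ignore this" };

        var distinctWithNumbers = list
            .Where(s => regex.IsMatch(s))                      // keep only numbered lines
            .GroupBy(s => regex.Match(s).Groups["num"].Value)  // group by the captured number
            .Select(g => g.First());                           // keep the first line per number

        distinctWithNumbers.ToList().ForEach(Console.WriteLine);
        Console.ReadKey();
    }
}
```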
You can try it in this fiddle.
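The query above runs the regex twice for every numbered line: once in `Where` (`IsMatch`) and again in `GroupBy` (`Match`). One possible tweak, offered as a sketch rather than a measured improvement, is to match once and carry the `Match` along in an anonymous object. The class and method names (`NumberedLines`, `FirstPerNumber`) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberedLines
{
    // Same pattern as in the example: leading digits captured as "num", then a dot.
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    // Hypothetical helper: runs the regex once per string, keeps the successful
    // matches, and returns the first original string for each captured number.
    public static IEnumerable<string> FirstPerNumber(IEnumerable<string> lines)
    {
        return lines
            .Select(s => new { Text = s, Match = regex.Match(s) })
            .Where(x => x.Match.Success)
            .GroupBy(x => x.Match.Groups["num"].Value)
            .Select(g => g.First().Text);
    }
}
```

For the sample list this yields `1.one`, `2. two` and `300. three hundred`, since LINQ to Objects `GroupBy` emits groups in the order their keys first appear.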
As @orad pointed out in the comments, MoreLINQ provides a `DistinctBy()` extension that can be used to eliminate duplicates instead of grouping and then taking the first element of each group:
```csharp
var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
                              .DistinctBy(s => regex.Match(s).Groups["num"].Value);
```
Try this fiddle.
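On .NET 6 and later, `System.Linq` also ships a built-in `Enumerable.DistinctBy(source, keySelector)`, so MoreLINQ is not strictly needed there (and importing both namespaces may make the call ambiguous). For older frameworks, a DistinctBy-style operator is small enough to write by hand. The sketch below uses a hypothetical name, `FirstByKey`, to avoid clashing with either library: it remembers seen keys in a `HashSet<TKey>` and yields each element whose key is new, so the first occurrence wins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyedDistinct
{
    // Hypothetical DistinctBy-style operator: lazily yields the first element for each key.
    public static IEnumerable<TSource> FirstByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }

    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    // Same query as above, with the hand-rolled operator in place of DistinctBy.
    public static IEnumerable<string> DistinctNumbered(IEnumerable<string> list)
    {
        return list.Where(s => regex.IsMatch(s))
                   .FirstByKey(s => regex.Match(s).Groups["num"].Value);
    }
}
```

Unlike `GroupBy`, this streams the input and never buffers whole groups, which is the main practical difference from the first approach.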
EDIT
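If you want to use your comparer, you also need to implement `GetHashCode` so that it hashes the same projected value that `Equals` compares. `Distinct` buckets items by hash code and only calls `Equals` for items whose hash codes match, so two strings with the same number must produce the same hash code: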
```csharp
public int GetHashCode(T obj)
{
    // Hash the projected value (the number), not the whole string.
    return _expr.Invoke(obj).GetHashCode();
}
```
Then you can create the comparer with a lambda that takes a string and extracts the number with the regular expression:
```csharp
var comparer = new GenericCompare<string>(s => regex.Match(s).Groups["num"].Value);
var distinctWithNumbers = list.Where(s => regex.IsMatch(s)).Distinct(comparer);
```
I created another fiddle using this approach.
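The question's `GenericCompare<T>` class is not reproduced in this answer, so the following is only a hypothetical reconstruction, assuming it implements `IEqualityComparer<T>` by wrapping a `Func<T, object>` that projects each element onto its comparison key. It shows `Equals` and `GetHashCode` working on the same projected value, with a null guard added in `GetHashCode`. `ComparerDemo` is likewise a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical reconstruction: assumes the comparer wraps a Func<T, object> key selector.
public class GenericCompare<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _expr;

    public GenericCompare(Func<T, object> expr)
    {
        _expr = expr ?? throw new ArgumentNullException(nameof(expr));
    }

    // Two elements are equal when their projected keys are equal.
    public bool Equals(T x, T y)
    {
        return object.Equals(_expr(x), _expr(y));
    }

    // Must agree with Equals: equal keys produce equal hash codes.
    public int GetHashCode(T obj)
    {
        object key = _expr(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class ComparerDemo
{
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    public static List<string> DistinctNumbered(IEnumerable<string> list)
    {
        var comparer = new GenericCompare<string>(s => regex.Match(s).Groups["num"].Value);
        return list.Where(s => regex.IsMatch(s)).Distinct(comparer).ToList();
    }
}
```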
Using a lookahead regex
You can use either of these approaches with the regular expression `@"^\d+(?=\.)"`. The lookahead `(?=\.)` requires a dot after the digits without including it in the match, so the whole match is just the number.
Just replace the lambda that reads the "num" group, `s => regex.Match(s).Groups["num"].Value`, with one that reads the whole match, `s => regex.Match(s).Value`.
Updated fiddle here.
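As a sketch of the lookahead variant applied to the `GroupBy` approach (the class and method names are hypothetical), note that no named group is needed because `Match.Value` already excludes the dot:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // (?=\.) asserts that a dot follows the digits but does not consume it,
    // so Match.Value is only the number, e.g. "300" for "300. three hundred".
    private static readonly Regex regex = new Regex(@"^\d+(?=\.)", RegexOptions.Compiled);

    public static IEnumerable<string> FirstPerNumber(IEnumerable<string> list)
    {
        return list.Where(s => regex.IsMatch(s))
                   .GroupBy(s => regex.Match(s).Value)
                   .Select(g => g.First());
    }
}
```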