How to quickly find duplicates in a List<T> and update the original collection

Let me start by saying that I have read the following questions: 1 and 2, and I understand that I can write code to find duplicates in my list. My problem is that I want to update the original list, not just query and print the duplicates.

I know that I can't update the collection that the query returns, since it is not a view; it is an anonymous IEnumerable<T>.

I want to find the duplicates in my list and set a property I created, called State, on each of them. That property will be used later in the application.

Has anyone encountered this problem and could you point me in the right direction?

P.S. The approach I am using at the moment is a bubble-sort-style loop that walks the list item by item and compares the key fields. Obviously, this is not the fastest method.

EDIT:

For an item to be considered a duplicate, there are three fields that must match. We will call them Field1, Field2 and Field3.

I have an overloaded Equals() method in a base class that compares these fields.

The only time I skip an object in my MarkDuplicates() method is when its state is UNKNOWN or ERROR; otherwise I test it.

Let me know if you need more information.

Thanks again!

+4
3 answers

I think the easiest way is to start by writing an extension method that finds duplicates in a list of objects. Since your objects implement .Equals(), they can be compared correctly by most of the standard collections.

    public static IEnumerable<T> FindDuplicates<T>(this IEnumerable<T> enumerable)
    {
        var hashset = new HashSet<T>();
        foreach (var cur in enumerable)
        {
            // Add returns false when an equal item is already in the set
            if (!hashset.Add(cur))
            {
                yield return cur;
            }
        }
    }

Now it becomes quite easy to update the duplicates in the collection. For instance:

    List<SomeType> list = GetTheList();
    list
        .FindDuplicates()
        .ToList()
        .ForEach(x => x.State = "DUPLICATE");

If you already have a ForEach extension method defined in your code, you can skip the .ToList() call.
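One thing worth spelling out: HashSet<T> relies on GetHashCode() as well as Equals(), so both need to agree for this to work. Below is a minimal sketch of the whole thing put together. The Item class, its property types, and the "NEW" default state are assumptions made up for illustration; only Field1, Field2, Field3 and State come from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type: the field names follow the question,
// the types and the "NEW" default are assumptions.
public class Item
{
    public string Field1 { get; set; }
    public string Field2 { get; set; }
    public int Field3 { get; set; }
    public string State { get; set; } = "NEW";

    public override bool Equals(object obj) =>
        obj is Item other
        && Field1 == other.Field1
        && Field2 == other.Field2
        && Field3 == other.Field3;

    // Must be consistent with Equals, otherwise HashSet<T> will not
    // recognise two equal items as the same entry.
    public override int GetHashCode() => (Field1, Field2, Field3).GetHashCode();
}

public static class DuplicateExtensions
{
    public static IEnumerable<T> FindDuplicates<T>(this IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (var item in source)
        {
            // Add returns false when an equal item was seen earlier
            if (!seen.Add(item))
                yield return item;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var list = new List<Item>
        {
            new Item { Field1 = "a", Field2 = "b", Field3 = 1 },
            new Item { Field1 = "a", Field2 = "b", Field3 = 1 },
            new Item { Field1 = "x", Field2 = "y", Field3 = 2 },
        };

        // The objects yielded are the same instances held by the list,
        // so setting State here updates the original collection.
        foreach (var dup in list.FindDuplicates())
            dup.State = "DUPLICATE";

        foreach (var item in list)
            Console.WriteLine($"{item.Field1}/{item.Field2}/{item.Field3}: {item.State}");
    }
}
```

Note that this only marks the second and later occurrences; the first item of each group keeps its original state. If the first occurrence should be marked too, a grouping approach is probably a better fit.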

+8

Your objects have some kind of state property. You are presumably finding duplicates based on another property or set of properties. Why not:

    List<object> keys = new List<object>();
    foreach (MyObject obj in myList)
    {
        // Keys seen before mean this object is a duplicate
        if (keys.Contains(obj.keyProperty))
            obj.state = "something indicating a duplicate here";
        else
            keys.Add(obj.keyProperty);
    }
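Since List<T>.Contains scans the whole list on every call, the loop above is still quadratic for large inputs. A rough sketch of the same idea with a HashSet keyed on the three fields from the question follows. The Record class, the string field types, and the MarkDuplicates signature are assumptions for illustration; the skip rule for UNKNOWN and ERROR is taken from the question's edit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; field names follow the question, types are assumed.
public class Record
{
    public string Field1 { get; set; }
    public string Field2 { get; set; }
    public string Field3 { get; set; }
    public string State { get; set; } = "NEW";
}

public static class DuplicateMarker
{
    // Marks second and later occurrences in place and returns how many were marked.
    public static int MarkDuplicates(IEnumerable<Record> records)
    {
        // Value tuples compare field by field, so no Equals override is needed here
        var seen = new HashSet<(string, string, string)>();
        int marked = 0;

        foreach (var r in records)
        {
            // Skip objects the question says should not be tested
            if (r.State == "UNKNOWN" || r.State == "ERROR")
                continue;

            if (!seen.Add((r.Field1, r.Field2, r.Field3)))
            {
                r.State = "DUPLICATE";
                marked++;
            }
        }
        return marked;
    }
}

public static class Program
{
    public static void Main()
    {
        var myList = new List<Record>
        {
            new Record { Field1 = "a", Field2 = "b", Field3 = "c" },
            new Record { Field1 = "a", Field2 = "b", Field3 = "c", State = "ERROR" },
            new Record { Field1 = "a", Field2 = "b", Field3 = "c" },
        };

        int count = DuplicateMarker.MarkDuplicates(myList);
        Console.WriteLine($"Marked {count} duplicate(s)");
        foreach (var r in myList)
            Console.WriteLine(r.State);
    }
}
```

Each item is checked once against the set, so the pass is roughly linear in the size of the list rather than quadratic.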
+1
    IEnumerable<T> oldList;
    IEnumerable<T> list;
    foreach (var n in oldList.Intersect(list))
        n.State = "Duplicate";

Edit: I need to lrn2read. This code is for two lists. My bad.
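For anyone who does have the two-list case, here is a small sketch of how the Intersect version might look when equality is supplied through a comparer instead of an Equals override. The Entry class, its Key property, and EntryKeyComparer are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with a single key property, for illustration only.
public class Entry
{
    public string Key { get; set; }
    public string State { get; set; } = "NEW";
}

// Compares entries by Key so Intersect does not depend on Entry.Equals.
public sealed class EntryKeyComparer : IEqualityComparer<Entry>
{
    public bool Equals(Entry x, Entry y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Key == y.Key);

    public int GetHashCode(Entry e) => e?.Key?.GetHashCode() ?? 0;
}

public static class Program
{
    public static void Main()
    {
        var oldList = new List<Entry> { new Entry { Key = "a" }, new Entry { Key = "b" } };
        var list = new List<Entry> { new Entry { Key = "b" }, new Entry { Key = "c" } };

        // Intersect yields elements of the first sequence (oldList) that
        // also appear in the second, so only oldList's objects get marked.
        foreach (var n in oldList.Intersect(list, new EntryKeyComparer()))
            n.State = "Duplicate";

        foreach (var e in oldList)
            Console.WriteLine($"{e.Key}: {e.State}");
    }
}
```

Keep in mind that the matching objects in the second list are left untouched; only the instances from the first sequence are returned.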

+1
