Find objects with duplicate data in a list of objects

Using C# 3 and .NET Framework 3.5, I have a Person object:

    public class Person
    {
        public int Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public int SSN { get; set; }
    }

and I have a List of them:

    List<Person> persons = GetPersons();

How can I get all the Person objects whose SSN is not unique in the list, remove them from the persons list, and ideally add them to another list called "List<Person> dupes"?

The original list might look something like this:

    persons = new List<Person>();
    persons.Add(new Person { Id = 1, FirstName = "Chris", LastName = "Columbus", SSN = 111223333 }); // Is a dupe
    persons.Add(new Person { Id = 1, FirstName = "EE", LastName = "Cummings", SSN = 987654321 });
    persons.Add(new Person { Id = 1, FirstName = "John", LastName = "Steinbeck", SSN = 111223333 }); // Is a dupe
    persons.Add(new Person { Id = 1, FirstName = "Yogi", LastName = "Berra", SSN = 123456789 });

The end result should be Cummings and Berra in the original persons list, and Columbus and Steinbeck in the list called dupes.

Many thanks!

7 answers

This will give you the duplicated SSNs:

    var duplicatedSSN =
        from p in persons
        group p by p.SSN into g
        where g.Count() > 1
        select g.Key;

This gives you the list of duplicated persons:

    var duplicated = persons.FindAll(p => duplicatedSSN.Contains(p.SSN));

Then just iterate over the duplicates and remove them from persons:

    duplicated.ForEach(dup => persons.Remove(dup));
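Put together, those three steps might look like the runnable sketch below. It assumes the question's Person class and sample data; DuplicateSplitter and MoveDuplicates are hypothetical names. One design choice differs slightly: duplicatedSSN above is a deferred query, so each Contains call inside FindAll re-runs the grouping. Copying the SSNs into a HashSet<int> runs it once, and List<T>.RemoveAll replaces the per-item Remove loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

public static class DuplicateSplitter
{
    // Hypothetical helper: removes every person whose SSN occurs more than once
    // from 'persons' and returns those people (the question's "dupes" list).
    public static List<Person> MoveDuplicates(List<Person> persons)
    {
        // Step 1: SSNs that occur more than once, evaluated once into a set.
        var duplicatedSSN = new HashSet<int>(
            from p in persons
            group p by p.SSN into g
            where g.Count() > 1
            select g.Key);

        // Step 2: collect the people sharing one of those SSNs (list order kept).
        List<Person> dupes = persons.FindAll(p => duplicatedSSN.Contains(p.SSN));

        // Step 3: remove them from the original list in a single pass.
        persons.RemoveAll(p => duplicatedSSN.Contains(p.SSN));
        return dupes;
    }
}

public static class Program
{
    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Id = 1, FirstName = "Chris", LastName = "Columbus", SSN = 111223333 },
            new Person { Id = 1, FirstName = "EE", LastName = "Cummings", SSN = 987654321 },
            new Person { Id = 1, FirstName = "John", LastName = "Steinbeck", SSN = 111223333 },
            new Person { Id = 1, FirstName = "Yogi", LastName = "Berra", SSN = 123456789 }
        };

        List<Person> dupes = DuplicateSplitter.MoveDuplicates(persons);

        Console.WriteLine(string.Join(", ", persons.Select(p => p.LastName).ToArray()));
        Console.WriteLine(string.Join(", ", dupes.Select(p => p.LastName).ToArray()));
    }
}
```

With the question's data this prints "Cummings, Berra" and then "Columbus, Steinbeck".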

Thanks to gcores for starting me down the right path. Here is what I did:

    var duplicatedSSN =
        from p in persons
        group p by p.SSN into g
        where g.Count() > 1
        select g.Key;

    var duplicates = new List<Person>();
    foreach (var dupeSSN in duplicatedSSN)
    {
        foreach (var person in persons.FindAll(p => p.SSN == dupeSSN))
            duplicates.Add(person);
    }

    duplicates.ForEach(dup => persons.Remove(dup));
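Using Distinct() and Except():

    // Caveat: Distinct() and Except() compare with Person's default equality
    // (reference equality unless Equals/GetHashCode are overridden). Also, since
    // actualPersons holds every distinct element of persons, Except() returns an
    // empty list here.
    List<Person> actualPersons = persons.Distinct().ToList();
    List<Person> duplicatePersons = persons.Except(actualPersons).ToList();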
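One hedged way to make Distinct treat people with the same SSN as equal is to pass it an IEqualityComparer<Person>. The SsnComparer and FirstPerSsn names below are hypothetical. Note that this keeps one person per SSN rather than moving every duplicated person out as the question asks, and because the Distinct documentation describes its result as unordered, the sketch does not rely on which person is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

// Hypothetical comparer: two Person objects count as equal when their SSNs match.
public class SsnComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SSN == y.SSN;
    }

    public int GetHashCode(Person person)
    {
        return person == null ? 0 : person.SSN.GetHashCode();
    }
}

public static class Program
{
    // Returns one person per SSN, using the SSN-based comparer.
    public static List<Person> FirstPerSsn(List<Person> persons)
    {
        return persons.Distinct(new SsnComparer()).ToList();
    }

    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Id = 1, FirstName = "Chris", LastName = "Columbus", SSN = 111223333 },
            new Person { Id = 1, FirstName = "EE", LastName = "Cummings", SSN = 987654321 },
            new Person { Id = 1, FirstName = "John", LastName = "Steinbeck", SSN = 111223333 },
            new Person { Id = 1, FirstName = "Yogi", LastName = "Berra", SSN = 123456789 }
        };

        // Three distinct SSNs, so three people remain.
        Console.WriteLine(FirstPerSsn(persons).Count);
    }
}
```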

Well, if you implement IComparable<Person> as follows:

    int IComparable<Person>.CompareTo(Person person)
    {
        return this.SSN.CompareTo(person.SSN);
    }

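then you can find people who share an SSN with a pairwise comparison like this (note that == on a class compares references, so the check has to go through CompareTo):

    for (int i = 0; i < persons.Count; i++)
    {
        for (int j = i + 1; j < persons.Count; j++)
        {
            // CompareTo is implemented explicitly, so call it through the interface.
            if (((IComparable<Person>)persons[i]).CompareTo(persons[j]) == 0)
            {
                // persons[i] and persons[j] share an SSN: duplicate
            }
        }
    }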
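The nested loops compare every pair, which is O(n²). Since Person already orders itself by SSN through IComparable<Person>, a hedged alternative is to sort a copy of the list once (O(n log n)) and then compare only neighbours, because people with the same SSN end up adjacent. FindDuplicatesSorted is a hypothetical name; List<T>.Sort() with no arguments uses the IComparable<Person> implementation.

```csharp
using System;
using System.Collections.Generic;

public class Person : IComparable<Person>
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }

    // Orders people by SSN only, as in the answer above.
    int IComparable<Person>.CompareTo(Person person)
    {
        return this.SSN.CompareTo(person.SSN);
    }
}

public static class Program
{
    // Returns every person whose SSN appears more than once.
    public static List<Person> FindDuplicatesSorted(List<Person> persons)
    {
        // Sort a copy so the caller's list keeps its original order.
        var sorted = new List<Person>(persons);
        sorted.Sort();

        var dupes = new List<Person>();
        for (int i = 0; i < sorted.Count; i++)
        {
            IComparable<Person> current = sorted[i];
            bool sameAsPrevious = i > 0 && current.CompareTo(sorted[i - 1]) == 0;
            bool sameAsNext = i + 1 < sorted.Count && current.CompareTo(sorted[i + 1]) == 0;
            if (sameAsPrevious || sameAsNext)
            {
                dupes.Add(sorted[i]);
            }
        }
        return dupes;
    }

    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Id = 1, FirstName = "Chris", LastName = "Columbus", SSN = 111223333 },
            new Person { Id = 1, FirstName = "EE", LastName = "Cummings", SSN = 987654321 },
            new Person { Id = 1, FirstName = "John", LastName = "Steinbeck", SSN = 111223333 },
            new Person { Id = 1, FirstName = "Yogi", LastName = "Berra", SSN = 123456789 }
        };

        foreach (Person dup in FindDuplicatesSorted(persons))
        {
            Console.WriteLine(dup.LastName);
        }
    }
}
```

List<T>.Sort is not stable, so the relative order of people who share an SSN in the result is not guaranteed.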

Go through the list and build a hash table (here a Dictionary) of SSN/count pairs. Then traverse the table and remove the persons whose SSN has a count > 1.

    // SSN is an int in the Person class, so the dictionary key is int too.
    Dictionary<int, int> ssnTable = new Dictionary<int, int>();
    foreach (Person person in persons)
    {
        if (ssnTable.ContainsKey(person.SSN))
            ssnTable[person.SSN]++;
        else
            ssnTable.Add(person.SSN, 1);
    }
    // traverse ssnTable here and remove the persons whose SSN count is > 1
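The traversal step left as a comment can be filled in roughly as follows. This sketch assumes SSN is an int, as in the question's class, and uses the counts to move people into a separate dupes list rather than deleting entries from the table; SplitBySsnCount is a hypothetical name. TryGetValue is used here because it sets count to 0 when the SSN has not been seen yet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

public static class Program
{
    // Hypothetical helper: moves every person whose SSN count is > 1
    // from 'persons' into 'dupes'.
    public static void SplitBySsnCount(List<Person> persons, List<Person> dupes)
    {
        // Pass 1: count how often each SSN occurs.
        var ssnTable = new Dictionary<int, int>();
        foreach (Person person in persons)
        {
            int count;
            ssnTable.TryGetValue(person.SSN, out count); // count is 0 for a new SSN
            ssnTable[person.SSN] = count + 1;
        }

        // Pass 2: move the people whose SSN count is greater than 1.
        dupes.AddRange(persons.Where(p => ssnTable[p.SSN] > 1));
        persons.RemoveAll(p => ssnTable[p.SSN] > 1);
    }

    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Id = 1, FirstName = "Chris", LastName = "Columbus", SSN = 111223333 },
            new Person { Id = 1, FirstName = "EE", LastName = "Cummings", SSN = 987654321 },
            new Person { Id = 1, FirstName = "John", LastName = "Steinbeck", SSN = 111223333 },
            new Person { Id = 1, FirstName = "Yogi", LastName = "Berra", SSN = 123456789 }
        };
        var dupes = new List<Person>();

        SplitBySsnCount(persons, dupes);

        Console.WriteLine(string.Join(", ", persons.Select(p => p.LastName).ToArray()));
        Console.WriteLine(string.Join(", ", dupes.Select(p => p.LastName).ToArray()));
    }
}
```

With the question's data this prints "Cummings, Berra" and then "Columbus, Steinbeck".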

Does persons have to be a List<Person>? What if it were a Dictionary<int, Person>?

    using System.Linq; // at the top of the file

    var persons = new Dictionary<int, Person>();
    ...
    // For each person you want to add to the list:
    var person = new Person { ... };
    if (!persons.ContainsKey(person.SSN))
    {
        persons.Add(person.SSN, person);
    }

    // If you absolutely, positively have to have a List:
    List<Person> personsList = persons.Values.ToList();

If you are working with unique instances of Person (as opposed to different instances that may have the same property values), you can get better performance with a HashSet.
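To illustrate the distinction between unique instances and equal property values: with the default comparer, a HashSet<Person> rejects the same instance added twice but accepts two different instances that merely share an SSN. This minimal sketch assumes Person does not override Equals or GetHashCode; HashSet<T>.Add returns false when the element is already present, and its lookups are O(1) on average, versus O(n) for List<T>.Contains.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var columbus = new Person { Id = 1, FirstName = "Chris", LastName = "Columbus", SSN = 111223333 };
        var steinbeck = new Person { Id = 1, FirstName = "John", LastName = "Steinbeck", SSN = 111223333 };

        // Default comparer: reference equality for a class without overrides.
        var set = new HashSet<Person>();
        Console.WriteLine(set.Add(columbus));  // True: first time this instance is seen
        Console.WriteLine(set.Add(columbus));  // False: same instance, ignored
        Console.WriteLine(set.Add(steinbeck)); // True: different instance, same SSN
        Console.WriteLine(set.Count);          // 2
    }
}
```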


Based on @gcores' recommendation above.

If you want to add one object per duplicated SSN back to the persons list, add the following code:

    // SSN is an int, so the grouping key type is int.
    IEnumerable<IGrouping<int, Person>> query = duplicated.GroupBy(d => d.SSN, d => d);
    foreach (IGrouping<int, Person> duplicateGroup in query)
    {
        persons.Add(duplicateGroup.First());
    }

My assumption is that you want to remove only the duplicate values, keeping the original value from which the duplicates are derived.
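A hedged sketch of that "keep one, move the rest" variant groups the list once and splits each group. KeepFirstPerSsn is a hypothetical name. According to the Enumerable.GroupBy documentation, groups are yielded in the order their keys first appear and elements keep their source order within each group, so First() is the original occurrence and the kept people stay in their original relative order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

public static class Program
{
    // Hypothetical helper: leaves the first person for each SSN in 'persons'
    // and returns the later occurrences as the dupes.
    public static List<Person> KeepFirstPerSsn(List<Person> persons)
    {
        var kept = new List<Person>();
        var dupes = new List<Person>();
        foreach (IGrouping<int, Person> group in persons.GroupBy(p => p.SSN))
        {
            kept.Add(group.First());       // the original occurrence stays
            dupes.AddRange(group.Skip(1)); // later occurrences become dupes
        }

        persons.Clear();
        persons.AddRange(kept);
        return dupes;
    }

    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Id = 1, FirstName = "Chris", LastName = "Columbus", SSN = 111223333 },
            new Person { Id = 1, FirstName = "EE", LastName = "Cummings", SSN = 987654321 },
            new Person { Id = 1, FirstName = "John", LastName = "Steinbeck", SSN = 111223333 },
            new Person { Id = 1, FirstName = "Yogi", LastName = "Berra", SSN = 123456789 }
        };

        List<Person> dupes = KeepFirstPerSsn(persons);

        Console.WriteLine(string.Join(", ", persons.Select(p => p.LastName).ToArray()));
        Console.WriteLine(string.Join(", ", dupes.Select(p => p.LastName).ToArray()));
    }
}
```

With the question's data this prints "Columbus, Cummings, Berra" and then "Steinbeck".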

