How to find all duplicates in a List<string>?
I have a List<string> that has multiple words duplicated. I need to find all words that are duplicates.
Any trick to get them all?
In .NET Framework 3.5 and above, you can use Enumerable.GroupBy, which returns an enumeration of groups (one group per distinct key). Then filter out any group whose Count is <= 1, and finally select the keys to get back to a single enumeration:
    var duplicateKeys = list.GroupBy(x => x)
                            .Where(group => group.Count() > 1)
                            .Select(group => group.Key);

If you use LINQ, you can use the following query:
    var duplicateItems = from x in list
                         group x by x into grouped
                         where grouped.Count() > 1
                         select grouped.Key;

or, if you prefer it without the syntactic sugar:
    var duplicateItems = list.GroupBy(x => x).Where(x => x.Count() > 1).Select(x => x.Key);

This groups all elements that are equal, and then keeps only the groups with more than one element. Finally, it selects only the key of each group, since you do not need the count.
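If you do want to know how often each duplicate occurs, the same GroupBy query can keep the count instead of discarding it. GroupBy also has an overload that takes an IEqualityComparer<TKey>, which helps when "Dog" and "dog" should count as the same word. A minimal sketch; the class DuplicateFinder, the method DuplicatesWithCounts and the sample words are illustrative names, not part of any answer here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFinder
{
    // Returns each value that occurs more than once, paired with its number of occurrences.
    // The comparer decides which strings are "equal" (null means the default comparer).
    public static List<KeyValuePair<string, int>> DuplicatesWithCounts(
        IEnumerable<string> source, IEqualityComparer<string> comparer)
    {
        return source
            .GroupBy(x => x, comparer)      // one group per distinct value, in order of first appearance
            .Where(g => g.Count() > 1)      // keep only values seen at least twice
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var list = new List<string> { "cat", "Dog", "parrot", "dog", "parrot", "goat" };

        // Case-insensitive: "Dog" and "dog" fall into the same group.
        foreach (var pair in DuplicatesWithCounts(list, StringComparer.OrdinalIgnoreCase))
            Console.WriteLine($"{pair.Key}: {pair.Value}");   // "Dog: 2", then "parrot: 2"
    }
}
```

With StringComparer.Ordinal instead, "Dog" and "dog" end up in separate groups and only "parrot" is reported. The key of each group is the first element seen for it, which is why the case-insensitive run prints "Dog".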
If you prefer not to use LINQ, you can use this extension method:
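    using System.Collections.Generic;

    public void SomeMethod()
    {
        var duplicateItems = list.GetDuplicates();
        …
    }

    // Extension methods must live in a non-generic static class; the class name is arbitrary.
    public static class EnumerableExtensions
    {
        public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> source)
        {
            HashSet<T> itemsSeen = new HashSet<T>();
            HashSet<T> itemsYielded = new HashSet<T>();

            foreach (T item in source)
            {
                // Add returns false if the item was already seen, i.e. it is a duplicate.
                if (!itemsSeen.Add(item))
                {
                    // Yield each duplicate only once.
                    if (itemsYielded.Add(item))
                    {
                        yield return item;
                    }
                }
            }
        }
    }

This keeps track of the items it has seen and the items it has yielded. If it has not seen an item before, it adds it to the set of seen items; otherwise the item is a duplicate. If it has not yielded that duplicate before, it yields it; otherwise it ignores it.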
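The HashSet approach also works with a custom notion of equality, because HashSet<T> has a constructor that takes an IEqualityComparer<T> and falls back to EqualityComparer<T>.Default when that argument is null. Below is a sketch of a drop-in replacement for GetDuplicates with an optional comparer parameter; the parameter and the class names DuplicateExtensions and Demo are my own additions, not from the answer.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateExtensions
{
    // Same single-pass technique as GetDuplicates, plus an optional comparer
    // (null means EqualityComparer<T>.Default).
    public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        var itemsSeen = new HashSet<T>(comparer);
        var itemsYielded = new HashSet<T>(comparer);

        foreach (T item in source)
        {
            // Seen before (Add fails) and not yielded yet (Add succeeds): report it once.
            if (!itemsSeen.Add(item) && itemsYielded.Add(item))
                yield return item;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var words = new List<string> { "a", "b", "a", "c", "b", "a" };
        foreach (var w in words.GetDuplicates())
            Console.WriteLine(w);   // a, then b

        var names = new[] { "Apple", "apple", "APPLE", "pear" };
        foreach (var n in names.GetDuplicates(StringComparer.OrdinalIgnoreCase))
            Console.WriteLine(n);   // apple (the first repeated occurrence)
    }
}
```

Because the method uses yield return, it is lazy: nothing is read from the source until the result is enumerated, and it makes a single pass, so it also works on sequences that can only be enumerated once.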
And without LINQ:

    string[] ss = { "1", "1", "1" };
    var myList = new List<string>();      // distinct values
    var duplicates = new List<string>();  // one entry per repeated occurrence

    foreach (var s in ss)
    {
        if (!myList.Contains(s))
            myList.Add(s);
        else
            duplicates.Add(s);
    }

    // show list without duplicates
    foreach (var s in myList)
        Console.WriteLine(s);

    // show duplicates list ("1" is printed twice here, once per extra occurrence)
    foreach (var s in duplicates)
        Console.WriteLine(s);

If you are looking for a more general method:
    // Like any extension method, this must be declared in a non-generic static class.
    public static List<U> FindDuplicates<T, U>(this List<T> list, Func<T, U> keySelector)
    {
        return list.GroupBy(keySelector)
                   .Where(group => group.Count() > 1)
                   .Select(group => group.Key)
                   .ToList();
    }

EDIT: Here's an example:
    public class Person
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    List<Person> list = new List<Person>()
    {
        new Person() { Name = "John", Age = 22 },
        new Person() { Name = "John", Age = 30 },
        new Person() { Name = "Jack", Age = 30 }
    };

    var duplicateNames = list.FindDuplicates(p => p.Name);
    var duplicateAges = list.FindDuplicates(p => p.Age);

    foreach (var dupName in duplicateNames)
    {
        Console.WriteLine(dupName); // Will print out John
    }

    foreach (var dupAge in duplicateAges)
    {
        Console.WriteLine(dupAge); // Will print out 30
    }

Using LINQ, of course. The code below builds a dictionary that maps each element of your source list to the number of times it occurs in that list:
    var item2ItemCount = list.GroupBy(item => item).ToDictionary(x => x.Key, x => x.Count());

I assume that each line in your list contains several words; let me know if this is not true.
    List<string> list = File.ReadAllLines("foobar.txt").ToList();

    var words = from line in list
                from word in line.Split(new[] { ' ', ';', ',', '.', ':', '(', ')' },
                                        StringSplitOptions.RemoveEmptyEntries)
                select word;

    // Group case-insensitively and keep the words that occur more than once.
    var duplicateWords = from w in words
                         group w by w.ToLower() into g
                         where g.Count() > 1
                         select new { Word = g.Key, Count = g.Count() };

For what it's worth, here is my way:
    List<string> list = new List<string>(new string[] { "cat", "Dog", "parrot", "dog", "parrot", "goat", "parrot", "horse", "goat" });
    Dictionary<string, int> wordCount = new Dictionary<string, int>();

    // count them all:
    list.ForEach(word =>
    {
        string key = word.ToLower();
        if (!wordCount.ContainsKey(key))
            wordCount.Add(key, 0);
        wordCount[key]++;
    });

    // remove words appearing only once (ToList() copies the keys so the dictionary can be modified):
    wordCount.Keys.ToList().FindAll(word => wordCount[word] == 1).ForEach(key => wordCount.Remove(key));

    Console.WriteLine(string.Format("Found {0} duplicates in the list:", wordCount.Count));
    wordCount.Keys.ToList().ForEach(key => Console.WriteLine(string.Format("{0} appears {1} times", key, wordCount[key])));

To find the character(s) that occur most often in a string instead, this version reads the input from a text box (txtInput) and writes each such character followed by its count to a label (lblrepeated):

    lblrepeated.Text = "";
    string value = txtInput.Text;
    char[] arr = value.ToCharArray();
    char[] crr = new char[0];   // characters with the highest count so far (empty input prints nothing)
    int count1 = 0;             // highest count so far

    for (int i = 0; i < arr.Length; i++)
    {
        // count how often arr[i] occurs in the whole string
        char letter = arr[i];
        int count = 0;
        for (int j = 0; j < arr.Length; j++)
        {
            if (arr[j] == letter)
                count++;
        }

        if (count1 < count)
        {
            // new maximum: restart the result with this letter only
            crr = new[] { letter };
            count1 = count;
        }
        else if (count1 == count && Array.IndexOf(crr, letter) < 0)
        {
            // ties the maximum and is not listed yet: append it
            Array.Resize(ref crr, crr.Length + 1);
            crr[crr.Length - 1] = letter;
        }
    }

    for (int k = 0; k < crr.Length; k++)
        lblrepeated.Text = lblrepeated.Text + crr[k] + count1.ToString();

I use this method to check for duplicate entries in a collection of strings:
    public static IEnumerable<string> CheckForDuplicated(IEnumerable<string> listString)
    {
        List<string> duplicateKeys = new List<string>();
        List<string> notDuplicateKeys = new List<string>();
        foreach (var text in listString)
        {
            if (notDuplicateKeys.Contains(text))
            {
                duplicateKeys.Add(text);   // added once per repeated occurrence
            }
            else
            {
                notDuplicateKeys.Add(text);
            }
        }
        return duplicateKeys;
    }

This may not be the shortest or most elegant way, but I think it is very readable.
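List<string>.Contains scans the whole list on every call, so CheckForDuplicated takes O(n^2) time on large inputs. Below is a sketch of the same readable loop using a HashSet<string> for the "already seen" check, which brings it down to O(n) on average; DuplicateChecker and CheckForDuplicatedFast are hypothetical names. Like the original, it adds a value once for every repeated occurrence, so a value that appears three times is returned twice; call Distinct() on the result if you want each value only once.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateChecker
{
    // Same behavior as CheckForDuplicated, but HashSet.Add does the lookup and the insert
    // in one O(1) (average) step instead of a linear List.Contains scan.
    public static List<string> CheckForDuplicatedFast(IEnumerable<string> listString)
    {
        var seen = new HashSet<string>();
        var duplicateKeys = new List<string>();

        foreach (var text in listString)
        {
            // Add returns false when text is already in the set, i.e. it is a repeat.
            if (!seen.Add(text))
                duplicateKeys.Add(text);
        }
        return duplicateKeys;
    }

    public static void Main()
    {
        var input = new List<string> { "x", "y", "x", "x" };
        Console.WriteLine(string.Join(", ", CheckForDuplicatedFast(input)));   // x, x
    }
}
```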
This code works:

    var duplicateKeys = list.GroupBy(x => x)
                            .Where(group => group.Count() > 1)
                            .Select(group => group.Key)
                            .ToList();
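Note that calling ToString() on a LINQ query (or on a List<string>) does not list its elements; it returns the name of the object's type. To show the duplicates as a single string, string.Join is one option. A minimal sketch; DuplicateFormatter and FormatDuplicates are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFormatter
{
    // Finds the duplicates with GroupBy and joins them into one comma-separated string.
    public static string FormatDuplicates(IEnumerable<string> list)
    {
        List<string> duplicateKeys = list.GroupBy(x => x)
                                         .Where(group => group.Count() > 1)
                                         .Select(group => group.Key)
                                         .ToList();

        // string.Join concatenates the elements; duplicateKeys.ToString() would only
        // return the type name, e.g. System.Collections.Generic.List`1[System.String].
        return string.Join(", ", duplicateKeys);
    }

    public static void Main()
    {
        var list = new List<string> { "a", "b", "a", "c", "c" };
        Console.WriteLine(FormatDuplicates(list));   // a, c
    }
}
```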