The most efficient way to remove multiple elements from an IList <T>

What is the most efficient way to remove multiple elements from an IList<T> object. Suppose I have an IEnumerable<T> all the elements that I want to remove, in the same order of appearance as in the original list.

The only thing I have in mind is:

 IList<T> items; IEnumerable<T> itemsToDelete; ... foreach (var x in itemsToDelete) { items.Remove(x); } 

But I think this is not effective, because it must go through the list from the very beginning every time the Remove method is called.

+7
generics c # ienumerable ilist
source share
3 answers

As the number of items to remove becomes larger, you will probably find that moving the list and checking each item for a hash of “items to delete” is more efficient. This extension method may help:

 static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove) { var set = new HashSet<T>(itemsToRemove); var list = iList as List<T>; if (list == null) { int i = 0; while (i < iList.Count) { if (set.Contains(iList[i])) iList.RemoveAt(i); else i++; } } else { list.RemoveAll(set.Contains); } } 

I compared this little program below. (Note that it uses an optimized path if IList<T> is actually a List<T> .)

On my machine (and using my test data) this extension method took 1.5 seconds to execute vs 17 seconds for the code in your question. However, I have not tested data of different sizes. I am sure that to remove just a few elements RemoveAll2 will be faster.

 static class Program { static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove) { var set = new HashSet<T>(itemsToRemove); var list = iList as List<T>; if (list == null) { int i = 0; while (i < iList.Count) { if (set.Contains(iList[i])) iList.RemoveAt(i); else i++; } } else { list.RemoveAll(set.Contains); } } static void RemoveAll2<T>(this IList<T> list, IEnumerable<T> itemsToRemove) { foreach (var item in itemsToRemove) list.Remove(item); } static void Main(string[] args) { var list = Enumerable.Range(0, 10000).ToList(); var toRemove = new[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}; list.RemoveAll(toRemove); // JIT //list.RemoveAll2(toRemove); // JIT var sw = Stopwatch.StartNew(); for (int i = 0; i < 10000; i++) { list.RemoveAll(toRemove); //list.RemoveAll2(toRemove); } sw.Stop(); Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds); Console.ReadKey(); } } 

UPDATE (for @KarmaEDV's comment below): If you need to use your own comparisons mapper, the extension method may have an overload that accepts such a comparator:

 public static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove, IEqualityComparer<T> comparer) { var set = new HashSet<T>(itemsToRemove, comparer); var list = iList as List<T>; if (list == null) { int i = 0; while (i < iList.Count) { if (set.Contains(iList[i])) iList.RemoveAt(i); else i++; } } else { list.RemoveAll(set.Contains); } } 
+9
source share

If the IList<T> reference refers to an instance of List<T> , then casting to this type and using RemoveAll can lead to better performance than any other approach that does not depend on the specifics of its implementation.

Otherwise, although the optimal approach will depend on the relative proportion of items to be removed and the nature of IList<T> , I would suggest that it is best to copy IList<T> to the new List<T> , clear it and selectively add items . Even if the items in the list do not contribute to efficient hashing, the fact that the items in IEnumerable<T> are in the same sequence as the items in IList<T> will make it unnecessary. Start by reading the element from IEnumerable<T> . Then copy the elements from the array to the list until it is found. Then read the next element from IEnumerable<T> and copy from the array to the list until it is found, etc. After running out of IEnumerable<T> copy the balance of the array to List<T> .

This approach will be fast in many implementations of IList<T> . However, it has one major flaw: the fact that it removes and re-adds each item can have unwanted side effects for things like watchlists. If the list can be observable, you may need to use the slower N ^ 2 algorithm to ensure correctness. [BTW, it seems to me that IList<T> has a Remove(T) method but does not have a much more useful RemoveAll(Func<T,bool>) method RemoveAll(Func<T,bool>) . Remove(T) is largely redundant with IndexOf and RemoveAt , while RemoveAll allows the O (N) implementation of many O (N ^ 2) operations in its absence if it is not allowed to remove and re-add elements.

+4
source share

Maybe this helps. Other ideas of the same type may be included.

 IList<T> items; IEnumerable<T> itemsToDelete; ... { if(items.Equals(itemsToDelete)) //Equal lists? { items.Clear(); return true; } if( (double) items.Count/itemsToDelete.Count < 1){ /* It is faster to iterate the small list first. */ foreach (var x in items) { if(itemsToDelete.Contains(x)){/**/} } } else{ foreach (var x in itemsToDelete) { items.Remove(x); } } } 
+1
source share

All Articles