A faster way than List<T>.Contains()
I am trying to do what I would call a "de-intersection" (I'm not sure of its proper name, but that's what Tim Sweeney of Epic Games called it in the old UnrealEd):
```csharp
// foo and bar have some identical elements (given a case-insensitive match)
List<string> foo = GetFoo();
List<string> bar = GetBar();

// remove non matches
foo = foo.Where(x => bar.Contains(x, StringComparer.InvariantCultureIgnoreCase)).ToList();
bar = bar.Where(x => foo.Contains(x, StringComparer.InvariantCultureIgnoreCase)).ToList();
```

Later on I take another step where I subtract the result from the original lists to see which items I removed. That step uses .Except(), which is very fast, so it is not a problem.
There should be a faster way to do this, because this approach performs rather poorly with ~30,000 items (rows) in either list. Ideally, a way to combine this step and the later one in one fell swoop would be welcome. I tried using .Exists() instead of .Contains(), but it was slightly slower. I feel a bit dense, but I suspect this should be possible with some combination of .Except(), .Intersect(), and/or .Union().
The matches can be computed with an intersection:

```csharp
var matches = (from f in foo select f)
    .Intersect(from b in bar select b,
               StringComparer.InvariantCultureIgnoreCase);
```

The operation you are describing (the items that appear in one list but not the other) is called a symmetric difference.
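For completeness, the symmetric difference the question is after can also be expressed directly with .Except() in both directions. A minimal sketch, using made-up sample lists and the same case-insensitive comparer as the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SymmetricDifferenceSketch
{
    static void Main()
    {
        // Illustrative data (not from the original post)
        var foo = new List<string> { "Apple", "Banana", "Cherry" };
        var bar = new List<string> { "BANANA", "cherry", "Date" };
        var cmp = StringComparer.InvariantCultureIgnoreCase;

        // Items in exactly one of the two lists (the "deleted" items):
        var symDiff = foo.Except(bar, cmp)
                         .Union(bar.Except(foo, cmp), cmp)
                         .ToList();

        Console.WriteLine(string.Join(", ", symDiff)); // Apple, Date
    }
}
```

Note that Except and Union both hash their inputs internally, so this already avoids the O(n²) scan of nested .Contains() calls.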
You need a different data structure, such as a hash table. Compute the intersection of the two sets, then subtract that intersection from each set.
UPDATE:
I had some time to try this in code. I used a HashSet<T> with sets of 50,000 strings, each 2 to 10 characters long, with the following results:
Original: 79499 ms
HashSet: 33 ms
By the way, HashSet<T> has a method called SymmetricExceptWith that I thought would do the job, but it actually replaces the contents of the set it is called on with the elements that are in either set but not both. If you don't need to leave the original two sets intact, that may be exactly what you want, and the code would be more elegant.
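A minimal sketch of SymmetricExceptWith on made-up data, showing that it mutates the receiver while leaving the argument set untouched:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SymmetricExceptSketch
{
    static void Main()
    {
        // Illustrative data (not from the original post)
        var cmp = StringComparer.InvariantCultureIgnoreCase;
        var fooSet = new HashSet<string>(new[] { "Apple", "Banana" }, cmp);
        var barSet = new HashSet<string>(new[] { "BANANA", "Date" }, cmp);

        // fooSet now holds the symmetric difference; barSet is unchanged.
        fooSet.SymmetricExceptWith(barSet);

        // Sort before printing because HashSet enumeration order is unspecified.
        Console.WriteLine(string.Join(", ", fooSet.OrderBy(s => s))); // Apple, Date
    }
}
```

Because fooSet was constructed with the case-insensitive comparer, "Banana" and "BANANA" count as the same element and are removed.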
Here is the code:
```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // foo and bar have some identical elements (given a case-insensitive match)
        var foo = getRandomStrings();
        var bar = getRandomStrings();

        var timer = new Stopwatch();
        timer.Start();
        // remove non matches
        var f = foo.Where(x => !bar.Contains(x)).ToList();
        var b = bar.Where(x => !foo.Contains(x)).ToList();
        timer.Stop();
        Debug.WriteLine(String.Format("Original: {0} ms", timer.ElapsedMilliseconds));

        timer.Reset();
        timer.Start();
        var intersect = new HashSet<String>(foo);
        intersect.IntersectWith(bar);
        var fSet = new HashSet<String>(foo);
        var bSet = new HashSet<String>(bar);
        fSet.ExceptWith(intersect);
        bSet.ExceptWith(intersect);
        timer.Stop();

        var fCheck = new HashSet<String>(f);
        var bCheck = new HashSet<String>(b);
        Debug.WriteLine(String.Format("HashSet: {0} ms", timer.ElapsedMilliseconds));
        Console.WriteLine("Sets equal? {0} {1}", fSet.SetEquals(fCheck), bSet.SetEquals(bCheck));
        Console.ReadKey();
    }

    static Random _rnd = new Random();
    private const int Count = 50000;

    private static List<string> getRandomStrings()
    {
        var strings = new List<String>(Count);
        var chars = new Char[10];
        for (var i = 0; i < Count; i++)
        {
            var len = _rnd.Next(2, 10);
            for (var j = 0; j < len; j++)
            {
                chars[j] = (Char)_rnd.Next('a', 'z');
            }
            strings.Add(new String(chars, 0, len));
        }
        return strings;
    }
}
```

If the items are unique within each list, you should consider using a HashSet.
The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements and whose elements are in no particular order.
With a sorted list, you can use binary search.
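A sketch of the sorted-list approach on made-up data: sort once up front, then each lookup is O(log n) via List<T>.BinarySearch with the same case-insensitive comparer:

```csharp
using System;
using System.Collections.Generic;

class BinarySearchSketch
{
    static void Main()
    {
        // Illustrative data (not from the original post)
        var bar = new List<string> { "cherry", "Apple", "BANANA" };
        var cmp = StringComparer.InvariantCultureIgnoreCase;

        bar.Sort(cmp); // O(n log n), paid once

        // BinarySearch returns a non-negative index when the item is found.
        bool hasBanana = bar.BinarySearch("banana", cmp) >= 0;
        bool hasDate = bar.BinarySearch("date", cmp) >= 0;

        Console.WriteLine("{0} {1}", hasBanana, hasDate); // True False
    }
}
```

The list must be sorted with the same comparer used for searching, otherwise BinarySearch's results are undefined.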
Contains is an O(n) operation on a List<T>. If you used a different data structure, such as a sorted list or a dictionary, you would cut your time significantly. Looking up a key in a sorted list is typically O(log n), and in a hash table typically O(1).
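The difference shows up directly in the lookup call: List<T>.Contains scans every element, while HashSet<T>.Contains hashes the key. A minimal sketch on made-up data, using the question's case-insensitive comparer:

```csharp
using System;
using System.Collections.Generic;

class HashLookupSketch
{
    static void Main()
    {
        // Illustrative data (not from the original post)
        var bar = new List<string> { "BANANA", "cherry", "Date" };

        // Build the set once (O(n)); each Contains afterwards is O(1) on average.
        var barSet = new HashSet<string>(bar, StringComparer.InvariantCultureIgnoreCase);

        Console.WriteLine(barSet.Contains("banana")); // True
        Console.WriteLine(barSet.Contains("fig"));    // False
    }
}
```

Filtering 30,000 items against such a set is then roughly 30,000 constant-time lookups instead of 30,000 linear scans.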