What is the fastest way to create a unique set in .NET 2.0?

I have what is essentially a jagged array of name/value pairs, and I need to create a set of unique name/value pairs from it. The jagged array is about 86,000 x 11 values. It doesn't matter to me how I store each pair (a single string "name=value" or a specialized class such as KeyValuePair).
Additional information: there are 40 distinct names and many more distinct values, probably in the region of 10,000 values.

I am using C# and .NET 2.0 (and the performance is so poor that I'm considering pushing the entire jagged array into a SQL database and doing a SELECT DISTINCT on it).

Below is the code I'm currently using:

    List<List<KeyValuePair<string, string>>> vehicleList = retriever.GetVehicles();
    this.statsLabel.Text = "Unique Vehicles: " + vehicleList.Count;

    Dictionary<KeyValuePair<string, string>, int> uniqueProperties =
        new Dictionary<KeyValuePair<string, string>, int>();
    foreach (List<KeyValuePair<string, string>> vehicle in vehicleList)
    {
        foreach (KeyValuePair<string, string> property in vehicle)
        {
            if (!uniqueProperties.ContainsKey(property))
            {
                uniqueProperties.Add(property, 0);
            }
        }
    }
    this.statsLabel.Text += "\rUnique Properties: " + uniqueProperties.Count;
+6
performance collections c#
6 answers

It now runs for me in 0.34 seconds, down from 9+ minutes.

The problem is the comparison of the KeyValuePair structs. I worked around it by writing a comparer object and passing an instance of it to the Dictionary constructor.

From what I can determine, KeyValuePair.GetHashCode() returns the hash code of its Key object (in this example, the less unique of the two members).

As the dictionary adds (and checks the existence of) each element, it uses both the GetHashCode and Equals functions, but it has to rely on Equals when the hash code is less unique.

By providing a more unique GetHashCode, Equals gets executed far less often. I also optimized Equals to compare the more unique Values before the less unique Keys.

86,000 x 11 objects with 10,000 unique properties run in 0.34 seconds using the comparer object below (without the comparer object, it takes 9 minutes 22 seconds).

Hope this helps :)

    class StringPairComparer : IEqualityComparer<KeyValuePair<string, string>>
    {
        public bool Equals(KeyValuePair<string, string> x, KeyValuePair<string, string> y)
        {
            // Compare the more unique Values first, then the Keys.
            return x.Value == y.Value && x.Key == y.Key;
        }

        public int GetHashCode(KeyValuePair<string, string> obj)
        {
            return (obj.Key + obj.Value).GetHashCode();
        }
    }
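For reference, the comparer is passed in through the Dictionary constructor. A minimal usage sketch, reusing the variable names from the question:

    Dictionary<KeyValuePair<string, string>, int> uniqueProperties =
        new Dictionary<KeyValuePair<string, string>, int>(new StringPairComparer());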

EDIT: if it were just a single string (Name + Value concatenated, instead of a KeyValuePair), it would be about twice as fast. This is a nice, interesting problem, and I spent far too much time on it (but I learned a little along the way).
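A minimal sketch of what that single-string variant could look like (the "=" separator is illustrative; vehicleList is taken from the question's code):

    Dictionary<string, int> uniqueProperties = new Dictionary<string, int>();
    foreach (List<KeyValuePair<string, string>> vehicle in vehicleList)
    {
        foreach (KeyValuePair<string, string> property in vehicle)
        {
            // One string per pair, e.g. "name=value"
            string key = property.Key + "=" + property.Value;
            if (!uniqueProperties.ContainsKey(key))
            {
                uniqueProperties.Add(key, 0);
            }
        }
    }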

+12

If you don't need any specific correlation between each key/value pair and the unique value you generate, could you just use a GUID? I assume the problem is that your current "key" is not unique in this jagged array.

    Dictionary<System.Guid, KeyValuePair<string, string>> myDict =
        new Dictionary<Guid, KeyValuePair<string, string>>();

    // foreach of your key/value pairs in their current format:
    myDict.Add(System.Guid.NewGuid(), new KeyValuePair<string, string>(yourKey, yourValue));

It seems like that would store what you need, but I don't know how you could get data back out of it, since there would be no semantic relationship between the generated Guid and what you originally had...

Can you provide more information in your question?

0

Use KeyValuePair as a wrapper and then build a Dictionary on top of it to create the set, perhaps? Or implement your own wrapper class that overrides Equals and GetHashCode (sketched after the code below).

    Dictionary<KeyValuePair<string, string>, bool> mySet =
        new Dictionary<KeyValuePair<string, string>, bool>();
    for (int i = 0; i < keys.Length; ++i)
    {
        KeyValuePair<string, string> kvp = new KeyValuePair<string, string>(keys[i], values[i]);
        mySet[kvp] = true;
    }
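A rough sketch of the second option, a custom wrapper that overrides Equals and GetHashCode (the class name and fields are made up for illustration):

    class PairWrapper
    {
        public readonly string Name;
        public readonly string Value;

        public PairWrapper(string name, string value)
        {
            Name = name;
            Value = value;
        }

        public override bool Equals(object obj)
        {
            PairWrapper other = obj as PairWrapper;
            return other != null && Name == other.Name && Value == other.Value;
        }

        public override int GetHashCode()
        {
            return (Name + "=" + Value).GetHashCode();
        }
    }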
0

Instead of using a Dictionary, why not extend KeyedCollection<TKey, TItem>? According to the documentation:

Provides an abstract base class for a collection whose keys are embedded in values.

Then you just need to override the protected TKey GetKeyForItem(TItem item) method. Since this is a hybrid between IList<T> and IDictionary<TKey, TValue>, I think it will be pretty fast.
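A minimal sketch of that approach, assuming the items are KeyValuePair<string, string> and the concatenated "name=value" string serves as the embedded key (KeyedCollection lives in System.Collections.ObjectModel):

    using System.Collections.Generic;
    using System.Collections.ObjectModel;

    class PropertySet : KeyedCollection<string, KeyValuePair<string, string>>
    {
        // The key is embedded in the item itself: here, "name=value".
        protected override string GetKeyForItem(KeyValuePair<string, string> item)
        {
            return item.Key + "=" + item.Value;
        }
    }

Note that Add throws an ArgumentException on a duplicate key, so you would check Contains(key) before adding each of the 86,000 x 11 pairs.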

0

What about:

    Dictionary<NameValuePair, int> hs = new Dictionary<NameValuePair, int>();
    foreach (NameValuePair[] i in jaggedArray)
    {
        foreach (NameValuePair j in i)
        {
            if (!hs.ContainsKey(j))
            {
                hs.Add(j, 0);
            }
        }
    }
    IEnumerable<NameValuePair> unique = hs.Keys;

Of course, if you were using C# 3.0 / .NET 3.5:

    var hs = new HashSet<NameValuePair>();
    hs.UnionWith(jaggedArray.SelectMany(item => item));

would do the trick.

0

Have you profiled your code? Are you sure the foreach loops are the bottleneck, and not retriever.GetVehicles()?

I created a small test project in which I faked the retriever and let it return 86,000 x 11 values. My first attempt finished after 5 seconds, including creating the data.

I used the same string for the key and the value, where the first key was "0#0" and the last was "85999#10".
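For context, generating fake data of that shape could look something like this (a sketch, not the exact test code; only the "row#column" key format is taken from above):

    List<List<KeyValuePair<string, string>>> vehicleList =
        new List<List<KeyValuePair<string, string>>>();
    for (int row = 0; row < 86000; row++)
    {
        List<KeyValuePair<string, string>> vehicle = new List<KeyValuePair<string, string>>();
        for (int col = 0; col < 11; col++)
        {
            // Same string for key and value: first "0#0", last "85999#10"
            string s = row + "#" + col;
            vehicle.Add(new KeyValuePair<string, string>(s, s));
        }
        vehicleList.Add(vehicle);
    }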

Then I switched to GUIDs. Same result.

Then I made the key longer, for example:

    var s = Guid.NewGuid().ToString();
    return s + s + s + s + s + s + s + s + s + s;

Now it took almost 10 seconds.

Then I made the keys insanely long and got an out-of-memory exception. I don't have a page file on my machine, so I got that exception immediately.

How long are your keys? Is your virtual memory consumption causing poor performance?

0
