How to create a HashSet <List <Int>> with various elements?
I have a HashSet that contains several lists of integers, i.e. HashSet<List<int>>
To preserve my uniqueness, I need to do two things: 1. Manually loop the existing lists, looking for duplicates using SequenceEquals . 2. Sort individual lists so that SequenceEquals is currently running.
Is there a better way to do this? Is there an existing IEqualityComparer that I can provide a HashSet so that HashSet.Add() will automatically handle uniqueness?
var hashSet = new HashSet<List<int>>(); for(/* some condition */) { List<int> list = new List<int>(); ... /* for eliminating duplicate lists */ list.Sort(); foreach(var set in hashSet) { if (list.SequenceEqual(set)) { validPartition = false; break; } } if (validPartition) newHashSet.Add(list); } Here is a possible comparator that compares IEnumerable<T> over its elements. You need to sort manually before adding.
It was possible to collect sorting in a companion, but I do not think that this is a wise choice. Adding a canonical list form seems wiser.
This code will only work in .net 4 since it uses common variance. If you need earlier versions, you need to either replace IEnumerable with List , or add a second general parameter for the collection type.
class SequenceComparer<T>:IEqualityComparer<IEnumerable<T>> { public bool Equals(IEnumerable<T> seq1,IEnumerable<T> seq2) { return seq1.SequenceEqual(seq2); } public int GetHashCode(IEnumerable<T> seq) { int hash=1234567; foreach(T elem in seq) hash=hash*37+elem.GetHashCode(); return hash; } } void Main() { var hashSet = new HashSet<List<int>>(new SequenceComparer<int>()); List<int> test=new int[]{1,3,2}.ToList(); test.Sort(); hashSet.Add(test); List<int> test2=new int[]{3,2,1}.ToList(); test2.Sort(); hashSet.Contains(test2).Dump(); } This does not start correctly, it must be a HashSet<ReadOnlyCollection<>> , because you cannot allow lists to modify and invalidate the set predicate. You can then compute the hash code in O (n) while adding the collection to the collection. And the O (n) test to check if it is already in the set with the very unusual O (n ^ 2) worst case, if all the hashes are equal. Store the computed hash in the collection.
Is there a reason why you are not just using an array? int[] will work better. I also assume that the lists contain duplicates, otherwise you just use sets and you will not have a problem.
It seems that their contents will not change (a lot) once they are added to the HashSet . At the end of the day, you will have to use a comparator that returns to SequenceEqual . But you do not need to do this every time. Instead of doing an exponential number of sequences (for example, as the hash grows, making a SequenceEqual against every existing member) - if you create a good hash code in front, you may need to make very few such comparisons. Although the overhead of creating a good hash code is probably about the same as when executing SequenceEqual , you only do this once for each list.
So, the first time a certain List<int> triggered, you must generate a hash based on an ordered sequence of numbers and cache it. Then, the next time you match the list, you can use the cached value. I'm not sure how you could do this by comparing with the top of my head (maybe a static dictionary?), But you could implement a List wrapper that makes this easy.
Here is the basic idea. You need to be careful to make sure that it is not fragile (for example, make sure you strip the cache hash code when members change), but it does not look like this will be a typical situation for how you use it.
public class FasterComparingList<T>: IList<T>, IList, ... /// whatever you need to implement { // Implement your interfaces against InnerList // Any methods that change members of the list need to // set _LongHash=null to force it to be regenerated public List<T> InnerList { ... lazy load a List } public int GetHashCode() { if (_LongHash==null) { _LongHash=GetLongHash(); } return (int)_LongHash; } private int? _LongHash=null; public bool Equals(FasterComparingList<T> list) { if (InnerList.Count==list.Count) { return true; } // you could also cache the sorted state and skip this if a list hasn't // changed since the last sort // not sure if native `List` does list.Sort(); InnerList.Sort(); return InnerList.SequenceEqual(list); } protected int GetLongHash() { return ..... // something to create a reasonably good hash code -- which depends on the // data. Adding all the numbers is probably fine, even if it fails a couple // percent of the time you're still orders of magnitude ahead of sequence // compare each time } } If the lists do not change after adding, this should be very fast. Even in situations where lists can change frequently, creating a new hash code is unlikely (even if it is longer at all) than comparing the sequence.
If you do not specify IEQualityComparer then the default types will be used, so I think that you will need to create your own implementation of IEQualityComparer and pass this to the constructor of your HashSet. Here is a good example .