Parallel HashSet <T> in .NET Framework?

I have the following class.

class Test{ public HashSet<string> Data = new HashSet<string>(); } 

I need to change the "Data" field from different threads, so I would like some feedback on my current thread-safe implementation.

 class Test{ public HashSet<string> Data = new HashSet<string>(); public void Add(string Val){ lock(Data) Data.Add(Val); } public void Remove(string Val){ lock(Data) Data.Remove(Val); } } 

Is there a better solution to go directly to the field and protect it from concurrent access by multiple threads?

+85
multithreading c # thread-safety locking mutex
Sep 20 '13 at 17:57
source share
5 answers

Your implementation is correct. Unfortunately, the .NET Framework does not provide a built-in parallel hashset type. However, there are some workarounds.

ConcurrentDictionary (recommended)

This is the first example of using the ConcurrentDictionary<TKey, TValue> in the System.Collections.Concurrent namespace. In this case, the value is meaningless, so we can use a simple byte (1 byte in memory).

 private ConcurrentDictionary<string, byte> _data; 

This is the recommended parameter because the type is thread safe and provides you with the same advantages as HashSet<T> , except for the key and value - different objects.

Source: Social MSDN

Concurrentbag

If you don't mind duplicate entries, you can use the ConcurrentBag<T> class in the same namespace of the previous class.

 private ConcurrentBag<string> _data; 

Self-realization

Finally, like you, you can implement your own data type using locks or other ways that .NET provides you with thread safety. Here is a great example: How to implement ConcurrentHashSet in .Net

The only drawback of this solution is that the HashSet<T> does not have official parallel access even for read operations.

I am quoting the code for a related post (originally written by Ben Mosher ).

 using System.Collections.Generic; using System.Threading; namespace BlahBlah.Utilities { public class ConcurrentHashSet<T> : IDisposable { private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion); private readonly HashSet<T> _hashSet = new HashSet<T>(); #region Implementation of ICollection<T> ...ish public bool Add(T item) { _lock.EnterWriteLock(); try { return _hashSet.Add(item); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public void Clear() { _lock.EnterWriteLock(); try { _hashSet.Clear(); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool Contains(T item) { _lock.EnterReadLock(); try { return _hashSet.Contains(item); } finally { if (_lock.IsReadLockHeld) _lock.ExitReadLock(); } } public bool Remove(T item) { _lock.EnterWriteLock(); try { return _hashSet.Remove(item); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public int Count { get { _lock.EnterReadLock(); try { return _hashSet.Count; } finally { if (_lock.IsReadLockHeld) _lock.ExitReadLock(); } } } #endregion #region Dispose public void Dispose() { Dispose(true); GC.SuppressFinalize(this); } protected virtual void Dispose(bool disposing) { if (disposing) if (_lock != null) _lock.Dispose(); } ~ConcurrentHashSet() { Dispose(false); } #endregion } } 

EDIT:. Move input blocking methods with try blocks, because they can throw an exception and execute the instructions contained in finally blocks.

+97
Sep 20 '13 at 18:04 on
source share

Instead of wrapping a ConcurrentDictionary or locking over a HashSet I created the actual ConcurrentHashSet based on the ConcurrentDictionary .

This implementation supports basic operations for each element without a HashSet specifying operations, since they are less important in parallel scripts. IMO:

 var concurrentHashSet = new ConcurrentHashSet<string>( new[] { "hamster", "HAMster", "bar", }, StringComparer.OrdinalIgnoreCase); concurrentHashSet.TryRemove("foo"); if (concurrentHashSet.Contains("BAR")) { Console.WriteLine(concurrentHashSet.Count); } 

Output: 2

You can get it from NuGet here and see the source on GitHub here .

+11
Feb 10 '17 at 5:43 on
source share

Since no one mentioned this, I suggest an alternative approach that may or may not be suitable for your specific purpose:

Microsoft Unlimited Collections

From a blog post from the MS team:

While creating and running is both easier than ever, one of the main problems still exists: mutable shared state. Reading from multiple threads is usually very simple, but as soon as the state needs to be updated, it becomes much more difficult, especially in projects that require locking.

An alternative to locking is to use an immutable state. Immutable data structures are guaranteed to never change and, thus, can be freely transferred between different streams, without worrying about stepping on someone elses toes.

This project creates a new problem: how do you manage state changes without copying the entire state every time? This is especially difficult when collections are involved.

This place includes immutable collections.

These collections include ImmutableHashSet <T> and ImmutableList <T> .

Performance

Because immutable collections use data tree structures to provide structural exchange, their performance characteristics are different from mutable collections. When compared to a blocking volatile collection, the results will depend on contention conflicts and access patterns. However, taken from another blog post about immutable collections:

Q: I heard that immutable collections are slow. Is it different? Can I use them when performance or memory is important?

A: These immutable collections were highly tuned to have competitive performance characteristics for mutable collections while balancing memory sharing. In some cases, they are almost as fast as volatile collections, both algorithmically and in real time, sometimes even faster, while in other cases they are algorithmically more complex. However, in many cases the difference will be negligible. Typically, you should use the simplest code to complete the task, and then tune for performance as needed. Immutable collections help you write simple code, especially if you need to consider thread safety.

In other words, in many cases the difference will not be noticeable, and you should go for a simpler choice - which for parallel sets will use ImmutableHashSet<T> , since you do not have an existing lock implementation ! ImmutableHashSet<T>

+10
Apr 6 '16 at 10:05
source share

I prefer complete solutions, so I did this: remember that my graph is implemented differently, because I do not understand why it is forbidden to read the hashset when trying to calculate its values.

@Zen, Thanks for starting.

 [DebuggerDisplay("Count = {Count}")] [Serializable] public class ConcurrentHashSet<T> : ICollection<T>, ISet<T>, ISerializable, IDeserializationCallback { private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion); private readonly HashSet<T> _hashSet = new HashSet<T>(); public ConcurrentHashSet() { } public ConcurrentHashSet(IEqualityComparer<T> comparer) { _hashSet = new HashSet<T>(comparer); } public ConcurrentHashSet(IEnumerable<T> collection) { _hashSet = new HashSet<T>(collection); } public ConcurrentHashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer) { _hashSet = new HashSet<T>(collection, comparer); } protected ConcurrentHashSet(SerializationInfo info, StreamingContext context) { _hashSet = new HashSet<T>(); // not sure about this one really... var iSerializable = _hashSet as ISerializable; iSerializable.GetObjectData(info, context); } #region Dispose public void Dispose() { Dispose(true); GC.SuppressFinalize(this); } protected virtual void Dispose(bool disposing) { if (disposing) if (_lock != null) _lock.Dispose(); } public IEnumerator<T> GetEnumerator() { return _hashSet.GetEnumerator(); } ~ConcurrentHashSet() { Dispose(false); } public void OnDeserialization(object sender) { _hashSet.OnDeserialization(sender); } public void GetObjectData(SerializationInfo info, StreamingContext context) { _hashSet.GetObjectData(info, context); } IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); } #endregion public void Add(T item) { _lock.EnterWriteLock(); try { _hashSet.Add(item); } finally { if(_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public void UnionWith(IEnumerable<T> other) { _lock.EnterWriteLock(); _lock.EnterReadLock(); try { _hashSet.UnionWith(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); if (_lock.IsReadLockHeld) _lock.ExitReadLock(); } } public void IntersectWith(IEnumerable<T> other) { _lock.EnterWriteLock(); _lock.EnterReadLock(); try { _hashSet.IntersectWith(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); if (_lock.IsReadLockHeld) _lock.ExitReadLock(); } } public void ExceptWith(IEnumerable<T> other) { _lock.EnterWriteLock(); _lock.EnterReadLock(); try { _hashSet.ExceptWith(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); if (_lock.IsReadLockHeld) _lock.ExitReadLock(); } } public void SymmetricExceptWith(IEnumerable<T> other) { _lock.EnterWriteLock(); try { _hashSet.SymmetricExceptWith(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool IsSubsetOf(IEnumerable<T> other) { _lock.EnterWriteLock(); try { return _hashSet.IsSubsetOf(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool IsSupersetOf(IEnumerable<T> other) { _lock.EnterWriteLock(); try { return _hashSet.IsSupersetOf(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool IsProperSupersetOf(IEnumerable<T> other) { _lock.EnterWriteLock(); try { return _hashSet.IsProperSupersetOf(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool IsProperSubsetOf(IEnumerable<T> other) { _lock.EnterWriteLock(); try { return _hashSet.IsProperSubsetOf(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool Overlaps(IEnumerable<T> other) { _lock.EnterWriteLock(); try { return _hashSet.Overlaps(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool SetEquals(IEnumerable<T> other) { _lock.EnterWriteLock(); try { return _hashSet.SetEquals(other); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } bool ISet<T>.Add(T item) { _lock.EnterWriteLock(); try { return _hashSet.Add(item); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public void Clear() { _lock.EnterWriteLock(); try { _hashSet.Clear(); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool Contains(T item) { _lock.EnterWriteLock(); try { return _hashSet.Contains(item); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public void CopyTo(T[] array, int arrayIndex) { _lock.EnterWriteLock(); try { _hashSet.CopyTo(array, arrayIndex); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public bool Remove(T item) { _lock.EnterWriteLock(); try { return _hashSet.Remove(item); } finally { if (_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } public int Count { get { _lock.EnterWriteLock(); try { return _hashSet.Count; } finally { if(_lock.IsWriteLockHeld) _lock.ExitWriteLock(); } } } public bool IsReadOnly { get { return false; } } } 
+3
Oct 22 '14 at 17:39
source share

The ISet<T> part of creating an ISet<T> concurrent is that the established methods (union, intersection, difference) are iterative in nature. At least you need to iterate over all n elements of one of the sets involved in the operation, while simultaneously blocking both sets.

You lose the benefits of ConcurrentDictionary<T,byte> when you need to lock the entire set during iteration. Without blocking, these operations are not thread safe.

Given the added overhead of ConcurrentDictionary<T,byte> , it's probably wiser to use a lighter weight HashSet<T> and just surround everything in locks.

If you do not need the specified operations, use ConcurrentDictionary<T,byte> and just use default(byte) as the value when adding keys.

+3
Apr 14 '15 at 1:14
source share



All Articles