Using a hash table inside Parallel.ForEach?

I have a Parallel.ForEach loop performing an intensive operation inside the body.

The operation can use a Hashtable to cache values so they can be reused for subsequent elements of the loop. After the intensive operation completes, I add the result to the Hashtable; the next element of the loop can then look it up in the Hashtable and reuse the object instead of performing the intensive operation again.

However, since I use Parallel.ForEach, a thread-safety problem arises: the calls to Hashtable.Add and ContainsKey(key) can execute in parallel and conflict with each other. Using locks could cause performance problems.

Here is a sample code:

    Hashtable myTable = new Hashtable();

    Parallel.ForEach(items, (item, loopState) =>
    {
        object myObj;
        // If it exists in myTable use it, else add it to the hashtable
        if (myTable.ContainsKey(item.Key))
        {
            myObj = myTable[item.Key];
        }
        else
        {
            myObj = SomeIntensiveOperation();
            myTable.Add(item.Key, myObj); // Issue is here: breaks with an exception at runtime
        }

        // Do something with myObj
        // some code here
    });

Is there some API inside the TPL library — a type, or a parameter setting — that handles this scenario?

+7
c# task-parallel-library parallel-extensions
4 answers

You are looking for System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>. The new concurrent collections use significantly improved locking mechanisms and should perform well in parallel algorithms.

Edit: The result may look like this:

    ConcurrentDictionary<T,K> cache = ...;

    Parallel.ForEach(items, (item, loopState) =>
    {
        K value;
        if (!cache.TryGetValue(item.Key, out value))
        {
            value = SomeIntensiveOperation();
            cache.TryAdd(item.Key, value);
        }
        // Do something with value
    });

A word of warning: if the items in items do not all have a unique item.Key , then SomeIntensiveOperation can be called twice for the same key. In the example the key is not passed to SomeIntensiveOperation , but it means that the "Do something with value" code can execute with a key/valueA pair on one thread and a key/valueB pair on another, and only one of the results will be stored in the cache (not necessarily the first one computed by SomeIntensiveOperation either). You would need a parallel lazy factory to handle this, if it is a problem. Also, for obvious reasons, SomeIntensiveOperation should be thread safe.
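A minimal sketch of such a parallel lazy factory, using .NET 4's Lazy<T> wrapped in the ConcurrentDictionary (the key type string, the result type MyObject, and the items collection are illustrative assumptions, not from the answer above):

    // Cache Lazy<MyObject> wrappers so each key's value is computed at most once.
    // Under a race, GetOrAdd may construct extra Lazy objects, but only the one
    // actually stored in the dictionary ever runs its factory delegate.
    ConcurrentDictionary<string, Lazy<MyObject>> cache =
        new ConcurrentDictionary<string, Lazy<MyObject>>();

    Parallel.ForEach(items, (item, loopState) =>
    {
        MyObject value = cache.GetOrAdd(
            item.Key,
            key => new Lazy<MyObject>(
                () => SomeIntensiveOperation(),
                LazyThreadSafetyMode.ExecutionAndPublication)).Value;

        // Do something with value; every thread that asked for the same key
        // sees the same single computed instance.
    });

With ExecutionAndPublication, threads that race on the same key block on the one Lazy instance stored in the dictionary, so SomeIntensiveOperation runs exactly once per key.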

+18

Check the System.Collections.Concurrent namespace; I think you need ConcurrentDictionary .

+4

Use ReaderWriterLockSlim; it has good performance for workloads with many reads and only a few short-lived writes. Your problem seems to fit this description.

All read operations will be fast and non-blocking; the only time a thread is blocked is when a write occurs, and that write lasts only as long as it takes to push something into the Hashtable.

ReaderWriterLockSlim on MSDN

I figured I would throw in some code...

    ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
    Hashtable myTable = new Hashtable();

    Parallel.ForEach(items, (item, loopState) =>
    {
        cacheLock.EnterReadLock();
        MyObject myObj = (MyObject)myTable.TryGet(item.Key);
        cacheLock.ExitReadLock();

        // If the object isn't cached, calculate it and cache it
        if (myObj == null)
        {
            myObj = SomeIntensiveOperation();
            cacheLock.EnterWriteLock();
            try
            {
                // Re-check: another thread may have added the key while we were
                // outside the lock, and Hashtable.Add throws on duplicate keys.
                if (!myTable.ContainsKey(item.Key))
                    myTable.Add(item.Key, myObj);
            }
            finally
            {
                cacheLock.ExitWriteLock();
            }
        }

        // Do something with myObj
        // some code here
    });

    static object TryGet(this Hashtable table, object key)
    {
        if (table.Contains(key))
            return table[key];
        else
            return null;
    }
+3

I see no other correct choice but to use (more or less explicit) locks (a synchronized Hashtable just wraps all of its methods in locks anyway).

Another option would be to allow the dictionary to go out of sync. The race condition will not corrupt the dictionary; it will just require the code to perform some superfluous computations. Profile the code to see whether the locking or the missed cache hits has the worse effect.

+1
