A multi-threaded array of arrays?

I have a data structure consisting of 1000 elements, each of which is a smaller array of 8 ints:

std::array<std::array<int, 8>, 1000> 

The data structure also contains two "pointers" that track the largest and smallest populated indices of the outer array (the one with 1000 elements). For example, they might be:

 min = 247 max = 842 

How can I read and write to this data structure from multiple threads? I am worried about race conditions between pushing / popping elements and maintaining the two "pointers". My main mode of operation:

 // Pop element from current index
 // Calculate new index
 // Write element to new index
 // Update min and max "pointers"
1 answer

You are right that your current algorithm is not thread safe; there are many places where a conflict can arise.

This cannot be optimized without additional information. You need to know where the slowdown occurs before you can improve it, and for that you need metrics. Profile your code and find out which parts actually take the time, because you can only gain by parallelizing those parts, and even then you may find that memory or something else, not the CPU, is the limiting factor.

The easiest way is to simply lock the entire structure for the complete operation. This only pays off if the threads spend most of their time on other processing; otherwise you will lose performance compared to a single thread.
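As a rough illustration of the single-lock approach, here is a minimal sketch. The class name LockedTable, its member functions, and the used_ flags that decide what counts as "populated" are all assumptions made for the example; they are not from the question. The point is only that pop, write, and the min/max update happen inside one critical section.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>

// Hypothetical wrapper: one mutex guards the data and both "pointers".
class LockedTable {
public:
    static constexpr std::size_t kSlots = 1000;
    using Slot = std::array<int, 8>;

    // Stores `value` at `index` and marks that slot as populated.
    void put(std::size_t index, const Slot& value) {
        assert(index < kSlots);
        std::lock_guard<std::mutex> guard(mutex_);
        data_[index] = value;
        used_[index] = true;
        refreshBounds();
    }

    // Pop from `from`, write to `to`, update min/max: one critical section.
    // Returns false if `from` was not populated.
    bool move(std::size_t from, std::size_t to) {
        assert(from < kSlots && to < kSlots);
        std::lock_guard<std::mutex> guard(mutex_);
        if (!used_[from]) return false;
        const Slot value = data_[from];
        used_[from] = false;
        data_[to] = value;
        used_[to] = true;
        refreshBounds();
        return true;
    }

    // Returns {min, max} populated index; both equal kSlots when empty.
    std::pair<std::size_t, std::size_t> bounds() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return {min_, max_};
    }

private:
    // Caller must hold mutex_. A plain scan; 1000 flags is cheap to walk.
    void refreshBounds() {
        min_ = max_ = kSlots;
        for (std::size_t i = 0; i < kSlots; ++i) {
            if (!used_[i]) continue;
            if (min_ == kSlots) min_ = i;
            max_ = i;
        }
    }

    mutable std::mutex mutex_;
    std::array<Slot, kSlots> data_{};
    std::array<bool, kSlots> used_{};
    std::size_t min_ = kSlots;
    std::size_t max_ = kSlots;
};
```

Every caller serializes on mutex_, so this is correct but gives no parallelism inside the structure itself.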

After that, you can consider locking different sections of the data structure separately. You will need to analyze properly what you use, when and where, to work out a useful way to split it up. For example, you might divide the array into chunks of sub-arrays, each chunk with its own lock.

Be careful of deadlocks in this situation: one thread may hold lock 32 and then need 79, while another thread already holds 79 and then wants 32. Make sure you always acquire the locks in the same order.
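A sketch of per-section locks with a fixed acquisition order might look like the following. StripedTable, kStripes, and the choice of 10 slots per lock are illustrative assumptions, and the min/max bookkeeping is deliberately left out; it would need its own synchronization. When a move touches two sections, the lower-numbered lock is always taken first, so two threads can never wait on each other in a cycle.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>

// Hypothetical striped table: each lock guards a contiguous run of slots.
class StripedTable {
public:
    static constexpr std::size_t kSlots = 1000;
    static constexpr std::size_t kStripes = 100;  // assumed: 10 slots per lock
    using Slot = std::array<int, 8>;

    // Maps a slot index to the lock (stripe) that guards it.
    static std::size_t stripeOf(std::size_t index) {
        return index / (kSlots / kStripes);
    }

    void put(std::size_t index, const Slot& value) {
        assert(index < kSlots);
        std::lock_guard<std::mutex> guard(locks_[stripeOf(index)]);
        data_[index] = value;
    }

    Slot get(std::size_t index) {
        assert(index < kSlots);
        std::lock_guard<std::mutex> guard(locks_[stripeOf(index)]);
        return data_[index];
    }

    // Moves a slot's contents, clearing the source slot.
    void move(std::size_t from, std::size_t to) {
        assert(from < kSlots && to < kSlots);
        std::size_t a = stripeOf(from);
        std::size_t b = stripeOf(to);
        if (a > b) std::swap(a, b);  // always lock the lower stripe first
        std::unique_lock<std::mutex> first(locks_[a]);
        std::unique_lock<std::mutex> second;  // stays empty if same stripe
        if (b != a) second = std::unique_lock<std::mutex>(locks_[b]);
        if (from != to) {
            data_[to] = data_[from];
            data_[from] = Slot{};
        }
    }

private:
    std::array<std::mutex, kStripes> locks_;
    std::array<Slot, kSlots> data_{};
};
```

In C++17, std::scoped_lock can acquire several mutexes at once with built-in deadlock avoidance, which is an alternative to ordering them by hand.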

The fastest option (if it is possible) may be to give each thread its own copy of the data structure, have each one process 1/N of the work, and then combine the results at the end. That way, no synchronization is required at all during processing.
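Whether this applies depends on whether your results can be merged at all, which the question does not say. As a sketch under that assumption: the per-item work below (processChunk, which just counts how many inputs land in each cell) and the merge by element-wise sum are made-up stand-ins for the real processing. Each thread writes only to its own private table, and the tables are combined after all threads have joined.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Slot = std::array<int, 8>;
using Table = std::array<Slot, 1000>;

// Hypothetical per-item work: tally inputs into a thread-private table.
void processChunk(const std::vector<int>& input, std::size_t begin,
                  std::size_t end, Table& out) {
    for (std::size_t i = begin; i < end; ++i) {
        assert(input[i] >= 0);  // assumed: inputs are non-negative
        const std::size_t v = static_cast<std::size_t>(input[i]);
        out[v % 1000][v % 8] += 1;
    }
}

// Splits `input` into roughly equal chunks, one per thread, then merges.
Table processInParallel(const std::vector<int>& input,
                        std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<Table> partial(threadCount);  // one zeroed table per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (input.size() + threadCount - 1) / threadCount;
    for (std::size_t t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(t * chunk, input.size());
        const std::size_t end = std::min(begin + chunk, input.size());
        workers.emplace_back(processChunk, std::cref(input), begin, end,
                             std::ref(partial[t]));
    }
    for (std::thread& w : workers) w.join();

    // Single-threaded merge: no locks needed because the workers are done.
    Table merged{};
    for (const Table& p : partial)
        for (std::size_t i = 0; i < merged.size(); ++i)
            for (std::size_t j = 0; j < merged[i].size(); ++j)
                merged[i][j] += p[i][j];
    return merged;
}
```

The cost is memory (one 32 KB table per thread here) plus the merge pass, so it only wins when the per-item work dominates.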

But again, everything comes back to metrics and profiling. This is not an easy problem.
