How can I make code that reads and modifies an array that is clearly defined without blocking at the same time?

I am writing a program that calculates the endbame tablebase for a chess variation. The algorithm for filling the table base works as follows:

  • Start with a huge array of unsigned char , each element represents one position (we always assume that this is a white rotation). A member of the array, even if the position is lost, is odd if it is won, 0xff if it is invalid, 0xfe if it is a tie.
  • Iterates through the array, denoting each illegal position with 0xff , each position in which white is matched with 0x00 and all other positions with 0x0fe .
  • Iterate over the array, taking into account only the marked 0xfe positions. Check if there is a movement that leads to a position whose array element is even, if any, write one plus the number of this position in the element corresponding to the current position. If all movements result in positions indicated by odd numbers (i.e. This is a lost position), mark this position as one plus the highest of these numbers to indicate how long the strongest defense takes.
  • Repeat step three until the array no longer changes.

For speed, I would like to parallelize the third step. A thorough reading gives that at each iteration we only write one value (the number of iterations) to the array. The following strategy:

  • Divide the array into n parts, let one thread work with each part. Let the current iteration be i.
  • If a stream must read a member from an array and is equal to i, treat it as if it were set to 0xfe , because this means that the element was simply written at the same time by another stream.

Now it’s obvious that there is a race condition in this program, but it doesn’t matter, since we always get the correct result if there are no pink elephants (which cannot exist if a char written atomically). However, since there is a race condition on paper, the C compiler can declare my program undefined and format my hard drive.

What can I do to parallelize this algorithm without violating any restrictions of the C memory model and without causing mass slowdown (for example, by adding locks)?

Simplified problem description

Here is a simplified algorithm demonstrating the same concept, but devoid of all non-essential material:

  • Start with the unsigned char a[n] array. Each element of the array is 0 or 1.
  • For each member of the array that is set to 0: if some other members of the array are 0 or 2, set this member of the array to 2. The elements to be checked are dependent on the index of the array member that we want to update. There is no simple connection between the index and the array elements that we need to check, this is essentially random.

Since we only change the value 0 to 2, it does not matter in which order we process the array entries, even if there is a technical condition for the race, if we do it in parallel (since we are reading / writing the same object at the same time). How can I tell the compiler that he should not care about the state of the race without sacrificing success?

+7
c concurrency race-condition chess tearing
source share
1 answer

This is what is for a classifier like _Atomic for C11. You declare your array as

 _Atomic unsigned char a[n]; 

which means that each element of the array can be read or written atomically.

Prior to C11, there is no standard way to do this, but in general, depending on the implementation, some data types will be atomic for reading and writing. To find out what it is, you will need to look at the implementation documentation that you are using.


Note that the default storage order for C11 _Atomic calls is memory_order_seq_cst (sequential consistency), and if you don't need it, you can use the atomic_load_explicit and atomic_store_explicit with a weaker memory order (i.e. memory_order_relaxed in your example)

+5
source share

All Articles