I am writing a program that calculates the endbame tablebase for a chess variation. The algorithm for filling the table base works as follows:
- Start with a huge array of
unsigned char , each element represents one position (we always assume that this is a white rotation). A member of the array, even if the position is lost, is odd if it is won, 0xff if it is invalid, 0xfe if it is a tie. - Iterates through the array, denoting each illegal position with
0xff , each position in which white is matched with 0x00 and all other positions with 0x0fe . - Iterate over the array, taking into account only the marked
0xfe positions. Check if there is a movement that leads to a position whose array element is even, if any, write one plus the number of this position in the element corresponding to the current position. If all movements result in positions indicated by odd numbers (i.e. This is a lost position), mark this position as one plus the highest of these numbers to indicate how long the strongest defense takes. - Repeat step three until the array no longer changes.
For speed, I would like to parallelize the third step. A thorough reading gives that at each iteration we only write one value (the number of iterations) to the array. The following strategy:
- Divide the array into n parts, let one thread work with each part. Let the current iteration be i.
- If a stream must read a member from an array and is equal to i, treat it as if it were set to
0xfe , because this means that the element was simply written at the same time by another stream.
Now itβs obvious that there is a race condition in this program, but it doesnβt matter, since we always get the correct result if there are no pink elephants (which cannot exist if a char written atomically). However, since there is a race condition on paper, the C compiler can declare my program undefined and format my hard drive.
What can I do to parallelize this algorithm without violating any restrictions of the C memory model and without causing mass slowdown (for example, by adding locks)?
Simplified problem description
Here is a simplified algorithm demonstrating the same concept, but devoid of all non-essential material:
- Start with the
unsigned char a[n] array. Each element of the array is 0 or 1. - For each member of the array that is set to 0: if some other members of the array are 0 or 2, set this member of the array to 2. The elements to be checked are dependent on the index of the array member that we want to update. There is no simple connection between the index and the array elements that we need to check, this is essentially random.
Since we only change the value 0 to 2, it does not matter in which order we process the array entries, even if there is a technical condition for the race, if we do it in parallel (since we are reading / writing the same object at the same time). How can I tell the compiler that he should not care about the state of the race without sacrificing success?
c concurrency race-condition chess tearing
fuz
source share