I write to a memory region (with memcpy) in one thread and copy it to a new location with memcpy in another. Sometimes these operations can overlap, which is a data race. A program with a data race has undefined behavior and is invalid.
In my case, I check after the copy that the copied data was valid (that is, that no race occurred). If a race did occur, I discard the copied data. However, AFAIK, this does not let me sidestep the UB. I think I still invoke UB as a result of the data race.
Now, I could write my own memcpy routine in assembly (or just copy and paste it from libc), which would sidestep the whole UB problem. Assembly is not C++, and whatever happens inside the assembly does not give the compiler license to summon nasal demons [1]. By the way, is this true for inline asm as well as for separately compiled and linked asm? Although memcpy is already compiled in any modern libc, it may also be handled specially by the compiler, which often performs optimizations such as inlining a small memcpy for known sizes and alignments, and that could again summon nasal demons.
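To illustrate the kind of special handling I mean, here is a minimal sketch. The helper name `load_u32` is made up for this example. With a compile-time-constant size like this, optimizing compilers such as GCC and Clang typically do not emit a call into libc at all, but expand the copy inline (often to a single load), so the compiler does see and reason about the access:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, for illustration only.
// The size is a compile-time constant, so an optimizing compiler is free to
// treat memcpy as a builtin and expand it inline instead of calling libc.
std::uint32_t load_u32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);  // usually becomes a plain 32-bit load
    return v;
}
```

Whether this expansion actually happens depends on the compiler, the target, and the optimization level, so treat it as an assumption to check in the generated assembly.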
Am I overthinking this? It is hard to imagine a compiler so godlike that it can detect a data race at compile time, and yet so stupid that the optimizer uses that knowledge to generate bad code rather than to report it. But compilers have recently managed to push both of these limits, so I feel the need to seek advice here on Stack Overflow.
[Edit] Since there is a lot of curiosity about how I synchronize this, let me explain. The pointer to the memory being copied is shared between threads. The reader accesses it with an atomic load(mo_acquire). Then the memory is copied to a new location. Then comes a LoadLoad barrier, and then a second load(mo_relaxed) of the pointer. If the two pointers do not match, the copy result is discarded, because another thread may have written to that memory during the copy. The thread that writes to the memory first sets the pointer to null using an atomic store(mo_relaxed), followed by a StoreStore barrier, followed by the racing memcpy. So although the two memcpy calls in different threads can constitute a data race, the race is in fact always detected, and the result is always discarded in that case. I use this scheme for reading, to allow objects in a cache to be resurrected after they have been evicted but before their memory has been reused, without any interference or “strong” synchronization.
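The scheme described above can be sketched roughly as follows. This is a simplified illustration, not my real code: the names `Slot`, `publish`, `try_copy` and `retire_and_overwrite` are made up, and I am assuming that `std::atomic_thread_fence` with acquire/release ordering is an acceptable stand-in for the LoadLoad and StoreStore barriers (each is at least that strong). The two `memcpy` calls are exactly the accesses that may race, so this sketch has the same formal UB the question is about.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type: holds the shared pointer to the cached payload.
struct Slot {
    std::atomic<const char*> ptr{nullptr};
};

// Make a fully written buffer visible to readers.
void publish(Slot& slot, const char* buf) {
    slot.ptr.store(buf, std::memory_order_release);
}

// Reader: copy optimistically, then validate.
// Returns true only if the pointer was non-null and unchanged across the copy.
bool try_copy(const Slot& slot, char* dst, std::size_t len) {
    assert(dst != nullptr);
    const char* before = slot.ptr.load(std::memory_order_acquire);
    if (before == nullptr) return false;          // already evicted
    std::memcpy(dst, before, len);                // may race with the writer
    std::atomic_thread_fence(std::memory_order_acquire);  // assumed LoadLoad
    const char* after = slot.ptr.load(std::memory_order_relaxed);
    return before == after;                       // mismatch: discard dst
}

// Writer: unpublish first, then reuse the memory.
void retire_and_overwrite(Slot& slot, char* buf,
                          const char* src, std::size_t len) {
    slot.ptr.store(nullptr, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // assumed StoreStore
    std::memcpy(buf, src, len);                   // the racing write
}
```

A caller treats the contents of `dst` as garbage whenever `try_copy` returns false. The open question remains whether the plain `memcpy` in `try_copy` is allowed to race at all, even though its result is thrown away.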
[1]: I long for a more civilized age in which compilers report UB instead of exploiting it for optimizations that may contradict the behavior the programmer expects.