What guarantees do modern C and C++ compilers make regarding threads?

I am wondering what guarantees compilers make to ensure that writes to memory performed by one thread have visible effects in other threads.

I know there are countless cases where this is problematic, and I am sure that if you are interested enough to answer, you know about them too, but please focus on the cases I present below.

More precisely, I am concerned about circumstances that could lead to threads missing memory updates made by other threads. I do not care (at this point) whether the updates are atomic or strongly synchronized: as long as the relevant threads eventually notice the changes, I will be happy.

I expect compilers to distinguish between two types of variable access:

  • Access to variables that must have an address;
  • Access to variables that do not necessarily have an address.

For example, if you take this snippet:

    void sleepingbeauty()
    {
        int i = 1;
        while (i) sleep(1);
    }

Since i is local, I assume that my compiler can optimize it away and let Sleeping Beauty fall into eternal sleep.

    void onedaymyprincewillcome(int* i);

    void sleepingbeauty()
    {
        int i = 1;
        onedaymyprincewillcome(&i);
        while (i) sleep(1);
    }

Since i is local, but its address is taken and passed to another function, I assume that my compiler now treats it as an "addressable" variable and generates memory reads, so that someday the prince may indeed come.

    int i = 1;

    void sleepingbeauty()
    {
        while (i) sleep(1);
    }

Since i is global, I assume my compiler knows the variable has an address and will generate reads for it instead of caching the value.

    void sleepingbeauty(int* ptr)
    {
        *ptr = 1;
        while (*ptr) sleep(1);
    }

I hope the dereference operator is explicit enough that my compiler will generate a memory read on each iteration of the loop.

I am fairly sure this is the memory access model used by every C and C++ compiler, but I do not think any of it is actually guaranteed. In fact, C++03 is even blind to the existence of threads, so this question does not even make sense under that standard. I am not sure about C, though.

Is there documentation out there that says whether I am right or wrong? I know these are murky waters, since the answers may not be backed by the standards, but it still seems like an important question to me.

Besides the compiler generating the reads, I also fear that the CPU cache could technically hold a stale value, so that even if my compiler does its best to emit the reads and writes, the values might never be synchronized between threads. Can this happen?

+4
7 answers

I am writing this answer because most of the help came from comments on the question, not always from the authors of the answers. I have already upvoted the answers that helped me the most, and I am making this answer a community wiki so as not to profit from others' knowledge. (If you feel like upvoting this answer, consider upvoting Billy's and Dietrich's answers too: they were the most helpful to me.)

There are two problems to address when values written by one thread must be visible to another thread:

  • Caching (a value written by one CPU might never make it to another CPU);
  • Optimization (the compiler may optimize away reads of a variable if it believes the variable cannot change).

The first one is pretty simple. On modern Intel processors there is cache coherency, which means that changes to one cache propagate to the other CPU caches.

It turns out the optimization part is not too hard either. As soon as the compiler cannot guarantee that a function call cannot change the contents of a variable, even in a single-threaded model, it will not optimize the reads away. In my examples, the compiler does not know that sleep cannot modify i, so a read is performed on every iteration. It does not have to be sleep, though; any function for which the compiler has no implementation details would do. I suppose a particularly fitting function to use would be one that emits a memory barrier.
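As a small illustrative sketch of that point (opaque_call is a made-up name standing in for any function whose body the compiler cannot see):

    /* opaque_call() is hypothetical: defined in some other translation
       unit, so the compiler cannot prove it leaves i untouched. */
    extern void opaque_call(void);

    int i = 1;   /* the address of i is visible outside this function */

    void wait_for_change(void)
    {
        while (i)            /* i must be re-read after every call */
            opaque_call();   /* might modify i, for all the compiler knows */
    }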

In the future, it is possible that compilers will know more about functions that are currently impenetrable to them. However, when that time comes, I expect there will be standard ways to ensure that changes are propagated correctly. (This is happening with C++11 and the std::atomic<T> class. I do not know about C1x.)

0

> Access to variables that do not necessarily have an address.

All variables must have addresses (from the language's point of view; compilers are allowed to avoid giving objects addresses if they can, but that is not visible from within the language). It is a side effect of everything having to be "pointerable" that everything has an address; even an empty class usually has a size of at least char, so that a pointer to it can be created.

> Since i is local, but its address is taken and passed to another function, I assume that my compiler now treats it as an "addressable" variable and generates memory reads, so that someday the prince may indeed come.

That depends on the contents of onedaymyprincewillcome. The compiler may inline that function if it wishes, and not emit memory reads at all.

> Since i is global, I assume my compiler knows the variable has an address and will generate reads for it.

Yes, but it does not really care whether the reads happen. Those reads might simply be cached on your current CPU core rather than going all the way back to main memory. For that you would need something like a memory barrier, and no C++ compiler is going to do that for you.

> I hope the dereference operator is explicit enough that my compiler will generate a memory read on each iteration of the loop.

No, that is not required. The function might be inlined, which would allow the compiler to remove these reads entirely if it so wishes.

The only language feature in the standard that lets you control this sort of thing with respect to threading is volatile, which simply requires the compiler to generate reads. That does not mean the value will be consistent, though, because of the CPU cache issue; for that you need memory barriers.

If you need true multithreading correctness, you will want to use some platform-specific library to create memory barriers and the like, or you will need a C++0x compiler that supports std::atomic, which makes these requirements on expressions explicit.
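For illustration, a minimal sketch of the sleeping-beauty loop from the question rewritten with std::atomic, assuming a C++11 compiler (the default sequentially consistent load and store are used):

    #include <atomic>
    #include <unistd.h>

    std::atomic<int> i(1);

    /* assumed to run in some other thread at some point */
    void onedaymyprincewillcome() { i.store(0); }

    void sleepingbeauty()
    {
        while (i.load())   /* atomic load: the store is guaranteed to become visible */
            sleep(1);
    }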

+6

You are wrong.

    void onedaymyprincewillcome(int* i);

    void sleepingbeauty()
    {
        int i = 1;
        onedaymyprincewillcome(&i);
        while (i) sleep(1);
    }

In this code, your compiler will load i from memory each time through the loop. Why? NOT because it thinks another thread could alter its value, but because it thinks that sleep could alter its value. This has nothing to do with whether i has an address or must have an address, and everything to do with the operations this thread performs that could modify the variable.

In particular, it is not even guaranteed that assignment to an int is atomic, although this happens to be true on all the platforms we use these days.

Far too much can go wrong if you do not use the proper synchronization primitives in your threaded programs. For example,

    char *str = 0;
    asynch_get_string(&str);
    while (!str) sleep(1);
    puts(str);

This can (and even will, on some platforms) occasionally print out complete garbage and crash the program. It looks safe, but since you are not using the proper synchronization primitives, the change to str might be seen by your thread before the change to the memory it points to, even though the other thread initializes the string before setting the pointer.

So just don't do it. Don't do it. And no, volatile is not a fix.

Summary: The basic problem is that the compiler only changes the order in which instructions execute, and decides when loads and stores happen. This is not enough to guarantee thread safety in general, because the processor is free to reorder loads and stores, and the order of loads and stores is not preserved across processors. In order to ensure things happen in the right order, you need memory barriers. You can either write the assembly yourself, or you can use a mutex / semaphore / critical section / etc., which does the right thing for you.
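As a hedged sketch of "the right thing", here is one way the str example above could be repaired with C++11 atomics; the release store pairs with the acquire load, so the string contents become visible before the pointer does (the producer function is an assumption standing in for asynch_get_string):

    #include <atomic>
    #include <cstdio>
    #include <unistd.h>

    std::atomic<char*> str(nullptr);

    /* runs in another thread: fill the buffer first, publish the pointer last */
    void producer(char* buf)
    {
        /* ... initialize the string in buf ... */
        str.store(buf, std::memory_order_release);
    }

    void consumer()
    {
        char* s;
        while ((s = str.load(std::memory_order_acquire)) == nullptr)
            sleep(1);
        std::puts(s);   /* safe: the acquire load also makes the contents visible */
    }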

+2

While the C++98 and C++03 standards do not define a memory model for compilers to follow, C++0x does, and you can read about it here: http://www.hpl.hp.com/personal/Hans_Boehm/misc_slides/c++mm.pdf

In the end, for C++98 and C++03, it really comes down to the compiler and the hardware platform. Typically, no memory barrier or fence operation will be emitted by the compiler for normally written code unless you use a compiler intrinsic or something from your OS's standard library for synchronization. Most mutex/semaphore implementations also include a built-in memory barrier operation that prevents the CPU from speculatively reading and writing across the mutex's lock and unlock operations, and prevents the compiler from reordering operations across those same calls.

Finally, as Billy noted in the comments, on Intel x86 and x86_64 platforms, any read or write of a single byte is atomic, as is any register-sized read or write of a 4-byte aligned memory location on x86, or of a 4- or 8-byte aligned location on x86_64. On other platforms this may not be the case, and you will have to consult the platform's documentation.

+2

The only control over optimization that you have is volatile.

Compilers make NO guarantees about concurrent threads accessing the same location at the same time. You will need some kind of locking mechanism.

+1

I can only speak for C. Since synchronization is functionality implemented on top of the CPU, a C programmer will need to call an OS library function that has access to locking (the CriticalSection functions in the Windows NT engine), or implement something lighter (a spin-lock, for example) and access the functionality themselves.

volatile is a property that is useful on module-level variables. Sometimes a non-static (public) variable works too.

  • Local (stack) variables cannot be accessed from other threads, and do not need to be;
  • Module-level variables are good candidates for access by multiple threads, but they will require synchronization functions to behave predictably.

Locks are unavoidable, but they can be used more or less wisely, resulting in anywhere from negligible to significant performance degradation.
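This answer speaks in terms of C and the Win32 CriticalSection functions; as a portable C++11 sketch of the same idea, a module-level variable guarded by a lock might look like this:

    #include <mutex>

    /* module-level state shared between threads */
    static int counter = 0;
    static std::mutex counter_mutex;

    void add_one()
    {
        std::lock_guard<std::mutex> guard(counter_mutex);  /* released on scope exit */
        ++counter;   /* the lock orders the reads and writes between threads */
    }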

I answered a similar question here regarding unsynchronized threads, but I think you would be better served by browsing similar topics to get quality answers.

0

I'm not sure you understand the basics of the topic you have set out to discuss. Two threads, each starting at exactly the same moment and each iterating a million times performing an inc on the same variable, will NOT result in a final value of two million (two times one million increments). The value will instead end up somewhere between one and two million.

The first increment causes the value to be read from RAM into the L1 cache (via L3 first, then L2) of the accessing core. The increment is executed and the new value is written initially to L1, for propagation down to the lower cache levels. When it reaches L3 (the highest cache level common to both cores), the memory location will be invalidated in the other core's caches. This might seem safe, but in the meantime the other core has simultaneously performed an increment based on the same original value of the variable. The invalidation from the first core's write will be superseded by the write from the second core, invalidating the data in the first core's caches.

Sound like a mess? It is! The cores are so fast that what happens in the caches lags behind them: the cores are where the action is. This is why you need explicit locks: to make sure the new value gets far enough down the memory hierarchy that the other cores will read the new value and nothing else. Or put another way: to slow things down so the caches can catch up with the cores.
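A minimal sketch of the scenario described above, assuming a C++11 compiler with std::thread; note that the unsynchronized counter is deliberately a data race (formally undefined behavior), shown only to make the lost updates observable:

    #include <atomic>
    #include <iostream>
    #include <thread>

    int plain = 0;                /* unsynchronized: increments can be lost  */
    std::atomic<int> synced(0);   /* atomic read-modify-write: none are lost */

    void work()
    {
        for (int n = 0; n < 1000000; ++n) {
            ++plain;    /* separate load, add, store: races with the other thread */
            ++synced;   /* a single indivisible increment                         */
        }
    }

    int main()
    {
        std::thread a(work), b(work);
        a.join();
        b.join();
        /* plain typically lands between 1000000 and 2000000; synced is exactly 2000000 */
        std::cout << plain << ' ' << synced << '\n';
    }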

The compiler does not "feel". The compiler is rule-based and, if constructed correctly, will optimize to the extent that the rules allow and that the compiler writers were able to build the optimizer. If a variable is volatile and the code is multi-threaded, the rules will not allow the compiler to skip a read. Simple as that, even though on the surface it may appear devilishly tricky.

I will have to repeat myself and say that locks cannot be implemented in the compiler, because they are OS-specific. The generated code will call all functions without knowing whether they are empty, contain locking code, or will trigger a nuclear explosion. In the same way, the code will not be aware that a lock is in effect, since the core will insert wait states until the lock request results in the lock being in place. The lock is something that exists in the core and in the mind of the programmer. The code should not (and does not) care.

0
