Preventing Compiler Optimization in Benchmarking

I recently came across this brilliant talk from CppCon 2015: Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"

One of the techniques mentioned for preventing the compiler from optimizing code away is to use the following functions:

    #include <vector>
    using std::vector;

    static void escape(void *p) {
      asm volatile("" : : "g"(p) : "memory");
    }

    static void clobber() {
      asm volatile("" : : : "memory");
    }

    void benchmark() {
      vector<int> v;
      v.reserve(1);
      escape(v.data());
      v.push_back(10);
      clobber();
    }

I'm trying to understand how these work. My questions are:

1) What is the advantage of escape over clobber?

2) In the example above, it looks like clobber() prevents the previous statement (push_back) from being optimized away. If that is the case, why is the snippet below incorrect?

    void benchmark() {
      vector<int> v;
      v.reserve(1);
      v.push_back(10);
      clobber();
    }

As if that weren't complicated enough, folly (Facebook's open-source C++ library) has yet another variant.

The relevant snippet:

    template <class T>
    void doNotOptimizeAway(T&& datum) {
      asm volatile("" : "+r" (datum));
    }

My understanding is that the snippet above tells the compiler that the assembly block will write to datum. But if the compiler finds that there is no consumer of datum, it can still optimize away the code that produces datum, right?
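A minimal sketch of what the "+r" constraint promises, assuming GCC or Clang extended asm (opaque_constant is a hypothetical name, not part of folly): the empty asm both reads datum and may write it back, so code after the call cannot rely on the value the compiler last computed.

```cpp
// folly-style helper as quoted above (GCC/Clang extended asm).
template <class T>
void doNotOptimizeAway(T&& datum) {
  // "+r": the empty asm reads datum from a register and may write it back.
  asm volatile("" : "+r"(datum));
}

// Hypothetical example function.
int opaque_constant() {
  int x = 42;            // def of x
  doNotOptimizeAway(x);  // artificial use (and possible redefinition) of x
  return x + 1;          // cannot be folded to 43 at compile time
}
```

At run time the asm is empty, so the value is unchanged; only the compiler's knowledge about it is erased.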

I assume this is not common knowledge; any help is appreciated!

2 answers

tl;dr: doNotOptimizeAway creates an artificial "use".

A little terminology here: a "def" ("definition") is a statement that assigns a value to a variable; a "use" is a statement that uses the value of a variable to perform some operation.

If, from the point immediately after a def, none of the paths to the program exit contain a use of the variable, the def is called dead, and the Dead Code Elimination (DCE) pass removes it. This, in turn, can cause other defs to become dead (if they were used only as operands of the removed statement), and so on.
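A small sketch of that cascade (dead_chain is a hypothetical name; the comments describe what DCE is allowed to do, not a guaranteed compiler output):

```cpp
// Hypothetical function illustrating a chain of dead defs.
int dead_chain(int x) {
  int a = x * 3;       // def of a; its only use is in the def of b
  int b = a + 1;       // use of a, def of b; b's only use is the next line
  int unused = b * 2;  // def of unused; no use follows, so it is dead
  // Once DCE removes `unused`, b has no remaining use, so its def is dead;
  // removing that leaves a without a use, and so on. Only the return survives.
  return x;
}
```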

Imagine the program after Scalar Replacement of Aggregates (SRA) has turned a local std::vector into two variables, len and ptr. At some point the program assigns a value to ptr; that statement is a def.
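A hand-written approximation of that transformation (PtrLen and both functions are hypothetical; real compilers perform SRA on their intermediate representation, not on C++ source):

```cpp
#include <cstddef>

// Hypothetical stand-in for the internal state of a vector.
struct PtrLen {
  int* ptr;
  std::size_t len;
};

// Source form: a local aggregate.
std::size_t grow_len(int* p, std::size_t n) {
  PtrLen s{p, n};
  s.len += 1;
  return s.len;
}

// Roughly what SRA turns it into: one scalar per field.
std::size_t grow_len_after_sra(int* p, std::size_t n) {
  int* s_ptr = p;         // def of s_ptr; never used, so DCE can drop it
  std::size_t s_len = n;  // def of s_len
  s_len += 1;             // use and new def of s_len
  return s_len;           // use of s_len
}
```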

Now, the original program did nothing with the vector; in other words, there were no uses of either len or ptr. Therefore all their defs are dead, and DCE can remove them, effectively deleting all the code and making the benchmark worthless.

Adding doNotOptimizeAway(ptr) creates an artificial use that prevents DCE from removing the defs. (As a side note, I don't see the point of the "+"; "g" should be enough.)
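A sketch of the input-only variant suggested here, assuming GCC or Clang extended asm (useValue, sumBelow and benchmarkSum are hypothetical names, not folly's API):

```cpp
// Hypothetical input-only helper: "g" lets the compiler pass the value in a
// register, memory, or as an immediate; the asm reads it but claims no write.
template <class T>
void useValue(T const& datum) {
  asm volatile("" : : "g"(datum));
}

// Hypothetical work to be measured.
long sumBelow(long n) {
  long s = 0;
  for (long i = 0; i < n; ++i) s += i;  // defs of s
  return s;
}

// Hypothetical benchmark body: without useValue, `s` would have no use and
// the call to sumBelow could be removed as dead code.
void benchmarkSum(long n) {
  long s = sumBelow(n);
  useValue(s);  // artificial use
}
```

Because "g" also permits an immediate operand, if n is a compile-time constant the compiler may still fold the whole loop into a constant; the artificial use only stops DCE from deleting the result, it does not force the work to happen at run time.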

A similar line of reasoning applies to memory loads and stores: a store (def) is dead if there is no path to the end of the program that contains a load (use) from that location. Since tracking arbitrary memory locations is much harder than tracking individual pseudo-register variables, the compiler reasons conservatively: a store is dead only if there is no path to the end of the program that could possibly encounter a use of that store.
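A hypothetical pair of functions contrasting the two situations (the comments state what the compiler is permitted to do, not what a given compiler is guaranteed to emit):

```cpp
// Hypothetical: the array is local and never escapes.
void store_to_local() {
  int buf[4];
  buf[0] = 1;  // no path from here reads buf, so this store is dead
}

// Hypothetical: the target is owned by the caller.
void store_to_param(int* out) {
  *out = 1;    // the caller may load *out later, so this store must stay
}
```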

One such case is a store to a memory region that is guaranteed not to be aliased: after that memory is deallocated, there can be no use of the store that does not trigger undefined behavior. IOW, there are no such uses.
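A sketch of that case (store_then_free is a hypothetical name; whether a particular compiler actually removes the store, or even the malloc/free pair, depends on the compiler and optimization level):

```cpp
#include <cstdlib>

// Hypothetical: a store into freshly allocated, never-aliased memory that is
// freed before anything can read it.
bool store_then_free() {
  int* p = static_cast<int*>(std::malloc(sizeof(int)));
  if (p == nullptr) return false;
  *p = 42;       // any later read of *p would be UB, so the store is dead
  std::free(p);
  return true;
}
```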

Thus the compiler can eliminate v.push_back(10). But then comes escape(): it causes the memory at v.data() to be considered arbitrarily aliased, as @Leon describes in the other answer.

The purpose of clobber() in this example is to create an artificial use of all aliased memory. We have a store (from push_back(10)), the stored-to location is globally aliased (due to escape(v.data())), so clobber() could potentially contain a use of that store (IOW, the effect of the store must remain observable); therefore the compiler is not allowed to remove the store.
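Putting the pieces together, a minimal timing harness might look like this, assuming GCC or Clang extended asm; ns_per_push_back is a hypothetical name, and a real benchmark library would add warm-up, repetitions and statistics:

```cpp
#include <chrono>
#include <vector>

// Helpers from the talk (GCC/Clang extended asm).
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// Hypothetical harness: average nanoseconds per iteration of the
// reserve/push_back benchmark from the question.
double ns_per_push_back(int iters) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());  // the new allocation becomes "globally aliased"
    v.push_back(10);   // a store into escaped memory
    clobber();         // artificial use of all aliased memory: store stays
  }
  auto end = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::nano> elapsed = end - start;
  return iters > 0 ? elapsed.count() / iters : 0.0;
}
```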

A few simple examples:

Example I:

    void f() {
      int v[1];
      v[0] = 42;
    }

This does not generate any code.

Example II:

    extern void g();

    void f() {
      int v[1];
      v[0] = 42;
      g();
    }

This generates only a call to g(), with no memory store. Function g cannot access v because v is not aliased.

Example III:

    void clobber() {
      __asm__ __volatile__ ("" : : : "memory");
    }

    void f() {
      int v[1];
      v[0] = 42;
      clobber();
    }

As in the previous example, no store is generated, since v is not aliased, and the call to clobber is inlined into nothing.

Example IV:

    template<typename T>
    void use(T &&t) {
      __asm__ __volatile__ ("" :: "g" (t));
    }

    void f() {
      int v[1];
      use(v);
      v[0] = 42;
    }

This time v escapes (i.e. it is potentially accessible from other activation frames). However, the store is still deleted, because after it there is no potential use of that memory (without UB).

Example V:

    template<typename T>
    void use(T &&t) {
      __asm__ __volatile__ ("" :: "g" (t));
    }

    extern void g();

    void f() {
      int v[1];
      use(v);
      v[0] = 42;
      g(); // same with clobber()
    }

And finally we get the store, because v escapes and the compiler must conservatively assume that the call to g may access the stored value.

(For experimenting: https://godbolt.org/g/rFviMI)

1) What is the advantage of escape over clobber?

escape() has no advantage over clobber(). Rather, escape() complements clobber() in the following important way:

The effect of clobber() is limited to memory that is potentially accessible through an imaginary global root pointer. In other words, the compiler's model of allocated memory is a connected graph of blocks referencing each other through pointers, and that imaginary global root pointer serves as the entry point to the graph. (Memory leaks are not accounted for in this model, i.e. the compiler ignores the possibility that reachable blocks may become unreachable because a pointer value was lost.) A newly allocated block is not part of this graph and is immune to any side effects of clobber(). escape() ensures that the address passed to it belongs to the globally accessible set of memory blocks. When applied to a newly allocated block of memory, escape() has the effect of adding it to the graph.

2) From the example above, it looks like clobber() prevents the previous statement (push_back) from being optimized away. If this is the case, why is the snippet below incorrect?

    void benchmark() {
      vector<int> v;
      v.reserve(1);
      v.push_back(10);
      clobber();
    }

The allocation hidden inside v.reserve(1) is not visible to clobber() until it has been registered via escape().
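A sketch contrasting the two cases with a plain new[] allocation, assuming GCC or Clang extended asm (not_escaped and escaped are hypothetical names; "may be removed" means the compiler is permitted to remove it under the model above, not that it necessarily will):

```cpp
// Helpers from the talk (GCC/Clang extended asm).
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// Without escape(): the fresh block is not in the reachable graph, so
// clobber() cannot observe the store, and the store may be removed.
void not_escaped(int value) {
  int* block = new int[1];
  block[0] = value;
  clobber();
  delete[] block;
}

// With escape(): the block has been added to the graph, so the store must
// be assumed visible to clobber() and is kept.
void escaped(int value) {
  int* block = new int[1];
  escape(block);
  block[0] = value;
  clobber();
  delete[] block;
}
```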
