Why don't gcc and clang hoist strlen out of this loop?

Consider the following code:

    #include <string.h>

    void bar(char c);

    void foo(const char* __restrict__ ss)
    {
        for (int i = 0; i < strlen(ss); ++i) {
            bar(*ss);
        }
    }

I would expect strlen(ss) to be hoisted out of the loop under these essentially ideal conditions; and yet neither clang 5.0 nor gcc 7.3 does so, even at maximum optimization (-O3).
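For reference, the transformation in question can be written by hand. The following is only a sketch: foo_hoisted, bar_calls and the local definition of bar are hypothetical names added for illustration (in the question, bar is an external function the compiler cannot see into).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counter standing in for whatever bar() really does. */
static int bar_calls = 0;

/* Hypothetical stand-in for the external bar() from the question. */
static void bar(char c)
{
    (void)c;
    ++bar_calls;
}

/* Hand-hoisted variant of foo: strlen is evaluated once, before the loop. */
void foo_hoisted(const char *__restrict__ ss)
{
    const size_t n = strlen(ss);
    for (size_t i = 0; i < n; ++i) {
        bar(*ss);
    }
}
```

Calling foo_hoisted("abc") invokes bar three times and strlen only once; the question is why the compilers do not derive this form themselves.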

Why is this so?

Note: Inspired by (my answer to) this question.

+7
c compiler-optimization gcc hoisting clang
3 answers

The other answers claim that the strlen call cannot be hoisted because the contents of the string could change between calls. Those answers do not properly account for the semantics of restrict; even if bar had access to the string through a global variable or some other mechanism, the semantics of restrict pointers to const types should (see caveat) prohibit bar from modifying the string.

From C11, draft N1570, 6.7.3.1:

1 Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.

2 If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).

3 In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. 137) Note that "based" is defined only for expressions with pointer types.

4 During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.

5 Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.

Here, the declaration D is const char* __restrict__ ss, and the associated block B is the body of foo. All lvalues L through which strlen accesses the string have &L based on ss (see caveat), and those accesses occur during the execution of B (since, by the definition in paragraph 5, the execution of strlen is part of the execution of B). ss points to a const-qualified type, so by paragraph 4 the compiler is allowed to assume that the string elements accessed by strlen are not modified during the execution of foo; modifying them would be undefined behavior.

(caveat) The above analysis assumes that strlen accesses the string through "normal" pointer dereferencing or indexing. If strlen uses techniques such as SSE intrinsics or inline assembly, it is not clear to me whether such accesses technically count as using an lvalue to access the value of the object it designates. If they do not count as such, the restrict protections may not apply, and the compiler may not be able to perform the hoisting.

Perhaps the above caveat voids the restrict protections. Maybe the compiler does not know enough about the definition of strlen to analyze its interaction with restrict (though I would be surprised if it were not treated as a built-in). Perhaps the compiler was free to perform the hoisting and simply did not realize it; perhaps some relevant optimization is not implemented, or the necessary information failed to propagate between the right components of the compiler. Determining the exact cause would require far more familiarity with the internals of GCC and Clang than I have.

Further simplified tests, eliminating strlen and the loop, show that Clang definitely has some support for restrict-pointer-to-const optimizations, but I have not been able to observe any such support from GCC.
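The exact tests are not shown here, so the following is only a guess at what such a reduced test could look like. unchanged_across_call, opaque and do_nothing are hypothetical names. Under the reading of 6.7.3.1 given above, a compiler may assume *p is not modified by the opaque call, and so may fold the comparison to 1 instead of reloading *p.

```c
/* Hypothetical no-op callee used only to exercise the function below. */
static void do_nothing(void) {}

/*
 * Reads *p before and after a call the compiler cannot see into.
 * Because p is a restrict pointer to const, modifying *p during this
 * function would be undefined, so "after" may be assumed equal to "before".
 */
int unchanged_across_call(const int *__restrict__ p, void (*opaque)(void))
{
    int before = *p;
    opaque();
    int after = *p;
    return before == after;
}
```

Comparing the generated assembly with and without __restrict__ (for example, whether *p is loaded once or twice) is one way to see whether a given compiler uses the guarantee.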

+5

strlen is passed a pointer, and the contents of the memory it points to may change between calls to strlen, so optimizing the call away could introduce bugs. If you can guarantee to gcc that the function will always return the same value, it will optimize it. From the documentation of function attributes:

const

Many functions do not examine any values except their arguments, and have no effects except to return a value. Calls to such functions lend themselves to optimization such as common subexpression elimination. The const attribute imposes greater restrictions on a function's definition than the similar pure attribute below because it prohibits the function from reading global variables. Consequently, the presence of the attribute on a function declaration allows GCC to emit more efficient code for some calls to the function. Decorating the same function with both the const and the pure attribute is diagnosed.

So, taking out the external dependency on strlen, look at the difference between the following two compilations:

    void bar(char c);  /* declared as in the question */
    int baz (const char* s) __attribute__ ((pure));

    void foo(const char* __restrict__ ss)
    {
        for (int i = 0; i < baz(ss); ++i)
            bar(*ss);
    }

This yields:

    foo:
            push    rbp
            push    rbx
            mov     rbp, rdi
            xor     ebx, ebx
            sub     rsp, 8
            jmp     .L2
    .L3:
            movsx   edi, BYTE PTR [rbp+0]
            add     ebx, 1
            call    bar
    .L2:
            mov     rdi, rbp
            call    baz
            cmp     eax, ebx
            jg      .L3
            add     rsp, 8
            pop     rbx
            pop     rbp
            ret

But if we change the pure attribute on baz to const, you can see that the call is hoisted out of the loop:

    foo:
            push    r12
            push    rbp
            mov     r12, rdi
            push    rbx
            xor     ebx, ebx
            call    baz
            mov     ebp, eax
            jmp     .L2
    .L3:
            movsx   edi, BYTE PTR [r12]
            add     ebx, 1
            call    bar
    .L2:
            cmp     ebp, ebx
            jg      .L3
            pop     rbx
            pop     rbp
            pop     r12
            ret

So maybe you can look at your header files and see how strlen is declared.
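To make the difference between the two attributes concrete, here is a small sketch. my_len and square are hypothetical functions, not anything from libc. A pure function may read memory reachable through its arguments, so a call to an unknown function in between (such as bar) can change its result; a const function depends only on the argument values themselves. As far as I know, glibc declares strlen as pure rather than const, which would be consistent with the first listing above.

```c
#include <stddef.h>

/* pure: no side effects, but the result depends on memory that s points to. */
__attribute__((pure)) size_t my_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

/* const: the result depends only on the value of x, never on memory. */
__attribute__((const)) int square(int x)
{
    return x * x;
}
```

A string-length function reads the pointed-to bytes, so pure is the attribute that describes it accurately; marking such a function const, as in the experiment above, promises the compiler more than the function can deliver.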

+2

ss can point to some global variable, because you can call foo with some global array, for example char str[100];, as its argument (e.g. by having foo(str); in your main) ...

and bar can modify that global variable (then strlen(ss) could change at every iteration of the loop).
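A sketch of that scenario follows. str, bar_calls, foo_plain and run_demo are hypothetical names, and restrict is deliberately left off the parameter so that the example has defined behavior however §6.7.3.1 is read. Here bar shortens the global string by one character per call, so the loop bound shrinks while the loop runs.

```c
#include <string.h>

static char str[100];      /* hypothetical global buffer passed to foo_plain */
static int bar_calls = 0;  /* counts how many times bar ran */

/* Hypothetical bar that modifies the global string behind the loop's back. */
static void bar(char c)
{
    (void)c;
    ++bar_calls;
    size_t n = strlen(str);
    if (n > 0) {
        str[n - 1] = '\0';  /* drop the last character */
    }
}

/* Same loop as in the question, but without restrict. */
void foo_plain(const char *ss)
{
    for (int i = 0; i < (int)strlen(ss); ++i) {
        bar(*ss);
    }
}

/* Runs the loop on "hello" and reports how many times bar was called. */
int run_demo(void)
{
    strcpy(str, "hello");
    bar_calls = 0;
    foo_plain(str);
    return bar_calls;
}
```

With strlen re-evaluated on every iteration, run_demo returns 3; a version that computed strlen once up front would call bar 5 times, so hoisting would change the program's behavior in this case.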

BTW, restrict may not mean what you believe it does. Read carefully §6.7.3 and §6.7.3.1 of the C11 standard. IMHO restrict is in practice mostly useful on two formal arguments of the same function, to express the fact that they are not “aliased” or “overlapping” pointers (if you guess what I really mean), and optimization efforts around restrict have probably focused on such cases.

It is possible (but unlikely) that on your particular program the compiler could optimize as you want if you invoke it as gcc -flto -fwhole-program -O3 (for every translation unit, and at program link time). I would not bet on it (but I leave you to check).

Why is this the case?

Current GCC (or Clang) does not optimize this as you want because nobody has written such an optimization pass and enabled it at -O3.

Compilers are not required to optimize; they are merely allowed to perform some optimizations (at the choice of their developers).

Since this is free software, feel free to contribute a patch to GCC (or to Clang). You might need a whole year of work, with no guarantee that your optimization will be accepted (because in practice nobody writes code like what you show, or because your optimization would be too specific, so it would rarely trigger yet would slow down the compiler anyway). But you are welcome to try.

Even if §6.7.3.1 permits the optimization (as the answer by user2357112 demonstrates), it is probably not worth the effort to implement it.

(my intuition is that implementing such an optimization is hard and would not benefit existing programs much)

BTW, you can certainly experiment with such an optimization by coding a GCC plugin that performs it (since the plugin framework was designed for such experiments). You might discover that the optimization takes a lot of work and in practice does not improve the performance of most existing programs (e.g. those in a Linux distribution), because few people write code that way.

Both GCC and Clang are free software projects, and their contributors are (from the point of view of e.g. the FSF) volunteers. So feel free to improve GCC (or Clang) so that it optimizes as you want. From past experience, contributing even a small piece of code to GCC takes a lot of time. And GCC is a huge program (about ten million lines of source code), so understanding its internals is not easy.

+1