Why can't VC++ optimize an integer wrapper?

Question


In C++, I am trying to write a wrapper around a 64-bit integer. My expectation is that, if it is written correctly and all methods are inlined, such a wrapper should perform as well as the underlying type. The answer to this question on SO seems to agree with my expectation.

I wrote this code to test my expectation:

    #include <cstdint>
    #include <iostream>
    using namespace std;

    // Thin wrapper around a 64-bit unsigned integer.
    class B
    {
    private:
        uint64_t _v;

    public:
        inline B() {};
        inline B(uint64_t v) : _v(v) {};

        inline B& operator=(B rhs) { _v = rhs._v; return *this; };
        inline B& operator+=(B rhs) { _v += rhs._v; return *this; };
        inline operator uint64_t() const { return _v; };
    };

    int main(int argc, char* argv[])
    {
        // Switch between the raw type and the wrapper here.
        typedef uint64_t T;
        //typedef B T;

        const unsigned int x = 100000000;

        Utils::CTimer timer;  // the asker's own timer class (not shown)
        timer.start();

        T sum = 0;
        for (unsigned int i = 0; i < 100; ++i)
        {
            for (uint64_t f = 0; f < x; ++f)
            {
                sum += f;
            }
        }

        float time = timer.GetSeconds();

        cout << sum << endl << time << " seconds" << endl;

        return 0;
    }

When I run this using typedef B T; instead of typedef uint64_t T; the reported times are consistently about 10% slower when compiled with VC++. With g++, the performance is the same whether I use the wrapper or not.

Since g++ does it, I think there is no technical reason why VC++ could not optimize this correctly. Is there something I can do to make it optimize this?

I have already tried playing with the optimization flags, without success.


c++ optimization visual-c++ wrapper

Mathieu pagé Feb 04 '15 at 13:01


2 answers

TC · Answer 1 · 2015-02-04T13:12:46+0000

For the record, this is what the assembly generated by g++ and clang++ at -O2 amounts to (in both cases, with and without the wrapper), modulo the timing part:

    sum = 499999995000000000;
    cout << sum << endl;

In other words, the compiler optimized the loop away completely. No matter how hard you try to vectorize the loop, it's pretty hard to beat not looping at all :)
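As a sanity check on that constant, here is a minimal sketch of the arithmetic. The helper names `loop_sum`, `closed_form_sum` and `check_closed_form` are made up for illustration, and this does not claim to show how g++ or clang++ derive the value internally. Each pass of the inner loop adds 0 + 1 + ... + (x - 1) = x(x - 1)/2, and the outer loop repeats that 100 times:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the sum the benchmark computes, done the slow way.
std::uint64_t loop_sum(std::uint64_t x, unsigned reps) {
    std::uint64_t sum = 0;
    for (unsigned i = 0; i < reps; ++i)
        for (std::uint64_t f = 0; f < x; ++f)
            sum += f;
    return sum;
}

// Hypothetical helper: the same value without looping.
// One inner pass adds 0 + 1 + ... + (x - 1) = x * (x - 1) / 2.
constexpr std::uint64_t closed_form_sum(std::uint64_t x, unsigned reps) {
    return reps * (x * (x - 1) / 2);
}

// With the question's values (x = 100000000, 100 repetitions) this is
// exactly the constant that appears in the optimized output.
static_assert(closed_form_sum(100000000ULL, 100) == 499999995000000000ULL,
              "closed form matches the folded constant");

// Cross-check the formula against the real loop on a small input.
void check_closed_form() {
    assert(loop_sum(1000, 3) == closed_form_sum(1000, 3));
}
```

Since the result is a compile-time constant, a benchmark built on this loop measures nothing about the wrapper once the compiler folds it.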

Daerst · Answer 2 · 2015-02-04T13:08:58+0000

Using /O2 (maximize speed), both alternatives generate exactly the same assembly with Visual Studio 2012. This is your code, minus the timing and the output:

    00FB1000  push        ebp
    00FB1001  mov         ebp,esp
    00FB1003  and         esp,0FFFFFFF8h
    00FB1006  sub         esp,8
    00FB1009  mov         edx,64h
    00FB100E  mov         edi,edi
    00FB1010  xorps       xmm0,xmm0
    00FB1013  movlpd      qword ptr [esp],xmm0
    00FB1018  mov         ecx,dword ptr [esp+4]
    00FB101C  mov         eax,dword ptr [esp]
    00FB101F  nop
    00FB1020  add         eax,1
    00FB1023  adc         ecx,0
    00FB1026  jne         main+2Fh (0FB102Fh)
    00FB1028  cmp         eax,5F5E100h
    00FB102D  jb          main+20h (0FB1020h)
    00FB102F  dec         edx
    00FB1030  jne         main+10h (0FB1010h)
    00FB1032  xor         eax,eax

I would suggest that your measured times fluctuate from run to run, or are not always accurate.
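One way to check whether a 10% difference is real or just noise is to time the same work several times and look at the spread. The following is a minimal sketch using the standard `std::chrono::steady_clock`; the helper name `time_trials` is hypothetical and it stands in for the asker's `Utils::CTimer`, which is not shown:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: run `work` `trials` times and return the wall-clock
// duration of each run in seconds, sorted from fastest to slowest.
template <typename F>
std::vector<double> time_trials(F work, int trials) {
    assert(trials > 0);
    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(trials));
    for (int t = 0; t < trials; ++t) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(seconds.begin(), seconds.end());
    return seconds;
}
```

Passing the `uint64_t` loop and the `B` loop as two separate callables and comparing the fastest and the middle entries of each result gives a rough idea of the variation: if the two ranges overlap, the measured difference is probably not meaningful. The work passed in still has to produce a result the compiler cannot discard or fold to a constant.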
