Virtual Functions and C++ Performance

Before you jump on this as a duplicate based on the title: the other questions are not suitable for what I am asking here (IMO). So.

I really want to use virtual functions in my application to make things a hundred times easier (isn't that what OOP is all about ;)). But I read somewhere that they come at a performance cost. Not wanting to buy into what looked like the same old overblown fear of premature optimization, I decided to give it a quick whirl in a small test case, using:

CProfiler.cpp

    #include "CProfiler.h"

    CProfiler::CProfiler(void (*func)(void), unsigned int iterations)
    {
        gettimeofday(&a, 0);
        for (; iterations > 0; iterations--) {
            func();
        }
        gettimeofday(&b, 0);
        result = (b.tv_sec * (unsigned int)1e6 + b.tv_usec)
               - (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
    }

main.cpp

    #include "CProfiler.h"
    #include <iostream>

    class CC {
      protected:
        int width, height, area;
    };

    class VCC {
      protected:
        int width, height, area;
      public:
        virtual void set_area() {}
    };

    class CS : public CC {
      public:
        void set_area() { area = width * height; }
    };

    class VCS : public VCC {
      public:
        void set_area() { area = width * height; }
    };

    void profileNonVirtual() {
        CS *abc = new CS;
        abc->set_area();
        delete abc;
    }

    void profileVirtual() {
        VCS *abc = new VCS;
        abc->set_area();
        delete abc;
    }

    int main() {
        int iterations = 5000;
        CProfiler prf2(&profileNonVirtual, iterations);
        CProfiler prf(&profileVirtual, iterations);
        std::cout << prf.result;
        std::cout << "\n";
        std::cout << prf2.result;
        return 0;
    }

At first I did only 100 and 10,000 iterations, and the results were worrying: 4 ms for the non-virtual version and 250 ms for the virtual one! I almost went "nooooooo" inside, but then I increased the iterations to 500,000, and the results became almost completely identical (perhaps 5% slower for the virtual version without optimization flags).

My question is: why was there such a significant difference with a small number of iterations compared to a large number? Was it purely because the virtual functions were hot in the cache at that many iterations?

Disclaimer
I understand that my "profiling" code is not perfect, but as it is, it gives a rough estimate of things, and that is all that matters at this level. I am also asking these questions in order to learn, not just to optimize my application.

+7
8 answers

Extending Charles's answer.

The problem is that your loop does more than just test the virtual call itself (the memory allocation probably overshadows the virtual call overhead anyway), so his suggestion is to change the code so that only the virtual call is tested.

Here, the benchmark function is a template, because a template can be inlined, while a call through a function pointer is unlikely to be.

    template <typename Type>
    double benchmark(Type const& t, size_t iterations)
    {
        timeval a, b;
        gettimeofday(&a, 0);
        for (; iterations > 0; --iterations) {
            t.getArea();
        }
        gettimeofday(&b, 0);
        return (b.tv_sec * (unsigned int)1e6 + b.tv_usec)
             - (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
    }

Classes:

    struct Regular {
        Regular(size_t w, size_t h): _width(w), _height(h) {}

        size_t getArea() const;

        size_t _width;
        size_t _height;
    };

    // The following definition lives in another translation unit
    // to avoid inlining
    size_t Regular::getArea() const { return _width * _height; }

    struct Base {
        Base(size_t w, size_t h): _width(w), _height(h) {}

        virtual size_t getArea() const = 0;

        size_t _width;
        size_t _height;
    };

    struct Derived: Base {
        Derived(size_t w, size_t h): Base(w, h) {}

        virtual size_t getArea() const;
    };

    // The following two definitions live in another translation unit
    // to avoid inlining
    size_t Derived::getArea() const { return _width * _height; }

    std::auto_ptr<Base> generateDerived()
    {
        return std::auto_ptr<Base>(new Derived(3, 7));
    }

And measurement:

    int main(int argc, char* argv[])
    {
        if (argc != 2) {
            std::cerr << "Usage: %prog iterations\n";
            return 1;
        }

        Regular regular(3, 7);
        std::auto_ptr<Base> derived = generateDerived();

        double regTime = benchmark<Regular>(regular, atoi(argv[1]));
        double derTime = benchmark<Base>(*derived, atoi(argv[1]));

        std::cout << "Regular: " << regTime << "\nDerived: " << derTime << "\n";
        return 0;
    }

Note: this measures the overhead of a virtual call compared to a regular function call. The functionality is not equivalent (there is no runtime dispatch in the second case), so it measures the worst-case overhead.

EDIT

Results of a run (gcc 3.4.2, -O2, quad-core SLES10 server), with the function definitions in another translation unit to prevent inlining:

    > ./test 5000000
    Regular: 17041
    Derived: 17194

Not very convincing.

+5

I believe your test case is too artificial to be of any great value.

First, inside your profiled function, you dynamically allocate and free an object as well as call a function. If you want to profile only the function call, then you should do just that.

Secondly, you are not profiling a case where a virtual function call is the viable alternative. A virtual function call provides dynamic dispatch. You should profile a case where a virtual call is used as the alternative to hand-rolled dispatch, such as a switch on a type field.

+11

With a small number of iterations, there is a chance that your code gets preempted by some other program running in parallel, or that swapping occurs, or anything else the operating system does to your process, and the time during which your program was suspended by the operating system ends up included in your results. This is the number one reason why you should run your code something like ten million times in order to measure anything more or less reliably.

+3

I think this kind of testing is pretty useless, in fact:
1) you are spending time profiling the call to gettimeofday() itself;
2) you are not really exercising virtual functions, and IMHO this is the worst part.

Why? Because you use virtual functions to avoid writing things like:

    <pseudocode>
    switch (typeof(object)) {
        case ClassA: functionA(object); break;
        case ClassB: functionB(object); break;
        case ClassC: functionC(object); break;
    }
    </pseudocode>

In your code that "if ... else" / switch block is missing, so you are not actually taking advantage of virtual functions. This is the scenario in which they always "lose" against non-virtual calls.

To do proper profiling, I think you should add something like the code I posted above.
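A runnable C++ version of that pseudocode might look like the following sketch (the Kind enum, Object struct, and handler functions are invented names for illustration):

```cpp
#include <cassert>

// Hand-rolled dispatch: the kind of code virtual functions let you avoid.
enum Kind { KIND_A, KIND_B };

struct Object {
    Kind kind;   // stored type tag, maintained manually
    int value;
};

int handleA(const Object& o) { return o.value + 1; }
int handleB(const Object& o) { return o.value * 2; }

// The switch plays the role of the vtable lookup.
int dispatch(const Object& o) {
    switch (o.kind) {
    case KIND_A: return handleA(o);
    case KIND_B: return handleB(o);
    }
    return 0;
}
```

Benchmarking a virtual call against this, rather than against a plain non-virtual call, is the comparison that actually corresponds to a design decision.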

+2

There may be several reasons for the time difference.

  • Your timing function is not accurate enough.
  • The heap manager can influence the result, because sizeof(VCS) > sizeof(CS). What happens if you move new/delete out of the loop?

  • Again, due to the size difference, the memory cache can indeed account for part of the time difference.

BUT: you really have to compare similar functionality. When you use virtual functions, you do so for a reason: to call a different member function depending on the identity of the object. If you need this functionality and do not want to use virtual functions, you have to implement it manually, whether with a function table or even a switch statement. That comes at its own cost too, and that is what you need to compare against virtual functions.
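For illustration, the "function table" alternative mentioned above might be sketched like this (Obj, rectArea, and triangleArea are hypothetical names, not from the answer):

```cpp
#include <cassert>

struct Obj;
typedef int (*AreaFn)(const Obj&);

struct Obj {
    int type_id;   // index into the table below, maintained by hand
    int w, h;
};

int rectArea(const Obj& o)     { return o.w * o.h; }
int triangleArea(const Obj& o) { return o.w * o.h / 2; }

// An explicit table of function pointers indexed by the type id.
AreaFn areaTable[] = { rectArea, triangleArea };

int area(const Obj& o) {
    // One indirect call through the table: essentially what the
    // compiler's vtable does for you, but maintained manually.
    return areaTable[o.type_id](o);
}
```

This still pays for an indirect call, so it is the apples-to-apples comparison against a virtual member function.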

+2

There is a real cost to invoking a virtual function: it is slightly more expensive than invoking a regular function. However, the impact is likely to be completely negligible in a real application, even more negligible than in the most finely tuned benchmark.

In a real-world application, the alternative to a virtual function usually involves hand-writing some similar system anyway, because the behavior of a virtual function call differs from that of a non-virtual call: the former dispatches based on the runtime type of the invoking object. Your benchmark, even ignoring whatever flaws it has, does not measure equivalent behavior, only somewhat-equivalent syntax. If you had to enforce a coding policy banning virtual functions, you would have to either write some potentially very devious or confusing code (which may well be slower) or re-implement by hand a runtime dispatch system similar to what the compiler uses to implement virtual functions (which is certainly not going to be faster than what the compiler does, in most cases).

+2

With too few iterations, there is a lot of noise in the measurement. The gettimeofday function is not likely to be accurate enough to give you meaningful measurements over just a few iterations, not to mention that it records total wall-clock time (including time spent preempted by other threads).

Bottom line, though: you should not come up with some ridiculously convoluted design just to avoid virtual functions. They really do not add much overhead. If you have incredibly performance-critical code and you know that virtual functions make up most of the cost, then perhaps it is something to worry about. In any practical application, however, virtual functions will not be what makes your application slow.

+1

In my opinion, with fewer iterations there may not have been a context switch. But when you increase the number of iterations, the chances of a context switch occurring become very high, and it dominates the reading. For example, if the first program takes 1 second and the second takes 3 seconds, but a context switch takes 10 seconds, then the measured ratio is 13/11 instead of 3/1.

0
