I am implementing a custom iterator for a non-STL container type and came across the following behavior, which at this stage seems a little unexpected to me.
It seems that there is a significant performance hit when you define an "empty" dtor. Why?
To try to figure this out, I used a simple iterator for std::vector so that I could compare its performance directly with the standard STL iterator. For a fair test, I just copied a simplified implementation from "vector.hpp" and experimented with adding an additional "empty" dtor:
template <typename _Myvec>
class my_slow_iterator // not inheriting from anything!!
{
public:
    typename _Myvec::pointer _ptr;
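For anyone who wants to reproduce this without modifying std::vector, a minimal sketch of such an iterator could look like the following. This is not the code from the question: `ptr_iterator` and `sort_with_ptr_iterator` are hypothetical names, the iterator wraps a plain `T*` rather than `_Myvec::pointer`, and it assumes a C++11 compiler. The commented-out line marks where the "empty" dtor would go.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical stand-in for the question's iterator: it only wraps a raw pointer.
template <typename T>
class ptr_iterator {
public:
    typedef std::random_access_iterator_tag iterator_category;
    typedef T value_type;
    typedef std::ptrdiff_t difference_type;
    typedef T* pointer;
    typedef T& reference;

    ptr_iterator() : _ptr(nullptr) {}
    explicit ptr_iterator(T* p) : _ptr(p) {}
    // ~ptr_iterator() {}  // the "empty" dtor under test; uncomment to compare

    reference operator*() const { return *_ptr; }
    pointer operator->() const { return _ptr; }
    reference operator[](difference_type n) const { return _ptr[n]; }

    ptr_iterator& operator++() { ++_ptr; return *this; }
    ptr_iterator operator++(int) { ptr_iterator old(*this); ++_ptr; return old; }
    ptr_iterator& operator--() { --_ptr; return *this; }
    ptr_iterator operator--(int) { ptr_iterator old(*this); --_ptr; return old; }
    ptr_iterator& operator+=(difference_type n) { _ptr += n; return *this; }
    ptr_iterator& operator-=(difference_type n) { _ptr -= n; return *this; }

    friend ptr_iterator operator+(ptr_iterator it, difference_type n) { it += n; return it; }
    friend ptr_iterator operator+(difference_type n, ptr_iterator it) { it += n; return it; }
    friend ptr_iterator operator-(ptr_iterator it, difference_type n) { it -= n; return it; }
    friend difference_type operator-(const ptr_iterator& a, const ptr_iterator& b) {
        return a._ptr - b._ptr;
    }

    friend bool operator==(const ptr_iterator& a, const ptr_iterator& b) { return a._ptr == b._ptr; }
    friend bool operator!=(const ptr_iterator& a, const ptr_iterator& b) { return a._ptr != b._ptr; }
    friend bool operator<(const ptr_iterator& a, const ptr_iterator& b) { return a._ptr < b._ptr; }
    friend bool operator>(const ptr_iterator& a, const ptr_iterator& b) { return a._ptr > b._ptr; }
    friend bool operator<=(const ptr_iterator& a, const ptr_iterator& b) { return a._ptr <= b._ptr; }
    friend bool operator>=(const ptr_iterator& a, const ptr_iterator& b) { return a._ptr >= b._ptr; }

private:
    T* _ptr;
};

// Hypothetical helper: sorts the range [first, last) through ptr_iterator
// instead of through raw pointers or std::vector's own iterator.
template <typename T>
void sort_with_ptr_iterator(T* first, T* last) {
    assert(first <= last);
    std::sort(ptr_iterator<T>(first), ptr_iterator<T>(last));
}
```

Sorting a vector's storage through this wrapper, for example `sort_with_ptr_iterator(vec.data(), vec.data() + vec.size())`, exercises the custom iterator while leaving std::vector itself untouched.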
Then I modified std::vector so that it returns my new iterator type, and used the following for comparison: sorting a vector of 2,000,000 random numbers, averaged over three runs:
std::vector<int> vec; // fill via rand()
int tt = clock();
std::sort(vec.begin(), vec.end());
tt = clock() - tt; // elapsed time in ms
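A self-contained version of this timing loop might look like the sketch below. It is an approximation rather than the exact harness used for the numbers in this post: `make_random_vector` and `average_sort_ms` are hypothetical names, it assumes C++11 and uses `std::chrono::steady_clock` in place of `clock()`, and it times std::vector's own iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: builds a vector of n pseudo-random ints via rand().
// A fixed seed makes the input the same for every iterator variant.
std::vector<int> make_random_vector(std::size_t n, unsigned seed) {
    std::srand(seed);
    std::vector<int> vec(n);
    for (std::size_t i = 0; i < n; ++i) {
        vec[i] = std::rand();
    }
    return vec;
}

// Hypothetical helper: sorts a fresh copy of `input` `runs` times and
// returns the average wall-clock time of the sort in milliseconds.
double average_sort_ms(const std::vector<int>& input, int runs) {
    assert(runs > 0);
    double total_ms = 0.0;
    for (int r = 0; r < runs; ++r) {
        std::vector<int> vec = input;  // unsorted copy, made outside the timed region
        auto start = std::chrono::steady_clock::now();
        std::sort(vec.begin(), vec.end());
        auto stop = std::chrono::steady_clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

Calling `average_sort_ms(make_random_vector(2000000, 1u), 3)` would mirror the setup described here; absolute timings will differ by compiler and machine.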
I got the following results (VS2010, Release build, _ITERATOR_DEBUG_LEVEL 0, etc.):
- Using the standard STL iterator: 550 ms.
- Using my_slow_iterator with the empty dtor removed: 560 ms.
- Using my_slow_iterator with the empty dtor enabled: 900 ms.
It seems that the empty dtor causes a slowdown of about 60% in this case (900 ms versus 560 ms).
Obviously, if the dtor is empty there is no reason to have it, but I expected that simple "empty" functions like this would be inlined and optimized away at compile time. If that is not the case, I would like to understand what is going on, in case this type of issue has implications in more complex cases.
EDIT: This was compiled with /O2 optimization.
EDIT: Digging a little further, it seems that a similar effect occurs with the copy-ctor. Originally (and in the tests above) my_slow_iterator did not define a copy-ctor, so it used the compiler-generated default.
If I define the following copy-ctor (which does nothing more than I would expect the compiler-generated one to do):
my_slow_iterator ( const my_slow_iterator<_Myvec> &_src ) : _ptr(_src._ptr) {}
I see the following results for the same test as above:
- Using my_slow_iterator, dtor removed, copy-ctor enabled: 690 ms.
- Using my_slow_iterator, dtor enabled, copy-ctor enabled: 980 ms.
This is another (albeit less dramatic) performance hit.
Why / how are the compiler-generated default functions so much more efficient? Does a user-defined ctor / dtor implicitly cause something else to happen behind the scenes?
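One difference that can be observed directly is that a user-provided dtor or copy-ctor, even an empty one, makes the type non-trivial in the language's sense. The sketch below checks this with C++11 type traits, which assumes a newer compiler than VS2010. The three structs are hypothetical minimal stand-ins for the iterator variants above, and whether this property is what actually drives the timings here is not something the sketch establishes.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the three iterator variants discussed above.
struct implicit_members {           // compiler-generated dtor and copy-ctor
    int* _ptr;
};

struct empty_dtor {                 // user-provided "empty" dtor
    int* _ptr;
    ~empty_dtor() {}
};

struct user_copy_ctor {             // user-provided copy-ctor that only copies the pointer
    int* _ptr;
    user_copy_ctor() : _ptr(nullptr) {}
    user_copy_ctor(const user_copy_ctor& src) : _ptr(src._ptr) {}
};

// With compiler-generated members the type is trivial to destroy and to copy.
static_assert(std::is_trivially_destructible<implicit_members>::value,
              "implicit dtor is trivial");
static_assert(std::is_trivially_copyable<implicit_members>::value,
              "implicit copy operations are trivial");

// An empty but user-provided dtor is no longer trivial.
static_assert(!std::is_trivially_destructible<empty_dtor>::value,
              "user-provided dtor is never trivial");
static_assert(!std::is_trivially_copyable<empty_dtor>::value,
              "a non-trivial dtor also rules out trivially copyable");

// A user-provided copy-ctor is not trivial either, even if it does the same work.
static_assert(!std::is_trivially_copy_constructible<user_copy_ctor>::value,
              "user-provided copy-ctor is never trivial");
static_assert(std::is_trivially_destructible<user_copy_ctor>::value,
              "its dtor is still the implicit, trivial one");
```

Standard library implementations and optimizers are allowed to treat trivial types specially, so this is one place to start looking when comparing the generated code for the two variants.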