Doing qsort vs std :: sort?

Question

Doing qsort vs std :: sort?

According to Scott Meyers, article 46 is effective in his book Effective STL. He claimed that std::sort is about 670% faster than std::qsort due to the inline fact. I checked myself and I saw that qsort is faster :( Can someone help me explain this strange behavior?

 #include <iostream> #include <vector> #include <algorithm> #include <cstdlib> #include <ctime> #include <cstdio> const size_t LARGE_SIZE = 100000; struct rnd { int operator()() { return rand() % LARGE_SIZE; } }; int comp( const void* a, const void* b ) { return ( *( int* )a - *( int* )b ); } int main() { int ary[LARGE_SIZE]; int ary_copy[LARGE_SIZE]; // generate random data std::generate( ary, ary + LARGE_SIZE, rnd() ); std::copy( ary, ary + LARGE_SIZE, ary_copy ); // get time std::time_t start = std::clock(); // perform quick sort C using function pointer std::qsort( ary, LARGE_SIZE, sizeof( int ), comp ); std::cout << "C quick-sort time elapsed: " << static_cast<double>( clock() - start ) / CLOCKS_PER_SEC << "\n"; // get time again start = std::clock(); // perform quick sort C++ using function object std::sort( ary_copy, ary_copy + LARGE_SIZE ); std::cout << "C++ quick-sort time elapsed: " << static_cast<double>( clock() - start ) / CLOCKS_PER_SEC << "\n"; }

This is my result:

 C quick-sort time elapsed: 0.061 C++ quick-sort time elapsed: 0.086 Press any key to continue . . .

Update

Effective STL 3rd Edition (2001)
Chapter 7 Programming in STL
Paragraph 46: Consider functional objects instead of functions as algorithm parameters.

Respectfully,

+63

c ++ performance sorting stl

Chan Jan 16 2018-11-21T00:

source share

7 answers

I am surprised that no one mentions hiding places.

In your code, you start by touching ary and * ary_copy * so that they are in cache during qsort. During qsort * ary_copy * can be evicted. During std :: sort, elements had to be extracted from memory or at a higher (slower) cache level. This, of course, will depend on the size of your cache.

Try changing the test, i.e. start by running std :: sort.

As some people pointed out; making the array larger will make the test fairer. The reason is that a large array is less likely to fit the cache.

+16

rasmus Mar 25 2018-11-11T00:

source share

Two sorting algorithms without optimizations enabled should have comparable performance. The reason C ++ sort tends to beat qsort noticeably is because the compiler can embed comparisons, because the compiler has type information about which function is used to perform the comparison. Have you run these tests with optimization enabled? If not, try turning it on and running this test again.

+11

templatetypedef Jan 16 '11 at

source share

Another reason qsort may work much better than expected is because new compilers can embed and optimize with a function pointer.

If the C header defines the built-in qsort implementation instead of its implementation inside the library, and the compiler supports an indirect function, then qsort can be as fast as std :: sort.

+9

Zan Lynx Jan 16 '11 at 21:45

source share

On my machine, some meat was added (creating an array of 10 million elements and moving it to the data section) and compiling with

 g++ -Wall -O2 -osortspeed sortspeed.cpp

I get the result

 C quick-sort time elapsed: 3.48 C++ quick-sort time elapsed: 1.26

Be also wary of modern green CPUs that can be configured to operate at variable speeds depending on system load. When benchmarking such behavior can drive you crazy (on my machine I installed two small scripts normal and fast , which can be used when performing speed tests).

+3

6502 Jan 16 2018-11-11T00:

source share

Writing accurate tests is difficult, so let Nonius do it for us! Let test qsort , std::sort without insertion and std::sort with the embedding (by default) of a vector of a million random numbers.

 // sort.cpp #define NONIUS_RUNNER #include <nonius.h++> #include <random> #include <algorithm> // qsort int comp(const void* a, const void* b) { const int arg1 = *static_cast<const int*>(a); const int arg2 = *static_cast<const int*>(b); // we can't simply return a - b, because that might under/overflow return (arg1 > arg2) - (arg1 < arg2); } // std::sort with no inlining struct compare_noinline { __attribute__((noinline)) bool operator()(const int a, const int b) { return a < b; } }; // std::sort with inlining struct compare { // the compiler will automatically inline this bool operator()(const int a, const int b) { return a < b; } }; std::vector<int> gen_random_vector(const size_t size) { std::random_device seed; std::default_random_engine engine{seed()}; std::uniform_int_distribution<int> dist{std::numeric_limits<int>::min(), std::numeric_limits<int>::max()}; std::vector<int> vec; for (size_t i = 0; i < size; i += 1) { const int rand_int = dist(engine); vec.push_back(rand_int); } return vec; } // generate a vector of a million random integers constexpr size_t size = 1'000'000; static const std::vector<int> rand_vec = gen_random_vector(size); NONIUS_BENCHMARK("qsort", [](nonius::chronometer meter) { // Nonius does multiple runs of the benchmark, and each one needs a new // copy of the original vector, otherwise we'd just be sorting the same // one over and over const size_t runs = static_cast<size_t>(meter.runs()); std::vector<std::vector<int>> vectors{runs}; std::fill(vectors.begin(), vectors.end(), rand_vec); meter.measure([&](const size_t run) { std::vector<int>& current_vec = vectors[run]; std::qsort(current_vec.data(), current_vec.size(), sizeof(int), comp); return current_vec; }); }); NONIUS_BENCHMARK("std::sort noinline", [](nonius::chronometer meter) { const size_t runs = static_cast<size_t>(meter.runs()); std::vector<std::vector<int>> vectors{runs}; std::fill(vectors.begin(), vectors.end(), rand_vec); meter.measure([&](const size_t run) { std::vector<int>& current_vec = vectors[run]; std::sort(current_vec.begin(), current_vec.end(), compare_noinline{}); return current_vec; }); }); NONIUS_BENCHMARK("std::sort inline", [](nonius::chronometer meter) { const size_t runs = static_cast<size_t>(meter.runs()); std::vector<std::vector<int>> vectors{runs}; std::fill(vectors.begin(), vectors.end(), rand_vec); meter.measure([&](const size_t run) { std::vector<int>& current_vec = vectors[run]; std::sort(current_vec.begin(), current_vec.end(), compare{}); return current_vec; }); });

Compilation with Apple Clang 7.3.0,

 $ clang++ -std=c++14 -stdlib=libc++ -O3 -march=native sort.cpp -o sort $ ./sort

and running it on my 1.7 GHz i5 Macbook Air, we get

 qsort 211 ms +/- 6 ms std::sort noinline 127 ms +/- 5 ms std::sort inline 87 ms +/- 4 ms

So std::sort without inlay is about 1.7 times faster than qsort (possibly due to different sorting algorithms), and striking, which is up to about 2.4x faster. Of course, impressive acceleration, but much less than 670%.

+3

Mr Bingley Feb 01 '16 at 22:39

source share

I don't know how std :: sort was implemented many years ago. But std :: sort can be much faster because std :: sort quicksort with heap backup. Heapsort is a linear algorithmic algorithm, i.e. if you have sort data twice, the sort time doubles. In fact, it more than doubles because it is not exactly linear, but nevertheless qsort can be quadratic, so it takes exponentially more time to sort twice the input.

-3

Dennis Beier Oct 10 '15 at 7:17

source share

Puppy · Accepted Answer · 2011-01-16 21:22

std :: clock () is not a viable clock timer. You must use a higher resolution timer for the platform, such as the high-performance Windows timer. Moreover, the way you call clock () is to first print the text to the console, which is turned on at the time. This definitely invalidates the test. Also, make sure you are compiled with all the optimizations.

Finally, I copied and pasted your code and got 0.016 for qsort and 0.008 for std :: sort.

Doing qsort vs std :: sort?

More articles: