A bit confused with Big O

So, I have a quick question about how to test the Big-O complexity of a function.

For example: a quicksort that sorts an array of 50 elements takes 0.008524 seconds, while the same algorithm on 100 elements takes 0.017909 seconds. How can I check whether my quicksort is O(n log(n)) or not?

I think I understand it like this: if n is doubled, should the runtime increase by a factor of about 2 * log(2)?

f (n) = 0.008524 β†’ n log (n)

f (2n) = 0.017909 β†’ 2n * log (2n)

+1
5 answers

Big-O notation is asymptotic: it applies only in the limit as n becomes large.

There are many reasons why sorting 50 and 100 elements might not track O(n log n), and caching is a likely candidate. If you try 100,000 versus 200,000 versus 1 million elements, you will probably find it tracks a little better.

Another possibility is that most quicksort implementations are only O(n log n) on average; some inputs take longer. The probability of hitting such a pathological input is higher for 50 elements than for 100,000.

Ultimately, you do not "check" a Big-O runtime; you prove it from the algorithm.

+2

Big-O notation is not usually concerned with execution time in seconds. It is about the number of operations the algorithm performs. The only way to establish that is to look at the code of the function.

Runtime will depend not only on lower-order terms (remember that Big-O notation keeps only the highest-order term), but also on things such as program startup overhead, cache efficiency, and branch prediction.

Having said all that, it is possible that none of these other effects is significant in your case. In that case, if n doubles, you would expect the runtime to increase from k * n * log(n) to k * 2n * log(2n) = k * (2n * log(2) + 2n * log(n)).

+1

There are a few points to keep in mind. First of all, as you change the size, the hardware (at least on a typical computer) will also have an effect. In particular, once your data gets too large to fit in a particular level of cache, you can expect a significant jump in runtime, entirely independent of the algorithm in question.

To get an idea of your algorithm's complexity, you should probably start by comparing it against some algorithm with a really obvious complexity, run on the same data sizes. One obvious possibility is the time it takes to fill your array with random numbers; at least assuming a fairly typical PRNG, that should certainly be linear.

Then run your algorithm and see how it scales relative to the linear algorithm at the same sizes. For example, you could use the following code:

#include <vector>
#include <algorithm>
#include <iostream>
#include <string>
#include <iomanip>
#include <cstdlib>
#include <ctime>

class timer {
    clock_t begin;
    std::ostream &os;
    std::string d;
public:
    timer(std::string const &delim = "\n", std::ostream &rep = std::cout)
        : begin(clock()), os(rep), d(delim) {}
    // On destruction, print the elapsed time followed by the delimiter.
    ~timer() { os << double(clock() - begin) / CLOCKS_PER_SEC << d; }
};

int main() {
    static const unsigned int meg = 1024 * 1024;
    std::cout << std::setw(10) << "Size" << "\tfill\tsort\n";
    for (unsigned size = 10000; size < 512 * meg; size *= 2) {
        std::vector<int> numbers(size);
        std::cout << std::setw(10) << size << "\t";
        {
            timer fill_time("\t");  // times the fill (linear baseline)
            std::fill_n(numbers.begin(), size, 0);
            for (unsigned i = 0; i < size; i++)
                numbers[i] = rand();
        }
        {
            timer sort_time;        // times the sort
            std::sort(numbers.begin(), numbers.end());
        }
    }
    return 0;
}

If I chart the fill time and the sort time, I get something like this:

[Chart: fill time and sort time versus array size, with sizes doubling each step]

Since our sizes grow exponentially (doubling each step), the linear algorithm shows a (roughly) exponential curve on this chart. The sort time obviously grows (somewhat) faster.

Edit: In fairness, it should probably be added that log(N) grows so slowly that for almost any realistic amount of data it remains quite small. For most practical purposes, you can treat quicksort (for example) as linear in the size, just with a somewhat larger constant factor than filling memory. Growing the size linearly and graphing the result makes this more obvious:

[Chart: fill time and sort time versus array size, with sizes growing linearly]

If you look carefully, you can see the top line showing just the slight upward curve from the log(N) factor. On the other hand, I'm not sure I would notice the curvature at all if I didn't already know it should be there.

+1

Two data points are actually not enough.

However, 2n * log(2n) = 2n * (log(n) + log(2)), so you can see that the multiplier when you double n should be a little more than 2 — exactly 2 * (1 + log(2)/log(n)). That looks plausible for the numbers you gave, but you should add a few more points and double-check.

Note that there will probably be some constant term in your timings; at least if you include program startup in the measurement, it can be significant. If, on the other hand, you time only the sorting phase without the startup, it will be more accurate. You should also repeat the measurement over many permutations of the input data to make sure you get a representative set of values.

0

With Big-O, what you are looking at is how the running time grows as you go from 50 elements to 100, not the times themselves. For example, if I iterate through an array, that is O(n) — linear time — because the loop has to visit every index up to the end; n represents how large the array is. Also, Big-O notation is only meant as a general characterization of growth, in the limit. If you timed 50, 100, 1000, 100000, and 1,000,000 elements, you would see that on average it behaves like O(n log(n)).

0
