Can Big-O be confirmed by measurement?

Say you have developed an algorithm that you think runs in O(n). If I measure the time it takes on 1000 inputs, then increase the input size 10x and measure again, can I conclude that O(n) is correct if the running time is roughly 10 times longer than in the first attempt?

How naive is that? Obviously, repeating the test would give better accuracy, but I want to know whether this approach makes sense at all.

+7
algorithm big-o
4 answers

Often the answer is yes. If you increase the problem size by a factor of 10 and the time also increases by about a factor of 10, you are probably right to assume O(N). However, the number is unlikely to come out that cleanly.

If you go from 1,000 to 10,000, an O(N log N) algorithm takes about 13 times longer (see the bc transcript below). That is close enough to 10 that you might mistakenly take a measured factor of 12 as evidence of O(N log N) rather than O(N). However, if you increase the size by 10 and the time goes up by about 100, you are probably dealing with a non-linear algorithm, likely O(N^2). So two data points are not proof, but they are suggestive. Several runs and more data points help.

Sometimes, however, something else gets in the way. For example, you may suddenly start using so much memory that your program starts paging instead of simply running. It will slow down dramatically, even though the algorithm is still linear given sufficient resources.

Also, be careful of caching effects and optimization effects. Caching can make things look faster than they are, and if the optimizer concludes that the result of a calculation is never used, it may eliminate the whole calculation. So you must be careful.

But with a little luck, you can scale the problem through a few orders of magnitude (or at least a few different sizes) and come up with a reasonable guess as to whether it is linear or something else.

O(N log N) for 1,000 vs 10,000:

    $ bc -l
    n=1000
    n*l(n)
    6907.75527898213705205000
    a=n*l(n)
    m=n*10
    m*l(m)
    92103.40371976182736070000
    b=m*l(m)
    b/a
    13.33333333333333333333
    quit
    $
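Translating the approach above into code, here is a minimal timing harness (a Python sketch; `algorithm` is a hypothetical stand-in for whatever you are actually measuring):

    import timeit

    def algorithm(data):
        # Hypothetical O(n) stand-in: replace with the code under test.
        return sum(data)

    for n in (1_000, 10_000, 100_000, 1_000_000):
        data = list(range(n))
        # Take the best of several repeated runs to reduce noise.
        t = min(timeit.repeat(lambda: algorithm(data), number=10, repeat=5))
        print(f"n={n:>9}  time={t:.6f} s")

If the times grow by roughly 10x per step, that is evidence of linear behaviour over the tested range; roughly 13x per decade points toward O(N log N), and roughly 100x toward O(N^2).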
+12

Unlike the other answer, I will say no. However, you can get a pretty good guess (not even an estimate, since that word does not really apply here). That is probably what "often" means.

The point is that you never know the constant factors. Big-O describes asymptotic behaviour (at infinity); it is useful precisely because it drops everything except the fastest-growing term. So mathematically you cannot confirm your assumption by measurement.

First, there are many algorithms and use cases where the asymptotic behaviour is not useful in real applications, simply because the input sizes seen in "typical use" never get large enough for the dominant term to take over. That happens quite often, and you can still check or "estimate" the behaviour on those inputs.

But there are also cases where the asymptotically best algorithm has such large constant factors that it is not applicable on current systems. The best example I know of is large number multiplication algorithms.

Nevertheless, there are systems that "approximate" (a better word than confirm) the complexity class of an algorithm. I am not sure whether they measure it or derive it from code analysis, but they can do it: https://codility.com/public-report-detail/ .

What you can do is run the algorithm over a range of input sizes, collect the timings, and fit the data to a model. It is pretty simple. Then you can say that, over the tested input range, the algorithm behaves like O(class(n)). (This can be of practical value, sometimes even more valuable than the theoretical asymptotic complexity.)
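A sketch of that fitting idea in Python (the measurements and candidate classes below are made up for illustration; replace them with real data):

    import math

    # Hypothetical measurements: (input size, seconds).
    samples = [(1_000, 0.012), (2_000, 0.026), (4_000, 0.055),
               (8_000, 0.118), (16_000, 0.251)]

    candidates = {
        "O(n)":       lambda n: n,
        "O(n log n)": lambda n: n * math.log(n),
        "O(n^2)":     lambda n: n * n,
    }

    for name, f in candidates.items():
        # Least-squares fit of the one-parameter model  t ~ c * f(n).
        c = sum(f(n) * t for n, t in samples) / sum(f(n) ** 2 for n, _ in samples)
        residual = sum((t - c * f(n)) ** 2 for n, t in samples)
        print(f"{name:12s} residual = {residual:.3e}")

The class with the smallest residual is your best guess, but only for the range you tested; it says nothing about behaviour far outside it.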

Note that selecting the test points is not trivial. Basically, if your algorithm behaves "fast" at small sizes, you need to push the input size up to where the next complexity class kicks in. For example, with something like (100n + n!) you cannot go n={1,10,100}, because the factorial term would kill the runtime; but n={1,2,3,4,5,6,7} will not really expose the n! part either (OK, 7! is 5040, so the factorial does show up there, but an n^2 term would be much harder to spot at such sizes).
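To make the (100n + n!) example concrete, a throwaway check of where the factorial actually starts to dominate:

    import math

    for n in range(1, 11):
        print(f"n={n:2d}  100n={100 * n:5d}  n!={math.factorial(n):8d}")
    # Up to n=6 the two terms are comparable (600 vs 720);
    # from n=7 onward the factorial dominates and quickly explodes.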

Bottom line: getting a decent guess is certainly possible, but outside the simplest cases it can be hard to do, and unfortunately it is also rather hard to tell whether your case is a complicated one or not.

Also, this discussion is purely theoretical and skips hardware effects. I have heard of algorithms where the O(n^2) one performed better than the O(n log n) one because the former was always (very) cache friendly, but don't take my word for it, I cannot remember the source.

+6

Measuring runtime against input size for real programs is an extremely useful tool for seeing how your code performs. In general, though, it is dangerous to think you can use it to determine complexity.

Here is a practical example of how it breaks down.

A good quicksort implementation partitions the array into three parts: elements less than the pivot, elements equal to the pivot, and elements greater than the pivot. This means that quicksorting random arrays of 64-bit ints (in fact, random arrays of any fixed-size data type) uses O(n) comparisons, because eventually every sub-array consists of nothing but equal elements.
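For reference, a minimal sketch of the three-way partition step described above (Python, illustrative only):

    def partition3(a, lo, hi):
        # Rearranges a[lo:hi+1] into  < pivot | == pivot | > pivot
        # and returns the bounds (lt, gt) of the "equal" region.
        pivot = a[lo]
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        return lt, gt

Because the recursion only descends into the strictly-smaller and strictly-larger parts, a sub-array made of one repeated value is handled in a single pass, which is where the O(n) bound for a fixed-size value domain comes from.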

Unfortunately, you cannot see that empirically: if you plot n against the number of comparisons, the graph looks like n*log(n) until the input array becomes far larger than 2^64 elements. Even if you had enough memory, your programming language probably would not let you index arrays of that size anyway.

This example also shows that empirical measurement gives you genuinely useful data (that the code behaves like n*log(n) on realistic inputs), while the complexity analysis gives you the theoretically true but practically almost useless fact that the asymptotic growth is linear.

+2

In addition to what others have said: sometimes the average case and the worst case differ, and the worst case can be hard to find. A well-known example is quicksort, with its O(n log n) average behaviour (an abuse of O-notation?) and O(n^2) worst-case behaviour. If you are handed a black-box algorithm (well, program), such worst cases can be hard to catch experimentally without knowledge of the algorithm.
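To illustrate, a small Python experiment (illustrative only): a naive quicksort that always picks the first element as pivot looks like n log n on shuffled input but degrades to n^2 on already-sorted input, something a benchmark on "typical" random data would never reveal:

    import random
    import sys

    sys.setrecursionlimit(10_000)  # the worst case recurses about n levels deep

    def quicksort(a, counter):
        # Naive quicksort, first element as pivot; counter[0] accumulates comparisons.
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        counter[0] += len(rest)
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return quicksort(left, counter) + [pivot] + quicksort(right, counter)

    for n in (500, 1000, 2000):
        c_rand, c_sorted = [0], [0]
        quicksort(random.sample(range(n), n), c_rand)
        quicksort(list(range(n)), c_sorted)  # already sorted: worst case for this pivot choice
        print(f"n={n:5d}  random: {c_rand[0]:8d} cmps   sorted: {c_sorted[0]:8d} cmps")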

+2
