Although all the answers so far seem correct, none of them feel "right" in the context of a CS class. In a computational complexity course you want to be precise and work from definitions. I will walk through the nuances of this question and of reasoning about running time in general. By the end, we will see why Itay's solution at the top is probably what you should have written. My main problem with Itay's solution is that it lacks the definitions that are the key to writing a good proof for a CS class. Note that my definitions may differ slightly from the ones used in your class, so feel free to substitute your own.
When we say "an O(n) algorithm", we really mean "this algorithm is in the set O(n)". And the set O(n) contains all algorithms whose worst-case running time f(n) has the property f(n) <= c*n + c_0 for some constants c, c_0 > 0.
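To make the definition concrete, here is a tiny sketch (my own illustration; the constants and the example running-time function are arbitrary picks, not anything from the proof) that checks whether a given running-time function fits the bound c*n + c_0 over a finite range of n:

```python
# Sketch: check that a concrete running-time function f(n) satisfies
# f(n) <= c*n + c_0 for chosen constants. The constants c = 3, c_0 = 5
# and the example function below are arbitrary picks for illustration.

def satisfies_linear_bound(f, c, c_0, n_max=10_000):
    """Return True if f(n) <= c*n + c_0 for all 1 <= n <= n_max."""
    return all(f(n) <= c * n + c_0 for n in range(1, n_max + 1))

# An algorithm that takes 3n + 2 "steps" is in O(n): pick c = 3, c_0 = 5.
print(satisfies_linear_bound(lambda n: 3 * n + 2, c=3, c_0=5))  # True
```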
Now we want to prove your claim. First of all, as you have stated the problem, it has a trivial solution. This is because our asymptotic estimates are worst-case bounds. Many "slow" algorithms have some inputs on which they are very fast. For example, insertion sort is linear if the input is already sorted! So take insertion sort (which is O(n) on already-sorted input, even though it is O(n^2) in the worst case) and an O(n*log(n)) sort such as merge sort, and note that insertion sort will run faster if you feed it the sorted array! Boom, proof.
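If you want to see this concretely, here is an empirical sketch (not part of the formal argument; the function names and input size are my own choices) that times a hand-rolled insertion sort against a hand-rolled merge sort on an already-sorted input:

```python
# Sketch of the "trivial" solution: insertion sort (worst case O(n^2))
# beats an O(n*log(n)) merge sort when the input is already sorted,
# because insertion sort does only about n comparisons on sorted data.
import timeit

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:  # never runs on already-sorted input
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

sorted_input = list(range(10_000))
print("insertion sort:", timeit.timeit(lambda: insertion_sort(sorted_input), number=5))
print("merge sort:    ", timeit.timeit(lambda: merge_sort(sorted_input), number=5))
```

On a typical machine the insertion sort numbers come out noticeably smaller, because the already-sorted input is its best case, while merge sort does its full n*log(n) work regardless.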
But I suspect your exam wanted more than "show that an algorithm with a worse worst-case bound can beat one with a better bound on a favorable input." As Alex noted, this is something of an open question. The heart of the matter is that runtime analysis makes assumptions that certain operations are O(1). For example, for some problem you might assume that multiplication is O(1), even though it becomes quadratically slower (in the number of bits) for large numbers (you could argue that the numbers in this problem are bounded by 100 bits, so multiplication is still "constant time"). Since your class probably focuses specifically on computational complexity, it probably wants you to gloss over this issue. So we will prove the statement assuming that our O(1) assumptions hold, and therefore no details like "caching makes this algorithm faster than the other" come into play.
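As a quick illustration of why the O(1) assumption matters (a sketch using Python's arbitrary-precision integers; the exact timings will vary by machine, but the trend is what matters), multiplying bigger and bigger numbers visibly stops being "constant time":

```python
# Sketch: "multiplication is O(1)" only holds while the operands fit in a
# fixed number of bits. Python ints are arbitrary precision, so the cost
# of x * x grows with the bit length. That growth is exactly the effect
# the O(1) assumption hides.
import timeit

for bits in (100, 10_000, 1_000_000):
    x = (1 << bits) - 1  # an integer whose binary representation is `bits` ones
    t = timeit.timeit(lambda x=x: x * x, number=100)
    print(f"{bits:>9}-bit operands: {t:.4f} s for 100 multiplications")
```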
So now we have one algorithm that runs in time f(n), which is in O(n), and another algorithm that runs in time g(n), which is in O(n^2). We want to use the definitions above to show that for some n we can have g(n) < f(n). The trick is that nothing fixes the constants c, c_0, c', c_0'. As Itay mentions, we can choose values for those constants such that g(n) < f(n) for many n. The rest of the proof is what he wrote above. For example, let c, c_0 be the constants for f(n) and say they are both 100, and let c', c_0' be the constants for g(n), both equal to 1. Then g(n) < f(n) means n^2 + 1 < 100n + 100, i.e. n^2 - 100n - 99 < 0, and factoring (the roots are roughly -1 and 101) shows that this holds for every 1 <= n <= 100.
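Plugging in those example constants, a short check (my own sketch; f and g here are just the running-time formulas from the paragraph above, not real algorithms) confirms the range of n where the "slower" algorithm wins:

```python
# Worked check of the constants argument: with the example constants,
# the O(n^2) algorithm's running time g(n) = n^2 + 1 is smaller than the
# O(n) algorithm's running time f(n) = 100n + 100 for every n from 1 to 100.

def f(n):  # running time of the O(n) algorithm with c = c_0 = 100
    return 100 * n + 100

def g(n):  # running time of the O(n^2) algorithm with c' = c_0' = 1
    return n * n + 1

wins = [n for n in range(1, 1_000) if g(n) < f(n)]
print(min(wins), max(wins))  # prints: 1 100
```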