Could an algorithm be faster than a linear search?

I heard that no search algorithm can be faster than a linear search on an unsorted array, but when I run this (linear) algorithm:

    public static void search(int[] arr, int value){
        for(int i = 0; i < arr.length; i++){
            if(arr[i] == value) return;
        }
    }

With a random array 1,000,000 elements long, the average time it takes to find a value is 75 ns, but with this algorithm:

    public static void skipSearch(int[] arr, int value){
        for(int i = 0; i < arr.length; i+=2){
            if(arr[i] == value) return;
        }
        for(int i = 1; i < arr.length; i+=2){
            if(arr[i] == value) return;
        }
    }

I get a shorter average of 68 ns. How can that be?

Edit: Many of you say that my test was flawed and the result was just chance, but I ran these functions 1,000,000 times each and took the average. Every time I repeated the experiment, I got 75-76 ns for the first algorithm and 67-69 ns for the second.

I used Java's System.nanoTime() to measure this.

The code:

    int[] arr = new int[1000];
    Random r = new Random();
    for(int i = 0; i < arr.length; i++){
        arr[i] = r.nextInt();
    }
    int N = 1000000;
    long startTime = System.nanoTime();
    for(int i = 0; i < N; i++){
        search(arr, arr[(int) Math.floor(Math.random()*arr.length)]);
    }
    System.out.println("Average Time: "+(System.nanoTime()-startTime)/(float)N+"ns");
    startTime = System.nanoTime();
    for(int i = 0; i < N; i++){
        skipSearch(arr, arr[(int) Math.floor(Math.random()*arr.length)]);
    }
    System.out.println("Average Skip Search Time: "+(System.nanoTime()-startTime)/(float)N+"ns");
+8
java algorithm search
8 answers

It is likely that, since your search() methods return nothing and the loops have no side effects, the JIT compiler in your JVM optimizes the code away: it compiles the bytecode to native code at runtime and, in doing so, can remove code whose results are never used. So your search() methods most likely execute (almost) nothing, and in particular the loops are probably removed entirely. JIT optimization is quite smart; it can identify many situations where code never needs to run at all (even though that code is still present in the .class bytecode file).

In that case you are measuring only the random number generation, not the real running time of your methods.

Read up, for example, on how to make sure there is no JVM and compiler optimization, apply it, and run your test again.

Also modify the search() methods to return the index, making life harder for the optimizer. However, it is sometimes surprisingly difficult to create code that cannot be optimized away :). Disabling optimization (as in the link above) is more reliable.
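A minimal sketch of that idea (the searchIndex name and the sink accumulator are illustrative, not from the original post): return the found index and fold every result into a value that is eventually printed, so the JIT cannot treat the calls as dead code:

    // Hypothetical variant of the OP's method that returns the index it finds.
    public static int searchIndex(int[] arr, int value){
        for(int i = 0; i < arr.length; i++){
            if(arr[i] == value) return i;
        }
        return -1; // not found
    }

    // In the benchmark loop, consume every result; printing the sink at the end
    // makes the work observable, so the JIT cannot eliminate the calls.
    long sink = 0;
    for(int i = 0; i < N; i++){
        sink += searchIndex(arr, arr[r.nextInt(arr.length)]);
    }
    System.out.println(sink);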


As a rule, it makes no sense to benchmark non-optimized code. In this case, however, the OP wants to measure the theoretical algorithm, i.e. the actual number of passes, so he must make sure the loops are really executed. That is why he should disable optimization.

The OP thought he was measuring the speed of the algorithm, while in reality the algorithm never even got a chance to run. The JIT optimization in this particular case invalidates the benchmark.

+31

This is why we don't care about literally counting how much time a run takes, but rather about how the running time grows as the input gets larger. Take a look at asymptotic runtime analysis:

https://en.wikipedia.org/wiki/Analysis_of_algorithms

+18

What is the distribution of value? Most likely it tends to land on an even index in your case. Clearly the complexity is O(n) for the first algorithm and O(n/2) + O(n/2) for the second, which is essentially the same: linear time.
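To see that the averages really do match, here is a small sketch (not from the answer; the names are illustrative) that counts comparisons analytically for every possible target position in an array of length n; both strategies average out to (n + 1)/2 comparisons:

    // Averages the number of comparisons over all possible target positions.
    public static void main(String[] args){
        int n = 1000;
        long linear = 0, skip = 0;
        for(int target = 0; target < n; target++){
            linear += target + 1;                     // linear search checks target+1 slots
            if(target % 2 == 0){
                skip += target / 2 + 1;               // found during the even pass
            } else {
                skip += (n + 1) / 2 + target / 2 + 1; // full even pass, then part of the odd pass
            }
        }
        System.out.println("linear avg: " + (double) linear / n); // 500.5
        System.out.println("skip avg:   " + (double) skip / n);   // 500.5 as well
    }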

+7

It is just by chance that it is "faster". Your values probably happened to land on an even index more often than on an odd one.

+5

In theory, the time complexity of both algorithms is the same: O(n). One suggestion for why skipSearch was faster when you ran it is that the element you were looking for happened to sit at an even index, so it was found in the first loop; in the best such cases it does half the iterations of a linear search. In benchmarks like these you need to consider not only the size of the data but also what the data looks like. Try searching for an element that does not exist, an element at an even index, and an element at an odd index.
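A sketch of those three cases (the count helper is illustrative, not part of the question's code): instrument both loops to count comparisons instead of measuring time:

    // Counts the comparisons either strategy makes before finding the value.
    static long count(int[] arr, int value, boolean skip){
        long cmp = 0;
        if(!skip){
            for(int i = 0; i < arr.length; i++){ cmp++; if(arr[i] == value) return cmp; }
        } else {
            for(int i = 0; i < arr.length; i += 2){ cmp++; if(arr[i] == value) return cmp; }
            for(int i = 1; i < arr.length; i += 2){ cmp++; if(arr[i] == value) return cmp; }
        }
        return cmp; // value absent: the whole array was scanned either way
    }

    public static void main(String[] args){
        int[] arr = java.util.stream.IntStream.range(0, 1000).toArray(); // arr[i] == i
        System.out.println("absent:     " + count(arr, -1, false)  + " vs " + count(arr, -1, true));  // 1000 vs 1000
        System.out.println("even index: " + count(arr, 500, false) + " vs " + count(arr, 500, true)); // 501 vs 251
        System.out.println("odd index:  " + count(arr, 501, false) + " vs " + count(arr, 501, true)); // 502 vs 751
    }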

Besides, even if skipSearch did perform better under proper tests, it would only shave off a few nanoseconds, so there is no significant gain and it should not be used in practice.

+3

One problem someone already mentioned is that you use different random indices for each algorithm. To fix this, I reworked your code a bit. Here is what I have:

    int[] arr = new int[1000];
    Random r = new Random();
    for(int i = 0; i < arr.length; i++){
        arr[i] = r.nextInt();
    }
    int N = 1000000;
    List<Integer> indices = new ArrayList<Integer>();
    for(int i = 0; i < N; i++){
        //indices.add((int) Math.floor(Math.random()*arr.length/2)*2);   //even only
        indices.add((int) Math.floor(Math.random()*arr.length/2)*2+1);   //odd only
        //indices.add((int) Math.floor(Math.random()*arr.length));       //normal
    }
    long startTime = System.nanoTime();
    for(Integer i : indices) {
        search(arr, arr[i]);
    }
    System.out.println("Average Time: "+(System.nanoTime()-startTime)/(float)N+"ns");
    startTime = System.nanoTime();
    for(Integer i : indices) {
        skipSearch(arr, arr[i]);
    }
    System.out.println("Average Skip Search Time: "+(System.nanoTime()-startTime)/(float)N+"ns");

You will notice that I made an ArrayList<Integer> to store the indices, and I provide three different ways to populate that list: one with even indices only, one with odd indices only, and your original random method.

Running with even indices only produces this output:

Average Time: 175.609ns

Average Skip Search Time: 100.64691ns

Running with odd indices only produces this output:

Average Time: 178.05182ns

Average Skip Search Time: 263.82928ns

Running with the original random indices produces this output:

Average Time: 175.95944ns

Average Skip Search Time: 181.20367ns

Each of these results makes sense.

If you select only even indices, your skipSearch effectively runs in O(n/2). Usually we don't care about constant factors in time complexity, but when we look at actual runtime, they matter. In this case we literally halve the problem, so it affects the execution time, and indeed we see the real runtime almost cut in half.

When selecting only odd indices, we see a much greater effect on the runtime. This is to be expected, because skipSearch has to process all of the even indices before it looks at a single odd one, so it always covers at least half the array.

When using the original random selection, we see that skipSearch does worse, as we would expect. That is because, on average, we get an even split between even and odd target indices. Values at even indices will be found quickly, but values at odd indices will be found very slowly. For example, a linear search finds index 3 almost immediately, while skipSearch processes about n/2 elements before it reaches index 3.

As for why your original code gave the odd results, it is up in the air as far as I can tell. Maybe the pseudo-random number generator slightly favors even numbers, maybe it is an optimization effect, maybe it is branch-prediction weirdness. But it certainly was not an apples-to-apples comparison, since you picked different random indices for the two algorithms. Some of these effects may still influence my results, but at least here the two algorithms are searching for the same numbers.

+3

Both algorithms do the same thing; which one is faster depends on where the value you are looking for happens to be, so it is pure chance which one wins in ONE particular case.

But the first one is the better coding style.

+2

When people call linear search the fastest possible search for unsorted data, they are making a purely academic statement. It has nothing to do with benchmarks, but rather with the Big O complexity of the search algorithm, and to make that measure useful, Big O describes the worst-case scenario of the algorithm.

In the real world, data does not always match the worst case, so benchmarks will vary for different data sets. In your example there is a 7 ns difference between the two algorithms. But what happens if your data looks like this:

    linear_data      = [..., value];
    skip_search_data = [value, ...];

That 7 ns difference becomes much larger: on linear_data, the linear search is O(n) every time, while on skip_search_data the skip search is O(1) every time.
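A minimal sketch of those two extremes, reusing the search and skipSearch methods from the question (the filler value -1 and target 42 are illustrative):

    // Two adversarial layouts, following the pseudocode above.
    int n = 1_000_000;
    int value = 42;

    int[] linearData = new int[n];
    java.util.Arrays.fill(linearData, -1);
    linearData[n - 1] = value;          // target hidden at the very end

    int[] skipSearchData = new int[n];
    java.util.Arrays.fill(skipSearchData, -1);
    skipSearchData[0] = value;          // target right at the front

    search(linearData, value);          // scans all n slots: O(n)
    skipSearch(skipSearchData, value);  // first probe hits: O(1)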

In the real world, the theoretically "fastest" algorithm is not always the fastest in practice. Sometimes your data set lends itself to a different algorithm.

0
