Nested Loop Efficiency

See the following snippet:

Long first_begin = System.currentTimeMillis(); // first nested loops for (int i = 0; i < 10; i++) { for (int j = 0; j < 1000000; j++) { // do some stuff } } System.out.println(System.currentTimeMillis() - first_begin); // second nested loops Long seconde_begin = System.currentTimeMillis(); for (int i = 0; i < 1000000; i++) { for (int j = 0; j < 10; j++) { // do some stuff } } System.out.println(System.currentTimeMillis() - seconde_begin); 

I wonder why the first nested loops are slower than the second?

Hello!

Important Note! : I am sorry that I made the variable j starting with 1 by chance, when this question was asked first, I made a correction.

Update: there is no specific logic in the cycle, I just do a test, in fact this is the question asked during the interview, and the interviewer tells me to change the order of the cycles to achieve better performance. By the way, I am using JDK1.5. after some test, I'm more confused because the result of the program is not consistent --- once the first cycle runs faster than the second, but most of the time it runs slower than the second.

+7
nested-loops
source share
4 answers

This answer is for an updated question:

  • If you are accessing a two-dimensional array, for example int[][] , then with a large value in the inner loop it should be slower. Not much, but still. To get a little understanding of the problem, read about Schlemiel street artist in one of Joel's blog posts.
  • The reason you get inconsistent results is because you are not using JVM warming up. The JVM constantly analyzes the bytecode that runs and optimizes it, as a rule, only after 30-50 iterations does it work at the optimal speed. Yes, this means that you need to run the code a few dozen times first, and then compare it with the average number of dozens more starts because of the garbage collector, which will slow down a couple of runs.
  • General note: using a Long object instead of a Long primitive is just dumb, the JVM will most likely optimize it by replacing it with a primitive if possible, and if it cannot, there should be some (albeit extremely minor) constant slowdown from its use.
+4
source share

EDIT: original answer below. Now that you have corrected the example so that all loop variables start at 0, we return to a simple lack of information. It seems likely that this is the coherence / locality of the link cache - but we just guess. If you could provide a short but complete program that demonstrates the problem, it would help ... as if to tell us which language / platform we are talking from, to start with!


The first loop has 10 * 999999 = 9999990 iterations. The second loop has 1,000,000 * 9 = 9,000,000 iterations. Therefore, I would expect (ceteris paribus) the first cycle would take longer.

However, you did not indicate what work you are doing or on which platform it works. There are many things that can affect things:

  • The second loop can best improve the cache.
  • If you are using a JIT-compiled platform, JIT may have preferred to optimize the second loop more.
  • The operations you perform may themselves have caching or something like that
  • If you are doing a bit of work, but first you need to load and initialize a bunch of types, which may cause the first loop to be slower.
+6
source share

The question has moved. These are not the droids you are looking for ...

Because you do ~ 1,000,000 times more work in the first example .; -) Strike>

+2
source share

If you look at the generated bytecode, the two loops are almost identical. EXCEPT that when it fulfills the while condition for cycle 10, Java gets 10 as the immediate value from the statement, but when it fulfills the while condition for cycle 1000000, Java loads 1,000,000 from the variable. I have no information about how long it takes to execute each command, but it seems likely that immediate loading will be faster than loading from a variable.

Please note that in the first cycle, a comparison with 1,000,000 should be performed 10 million times, and in the second cycle, only 1 million times. Of course, a comparison with 10 is performed much more often in the second cycle, but if the variable load is much slower than the direct load, this explains the results you see.

+2
source share

All Articles