My JVM does this quite straightforwardly in the inner loop when you use long s:
0x00007fdd859dbb80: test %eax,0x5f7847a(%rip) 0x00007fdd859dbb86: dec %r11 0x00007fdd859dbb89: mov %r11,0x258(%r10) 0x00007fdd859dbb90: test %r11,%r11 0x00007fdd859dbb93: jge 0x00007fdd859dbb80
It is cheating, badly when you use int s; at first there is some kind of talkativeness, which I do not claim to understand, but looks like a setting for an expanded cycle:
0x00007f3dc290b5a1: mov %r11d,%r9d 0x00007f3dc290b5a4: dec %r9d 0x00007f3dc290b5a7: mov %r9d,0x258(%r10) 0x00007f3dc290b5ae: test %r9d,%r9d 0x00007f3dc290b5b1: jl 0x00007f3dc290b662 0x00007f3dc290b5b7: add $0xfffffffffffffffe,%r11d 0x00007f3dc290b5bb: mov %r9d,%ecx 0x00007f3dc290b5be: dec %ecx 0x00007f3dc290b5c0: mov %ecx,0x258(%r10) 0x00007f3dc290b5c7: cmp %r11d,%ecx 0x00007f3dc290b5ca: jle 0x00007f3dc290b5d1 0x00007f3dc290b5cc: mov %ecx,%r9d 0x00007f3dc290b5cf: jmp 0x00007f3dc290b5bb 0x00007f3dc290b5d1: and $0xfffffffffffffffe,%r9d 0x00007f3dc290b5d5: mov %r9d,%r8d 0x00007f3dc290b5d8: neg %r8d 0x00007f3dc290b5db: sar $0x1f,%r8d 0x00007f3dc290b5df: shr $0x1f,%r8d 0x00007f3dc290b5e3: sub %r9d,%r8d 0x00007f3dc290b5e6: sar %r8d 0x00007f3dc290b5e9: neg %r8d 0x00007f3dc290b5ec: and $0xfffffffffffffffe,%r8d 0x00007f3dc290b5f0: shl %r8d 0x00007f3dc290b5f3: mov %r8d,%r11d 0x00007f3dc290b5f6: neg %r11d 0x00007f3dc290b5f9: sar $0x1f,%r11d 0x00007f3dc290b5fd: shr $0x1e,%r11d 0x00007f3dc290b601: sub %r8d,%r11d 0x00007f3dc290b604: sar $0x2,%r11d 0x00007f3dc290b608: neg %r11d 0x00007f3dc290b60b: and $0xfffffffffffffffe,%r11d 0x00007f3dc290b60f: shl $0x2,%r11d 0x00007f3dc290b613: mov %r11d,%r9d 0x00007f3dc290b616: neg %r9d 0x00007f3dc290b619: sar $0x1f,%r9d 0x00007f3dc290b61d: shr $0x1d,%r9d 0x00007f3dc290b621: sub %r11d,%r9d 0x00007f3dc290b624: sar $0x3,%r9d 0x00007f3dc290b628: neg %r9d 0x00007f3dc290b62b: and $0xfffffffffffffffe,%r9d 0x00007f3dc290b62f: shl $0x3,%r9d 0x00007f3dc290b633: mov %ecx,%r11d 0x00007f3dc290b636: sub %r9d,%r11d 0x00007f3dc290b639: cmp %r11d,%ecx 0x00007f3dc290b63c: jle 0x00007f3dc290b64f 0x00007f3dc290b63e: xchg %ax,%ax /* OK, fine; I know what a nop looks like */
then the expanded cycle itself:
0x00007f3dc290b640: add $0xfffffffffffffff0,%ecx 0x00007f3dc290b643: mov %ecx,0x258(%r10) 0x00007f3dc290b64a: cmp %r11d,%ecx 0x00007f3dc290b64d: jg 0x00007f3dc290b640
then the break code for the expanded cycle, the test itself and the direct cycle:
0x00007f3dc290b64f: cmp $0xffffffffffffffff,%ecx 0x00007f3dc290b652: jle 0x00007f3dc290b662 0x00007f3dc290b654: dec %ecx 0x00007f3dc290b656: mov %ecx,0x258(%r10) 0x00007f3dc290b65d: cmp $0xffffffffffffffff,%ecx 0x00007f3dc290b660: jg 0x00007f3dc290b654
So this happens 16 times faster for int, because the JIT unwrapped the int loop 16 times, but didn't unwrap the long loop at all.
For completeness, here is the code I really tried:
public class foo136 { private static int i = Integer.MAX_VALUE; public static void main(String[] args) { System.out.println("Starting the loop"); for (int foo = 0; foo < 100; foo++) doit(); } static void doit() { i = Integer.MAX_VALUE; long startTime = System.currentTimeMillis(); while(!decrementAndCheck()){ } long endTime = System.currentTimeMillis(); System.out.println("Finished the loop in " + (endTime - startTime) + "ms"); } private static boolean decrementAndCheck() { return --i < 0; } }
Assembly dumps were generated using the -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly . Note that you need to bother with your JVM installation so that this work is for you; you need to put some random shared library in the right place or it will fail.