Why is a Java CAS operation faster than the same operation in C?

  • Here I have Java and C code that each perform an atomic increment using CAS.
  • Each increments a long variable from 0 to 500,000,000.
  • C: time: 7300 ms
  • Java: time: 2083 ms
  • Can anyone double-check these results? I just can't believe them.
  • Thanks

Java Code:

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class SmallerCASTest {

        public static void main(String[] args) {
            final long MAX = 500L * 1000L * 1000L;
            final AtomicLong counter = new AtomicLong(0);

            long start = System.nanoTime();
            while (true) {
                if (counter.incrementAndGet() >= MAX) {
                    break;
                }
            }
            long casTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("Time Taken=" + casTime + "ms");
        }
    }

C code:

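    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define NITER 500000000

    int main(void)
    {
        long val = 0;
        clock_t starttime = clock();

        while (val < NITER) {
            /* Retry the CAS until this increment succeeds. */
            while (1) {
                long current = val;
                long next = current + 1;
                if (__sync_bool_compare_and_swap(&val, current, next))
                    break;
            }
        }

        /* clock() counts CPU ticks; convert them to milliseconds. */
        clock_t castime = (clock() - starttime) / (CLOCKS_PER_SEC / 1000);
        /* clock_t is not an int, so cast explicitly instead of using %d. */
        printf("Time taken : %ld ms\n", (long)castime);
        return 0;
    }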

run.sh

    #!/bin/bash

    gcc -O3 test.c -o test.o
    echo -e "\nC"
    ./test.o

    javac SmallerCASTest.java
    echo -e "\nJava"
    java SmallerCASTest

Other information:

    System: Linux XXXXXXXXX #1 SMP Thu Mar 22 08:00:08 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

    gcc --version:
    gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)

    java -version:
    java version "1.6.0_31"
    Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
3 answers

You are comparing apples to oranges, as I am sure you suspected. The Java version is a true CAS that retries on failure, while the C version uses what I would call, in Java terms, a synchronized operation.

See this question for more details.

See this answer to this question for supporting detail, where it says "A full memory barrier is created when this function is invoked"; in Java terms, that makes it a synchronized call.

Try using _compare_and_swap the same way AtomicLong uses its Java equivalent, i.e. spin on the function until the value changes to what you want.
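As a rough sketch of what spinning on the CAS looks like in C, the loop below follows the read, compute, compare-and-swap, retry pattern described above. The helper names cas_increment_and_get and count_to are made up for illustration; __sync_bool_compare_and_swap is the GCC builtin already used in the question. This is only meant to show the shape of the loop, not to state how AtomicLong is implemented internally.

```c
/* Hypothetical helper: atomically add 1 to *addr and return the new value. */
long cas_increment_and_get(long *addr)
{
    for (;;) {
        /* Read the current value; the volatile cast forces a fresh load. */
        long current = *(volatile long *)addr;
        long next = current + 1;
        /* The swap succeeds only if *addr still equals current;
           if another thread changed it in between, loop and retry. */
        if (__sync_bool_compare_and_swap(addr, current, next))
            return next;
    }
}

/* Hypothetical driver: count from 0 up to limit and return the final value. */
long count_to(long limit)
{
    long counter = 0;
    while (cas_increment_and_get(&counter) < limit)
        ;  /* the loop body is empty: all the work happens in the condition */
    return counter;
}
```

Calling count_to(500000000) from a timed main would repeat the benchmark from the question with the retry logic factored into one function.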

Added:

I cannot find a definitive C++ equivalent of Java's AtomicLong, but that does not mean there isn't one. Essentially, an AtomicLong can be changed by any thread at any time, and only one of them will succeed. However, the change will be consistent, i.e. the result will be the change made by one thread or the other, never a combination of the two. If thread A tries to change the value to 0xffff0000 (or the equivalent 64-bit number) while thread B tries to change it to 0x0000ffff (likewise), the result will be one of those two values; specifically, it will not be 0x00000000 or 0xffffffff (unless, of course, a third thread is involved).

Essentially, AtomicLong has no synchronization whatsoever besides this.
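To make the "one value or the other, never a mix" point concrete, here is a small C sketch using POSIX threads and the GCC __sync builtins. Two writer threads keep storing two different bit patterns while the main thread reads the variable and counts anything that is neither pattern. The names shared, atomic_set, writer and count_torn_values, the ROUNDS constant and the two 64-bit patterns are all assumptions made for this illustration, and the code assumes a platform where long is 64 bits.

```c
#include <pthread.h>

/* Assumed 64-bit patterns, standing in for 0xffff0000 and 0x0000ffff. */
#define VALUE_A 0xffffffff00000000UL
#define VALUE_B 0x00000000ffffffffUL
#define ROUNDS  100000

static unsigned long shared = VALUE_A;

/* Atomic read: a CAS of 0 with 0 never changes the value but returns it. */
static unsigned long atomic_get(void)
{
    return __sync_val_compare_and_swap(&shared, 0UL, 0UL);
}

/* Atomically overwrite shared with value, retrying the CAS until it wins. */
static void atomic_set(unsigned long value)
{
    unsigned long seen;
    do {
        seen = atomic_get();
    } while (!__sync_bool_compare_and_swap(&shared, seen, value));
}

/* Thread body: repeatedly store the pattern passed in through arg. */
static void *writer(void *arg)
{
    unsigned long value = *(unsigned long *)arg;
    for (int i = 0; i < ROUNDS; i++)
        atomic_set(value);
    return NULL;
}

/* Run two competing writers and count reads that match neither pattern. */
long count_torn_values(void)
{
    unsigned long a = VALUE_A, b = VALUE_B;
    pthread_t ta, tb;
    long torn = 0;

    pthread_create(&ta, NULL, writer, &a);
    pthread_create(&tb, NULL, writer, &b);
    for (int i = 0; i < ROUNDS; i++) {
        unsigned long seen = atomic_get();
        if (seen != VALUE_A && seen != VALUE_B)
            torn++;  /* would indicate a mix of the two writes */
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return torn;
}
```

Because every store goes through a successful CAS, count_torn_values should return 0. On older toolchains the program may need to be built with gcc -pthread.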


EDIT: Indeed, Java seems to implement incrementAndGet using a CAS operation, as you describe.

My testing suggests that the C and Java versions have roughly equivalent performance (which makes sense, since the time-consuming part is the atomic operation itself, not anything the Java or C compilers can optimize around it).

So, on my machine (Xeon X3450), the Java version takes ~4700 ms, the C version ~4600 ms, and the C version using __sync_add_and_fetch() ~3800 ms (which suggests that Java could be improved here, rather than implementing every atomic operation on top of CAS).
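For reference, the __sync_add_and_fetch variant might look roughly like the sketch below. The function name count_with_add_and_fetch is made up here; the builtin itself is a documented GCC atomic builtin. The usual explanation for the speedup is that on x86-64 GCC can emit a single locked add instruction for this builtin instead of a compare-and-swap retry loop, but treat that as an assumption rather than something measured for this post.

```c
/* Hypothetical variant of the benchmark loop: count from 0 up to limit
   using an atomic add instead of a compare-and-swap retry loop. */
long count_with_add_and_fetch(long limit)
{
    long counter = 0;
    /* __sync_add_and_fetch atomically adds 1 and returns the new value,
       so there is no failure case and nothing to retry. */
    while (__sync_add_and_fetch(&counter, 1) < limit)
        ;
    return counter;
}
```

Swapping this function into the C program from the question, with limit set to NITER, gives the third timing quoted above something to be compared against.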

java version

    java version "1.6.0_24"
    OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.10.04.1)
    OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

GCC is 4.4.3, x86_64.

OS - Ubuntu 10.04 x86_64.

Therefore, I can only conclude that something seems suspicious in your tests.


Because Java is awesome?

The Java version takes about 4 ns per loop iteration. That sounds right. An uncontended CAS is effectively a processor-local operation; it should be very fast. (edit: maybe not 4 ns fast!)
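The per-iteration figure is just the total run time divided by the iteration count. A tiny helper (the name ns_per_iteration is made up for this sketch) makes the arithmetic explicit:

```c
/* Hypothetical helper: average nanoseconds spent per loop iteration,
   given a total run time in milliseconds and an iteration count. */
double ns_per_iteration(double total_ms, double iterations)
{
    return total_ms * 1e6 / iterations;  /* 1 ms = 1,000,000 ns */
}
```

With the numbers from the question, ns_per_iteration(2083, 500e6) comes to roughly 4.2 ns for Java, and ns_per_iteration(7300, 500e6) to 14.6 ns for C.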

Java achieves this speed through aggressive runtime optimization: the code is inlined and becomes just a few machine instructions, i.e. as fast as hand-written assembly.

If the gcc version failed to inline the function call, that adds a lot of overhead to each loop iteration.
