Java Increment Test

I am investigating which multi-threaded increment performs best. I compared implementations based on synchronization, on AtomicInteger, and a custom implementation that works like AtomicInteger but calls parkNanos(1) after a failed CAS.

    private int customAtomic() {
        int ret;
        for (;;) {
            ret = intValue;
            if (unsafe.compareAndSwapInt(this, offsetIntValue, ret, ++ret)) {
                break;
            }
            LockSupport.parkNanos(1);
        }
        return ret;
    }
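For readers who want to try the idea without sun.misc.Unsafe, here is a minimal sketch of the same loop built on AtomicInteger.compareAndSet. The class name ParkingCounter and the helper runDemo are hypothetical and are not part of the original benchmark; the sketch also uses Java 8 lambdas, while the question was measured on Java 7.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical stand-in for the question's counter; not the original class. */
class ParkingCounter {
    private final AtomicInteger value = new AtomicInteger();

    /** Increments and returns the new value, parking briefly after each failed CAS. */
    int incrementAndGet() {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Back off instead of retrying immediately, as in the question.
            LockSupport.parkNanos(1);
        }
    }

    int get() {
        return value.get();
    }

    /** Starts `threads` workers, each incrementing `perThread` times; returns the final value. */
    static int runDemo(int threads, int perThread) {
        ParkingCounter counter = new ParkingCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000));
    }
}
```

The sketch only checks that no increment is lost; it says nothing about throughput, which is what the JMH benchmark measures.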

I wrote a benchmark based on JMH: plain execution of each method, each method combined with CPU-consuming work (1, 2, 4, 8, 16 tokens), and the CPU-consuming work alone. Each benchmark method runs on an Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00 GHz (8 cores + 8 HT, 64 GB RAM) with 1-17 threads. The results surprised me:

  • CAS (AtomicInteger) is the most efficient with 1 thread. With 2 threads it performs about the same as the monitor. With 3 or more threads it is worse than the monitor, by roughly a factor of 2.
  • In most cases, the custom implementation is 2-3 times better than the monitor.
  • But the custom implementation sometimes shows seemingly random bad runs. A good case is 50 ops/us; a bad case is 0.5 ops/us.

Questions:

  • Why is AtomicInteger not based on synchronization, if that is more efficient than the current implementation?
  • Why doesn't AtomicInteger call LockSupport.parkNanos(1) on a failed CAS?
  • Why do these spikes happen in the custom implementation?

[Figure: CustomIncrementGraph]

I ran this test several times, and the spikes always occur at a different number of threads. I also tried the test on other machines, with the same result. Maybe the problem is in the test itself. In the "bad case" of the custom implementation, StackProfiler shows:

 ....[Thread state distributions]....................................................................
  50.0%         RUNNABLE
  49.9%         TIMED_WAITING

 ....[Thread state: RUNNABLE]........................................................................
  43.3%  86.6%  sun.misc.Unsafe.park
   5.8%  11.6%  com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest.incrementCustomAtomicWithWork_thrpt_jmhStub
   0.8%   1.7%  org.openjdk.jmh.infra.Blackhole.consumeCPU
   0.1%   0.1%  com.jad.IncrementBench$Worker.work
   0.0%   0.0%  java.lang.Thread.currentThread
   0.0%   0.0%  com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest._jmh_tryInit_f_benchmarkparams1_0
   0.0%   0.0%  org.openjdk.jmh.infra.generated.BenchmarkParams_jmhType_B1.<init>

 ....[Thread state: TIMED_WAITING]...................................................................
  49.9% 100.0%  sun.misc.Unsafe.park

In the "good case":

 ....[Thread state distributions]....................................................................
  88.2%         TIMED_WAITING
  11.8%         RUNNABLE

 ....[Thread state: TIMED_WAITING]...................................................................
  88.2% 100.0%  sun.misc.Unsafe.park

 ....[Thread state: RUNNABLE]........................................................................
   5.6%  47.9%  sun.misc.Unsafe.park
   3.1%  26.3%  org.openjdk.jmh.infra.Blackhole.consumeCPU
   2.4%  20.3%  com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest.incrementCustomAtomicWithWork_thrpt_jmhStub
   0.6%   5.5%  com.jad.IncrementBench$Worker.work
   0.0%   0.0%  com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest.incrementCustomAtomicWithWork_Throughput
   0.0%   0.0%  java.lang.Thread.currentThread
   0.0%   0.0%  org.openjdk.jmh.infra.generated.BenchmarkParams_jmhType_B1.<init>
   0.0%   0.0%  sun.misc.Unsafe.putObject
   0.0%   0.0%  org.openjdk.jmh.runner.InfraControlL2.announceWarmdownReady
   0.0%   0.0%  sun.misc.Unsafe.compareAndSwapInt

Link to test code

Link to graphical results. X: number of threads, Y: throughput, ops/us

Link to raw log

UPD

OK, I understand that when I use parkNanos, a single thread can effectively hold the "lock" (the CAS) for a long time. Threads whose CAS fails go to sleep, and only one thread does the work and increments the value. I see that at a high level of concurrency, when the work per operation is very small, AtomicInteger is not the best approach. But if we increase workSize, for example to a level of about CASThrpt / threadNum, it should work fine. On my local machine I set workSize = 300; the result of my test:
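The effect described above (threads with a failed CAS sleep while one thread keeps incrementing) can be observed by counting successful increments per thread. The following is a rough sketch under stated assumptions: the class ParkingShareDemo and its run method are hypothetical, AtomicInteger.compareAndSet stands in for the Unsafe call, and the per-thread split will differ from run to run and from machine to machine.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical demo: how many successful CAS increments each thread gets. */
class ParkingShareDemo {

    /**
     * Workers increment a shared counter until it reaches `target`.
     * Returns the number of successful increments per thread.
     */
    static int[] run(int threads, int target) {
        AtomicInteger shared = new AtomicInteger();
        int[] perThread = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (;;) {
                    int current = shared.get();
                    if (current >= target) {
                        return;
                    }
                    if (shared.compareAndSet(current, current + 1)) {
                        perThread[id]++;          // only this thread writes slot `id`
                    } else {
                        LockSupport.parkNanos(1); // loser sleeps, winner keeps going
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();                         // join makes the slot writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return perThread;
    }

    public static void main(String[] args) {
        // The distribution is not deterministic; only the sum is fixed.
        System.out.println(Arrays.toString(run(4, 100_000)));
    }
}
```

Only the sum of the counts is guaranteed (it equals the target). A very uneven split would be consistent with the explanation above, but a single run proves nothing about the cause of the spikes.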

 Benchmark                                     (workSize)   Mode  Cnt  Score   Error   Units
 IncrementBench.incrementAtomicWithWork               300  thrpt    3  4.133 ± 0.516  ops/us
 IncrementBench.incrementCustomAtomicWithWork         300  thrpt    3  1.883 ± 0.234  ops/us
 IncrementBench.lockIntWithWork                       300  thrpt    3  3.831 ± 0.501  ops/us
 IncrementBench.onlyWithWork                          300  thrpt    3  4.339 ± 0.243  ops/us

AtomicInteger wins, the lock comes second, and the custom implementation is third. But the cause of the spikes is still unclear. I also forgot to mention the Java version: Java(TM) SE Runtime Environment (build 1.7.0_79-b15), Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

1 answer

With synchronized, the lock tends to be sticky: one thread can hold the lock for long periods of time and not let other threads acquire it fairly. This is very bad for multithreading, but great for a benchmark that performs best when only one thread runs for relatively long periods of time.

You need to change your test so that it performs better with multiple threads than with just one; otherwise you are really testing which locking strategy has the worst fairness policy.

The locking strategy tries to adapt how locking is performed, so it can change its behaviour, but it may do a poor job here, because code like this should never have been multithreaded in the first place.
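To get a feel for what "sticky" versus fair locking means, one can count how often each thread acquires a lock. synchronized has no fairness switch, so this sketch swaps in ReentrantLock, whose constructor takes a fairness flag; it illustrates the concept and does not measure the monitor itself. The class LockFairnessDemo and its run method are hypothetical names.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo: per-thread lock acquisitions with a fair or unfair ReentrantLock. */
class LockFairnessDemo {

    /**
     * Workers increment a lock-protected counter until it reaches `target`.
     * Returns the number of increments performed by each thread.
     */
    static int[] run(boolean fair, int threads, int target) {
        ReentrantLock lock = new ReentrantLock(fair);
        int[] shared = {0};                 // guarded by `lock`
        int[] perThread = new int[threads]; // slot `id` written only under `lock`
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                boolean done = false;
                while (!done) {
                    lock.lock();
                    try {
                        if (shared[0] < target) {
                            shared[0]++;
                            perThread[id]++;
                        } else {
                            done = true;
                        }
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return perThread;
    }

    public static void main(String[] args) {
        // Output varies between runs; only the sum of each array is fixed.
        System.out.println("unfair: " + Arrays.toString(run(false, 4, 50_000)));
        System.out.println("fair:   " + Arrays.toString(run(true, 4, 50_000)));
    }
}
```

With the unfair lock a thread that just released the lock may reacquire it immediately, so the split can be lopsided; the fair lock hands the lock to the longest-waiting thread, which usually evens out the counts at the cost of throughput.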
