Multithreaded Benchmark

I have been studying multithreading and found what looks like a slowdown in Object.hashCode in a multithreaded environment: computing the default hash code takes more than twice as long with 4 threads as with 1 thread, even though each thread creates and hashes the same number of objects.

But as I understand it, the threads should be running in parallel.

The number of threads is configurable. Each thread does the same job, so I would expect that running 4 threads on my machine, which is a quad-core machine, takes about the same wall-clock time as running a single thread.

Instead I see ~2.3 seconds with 4 threads, but ~0.9 s with 1 thread.

Is there a gap in my understanding? Please help me understand this behavior.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {

    private static final int THREAD_COUNT = 4;
    private static final int ITERATIONS = 20000000;

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        System.err.println(System.currentTimeMillis() - start);
    }

    private final ExecutorService _sevice = Executors.newFixedThreadPool(THREAD_COUNT, new ThreadFactory() {
        private final ThreadFactory _delegate = Executors.defaultThreadFactory();

        @Override
        public Thread newThread(final Runnable r) {
            Thread thread = _delegate.newThread(r);
            thread.setDaemon(true);
            return thread;
        }
    });

    private void run() throws Exception {
        Callable<Void> work = new java.util.concurrent.Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS; i++) {
                    Object object = new Object();
                    object.hashCode();
                }
                return null;
            }
        };
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[THREAD_COUNT];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _sevice.invokeAll(Arrays.asList(allWork));
        for (Future<Void> future : futures) {
            future.get();
        }
    }
}

Number of threads: 4. Output:

 ~2.3 seconds

Number of threads: 1. Output:

 ~0.9 seconds
3 answers

I created a simple JMH test to test various cases:

 import java.util.concurrent.TimeUnit;
 import org.openjdk.jmh.annotations.*;
 import org.openjdk.jmh.infra.Blackhole;

 @Fork(1)
 @State(Scope.Benchmark)
 @OutputTimeUnit(TimeUnit.NANOSECONDS)
 @Measurement(iterations = 10)
 @Warmup(iterations = 10)
 @BenchmarkMode(Mode.AverageTime)
 public class HashCodeBenchmark {

     private final Object object = new Object();

     @Benchmark
     @Threads(1)
     public void singleThread(Blackhole blackhole) {
         blackhole.consume(object.hashCode());
     }

     @Benchmark
     @Threads(2)
     public void twoThreads(Blackhole blackhole) {
         blackhole.consume(object.hashCode());
     }

     @Benchmark
     @Threads(4)
     public void fourThreads(Blackhole blackhole) {
         blackhole.consume(object.hashCode());
     }

     @Benchmark
     @Threads(8)
     public void eightThreads(Blackhole blackhole) {
         blackhole.consume(object.hashCode());
     }
 }

And the results are as follows:

 Benchmark                        Mode  Cnt  Score   Error  Units
 HashCodeBenchmark.eightThreads   avgt   10  5.710 ± 0.087  ns/op
 HashCodeBenchmark.fourThreads    avgt   10  3.603 ± 0.169  ns/op
 HashCodeBenchmark.singleThread   avgt   10  3.063 ± 0.011  ns/op
 HashCodeBenchmark.twoThreads     avgt   10  3.067 ± 0.034  ns/op

So we can see that, as long as there are no more threads than cores, the time per hashCode() call stays roughly the same.
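
For completeness, a JMH benchmark like the one above is usually launched through the JMH Runner API. Here is a minimal sketch; the wrapper class name is mine, and it assumes the HashCodeBenchmark class and the JMH dependencies are on the classpath:

 import org.openjdk.jmh.runner.Runner;
 import org.openjdk.jmh.runner.RunnerException;
 import org.openjdk.jmh.runner.options.Options;
 import org.openjdk.jmh.runner.options.OptionsBuilder;

 public class HashCodeBenchmarkRunner {
     public static void main(String[] args) throws RunnerException {
         // Select the benchmark class by its simple name and run it with the
         // @Fork/@Warmup/@Measurement settings declared in its annotations.
         Options options = new OptionsBuilder()
                 .include(HashCodeBenchmark.class.getSimpleName())
                 .build();
         new Runner(options).run();
     }
 }

Alternatively, if the project is built with the standard JMH archetype, the generated benchmark jar can be run directly with java -jar.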

PS: As @Tom Cools commented, your test measures allocation speed, not hashCode() speed.


See the comment by Palamino:

You are not measuring hashCode(); you are measuring the instantiation of 20 million objects in the single-threaded run and 80 million objects across 4 threads. Move the new Object() creation out of the for loop and into the Callable itself, then you will be measuring hashCode(). - Palamino
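
A minimal sketch of that suggestion, assuming the rest of the original benchmark stays as posted and only the Callable changes (the class, method, and field names here are illustrative, not from the original post):

 import java.util.concurrent.Callable;

 // Illustrative only: the allocation is hoisted out of the timed loop, so the
 // loop body exercises Object.hashCode() alone instead of 20 million allocations.
 public class HashCodeOnlyWork {

     private static final int ITERATIONS = 20000000;

     static Callable<Void> hashCodeOnlyWork() {
         return new Callable<Void>() {
             // Created once per Callable, outside the loop.
             private final Object object = new Object();

             @Override
             public Void call() {
                 int sink = 0; // keep a data dependency so the JIT cannot drop the calls entirely
                 for (int i = 0; i < ITERATIONS; i++) {
                     sink ^= object.hashCode();
                 }
                 if (sink == 42) {
                     System.out.println(sink); // side effect that depends on sink
                 }
                 return null;
             }
         };
     }
 }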


Two issues that I see with the code:

  • The allWork[] array should be sized to ITERATIONS.
  • When iterating in the call() method, make sure each thread gets only its share of the load: ITERATIONS / THREAD_COUNT.

Below is a modified version that you can try:

 import java.util.Arrays;
 import java.util.List;
 import java.util.concurrent.Callable;
 import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Future;
 import java.util.concurrent.ThreadFactory;

 public class ObjectHashCodePerformance {

     private static final int THREAD_COUNT = 1;
     private static final int ITERATIONS = 20000;

     private final Object object = new Object();

     public static void main(final String[] args) throws Exception {
         long start = System.currentTimeMillis();
         new ObjectHashCodePerformance().run();
         System.err.println(System.currentTimeMillis() - start);
     }

     private final ExecutorService _sevice = Executors.newFixedThreadPool(THREAD_COUNT, new ThreadFactory() {
         private final ThreadFactory _delegate = Executors.defaultThreadFactory();

         @Override
         public Thread newThread(final Runnable r) {
             Thread thread = _delegate.newThread(r);
             thread.setDaemon(true);
             return thread;
         }
     });

     private void run() throws Exception {
         Callable<Void> work = new java.util.concurrent.Callable<Void>() {
             @Override
             public Void call() throws Exception {
                 for (int i = 0; i < ITERATIONS / THREAD_COUNT; i++) {
                     object.hashCode();
                 }
                 return null;
             }
         };
         @SuppressWarnings("unchecked")
         Callable<Void>[] allWork = new Callable[ITERATIONS];
         Arrays.fill(allWork, work);
         List<Future<Void>> futures = _sevice.invokeAll(Arrays.asList(allWork));
         System.out.println("Futures size : " + futures.size());
         for (Future<Void> future : futures) {
             future.get();
         }
     }
 }
