Java performance in numerical algorithms

I'm interested in the performance of numerical algorithms in Java, for example double-precision matrix-matrix multiplication on the latest JIT-compiling JVMs, compared with, say, hand-tuned SSE C++/assembler or Fortran counterparts.

I have searched the Internet, but most of the results are almost 10 years old, and I understand that Java has progressed quite a lot since then.

If you have experience using Java for numerically intensive applications, could you share it? Also, how well does Java perform in kernels where the loops are relatively short and memory access is not very uniform but still stays within the L1 cache? If such a kernel is executed several times in a row, can the JVM optimize it at run time?

Thanks

+6
java performance optimization numerical sse
9 answers

I have written quite a lot of high-performance numerical code in Java (mostly crunching large arrays of doubles).

I have found Java good enough for fast numerical computation, especially since you are usually not CPU-bound anyway: memory latency and cache awareness are likely to be your biggest problems with large data sets.
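To illustrate the cache point, here is a minimal sketch that multiplies two square matrices stored row-major in flat arrays, once with the textbook i-j-k loop order and once with i-k-j, where the inner loop walks memory with stride 1. The class and method names are made up for this example, and the single-shot timing in `main` includes JIT warm-up, so treat the printed numbers as a rough indication only.

```java
import java.util.Arrays;

// Illustrative sketch; MatMulOrder and its methods are hypothetical names.
final class MatMulOrder {

    // i-j-k order: the inner loop reads b column-wise (stride n).
    static double[] multiplyIjk(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
        return c;
    }

    // i-k-j order: the inner loop reads b and writes c row-wise (stride 1).
    static double[] multiplyIkj(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 512;
        double[] a = new double[n * n];
        double[] b = new double[n * n];
        for (int i = 0; i < n * n; i++) {
            a[i] = i % 7;
            b[i] = i % 11;
        }
        long t0 = System.nanoTime();
        double[] c1 = multiplyIjk(a, b, n);
        long t1 = System.nanoTime();
        double[] c2 = multiplyIkj(a, b, n);
        long t2 = System.nanoTime();
        System.out.printf("ijk: %d ms, ikj: %d ms, same result: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, Arrays.equals(c1, c2));
    }
}
```

Both methods do the same arithmetic; only the memory access pattern differs, which is the kind of effect that tends to dominate once the matrices no longer fit in cache.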

However, you can still beat Java with hand-optimized C/C++ code that takes advantage of specific vector instructions or highly customized memory layouts. So for the very fastest code, you could consider writing the core algorithm in C/C++ and calling it from Java via JNI.

Personally, I find that creating a dependency on native code is usually more trouble than it's worth, so I tend to stick to pure Java on principle.

+2

Here is a link to the Programming Language Shootout page for Java vs C++, which compares Java's speed on several computation-intensive algorithms. It will also show you what high-performance Java code looks like. For the most part, on these few specific tests, Java took longer to run (but no more than 2 or 3 times as long).

+1

This comes from the .NET side of things, but I'm 90% sure it also applies to Java. While the JIT will use some SSE instructions where it can, it does not currently auto-vectorize your code for things like matrix multiplication. Hand-vectorized C++ using compiler intrinsics or inline assembly will definitely be faster here.

+1

One of Java's weakest points is (native) matrix operations. This is due to the nature of Java matrices:

  • You cannot declare a matrix to be rectangular, i.e. each row can have a different number of columns.

  • A matrix is technically not a "matrix of doubles (or ints, ...)" but an array of arrays. The big difference is that, since arrays are Java objects, you can assign the same array object to more than one row.

These two properties make many standard matrix optimizations impossible for the compiler.
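Both properties are easy to see in a few lines. The sketch below uses made-up names (`JaggedDemo`, `isRectangular`) and only shows what the language permits; it says nothing about what a particular JIT compiler does with such arrays.

```java
// Illustrative sketch; JaggedDemo and isRectangular are hypothetical names.
final class JaggedDemo {

    // Returns true only if every row has the same length as the first row.
    static boolean isRectangular(double[][] m) {
        for (double[] row : m) {
            if (row.length != m[0].length) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Rows of different lengths are legal.
        double[][] jagged = { new double[3], new double[1] };
        System.out.println(isRectangular(jagged)); // prints "false"

        // The same array object can serve as more than one row.
        double[] shared = new double[2];
        double[][] aliased = { shared, shared };
        aliased[0][1] = 7.0;
        System.out.println(aliased[1][1]); // prints "7.0"
    }
}
```

Because a write through one row can change another row, the compiler cannot assume that rows are independent without proving it first.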

You can get better performance by using a Java library that emulates matrices on top of one long array. However, you then pay the overhead of a method call for every access.
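A minimal sketch of such a wrapper, assuming row-major storage, might look like the following. `FlatMatrix` is a hypothetical class written for this answer, not the API of any particular library.

```java
// Illustrative sketch; FlatMatrix is a hypothetical class, not a real library API.
final class FlatMatrix {
    private final int rows;
    private final int cols;
    private final double[] data; // row-major: element (r, c) lives at r * cols + c

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    int rows() { return rows; }

    int cols() { return cols; }

    double get(int r, int c) {
        return data[r * cols + c];
    }

    void set(int r, int c, double value) {
        data[r * cols + c] = value;
    }

    // Plain triple-loop product that works on the backing arrays directly.
    FlatMatrix times(FlatMatrix other) {
        if (cols != other.rows) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        FlatMatrix result = new FlatMatrix(rows, other.cols);
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < cols; k++) {
                double aik = data[i * cols + k];
                for (int j = 0; j < other.cols; j++) {
                    result.data[i * other.cols + j] += aik * other.data[k * other.cols + j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FlatMatrix a = new FlatMatrix(2, 2);
        a.set(0, 0, 1); a.set(0, 1, 2);
        a.set(1, 0, 3); a.set(1, 1, 4);
        FlatMatrix p = a.times(a);
        System.out.println(p.get(0, 0) + " " + p.get(1, 1)); // prints "7.0 22.0"
    }
}
```

The shape is fixed at construction, so the matrix is always rectangular and rows cannot alias each other. Whether the `get`/`set` calls cost anything measurable depends on whether the JIT inlines them, which is worth measuring rather than assuming.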

+1

C++ will certainly be faster. There may even be hand-optimized libraries for your purpose that contain assembler code for each of the major CPUs. You cannot do better than that.

Afterwards, you can call it from Java via JNI, if necessary.

Java is not intended for high-performance arithmetic like this. If you depend on it, I would recommend choosing a suitable low-level language for the implementation. Alternatively, you can write the performance-critical part in a low-level language and then connect it to a Java front end using JNI or some IPC mechanism.

+1

I second the advice that it is best to test it yourself, as performance will vary somewhat depending on what you are doing. I find it hard to believe Shane C. Mason's answer that Java performance will be the same as that of C++ or Fortran, since even C++ and Fortran are not really comparable for some computational algorithms.

I have a fluid dynamics code that I wrote in C++ and essentially the same code translated into Fortran. I'm not yet sure why, but the Fortran version is about twice as fast as the C++ version. I would guess that with features like bounds checking and garbage collection, Java will be slower than both, but I won't know until I test it.

0

This may depend on what you are doing in C ++ code.

For example, do you use a GPU? Edit: I forgot about JOGL, so Java can compete here.

Are you parallelizing using STM or shared memory? Then Java cannot compete. For a reference analysis of parallel matrix multiplication, see: http://www.cs.utexas.edu/users/plapack/papers/ipps98/ipps98.html

Do you have enough memory to do the calculations in memory, so that the garbage collector is not needed, and have you fine-tuned the garbage collector for optimal performance? Then maybe Java can be competitive.

Do you use multi-core processors and is C ++ optimized to use this architecture? Then Java will not be able to compete.

If you use several computers connected to each other, then Java will not be able to compete.

If you use any combination of these, then it will depend on the specific implementation.

Java is not designed to compete with a hand-tuned C++ program, but given the time that tuning takes, are you doing enough computation for it to matter? Java will give you reasonable speed for less work than hand tuning, but not much of an improvement over plain C++ code.

You might want to see whether Haskell or Erlang, for example, offer an improvement over your C++, as these languages are better designed for this type of work.

0

Are you interested in these computations? Fast Fourier transform, Jacobi successive over-relaxation, Monte Carlo integration, sparse matrix multiply, dense matrix LU factorization?

They make up the SciMark 2.0 composite test, which can be run as an applet on your computer.
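To give a feel for how small such kernels are, here is a sketch of a Monte Carlo integration that estimates pi from the fraction of random points falling inside the unit quarter circle. It was written for this answer and is not SciMark's own source; `MonteCarloPi`, `estimatePi` and the seed parameter are made-up names.

```java
import java.util.Random;

// Illustrative sketch; not taken from the SciMark sources.
final class MonteCarloPi {

    // Estimates pi as 4 * (points inside the quarter circle) / (total points).
    static double estimatePi(int samples, long seed) {
        Random rng = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000, 42L));
    }
}
```

The inner loop is short, branchy and allocation-free, which is typical of the kernels in this kind of benchmark.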

There are also ANSI C versions, and an Intel document (PDF) on optimizing and recompiling SciMark for C++.

Similarly, you can use the Java Grande Forum Benchmark Suite and its C comparison programs.

0

Java uses a just-in-time (JIT) compiler to convert bytecode to native machine code, so code will be slower the first time it runs, but once a segment is "warmed up", performance will be equivalent. In short, numerical performance is pretty good.
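One rough way to observe the warm-up is to time the same small kernel over several rounds and compare the early rounds with the later ones. The sketch below does this for a dot product; the names (`WarmupDemo`, `timeRounds`) are made up, and hand-rolled timing like this is only indicative. A benchmark harness such as JMH is the more reliable way to measure.

```java
// Illustrative sketch; WarmupDemo and its methods are hypothetical names.
final class WarmupDemo {

    // Written to a static field so the JIT cannot discard the computed results.
    static double checksum;

    // The kernel under test: a plain dot product.
    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Calls the kernel callsPerRound times in each round and records the elapsed nanoseconds.
    static long[] timeRounds(int rounds, int callsPerRound, double[] x, double[] y) {
        long[] nanos = new long[rounds];
        double sink = 0.0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerRound; c++) {
                sink += dot(x, y);
            }
            nanos[r] = System.nanoTime() - start;
        }
        checksum = sink;
        return nanos;
    }

    public static void main(String[] args) {
        double[] x = new double[1024];
        double[] y = new double[1024];
        for (int i = 0; i < x.length; i++) {
            x[i] = i;
            y[i] = 1.0 / (i + 1);
        }
        long[] nanos = timeRounds(10, 20_000, x, y);
        for (int r = 0; r < nanos.length; r++) {
            System.out.printf("round %d: %d us%n", r, nanos[r] / 1_000);
        }
        System.out.println("checksum: " + checksum);
    }
}
```

On a typical JIT-compiling JVM the first round or two would be expected to take noticeably longer than the rest, though the exact pattern depends on the JVM, its flags and the hardware.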

-4