C++ and Java performance

This question is simply speculative.

I have the following implementation in C++:

    #include <cstdio>
    #include <iostream>
    #include <string>
    #include <vector>

    using namespace std;

    void testvector(int x)
    {
        vector<string> v;
        char aux[20];
        int a = x * 2000;
        int z = a + 2000;
        string s("X-");
        for (int i = a; i < z; i++) {
            sprintf(aux, "%d", i);   // format the number as text
            v.push_back(s + aux);    // append "X-<number>" to the vector
        }
    }

    int main()
    {
        for (int i = 0; i < 10000; i++) {
            if (i % 1000 == 0) cout << i << endl;
            testvector(i);
        }
    }

On my box, this program runs in approximately 12 seconds. Surprisingly, I have a similar implementation in Java (using String and ArrayList), and it runs much faster than my C++ application (about 2 seconds).

I know that Java HotSpot performs many optimizations when translating to native code, but I think that if such performance is achievable in Java, it should also be achievable in C++.

So, what do you think should be changed in the program above, or in the libraries used, or in the memory allocator, to reach similar performance for this kind of workload? (Writing the actual code for these things could take very long, so discussing it would be great.)

Thanks.

+4
9 answers

You should be careful with performance tests, because it is very easy to fool yourself or to fail to compare like with like.

However, I have seen similar results comparing C# with C++, and there are a number of well-known blog posts about the astonishment of native coders when confronted with this kind of evidence. Basically, a good modern generational compacting GC is much better optimized for lots of small allocations.

In C++'s default allocator, every block is treated the same way, so blocks are fairly expensive to allocate and free. In a generational GC, all blocks are very, very cheap to allocate (nearly as cheap as stack allocation), and if they turn out to be short-lived, they are also very cheap to clean up.
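To illustrate why allocation in a GC nursery can be nearly as cheap as stack allocation, here is a minimal sketch of a bump allocator. The `BumpArena` class and its members are hypothetical names introduced for illustration; this is not how any particular JVM or C++ runtime is implemented, only the general idea of "allocate by advancing a pointer, free everything at once".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bump ("nursery"-style) arena, for illustration only:
// allocating is a pointer increment, and releasing all blocks is one reset.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns nullptr when the arena is full; a real generational GC would
    // run a minor collection at this point instead.
    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        // Round the current offset up to the next aligned position.
        const std::size_t start = (offset_ + align - 1) / align * align;
        if (start > buffer_.size() || size > buffer_.size() - start) {
            return nullptr;
        }
        offset_ = start + size;
        return buffer_.data() + start;
    }

    // "Frees" every block at once, which is why short-lived objects are cheap.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

A general-purpose `malloc`/`new` has to track and recycle each block individually, which is the extra cost the paragraph above refers to. The sketch ignores object destruction and survivors, which a real collector has to handle.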

This is why the "fast performance" of C++ compared with more modern languages is, for the most part, mythical. You have to hand-tune your C++ program before it can compete with the performance of an equivalent naively written C# or Java program.

+12

All your program does is print the numbers 0..9000 in steps of 1000. The calls to testvector() do nothing and can be eliminated. I suspect that your JVM notices this and essentially optimizes the whole function away.

You can achieve a similar effect in your C++ version by simply commenting out the testvector() call!

+6

Well, this is a pretty useless test that only measures the allocation of small objects. That said, a few simple changes let me cut the running time from about 15 seconds to about 4 seconds. The new version:

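    #include <cstdio>
    #include <iostream>
    #include <string>
    #include <vector>
    #include <boost/pool/pool_alloc.hpp>

    using namespace std;

    typedef vector<string, boost::pool_allocator<string> > str_vector;

    void testvector(int x, str_vector::iterator it, str_vector::iterator end)
    {
        char aux[25] = "X-";
        int a = x * 2000;
        for (; it != end; ++a) {
            sprintf(aux + 2, "%d", a);   // format the number right after the "X-" prefix
            *it++ = aux;                 // overwrite the existing string in place
        }
    }

    int main(int argc, char** argv)
    {
        str_vector v(2000);              // allocated once and reused on every call
        for (int i = 0; i < 10000; i++) {
            if (i % 1000 == 0) cout << i << endl;
            testvector(i, v.begin(), v.begin() + 2000);
        }
        return 0;
    }

    real 0m4.089s
    user 0m3.686s
    sys  0m0.000s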

The Java version takes:

    real 0m2.923s
    user 0m2.490s
    sys  0m0.063s

(This is my direct Java port of your original program, except that it passes the ArrayList as a parameter to cut down on useless allocations.)

So, to summarize, small allocations are faster in Java, and memory management is a bit more of a hassle in C++. But we already knew that :)

+5

HotSpot optimizes hot spots in code. As a rule, it tries to optimize anything that is executed 10,000 times.

For this code, after 5 iterations it will try to optimize the inner loop that adds the strings to the vector. The optimization will more than likely include escape analysis of the variables in the method. Since the vector is a local variable and never escapes the local context, it is very likely that HotSpot will remove all the code in the method and turn it into a no-op. To test this, try returning the results from the method. Even then, be careful to do something meaningful with the result: just getting its length, for example, can be optimized away, since HotSpot can see that the result is always the same as the number of iterations in the loop.
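The same precaution applies to the C++ side of the comparison. The following is a minimal sketch of a variant of the question's `testvector` that returns a value derived from the contents of the strings, so that the work is observable to the caller. The name `testvector_checked` and the particular checksum are assumptions made for illustration; any result that depends on the data, and not just on the element count, would serve.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Illustrative variant of testvector: the return value depends on the
// contents of every string, not only on how many strings were created.
unsigned long testvector_checked(int x) {
    std::vector<std::string> v;
    char aux[20];
    const int a = x * 2000;
    const int z = a + 2000;
    const std::string s("X-");
    for (int i = a; i < z; i++) {
        std::snprintf(aux, sizeof aux, "%d", i);
        v.push_back(s + aux);
    }
    // Fold each string's length and last character into a checksum.
    unsigned long sum = 0;
    for (const std::string& item : v) {
        sum += item.size() + static_cast<unsigned char>(item.back());
    }
    return sum;
}
```

In a benchmark, the caller would accumulate the returned values across all iterations and print the total at the end, so that neither a JIT nor an ahead-of-time optimizer can treat the calls as dead code.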

All of this points to the key advantage of a dynamic compiler such as HotSpot: by using runtime analysis, it can optimize what is actually being executed at runtime and get rid of redundant code. After all, it doesn't matter how efficient your custom C++ memory manager is, because not executing any code will always be faster.

+4

"On my box, this program runs in approximately 12 seconds. Surprisingly, I have a similar implementation in Java (using String and ArrayList), and it runs much faster than my C++ application (about 2 seconds)."

I cannot reproduce this result.

To account for the optimization mentioned by Alex, I've modified the code so that both the Java and the C++ versions print the last element of the vector v at the end of the testvector method.

Now the C++ code (compiled with -O3) is about as fast as yours (12 seconds). The Java code (a straightforward port that uses an ArrayList instead of Vector, although I doubt this affects performance, thanks to escape analysis) takes about twice that time.

I did not run many tests, so this result is by no means significant. It just shows how easy it is to get these tests wrong, and how little a single test can tell you about real performance.

For the record, the tests were run on the following configuration:

    $ uname -ms
    Darwin i386
    $ java -version
    java version "1.6.0_15"
    Java(TM) SE Runtime Environment (build 1.6.0_15-b03-226)
    Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-92, mixed mode)
    $ g++ --version
    i686-apple-darwin9-g++-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5490)

+3

It should help if you use vector::reserve to reserve space in v for the elements the loop will add (z - a, that is 2000) before the loop. However, the same thing should also speed up the Java equivalent of this code.
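A minimal sketch of that suggestion, assuming the loop body from the question. The function name `build_reserved` and its `count` parameter are illustrative, and the vector is returned only so the result can be inspected. Note that the loop adds z - a = 2000 elements, so that is the capacity worth reserving.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Illustrative helper: builds the strings "X-<n>" for n in
// [x * count, x * count + count), reserving capacity up front.
std::vector<std::string> build_reserved(int x, int count) {
    std::vector<std::string> v;
    // reserve() only sets aside capacity; size() stays 0, so push_back
    // still appends from the start and no reallocation happens in the loop.
    v.reserve(static_cast<std::vector<std::string>::size_type>(count));

    char aux[20];
    const int a = x * count;
    const int z = a + count;
    const std::string s("X-");
    for (int i = a; i < z; i++) {
        std::snprintf(aux, sizeof aux, "%d", i);
        v.push_back(s + aux);
    }
    return v;
}
```

This removes the vector's own regrowth copies, but each string still performs its own allocation (unless the implementation stores short strings inline), so the gain on this benchmark may be modest.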

+1

To understand why the performance of the C++ and Java versions differs, it would be important to see the source of both. I can see a number of performance issues in the C++ code, and for some of them it would be useful to know whether you do the same thing in Java. For example, std::endl flushes the output stream: in the Java version, do you call System.out.flush(), or do you just append '\n'? If the latter, you have given Java a distinct advantage.
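The flushing difference between `std::endl` and `'\n'` can be made visible with a small sketch. `CountingBuf` and `count_flushes` are hypothetical helpers introduced here: the buffer overrides `sync()`, which is what a stream flush ends up calling, and counts the calls.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts how often it is flushed.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;

protected:
    // Called whenever the owning stream is flushed.
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` numbered lines, ending each with std::endl or '\n',
// and returns how many flushes the buffer observed.
int count_flushes(bool use_endl, int lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; i++) {
        if (use_endl) {
            out << i << std::endl;   // newline plus a flush
        } else {
            out << i << '\n';        // newline only
        }
    }
    return buf.flushes;
}
```

In the question's program only ten lines are printed in total, so this particular difference is unlikely to explain a gap of several seconds, but it matters in output-heavy loops.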

+1

What are you really trying to measure here? Putting ints in a vector?

You could start by pre-allocating space in the vector, since you know its final size:

Instead of:

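    void testvector(int x)
    {
        vector<string> v;
        char aux[20];
        int a = x * 2000;
        int z = a + 2000;
        string s("X-");
        for (int i = a; i < z; i++) {
            sprintf(aux, "%d", i);
            v.push_back(s + aux);   // push_back(i) would not compile for vector<string>
        }
    }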

try:

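    void testvector(int x)
    {
        char aux[20];
        int a = x * 2000;
        int z = a + 2000;
        string s("X-");
        vector<string> v;
        v.reserve(z - a);           // capacity only; vector<string> v(z) would create z empty strings first
        for (int i = a; i < z; i++) {
            sprintf(aux, "%d", i);
            v.push_back(s + aux);
        }
    }

0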

In your inner loop, you are pushing ints into a string vector. If you single-step through that at the machine-code level, I bet you will find that a lot of the time goes into allocating and formatting the strings, and then some time goes into the push_back (not to mention the deallocation when the vector is released).

This could easily vary from one runtime library implementation to another, depending on the developer's sense of what people would reasonably want to do.

0
