Why is my C++ application faster than my C application (using the same library) on a Core i7?

I have a library written in C, and two applications that use it: one written in C++ and one in C. It is a communication library, so one of the API calls looks like this:

int source_send( source_t* source, const char* data ); 

In the C application, the code does something like this:

    source_t* source = source_create();
    for( int i = 0; i < count; ++i )
        source_send( source, "test" );

Whereas the C++ application does this:

    struct Source
    {
        Source() { _source = source_create(); }
        bool send( const std::string& data ) { source_send( _source, data.c_str() ); }
        source_t* _source;
    };

    int main()
    {
        Source* source = new Source();
        for( int i = 0; i < count; ++i )
            source->send( "test" );
    }

On an Intel Core i7, the C++ code produces almost exactly 50% more messages per second, whereas on an Intel Core 2 Duo both versions produce almost exactly the same number of messages per second. (The Core i7 has 4 cores with two hardware threads each.)

I'm curious what kind of magic the hardware performs to pull this off. I have some theories, but I thought I would try to get a real answer :)

Edit: More info from comments

The compiler is Visual C++, so this is on Windows (for both applications).

The implementation of the communication library creates a new thread to send messages on. source_create is what creates this thread.

+6
c++ c cpu cpu-architecture hardware
5 answers

Looking at the source code alone, I see no reason why the C++ code should be faster.

The next thing I would do is check the assembly code that is being generated. If you are using the GNU toolchain, there are a couple of ways to do this.

You can ask gcc and g++ to output assembly code with the -S command-line argument. Make sure that, other than adding this argument, you use the same command-line arguments as for a regular compile.

The second option is to load your program in gdb and use the disas command.

Good luck.

Update

You can do the same with the Microsoft Toolchain.

To get the compiler to output the assembly, you can use /FA or /FAs. The first should output the assembly only, while the second mixes the assembly with the source (which should make it easier to follow).

As for using the debugger: once you have started the debugger in Visual Studio, go to "Debug | Windows | Disassembly" (tested in Visual Studio 2005; other versions may vary).

+7

Without seeing the complete code or the assembly, my guess is that the C++ compiler is inlining for you. One of the beauties of C++ compilers is the ability to inline almost anything for speed, and Microsoft's compilers are well known for inlining aggressively, almost to the point of unreasonably bloating the final executables.

+2

The first thing I would recommend doing is to profile both versions and see if there are any noticeable differences.

Is it possible that the C version is copying something unnecessarily? (The difference could come from a subtle, or not so subtle, optimization such as return value optimization.)

This should show up in a good profiler. If you have a higher-end VS SKU, the sample-based profiler there is good. If you are looking for a good free profiler, the Windows Performance Analyzer is incredibly powerful on Vista and later, and there is a step-by-step guide on using its stack-walking option.
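Before reaching for a full profiler, a rough throughput number for each call path can confirm that the difference is reproducible. The following is only a minimal sketch: fake_send is a hypothetical stand-in for the library's source_send, and calls_per_second is a helper name invented here, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the library's source_send(); it only counts calls.
static std::size_t g_sent = 0;
int fake_send(const char* data) {
    g_sent += (data != nullptr) ? 1 : 0;
    return 0;
}

// Call `fn` `count` times and return the observed calls per second.
// Returns 0.0 if the clock did not advance during the loop.
template <typename Fn>
double calls_per_second(Fn fn, int count) {
    assert(count > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < count; ++i) {
        fn();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() > 0.0 ? count / elapsed.count() : 0.0;
}
```

Calling something like calls_per_second([] { fake_send("test"); }, 1000000) once per call path, with the real send function substituted, gives two numbers that can be compared directly. Because the library sends on its own thread, a loop like this measures only how fast messages are handed over, not how fast they are delivered.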

The first thing I would probably do myself is break into the debugger and inspect the disassembly of each version to see whether they differ. Note that there is a compiler option that writes the asm out to a text file.

I would follow this up with the profiler if there was nothing obvious (like an extra copy).

One more thing: if you are worried that hyperthreads are getting in the way, hard-affinitize the process to a non-HT core. You can do this either through Task Manager in the GUI, or through SetThreadAffinityMask.
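As a rough illustration of the SetThreadAffinityMask route, the sketch below pins the calling thread to a single logical processor. The helper names single_cpu_mask and pin_current_thread are invented for this example, and which logical processors share a physical core is an assumption you would need to check on your own machine.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Build an affinity mask with only the bit for `logical_cpu` set.
// On 32-bit Windows the mask type is 32 bits wide, so stay below 32 there.
std::uint64_t single_cpu_mask(unsigned logical_cpu) {
    assert(logical_cpu < 64);
    return std::uint64_t{1} << logical_cpu;
}

// Pin the calling thread to one logical processor.
// Returns true on success; on non-Windows builds it does nothing and returns false.
bool pin_current_thread(unsigned logical_cpu) {
#ifdef _WIN32
    const DWORD_PTR mask = static_cast<DWORD_PTR>(single_cpu_mask(logical_cpu));
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    (void)logical_cpu;
    return false;
#endif
}
```

Since the library in the question sends from a thread of its own, pinning only the calling thread leaves that sender thread free to run elsewhere. Setting the affinity for the whole process (Task Manager, or SetProcessAffinityMask) covers both threads.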

-Rick

+1

Core i7s are hyperthreaded. Do you have HT enabled?

Maybe the C++ code is somehow compiled in a way that takes advantage of HT, while the C code is not. What does Task Manager look like when you run the code? Is the load spread evenly across the cores, or are a few cores maxed out?

0

Just a wild hunch: if you compile the library source together with your application, and the C API functions are not declared extern "C", then maybe the C++ version uses a different and somehow faster calling convention?

Also, if you compile the library source together with your application, then perhaps the C++ compiler compiles your library source as C++ and somehow optimizes it better than your C compiler does?
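For reference, the usual way to give a C API the same linkage in both builds is an extern "C" guard around its declarations. The sketch below assumes a header layout that the question does not show. The struct body and the two function definitions are stand-ins added only so the example compiles; the real library's implementation is unknown.

```cpp
#include <cassert>
#include <cstring>

// Declarations as they might appear in the library header (assumed layout).
#ifdef __cplusplus
extern "C" {
#endif

typedef struct source source_t;
source_t* source_create(void);
int source_send(source_t* source, const char* data);

#ifdef __cplusplus
}
#endif

// Stand-in definitions, not the real library: count calls and
// return the length of the message that was handed over.
struct source { int sent; };

extern "C" source_t* source_create(void) {
    static struct source instance = {0};
    return &instance;
}

extern "C" int source_send(source_t* source, const char* data) {
    assert(source != nullptr && data != nullptr);
    source->sent += 1;
    return static_cast<int>(std::strlen(data));
}
```

With the guard in place, the C and C++ applications call the same unmangled symbols. Whether that has any effect on speed is a separate question that only the generated assembly can answer.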

0
