I am writing a C++ program to simulate a specific system. For each time step, most of the execution time is spent in a single loop. Fortunately, this loop is embarrassingly parallel, so I decided to use Boost Threads to parallelize it (I am working on a dual-core machine). I expected a speedup close to 2x over the serial version, since there is no locking. However, I find that there is no speedup at all.
I implemented a parallel version of the loop as follows:
I used this approach because it should provide good load balancing (each calculation may take a different amount of time). I am very curious about what could be causing this lack of speedup. I have always read that atomic variables are fast, but now I'm starting to wonder whether they carry their own performance costs.
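A minimal sketch of this kind of scheme, assuming the approach is that each thread claims the next loop index from a shared atomic counter so that faster threads simply take more work. It uses `std::thread` and `std::atomic` from the standard library rather than Boost Threads, and `parallel_for_dynamic` and `simulate_step` are hypothetical names chosen for illustration, not the code from the question.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original post): runs body(i) for every
// i in [0, n). Each worker thread claims the next unprocessed index from a
// shared atomic counter, which gives dynamic load balancing.
void parallel_for_dynamic(std::size_t n, unsigned num_threads,
                          const std::function<void(std::size_t)>& body) {
    assert(num_threads > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (;;) {
            // fetch_add returns the previous value, so every index is
            // handed out to exactly one thread.
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= n) break;
            body(i);
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();  // join also makes the results visible
}

// Hypothetical stand-in for one simulation step over n independent elements.
std::vector<double> simulate_step(std::size_t n, unsigned num_threads) {
    std::vector<double> out(n, 0.0);
    parallel_for_dynamic(n, num_threads, [&](std::size_t i) {
        out[i] = static_cast<double>(i) * 2.0;  // placeholder for the real work
    });
    return out;
}
```

Calling `simulate_step(1000, 2)` would run the loop on two threads. Note that in a scheme like this every claimed index costs one atomic read-modify-write on a memory location shared by all threads, so if the work per index is very small, claiming a block of indices at a time is one way to reduce that overhead.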
If anyone has ideas on what to look for, or any hints, I would appreciate it. I have been banging my head against this for a week, and profiling has not revealed much.
[Update: text lost in extraction. Surviving fragments: gprof, (-O3), vtable, "voila", "2!"]