I have a function whose evaluation is somewhat slow. I am trying to speed it up using threads, since it consists of three parts that can be computed in parallel. The single-threaded version is:
return dEdx_short(E) + dEdx_long(E) + dEdx_quantum(E);
where evaluating these functions takes ~250us, ~250us, and ~100us, respectively. I therefore implemented a three-threaded solution:
double ret_short, ret_long, ret_quantum; // return values for the terms

auto shortF = [this, &E, &ret_short]() { ret_short = this->dEdx_short(E); };
std::thread t1(shortF);
auto longF = [this, &E, &ret_long]() { ret_long = this->dEdx_long(E); };
std::thread t2(longF);
auto quantumF = [this, &E, &ret_quantum]() { ret_quantum = this->dEdx_quantum(E); };
std::thread t3(quantumF);

t1.join();
t2.join();
t3.join();

return ret_short + ret_long + ret_quantum;
I expected this to take ~300us, but it actually takes ~600us, basically the same as the single-threaded version! All three functions are essentially thread-safe, so no locks should be needed. I measured the thread creation time on my system and it is ~25us. I am not using all of my cores, so I'm a little puzzled as to why the parallel solution is so slow. Does this have anything to do with creating the lambdas?
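For reference, a thread creation figure like the one quoted above could be obtained with something along the following lines. This is only a sketch: the helper name `mean_thread_overhead_us` is hypothetical and not from the original code, and it measures creation plus join of a thread that does no work, which may differ from how the ~25us value was actually measured.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not part of the original code): average cost, in
// microseconds, of creating and joining one thread that does no work.
double mean_thread_overhead_us(int repetitions) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repetitions; ++i) {
        std::thread t([] {});  // empty body, so only the overhead is timed
        t.join();
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / repetitions;
}
```

Calling `mean_thread_overhead_us(1000)` from `main` and printing the result gives a rough per-thread figure; the number will vary with the platform and system load.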
I tried to avoid the lambdas, for example:
std::thread t1(&StopPow_BPS::dEdx_short, this, E, ret_short);
after rewriting the called function, but that gave me the error "attempt to use a deleted function ...".
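One possible cause of that error, assuming the rewritten function takes its result as a non-const reference parameter: `std::thread` copies its arguments, so a reference argument has to be wrapped in `std::ref`. The sketch below uses a hypothetical stand-in class `Model` and a made-up function body, since the real `StopPow_BPS::dEdx_short` signature is not shown.

```cpp
#include <functional>  // std::ref
#include <thread>

// Hypothetical stand-in for StopPow_BPS; the real class is not shown.
struct Model {
    // Assumed rewritten signature: the result is written through a reference.
    // The body is a placeholder, not the real physics.
    void dEdx_short(double E, double& out) const { out = 2.0 * E; }
};

// Runs the member function on a worker thread and returns the value it wrote.
double run_short_on_thread(const Model& m, double E) {
    double ret_short = 0.0;
    // std::thread copies its arguments by default; std::ref makes it pass
    // ret_short by reference so the worker writes into the caller's variable.
    std::thread t1(&Model::dEdx_short, &m, E, std::ref(ret_short));
    t1.join();
    return ret_short;
}
```

Without the `std::ref` wrapper, the call does not compile, and some compilers report this as an attempt to use a deleted function.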
c++ multithreading lambda c++11
Alex z