Thread Performance in C ++ 11

Question

Thread Performance in C ++ 11

I'm interested in mutex performance and messaging in the latest gcc with threads based on pthreads and the Ubuntu development environment. A good common problem for this is lunch philosophers, in which every philosopher uses lh and rh fork, shared with a left and right hand neighbor. I am increasing the number of philosophers to 99 to support the operation of a quad-core processor.

int result = try_lock(forks[lhf], forks[rhf]);

the above code allows my philosopher to try and grab the two forks they need to eat.

  // if the forks are locked then start eating if (result == -1) { state[j] = philosophers::State::Eating; eating[j]++; if (longestWait < waiting[j]) { longestWait = waiting[j]; } waiting[j] = 0; } else { state[j] = philosophers::State::Thinking; thinking[j]++; waiting[j]++; }

the above code tracks how my philosophers progress in food or thinking, depending on whether they can reserve two forks.

  { testEnd te(eating[j]+thinking[j]-1); unique_lock<mutex> lk(cycleDone); endCycle.wait(lk, te); }

the above code expects all philosophers to complete the selection after this time, when the philosopher can make a new attempt:

  if ( philosophers::State::Eating == state[j] ) { state[j] = philosophers::State::Thinking; forks[lhf].unlock(); forks[rhf].unlock(); }

I have a main stream that controls philosophers and moves them from one cycle to the next, which allows them to eat and think as much as possible for about 10 seconds. The result is about 9,540 cycles with some starving philosophers, while others have a lot of food and a lot of time to think! Therefore, I need to protect my philosophers from hunger and wait too long, so I add more logic to prevent overeating, requiring that the philosopher-spruce break free and think, and not grab the same forks after a very short break:

  // protect the philosopher against starvation if (State::Thinking == previous) { result = try_lock(forks[lhf], forks[rhf]); }

Now I have 9598 cycles, each philosopher receives a relatively equal share of food (2620 - 2681) and thinks with the longest expectation 14. Not bad. But I'm not satisfied, so now I get rid of all the mutexes and locks and keep them simple when the four philosophers eat smooth cycles, and strange philosophers eat odd cycles. I use a simple philosopher synchronization method

 while (counter < counters[j]) { this_thread::yield(); }

Prevents the philosopher from using or thinking too many times using a global cycle counter. The same time period and philosophers manage about 73,543 cycles with 36,400 units. And no more than 3 waiting cycles. Thus, my simple lock-free algorithm is faster and has a better distribution of processing between different threads.

Can anyone think of a better way to solve this problem? I am afraid that when I implement a complex system with multiple threads, if I follow the traditional methods of mutex and message passing, I end up with slower than necessary and possible unbalanced processing of various threads in my system.

+8

c ++ concurrency c ++ 11

Pete Jun 09 '13 at 14:09

source share

1 answer

hazydev · Accepted Answer · 2013-06-11T18:09:16+0000

This is an interesting way to study threading problems in C ++.

To indicate specific points:

I am afraid that when I implement a complex system with multiple threads, if I follow the traditional methods of mutex and message passing, I will be slower than necessary, and possible unbalanced processing of various threads in my system.

Unfortunately, the best answer I can give you is that this is a well-founded fear. The cost of planning and synchronization is very application-specific, although it becomes a technical decision when designing a large system. First of all, planning is NP-Hard ( http://en.wikipedia.org/wiki/Multiprocessor_scheduling ), but has good approximations.

As for your specific example, I find it difficult to draw general conclusions based on the results that you present - there is one main starting point: trading between synchronization of coarse grains and synchronization of small grains. This is a well-researched issue, and some research may be helpful (e.g. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=744377&tag=1 ).

In general, this concerns a technical problem that will be specific to the problem you want to solve, the operating system and hardware.

Thread Performance in C ++ 11

More articles: