Problems related to threading and debugging

Question

This is a follow-up to my previous post on memory management issues. Listed below are the threading issues that I know of.

1) data races (atomicity violations and data corruption)

2) ordering problems

3) misuse of locks leading to deadlocks

4) heisenbugs

Are there any other issues with multiple threads? How can they be solved?

+6

c++ c multithreading operating-system

brett Aug 18 '10 at 18:30

7 answers

Unfortunately, there is no magic pill that automatically solves most or all threading problems. Even unit tests, which work so well on single-threaded pieces of code, may never detect an extremely subtle race condition.

One thing that will help is keeping the data that threads share encapsulated in objects. The smaller the object's interface and scope, the easier it is to detect errors in review (and possibly in testing, but race conditions can be a pain to detect in test cases). If the interface is simple and safe to use, clients that use it will also be correct by default. By building a larger system out of many small pieces (only a handful of which actually deal with threads), you can go a long way towards preventing threading errors.
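As a rough illustration of the encapsulation idea above, here is a minimal sketch in C with POSIX threads. The names `shared_counter`, `counter_add`, `counter_get` and `run_counter_demo` are made up for this example. The mutex is only ever touched inside three small functions, so a reviewer has very little code to check.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical example type: all shared state sits behind three functions. */
typedef struct {
    pthread_mutex_t lock;
    long value;
} shared_counter;

void counter_init(shared_counter *c) {
    pthread_mutex_init(&c->lock, NULL);
    c->value = 0;
}

void counter_add(shared_counter *c, long n) {
    pthread_mutex_lock(&c->lock);
    c->value += n;
    pthread_mutex_unlock(&c->lock);
}

long counter_get(shared_counter *c) {
    pthread_mutex_lock(&c->lock);
    long v = c->value;
    pthread_mutex_unlock(&c->lock);
    return v;
}

/* Demo driver: callers never see the mutex, only the functions above. */
typedef struct { shared_counter *c; int iters; } worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    for (int i = 0; i < a->iters; i++)
        counter_add(a->c, 1);
    return NULL;
}

/* Starts nthreads workers that each add 1 iters times; returns the total. */
long run_counter_demo(int nthreads, int iters) {
    enum { MAX_THREADS = 16 };
    pthread_t tid[MAX_THREADS];
    shared_counter c;
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter_init(&c);
    worker_arg arg = { &c, iters };
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &arg);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    long total = counter_get(&c);
    pthread_mutex_destroy(&c.lock);
    return total;
}
```

Calling `run_counter_demo(4, 10000)` should always give 40000. Without the lock inside `counter_add`, the result would usually come out lower and would vary from run to run.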

+2

Mark b Aug 18 '10 at 18:58

The four most common design problems are:

1-Deadlock
2-Livelock
3-Race conditions
4-Starvation

+1

Eric Aug 18 '10 at 18:36

How to solve [problems with multiple threads]?

A good way to "debug" MT applications is logging. A good logging library with rich filtering options makes this easier. Of course, logging itself affects timing, so you can still have heisenbugs, but it is much less likely than when you break into the debugger.

Prepare and plan for that. Build a good logging facility into your application right from the start.

+1

sbi Aug 18 '10 at 18:39

Make your threads as simple as possible.

Try not to use global variables. Global constants (actual constants that never change) are fine. When you do need global or shared variables, you have to protect them with some kind of mutex / lock (semaphore, monitor, ...).

Make sure you really understand how your mutex works. There are several different implementations that can work in different ways.

Try to organize your code so that critical sections (the places where you hold a lock of some kind) are as short as possible. Keep in mind that some functions may block (sleep or wait on something, keeping the OS from letting this thread continue for some time). Do not call them while holding any locks (unless it is absolutely necessary, or during debugging, as doing so can sometimes expose other bugs).

Try to understand what more threads actually do for you. Blindly throwing more threads at a problem very often makes things worse. Different threads compete for the CPU and for locks.

Avoiding deadlock requires planning. Try not to hold more than one lock at a time. If this is unavoidable, decide on the order in which you will acquire and release locks, and use that order in all threads. Make sure you know what deadlock really means.
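The fixed-order rule can be sketched like this with POSIX threads. Everything here (`account`, `lock_both`, `transfer`, `run_transfer_demo`) is invented for illustration, and ordering by address is just one possible convention. Any order works as long as every thread uses the same one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example type: a balance protected by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account;

/* Take both locks in one global order (here: lowest address first). */
static void lock_both(account *a, account *b) {
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        account *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void unlock_both(account *a, account *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

void transfer(account *from, account *to, long amount) {
    lock_both(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_both(from, to);
}

typedef struct { account *from, *to; int rounds; } transfer_job;

static void *transfer_worker(void *p) {
    transfer_job *j = p;
    for (int i = 0; i < j->rounds; i++)
        transfer(j->from, j->to, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns the combined balance. */
long run_transfer_demo(int rounds) {
    account x, y;
    pthread_mutex_init(&x.lock, NULL);
    pthread_mutex_init(&y.lock, NULL);
    x.balance = 1000;
    y.balance = 1000;
    transfer_job xy = { &x, &y, rounds }, yx = { &y, &x, rounds };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, transfer_worker, &xy);
    pthread_create(&t2, NULL, transfer_worker, &yx);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return x.balance + y.balance;
}
```

If `transfer` simply locked `from` and then `to`, the two threads in the demo would take the locks in opposite orders and could block each other forever.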

Debugging multithreaded or distributed applications is difficult. If you can do most of the debugging in a single-threaded environment (perhaps even just by making the other threads sleep), you can try to clear out the bugs that have nothing to do with threading before moving on to multi-threaded debugging.

Always think about what other threads can do. Comment this in your code. If you are doing something in a certain way because you know that at that time no other thread should access a specific resource, write a big comment saying this.

You might want to wrap mutex lock / unlock calls in other functions, such as:

int my_lock_get(lock_type lock, const char *file, unsigned line, const char *msg) {
    thread_id_type me = this_thread();
    int rc;

    /* Log before and after acquiring, so a thread stuck waiting shows up in the log. */
    logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, file, line, "get", msg);
    rc = lock_get(lock);
    logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, file, line, "in", msg);
    return rc;
}

And a similar version for unlocking. Please note: all the functions and types used here are made up and not based on any particular API.

Using something like this, you can go back if there is an error and use a Perl script or something similar to run queries on your logs and find out where things went wrong (for example, by matching up locks and unlocks).

Please note that your print or logging function may need locking as well. Many libraries already have this built in, but not all. Those locks must not use the logging versions of lock_[get|release], or you will get infinite recursion.
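For a concrete version of the wrappers described above, here is a sketch on top of POSIX mutexes. The names `logged_lock_op`, `LOGGED_LOCK` and `LOGGED_UNLOCK` are hypothetical. It assumes that `pthread_t` can be cast to an integer for printing (true on Linux, not guaranteed by POSIX) and it uses one-second timestamps from `time()`, which is too coarse for serious analysis.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Logs "get"/"in" around a lock and "release"/"out" around an unlock. */
static int logged_lock_op(pthread_mutex_t *m, int acquire,
                          const char *file, unsigned line, const char *msg) {
    /* Assumption: pthread_t converts to an integer, as it does on Linux. */
    unsigned long me = (unsigned long)pthread_self();
    int rc;

    assert(m != NULL);
    fprintf(stderr, "%ld\t%lu\t%s:%u\t%s\t%s\n", (long)time(NULL), me,
            file, line, acquire ? "get" : "release", msg);
    rc = acquire ? pthread_mutex_lock(m) : pthread_mutex_unlock(m);
    fprintf(stderr, "%ld\t%lu\t%s:%u\t%s\t%s\n", (long)time(NULL), me,
            file, line, acquire ? "in" : "out", msg);
    return rc;
}

/* The macros fill in the call site, so callers only pass the mutex and a note. */
#define LOGGED_LOCK(m, msg)   logged_lock_op((m), 1, __FILE__, __LINE__, (msg))
#define LOGGED_UNLOCK(m, msg) logged_lock_op((m), 0, __FILE__, __LINE__, (msg))
```

A call looks like `LOGGED_LOCK(&queue_mutex, "push")`, where `queue_mutex` is whatever mutex you are protecting. POSIX stdio functions such as `fprintf` lock the stream internally, so this sketch does not need a separate lock around the log calls.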

+1

nategoose Aug 18 '10 at 19:27

Beware of global variables, even if they are const, particularly in C++. Only PODs that are statically initialized à la C are safe here. As soon as a run-time constructor comes into play, be extremely careful. AFAIR, the initialization order of static variables that live in different compilation units is not defined. C++ classes that initialize all their members properly and have an empty constructor body may be okay nowadays, but I once had a bad experience with that, too.

This is one of the reasons why, on the POSIX side, pthread_mutex_t is much easier to program with than sem_t: it has a static initializer, PTHREAD_MUTEX_INITIALIZER.

Keep critical sections as short as possible, for two reasons: it may be more efficient in the end, but more importantly, it is easier to maintain and debug.

A critical section should never be longer than one screen, including the lock and unlock that protect it and the comments and assertions that help the reader understand what is happening.

Start out implementing critical sections very rigidly, maybe with one global lock for all of them, and relax the constraints afterwards.

Logging can be difficult if many threads start writing at the same time. If each thread does a reasonable amount of work, have each one write to its own file so that they do not block each other.

But be careful: logging changes the behavior of the code. This can be bad when bugs disappear, or helpful when it brings out bugs that you would not have noticed otherwise.

To make a post-mortem analysis of such a mess, you need accurate timestamps on each line so that all the files can be merged and give you a coherent view of the execution.
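The remark above about PTHREAD_MUTEX_INITIALIZER can be illustrated with a small sketch. The names `hits_lock`, `hits` and `record_hit` are invented for the example. The mutex is ready before any code runs, so there is no initialization-order question. A `sem_t` in the same position would need someone to call `sem_init` first.

```c
#include <assert.h>
#include <pthread.h>

/* Statically initialized: usable from the very first call, no setup function. */
static pthread_mutex_t hits_lock = PTHREAD_MUTEX_INITIALIZER;
static long hits;

/* Increments the shared counter and returns its new value. */
long record_hit(void) {
    int rc = pthread_mutex_lock(&hits_lock);
    assert(rc == 0);
    long n = ++hits;
    rc = pthread_mutex_unlock(&hits_lock);
    assert(rc == 0);
    return n;
}
```

The critical section here is three lines long, which also fits the "shorter than one screen" advice.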

+1

Jens gustedt Aug 18 '10 at 21:00

-> Add priority inversion to this list.

As another poster alluded to, log files are wonderful things. For deadlocks, using a LogLock instead of a Lock can help pinpoint where your threads stop making progress. That is, once you know you have a deadlock, the log will tell you when and where locks were acquired and released. This can be enormously helpful in tracking these things down.

I have found that race conditions seem to fade away when you use the Actor model and consistently follow the same message->confirm->confirm style. However, YMMV.

+1

wheaties Aug 18 '10 at 21:22

Dave dunn · Accepted Answer · 2010-08-18T22:52:45+0000

Eric's list of four problems is pretty much spot on. But debugging these problems is tough.

For deadlock, I have always favored "leveled locks." In essence, you give each type of lock a level number, and then require that each thread acquire its locks in monotonically increasing order of level.

To implement leveled locks, you can declare a structure like this:

typedef struct my_lock_struct {
    os_mutex actual_lock;
    int level;
    struct my_lock_struct *prev_lock_in_thread;
} my_lock_struct;

/* __tls stands for your compiler's thread-local storage qualifier. */
static __tls my_lock_struct *last_lock_in_thread;

void my_lock_acquire(int level, my_lock_struct *lock) {
    /* Locks must be taken in strictly increasing level order. */
    if (last_lock_in_thread != NULL)
        assert(last_lock_in_thread->level < level);
    os_lock_acquire(lock->actual_lock);
    lock->level = level;
    lock->prev_lock_in_thread = last_lock_in_thread;
    last_lock_in_thread = lock;
}

What's cool about leveled locks is that the mere possibility of a deadlock triggers an assertion. And with some extra magic with __FUNC__ and __LINE__, you know exactly what your thread did wrong.
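For anyone who wants to try the leveled-lock idea with real primitives, here is a sketch on top of POSIX mutexes and C11 `_Thread_local`. The names (`leveled_lock`, `leveled_acquire`, `leveled_order_ok`, and so on) are made up. Unlike the answer's version, the level is fixed when the lock is initialized, and the sketch adds a release function that assumes locks are released in the reverse order of acquisition.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct leveled_lock {
    pthread_mutex_t mutex;
    int level;
    struct leveled_lock *prev;   /* lock this thread held before this one */
} leveled_lock;

/* Most recently acquired lock of the calling thread (one copy per thread). */
static _Thread_local leveled_lock *innermost;

void leveled_init(leveled_lock *l, int level) {
    pthread_mutex_init(&l->mutex, NULL);
    l->level = level;
    l->prev = NULL;
}

/* Returns 1 if the calling thread may take l without breaking level order. */
int leveled_order_ok(const leveled_lock *l) {
    return innermost == NULL || innermost->level < l->level;
}

void leveled_acquire(leveled_lock *l) {
    assert(leveled_order_ok(l));   /* fires on any potential deadlock */
    pthread_mutex_lock(&l->mutex);
    l->prev = innermost;
    innermost = l;
}

void leveled_release(leveled_lock *l) {
    assert(innermost == l);        /* this sketch assumes LIFO release */
    innermost = l->prev;
    l->prev = NULL;
    pthread_mutex_unlock(&l->mutex);
}
```

The order check happens before the thread blocks on the mutex, so a wrong acquisition order is reported even on runs where the timing would not actually have produced a deadlock.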

For data races and missing synchronization, the current situation is pretty poor. There are static tools that try to identify problems, but their false-positive rates are high.

The company I work for ( http://www.corensic.com ) has a new product called Jinx that actively looks for cases where race conditions can be exposed. It does this by using virtualization technology to control the interleaving of threads on the different CPUs and to zoom in on communication between CPUs.

Check it out. You probably have a few more days to download the beta for free.

Jinx is especially good at finding bugs in lock-free data structures. It is also very good at finding other race conditions. What's great is that there are no false positives. If your code, while under test, gets close to a race condition, Jinx helps the code go down the bad path. But if the bad path does not exist, you will not be given false warnings.
