Heisenbug Search Strategies

I am once again in a situation where I need to find the cause of a bug that almost never occurs when the debugger is attached (maybe some race condition). The only things I can think of are:

  • Add debug prints and assertions to the code that tell me what happens when no debugger is attached.
  • Go through the code and think through each line and possible side effects that may occur.

All in all, this is very frustrating. What are your strategies and experiences with such errors?

Edit: I am using Visual C++ 2005, but I think this question applies to many (all) languages and development environments.

+6
language-agnostic debugging heisenbug
3 answers

If you can identify the point(s) at which the problem first becomes visible (not where it is caused, obviously), throw an exception there and use Process Dumper to get a dump for post-mortem debugging.
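As a rough illustration of "throw where the problem first becomes visible", here is a minimal sketch. The names `InvariantViolation`, `CHECK_INVARIANT`, `Queue` and `queue_size` are hypothetical and exist only for this example; how the dump tool is set up to catch the exception is not shown.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type, thrown where bad state is first *observed*
// (which is usually not where it was caused).
class InvariantViolation : public std::logic_error {
public:
    explicit InvariantViolation(const std::string& what) : std::logic_error(what) {}
};

inline void check_invariant(bool ok, const char* expr, const char* file, int line) {
    if (ok) return;
    std::ostringstream msg;
    msg << file << ":" << line << ": invariant failed: " << expr;
    throw InvariantViolation(msg.str());
}

// Records the failing expression and its location in the exception message.
#define CHECK_INVARIANT(expr) check_invariant((expr), #expr, __FILE__, __LINE__)

// Hypothetical shared structure, used only for illustration.
struct Queue {
    int head;
    int tail;
    int capacity;
};

// Validate the state right before it is used, so a corrupted index is
// reported here instead of causing damage somewhere further away.
inline int queue_size(const Queue& q) {
    CHECK_INVARIANT(q.capacity > 0);
    CHECK_INVARIANT(q.head >= 0 && q.head < q.capacity);
    CHECK_INVARIANT(q.tail >= 0 && q.tail < q.capacity);
    return (q.tail - q.head + q.capacity) % q.capacity;
}
```

Unlike `assert`, a check like this stays active in Release builds, which matters when the bug only shows up there.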

Run Release executables outside the IDE and attach the debugger afterwards. This avoids the special heap and other flags that are enabled when a program is started under the debugger.

If you have ANY idea of where the error is, extract that code into a minimal test application that stresses it as hard as possible; you are trying to test it to destruction. Attach the debugger only after the code is up and running, to avoid many of the debugger's side effects.
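A stress harness of this kind could look roughly like the sketch below. It assumes a C++11 compiler with `std::thread` (newer than Visual C++ 2005, where the native Windows thread API would be needed instead), and `RacyCounter`, `LockedCounter` and `hammer` are made-up names standing in for the extracted suspect code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Deliberately broken: the read and the write are separate steps, so
// concurrent increments can be lost. Atomics are used only to keep the
// example free of undefined behaviour while still showing lost updates.
struct RacyCounter {
    std::atomic<long> value{0};
    void increment() {
        long seen = value.load(std::memory_order_relaxed);
        value.store(seen + 1, std::memory_order_relaxed);
    }
    long get() { return value.load(); }
};

// Same interface, but every increment happens under a lock.
struct LockedCounter {
    std::mutex m;
    long value = 0;
    void increment() { std::lock_guard<std::mutex> lock(m); ++value; }
    long get() { std::lock_guard<std::mutex> lock(m); return value; }
};

// Run `threads` threads that each call increment() `iterations` times,
// then return the final count.
template <typename Counter>
long hammer(Counter& counter, int threads, int iterations) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) counter.increment();
        });
    }
    for (auto& th : pool) th.join();
    return counter.get();
}
```

Calling `hammer(counter, 4, 20000)` on a `LockedCounter` always yields 80000, while a `RacyCounter` can come up short. Whether it does on a given run depends on scheduling, so the harness is normally run in a loop until the failure appears.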

Check the build options. Build with /W4 to make sure nothing obvious is being missed. Check the code for C-style casts or reinterpret_cast in case someone has silenced a misunderstood but vital warning or error message.

+4

I found that running lint over my C/C++/Java code and making sure I fix every warning it reports has led to such race conditions simply disappearing. But this is not a solution. Never code by coincidence: you need to understand what you fixed and why that fixed the problem.

I believe it was K(&||)R who said that plentiful logging messages did more to help them debug code than any debugger, especially in multi-threaded environments, but I would need to find the citation.

Carefully looking at a very detailed trace of all the activity leading up to the error really helps a lot.
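Since heavy logging can itself change the timing and hide the bug, one common compromise is to record events in memory and print them only after the failure. The sketch below assumes C++11 `std::atomic`; `trace`, `trace_count`, `dump_trace` and the buffer size are hypothetical names and values chosen for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdio>

struct TraceEvent {
    const char* tag;   // must point to a string literal or other static storage
    long value;        // any number worth recording (thread id, index, state)
};

const std::size_t kTraceCapacity = 1024;   // arbitrary size for this sketch
TraceEvent g_trace[kTraceCapacity];
std::atomic<std::size_t> g_trace_next(0);

// Record one event. The only shared operation is a single atomic increment,
// so the cost is far lower than formatting and writing a log line.
// Caveat: if the buffer wraps while a slow thread is still writing its slot,
// that slot can be garbled, so size the buffer generously.
inline void trace(const char* tag, long value) {
    std::size_t slot = g_trace_next.fetch_add(1, std::memory_order_relaxed) % kTraceCapacity;
    g_trace[slot].tag = tag;
    g_trace[slot].value = value;
}

// Number of events currently held in the buffer.
inline std::size_t trace_count() {
    std::size_t total = g_trace_next.load();
    return total < kTraceCapacity ? total : kTraceCapacity;
}

// Print the retained events, oldest first. Call this once the failure has
// been detected and the worker threads are no longer tracing.
inline void dump_trace(std::FILE* out) {
    std::size_t total = g_trace_next.load();
    std::size_t count = total < kTraceCapacity ? total : kTraceCapacity;
    for (std::size_t i = total - count; i < total; ++i) {
        const TraceEvent& e = g_trace[i % kTraceCapacity];
        std::fprintf(out, "%lu %s %ld\n", static_cast<unsigned long>(i), e.tag, e.value);
    }
}
```

A typical use is to sprinkle calls such as `trace("lock taken", id)` around the suspect code and call `dump_trace(stderr)` from the place where the error is first detected.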

+2

When all other methods are exhausted and BoundsChecker or Rational Purify do not help, I usually fall back on a very dumb method. We discovered it in our first year at university. In Russian it has a very rude name.

So, I start by choosing a suspicious block or module that fails in a multi-threaded environment. If this is not possible, then the program structure is re-entered into the statistics.

Once a module is selected, I comment out small blocks of code until the exception disappears. This narrows down what exactly causes the problem (for example, a race condition). If at this stage you can tell what is wrong, great.

If not, the same method is used to find the precondition of the error: I comment out the code that adds locking.
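The same elimination idea can be automated when the suspect blocks can be switched off at run time instead of being commented out by hand. The following is only a sketch of that variation, not the exact manual procedure described above; `FailsWith`, `find_suspects` and `toy_module_fails` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical hook: run the module with the given blocks enabled and report
// whether the failure still shows up. For a real race this would repeat a
// stress run many times, because a single clean run proves nothing.
typedef bool (*FailsWith)(const std::vector<bool>& enabled);

// Disable one block at a time and collect every block whose removal makes
// the failure go away.
std::vector<int> find_suspects(int block_count, FailsWith fails) {
    std::vector<int> suspects;
    std::vector<bool> enabled(block_count, true);
    if (!fails(enabled)) return suspects;   // no failure with everything on
    for (int i = 0; i < block_count; ++i) {
        enabled[i] = false;
        if (!fails(enabled)) suspects.push_back(i);
        enabled[i] = true;                  // restore before the next trial
    }
    return suspects;
}

// Toy stand-in for a real module: it "fails" only when blocks 1 and 3 both run.
bool toy_module_fails(const std::vector<bool>& enabled) {
    return enabled[1] && enabled[3];
}
```

With the toy module, `find_suspects(5, toy_module_fails)` reports blocks 1 and 3, which is the pair whose interaction then has to be read line by line.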

+1