Heisenbug: WinAPI program crashes on some computers

Please help! I'm really at my wits' end. My program is a small personal notes manager (google for "cintanotes"). On some computers (and, of course, I own none of them) it crashes with an unhandled exception right after launch. Nothing special can be said about these computers, except that they tend to have AMD processors.

Environment: Windows XP, Visual C++ 2005/2008, raw WinAPI.

Here is what is known about this Heisenbug:

1) The crash happens only in the Release build.

2) The crash goes away as soon as I remove all GDI-related code.

3) BoundsChecker has no complaints.

4) Logging shows that the crash happens on the declaration of a local int variable! How can that be? Memory corruption?

Any ideas would be greatly appreciated!

UPDATE: I was able to debug the application on a "faulty" PC. Results:

"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: illegal instruction."

and the code breaks at

0044A26A cvtsi2sd xmm1, dword ptr [esp + 14h]

So it seems the problem was the compiler option "Code Generation / Enable Enhanced Instruction Set". It was set to "/arch:SSE2", so the program crashed on machines that do not support SSE2. I set this option to "Not Set" and the bug is gone. Phew!
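For anyone who wants to keep /arch:SSE2, the illegal-instruction crash could be turned into a clean error message by checking for SSE2 once at startup. Below is a minimal sketch, assuming an x86 target and a C++11 compiler; the names `edx_has_sse2` and `cpu_has_sse2` are hypothetical and not from the original program. CPUID leaf 1 reports SSE2 in bit 26 of EDX.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>    // __cpuid (MSVC intrinsic)
#define SKETCH_X86_MSVC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>     // __get_cpuid (GCC/Clang)
#define SKETCH_X86_GNU 1
#endif

// CPUID leaf 1 reports SSE2 support in bit 26 of EDX.
inline bool edx_has_sse2(std::uint32_t edx) {
    return ((edx >> 26) & 1u) != 0;
}

// Returns true if the CPU reports SSE2. On non-x86 targets this sketch
// returns false, since instructions such as cvtsi2sd do not exist there.
inline bool cpu_has_sse2() {
#if defined(SKETCH_X86_MSVC)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                      // regs = {EAX, EBX, ECX, EDX}
    return edx_has_sse2(static_cast<std::uint32_t>(regs[3]));
#elif defined(SKETCH_X86_GNU)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return edx_has_sse2(edx);
#else
    return false;
#endif
}
```

The idea would be to call `cpu_has_sse2()` first thing in WinMain and show a message box instead of crashing. On Windows, IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE) answers the same question without touching CPUID. Either way, the check itself should probably live in a source file compiled without /arch:SSE2, otherwise the compiler is free to use SSE2 instructions before the check has run.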

Thank you very much for your help!

+8
c++ debugging winapi crash gdi
Sep 25 '08 at 8:27
11 answers

So it doesn't crash when built in the DEBUG configuration? Many things differ between DEBUG and RELEASE: 1.) initialization of global variables, 2.) the actual machine code generated, etc.

So the first step is to find out the exact setting of each option in RELEASE mode compared with DEBUG mode.

-AD

+5
Sep 25 '08 at 8:38

4) Logging shows that the crash happens on the declaration of a local int variable! How can that be? Memory corruption?

What is the underlying machine code in the executable (the disassembly)? A declaration of an int is not code at all, and as such it cannot crash. Do you initialize the int somehow?

To see the code where the crash occurred, you need to perform what is called a post-mortem analysis.

Windows Error Reporting

If you want to analyze the crash, you need a crash dump. One option is to register for Windows Error Reporting. This costs some money (you need a digital code-signing ID) and requires filling in some forms. For more, visit https://winqual.microsoft.com/.

Get the crash dump intended for WER directly from the customer

Another option is to get in touch with a user who is experiencing the crash and get the crash dump intended for WER directly from them. The user can do this by clicking Technical Details before sending the crash report to Microsoft; the location of the crash dump file is shown there.

Your own minidump

Another option is to register your own exception handler, handle the exception, and write a minidump wherever you like. A detailed description can be found in the Code Project article Post-Mortem Debugging Your Application Using Minidumps and Visual Studio .NET.
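The "own minidump" option could look roughly like the sketch below. It assumes a C++11 compiler and linking against dbghelp.lib; `format_dump_name`, `write_minidump_filter`, `install_minidump_writer` and the dump file naming scheme are hypothetical, while SetUnhandledExceptionFilter and MiniDumpWriteDump are the actual Win32/DbgHelp calls. The file name is prepared up front so the filter does not need the heap, which may already be corrupted when it runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes e.g. "CintaNotes_1234.dmp" into `out`.
// Returns the name length, or -1 if the buffer is too small.
inline int format_dump_name(char* out, std::size_t size,
                            const char* app, unsigned long pid) {
    assert(out != nullptr && app != nullptr);
    const int n = std::snprintf(out, size, "%s_%lu.dmp", app, pid);
    return (n > 0 && static_cast<std::size_t>(n) < size) ? n : -1;
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>   // MiniDumpWriteDump; link with dbghelp.lib

static char g_dump_path[MAX_PATH];  // filled in before any crash can happen

// Top-level filter: runs when no other handler took the exception.
static LONG WINAPI write_minidump_filter(EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileA(g_dump_path, GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;
        mei.ClientPointers = FALSE;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, NULL, NULL);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;  // let the process terminate
}

// Call once, early in WinMain.
inline bool install_minidump_writer(const char* app) {
    if (format_dump_name(g_dump_path, sizeof g_dump_path, app,
                         GetCurrentProcessId()) < 0) {
        return false;
    }
    SetUnhandledExceptionFilter(write_minidump_filter);
    return true;
}
#endif
```

The user then only has to send the .dmp file, which can be opened in Visual Studio or WinDbg together with the matching symbols of the release build.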

+10
Sep 25 '08 at 8:46

1) The crash happens only in the Release build.

That is usually a sign that you are relying on some behaviour that is not guaranteed but happens to hold in the debug build. For example, you may have forgotten to initialize a variable, or you may be accessing an array out of bounds. Make sure you have turned on all the compiler checks (/RTCsuc). Also check for things like relying on the order in which function arguments are evaluated (which is not guaranteed).
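To illustrate the point about argument evaluation order, here is a small sketch; `Counter`, `combine`, `fragile` and `robust` are made-up names for illustration only. The fragile version can legitimately give different results in different builds, while the robust one cannot.

```cpp
#include <cassert>

// Hypothetical counter standing in for any function with side effects.
struct Counter {
    int value;
    int next() { return ++value; }
};

inline int combine(int first, int second) { return first * 10 + second; }

// Fragile: the two next() calls may run in either order, so starting from
// zero this can return 12 with one compiler or setting and 21 with another.
inline int fragile(Counter& c) {
    return combine(c.next(), c.next());
}

// Robust: separate statements fix the order of the side effects.
inline int robust(Counter& c) {
    const int first = c.next();
    const int second = c.next();
    return combine(first, second);
}
```

A Debug build and a Release build of the same compiler are allowed to disagree on `fragile`, which is exactly the kind of difference that shows up only in Release.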

2) The crash goes away as soon as I remove all GDI-related code.

Maybe that is a hint that you are doing something wrong with the GDI code? Are you using HANDLEs after they have been freed, for example?

+4
Sep 25 '08 at 8:43

Download the Debugging Tools for Windows package. Set the symbol paths correctly, then run your application under WinDbg. At some point it will break with an access violation. Then run the command "!analyze -v", which is quite smart and should give you a hint about what is going wrong.

+2
Sep 25 '08 at 10:09

Most heisenbugs / release-only bugs are due either to control flow that depends on reads from uninitialized memory, stale pointers, or past the end of buffers, or to race conditions, or both.

Try overriding your allocators so that they zero out memory when allocating. Does the problem go away (or become more reproducible)?
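One way to try the allocator experiment in C++ is to replace the global operator new, roughly as sketched below. This assumes a C++11 compiler and that the program allocates through `new`; `debug_alloc` is a hypothetical helper name. Code that calls malloc or HeapAlloc directly would need its own wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical debug helper: allocate and fill every byte with `fill`.
inline void* debug_alloc(std::size_t size, unsigned char fill) {
    void* p = std::malloc(size != 0 ? size : 1);
    if (p == nullptr) throw std::bad_alloc();
    std::memset(p, fill, size);
    return p;
}

// Replacing the global operator new routes every `new` and `new[]`
// in the program through debug_alloc (here: zero fill).
void* operator new(std::size_t size) { return debug_alloc(size, 0x00); }
void operator delete(void* p) noexcept { std::free(p); }
```

If the crash disappears with a zero fill, something is probably reading memory it never initialized. Switching the fill byte to a non-zero pattern such as 0xCD tends to do the opposite and make the failure more reproducible.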

Logging shows that the crash happens on the declaration of a local int variable! How can that be? Memory corruption?

Stack overflow! ;)

+1
Sep 25 '08 at 8:37

Sounds like stack corruption to me. My favorite tool for tracking those down is IDA Pro. Of course, you don't have that kind of access to the user's machine.

Some memory checkers have a hard time catching stack corruption (if that is indeed what this is). The surest way to catch it, I think, is runtime analysis.

This can also be due to corruption in an exception path, even if the exception was handled. Do you debug with "catch first-chance exceptions" turned on? You should, for as long as you can. It does get annoying after a while in many cases.

Can you send those users a checked build of your application? Look into minidumps: handle the exception and write out a dump. Then use WinDbg to debug on your end.

Another method is writing very detailed logs. Create a "Log every single action" option, and ask the user to turn it on and send the log to you. Dump memory to the logs. Check out '_CrtDbgReport()' on MSDN.

Good luck

EDIT:

In response to your comment: a crash on a local variable declaration does not surprise me. I have seen this a lot. It is usually due to a corrupted stack.

Some variable on the stack may be running past its bounds, for example. All hell breaks loose after that. Then stack variable declarations throw random memory errors, virtual tables get corrupted, etc.
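As an illustration of a stack variable running past its bounds, and one way to rule it out, here is a minimal sketch assuming a C++11 compiler. `bounded_copy` and `make_title` are hypothetical names and do not come from the program in question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies at most size-1 characters and always terminates the result.
// Returns false if the source had to be truncated.
inline bool bounded_copy(char* dst, std::size_t size, const char* src) {
    assert(dst != nullptr && src != nullptr && size > 0);
    const std::size_t len = std::strlen(src);
    const std::size_t n = len < size - 1 ? len : size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n == len;
}

// Hypothetical caller with a small buffer that typically lives on the stack.
inline bool make_title(const char* note_name, char (&out)[16]) {
    // strcpy(out, note_name) would write past `out` for long names and
    // silently damage neighbouring locals or the return address.
    return bounded_copy(out, sizeof out, note_name);
}
```

With the unchecked strcpy, the damage usually surfaces far from the overrun itself, which is why the crash can appear to happen at an innocent line such as a local variable declaration.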

Any time I have seen those for a prolonged period, I have had to go to IDA Pro. Detailed runtime disassembly debugging is the only thing I know that really works reliably.

Many developers use WinDbg for this kind of analysis. That is why I also suggested minidumps.

+1
Sep 25 '08 at 8:46

4) Logging shows that the crash happens on the declaration of a local int variable! How can that be? Memory corruption?

I have found the cause of numerous "strange crashes" to be dereferencing a broken this pointer inside a member function of the object in question.

+1
Sep 25 '08 at 9:05

Try Rational (IBM) PurifyPlus. It catches a lot of errors that BoundsChecker does not.

+1
Sep 25 '08 at 9:08

What does the crash say? An access violation? Some other exception? That would be a further clue for solving this.

Use PageHeap.exe to make sure you have no earlier memory corruption.

Make sure you have no stack overflow (e.g. CBig array[1000000]).

Make sure you do not have uninitialized memory.
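On the huge local array from the checklist above: a local like that is a common way to blow the stack, since the default stack of a Windows thread is only 1 MB. A minimal sketch of moving it to the heap follows; this `CBig` is a made-up stand-in with an assumed size, and `make_big_array` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical large element type; the real one's size is not known here.
struct CBig {
    char payload[256];
};

// As a local variable, CBig array[1000000] would need roughly 256 MB of
// stack with this element size. A vector keeps only a small handle on the
// stack and puts the elements on the heap, value-initialized (zeroed).
inline std::vector<CBig> make_big_array(std::size_t count) {
    return std::vector<CBig>(count);
}
```

As a side effect, the elements start out zeroed, which also covers the uninitialized-memory item for this particular array.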

You can also run the release build inside the debugger, once you generate debug symbols for it (which is not the same as creating a debug build). Step through and see whether you get any warnings in the debugger's trace window.

+1
Sep 25 '08 at 10:02

"4) Logging shows that the crash happens on the declaration of a local int variable! How can that be? Memory corruption?"

This could be a sign that the hardware is in fact faulty or being pushed too hard. Find out whether they have overclocked their computers.

+1
Sep 25 '08 at 12:40

When I get this type of thing, I try running the code through Gimpel's PC-Lint (static code analysis), as it checks different classes of errors than BoundsChecker does. If you are using BoundsChecker, turn on the memory poisoning options.

You mention AMD processors. Have you investigated whether the machines that crash share a similar graphics card, driver version and/or configuration? Does it always crash on these machines, or only occasionally? Maybe run the System Information tool on these machines and see what they have in common.

+1
Sep 26 '08 at 7:32


