We have a Windows32 application in which one thread can stop another to check its state [PC, etc.] by doing SuspendThread / GetThreadContext / ResumeThread.
    if (SuspendThread((HANDLE)hComputeThread[threadId])<0) // freeze thread
        ThreadOperationFault("SuspendThread","InterruptGranule");
    CONTEXT Context, *pContext;
    Context.ContextFlags = (CONTEXT_INTEGER | CONTEXT_CONTROL);
    if (!GetThreadContext((HANDLE)hComputeThread[threadId],&Context))
        ThreadOperationFault("GetThreadContext","InterruptGranule");
Extremely rarely, on a multi-core system, GetThreadContext fails with error code 5 (the Windows system error code for “Access Denied”).
The SuspendThread documentation seems to state clearly that the target thread is suspended if no error is returned. We check the return status of SuspendThread and ResumeThread; they never complain.
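One caveat about the check in the snippet above: SuspendThread and ResumeThread are documented to return a DWORD, which is unsigned, and to signal failure with (DWORD)-1. A `<0` test on that value can therefore never fire. The following is a minimal portable sketch of the difference, not our actual code; the `DWORD` typedef is a stand-in for the Win32 type, and `SUSPEND_FAILED`, `failed_by_sign_test` and `failed_by_sentinel_test` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the Win32 DWORD type (a 32-bit unsigned integer). */
typedef uint32_t DWORD;

/* Failure value documented for SuspendThread and ResumeThread.
   The macro name is hypothetical. */
#define SUSPEND_FAILED ((DWORD)-1)

/* Sign-style test, as in the snippet above.
   An unsigned value is never below zero, so this is always false. */
bool failed_by_sign_test(DWORD result)
{
    return result < 0;
}

/* Test against the documented failure value. */
bool failed_by_sentinel_test(DWORD result)
{
    return result == SUSPEND_FAILED;
}
```

If the real check takes the second form, a failed SuspendThread would be reported instead of passing silently on to GetThreadContext.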
How can it be that I can suspend a thread but cannot access its context?
This blog post, http://www.dcl.hpi.uni-potsdam.de/research/WRK/2009/01/what-does-suspendthread-really-do/, suggests that SuspendThread, when it returns, may have started the suspension of the other thread, but that thread has not yet actually been suspended. In this case, I can see how GetThreadContext would be problematic, but it seems like a dumb way to define SuspendThread. (How would the caller of SuspendThread find out when the target thread was actually suspended?)
EDIT: I lied. I said this was for Windows.
Well, the strange truth is that I don’t see this behavior on Windows XP 64 (at least not in the last week, and I really don’t know what happened before that) ... but we have been testing this Windows application under Wine on Ubuntu 10.x. The Wine source for GetThreadContext contains an “Access Denied” return on line 819 for the case where, for some reason, the attempt to capture the thread state fails. I am guessing, but it appears that Wine's GetThreadContext believes that a thread might simply not be accessible on repeated attempts. Why that would be true after a SuspendThread is beyond me, but there is the code. Thoughts?
EDIT2: I lied again. I said that we only saw this behavior under Wine. No ... we have now found a Vista Ultimate system that seems to produce the same error (again, rarely). So it looks like Wine and Windows agree on an obscure case. It also appears that simply enabling the Sysinternals Process Monitor program aggravates the situation and causes the problem to appear on Windows XP 64; I suspect a Heisenbug. (Process Monitor does not even exist on the Wine-tasting (:-) machine or on the XP 64 system that I use for development.)
What is going on?
EDIT3: September 15, 2010. I added careful checks of the error return status, without otherwise disturbing the code, for SuspendThread, ResumeThread and GetContext. I have not seen any hint of this behavior on Windows systems since I did that. I have not gotten back to the Wine experiment.
November 2010: Strange. It seems that if I compile this with VisualStudio 2005, it fails on Windows Vista and 7, but not on earlier versions. If I compile with VisualStudio 2010, it does not fail anywhere. One could point a finger at VisualStudio 2005, but I suspect a location-sensitive problem, since the different optimizers in VS 2005 and VS 2010 place the code in slightly different places.
November 2012: The saga continues. We see this failure on several XP and Windows 7 machines at a fairly low rate (once every few thousand runs). Our Suspend actions are applied to threads that mostly execute pure compute code, but sometimes make calls into Windows. I do not recall seeing this problem when the PC of the thread was in our computational code. Of course, I cannot see the PC of the thread when it hangs, because GetContext will not give it to me, so I cannot directly confirm that the problem only occurs while executing system calls. But all of our system calls are channeled through one point, and so far the evidence is that this point was being executed when we get the hang. Thus, indirect evidence suggests that GetContext on a thread only fails if a system call is being executed by that thread. I have not had the energy to build a critical experiment to test this hypothesis.
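One way the critical experiment could be approximated is with a per-thread flag that is set while a thread is inside the single point through which system calls pass, and that the suspending thread reads whenever the context capture fails. This is only a sketch in portable C11, assuming every system call really does go through one wrapper; `MAX_THREADS`, `in_system_call`, `call_through_choke_point`, `FailureTally`, `record_context_failure` and `probe_call` are all hypothetical names, not taken from our code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 64  /* hypothetical upper bound on compute threads */

/* One flag per compute thread: true while that thread is inside the
   single choke point through which all system calls are made. */
atomic_bool in_system_call[MAX_THREADS];

/* Counts of failed context captures, split by where the target was. */
typedef struct {
    unsigned long in_call;     /* flag was set when the capture failed   */
    unsigned long in_compute;  /* flag was clear when the capture failed */
} FailureTally;

/* Hypothetical wrapper: a compute thread makes its system call
   through this function so the flag brackets the call. */
int call_through_choke_point(int threadId, int (*system_call)(void *), void *arg)
{
    atomic_store(&in_system_call[threadId], true);
    int result = system_call(arg);
    atomic_store(&in_system_call[threadId], false);
    return result;
}

/* Called by the suspending thread each time the context capture fails. */
void record_context_failure(FailureTally *tally, int threadId)
{
    if (atomic_load(&in_system_call[threadId]))
        tally->in_call++;
    else
        tally->in_compute++;
}

/* Stand-in for a real system call, for demonstration only: reports
   whether the flag of the thread id pointed to by arg is set. */
int probe_call(void *arg)
{
    int threadId = *(int *)arg;
    return atomic_load(&in_system_call[threadId]) ? 1 : 0;
}
```

The tally is evidence rather than proof: a thread can be suspended just after setting the flag or just before clearing it, so a few samples near the boundary may be misclassified. A tally with failures only in `in_call` over many thousands of runs would still support the hypothesis.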