How to debug a (possible) RTL problem?

I ask this because I have no good ideas ... hoping for someone else with a fresh perspective.

I have a user who runs our 32-bit Delphi application (compiled with BDS 2006) on 64-bit Windows 7. Our software was "working fine" until a couple of weeks ago. Now, unexpectedly, it isn't: it raises an access violation during initialization (while instantiating objects).

We had him reinstall all of our software, starting from scratch. Same AV error. We disabled his antivirus software; same error.

Our stack-tracing code (madExcept) for some reason could not provide a stack trace to the failing line, so we sent the user a couple of error-logging builds to install and run, in order to isolate the line that raised the error ...

It turns out to be a line that instantiates a simple descendant of TStringList. (There is no overridden Create constructor, etc.; basically, Create just instantiates a TStringList that has a few custom methods associated with the descendant class.)

I am tempted to send the user another test .EXE, one that simply instantiates a plain TStringList, to see what happens. But at this point I feel like I am tilting at windmills, and I risk wearing out the user's patience if I send too many "things to try."

Any fresh ideas on a better approach to debugging this user's problem? (I don't like leaving a single user's problem unsolved ... those are usually the ones that, if ignored, suddenly become an epidemic that 5 other users suddenly "find".)

EDIT as Lasse requested:

    procedure T_fmMain.AfterConstruction;
    begin
      inherited;
      //Logging shows that we return from the Inherited call above,
      //then AV in the following line...
      FActionList := TAActionList.Create;
      ...other code here...
    end;

And here is the definition of the created object ...

    type
      TAActionList = class(TStringList)
      private
        FShadowList: TStringList; //UPPERCASE shadow list
        FIsDataLoaded : boolean;
      public
        procedure AfterConstruction; override;
        procedure BeforeDestruction; override;
        procedure DataLoaded;
        function Add(const S: string): Integer; override;
        procedure Delete(Index : integer); override;
        function IndexOf(const S : string) : Integer; override;
      end;

    implementation

    procedure TAActionList.AfterConstruction;
    begin
      Sorted := False; //until we're done loading
      FShadowList := TStringList.Create;
    end;
+7
5 answers

I hate problems like this, but I believe you should focus on what happens just before the object gets constructed.

The symptoms you describe sound like typical heap corruption, so maybe you have something like ...

  • An array that is being written to outside its bounds? (Turn bounds checking on if you have it off.)
  • Code that is trying to access an object that has already been freed?
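To illustrate the bounds-checking suggestion, here is a minimal sketch of what the `{$R+}` compiler directive buys you. The program name, the buffer, and the deliberate off-by-one are all made up for the demonstration; they are not taken from the application in the question.

```pascal
program RangeCheckDemo;

{$APPTYPE CONSOLE}
{$R+} // range checking on: out-of-bounds indexing raises ERangeError

uses
  SysUtils;

var
  Buffer: array[0..3] of Integer; // hypothetical buffer for the demo
  I: Integer;
begin
  try
    // Deliberate off-by-one: the loop runs one element past High(Buffer).
    for I := 0 to High(Buffer) + 1 do
      Buffer[I] := I;
    Writeln('No error reported');
  except
    // With {$R+} the overrun is reported here instead of silently
    // overwriting whatever happens to live next to Buffer in memory.
    on E: ERangeError do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```

With `{$R-}` the same loop would compile and run without complaint, and the stray write would land in neighbouring memory, which is the kind of damage that tends to surface later in unrelated code.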

Since I wrote the answer above, you have posted code snippets. They raise a few possible problems that I can see.

a: AfterConstruction vs. an overridden constructor: As others have noted, using AfterConstruction in this way is at best not idiomatic. I do not think it is really "wrong," but it is a possible smell. There is a good introduction to these methods on Dr. Bob's site.

b: The overridden methods Add, Delete, IndexOf: I assume that these methods use the FShadowList field in some way. Is it possible for them to be called (and thus to use FShadowList) before FShadowList has been created? This seems possible because you create it in AfterConstruction, and by that time the virtual methods are already dispatching to your overrides. Hopefully this is easy to verify in the debugger by setting some breakpoints and seeing the order in which they are hit.
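When a debugger is not available on the user's machine, one way to test this theory is a guard inside the overrides. The sketch below is an assumption-heavy stand-in: `TGuardedList` is a hypothetical class, and the body of `Add` is a guess at what the real override does, since the question does not show it.

```pascal
program ShadowGuardDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for TAActionList; only Add is sketched.
  TGuardedList = class(TStringList)
  private
    FShadowList: TStringList;
  public
    procedure AfterConstruction; override;
    procedure BeforeDestruction; override;
    function Add(const S: string): Integer; override;
  end;

procedure TGuardedList.AfterConstruction;
begin
  inherited AfterConstruction;
  FShadowList := TStringList.Create;
end;

procedure TGuardedList.BeforeDestruction;
begin
  FreeAndNil(FShadowList);
  inherited BeforeDestruction;
end;

function TGuardedList.Add(const S: string): Integer;
begin
  // Fail with a readable message instead of an AV on a nil field.
  if not Assigned(FShadowList) then
    raise Exception.Create('Add called before FShadowList was created');
  FShadowList.Add(UpperCase(S)); // assumed behaviour of the shadow list
  Result := inherited Add(S);
end;

var
  List: TGuardedList;
begin
  List := TGuardedList.Create;
  try
    List.Add('open');
    Writeln('Items: ', List.Count);
  finally
    List.Free;
  end;
end.
```

If the guard ever fires in the field, the exception message names the culprit directly, which is more than an anonymous access violation does.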

+5

"Our software was "working fine" until a couple of weeks ago ... suddenly become an epidemic that 5 other users suddenly "find".":

It looks like you need to do some forensic analysis rather than debugging: you need to find out what has changed in this user's environment to cause the error, especially if you have other users with the same deployment who do not have the problem (which sounds like your situation). Sending the user a stream of "things to try" is one of the best ways to quickly erode user trust! (If the user's site has IT support, involve them rather than the user.)

To get started, look into the following:

*) If possible, check the Windows event log for events that occurred on this machine around the time the problem first appeared.

*) Is there a technical support person at the user's site whom you can talk to about possible changes / problems in this user's environment?

*) Was there any support incident with this user around the time the error first appeared that may be related to it and / or may have caused bad data or file corruption?

(As for the code itself, I agree with @Warran P about decoupling, etc.)

+2

You should never override the AfterConstruction and BeforeDestruction methods in your programs. They are not intended for what you are doing with them, but for low-level VCL hacking (for example adding references, custom memory handling, or the like).

Instead, you should override the Create constructor and the Destroy destructor and put the initialization code there, for example:

    constructor TAActionList.Create;
    begin
      inherited;
      // Sorted := False; // not necessary IMHO
      FShadowList := TStringList.Create;
    end;

Take a look at the VCL code and at all serious published Delphi code, and you will see that the AfterConstruction and BeforeDestruction methods are never used. I assume this is the main cause of your problem, and therefore your code should be modified. This could become even worse in a future version of Delphi.
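The snippet in this answer shows only the constructor. A minimal sketch of the matching destructor, wrapped in a small console program so it can be compiled on its own, might look like this. `TShadowedList` is a hypothetical stand-in for the question's `TAActionList`, reduced to the shadow list.

```pascal
program CtorDtorDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for TAActionList.
  TShadowedList = class(TStringList)
  private
    FShadowList: TStringList; // owned; released in Destroy
  public
    constructor Create;
    destructor Destroy; override;
  end;

constructor TShadowedList.Create;
begin
  inherited Create;
  FShadowList := TStringList.Create;
end;

destructor TShadowedList.Destroy;
begin
  // Free is nil-safe, so this also works if Create raised part-way through.
  FShadowList.Free;
  inherited Destroy;
end;

var
  List: TShadowedList;
begin
  List := TShadowedList.Create;
  try
    List.Add('alpha');
    Writeln('Items: ', List.Count);
  finally
    List.Free;
  end;
end.
```

Pairing creation in `Create` with release in `Destroy` keeps the owned field's lifetime in the two places most Delphi readers look first.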

+2

There is nothing obviously suspicious about what TAActionList does at construction time. Even taking the ancestor constructors and the possible side effects of setting Sorted := False into account, there should be no problem. I am more interested in what is going on inside T_fmMain .

Basically, something is happening that makes FActionList := TAActionList.Create; fail even though there is nothing wrong with the implementation of TAActionList.Create (one possibility is that the form has been unexpectedly destroyed).

I suggest you try changing T_fmMain.AfterConstruction as follows:

    procedure T_fmMain.AfterConstruction;
    begin
      //This is safe because the object created has no form dependencies
      //that might otherwise need to be initialised first.
      FActionList := TAActionList.Create;
      //Now, if the ancestor AfterConstruction is causing the problem,
      //the above line will work fine, and...
      inherited AfterConstruction;
      //... your error will have shifted to one of these lines here.
      //other code here
    end;

If an environmental issue with a component used by your form causes the form to be destroyed during AfterConstruction , then it is the assignment of the new TAActionList.Create instance to FActionList that actually causes the AV. Another way to test this would be to create the object into a local variable first, and then assign it to the class field: FActionList := LActionList .

Environmental problems can be subtle. For example, we use a reporting component which, as we discovered, requires a printer driver to be installed; without one, it prevents our application from starting.

You can validate the destruction theory by setting a global variable in the form destructor. You can also output stack traces from the destructor to confirm the exact sequence leading to the destruction of the form.
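The local-variable test and the destructor flag can be combined in one probe. The sketch below uses a plain class as a stand-in for the form so that it compiles as a console program; `TMainStandIn`, `GMainDestroyed`, and the `Writeln` logging are all hypothetical, and a real application would write to its own log instead.

```pascal
program DestructionProbeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  // Hypothetical global flag, set by the destructor of the form stand-in.
  GMainDestroyed: Boolean = False;

type
  // Stand-in for T_fmMain; the real class descends from a form.
  TMainStandIn = class(TObject)
  private
    FActionList: TStringList;
  public
    procedure AfterConstruction; override;
    destructor Destroy; override;
  end;

procedure TMainStandIn.AfterConstruction;
var
  LActionList: TStringList;
begin
  inherited AfterConstruction;
  // Step 1: construct into a local. An AV here implicates the list class.
  LActionList := TStringList.Create;
  // Step 2: check the flag before touching any field of Self.
  if GMainDestroyed then
  begin
    Writeln('Form was destroyed during inherited AfterConstruction');
    LActionList.Free;
    Exit;
  end;
  // Step 3: assign to the field. An AV only here implicates Self.
  FActionList := LActionList;
end;

destructor TMainStandIn.Destroy;
begin
  GMainDestroyed := True;
  FActionList.Free;
  inherited Destroy;
end;

var
  Main: TMainStandIn;
begin
  Main := TMainStandIn.Create;
  Writeln('Flag after construction: ', GMainDestroyed);
  Main.Free;
  Writeln('Flag after Free: ', GMainDestroyed);
end.
```

Splitting the statement into three steps means the log shows which of them was the last to complete, so a single test build can tell a bad list class apart from a destroyed form.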

+2

What to do when MadExcept is NOT ENOUGH (which is rare, I have to say):

  • Try the Jedi JCL's JclDebug instead. You may be able to get a stack trace with it if you swap MadExcept for JclDebug and write the stack trace straight to disk without ANY user interaction.

  • Run a debugging tool such as DebugView from MS / Sysinternals, and trace output such as the Self pointers of the objects where the problems occur. I suspect an INVALID instance pointer is getting in there somehow.

  • Decouple things, refactor things, and write unit tests until you find the really ugly thing that is breaking you. (Someone else suggested heap corruption. I often find that heap corruption goes hand in hand with unsafe, ugly, untested code and with deeply coupled UI code whose failures cascade.)
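For the DebugView suggestion above, a minimal sketch of pointer tracing follows. `OutputDebugString` is the Windows API call whose output DebugView captures; `TraceInstance` is a hypothetical helper name, and the plain TStringList stands in for whichever object is under suspicion.

```pascal
program TraceSelfDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

// Hypothetical helper: sends one line to the debug output stream,
// where DebugView can capture it on the user's machine.
procedure TraceInstance(const Where: string; Instance: TObject);
begin
  OutputDebugString(
    PChar(Format('%s: instance=%p', [Where, Pointer(Instance)])));
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    TraceInstance('after TStringList.Create', List);
  finally
    List.Free;
  end;
end.
```

Calling the helper with `Self` at the top of the suspect methods gives a sequence of addresses in the DebugView log; a nil or an address that was never logged at construction time would support the invalid-pointer theory.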

0
