Debugging a crash when opening a library via dlopen on OSX

I have a problem with a C++ application I developed that uses dlopen to load custom libraries. Over the past few years the application has been used by many people on various Linux distributions and OSX versions, so I assume that my use of dlopen, and the code that depends on it, is fine (yes, this is arrogance, so I will report back if it turns out not to be). The problem I have now is that a user has developed a library that fails to load on my system (OSX 10.6.4). When the application tries to load it, it hangs and then crashes. The crashing thread looks like this in the crash report:

Thread 5 Crashed:
0 com.apple.CoreFoundation 0x00007fff80fa6110 __CFInitialize + 1808
1 dyld 0x00007fff5fc0d5ce ImageLoaderMachO::doImageInit(ImageLoader::LinkContext const&) + 138
2 dyld 0x00007fff5fc0d607 ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 27
3 dyld 0x00007fff5fc0bcec ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 236
4 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
5 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
6 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
7 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
8 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
9 dyld 0x00007fff5fc0bda6 ImageLoader::runInitializers(ImageLoader::LinkContext const&) + 58
10 dyld 0x00007fff5fc08fbb dlopen + 573
11 libSystem.B.dylib 0x00007fff816492c0 dlopen + 61
12 cast-server-c++ 0x0000000100007819 cast::loadLibrary(std::string const&) + 96 (ComponentCreator.cpp:43)
13 cast-server-c++ 0x00000001000079c7 cast::createComponentCreator(std::string const&) + 24 (ComponentCreator.cpp:87)
14 cast-server-c++ 0x00000001000089c5 cast::CASTComponentFactory::createBase(std::string const&, std::string const&, Ice::Current const&) + 197 (CASTComponentFactory.cpp:27)
15 cast-server-c++ 0x00000001000090e9 cast::CASTComponentFactory::newManagedComponent(std::string const&, std::string const&, bool, Ice::Current const&) + 73 (CASTComponentFactory.cpp:62)
16 libCDL.dylib 0x00000001009ceb6c cast::interfaces::ComponentFactory::___newManagedComponent(IceInternal::Incoming&, Ice::Current const&) + 218 (CDL.cpp:14904)
17 libCDL.dylib 0x00000001009cf1d0 cast::interfaces::ComponentFactory::__dispatch(IceInternal::Incoming&, Ice::Current const&) + 572 (CDL.cpp:15057)
18 libIce.3.3.1.dylib 0x00000001000c9078 IceInternal::Incoming::invoke(IceInternal::Handle<IceInternal::ServantManager> const&) + 2004 (Incoming.cpp:484)
19 libIce.3.3.1.dylib 0x0000000100091a5d Ice::ConnectionI::invokeAll(IceInternal::BasicStream&, int, int, unsigned char, IceInternal::Handle<IceInternal::ServantManager> const&, IceInternal::Handle<Ice::ObjectAdapter> const&) + 367 (ConnectionI.cpp:2436)
20 libIce.3.3.1.dylib 0x000000010009bb40 Ice::ConnectionI::message(IceInternal::BasicStream&, IceInternal::Handle<IceInternal::ThreadPool> const&) + 416 (ConnectionI.cpp:1105)
21 libIce.3.3.1.dylib 0x00000001001a9bbc IceInternal::ThreadPool::run() + 3470 (ThreadPool.cpp:523)
22 libIce.3.3.1.dylib 0x00000001001aa4ec IceInternal::ThreadPool::EventHandlerThread::run() + 152 (ThreadPool.cpp:782)
23 libIceUtil.3.3.1.dylib 0x00000001006eb1e9 startHook + 128 (Thread.cpp:375)
24 libSystem.B.dylib 0x00007fff8167c456 _pthread_start + 331
25 libSystem.B.dylib 0x00007fff8167c309 thread_start + 13

(I can post the full log if necessary, but including it would exceed the body-size limit for a post.)

In the terminal where I run the executable, the crash does not produce output, except for the notification that the script launching the executable has trapped the signal.

My question is: how do I get more information about what could be causing this crash? I would also be glad of any suggested solutions, but for a start I would at least like to know how to get more information about what is actually going wrong when the crash occurs.

If I run otool on the library that dlopen initially opens, everything looks fine (no missing libraries, symbols, etc.). My main suspicion is that the particular combination of libraries the loaded library is linked against is what causes the crash. Other libraries that use different subsets of these linked libraries load fine. For the record, the libraries include X11, ZeroC Ice, Player/Stage, and OpenCV (the latter two compiled manually, with dependencies installed using MacPorts). Including OpenCV seems to be what causes the problem, since other libraries that link to all of them except OpenCV load without problems. These are my suspicions, but I currently lack the know-how to investigate further.

Thanks! Nick

UPDATE: Thanks to Kaelin's answer (the DYLD_PRINT_* options, which I did not know about before), I was able to at least confirm that nothing completely obvious was going wrong. Using the debug output, I narrowed the problem down to one specific library that caused the crash. It turned out that this library (libdc1394, linked into my application via OpenCV's libhighgui) was not correctly linked against CoreServices, and this caused the crash. For some reason the error was masked by other things, which led to the crash. For information on the libdc1394 problem, see here. Unfortunately, I could not produce a clean error that I could report here, so I settled for a version of the application that does not link to the dodgy library (by disabling libdc1394 in the OpenCV build).

2 answers

dyld is running the initializers in the shared library (static initializers in C++, I think), and one of them calls CoreFoundation's __CFInitialize function. [Perhaps this is the first use of CoreFoundation?] For some reason, __CFInitialize is not happy. It could be some kind of missing dependency. Or it could be heap corruption. Or it could be a latent bug in the CoreFoundation framework.

I would suggest ruling out the first two possibilities by (a) running with all of the DYLD_PRINT_* environment variables set [see man dyld] and (b) running under MallocDebug. If neither sheds any light, you will probably be left with filing a Radar for the CoreFoundation people to look at.
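As a rough illustration of suggestion (a): dyld reads its DYLD_PRINT_* variables when a process starts, so they have to be in the environment before the target program is launched. The sketch below is a small, hypothetical launcher (the names enableDyldTracing and runWithDyldTracing are made up for this example) that sets three of the variables listed in man dyld and then execs the program. Exporting the same variables in the shell before starting the program would work just as well.

```cpp
#include <cassert>
#include <cstdio>
#include <stdlib.h>   // setenv, getenv (POSIX)
#include <unistd.h>   // execvp

// A few of the tracing variables documented in `man dyld`.
// Add or remove entries depending on how much output you want.
static const char* const kDyldTraceVars[] = {
    "DYLD_PRINT_LIBRARIES",     // log each image as it is loaded
    "DYLD_PRINT_INITIALIZERS",  // log each initializer as it is run
    "DYLD_PRINT_APIS",          // log calls into the dyld API, e.g. dlopen
};

// Hypothetical helper: sets every tracing variable to "1" in the current
// environment and returns how many were set successfully.
int enableDyldTracing() {
    int count = 0;
    for (const char* name : kDyldTraceVars) {
        if (setenv(name, "1", /*overwrite=*/1) == 0) {
            ++count;
        }
    }
    return count;
}

// Hypothetical launcher: enables tracing, then replaces this process with
// the target program. argv must be a NULL-terminated argument vector whose
// first element is the program to run. Returns only if execvp fails.
int runWithDyldTracing(char* const argv[]) {
    assert(argv != nullptr && argv[0] != nullptr);
    enableDyldTracing();
    execvp(argv[0], argv);
    std::perror("execvp");
    return 127;
}
```

A main() that forwards its own arguments, for example runWithDyldTracing(argv + 1), would turn this into a wrapper command. The trace output goes to stderr of the launched program, so the last image or initializer printed before the crash is the place to start looking.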

After some further problems and some further Googling, I eventually found the real cause of my problem.

You cannot dlopen a library that is linked against CoreFoundation from a (sub)thread if CoreFoundation has not already been initialized. __CFInitialize gets invoked, apparently checks whether the current thread is the main thread, and if it is not, fails with SIGTRAP.

http://openradar.appspot.com/7209349
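One way to act on this, sketched here under assumptions the thread does not confirm: make sure the CoreFoundation-linked code is first loaded from the main thread, before any worker thread calls dlopen. preloadOnMainThread is a hypothetical helper, and the framework path in the usage note is the usual location of the CoreFoundation binary on OSX, used here as an assumption rather than something taken from the question.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>   // dlopen, dlerror

// Hypothetical helper. Call it from main() before any worker thread exists,
// so that the initializers of the loaded image (and of everything it links
// against) run on the main thread.
// Returns true if the image was loaded, false otherwise.
bool preloadOnMainThread(const char* path) {
    assert(path != nullptr);

    // RTLD_GLOBAL makes the image's symbols visible to libraries that are
    // loaded later; RTLD_NOW resolves all symbols immediately so that
    // failures show up here rather than at first use.
    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        const char* reason = dlerror();
        std::fprintf(stderr, "preload of %s failed: %s\n",
                     path, reason ? reason : "unknown error");
        return false;
    }

    // The handle is deliberately not closed: the image should stay loaded
    // for the lifetime of the process.
    return true;
}
```

Calling preloadOnMainThread("/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation") at the top of main(), before the Ice thread pool starts, should let __CFInitialize run on the main thread, so that later dlopen calls from worker threads find CoreFoundation already initialized. Linking the executable itself against CoreFoundation would likely have the same effect.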
