Update, 4/10 2012: Fixed by libc patch
I have a problem cancelling threads that are blocked in pthread_cond_wait() when the associated mutex has the PTHREAD_PRIO_INHERIT protocol attribute set. This only happens on some platforms.
The following minimal example demonstrates this (compile with g++ <filename>.cpp -lpthread):
#include <pthread.h>
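For anyone who wants to try this on their own platform, here is a minimal sketch of a program of the kind described. It rests on assumptions rather than on the original listing: only threadFunc and main appear in the backtrace, so g_mutex, g_cond, unlockMutex and cancelWaiter are hypothetical names, and the one-second sleep is just a crude way to let the thread reach pthread_cond_wait() before it is cancelled.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical shared state for the waiting thread.
static pthread_mutex_t g_mutex;
static pthread_cond_t g_cond = PTHREAD_COND_INITIALIZER;

// Cleanup handler: pthread_cond_wait() reacquires the mutex before the
// cancellation is acted upon, so the cancelled thread has to release it.
static void unlockMutex(void*) {
    pthread_mutex_unlock(&g_mutex);
}

static void* threadFunc(void*) {
    pthread_mutex_lock(&g_mutex);
    pthread_cleanup_push(unlockMutex, NULL);
    for (;;) {
        // Nobody ever signals the condition; this is only a cancellation point.
        pthread_cond_wait(&g_cond, &g_mutex);
    }
    pthread_cleanup_pop(1);  // never reached, but must pair with the push
    return NULL;
}

// Starts a waiter, cancels it and joins it.
// Returns true if the thread was joined and reported PTHREAD_CANCELED.
bool cancelWaiter(bool usePrioInherit) {
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) return false;
    if (usePrioInherit &&
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0) {
        pthread_mutexattr_destroy(&attr);
        return false;
    }
    int rc = pthread_mutex_init(&g_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) return false;

    pthread_t thread;
    if (pthread_create(&thread, NULL, threadFunc, NULL) != 0) {
        pthread_mutex_destroy(&g_mutex);
        return false;
    }
    sleep(1);  // crude: give the thread time to block in pthread_cond_wait()

    pthread_cancel(thread);
    void* result = NULL;
    rc = pthread_join(thread, &result);  // the reported hang is in this call
    pthread_mutex_destroy(&g_mutex);
    return rc == 0 && result == PTHREAD_CANCELED;
}
```

Calling cancelWaiter(true) from main() corresponds to the failing configuration, so on an affected platform the pthread_join() call would be expected to block forever. cancelWaiter(false) uses the default mutex protocol and should return normally.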
Every time I run it, main() hangs in pthread_join(). The output from gdb shows the following:
Thread 2 (Thread 0xb7d15b70 (LWP 257)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fcf362 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xb7fcc9f9 in __condvar_w_cleanup () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:434
#3 0x08048fbe in threadFunc (arg=0x0) at /home/pthread_cond_wait.cpp:22
#4 0xb7fc8ca0 in start_thread (arg=0xb7d15b70) at pthread_create.c:301
#5 0xb7de73ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Thread 1 (Thread 0xb7d166d0 (LWP 254)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fc9d64 in pthread_join (threadid=3083950960, thread_return=0x0) at pthread_join.c:89
#2 0x0804914a in main () at /home/pthread_cond_wait.cpp:41
If PTHREAD_PRIO_INHERIT is not set on the mutex, everything works as it should, and the program exits cleanly.
Platforms with problems:
- An embedded AMD Fusion board running a PTXdist-based 32-bit Linux 3.2.9-rt16 (with RT patch 16). We use the latest OSELAS i686 cross toolchain (2011.11.1), with gcc 4.6.2 / glibc 2.14.1 / binutils 2.21.1a / kernel 2.6.39.
- The same board with the 2011.03.1 toolchain as well (gcc 4.5.2 / glibc 2.13 / binutils 2.18 / kernel 2.6.36).
Platforms without problems:
- Our own ARM board, also running a PTXdist-based Linux (32-bit, 2.6.29.6-rt23), built with the OSELAS arm-v4t cross toolchain (1.99.3) with gcc 4.3.2 / glibc 2.8 / binutils 2.18 / kernel 2.6.27.
- My laptop (Intel Core i7) running 64-bit Ubuntu 11.04 (virtualized / kernel 2.6.38.15-generic), gcc 4.5.2 / eglibc 2.13-0ubuntu13.1 / binutils 2.21.0.20110327.
I searched the web for solutions and came across several patches, which I tried without any effect.
Are we doing something wrong in our code that only shows up on some platforms, or is this a bug in the underlying system? If anyone knows where to look, or knows of any patches or the like, I would be happy to hear about it.
Thanks!