Finally it became clear what the problem was. On RHEL 5.4, if we call sem_init and then sem_timedwait, the timed wait behaves somewhat randomly, depending on where the code is located, whether the sem_t object lives on the heap or the stack, etc. Sometimes the timed wait returns immediately with errno = 38 (ENOSYS); sometimes it waits correctly before returning.
Running through valgrind gives this error:
==32459== Thread 2:
==32459== Syscall param futex(op) contains uninitialised byte(s)
==32459==    at 0x406C78: sem_timedwait (in /lib/libpthread-2.5.so)
==32459==    by 0x8049F2E: TestThread::Run() (in /home/stsadm/semaphore_test/semaphore_test)
==32459==    by 0x44B2307: nxThread::_ThreadProc(void*) (in /home/stsadm/semaphore_test/libcore.so)
==32459==    by 0x4005AA: start_thread (in /lib/libpthread-2.5.so)
==32459==    by 0x355CFD: clone (in /lib/libc-2.5.so)
If I run the exact same code in RHEL 5.2, the problem disappears and valgrind does not report errors.
If I do a memset on the sem_t variable before calling sem_init, the problem disappears on RHEL 5.4:
memset( &_semaphore, 0, sizeof( sem_t ) );
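The workaround above can be wrapped up so it is applied consistently. This is only a minimal sketch: the helper names init_semaphore_zeroed and wait_with_timeout are hypothetical and not part of the original code, and it assumes the semaphore is shared between threads of one process only (pshared = 0).

```cpp
#include <semaphore.h>
#include <string.h>
#include <errno.h>
#include <time.h>

// Hypothetical helper (not in the original code): zero the sem_t before
// handing it to sem_init, as a defensive workaround for the behavior above.
int init_semaphore_zeroed( sem_t* sem, unsigned int value )
{
    memset( sem, 0, sizeof( sem_t ) );   // clear any stale stack/heap bytes
    return sem_init( sem, 0, value );    // pshared = 0: threads of this process only
}

// Hypothetical helper: wait on the semaphore for at most `seconds` seconds.
// Returns 0 on success, otherwise the errno value set by the failing call.
int wait_with_timeout( sem_t* sem, int seconds )
{
    timespec ts;
    if ( clock_gettime( CLOCK_REALTIME, &ts ) != 0 )
        return errno;
    ts.tv_sec += seconds;                // sem_timedwait takes an absolute deadline

    if ( sem_timedwait( sem, &ts ) == 0 )
        return 0;
    return errno;                        // e.g. ETIMEDOUT when the deadline passes
}
```

With these, a caller would use init_semaphore_zeroed( &_semaphore, 0 ) in place of the bare sem_init call, and treat any return value from wait_with_timeout other than 0 or ETIMEDOUT as suspicious.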
So it looks like a regression was introduced in RHEL 5.4, either in the semaphore implementation or in something it uses internally, such that sem_init does not correctly initialize the sem_t memory. Or sem_timedwait has changed to be sensitive to this in a way it was not before.
Interestingly, in no case does sem_init return an error to indicate that it did not work.
Alternatively, if the expected behavior is that sem_init does not initialize the memory of the sem_t and that this is up to the caller, then the behavior has certainly changed with RHEL 5.4.
pxb
Update - here is the test code in case someone else wants to try it. Note that the problem only occurs when sem_timedwait is called from a .so, only on RHEL 5.4 (maybe 5.3 as well; I did not test it), and only when built as a 32-bit binary (linked against the 32-bit libraries, of course).
1) in semtest.cpp
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>

// Test function that ends up inside the shared library (libsemtest.so).
void semtest( int semnum, bool initmem )
{
    sem_t sem;
    if ( initmem )
    {
        // Optional workaround: zero the sem_t before sem_init.
        memset( &sem, 0, sizeof( sem_t ) );
        printf( "sem %d: memset size = %d\n", semnum, (int)sizeof( sem_t ) );
    }

    errno = 0;
    int res = sem_init( &sem, 0, 0 );
    printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );

    // Wait for up to one second on a semaphore that is never posted.
    timespec ts;
    clock_gettime( CLOCK_REALTIME, &ts );
    ts.tv_sec += 1;

    errno = 0;
    res = sem_timedwait( &sem, &ts );
    printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}
2) in main.cpp (note the duplicated test function, so that we can compare the behavior inside the .so with the behavior inside the exe)
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>

// Defined in semtest.cpp and linked in from libsemtest.so.
extern void semtest( int semnum, bool initmem );

// Same test as semtest(), but compiled into the executable itself.
void semtest_in_exe( int semnum, bool initmem )
{
    sem_t sem;
    if ( initmem )
    {
        memset( &sem, 0, sizeof( sem_t ) );
        printf( "sem %d: memset size = %d\n", semnum, (int)sizeof( sem_t ) );
    }

    errno = 0;
    int res = sem_init( &sem, 0, 0 );
    printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );

    timespec ts;
    clock_gettime( CLOCK_REALTIME, &ts );
    ts.tv_sec += 1;

    errno = 0;
    res = sem_timedwait( &sem, &ts );
    printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}

int main(int argc, char* argv[], char** envp)
{
    semtest( 1, false );
    semtest( 2, true );
    semtest_in_exe( 3, false );
    semtest_in_exe( 4, true );
}
3) makefile here
all: main

semtest.o: semtest.cpp
	gcc -c -fpic -m32 -I /usr/include/c++/4.1.2 -I /usr/include/c++/4.1.2/i386-redhat-linux semtest.cpp -o semtest.o

libsemtest.so: semtest.o
	gcc -shared -m32 -fpic -lstdc++ -lrt semtest.o -o libsemtest.so

main: libsemtest.so
	gcc -m32 -L . -lsemtest main.cpp -o semtest
The four test cases, in order, are:
- executed from inside the .so without doing the memset
- executed from inside the .so with the memset
- executed from inside the exe without doing the memset
- executed from inside the exe with the memset
And here is the result of running on RHEL 5.4:
sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 38

sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110

sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110

sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110
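You can see that case 1 returns immediately with errno = 38 (ENOSYS), while the other three cases time out with errno = 110 (ETIMEDOUT), which is the expected result for a semaphore that is never posted.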
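For anyone reading the raw numbers: on x86 Linux, errno 38 is ENOSYS and errno 110 is ETIMEDOUT. A small sketch for printing the symbolic name next to the number is below. The helpers errno_name and print_errno are hypothetical additions for illustration, not part of the test program above, and they only cover the values that appear in this output.

```cpp
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: map the errno values seen in the test output to
// their symbolic names. Anything else is reported as "other".
const char* errno_name( int err )
{
    switch ( err )
    {
        case 0:         return "OK";
        case ENOSYS:    return "ENOSYS";     // the unexpected result in case 1
        case ETIMEDOUT: return "ETIMEDOUT";  // the expected result: wait timed out
        default:        return "other";
    }
}

// Hypothetical helper: print the number, the symbolic name, and the
// system's own description of the error.
void print_errno( int err )
{
    printf( "errno = %d (%s): %s\n", err, errno_name( err ), strerror( err ) );
}
```

Calling print_errno( errno ) right after the sem_timedwait line in the test functions would make a bad run easier to spot at a glance.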
If we run the same code in RHEL5.2, we get the following:
sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 110

sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110

sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110

sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110
You can see that all cases now work as expected!