Strange SEGFAULTS using fprintf

I'm having a very difficult time debugging a multi-threaded C application that I made a few changes to. I have not been able to use GDB to help identify the problem (see below the code for more information).

The following code is from one of the tasks that runs in its own thread. I have stripped out most of the code that follows the point where the problem occurs.

    void tskProcessTenMinuteTables(void *input)
    {
        /* Check the minute as soon as we start. If we're started on a ten min
         * boundary, sleep for one minute. */
        time_t now;
        time_t wakeup;
        struct tm *next_tick_ptr;

        now = time(NULL);
        next_tick_ptr = localtime(&now);

        /* returns a time struct populated w/ next ten min boundary */
        GetNextTenMinBoundary(next_tick_ptr);
        wakeup = mktime(next_tick_ptr);

        sleep(2); /* Without this sleep, the following if() was always true. */

        if(next_tick_ptr->tm_min % 10 == 0)
        {
            fprintf(stderr, "On tenmin boundary on initialization.. task sleeping for 60 seconds.\n");

            /* debug statements to test the cause of segfault. */
            fprintf(stderr, "NOM NOM NOM\n");
            printf( "Test%d\n", 1);
            fprintf(stderr, "Test%d\n", 2); /* <~~~ This statement is the guilty party */

            sleep(60);
        }

        /* Main loop. Every loop besides the tick itself will consist only
         * of a call to time and a comparison of current stamp with wakeup.
         * this should be pretty light on the processing side.
         *
         * Re-implement this as a sleep/awake with a signal in the future. */
        while(1)
        {
            now = time(NULL);
            if( now >= wakeup )
            {
                fprintf(stderr, "Triggered 1.\n");
                fprintf(stderr, "Triggered 2.\n");
                char statement[150];
                fprintf(stderr, "Triggered 3.\n");
                sprintf(statement, "SELECT ten_min_end(%d::int2)",GetTenMinPeriodNumber());
                fprintf(stderr, "Triggered 4.\n");
                DBCallStoredProcedure(statement);
                fprintf(stderr, "Triggered 5.\n");
            }
        }
    }

The cause seems to be calling fprintf with variadic arguments. Calling it with nothing but a format string works, and printf works with or without arguments.

    fprintf(stderr, "Hi #%d.\n", 1);   <~~ segfault
    fprintf(stderr, "Hi #1.\n");       <~~ works
    printf("Hi #%d.\n", 1);            <~~ works
    printf("Hi #1.\n");                <~~ works

When I run it in gdb, I get the following spewage before gdb stops responding. To end the session, I have to kill -9 it.

    $ gdb ir_client
    (gdb) r
    Starting program: /home/ziop/Experimental_IR_Clients/ir-10-20/IR_Client/obj-linux-x86/ir_client
    [Thread debugging using libthread_db enabled]
    [New Thread 0xb7fe5b70 (LWP 32269)]
    [New Thread 0xb7fc4b70 (LWP 32270)]
    (032266 - -1208067216) 20-Oct-2010 10:56:19.59 - IR_Client_ConnectCmdPort - Socket connected.
    [New Thread 0xb7ffdb70 (LWP 32272)]
    (032266 - main thread) 20-Oct-2010 10:56:19.59 - sl_exit - Exiting thread with code 0.
    On tenmin boundary on initialization.. task sleeping for 60 seconds.
    NOM NOM NOM
    Test1

I am new to C, so this may be something obvious. My first thought was that unbuffered output was not thread safe, but fprintf always succeeds when no variable is passed. Pthread funkiness is still my prime suspect. Unfortunately, I am stuck with this architecture.

Thanks in advance.

+4
2 answers

The first step is to try running the function without involving threads. Just write a .c file with a main that does the minimum needed to get ready to start the thread and then, instead of starting it, just calls the function directly. It is easier to debug if you can recreate the problem with just one thread.
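As a rough sketch of that idea, the helper below runs the same function either directly on the caller's stack or on a new thread, so the two cases can be compared. The names task_under_test and run_task are hypothetical stand-ins, not part of the original application, and the stand-in uses the standard pthread start-routine signature rather than the application's own wrapper.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the real task function. */
static void *task_under_test(void *input)
{
    (void)input;
    fprintf(stderr, "Test%d\n", 2);   /* the kind of call that crashed */
    return NULL;
}

/*
 * Run the task either directly on the caller's stack (use_thread == 0)
 * or on a new thread with default attributes (use_thread != 0).
 * Returns 0 on success or a pthread error code.
 */
int run_task(int use_thread)
{
    pthread_t tid;
    int rc;

    if (!use_thread) {
        task_under_test(NULL);        /* no thread involved at all */
        return 0;
    }

    rc = pthread_create(&tid, NULL, task_under_test, NULL);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Calling run_task(0) from main exercises the function with no threads involved. If that works and run_task(1) or the application's own thread wrapper does not, the thread setup becomes the main suspect.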

In addition, if you are using gcc, you should compile with:

 -fstack-protector-all -Wstack-protector -fno-omit-frame-pointer 

in addition to your normal flags (at least until you find the problem). This will help with debugging and may give more warnings at compile time. I assume that you know how the -O flags can affect debugging ability and functionality (especially if you are already doing something wrong or undefined in C code).

When you are in GDB and everything looks locked up, or the program is taking a long time to do something, you can usually press Ctrl+Z to return to the (gdb) prompt without killing the program. This sends a stop signal to the program and lets you interact with GDB again so that you can find out what the program is actually doing.

Edit

We apparently solved the problem in the comment discussion, so I will write up what the problem was.

A quick look at the code did not reveal anything that would cause a segmentation fault (an illegal memory access), and Zypsy (the OP) told me that the function worked fine when called directly from main rather than being run in a separate thread.

Valgrind reported that the thread's stack could not be grown to a particular address. On Linux, the main thread's stack is mapped into the process in such a way that it can grow easily, but that is often not the case for memory allocated for thread stacks.
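To see the two limits side by side, here is a minimal sketch that reads the soft limit for the main thread's stack and the default size a new thread's stack would get. The function names are hypothetical. The values printed depend on the system, so no output is shown.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/resource.h>

/* Default stack size used by pthread_create(..., NULL, ...); 0 on error. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Print the main-thread stack limit next to the per-thread default. */
void print_stack_limits(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("main stack soft limit: unlimited\n");
        else
            printf("main stack soft limit: %llu bytes\n",
                   (unsigned long long)rl.rlim_cur);
    }
    printf("default thread stack size: %zu bytes\n",
           default_thread_stack_size());
}
```

A thread created with explicit attributes gets whatever size those attributes request, which may be far smaller than either number.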

I asked Zypsy (the OP) to insert some code that prints the address of something known to be near the base of the thread's stack ( printf("thread stk = %p\n", &input); ) so that this value could be compared with the address given in the error message. From this I could guess the size of the stack. The result did not suggest that a lot of stack space had been used between the start of the thread function and the failure, but the space also did not seem too small for the code in question (it later turned out that it was too small).
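The address comparison can be sketched roughly as follows. Everything here is illustrative: probe_thread, deep_frame and measure_stack_use are hypothetical names, the 8192-byte buffer merely stands in for whatever a deeper call chain puts on the stack, and the number it produces is only an approximation because the compiler is free to lay out frames as it likes.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Absolute distance in bytes between two addresses. */
static size_t addr_distance(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;
    return (size_t)(x > y ? x - y : y - x);
}

/* Hypothetical deeper frame with a large local buffer. */
static size_t deep_frame(const void *top_marker)
{
    volatile char scratch[8192];

    scratch[0] = 0;                       /* touch both ends so the */
    scratch[sizeof scratch - 1] = 0;      /* buffer really exists   */
    return addr_distance(top_marker, (const void *)scratch);
}

/* Thread entry: report an address near the base of this thread's stack. */
static void *probe_thread(void *input)
{
    size_t *out = input;

    fprintf(stderr, "thread stk = %p\n", (void *)&input);
    *out = deep_frame(&input);
    return NULL;
}

/* Approximate stack bytes between the entry frame and the deep frame;
 * returns 0 if the thread could not be created or joined. */
size_t measure_stack_use(void)
{
    pthread_t tid;
    size_t used = 0;

    if (pthread_create(&tid, NULL, probe_thread, &used) != 0)
        return 0;
    if (pthread_join(tid, NULL) != 0)
        return 0;
    return used;
}
```

In a real investigation, the second address would be the faulting address reported by Valgrind or the debugger rather than a local buffer.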

Since pthread_create lets you either accept the default thread attributes (by passing NULL ) or pass an argument that defines various settings for the thread, I asked whether the code that called pthread_create could be posted so that I could check for any suspicious settings.

After looking at that code (an application-specific wrapper around various pthread_ functions), I saw that some stack-related attributes were in fact being set. I asked the OP to look at the calls to this wrapper and check for anything suspicious in how the stack was allocated (making sure that the size value and the size of the allocated memory actually matched). The OP found that this thread's stack was allocated smaller than the stacks of the other threads. The stack was too small.

+3

Usually, problems like this are caused by memory corruption. Symptoms such as inconsistent segfaults on different lines when you change the code slightly are a classic example.

Try running your program under a tool like valgrind ; you will almost certainly see some illegal memory accesses. Fix them, and I suspect everything will work.

0
