Recursive fib with streams, segmentation error?

Question

Recursive fib with streams, segmentation error?

Any ideas why it works great for values of type 0, 1, 2, 3, 4 ... and seg for values of type> 15? #include #include #include

void *fib(void *fibToFind); main(){ pthread_t mainthread; long fibToFind = 15; long finalFib; pthread_create(&mainthread,NULL,fib,(void*) fibToFind); pthread_join(mainthread,(void*)&finalFib); printf("The number is: %d\n",finalFib); } void *fib(void *fibToFind){ long retval; long newFibToFind = ((long)fibToFind); long returnMinusOne; long returnMinustwo; pthread_t minusone; pthread_t minustwo; if(newFibToFind == 0 || newFibToFind == 1) return newFibToFind; else{ long newFibToFind1 = ((long)fibToFind) - 1; long newFibToFind2 = ((long)fibToFind) - 2; pthread_create(&minusone,NULL,fib,(void*) newFibToFind1); pthread_create(&minustwo,NULL,fib,(void*) newFibToFind2); pthread_join(minusone,(void*)&returnMinusOne); pthread_join(minustwo,(void*)&returnMinustwo); return returnMinusOne + returnMinustwo; } }

+2

c pthreads programming-languages fibonacci

cnandreu Feb 23 '11 at 1:22

source share

3 answers

You do not check for errors - in particular from pthread_create() . When pthread_create() fails, the pthread_t variable remains undefined, and subsequent pthread_join() may fail.

If you check for errors, you will find that pthread_create() not working. This is due to the fact that you are trying to create almost 2000 threads - with the default settings, for this you will need to allocate 16 GB of thread stacks.

You should revise your algorithm so that it does not generate as many threads.

0

caf Feb 23 '11 at 1:36

source share

I tried running your code and came across a few surprises:

 printf("The number is: %d\n", finalFib);

This line has a small error: %d means printf expects an int , but a long int is passed. On most platforms, this is the same thing, or will have the same behavior anyway, but pedantically (or if you just want to stop the warning to get up, which is also a very noble ideal), you should use %ld instead, which will expect a long int .

Your fib function, on the other hand, seems dysfunctional. While checking it on my car, it does not fall, but it gives 1047 , which is not a Fibonacci number. If you look closely, it seems that your program is incorrect in several respects:

 void *fib(void *fibToFind) { long retval; // retval is never used long newFibToFind = ((long)fibToFind); long returnMinusOne; // variable is read but never initialized long returnMinustwo; // variable is read but never initialized pthread_t minusone; // variable is never used (?) pthread_t minustwo; // variable is never used if(newFibToFind == 0 || newFibToFind == 1) // you miss a cast here (but you really shouldn't do it this way) return newFibToFind; else{ long newFibToFind1 = ((long)fibToFind) - 1; // variable is never used long newFibToFind2 = ((long)fibToFind) - 2; // variable is never used // reading undefined variables (and missing a cast) return returnMinusOne + returnMinustwo; } }

Always watch for compiler warnings: when you get them, usually you really do something suspicious.

Perhaps you should revise the algorithm a bit: right now, all your function returns the sum of two undefined values, so I got 1047 earlier.

Implementing the Fibonacci package using a recursive algorithm means you need to call the function again. As others have noted, this is a pretty inefficient way to do this, but it's easy, so I think all computer science teachers use it as an example.

A regular recursive algorithm is as follows:

 int fibonacci(int iteration) { if (iteration == 0 || iteration == 1) return 1; return fibonacci(iteration - 1) + fibonacci(iteration - 2); }

I don’t know to what extent you should use threads - just run the algorithm on the secondary thread or create new threads for each call? Suppose to be the first at the moment, as it is much more straightforward.

Integer numbers for pointers and vice versa is bad practice because if you are trying to look at things at a higher level, they should be different. Integers do the math, and pointers resolve memory addresses. This happens because they are represented identically, but in fact you should not do this. Instead, you may notice that the function called to start your new thread takes the void* argument: we can use it to pass both where the input is and where the output will be.

Thus, based on my previous fibonacci function, you can use this code as the main thread program:

 void* fibonacci_offshored(void* pointer) { int* pointer_to_number = pointer; int input = *pointer_to_number; *pointer_to_number = fibonacci(input); return NULL; }

It expects a pointer to an integer and takes its input from it, then writes its output. ¹ Then you would create a thread like this:

 int main() { int value = 15; pthread_t thread; // on input, value should contain the number of iterations; // after the end of the function, it will contain the result of // the fibonacci function int result = pthread_create(&thread, NULL, fibonacci_offshored, &value); // error checking is important! try to crash gracefully at the very least if (result != 0) { perror("pthread_create"); return 1; } if (pthread_join(thread, NULL) { perror("pthread_join"); return 1; } // now, value contains the output of the fibonacci function // (note that value is an int, so just %d is fine) printf("The value is %d\n", value); return 0; }

If you need to call the Fibonacci function from new separate threads (note: this is not what I would recommend, but others seem to agree with me, it just explodes for a sufficiently large number of iterations), you first need to combine the fibonacci function with fibonacci_offshored functions. This will greatly expand it, because dealing with threads is harder than working with regular functions.

 void* threaded_fibonacci(void* pointer) { int* pointer_to_number = pointer; int input = *pointer_to_number; if (input == 0 || input == 1) { *pointer_to_number = 1; return NULL; } // we need one argument per thread int minus_one_number = input - 1; int minus_two_number = input - 2; pthread_t minus_one; pthread_t minus_two; // don't forget to check! especially that in a recursive function where the // recursion set actually grows instead of shrinking, you're bound to fail // at some point if (pthread_create(&minus_one, NULL, threaded_fibonacci, &minus_one_number) != 0) { perror("pthread_create"); *pointer_to_number = 0; return NULL; } if (pthread_create(&minus_two, NULL, threaded_fibonacci, &minus_two_number) != 0) { perror("pthread_create"); *pointer_to_number = 0; return NULL; } if (pthread_join(minus_one, NULL) != 0) { perror("pthread_join"); *pointer_to_number = 0; return NULL; } if (pthread_join(minus_two, NULL) != 0) { perror("pthread_join"); *pointer_to_number = 0; return NULL; } *pointer_to_number = minus_one_number + minus_two_number; return NULL; }

Now that you have this cumbersome function, setting up your main function will be pretty simple: just change the link to fibonacci_offshored to threaded_fibonacci .

 int main() { int value = 15; pthread_t thread; int result = pthread_create(&thread, NULL, threaded_fibonacci, &value); if (result != 0) { perror("pthread_create"); return 1; } pthread_join(thread, NULL); printf("The value is %d\n", value); return 0; }

You may have been told that threads are speeding up parallel processes, but there somewhere is somewhere more expensive to set up a thread than to run its contents. This is a very good example of this situation : the streaming version of the program is much slower than the non-streaming version.

For educational purposes, this program ends with threads on my machine when the number of desired iterations is 18 and takes several seconds. For comparison, using an iterative implementation, we never run out of threads, and we have our answer in milliseconds. It is also much simpler. This would be a great example of how using the best algorithm solves many problems.

Also, out of curiosity, it would be interesting to know if your machine crashes and where / how.

<sub> 1. As a rule, you should try to avoid changing the value of a variable between its value at the input and its value after the function returns. For example, here, as you type, the variable represents the number of iterations we want; the output is the result of the function. These are two very different meanings, and this is not a good practice. I did not want to use dynamic allocations to return a value through the return value of void* .

0

zneak Feb 23 '11 at 4:58

source share

Ira Baxter · Accepted Answer · 2011-02-23T01:29:46+0000

Runs from memory (from stack space) or the correct thread handles?

You are requesting a lot of threads that require a lot of stack / context. Windows (and Linux) has the stupid idea of a "large [contiguous] stack."

From the pthreads_create documentation: "On Linux / x86-32, the default stack size for a new thread is 2 megabytes." If you create 10,000 threads, you need 20 GB of RAM. I built a version of the OP program and it bombed about 3,500 (p) streams on Windows XP64.

See this SO thread for more details on why big stacks are really a bad idea: Why are stack overflow problems still a problem?

If you give up large stacks and implement a parallel language with heap allocation for activation records (our PARLANSE is one of them) the problem goes away.

Here is the first (sequential) program written in PARLANSE:

 (define fibonacci_argument 45) (define fibonacci (lambda(function natural natural )function `Given n, computes nth fibonacci number' (ifthenelse (<= ? 1) ? (+ (fibonacci (-- ?)) (fibonacci (- ? 2)) )+ )ifthenelse )lambda )define

Here's the execution on i7:

  C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonaccisequential Starting Sequential Fibonacci(45)...Runtime: 33.752067 seconds Result: 1134903170

Here's the second one, which is parallel:

 (define coarse_grain_threshold 30) ; technology constant: tune to amortize fork overhead across lots of work (define parallel_fibonacci (lambda (function natural natural )function `Given n, computes nth fibonacci number' (ifthenelse (<= ? coarse_grain_threshold) (fibonacci ?) (let (;; [n natural ] [m natural ] ) (value (|| (= m (parallel_fibonacci (-- ?)) )= (= n (parallel_fibonacci (- ? 2)) )= )|| (+ mn) )value )let )ifthenelse )lambda )define

Creating an explicit parallelism expression simplifies working with programs.

The parallel version we are testing by calling (parallel_fibonacci 45). This is the launch of execution on the same i7 (which, possibly, has 8 processors, but in reality it is 4 processors with a hyperthread, so in fact these are not exactly 8 equivalent processors):

 C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallelcoarse Parallel Coarse-grain Fibonacci(45) with cutoff 30...Runtime: 5.511126 seconds Result: 1134903170

Acceleration of about 6+, not bad for non-8 processors. One of the other answers to this question was in the pthreads version; it took a “few seconds” (explode) to calculate Fib (18), and it's 5.5 seconds for Fib (45). That tells you pthreads is a fundamentally bad way to make a lot of fine grain parallelism, because it really, really tall waybill. (PARLANSE is designed to minimize overhead).

Here's what happens if you set the constant technology to zero (forks on every call to fib):

 C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run fibonacciparallel Starting Parallel Fibonacci(45)...Runtime: 15.578779 seconds Result: 1134903170

You can see that the depreciation overhead is a good idea, even if you have fast forks.

Fib (45) produces many grains. Distributing a bunch of activation records solves the first-order OP problem (thousands of pthreads each with 1 MB of stack burn gigabytes of RAM).

But there is a second-order problem: 2 ^ 45 PARLANSE Grains also burn your entire memory simply by tracking the grain, even if your grain control unit is tiny. Thus, it helps to have a scheduler that throttles the forks when you have a lot, (for some definition of a lot, much less than 2 ^ 45) grains to prevent a parallelism explosion from soaking the machine using grain tracking data structures. He needs to unfurl the forks when the number of grains falls below the threshold also to make sure that there is always a lot of logical parallel operation for physical processors.

Recursive fib with streams, segmentation error?

More articles: