The default per-thread stack size is artificially limiting your test. Although the default stack given to the process (the initial thread) grows dynamically as needed, the stacks for all other threads are fixed in size. The default size is usually very large, something like two megabytes, to make sure the per-thread stacks are large enough even for pathological cases (deep recursion, etc.).
In most cases, worker threads need very little stack. I have found that on all the architectures I use, a 64k (65536 bytes) stack per thread is sufficient, as long as I do not use deeply recursive algorithms or large local variables (structures or arrays).
To explicitly specify the per-thread stack size, change your main() to something like the following:
    #define MAXTHREADS 1000000
    #define THREADSTACK 65536

    int main(int argc, char *argv[])
    {
        /* static: an array this large should not live on main's stack */
        static pthread_t pid[MAXTHREADS];
        pthread_attr_t attrs;
        int err, i;
        int cnt = 0;

        /* The attributes act as a template for every thread created below. */
        pthread_attr_init(&attrs);
        pthread_attr_setstacksize(&attrs, THREADSTACK);

        pthread_mutex_init(&mutex_, NULL);

        for (cnt = 0; cnt < MAXTHREADS; cnt++) {
            err = pthread_create(&pid[cnt], &attrs, (void*)inc_thread_nr, NULL);
            if (err != 0)
                break;
        }

        pthread_attr_destroy(&attrs);

        /* Join only the threads that were actually created. */
        for (i = 0; i < cnt; i++)
            pthread_join(pid[i], NULL);

        pthread_mutex_destroy(&mutex_);

        printf("Maximum number of threads per process is %d (%d)\n", cnt, thread_nr);
        return 0;
    }
Note that attrs is not consumed by the pthread_create() call. Think of the thread attributes as a template that tells pthread_create() how to create threads; they are not attributes handed over to the thread. This trips up many novice pthreads programmers, so it is one of those things you had better get right from the start.
As for the stack size itself, it must be at least PTHREAD_STACK_MIN (16384 on Linux, I think) and divisible by sysconf(_SC_PAGESIZE). Since the page size is a power of two on all architectures, using a large enough power of two should always work.
In addition, I added a fix in there too. Your code tried to join only a nonexistent thread (the one the loop tried to create but failed), whereas you need to join every thread that was actually created (to make sure they have all finished their work).
Further recommended fixes:
Instead of using sleep, use a condition variable. Have each thread wait ( pthread_cond_wait() ) on the condition variable (while holding the mutex), then release the mutex and exit. That way, your main function only needs to broadcast ( pthread_cond_broadcast() ) on the condition variable to tell all the threads that they can now exit; then it can join each of them, and you can be sure that this many threads really were running at the same time. As your code stands now, some threads may have enough time to wake up from the sleep and exit.