I have my own thread pool, which creates a number of threads, each of which waits on its own event (signal). When a new job is added to the thread pool, the pool wakes up the first free thread so that it executes the job.
The problem is this: I have about 1000 loops of 10,000 iterations each. These loops must be executed sequentially, but I have 4 CPUs available. What I'm trying to do is split each 10,000-iteration loop into four 2,500-iteration loops, i.e. one per thread. But I have to wait for all 4 small loops to finish before proceeding to the next "big" iteration. This means that I cannot bundle the jobs together.
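To make the shape of the workload concrete, here is a minimal sketch of one "big" iteration. It is not my actual pool: it assumes standard C++ threads (`std::thread`) instead of the Win32 calls, it creates fresh threads on every call instead of reusing pooled ones, and `sumSlice` and `bigIteration` are made-up names with a placeholder loop body.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real loop body: sums one slice of the input.
double sumSlice(const std::vector<double>& in, std::size_t begin, std::size_t end) {
    double acc = 0.0;
    for (std::size_t i = begin; i < end; ++i) acc += in[i];
    return acc;
}

// One "big" iteration: split the range into nbThreads small loops, run them
// in parallel, and wait for all of them before returning.
double bigIteration(const std::vector<double>& in, unsigned nbThreads) {
    std::vector<double> partial(nbThreads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t n = in.size();
    for (unsigned t = 0; t < nbThreads; ++t) {
        const std::size_t begin = n * t / nbThreads;
        const std::size_t end = n * (t + 1) / nbThreads;
        // Each worker writes only its own slot, so no lock is needed here.
        workers.emplace_back([&in, &partial, t, begin, end] {
            partial[t] = sumSlice(in, begin, end);
        });
    }
    // This join is the synchronization point between two big iterations.
    for (std::thread& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

The caller would invoke `bigIteration` about 1000 times in a row, so whatever it costs to wake the workers and wait for them is paid once per big iteration.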
My problem is that using the thread pool with 4 threads is much slower than running the jobs sequentially (even a single loop executed by a separate thread is much slower than running it directly in the main thread).
I am on Windows, so I create the events with CreateEvent(), and each thread then waits on its two events with WaitForMultipleObjects(2, handles, false, INFINITE) until the main thread calls SetEvent().
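For readers who do not have the Win32 API at hand, the wake-and-wait round trip can be sketched with standard C++ primitives. This is only an analogue under stated assumptions: `Signal` is a hypothetical class built on `std::mutex` and `std::condition_variable` that mimics an auto-reset event, and `roundTrips` is a made-up helper; neither is part of my pool.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical portable analogue of an auto-reset event (not the Win32 API).
class Signal {
public:
    // Mark the signal as set and wake one waiter.
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            flag_ = true;
        }
        cv_.notify_one();
    }
    // Block until the signal is set, then reset it.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return flag_; });
        flag_ = false;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool flag_ = false;
};

// The main thread wakes one worker `count` times and waits for it each time.
// Returns the number of (empty) jobs the worker executed.
int roundTrips(int count) {
    Signal wake, done;
    int executed = 0;
    std::thread worker([&] {
        for (int i = 0; i < count; ++i) {
            wake.wait();   // sleep until the main thread signals
            ++executed;    // placeholder for the real job
            done.set();    // report completion
        }
    });
    for (int i = 0; i < count; ++i) {
        wake.set();
        done.wait();
    }
    worker.join();
    return executed;
}
```

Timing a call such as `roundTrips(1000)` with `std::chrono::steady_clock` would give a rough cost per wake-up when the job itself is empty, which is the overhead I suspect dominates in my case.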
It seems that this whole mechanism (together with the thread synchronization using critical sections) is pretty expensive!
My question is: is it normal that using events takes so much time? If so, is there another, less expensive mechanism that I could use instead?
Here is some code to illustrate (some relevant parts copied from the thread pool class):
    // thread function
    unsigned __stdcall ThreadPool::threadFunction(void* params) {
        // some housekeeping
        HANDLE signals[2];
        signals[0] = waitSignal;
        signals[1] = endSignal;

        do {
            // wait for one of the signals
            waitResult = WaitForMultipleObjects(2, signals, false, INFINITE);

            // try to get the next job parameters;
            if (tp->getNextJob(threadId, data)) {
                // execute job
                void* output = jobFunc(data.params);

                // tell thread pool that we're done and collect output
                tp->collectOutput(data.ID, output);
            }

            tp->threadDone(threadId);
        } while (waitResult - WAIT_OBJECT_0 == 0);

        // if we reach this point, endSignal was sent, so we are done !
        return 0;
    }

    // create all threads
    for (int i = 0; i < nbThreads; ++i) {
        threadData data;
        unsigned int threadId = 0;
        char eventName[20];

        sprintf_s(eventName, 20, "WaitSignal_%d", i);

        data.handle = (HANDLE) _beginthreadex(NULL, 0, ThreadPool::threadFunction,
                                              this, CREATE_SUSPENDED, &threadId);
        data.threadId = threadId;
        data.busy = false;
        data.waitSignal = CreateEvent(NULL, true, false, eventName);

        this->threads[threadId] = data;

        // start thread
        ResumeThread(data.handle);
    }

    // add job
    void ThreadPool::addJob(int jobId, void* params) {
        // housekeeping
        EnterCriticalSection(&(this->mutex));

        // first, insert parameters in the list
        this->jobs.push_back(job);

        // then, find the first free thread and wake it
        for (it = this->threads.begin(); it != this->threads.end(); ++it) {
            thread = (threadData) it->second;

            if (!thread.busy) {
                this->threads[thread.threadId].busy = true;
                ++(this->nbActiveThreads);

                // wake thread such that it gets the next params and runs them
                SetEvent(thread.waitSignal);
                break;
            }
        }

        LeaveCriticalSection(&(this->mutex));
    }