How about an evolutionary algorithm.
Start with an assumption. 1 thread per processor core seems good, but depends on the task.
Measure the average time for each task in the generation. Compare it with the time that was adopted by the previous generation. (Assume effectively infinite time and 0 threads to generate 0).
If the tasks of the last generation averaged a better time than the previous one, continue to change the number of threads in the same direction as the last step (so if the last generation had more threads than the previous thread, add a thread for the new generation, but if there was one less then use less (obviously with a lower limit of 1 thread).
If the most recent generation tasks took longer, on average, than the previous generation, then change the number of threads in the opposite direction (therefore, if increasing the number of threads leads to worse time, use another thread next time).
As long as the optimal number of threads is not too close to 1, you will probably end up wobbling between 3 values that are all pretty close to optimal. You can clearly detect this case and block yourself in a central sense if you have a large number of generations to deal with.
source share