Optimization algorithm # of threads used in the calculation

I perform the operation by giving it a call to CalculateSomeData. CalculateSomeData works in successive "generations", numbered 1..x. The number of generations in the entire run is fixed by the input parameters for CalculateSomeData and is known a priori. One generation takes from 30 minutes to 2 hours. Some of this variability is due to input parameters and cannot be controlled. However, part of this variability is due to such things as hardware capabilities, CPU utilization from other processes, network bandwidth, etc. One parameter that can be controlled over a generation is the number of threads used by CalculateSomeData. Right now, that is fixed and probably not optimal. I would like to track the time that each generation takes, and then has some algorithm with which I adjust the number of threads so that each subsequent generation improves on the time of calculating the previous generation (minimizing the time). Which approach should I use? How applicable are genetic algorithms? Intuition tells me that the range will be pretty tough - from 1 to 16 threads on a dual core processor with four cores.

any pointers, pseudo codes, etc. very much appreciated.

+4
source share
3 answers

How about an evolutionary algorithm.

Start with an assumption. 1 thread per processor core seems good, but depends on the task.

Measure the average time for each task in the generation. Compare it with the time that was adopted by the previous generation. (Assume effectively infinite time and 0 threads to generate 0).

If the tasks of the last generation averaged a better time than the previous one, continue to change the number of threads in the same direction as the last step (so if the last generation had more threads than the previous thread, add a thread for the new generation, but if there was one less then use less (obviously with a lower limit of 1 thread).

If the most recent generation tasks took longer, on average, than the previous generation, then change the number of threads in the opposite direction (therefore, if increasing the number of threads leads to worse time, use another thread next time).

As long as the optimal number of threads is not too close to 1, you will probably end up wobbling between 3 values ​​that are all pretty close to optimal. You can clearly detect this case and block yourself in a central sense if you have a large number of generations to deal with.

+2
source

If the calculations are completely CPU related, the number of threads should be equal to the number of cores on the machine. This way you minimize the number of context switches.

If your calculations are related to I / O, network, synchronization, or something else that blocks execution, you must find the ultimate resource and measure usage. You need to monitor usage and slowly add more threads until usage approaches 100%. You should have as few threads as possible in order to saturate your limiting resource.

+2
source

You must divide your generations into many small tasks and put them in the queue. Create one thread per core and ask each thread to complete the task, run it until completion, and repeat.

You want a lot more tasks than kernels to make sure you don't finish with one task running at the end of a generation, and all other threads are idle. This is what can happen if you set #tasks = #threads = #cores as Albin suggests (unless you can guarantee that all tasks take exactly the same amount of time).

You also probably don't want more threads than cores. Context switching is not very expensive, but the larger cache that comes with more than C # clock activity at the same time can hurt you (if your tasks don't use very little memory).

+1
source

All Articles