Unfortunately, there is no simple answer. The only way to know for sure is to implement and then profile your application.
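As a rough illustration of the "implement and measure" advice, here is a minimal sketch that times the same workload with several pool sizes. The names `simulated_io_task`, `time_run`, and `compare_worker_counts` are hypothetical helpers made up for this example, and the 50 ms sleep is only a stand-in for real work; substitute your own task.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_io_task(_):
    """Hypothetical stand-in for a task that mostly waits (file read, network call)."""
    time.sleep(0.05)


def time_run(task, n_tasks, workers):
    """Return the wall-clock seconds needed to run n_tasks with the given pool size."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes every result, so the timing covers all tasks
        list(pool.map(task, range(n_tasks)))
    return time.perf_counter() - start


def compare_worker_counts(task, n_tasks, candidates):
    """Time the same workload once per candidate pool size."""
    return {workers: time_run(task, n_tasks, workers) for workers in candidates}


if __name__ == "__main__":
    results = compare_worker_counts(simulated_io_task, 20, [1, 2, 4, 8, 20])
    for workers, seconds in results.items():
        print(f"{workers:>2} workers: {seconds:.2f}s")
```

Note that this sketch uses threads, which suits waiting-heavy tasks. In standard CPython, purely CPU-bound Python code generally will not speed up with more threads, so for that case you would likely swap in `ProcessPoolExecutor` and measure again.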
Generally, for maximum throughput, if the jobs are purely CPU-bound, you want one per core. Depending on the type of work, that means either one per hyper-threaded (logical) core or only one per "true" physical core. (If the work is identical for all 20 tasks, then hyper-threading often slows down the overall work ...)
If the tasks do any non-CPU work (for example, reading a file, waiting for something, etc.), then more than one work item per core tends to be much better. In many situations this improves overall throughput, because another task can use the core while one is waiting.
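The rule of thumb above could be written down as a starting point like this. It is only a sketch: `pick_worker_count` is a hypothetical helper, and the `io_multiplier` default of 4 is an arbitrary assumption for illustration, not a recommended value. Treat the result as the first candidate to profile, not as the answer.

```python
import os


def pick_worker_count(cpu_bound, logical_cores=None, io_multiplier=4):
    """Suggest an initial pool size to try before profiling.

    cpu_bound:      True if the tasks do nothing but compute.
    logical_cores:  override for the detected core count (useful for testing).
    io_multiplier:  assumed oversubscription factor for tasks that wait;
                    the default is a placeholder, tune it by measuring.
    """
    # os.cpu_count() reports logical cores, so hyper-threads are included
    cores = logical_cores or os.cpu_count() or 1
    if cpu_bound:
        # Pure computation: one work item per core
        return cores
    # Tasks that block on I/O: more than one work item per core
    return cores * io_multiplier


if __name__ == "__main__":
    print("CPU-bound start:", pick_worker_count(cpu_bound=True))
    print("I/O-bound start:", pick_worker_count(cpu_bound=False))
```

Because `os.cpu_count()` counts logical cores, a machine with hyper-threading reports twice its physical core count. If your CPU-bound work runs slower with hyper-threading, try passing the physical core count as `logical_cores` and compare.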