The most efficient way to use multiprocessing is to divide the work into n equal-sized chunks, where n is the pool size, which should be approximately the number of cores on your system. The reason is that the overhead of starting subprocesses and communicating between them is quite large. If each piece of work is small compared to the number of pieces, then the IPC overhead becomes significant.
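To make that cost concrete, here is a minimal sketch (the function and the numbers are my own, chosen only for illustration) using Pool.map, whose chunksize argument controls how many items are sent to a worker per IPC round trip; with a trivially cheap task, a chunk size of 1 spends most of its time on IPC, while a large chunk size amortizes that cost.

    from multiprocessing import Pool, cpu_count

    def is_even(n):
        # deliberately tiny task: the computation is far cheaper than the IPC
        return n % 2 == 0

    if __name__ == "__main__":
        pool = Pool(cpu_count())              # pool size ~ number of cores
        numbers = range(1000000)

        # one item per message: the IPC overhead dominates
        results_slow = pool.map(is_even, numbers, chunksize=1)

        # thousands of items per message: the overhead is amortized
        results_fast = pool.map(is_even, numbers, chunksize=10000)

        pool.close()
        pool.join()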
In your case, you are asking multiprocessing to process each candidate number individually. A better way to deal with the problem is to give each worker a range of values (really just a start and an end value) and have it return all the primes it found in that range.
In the case of identifying primes, the work done grows with the starting value, so you probably do not want to divide the total range into exactly n chunks, but rather into n * k equal chunks, where k is some reasonable small number, say 10 to 100. That way, when some workers finish before others, there is still work left to hand out, and it can be balanced efficiently across all workers.
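As a rough sketch of that chunking arithmetic (the helper name and the oversubscription factor k are my own, not from the question), the chunk boundaries could be computed like this:

    from multiprocessing import cpu_count

    def make_chunks(start, stop, k=20):
        """Split [start, stop) into roughly n * k equal (start, stop) pairs,
        where n is the number of cores, so idle workers always have more to do."""
        n = cpu_count()
        step = max(1, (stop - start) // (n * k))
        return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]

    # e.g. on a 4-core machine this yields about 80 ranges of 12500 values each
    chunks = make_chunks(1000000, 2000000)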
Edit: here is an improved example to show what that would look like. I have changed as little as possible so that you can compare apples to apples.
    #!/user/bin/python.exe

    import math
    from multiprocessing import Pool

    global primes
    primes = set()

    def log(result):
        global primes
        if result:
            # since the result is a batch of primes, we have to use
            # update instead of add (or for a list, extend instead of append)
            primes.update(result)

    def isPrime(n):
        if n < 2:
            return False
        if n == 2:
            return True, n
        max = int(math.ceil(math.sqrt(n)))
        i = 2
        while i <= max:
            if n % i == 0:
                return False
            i += 1
        return True, n

    def isPrimeWorker(start, stop):
        """ find a batch of primes """
        primes = set()
        for i in xrange(start, stop):
            if isPrime(i):
                primes.add(i)
        return primes

    def main():
        global primes
        pool = Pool()
        # pick an arbitrary chunk size, this will give us 100 different
        # chunks, but another value might be optimal
        step = 10000
        # use xrange instead of range, we don't actually need a list, just
        # the values in that range.
        for i in xrange(1000000, 2000000, step):
            # call the *worker* function with start and stop values.
            pool.apply_async(isPrimeWorker, (i, i + step), callback=log)
        pool.close()
        pool.join()
        print sum(primes)
        return

    if __name__ == "__main__":
        main()
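Note that the example above is Python 2 (xrange, the print statement, apply_async with a callback). As a hedged sketch only, the same approach in Python 3 might look like the following; the names are mine, the results are merged with Pool.map instead of a callback, and math.isqrt needs Python 3.8 or later.

    import math
    from multiprocessing import Pool

    def is_prime(n):
        if n < 2:
            return False
        for i in range(2, math.isqrt(n) + 1):   # math.isqrt: Python 3.8+
            if n % i == 0:
                return False
        return True

    def prime_worker(bounds):
        # each worker receives a (start, stop) pair and returns a batch of primes
        start, stop = bounds
        return {i for i in range(start, stop) if is_prime(i)}

    if __name__ == "__main__":
        step = 10000    # 100 chunks over the range below
        chunks = [(i, i + step) for i in range(1000000, 2000000, step)]
        with Pool() as pool:
            batches = pool.map(prime_worker, chunks)
        primes = set().union(*batches)
        print(sum(primes))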
SingleNegationElimination