Using click.progressbar with multiprocessing in Python

I have a huge list that I need to process, which takes some time, so I divide it into 4 parts and process each part in its own process with some function. It still takes a while even with 4 cores, so I decided to add a progress indicator that shows how far each process has gotten through its part of the list.

My dream was to have something like this:

    erasing close atoms, cpu0 [######..............................] 13%
    erasing close atoms, cpu1 [#######.............................] 15%
    erasing close atoms, cpu2 [######..............................] 13%
    erasing close atoms, cpu3 [######..............................] 14%

Here, each bar would advance as the loop inside the function progresses. Instead, I get a continuous stream of progress bar lines that fills my terminal window.

Here is the main Python script that calls the function:

    from eraseCloseAtoms import *
    from readPDB import *
    import multiprocessing as mp
    from vectorCalc import *

    prot, cell = readPDB('file')
    atoms = vectorCalc(cell)

    output = mp.Queue()

    # setup mp to erase grid atoms that are too close to the protein (dmin = 2.5A)
    cpuNum = 4
    tasks = len(atoms)
    rangeSet = [tasks / cpuNum for i in range(cpuNum)]
    for i in range(tasks % cpuNum):
        rangeSet[i] += 1

    rangeSet = np.array(rangeSet)

    processes = []
    for c in range(cpuNum):
        na, nb = (int(np.sum(rangeSet[:c] + 1)), int(np.sum(rangeSet[:c + 1])))
        processes.append(mp.Process(target=eraseCloseAtoms, args=(prot, atoms[na:nb], cell, 2.7, 2.5, output)))

    for p in processes:
        p.start()

    results = [output.get() for p in processes]

    for p in processes:
        p.join()

    atomsNew = results[0] + results[1] + results[2] + results[3]

The following is the eraseCloseAtoms() function:

    import numpy as np
    import click

    def eraseCloseAtoms(protein, atoms, cell, spacing=2, dmin=1.4, output=None):
        print 'just need to erase close atoms'

        if dmin > spacing:
            print 'the spacing needs to be larger than dmin'
            return

        grid = [int(cell[0] / spacing), int(cell[1] / spacing), int(cell[2] / spacing)]

        selected = list(atoms)
        with click.progressbar(length=len(atoms), label='erasing close atoms') as bar:
            for i, atom in enumerate(atoms):
                bar.update(i)
                erased = False
                coord = np.array(atom[6])

                for ix in [-1, 0, 1]:
                    if erased:
                        break
                    for iy in [-1, 0, 1]:
                        if erased:
                            break
                        for iz in [-1, 0, 1]:
                            if erased:
                                break
                            for j in protein:
                                protCoord = np.array(protein[int(j)][6])
                                trueDist = getMinDist(protCoord, coord, cell, vectors)
                                if trueDist <= dmin:
                                    selected.remove(atom)
                                    erased = True
                                    break

        if output is None:
            return selected
        else:
            output.put(selected)
+6
4 answers

I see two problems in your code.

The first explains why your progress bars often show 100% rather than their real progress. You call bar.update(i), which advances the bar by i steps, when I think you want to advance it by one step per iteration. A better approach would be to pass the iterable to the progressbar function and let it update automatically:

    with click.progressbar(atoms, label='erasing close atoms') as bar:
        for atom in bar:
            erased = False
            coord = np.array(atom[6])
            # ...
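To see why update(i) fills the bar so early, here is a minimal sketch that mimics the bar's running total without using click at all. It assumes that update(n) adds n steps to the current position; the helper name iterations_until_full and the length of 1000 are made up for illustration.

```python
def iterations_until_full(length, step_for):
    """Count loop iterations until the accumulated steps reach `length`.

    `step_for(i)` is the value that would be passed to bar.update()
    on iteration i.
    """
    position = 0
    for i in range(length):
        position += step_for(i)  # update(n) is assumed to advance by n steps
        if position >= length:
            return i + 1
    return length


# Pattern from the question: bar.update(i)
print(iterations_until_full(1000, lambda i: i))  # 46
# One step per item: bar.update(1)
print(iterations_until_full(1000, lambda i: 1))  # 1000
```

With 1000 items, the update(i) pattern reports a full bar after only 46 of them, which matches the bars jumping to 100% almost immediately.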

However, this still will not work when several processes iterate at the same time, each with its own progress bar, because of the second problem with your code. The documentation for click.progressbar states the following restriction:

No printing must happen or the progress bar will be unintentionally destroyed.

This means that whenever one of your progress indicators updates itself, it breaks all other active progress indicators.

I do not think this is easy to fix. Interactively updating multi-line console output is quite difficult (you basically need to use curses or a similar "console GUI" library that supports your OS). The click module does not have this capability; it can only update the current line. Probably your best hope is to extend the click.progressbar design to display multiple bars in columns on a single line, for example:

 CPU1: [###### ] 52% CPU2: [### ] 30% CPU3: [######## ] 84% 

This will require a non-trivial amount of code to make it work (especially when updates come from several processes), but this is not entirely impractical.
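The column layout described above could be sketched with the standard library alone, assuming each worker reports finished items through a multiprocessing.Queue and only the parent process writes to the terminal. The names render_line, worker, and main, the chunk sizes, and the bar width are all made up for illustration; this is not click functionality.

```python
import multiprocessing as mp
import sys


def render_line(done, totals, width=10):
    """Format one column per worker on a single line."""
    columns = []
    for cpu, (n, total) in enumerate(zip(done, totals)):
        filled = width * n // total if total else width
        percent = 100 * n // total if total else 100
        bar = "#" * filled + "." * (width - filled)
        columns.append("cpu%d [%s] %3d%%" % (cpu, bar, percent))
    return "  ".join(columns)


def worker(cpu, items, progress):
    """Hypothetical stand-in for the real per-item work."""
    for _ in items:
        progress.put((cpu, 1))   # report one finished item
    progress.put((cpu, None))    # sentinel: this worker is finished


def main():
    chunks = [range(50), range(40), range(30), range(20)]  # made-up workloads
    totals = [len(chunk) for chunk in chunks]
    done = [0] * len(chunks)
    progress = mp.Queue()
    procs = [mp.Process(target=worker, args=(cpu, chunk, progress))
             for cpu, chunk in enumerate(chunks)]
    for p in procs:
        p.start()
    running = len(procs)
    while running:
        cpu, step = progress.get()
        if step is None:
            running -= 1
        else:
            done[cpu] += step
        # Only the parent prints, and it redraws the same line each time.
        sys.stdout.write("\r" + render_line(done, totals))
        sys.stdout.flush()
    for p in procs:
        p.join()
    sys.stdout.write("\n")
    return done


if __name__ == "__main__":
    main()
```

Because the workers never print, nothing can interleave with the bar; the cost is one queue message per item, which could be batched if the per-item work is very cheap.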

+5

The accepted answer says that this is not possible with click, and that it would require "a non-trivial amount of code to make it work."

While this is true, there is another module that offers this functionality out of the box: tqdm (https://github.com/tqdm/tqdm), which does what you need.

You can create nested progress bars, as described in the docs: https://github.com/tqdm/tqdm#nested-progress-bars
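As a rough sketch of how this might look with one bar per process, the snippet below uses tqdm's position argument to pin each bar to its own terminal line and shares a lock between the workers so their output does not collide. It assumes tqdm is installed; process_chunk, main, and the chunk data are hypothetical placeholders for the real per-atom work.

```python
import multiprocessing as mp

from tqdm import tqdm


def process_chunk(job):
    """Process one chunk, drawing its bar on terminal line `cpu`."""
    cpu, chunk = job
    handled = 0
    # position=cpu keeps each worker's bar on a separate line
    for _ in tqdm(chunk, desc="erasing close atoms, cpu%d" % cpu, position=cpu):
        handled += 1  # placeholder for the real per-item work
    return handled


def main():
    chunks = [list(range(1000)) for _ in range(4)]  # made-up data
    # Share one lock so the workers take turns writing to the terminal.
    tqdm.set_lock(mp.RLock())
    pool = mp.Pool(4, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),))
    try:
        return pool.map(process_chunk, list(enumerate(chunks)))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    main()
```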

+4

This may not match your dream, but you can use imap_unordered with click.progressbar to integrate with multiprocessing.

    import multiprocessing as mp
    import click
    import time

    def proc(arg):
        time.sleep(arg)
        return True

    def main():
        p = mp.Pool(4)
        args = range(4)
        # results arrive in completion order, one per finished task
        results = p.imap_unordered(proc, args)
        with click.progressbar(results, length=len(args)) as bar:
            for result in bar:
                pass

    if __name__ == '__main__':
        main()
0

Something like this will work if you are fine with a single progress bar:

    import click
    import threading
    import numpy as np

    reallybiglist = []
    numthreads = 4

    def myfunc(listportion, bar):
        for item in listportion:
            # do a thing
            bar.update(1)

    with click.progressbar(length=len(reallybiglist), show_pos=True) as bar:
        threads = []
        for listportion in np.split(reallybiglist, numthreads):
            thread = threading.Thread(target=myfunc, args=(listportion, bar))
            thread.start()
            threads.append(thread)

        for thread in threads:
            thread.join()
0