Working in different directories (os.chdir) at the same time (parallel threads)

I want to synchronize all my VCS directories in parallel. Currently I cd into each directory and run command-line scripts to synchronize the Git or Mercurial repositories. This is slow, so I want to parallelize it.

The problem is that my parallel threads fight over the "current directory" (the working directory belongs to the whole process, not to a single thread), so I need some trick to work in different directories at the same time.
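Here is a minimal, self-contained demonstration of the conflict (temporary directories stand in for the repositories, and the `worker` helper is just for illustration): whichever thread calls os.chdir last moves every other thread along with it.

    import os
    import tempfile
    import threading
    import time

    def worker(path, results):
        os.chdir(path)              # changes the *process-wide* working directory
        time.sleep(0.5)             # give the other thread time to chdir too
        results.append((path, os.getcwd()))

    if __name__ == "__main__":
        dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
        results = []
        threads = [threading.Thread(target=worker, args=(d, results)) for d in dirs]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        for intended, actual in results:
            print(intended, "->", actual)   # both threads end up reporting the same directory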

Current solution:

    def syncrepos(repos):
        for r in repos.split("\n"):
            if r:
                print("------ repository: ", r)
                thrd = ThreadingSync(r)
                thrd.setDaemon(True)
                thrd.start()

where ThreadingSync is:

    class ThreadingSync(threading.Thread):
        def __init__(self, repo):
            threading.Thread.__init__(self)
            self.repo = repo

        def run(self):
            r = self.repo.split("-t")
            path = (r[0]).strip()
            if len(r) < 2:
                vcs = VCS.git
            else:
                vcs = {'git':     VCS.git,
                       'git git': VCS.git_git,
                       'git hg':  VCS.git_mercurial,
                       'git svn': VCS.git_subversion,
                       'git vv':  VCS.git_veracity,
                       'hg hg':   VCS.hg_hg}[(r[1]).strip()]
            os.chdir(path)
            if vcs == VCS.git:
                checkGitModifications()
                gitSync()
            # ... etc

and gitSync is

    def gitSync():
        pretty(cmd("git pull origin master"))
        pretty(cmd("git fetch upstream master"))
        pretty(cmd("git pull --rebase upstream master"))
        pretty(cmd("git push -f origin master"))
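(`cmd` and `pretty` are small helpers whose code is not shown here; for context, a rough sketch of what they might look like, assuming `cmd` simply shells out and returns the combined output:)

    import subprocess

    def cmd(command):
        # Hypothetical helper: run a shell command in the current working
        # directory and return its combined stdout/stderr as text.
        proc = subprocess.Popen(command, shell=True,
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        out, _ = proc.communicate()
        return out.decode("utf-8", "replace")

    def pretty(output):
        # Hypothetical helper: indent command output so different
        # repositories are easy to tell apart in the combined log.
        for line in output.splitlines():
            print("    " + line)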

Of course this is not ideal, but it gets the job done, and I just want to speed it up.

How can I create one subprocess per repository/directory (i.e., a thread-safe way to do what os.chdir does)?

1 answer

Create a pool of workers to run your routine:

http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

In your case, maybe something like:

    from multiprocessing import Pool
    import os

    def gitSync(repo):
        print "I am", repo, "and my cwd is:", os.getcwd()
        os.chdir(repo)
        print "I am", repo, "and my cwd is:", os.getcwd()

    if __name__ == '__main__':
        dir = os.getcwd()
        repos = [item for item in os.listdir(dir)
                 if os.path.isdir(os.path.join(dir, item))]
        print repos
        pool = Pool(maxtasksperchild=1)
        pool.map(gitSync, repos)
        pool.close()
        pool.join()

Please note that the pool can make debugging a bit harder, as the parent usually doesn't report much more than "one of my children died", so start it with a single worker first.

Edit: Well, that was interesting to test. Note the new maxtasksperchild=1 argument to the pool: worker processes are not restarted between calls, so if you change the directory in one call, you are still in that directory when the process is reused for the next one. Here I solved it simply by telling the pool to discard each process after a single task.

    john:captcrunch john$ python foo.py
    ['.git', '.idea', 'fixtures', 'lib', 'obj', 'raw', 'tests']
    I am .git and my cwd is: /Users/john/code/linz/src/captcrunch
    I am .git and my cwd is: /Users/john/code/linz/src/captcrunch/.git
    I am .idea and my cwd is: /Users/john/code/linz/src/captcrunch
    I am .idea and my cwd is: /Users/john/code/linz/src/captcrunch/.idea
    I am fixtures and my cwd is: /Users/john/code/linz/src/captcrunch
    I am fixtures and my cwd is: /Users/john/code/linz/src/captcrunch/fixtures
    I am lib and my cwd is: /Users/john/code/linz/src/captcrunch
    I am lib and my cwd is: /Users/john/code/linz/src/captcrunch/lib
    I am obj and my cwd is: /Users/john/code/linz/src/captcrunch
    I am obj and my cwd is: /Users/john/code/linz/src/captcrunch/obj
    I am raw and my cwd is: /Users/john/code/linz/src/captcrunch
    I am raw and my cwd is: /Users/john/code/linz/src/captcrunch/raw
    I am tests and my cwd is: /Users/john/code/linz/src/captcrunch
    I am tests and my cwd is: /Users/john/code/linz/src/captcrunch/tests
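As a follow-up sketch (not part of the answer above): if the git commands are run through subprocess with an explicit cwd argument, the workers never have to call os.chdir at all, so maxtasksperchild=1 is no longer needed for correctness. The command list is copied from the question's gitSync; the origin/upstream remote names and the lack of error handling are assumptions.

    from multiprocessing import Pool
    import os
    import subprocess

    def git_sync(repo_path):
        # Run every command with an explicit cwd instead of os.chdir, so the
        # worker's (process-wide) working directory never changes.
        commands = [
            "git pull origin master",
            "git fetch upstream master",
            "git pull --rebase upstream master",
            "git push -f origin master",
        ]
        for command in commands:
            print("[%s] %s" % (repo_path, command))
            subprocess.call(command, shell=True, cwd=repo_path)

    if __name__ == '__main__':
        base = os.getcwd()
        repos = [os.path.join(base, item) for item in os.listdir(base)
                 if os.path.isdir(os.path.join(base, item, ".git"))]
        pool = Pool()                      # no maxtasksperchild needed here
        pool.map(git_sync, repos)
        pool.close()
        pool.join()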