Python multithreading / multiprocessing for simultaneous file copy operations

I am writing Python code to copy a number of large files/folders from one place to another on the same machine (no network; everything is local). For this, I use the shutil module.

The problem is that it takes a long time, so I want to speed up the copying process. I tried the threading and multiprocessing modules, but to my surprise, both take longer than the serial code.

Another observation is that the time required increases with the number of processes, even for the same set of folders. Suppose I have the directory structure

/a/a1, /a/a2, /b/b1 and /b/b2 

If I create 2 processes to copy folders /a and /b, it takes 2 minutes. If I instead create 4 processes to copy /a/a1, /a/a2, /b/b1 and /b/b2, it takes about 4 minutes. I have only tried this with multiprocessing, not with threading.

I'm not sure what is going on. Has anyone seen a similar problem? Are there any recommendations for using multiprocessing / threading in Python for this kind of task?

Thanks, Abhijit

1 answer

Your workload is almost certainly I/O-bound, so adding more computational parallelism will not help. As you observed, it likely makes things worse by forcing I/O requests to jump between different threads/processes: the disk has to seek back and forth between files instead of reading and writing each one sequentially. There are practical limits to how fast you can move data on your machine, especially where a physical drive is involved.
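Since the bottleneck is the disk rather than the CPU, a plain serial copy is usually the fastest option. A minimal sketch (the paths here are a self-contained demo, not your real folders):

```python
import os
import shutil
import tempfile

# Build a small demo tree so the example runs anywhere;
# in practice src would be the existing folder on the desktop.
base = tempfile.mkdtemp()
src = os.path.join(base, "a")
os.makedirs(os.path.join(src, "a1"))
with open(os.path.join(src, "a1", "file.txt"), "w") as f:
    f.write("data")

# Serial copy: one sequential pass lets the disk do large
# contiguous reads/writes instead of seeking between jobs.
dst = os.path.join(base, "a_copy")
shutil.copytree(src, dst)
```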

