I want to quickly compress a few hundred gigabytes of data with bzip2 on an 8-core workstation with 16 GB of RAM. I am currently using a simple Python script that compresses an entire directory tree with bzip2, calling os.system from inside an os.walk loop.
I see that bzip2 only ever uses a single processor, while the other CPUs stay mostly idle.
I am new to queues, threads, and processes, but I am wondering how I could implement this so that I have four bzip2 processes running (really, I think, four os.system calls), each presumably keeping one CPU busy, which pull files off a queue as they bzip them.
Here is my single-threaded script:
import os
import sys

for roots, dirlist, filelist in os.walk(os.curdir):
    for file in [os.path.join(roots, filegot) for filegot in filelist]:
        if "bz2" not in file:
            print "Compressing %s" % (file)
            os.system("bzip2 %s" % file)
            print ":DONE"
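For reference, here is a rough sketch of the kind of thing I am imagining, using multiprocessing.Pool with four workers (the worker count and the compress helper are just placeholders I made up, and I have not verified this is the right approach):

import os
import subprocess
from multiprocessing import Pool

def compress(path):
    # Each call runs one bzip2 process; bzip2 replaces the file with path.bz2
    subprocess.call(["bzip2", path])
    return path

if __name__ == "__main__":
    # Gather every file that is not already compressed
    targets = []
    for root, dirs, files in os.walk(os.curdir):
        for name in files:
            if not name.endswith(".bz2"):
                targets.append(os.path.join(root, name))

    # Four worker processes, each of which should keep roughly one core busy
    pool = Pool(processes=4)
    for done in pool.imap_unordered(compress, targets):
        print("Compressed %s" % done)
    pool.close()
    pool.join()

Is something along these lines the right direction, or should I be using an explicit Queue with worker threads instead?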