Python - unzip .gz files in parallel

I have several .gz files that add up to 1 TB in total. How can I decompress these files in parallel using Python 2.7? Looping over the files sequentially takes too much time.

I also tried this code:

import glob, gzip, multiprocessing, shutil

# filesFolder is the directory path (with trailing separator) containing the .gz files
filenames = glob.glob(filesFolder + '*.gz')

def uncompress(path):
    with gzip.open(path, 'rb') as src, open(path.rstrip('.gz'), 'wb') as dest:
        shutil.copyfileobj(src, dest)

with multiprocessing.Pool() as pool:
    for _ in pool.imap_unordered(uncompress, filenames, chunksize=1):
        pass

However, I get the following error:

  with multiprocessing.Pool() as pool:

AttributeError: __exit__

Thanks!

1 answer

To use the with construct, the object must provide __enter__ and __exit__ methods. The error says that the Pool class (or a Pool instance) does not have them: multiprocessing.Pool only gained context-manager support in Python 3.3, so it cannot be used in a with statement under Python 2.7. Try this (just remove the with statement):

import glob, gzip, multiprocessing, shutil

filenames = glob.glob('./*.gz')

def uncompress(path):
    # slice off the '.gz' suffix to get the output filename (see EDIT below)
    with gzip.open(path, 'rb') as src, open(path[:-len('.gz')], 'wb') as dest:
        shutil.copyfileobj(src, dest)


for _ in multiprocessing.Pool().imap_unordered(uncompress, filenames, chunksize=1):
    pass
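
If you would rather keep a with statement on Python 2.7, contextlib.closing() can stand in for the missing context-manager support, since it calls pool.close() on exit. A minimal sketch (not part of the original answer), reusing the uncompress and filenames defined above:

import contextlib

with contextlib.closing(multiprocessing.Pool()) as pool:
    for _ in pool.imap_unordered(uncompress, filenames, chunksize=1):
        pass
pool.join()  # closing() has called pool.close(); join() waits for the workers to finish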

EDIT

As @dhke pointed out, path.rstrip('.gz') (as used in the question) does not remove the '.gz' suffix: rstrip() treats its argument as a set of characters and strips any trailing run of '.', 'g' and 'z', so a name like 'log.gz' becomes 'lo'. The code above therefore slices the suffix off with path[:-len('.gz')]; os.path.splitext(path)[0] would also work.
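
For reference, the difference in an interpreter session:

>>> 'log.gz'.rstrip('.gz')        # strips any trailing '.', 'g' or 'z' characters
'lo'
>>> 'log.gz'[:-len('.gz')]        # removes exactly the suffix
'log'
>>> import os
>>> os.path.splitext('log.gz')[0]
'log'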
