Python subprocess with gzip

I am trying to pipe data through a subprocess, gzip it, and write it to a file. The following works. I wonder whether Python's own gzip library can be used instead of the gzip command.

    fid = gzip.open(self.ipFile, 'rb')  # input data
    oFid = open(filtSortFile, 'wb')  # output file
    sort = subprocess.Popen(args="sort | gzip -c ", shell=True,
                            stdin=subprocess.PIPE, stdout=oFid)  # set up the pipe
    processlines(fid, sort.stdin, filtFid)  # pump data into the pipe

QUESTION: How do I do this, i.e. where would the Python gzip package come in? I'm mostly interested in finding out why the following gives me plain text files instead of compressed binary ones, which seems very strange:

    fid = gzip.open(self.ipFile, 'rb')
    oFid = gzip.open(filtSortFile, 'wb')
    sort = subprocess.Popen(args="sort ", shell=True,
                            stdin=subprocess.PIPE, stdout=oFid)
    processlines(fid, sort.stdin, filtFid)

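The subprocess writes to oFid.fileno(), but GzipFile.fileno() returns the descriptor of the underlying file object, so the child writes its output straight into that file and the data never passes through the compressor: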

    def fileno(self):
        """Invoke the underlying file object fileno() method."""
        return self.fileobj.fileno()
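The effect is easy to reproduce. The sketch below (demo_fileno_bypass is a hypothetical name, and the child is simply another Python interpreter printing one line) wraps an ordinary file in a GzipFile, hands the GzipFile to a subprocess as stdout, and then inspects the raw bytes on disk:

```python
import gzip
import os
import subprocess
import sys
import tempfile

def demo_fileno_bypass():
    """Hypothetical demo: a child given a GzipFile as stdout writes uncompressed bytes."""
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    with open(path, "wb") as raw:
        gz = gzip.GzipFile(fileobj=raw, mode="wb")
        # GzipFile.fileno() just forwards to the wrapped file object
        assert gz.fileno() == raw.fileno()
        # Popen calls gz.fileno(), so the child inherits the raw descriptor;
        # GzipFile.write() is never involved
        subprocess.run([sys.executable, "-c", "print('hello')"], stdout=gz, check=True)
        gz.close()  # finishes GzipFile's own (empty) compressed stream
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data

data = demo_fileno_bypass()
print(b"hello" in data)  # the child's text is in the file uncompressed
```

The child's output appears verbatim in the file, alongside the small amount of framing GzipFile writes on its own, which matches the plain text files described in the question.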

To enable compression, use gzip methods directly:

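    import gzip
    from subprocess import Popen, PIPE
    from threading import Thread

    def f(input, output):
        # copy sort's output into the GzipFile; closing it writes the gzip trailer
        with output:
            for line in iter(input.readline, b''):
                output.write(line)

    p = Popen(["sort"], bufsize=-1, stdin=PIPE, stdout=PIPE)
    t = Thread(target=f, args=(p.stdout, gzip.open('out.gz', 'wb')))
    t.start()
    for s in "cafebabe":
        p.stdin.write((s + "\n").encode())  # pipes carry bytes in Python 3
    p.stdin.close()  # EOF tells sort to emit its output
    t.join()
    p.wait()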

Example

    $ python gzip_subprocess.py && od -c out.gz && zcat out.gz
    0000000 037 213  \b  \b 251   E   t   N 002 377   o   u   t  \0   K 344
    0000020   J 344   J 002 302   d 256   T   L 343 002  \0   j 017   j
    0000040   k 020  \0  \0  \0
    0000045
    a
    a
    b
    b
    c
    e
    e
    f
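If the data is small enough to hold in memory, the helper thread can be avoided: subprocess.run(input=...) passes the input to communicate(), which writes stdin, closes it, and collects stdout in one call, and the result is then compressed with gzip.open. A minimal sketch, assuming a Unix `sort` on PATH; sort_to_gzip is a hypothetical name:

```python
import gzip
import subprocess

def sort_to_gzip(lines, out_path):
    """Hypothetical helper: sort lines with the external `sort`, gzip the result in Python.

    Assumes the whole input and output fit in memory and a Unix `sort` is on PATH.
    """
    data = "".join(line + "\n" for line in lines).encode()
    # input= goes through communicate(): stdin is written and closed, stdout is read,
    # without the deadlock risk of writing and reading the pipes by hand
    result = subprocess.run(["sort"], input=data, stdout=subprocess.PIPE, check=True)
    with gzip.open(out_path, "wb") as g:
        g.write(result.stdout)  # compression happens here, inside GzipFile.write()

sort_to_gzip(list("cafebabe"), "out.gz")
with gzip.open("out.gz", "rt") as g:
    print(g.read().split())
```

The trade-off is memory: everything is buffered at once, whereas the threaded version above streams.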

Since you only pass a file descriptor to the process you are running, the child writes to that descriptor directly and none of the GzipFile object's compression logic is involved. To get around this, send the output to a pipe and read from it in Python, like this:

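    oFid = gzip.open(filtSortFile, 'wb')
    sort = subprocess.Popen(args="sort", shell=True,
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    processlines(fid, sort.stdin, filtFid)  # pump data into the pipe, as in the question
    sort.stdin.close()  # sort writes nothing until its input ends
    oFid.writelines(sort.stdout)  # lines are compressed by GzipFile.write()
    oFid.close()
    sort.wait()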
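For the full pipeline in the question (gzip in, sort, gzip out), Python's gzip module can sit on both ends while the external command only sees plain bytes. This is a sketch under assumptions: a Unix `sort` on PATH, and gzip_sort_gzip, in.gz and sorted.gz are hypothetical names. A feeder thread writes stdin while the main thread reads stdout, so the same pattern would also work for a filter that produces output before its input ends:

```python
import gzip
import shutil
import subprocess
import threading

def gzip_sort_gzip(in_path, out_path):
    """Hypothetical helper: stream a gzip file through `sort`, gzip the sorted output."""
    sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        # decompress the input and pump it into sort's stdin;
        # leaving the with-block closes stdin, which sends EOF
        with gzip.open(in_path, "rb") as src, sort.stdin:
            shutil.copyfileobj(src, sort.stdin)

    feeder = threading.Thread(target=feed)
    feeder.start()
    # meanwhile, compress whatever sort writes to stdout
    with gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(sort.stdout, dst)
    feeder.join()
    sort.stdout.close()
    if sort.wait() != 0:
        raise subprocess.CalledProcessError(sort.returncode, "sort")

# usage: build a small gzip input, then sort it
with gzip.open("in.gz", "wb") as f:
    f.write(b"pear\napple\nfig\n")
gzip_sort_gzip("in.gz", "sorted.gz")
with gzip.open("sorted.gz", "rb") as f:
    print(f.read())
```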
