Writing a lot of data to stdin

I am writing a lot of data to a subprocess's stdin.

How can I make sure it does not block?

```python
p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
p.stdin.write('A very very very large amount of data')
p.stdin.flush()
output = p.stdout.readline()
```

It seems to hang on p.stdin.write() after I read a large line and write it.

I have a large set of files (more than 1,000) whose contents are written sequentially to stdin.

So I run this loop:

```python
# this loop is repeated for all the files
for stri in lines:
    p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
    p.stdin.write(stri)
    output = p.stdout.readline()
    # do some processing
```

It hangs at around file no. 400, which is a large file with long lines.

I suspect this is a deadlock problem.

This only happens when I iterate from 0 to 1000; if I start directly at file 400, the error does not occur.


To avoid deadlock in a portable way, write to the child in a separate thread:

```python
#!/usr/bin/env python
from subprocess import Popen, PIPE
from threading import Thread

def pump_input(pipe, lines):
    with pipe:
        for line in lines:
            pipe.write(line)

p = Popen(path, stdin=PIPE, stdout=PIPE, bufsize=1)
Thread(target=pump_input, args=[p.stdin, lines]).start()
with p.stdout:
    for line in iter(p.stdout.readline, b''):  # read output
        print line,
p.wait()
```

See Python: read streaming input from subprocess.communicate().


You may need to use Popen.communicate().

If you write a large amount of data to stdin while the child process is generating output on stdout, you can deadlock: the child's stdout buffer fills up before all of your stdin data has been consumed, the child blocks writing to stdout (because you are not reading it), and you block writing to stdin.

Popen.communicate() can be used to write stdin and read stdout/stderr at the same time, which avoids this problem.

Note: Popen.communicate() is only suitable when the input and output fit in memory (they are not too large).
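For example, the single-shot case can be handled safely with communicate() like this (a minimal sketch; the echo child below is a hypothetical stand-in for your actual program):

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child: echoes everything on stdin back to stdout.
child = [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdin.read())']

p = Popen(child, stdin=PIPE, stdout=PIPE)

# ~3.8 MB of input: far more than a pipe buffer holds, so a plain
# p.stdin.write() followed by a read could deadlock here.
data = b'A very very very large amount of data\n' * 100000

# communicate() feeds stdin and drains stdout concurrently,
# so neither pipe buffer can fill up and block the parent.
out, err = p.communicate(data)
print(len(out))
```

The same pattern works inside your per-file loop: call communicate() once per Popen instead of pairing a manual write with a readline.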

Update: If you decide to go with threads, here is an example implementation of the parent and child processes that you can adapt to your needs:

parent.py:

```python
#!/usr/bin/env python2
import os
import sys
import subprocess
import threading
import Queue

class MyStreamingSubprocess(object):
    def __init__(self, *argv):
        self.process = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                        stdout=subprocess.PIPE)
        self.stdin_queue = Queue.Queue()
        self.stdout_queue = Queue.Queue()
        self.stdin_thread = threading.Thread(target=self._stdin_writer_thread)
        self.stdout_thread = threading.Thread(target=self._stdout_reader_thread)
        self.stdin_thread.start()
        self.stdout_thread.start()

    def process_item(self, item):
        self.stdin_queue.put(item)
        return self.stdout_queue.get()

    def terminate(self):
        self.stdin_queue.put(None)
        self.process.terminate()
        self.stdin_thread.join()
        self.stdout_thread.join()
        return self.process.wait()

    def _stdin_writer_thread(self):
        while 1:
            item = self.stdin_queue.get()
            if item is None:
                # signaling the child process that the end of the
                # input has been reached: some console progs handle
                # the case when reading from stdin returns empty string
                self.process.stdin.close()
                break
            try:
                self.process.stdin.write(item)
            except IOError:
                # making sure that the current self.process_item()
                # call doesn't deadlock
                self.stdout_queue.put(None)
                break

    def _stdout_reader_thread(self):
        while 1:
            try:
                output = self.process.stdout.readline()
            except IOError:
                output = None
            # output is empty string if the process has
            # finished or None if an IOError occurred
            self.stdout_queue.put(output)
            if not output:
                break

if __name__ == '__main__':
    child_script_path = os.path.join(os.path.dirname(__file__), 'child.py')
    process = MyStreamingSubprocess(sys.executable, '-u', child_script_path)
    try:
        while 1:
            item = raw_input('Enter an item to process '
                             '(leave empty and press ENTER to exit): ')
            if not item:
                break
            result = process.process_item(item + '\n')
            if result:
                print('Result: ' + result)
            else:
                print('Error processing item! Exiting.')
                break
    finally:
        print('Terminating child process...')
        process.terminate()
    print('Finished.')
```

child.py:

```python
#!/usr/bin/env python2
import sys

while 1:
    item = sys.stdin.readline()
    sys.stdout.write('Processed: ' + item)
```

Note: IOError is handled in the reader/writer threads to cover the cases where the child process exits, is terminated, or is killed.

