How to collect output from a Python subprocess

I am trying to write a Python program that reads some data, processes it and outputs the result. The processing is done by a subprocess (Stanford NER); for illustration I will use "cat". I don't know in advance how much output NER will produce, so I use a separate thread to collect all of it and print it. The following example illustrates this:

    import sys
    import threading
    import subprocess

    # start my subprocess
    cat = subprocess.Popen(
        ['cat'],
        shell=False,
        stdout=subprocess.PIPE,
        stdin=subprocess.PIPE,
        stderr=None)

    def subproc_cat():
        """ Reads the subprocess output and prints it out """
        while True:
            line = cat.stdout.readline()
            if not line:
                break
            print("CAT PROC: %s" % line.decode('UTF-8'))

    # a daemon that runs the above function
    th = threading.Thread(target=subproc_cat)
    th.setDaemon(True)
    th.start()

    # the main thread reads from stdin and feeds the subprocess
    while True:
        line = sys.stdin.readline()
        print("MAIN PROC: %s" % line)
        if not line:
            break
        cat.stdin.write(bytes(line.strip() + "\n", 'UTF-8'))
        cat.stdin.flush()

This seems to work well when I type text on the keyboard. However, if I pipe input into my script (cat file.txt | python3 my_script.py), a race condition occurs: sometimes I get the correct output, sometimes not, and sometimes the script hangs. Any help would be appreciated!

I am running Ubuntu 14.04 with Python 3.4.0. The solution must be platform-independent.
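Note that the example relies on cat, which is not available on a stock Windows install. For trying the pattern portably, here is a minimal sketch of a stand-in child, assuming only that a Python interpreter is available: it runs the current interpreter (sys.executable) with a tiny echo program. ECHO_CMD is a hypothetical name, not part of the original code.

```python
import subprocess
import sys

# Hypothetical portable stand-in for ['cat']: a child Python process that
# copies each stdin line to stdout and flushes it immediately.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n",
]

# Quick check: feed two lines and read the echoed output back.
out = subprocess.check_output(ECHO_CMD, input=b"hello\nworld\n")
print(out.decode("utf-8").splitlines())  # ['hello', 'world']
```

Passing ECHO_CMD wherever the script uses ['cat'] leaves the rest of the code unchanged.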

python subprocess python-multithreading stanford-nlp
1 answer

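Add th.join() at the end. Otherwise the main thread can exit before the reader thread has processed all of the output, and because daemon threads do not keep the process alive, the reader is killed mid-stream. With keyboard input you type slowly enough for the reader to keep up; with a piped file the main thread reaches EOF almost immediately, which is why the problem only shows up then. Alternatively, remove th.setDaemon(True) instead of adding th.join(), since the interpreter waits for non-daemon threads before exiting. Either way, call cat.stdin.close() after the input loop first: that sends EOF to cat, so cat exits and closes its stdout. Without it, the reader's readline() never returns an empty result and the program waits forever.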
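A minimal sketch of the complete fix, assuming the same line-oriented child as in the question. The function name run_filter and its on_line callback are hypothetical, introduced only for illustration; the essential order is: close the child's stdin, join the reader thread, then wait for the child.

```python
import subprocess
import threading


def run_filter(cmd, lines, on_line):
    """Feed `lines` to the subprocess `cmd` one at a time and call `on_line`
    for every line it prints. Returns only after all output was handled."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def reader():
        # readline() returns b"" only at EOF, i.e. once the child has
        # closed its stdout (normally because it exited).
        for raw in iter(proc.stdout.readline, b""):
            on_line(raw.decode("utf-8").rstrip("\r\n"))

    # Daemon only as a safety net; completeness comes from join() below.
    th = threading.Thread(target=reader, daemon=True)
    th.start()

    for line in lines:
        # Normalise the line ending before forwarding the line.
        proc.stdin.write((line.rstrip("\r\n") + "\n").encode("utf-8"))
        proc.stdin.flush()

    # Send EOF to the child. Without this, cat keeps waiting for input,
    # its stdout never reaches EOF and th.join() would block forever.
    proc.stdin.close()
    th.join()           # wait until every output line has been handled
    proc.stdout.close()
    return proc.wait()  # reap the child and return its exit status
```

In the question's script this would be called as run_filter(['cat'], sys.stdin, lambda s: print("CAT PROC: %s" % s)). The thread stays a daemon so that an exception in the main thread cannot leave the process hanging at exit; the guarantee that all output is printed comes from the explicit join(), not from the daemon flag.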
