I will start with a disclaimer: I do not have Python 3.5 (so I cannot use the run function), and I could not reproduce your problem on Windows (Python 3.4.4) or Linux (3.1.6). Nonetheless...
Problems with subprocess.PIPE and friends
The subprocess.run docs say it is just a convenience interface to the older subprocess.Popen-and-communicate() technique. The subprocess.Popen.communicate docs warn that:
The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
This appears to be exactly your problem. Unfortunately, the docs do not say how much data counts as "large", nor what happens once "too much" data has been read. Just, "don't do that, then."
The docs for subprocess.call deserve a little more attention ...
Do not use stdout=PIPE or stderr=PIPE with this function. The child process will block if it generates enough output to a pipe to fill up the OS pipe buffer, as the pipes are not being read from.
... as do the docs for subprocess.Popen.wait :
This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.
Of course, this implies that Popen.communicate is the solution to this problem, but communicate's own docs say "do not use this method if the data size is large", which is exactly the situation in which the wait docs tell you to use communicate. (Perhaps it "avoids that" by silently dropping data on the floor?)
Disappointingly, I see no way to use subprocess.PIPE safely unless you are certain you can read from the pipe faster than your child process writes to it.
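To make that concrete, here is a minimal sketch of the pattern those warnings describe, assuming a Unix-like system where the yes command is available (it is only a stand-in for any program that produces a lot of output):

import subprocess

# Illustration of the deadlock described above: the pipe is created but
# never read from, so once the child fills the OS pipe buffer it blocks
# on write, and wait() never returns.
p = subprocess.Popen(
    ['yes'],                 # any sufficiently chatty child will do
    stdout=subprocess.PIPE,  # output goes to a pipe nobody reads
    universal_newlines=True,
)
p.wait()                     # blocks forever once the buffer is full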
In your post, you keep all of your data in memory ... twice over. That is unlikely to be efficient, especially if the data is already in a file.
If you are allowed to use a temporary file, you can easily compare the two files one line at a time. This avoids all of the subprocess.PIPE mess, and it is much faster too, because it only uses a little bit of RAM at a time. (The IO from your subprocess may also be faster, depending on how your operating system handles output redirection.)
Again, I cannot test run, so here is a slightly older Popen-and-communicate solution (minus main and the rest of your setup):
import io
import subprocess
import tempfile


def are_text_files_equal(file0, file1):
    '''
    Both files must be opened in "update" mode ('+' character), so
    they can be rewound to their beginnings.  Both files will be read
    until just past the first differing line, or to the end of the
    files if no differences were encountered.
    '''
    file0.seek(io.SEEK_SET)
    file1.seek(io.SEEK_SET)
    for line0, line1 in zip(file0, file1):
        if line0 != line1:
            return False
    # Both files were identical to this point.  See if either file
    # has more data.
    next0 = next(file0, '')
    next1 = next(file1, '')
    if next0 or next1:
        return False
    return True


def compare_subprocess_output(exe_path, input_path):
    with tempfile.TemporaryFile(mode='w+t', encoding='utf8') as temp_file:
        with open(input_path, 'r+t') as input_file:
            p = subprocess.Popen(
                [exe_path],
                stdin=input_file,
                stdout=temp_file,        # No more PIPE.
                stderr=subprocess.PIPE,  # <sigh>
                universal_newlines=True,
            )
            err = p.communicate()[1]  # No need to store output.
            # Compare input and output files...  This must be inside
            # the `with` block, or the TemporaryFile will close before
            # we can use it.
            if are_text_files_equal(temp_file, input_file):
                print('OK')
            else:
                print('Failed: ' + str(err))
    return
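For reference, a call might look something like this (the executable and input paths are placeholders, not taken from your question):

# Hypothetical invocation; substitute your own paths.
compare_subprocess_output('./my_test_program', 'huge_input.txt')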
Unfortunately, since I cannot reproduce your problem, even with a million lines of input, I cannot say whether this works. If nothing else, it should give you wrong answers faster.
Option: regular file
If you want to keep the output of your test run in foo.txt (as in your command-line example), you would redirect your subprocess's output to a regular file instead of a TemporaryFile. That solution is recommended in J.F. Sebastian's answer.
I can't tell from your question whether you actually want foo.txt, or if it is just a side effect of your two-step test-then-diff: your command-line example saves the test results to a file, while your Python script does not. Saving the output would be handy if you ever want to investigate a test failure, but then you would need to come up with a unique file name for each test you run, so that they do not overwrite each other.
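If you do want to keep a per-run output file, a minimal sketch might look like the following; the file-naming scheme and paths are my own assumptions, and are_text_files_equal is the helper defined above:

import subprocess
import time

def run_test_to_file(exe_path, input_path):
    # One regular output file per run, named so that runs do not
    # overwrite each other; the timestamp scheme is just one option.
    out_path = 'foo_{}.txt'.format(int(time.time()))
    with open(input_path, 'r+t') as input_file, \
         open(out_path, 'w+t') as out_file:
        p = subprocess.Popen(
            [exe_path],
            stdin=input_file,
            stdout=out_file,         # regular file instead of a TemporaryFile
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        err = p.communicate()[1]
        if are_text_files_equal(out_file, input_file):
            print('OK')
        else:
            print('Failed: ' + str(err))
    return out_path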