Python standard idiom for setting sys.stdout buffer to zero does not work with Unicode

Question

When I write sysadmin scripts in Python, the buffer on sys.stdout, which affects every print() call, is annoying: I don't want to wait for the buffer to be flushed and then get a large chunk of lines on the screen all at once. Instead, I want to see each piece of output as soon as the script produces it. I don't even want to wait for a newline before I see the output.

A commonly used idiom for this in Python is

    import os
    import sys
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'wb', 0)

This has worked great for me for a long time. Now I noticed that it does not work with Unicode. See the following script:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    from __future__ import print_function, unicode_literals

    import os
    import sys

    print('Original encoding: {}'.format(sys.stdout.encoding))
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'wb', 0)
    print('New encoding: {}'.format(sys.stdout.encoding))

    text = b'Eisb\xe4r'
    print(type(text))
    print(text)
    text = text.decode('latin-1')
    print(type(text))
    print(text)

This produces the following output:

    Original encoding: UTF-8
    New encoding: None
    <type 'str'>
    Eisb▒r
    <type 'unicode'>
    Traceback (most recent call last):
      File "./export_debug.py", line 18, in <module>
        print(text)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 4: ordinal not in range(128)

It took me a few hours to figure out the reason for this (my original script was much longer than this minimal debugging script). The culprit is the line

    sys.stdout = os.fdopen(sys.stdout.fileno(), 'wb', 0)

which I have used for years, so I did not expect any problems with it. If I comment out this line, I get the correct output:

    Original encoding: UTF-8
    New encoding: UTF-8
    <type 'str'>
    Eisb▒r
    <type 'unicode'>
    Eisbär

So what does the script do? To keep my Python 2.7 code as close to Python 3.x as possible, I always use

 from __future__ import print_function, unicode_literals

which forces Python to use the new print() function and, more importantly, to treat all string literals as Unicode by default. I have a lot of Latin-1 / ISO-8859-1 encoded data, e.g.

 text = b'Eisb\xe4r'

To work with it as intended, I first need to decode it to Unicode, which is what

    text = text.decode('latin-1')

is for. Since the default encoding on my system is UTF-8, whenever I print a string, Python encodes the internal Unicode string to UTF-8. But first it has to be proper Unicode internally.
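The decode-then-encode round trip described above can be sketched in isolation. This is only an illustration; the variable names are made up here and the encoding step is done by hand, whereas print() normally does it based on sys.stdout.encoding.

```python
# Latin-1 encoded input, as in the question: 0xE4 is "ä" in ISO-8859-1.
raw_bytes = b'Eisb\xe4r'

# Decode to a Unicode string (6 code points).
text = raw_bytes.decode('latin-1')

# Encoding to UTF-8 turns "ä" into the two bytes 0xC3 0xA4,
# so the result is 7 bytes long.
utf8_bytes = text.encode('utf-8')
```

The point is that the same text has different byte representations, so whoever writes to the terminal has to know which encoding to use.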

All of this works fine, just not with the output buffer set to zero. Any ideas? I noticed that sys.stdout.encoding is unset (None) after switching to zero buffering, but I don't know how to set it again. It is a read-only attribute, and the LC_ALL or LC_CTYPE environment variables are apparently evaluated only when the Python interpreter starts.

Btw: "Eisbär" is the German word for "polar bear."

python unicode stdout buffer

Marten Lehmann Oct 10 '12 at 17:32

1 answer

Martijn Pieters · Answer 1 · 2012-10-10T20:02:12+0000

The print function uses a special flag when writing to a file object, which causes the PyFile_WriteObject function of the Python C API to look up the output encoding in order to convert Unicode to bytes. By replacing stdout you lost that encoding. Unfortunately, you cannot explicitly set it again:

    encoding = sys.stdout.encoding
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'wb', 0)
    sys.stdout.encoding = encoding  # Raises a TypeError; readonly attribute
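To see why the missing encoding matters, here is a small sketch (an illustration only, not code from the question). It assumes a temporary file as a stand-in for stdout, and it imitates by hand the ASCII fallback that the traceback in the question shows; the names `raw`, `raw_encoding` and `ascii_failed` are made up for this example.

```python
import os
import tempfile

text = u'Eisb\xe4r'

# Open an unbuffered binary stream the same way the idiom does,
# but on a temporary file instead of the real stdout.
fd, path = tempfile.mkstemp()
try:
    raw = os.fdopen(fd, 'wb', 0)
    # The binary stream carries no usable encoding.
    raw_encoding = getattr(raw, 'encoding', None)
    raw.close()
finally:
    os.remove(path)

# With no encoding available, Python 2 falls back to ASCII,
# which cannot represent the "ä".
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With a known encoding the conversion is unproblematic.
utf8_bytes = text.encode('utf-8')
```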

You also cannot use the io.open function, because it does not allow buffering to be disabled when you also want to use the encoding parameter that you would need.
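The io.open limitation is easy to check for yourself. The following is a minimal sketch, assuming a throwaway temporary file rather than stdout; the helper name `try_unbuffered_text_open` is hypothetical.

```python
import io
import os
import tempfile


def try_unbuffered_text_open(path):
    """Return the error message io.open raises for unbuffered text mode,
    or None if the call unexpectedly succeeds."""
    try:
        stream = io.open(path, 'w', buffering=0, encoding='utf-8')
    except ValueError as exc:
        return str(exc)
    stream.close()
    return None


# Use a temporary file so the experiment does not touch stdout.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    error_message = try_unbuffered_text_open(path)
finally:
    os.remove(path)
```

In other words, buffering=0 is only accepted in binary mode, and binary mode is exactly where the encoding parameter is not available.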

The correct way to make the print function flush immediately is to pass the keyword argument flush=True (available in Python 3.3 and later):

 print(something, flush=True)

If this is too tedious to add everywhere, consider using a special print function:

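    from __future__ import print_function
    import sys
    import __builtin__  # Python 2 module; more reliable than the __builtins__ name

    def print(*args, **kw):
        # Python 2.7's print() doesn't support the flush keyword, so emulate it.
        flush = kw.pop('flush', True)
        __builtin__.print(*args, **kw)
        if flush:
            sys.stdout.flush()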

Since the Python 2.7 print() function does not actually support the flush keyword (botheration), you can emulate it by adding an explicit flush in that custom version instead.
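On Python 3.3 or later, where print() does accept flush, you may not need to redefine print at all. A possible sketch using functools.partial follows; the `FlushCountingStream` class and the `flushing_print` name are hypothetical and exist only to make the flushing visible.

```python
import functools
import io


class FlushCountingStream(io.StringIO):
    """Hypothetical in-memory stream that counts flush() calls."""

    flush_calls = 0

    def flush(self):
        self.flush_calls += 1
        super().flush()


# A print variant that always flushes; everything else is passed through.
flushing_print = functools.partial(print, flush=True)

stream = FlushCountingStream()
flushing_print("first line", file=stream)
flushing_print("second line", file=stream)
```

Because the builtin print stays untouched, code that really wants buffered output can keep calling it directly.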
