Printing a unicode string in python regardless of environment

I am trying to find a general solution for printing unicode strings from a python script.

The requirements are that it must run on both Python 2.7 and 3.x, on any platform, and with any terminal settings and environment variables (for example, LANG=C or LANG=en_US.UTF-8).

The Python print function automatically tries to encode to the terminal encoding when printing, but if the terminal encoding is ASCII, it fails.

For example, the following works when the environment has "LANG=en_US.UTF-8":

    x = u'\xea'
    print(x)

But it fails in Python 2.7 when "LANG=C":

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 0: ordinal not in range(128)

The following works regardless of the LANG setting, but the characters will not be displayed properly if the terminal uses a different encoding:

    print(x.encode('utf-8'))

The desired behavior is to always show Unicode in the terminal if possible, and to fall back to some encoding if the terminal does not support Unicode. For example, the output would be UTF-8 encoded if the terminal only supports ASCII. Basically, the goal is to do the same thing as the Python print function when it works, and to use some standard encoding in the cases where it does not.

+7
python encoding unicode utf-8
4 answers

You can handle the LANG=C case by telling sys.stdout to default to UTF-8 in cases where it would otherwise default to ASCII.

    import sys, codecs

    if sys.stdout.encoding is None or sys.stdout.encoding == 'ANSI_X3.4-1968':
        utf8_writer = codecs.getwriter('UTF-8')
        if sys.version_info.major < 3:
            sys.stdout = utf8_writer(sys.stdout, errors='replace')
        else:
            sys.stdout = utf8_writer(sys.stdout.buffer, errors='replace')

    print(u'\N{snowman}')

The above snippet meets your requirements: it works in Python 2.7 and 3.4, and it does not break when LANG is set to something other than UTF-8, such as C.

This is not a new technique, but it is surprisingly hard to find in the documentation. As presented above, it actually respects non-UTF-8 settings such as ISO 8859-*. It only defaults to UTF-8 if Python would otherwise have wrongly defaulted to ASCII, breaking the application.
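
The check against the literal name 'ANSI_X3.4-1968' covers what glibc reports for LANG=C, but ASCII has several other aliases. A minimal sketch of a more general test, assuming it is acceptable to treat a missing or unknown encoding name like ASCII (the helper name `is_ascii_encoding` is hypothetical, not part of the answer above):

```python
import codecs
import sys


def is_ascii_encoding(name):
    """Return True if `name` is missing, unknown, or any alias of ASCII."""
    if name is None:
        return True
    try:
        # codecs.lookup() resolves aliases, so 'US-ASCII', '646' and
        # 'ANSI_X3.4-1968' all come back with the canonical name 'ascii'.
        return codecs.lookup(name).name == 'ascii'
    except LookupError:
        # Assumption: an encoding Python does not know is treated like ASCII,
        # so the caller falls back to UTF-8.
        return True


# True when stdout would otherwise be limited to ASCII.
needs_utf8_fallback = is_ascii_encoding(sys.stdout.encoding)
```

The result could replace the `sys.stdout.encoding == 'ANSI_X3.4-1968'` comparison; encodings such as ISO 8859-1 are still left alone.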

+8

I do not think you should try to solve this problem at the Python level. Document your application requirements, log the locale of the systems you run on so that it can be included in bug reports, and leave it at that.

If you do want to go this route, at least distinguish between terminals and pipes. You should never output data to a terminal that the terminal cannot explicitly handle; do not output UTF-8, for example, since non-printable code points > U+007F could end up being interpreted as control codes when encoded.

For a pipe, output UTF-8 by default and make that configurable.

So you would detect whether a TTY is being used, then handle the encoding based on that. For a terminal, set an error handler (pick either replace or backslashreplace to provide replacement characters or escape sequences for any characters that cannot be handled). For a pipe, use a configurable codec.

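    import codecs
    import os
    import sys

    if os.isatty(sys.stdout.fileno()):
        # Terminal: use its declared encoding, substitute unencodable characters.
        output_encoding = sys.stdout.encoding
        errors = 'replace'
    else:
        output_encoding = 'utf-8'  # allow override from settings
        errors = 'strict'  # perhaps parse from settings, not needed for UTF8
    # On Python 3 the codec writer has to wrap the underlying byte stream.
    raw_stdout = getattr(sys.stdout, 'buffer', sys.stdout)
    sys.stdout = codecs.getwriter(output_encoding)(raw_stdout, errors=errors)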
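
The terminal-versus-pipe decision can also be isolated in a small function that takes the stream as an argument, which makes it easy to check without a real terminal. This is only a sketch: the name `pick_output_encoding`, the `pipe_default` parameter, and the choice of ASCII when a terminal reports no encoding are assumptions for illustration.

```python
def pick_output_encoding(stream, pipe_default='utf-8'):
    """Return an (encoding, errors) pair for `stream`.

    Terminals keep their declared encoding with a lossy error handler;
    pipes and files get `pipe_default` with strict error handling.
    """
    isatty = getattr(stream, 'isatty', None)
    if isatty is not None and isatty():
        # Assumption: a terminal that reports no encoding is treated as ASCII.
        encoding = getattr(stream, 'encoding', None) or 'ascii'
        return encoding, 'replace'
    return pipe_default, 'strict'
```

Calling `pick_output_encoding(sys.stdout)` yields the two values that the codec writer needs.
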
+1

You can encode the string yourself with the 'backslashreplace' error handler, so that unrepresentable characters are converted to escape sequences. In Python 2 you can print the result of encode directly, but in Python 3 you need to decode it back to Unicode.

    import sys

    encoding = sys.stdout.encoding
    print(s.encode(encoding, 'backslashreplace').decode(encoding))

If sys.stdout.encoding does not report a value that your terminal can handle, that is a separate issue you will have to deal with.
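
To make the effect of the round trip concrete, here is a small sketch wrapped in a function. The name `to_printable` is hypothetical, and treating a missing encoding (sys.stdout.encoding can be None on Python 2 when output is piped) as ASCII is an assumption.

```python
def to_printable(text, encoding):
    """Round-trip `text` through `encoding`, escaping what it cannot encode."""
    encoding = encoding or 'ascii'  # assumption: no declared encoding means ASCII
    return text.encode(encoding, 'backslashreplace').decode(encoding)


# With an ASCII target, both non-ASCII characters become escape sequences.
print(to_printable(u'caf\xe9 \u2603', 'ascii'))  # caf\xe9 \u2603
```

With a UTF-8 target the string comes back unchanged, so the same call is safe in either environment.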

0

You can handle the exception:

    def always_print(s):
        try:
            print(s)
        except UnicodeEncodeError:
            print(s.encode('utf-8'))
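
One caveat: on Python 3, printing a bytes object shows its repr (for example b'\xc3\xaa') rather than the encoded characters. A possible variant, sketched under the assumption that the stream is either a Python 3 text stream exposing `.buffer` or a Python 2 file object that accepts bytes (the optional `stream` parameter is added here for illustration):

```python
import sys


def always_print(text, stream=None):
    """Write `text` plus a newline; fall back to raw UTF-8 bytes on failure."""
    if stream is None:
        stream = sys.stdout
    line = text + u'\n'
    try:
        stream.write(line)
    except UnicodeEncodeError:
        # Flush pending text first so the raw bytes are not written out of order.
        stream.flush()
        # Python 3 text streams expose the byte stream as .buffer;
        # Python 2's sys.stdout accepts bytes directly.
        getattr(stream, 'buffer', stream).write(line.encode('utf-8'))
```
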
-1