Python: UnicodeDecodeError when reading from stdin

When I run a Python program that reads from stdin, I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 320: ordinal not in range(128) 

How can I fix this?

Note: the error occurs inside ANTLR, on a line that looks like this:

  self.strdata = unicode(data) 

Since I do not want to change the ANTLR source code, I would like to pass in something it can accept.

The code that reads the input is as follows:

    #!/usr/bin/python
    import sys
    import codecs
    import antlr3
    import antlr3.tree
    from LatexLexer import LatexLexer
    from LatexParser import LatexParser

    char_stream = antlr3.ANTLRInputStream(codecs.getreader("utf8")(sys.stdin))
    lexer = LatexLexer(char_stream)
    tokens = antlr3.CommonTokenStream(lexer)
    parser = LatexParser(tokens)
    r = parser.document()
3 answers

The problem is that Python decodes the data read from stdin using the default system encoding:

    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'

The input is most likely UTF-8 or Windows-1252 (CP1252), so the program chokes on any non-ASCII character.
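The failure can be reproduced without ANTLR. The sketch below assumes the input contains a UTF-8 encoded "é" (bytes 0xc3 0xa9), which would match the 0xc3 byte in the error message; the sample bytes are made up for illustration.

```python
# Hypothetical input: the UTF-8 bytes for "café" (0xc3 0xa9 encodes "é").
raw = b"caf\xc3\xa9"

try:
    # Decoding with the ASCII codec is what unicode(data) does in Python 2
    # when the default encoding is 'ascii'.
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # 'ascii' codec can't decode byte 0xc3 in position 3

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("utf-8")
```

If the input were Windows-1252 instead, the same call with "cp1252" would be the one to try.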

To convert sys.stdin to a stream with an appropriate decoder, I used:

    import sys
    import codecs

    char_stream = codecs.getreader("utf-8")(sys.stdin)

This fixed the problem.
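Here is a self-contained sketch of what that wrapper does. It uses an in-memory byte stream as a stand-in for sys.stdin so it can be run anywhere; the sample LaTeX line is made up.

```python
import codecs
import io

# Stand-in for the byte stream behind stdin: sys.stdin in Python 2,
# sys.stdin.buffer in Python 3. The content is a made-up example.
fake_stdin = io.BytesIO(b"\\section{Caf\xc3\xa9}\n")

# getreader() returns a StreamReader class for the codec; calling it
# wraps the byte stream so that read() yields decoded text.
reader = codecs.getreader("utf-8")(fake_stdin)
data = reader.read()
```

After this, data is already decoded text, so a later unicode(data) has nothing left to decode. Note that in Python 3 sys.stdin is already a text stream, so there you would wrap sys.stdin.buffer instead.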

By the way, this is what ANTLR's FileStream does to open a file given a file name (instead of a given stream):

    fp = codecs.open(fileName, 'rb', encoding)
    try:
        data = fp.read()
    finally:
        fp.close()
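To try the same pattern outside ANTLR, something like the following should work. The helper name read_text and the temporary file are my own additions for the demo, not part of ANTLR.

```python
import codecs
import os
import tempfile


def read_text(file_name, encoding="utf-8"):
    """Hypothetical helper: read a whole file and return decoded text."""
    fp = codecs.open(file_name, "rb", encoding)
    try:
        return fp.read()
    finally:
        fp.close()


# Demo: write some UTF-8 bytes to a temporary file, then read them back.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"na\xc3\xafve")
    contents = read_text(path)
finally:
    os.remove(path)
```

On Python 3 the built-in open(file_name, encoding=...) does the same job.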

BTW #2: for strings, I found

 a_string.encode(encoding) 

useful.
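A quick illustration of encode() and its counterpart decode(), using a made-up sample string:

```python
# A text (unicode) string containing one non-ASCII character.
text = u"Caf\xe9"

# encode() turns text into bytes; the result depends on the encoding.
utf8_bytes = text.encode("utf-8")      # "é" becomes two bytes: 0xc3 0xa9
latin1_bytes = text.encode("latin-1")  # "é" becomes one byte: 0xe9

# decode() goes the other way, from bytes back to text.
round_trip = utf8_bytes.decode("utf-8")
```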


You are not getting this error on input; you get it when trying to output the data you have read. You should decode the data as you read it and pass unicode around, rather than working with byte strings the whole time.
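A minimal sketch of that "decode on input, encode on output" pattern. The process function and the in-memory streams are hypothetical stand-ins for a real program's stdin and stdout, and upper() stands in for whatever work the program does.

```python
import io


def process(byte_in, byte_out, encoding="utf-8"):
    """Hypothetical pipeline: bytes in, text in the middle, bytes out."""
    text = byte_in.read().decode(encoding)   # decode once, at the input edge
    result = text.upper()                    # all real work happens on text
    byte_out.write(result.encode(encoding))  # encode once, at the output edge


# In-memory streams stand in for the program's real input and output.
src = io.BytesIO(b"caf\xc3\xa9\n")
dst = io.BytesIO()
process(src, dst)
output = dst.getvalue()
```

Because the text is only bytes at the two edges, nothing in the middle ever triggers an implicit ASCII decode.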


Here's a great tip on how Python handles encodings:

How to use UTF-8 with Python

