The problem is that when reading from stdin, Python decodes the input using the default system encoding:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
The input is most likely UTF-8 or Windows-1252 (CP1252), so the program chokes on non-ASCII characters.
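To make the failure concrete, here is a minimal sketch. The byte string `raw` is a made-up sample, not taken from the actual input; it just shows that UTF-8 bytes cannot be decoded as ASCII:

```python
# Hypothetical sample input: the UTF-8 encoding of "café".
raw = b"caf\xc3\xa9"

try:
    # Decoding with the ASCII codec fails on the first byte >= 0x80.
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the codec the bytes were written in works.
decoded = raw.decode("utf-8")
```

Here `ascii_ok` ends up `False`, while `decoded` holds the four-character text.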
To convert sys.stdin to a stream with an appropriate decoder, I used:
import codecs
char_stream = codecs.getreader("utf-8")(sys.stdin)
This fixed the problem.
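A small self-contained sketch of the same wrapping, in case it helps to try it without piping anything in. The helper name `make_char_stream` and the `io.BytesIO` stand-in for stdin are my own assumptions for illustration. Note that on Python 3, `sys.stdin` is already a text stream, so I believe you would wrap `sys.stdin.buffer` instead:

```python
import codecs
import io


def make_char_stream(byte_stream, encoding="utf-8"):
    """Wrap a byte stream in a codecs StreamReader that decodes with `encoding`.

    Hypothetical helper; it only packages the codecs.getreader() call.
    """
    return codecs.getreader(encoding)(byte_stream)


# Stand-in for a byte-oriented stdin, pre-filled with UTF-8 data.
fake_stdin = io.BytesIO(u"caf\u00e9 na\u00efve\n".encode("utf-8"))

char_stream = make_char_stream(fake_stdin)
text = char_stream.read()  # decoded text, not raw bytes
```

Passing the encoding as a parameter makes it easy to switch to "cp1252" if the input turns out not to be UTF-8.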
By the way, this is what ANTLR's FileStream uses to open a file given a file name (instead of a given stream):
fp = codecs.open(fileName, 'rb', encoding)
try:
    data = fp.read()
finally:
    fp.close()
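If you want the same "read a whole file with a given encoding" behaviour in your own code, a rough equivalent is sketched below. This is not ANTLR's code: I swapped in `io.open` with a `with` block, and `read_text_file` plus the temporary file are hypothetical names used only for the demo:

```python
import io
import os
import tempfile


def read_text_file(file_name, encoding="utf-8"):
    """Read a whole file and return it as decoded text (hypothetical helper)."""
    # The with block closes the file even if read() raises,
    # which is what the try/finally above does by hand.
    with io.open(file_name, "r", encoding=encoding) as fp:
        return fp.read()


# Demo: write a CP1252-encoded temp file, then read it back.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="cp1252") as out:
        out.write(u"na\u00efve")
    data = read_text_file(path, "cp1252")
finally:
    os.remove(path)
```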
BTW #2: for strings, I found
a_string.encode(encoding)
useful.
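A quick illustration of what `encode` gives you, using a made-up sample string. The same text becomes different bytes depending on the encoding, and `decode` with the matching encoding reverses it:

```python
# Hypothetical sample text containing one non-ASCII character.
a_string = u"caf\u00e9"

utf8_bytes = a_string.encode("utf-8")     # two bytes for the accented letter
cp1252_bytes = a_string.encode("cp1252")  # one byte for the accented letter

# Decoding with the matching codec returns the original text.
round_trip = utf8_bytes.decode("utf-8")
```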
hansfbaier