Python 3 UnicodeDecodeError with readlines() method

I'm trying to create a Twitter bot that reads lines from a file and publishes them. I'm using Python 3 and tweepy, via virtualenv in my shared server space. This is the part of the code that seems to cause the problem:

    #!/foo/env/bin/python3
    import re
    import tweepy, time, sys
    argfile = str(sys.argv[1])
    filename=open(argfile, 'r')
    f=filename.readlines()
    filename.close()

This is the error I get:

 UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128) 

The traceback points to f=filename.readlines() as the source of the error. Any idea what could be wrong? Thanks.

3 answers

I think the best answer (in Python 3) is to use the errors= parameter:

    with open('evil_unicode.txt', 'r', errors='replace') as f:
        lines = f.readlines()

Evidence:

 >>> s = b'\xe5abc\nline2\nline3' >>> with open('evil_unicode.txt','wb') as f: ... f.write(s) ... 16 >>> with open('evil_unicode.txt', 'r') as f: ... lines = f.readlines() ... Traceback (most recent call last): File "<stdin>", line 2, in <module> File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: invalid continuation byte >>> with open('evil_unicode.txt', 'r', errors='replace') as f: ... lines = f.readlines() ... >>> lines [' abc\n', 'line2\n', 'line3'] >>> 

Note that errors= can be 'replace' or 'ignore'. Here is what 'ignore' looks like:

    >>> with open('evil_unicode.txt', 'r', errors='ignore') as f:
    ...     lines = f.readlines()
    ...
    >>> lines
    ['abc\n', 'line2\n', 'line3']

Your default encoding appears to be ASCII, while the input is more than likely UTF-8. When you hit non-ASCII bytes in the input, it throws an exception. It's not so much that readlines itself is responsible for the problem; rather, it triggers the read and decode, and the decode fails.
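
To check which default applies on a given machine, you can ask Python directly. The following is only a minimal sketch: the file name sample.txt is hypothetical, and it assumes the text-mode default of open follows the locale's preferred encoding, which is the documented behaviour on most Python 3 versions. Forcing encoding='ascii' reproduces the failure regardless of the local default.

```python
import locale
import os
import tempfile

# The encoding open() typically falls back to in text mode when none is given.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Write a throwaway file containing one non-ASCII character, encoded as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file name
with open(path, "wb") as f:
    f.write("caf\u00e9\n".encode("utf-8"))

# Force ASCII decoding to reproduce the error deterministically.
failed = False
try:
    with open(path, "r", encoding="ascii") as f:
        f.readlines()
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc)
```

If the first line prints something like ANSI_X3.4-1968 or US-ASCII, the environment (for example a bare shell on a shared server) is the reason open defaults to ASCII.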

This is easy to fix: open in Python 3 lets you provide the known encoding of the input, replacing the default (ASCII in your case) with any other recognized encoding. Providing it lets you keep reading as str (rather than the significantly different bytes objects that hold raw binary data), while Python does the work of converting the raw bytes on disk into true text data:

    # Using with statement closes the file for us without needing to remember to close
    # explicitly, and closes even when exceptions occur
    with open(argfile, encoding='utf-8') as inf:
        f = inf.readlines()
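
The same idea can be wrapped in a small helper so the encoding is stated in one place. This is only a sketch: read_lines and the file name tweets.txt are hypothetical names, and UTF-8 is an assumption about the input rather than something the question confirms.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file, decoded with an explicit encoding."""
    with open(path, "r", encoding=encoding) as inf:
        return inf.readlines()


# Demo with a throwaway UTF-8 file (hypothetical name).
demo_path = os.path.join(tempfile.mkdtemp(), "tweets.txt")
with open(demo_path, "w", encoding="utf-8") as out:
    out.write("na\u00efve caf\u00e9\nsecond line\n")

lines = read_lines(demo_path)
print(lines)
```

Passing a different encoding name (for instance 'latin-1' or 'utf-16') is how the helper would be adapted if the file turns out not to be UTF-8.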

I ended up finding a working answer myself:

 filename=open(argfile, 'rb') 

This post really helped me.
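
Opening with 'rb' avoids the error because no decoding happens at all: readlines() then returns bytes objects, which usually still have to be decoded before they can be published as text. A minimal sketch, assuming a hypothetical file lines.bin and, purely as an illustration, UTF-8 with replacement as the decoding choice:

```python
import os
import tempfile

# Throwaway file whose first byte is not valid UTF-8 (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "lines.bin")
with open(path, "wb") as f:
    f.write(b"\xe5abc\nline2\n")

# Binary mode: no decoding is attempted, so no UnicodeDecodeError can occur here.
with open(path, "rb") as f:
    raw_lines = f.readlines()
print(raw_lines)

# Decode explicitly once the encoding and error policy have been chosen.
text_lines = [raw.decode("utf-8", errors="replace") for raw in raw_lines]
print(text_lines)
```

Looking at the raw bytes first can also help identify the real encoding. The byte 0xfe from the traceback in the question never occurs in valid UTF-8, so that particular file may be in another encoding such as UTF-16, whose byte-order mark is FE FF or FF FE.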
