UnicodeDecodeError while reading a text file

I am just starting out with Python (version 3.4). This is the relevant part of my code:

fileObject = open("countable nouns raw.txt", "rt")
bigString = fileObject.read()
fileObject.close()

Whenever I try to read this file, I get:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 82273: character maps to <undefined>

I did some reading, and the error seems to come from my default encoding not matching the encoding of the text file. Another post said you can read a file with a specific encoding like this:

import codecs
f = codecs.open("file.txt", "r", "utf-8")

But for that you must know the encoding in advance, and I don't know how this text file is encoded. A few tips suggested using chardet. I installed it, but I have no idea how to get it to read a text file.

Any ideas on how to get around this?

+4
4 answers

You don't need codecs.open(); that is the Python 2 approach.

In Python 3, the built-in open() accepts an encoding argument:

fileObject = open("countable nouns raw.txt", "rt", encoding='utf8')
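If the encoding is unknown, one common workaround is to try a short list of likely candidates in order. The sketch below is only an illustration: the helper name read_with_fallbacks and the candidate list are assumptions, not something from the question. Note that latin-1 maps every byte to a character, so it never fails, and a successful decode does not prove the guess was right.

```python
def read_with_fallbacks(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) using the first candidate that decodes cleanly.

    Hypothetical helper: the candidate list is a guess and should be
    adapted to wherever the file actually came from.
    """
    for enc in encodings:
        try:
            with open(path, "rt", encoding=enc) as fh:
                return fh.read(), enc
        except UnicodeDecodeError:
            # This candidate cannot decode the file; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode " + path)
```

Usage would look like `bigString, used = read_with_fallbacks("countable nouns raw.txt")`. Checking `used` afterwards tells you which candidate was accepted, which is worth a quick look at the text before trusting it.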


+1

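You can check the encoding outside Python, before calling open, with the file command.

For example, given a file foo.txt containing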

ÙÚÛÜ

$ file foo.txt 
foo.txt: UTF-8 Unicode text
$ wc foo.txt
1 1 9 foo.txt

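As the wc output shows, the file holds 9 bytes: each of the four characters takes two bytes in UTF-8, plus one byte for the newline.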
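The same byte count can be checked from Python. This is a small sketch that assumes foo.txt holds exactly the four characters shown above followed by a single newline; the variable names are made up for the illustration.

```python
# Assumed content of foo.txt: four accented letters plus a newline.
text = "ÙÚÛÜ\n"

# Encode to UTF-8 to get the bytes that would be stored on disk.
data = text.encode("utf-8")

char_count = len(text)  # number of characters, newline included
byte_count = len(data)  # number of bytes, the figure wc reports last

print(char_count, byte_count)
```

The two counts differ because each of these letters needs two bytes in UTF-8, which is also why reading the file with a single-byte codec produces the wrong characters or an error.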

0

You can use chardet for this. It is a third-party package, so install it with pip first. For example:

import chardet
import requests
content = requests.get("http://yahoo.co.jp/").content
detect = chardet.detect(content)
print(detect)

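This prints the detected encoding. Detection is not 100% reliable, so treat the result as a best guess. You can then pass it to open: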

open('file.txt', encoding=detect['encoding'])
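For a local file rather than a web page, the same chardet.detect call can be fed the raw bytes of the file. The following is a rough sketch under a few assumptions: the helper name read_text_with_detection and the utf-8 fallback are illustrative choices, and undecodable bytes are replaced rather than raising an error. If chardet is not installed, the sketch simply uses the fallback.

```python
try:
    import chardet  # third-party package: pip install chardet
except ImportError:
    chardet = None  # the sketch then relies on the fallback encoding


def read_text_with_detection(path, fallback="utf-8"):
    """Read a file as bytes, guess its encoding, and return (text, encoding).

    Hypothetical helper: the fallback and errors="replace" are assumptions
    made for this example, not recommendations from the answer.
    """
    # Binary mode avoids any decoding, so this read cannot raise
    # UnicodeDecodeError.
    with open(path, "rb") as fh:
        raw = fh.read()

    # chardet.detect returns a dict whose "encoding" value may be None.
    guess = chardet.detect(raw)["encoding"] if chardet else None
    encoding = guess or fallback

    # Replace undecodable bytes instead of failing on a wrong guess.
    return raw.decode(encoding, errors="replace"), encoding
```

Calling `bigString, enc = read_text_with_detection("countable nouns raw.txt")` would return the decoded text together with the guessed encoding, so the guess can be inspected before relying on it.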
0
