I am trying to understand Unicode and everything related to it. I created a file, utf8.txt, which is obviously encoded in UTF-8 and contains the text "Hello world!". This is what I'm doing:
f = open('utf8.txt', mode = 'r', encoding = 'utf8')
f.read()
What I get is: '\ufeffHello world!' Where did the prefix come from?
Another attempt:
f = open('utf8.txt', 'rb')
byte = f.read()
print(byte) gives: b'\xef\xbb\xbfHello world!' I assume the prefix is shown as hex escapes.
byte.decode('utf8')
The code above again gives me: '\ufeffHello world!'
What am I doing wrong? How do I read the text from a UTF-8 file in Python?
Thanks for the feedback!
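For reference, the three bytes EF BB BF are the UTF-8 encoding of U+FEFF, the byte order mark (BOM), so the prefix was most likely written into the file by the editor that saved it. Below is a minimal sketch, assuming that is the case here, which compares the plain 'utf-8' codec with Python's 'utf-8-sig' codec. The temporary file is a hypothetical stand-in for utf8.txt, created only so that the example runs on its own.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for utf8.txt: "Hello world!" preceded by a UTF-8 BOM,
# which is what some editors write when saving as "UTF-8".
path = os.path.join(tempfile.mkdtemp(), "utf8.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"Hello world!")

# The plain 'utf-8' codec keeps the BOM as the character U+FEFF.
with open(path, mode="r", encoding="utf-8") as f:
    with_bom = f.read()

# The 'utf-8-sig' codec drops a leading BOM if one is present.
with open(path, mode="r", encoding="utf-8-sig") as f:
    text = f.read()

# The same codec works when decoding raw bytes.
with open(path, "rb") as f:
    raw = f.read()
decoded = raw.decode("utf-8-sig")

print(repr(with_bom))
print(repr(text))
print(repr(decoded))
```

If the file has no BOM, 'utf-8-sig' reads it just like 'utf-8' does, so it is a reasonable choice when it is not known how the file was saved.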