Binary → UTF-8 → string

I am trying to understand Unicode and everything related to it. I made a utf-8.txt file, which is obviously encoded in UTF-8. It contains the text "Hello world!". Here is what I'm doing:

f = open('utf8.txt', mode = 'r', encoding = 'utf8')
f.read()

What I get is: '\ufeffHello world!' Where did the prefix come from?

Another attempt:

f = open('utf8.txt', 'rb')
byte = f.read()

print(byte) gives: b'\xef\xbb\xbfHello world!' I assume this is the same prefix, shown as hex bytes.

byte.decode('utf8')

The above code again gives me: '\ufeffHello world!'

What am I doing wrong? How do I read the text of a UTF-8 file into a Python string?

Thanks for the feedback!

1 answer

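Your utf-8.txt file is encoded as UTF-8 with a BOM (byte order mark), not plain UTF-8. A file saved as UTF-8 with a BOM starts with the character '\uFEFF', which is stored as the bytes EF BB BF. Instead of encoding = 'utf8', use encoding = 'utf-8-sig', which strips the BOM if it is present: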

# 'utf-8-sig' drops a leading BOM; the with block closes the file afterwards
with open('utf8.txt', mode='r', encoding='utf-8-sig') as f:
    print(f.read())
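
If you want to see where the prefix comes from without touching the file, here is a minimal sketch that works on the bytes from your second attempt. The variable names (raw, with_bom, without_bom, plain) are only for illustration; codecs.BOM_UTF8 is the standard-library constant for the three BOM bytes.

```python
import codecs

# The bytes read from the file in binary mode, as shown in the question.
raw = b'\xef\xbb\xbfHello world!'

# The three-byte prefix is exactly the UTF-8 byte order mark.
assert raw.startswith(codecs.BOM_UTF8)

# Plain 'utf-8' keeps the BOM as the character U+FEFF.
with_bom = raw.decode('utf-8')

# 'utf-8-sig' drops the BOM when it is present.
without_bom = raw.decode('utf-8-sig')

# 'utf-8-sig' also accepts data that has no BOM at all.
plain = b'Hello world!'.decode('utf-8-sig')

print(repr(with_bom))
print(repr(without_bom))
print(repr(plain))
```

So nothing is wrong with your decoding as such: the BOM is part of the file, and 'utf-8-sig' is simply the codec that knows to skip it. Because it also handles files without a BOM, it should be safe to use when you are not sure how a file was saved.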