There is nothing wrong with your string! You just have encode() and decode() confused. A string is a sequence of meaningful characters. To turn it into bytes that can be stored in a file or transmitted over the Internet, call encode() with an encoding such as UTF-8. Each encoding is a scheme for converting meaningful characters into flat bytes of output.
When the time comes to do the opposite, taking some raw bytes from a file or socket and turning them into characters such as letters and numbers, you decode the bytes using the bytes object's decode() method in Python 3.
>>> str_version = 'នយោបាយ'
>>> str_version.encode('utf-8')
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'
See that big long string of bytes? Those are the bytes that UTF-8 uses to represent your string when you need to send it over the network or save it in a document. There are many other encodings, but UTF-8 seems to be the most popular. Each encoding can turn meaningful characters, such as ន and យោ, into bytes: the small 8-bit numbers with which computers communicate.
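To make the "many other encodings" point concrete, here is a small sketch comparing UTF-8 with one alternative. The choice of 'utf-16-le' is just mine for illustration, as are the variable names; any codec Python ships with would show the same idea.

```python
# Same characters, two different encodings, two different byte strings.
text = 'នយោបាយ'

utf8_bytes = text.encode('utf-8')       # 3 bytes per Khmer character here
utf16_bytes = text.encode('utf-16-le')  # 2 bytes per character here

print(len(text))         # number of characters (code points) in the str
print(len(utf8_bytes))   # number of bytes in the UTF-8 form
print(len(utf16_bytes))  # number of bytes in the UTF-16 form

# Each byte string only makes sense with the encoding that produced it.
assert utf8_bytes.decode('utf-8') == text
assert utf16_bytes.decode('utf-16-le') == text
```

The characters stay the same; only the byte representation changes, which is why you always have to know which encoding a given run of bytes was written in.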
>>> rawbytes = str_version.encode('utf-8')
>>> rawbytes
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'
>>> rawbytes.decode('utf-8')
'នយោបាយ'
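If it helps to see what mixing the two up looks like, here is a rough sketch, assuming Python 3. The variable names are made up for the example. A str only has encode() and a bytes object only has decode(), and decoding with the wrong encoding either fails or quietly gives you garbage.

```python
text = 'នយោបាយ'
raw = text.encode('utf-8')

# In Python 3 the two methods live on different types.
str_has_decode = hasattr(text, 'decode')    # False: str only encodes
bytes_has_encode = hasattr(raw, 'encode')   # False: bytes only decodes

# Decoding with an encoding that cannot represent these bytes fails loudly...
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# ...while Latin-1 accepts every byte value, so it "works" but yields
# one wrong character per byte instead of the original text.
mojibake = raw.decode('latin-1')

print(str_has_decode, bytes_has_encode, ascii_failed)
print(len(mojibake), mojibake == text)
```

So if you get an AttributeError about decode or encode, check whether you are holding a str or a bytes object before anything else.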
Brandon Rhodes