String in Python with my Unicode?

Question


    Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> str_version = 'នយោបាយ'
    >>> type(str_version)
    <class 'str'>
    >>> print (str_version)
    នយោបាយ
    >>> unicode_version = 'នយោបាយ'.decode('utf-8')
    Traceback (most recent call last):
      File "<pyshell#3>", line 1, in <module>
        unicode_version = 'នយោបាយ'.decode('utf-8')
    AttributeError: 'str' object has no attribute 'decode'

What is the problem with my unicode string?

+8

python python-3.x unicode

kn3l Mar 26 '11 at 20:57


3 answers

You are reading the 2.x documentation. str.decode() (and bytes.encode()) has been removed in 3.x. And str is already a Unicode string; there is no need to decode it.

+7

Ignacio Vazquez-Abrams Mar 26 '11 at 21:05


You already have a Unicode string. In Python 3, str objects are Unicode strings (unicode in Python 2.x), and byte strings (Python 2.x str) are no longer treated as text; they are now called bytes. The latter can be converted to str with its decode method, but the former is already decoded - it can only be encoded back into bytes.
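As a minimal sketch of that distinction (not part of the original answer; the variable names are chosen here for illustration), the following checks which of the two methods each type exposes in Python 3:

```python
# Python 3: str holds Unicode text, bytes holds raw 8-bit values.
text = 'នយោបាយ'

# str offers encode() but no decode(); bytes is the other way around.
str_has_decode = hasattr(text, 'decode')        # False in Python 3
encoded = text.encode('utf-8')                  # str -> bytes
bytes_has_encode = hasattr(encoded, 'encode')   # False in Python 3
decoded = encoded.decode('utf-8')               # bytes -> str

print(type(text).__name__, type(encoded).__name__)   # str bytes
print(str_has_decode, bytes_has_encode)              # False False
print(decoded == text)                               # True
```

This is why the call in the question fails: the literal is already a str, so the only conversion available on it is encode().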

+3

delnan Mar 26 '11 at 21:11


Brandon Rhodes · Accepted Answer · 2011-03-26T21:03:26+0000

There is nothing wrong with your string! You are just confusing encode() and decode(). A string is a sequence of meaningful characters. To convert it to bytes that can be stored in a file or transmitted over the Internet, use encode() with an encoding such as UTF-8. Each encoding is a scheme for converting meaningful characters into plain bytes for output.

When the time comes to do the opposite - take some raw bytes from a file or socket and turn them into characters, such as letters and numbers - you decode the bytes using the byte string's decode() method in Python 3.

    >>> str_version = 'នយោបាយ'
    >>> str_version.encode('utf-8')
    b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'

See that long string of bytes? These are the bytes that UTF-8 uses to represent your string when you need to send it over the network or save it in a file. There are many other encodings, but UTF-8 seems to be the most popular. An encoding turns meaningful characters, such as ន and យោ, into bytes - the small 8-bit numbers that computers communicate with.
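To illustrate that different encodings produce different bytes for the same text, here is a small sketch. The comparison against UTF-16 and UTF-32 is an addition for illustration, not something the original answer shows; both are standard Python codecs.

```python
text = 'នយោបាយ'

# The same six characters, serialized under three different encodings.
utf8_bytes = text.encode('utf-8')
utf16_bytes = text.encode('utf-16-le')   # little-endian, no byte-order mark
utf32_bytes = text.encode('utf-32-le')   # little-endian, no byte-order mark

print(len(text))          # 6 characters (code points)
print(len(utf8_bytes))    # 18 bytes: each of these characters takes 3 bytes in UTF-8
print(len(utf16_bytes))   # 12 bytes: 2 bytes per character
print(len(utf32_bytes))   # 24 bytes: 4 bytes per character

# Decoding must use the same encoding that produced the bytes.
print(utf16_bytes.decode('utf-16-le') == text)   # True
```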

    >>> rawbytes = str_version.encode('utf-8')
    >>> rawbytes
    b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'
    >>> rawbytes.decode('utf-8')
    'នយោបាយ'
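The same round trip applies when the bytes come from a file, as the answer mentions. The following is only a sketch under assumptions: the temporary file and the variable names are made up for illustration, and the original answer does not include a file example.

```python
import os
import tempfile

text = 'នយោបាយ'

# Write the encoded bytes to a temporary file (binary mode by default).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(text.encode('utf-8'))
    path = f.name

try:
    # Binary mode returns bytes, which must be decoded explicitly.
    with open(path, 'rb') as f:
        raw = f.read()
    from_bytes = raw.decode('utf-8')

    # Text mode with an explicit encoding does the decoding for you.
    with open(path, encoding='utf-8') as f:
        from_text_mode = f.read()
finally:
    os.remove(path)

print(type(raw).__name__)                          # bytes
print(from_bytes == text, from_text_mode == text)  # True True
```

Passing encoding='utf-8' explicitly avoids depending on the platform's default encoding, which on Windows is often not UTF-8.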
