Convert Unicode to UTF-8 Python

Question

Convert Unicode to UTF-8 Python

I work with a database that has scattered characters throughout as follows: "I need to take this from the database, convert it to UTF-8, and then import it into another database using python. When printing on the Windows command line, these characters look like this: \ xe2 \ u20ac \ u2122. I tried using various combinations of .decode () ,. encode () and unicode () to convert the data, but Im really stuck.

+4

python database unicode utf-8

Jonathan lerner Jul 19 '11 at 0:02

source share

1 answer

Gareth rees · Accepted Answer · 2011-07-19T01:48:21+0000

Always decode at the input and encode at the output. (To do this, it should be useful to me: perhaps, “remove your code [coat] when you arrive indoors.”)

Input decoding: you say the encoding of the database is "UTF_8_bin". Do you use MySQL-Python ? If so, you can set the use_unicode parameter when connecting to the database. Then all the rows are retrieved from the Unicode database, so you don’t have to worry about decrypting them.

Encode on output: you can find out the current character encoding (or code page "as they call it on Windows) using chcp . Suppose code page 1252. Then you can write

 print text.encode('windows-1252')

to create something you can read from the windows command prompt.

If you write rows back to another MySQL database using MySQL-Python, you don’t need to do anything special: MySQL-Python states that you can always write Unicode strings (regardless of whether you use_unicode when you opened compound).

Convert Unicode to UTF-8 Python

More articles: