How to encode (utf8mb4) in Python

How do I encode something in utf8mb4 in Python?

I have two data sets: the data that I transferred to my new MySQL database from Parse, and the data going forward (which is written only to my new database). My database uses utf8mb4 to store emoji and accented letters.

The first data set is displayed correctly (when emoji and accents are involved) only when my Python script contains:

MySQLdb.escape_string(unicode(xstr(data.get('message'))).encode('utf-8')) 

and when reading from a MySQL database in PHP:

 $row["message"] = utf8_encode($row["message"]); 

The second data set is displayed correctly (when emoji and accents are involved) only when I do NOT include utf8_encode($row["message"]). I am trying to reconcile them so that both data sets are returned correctly to my iOS application. Please help!


MySQL's utf8mb4 encoding is just standard UTF-8.

However, MySQL had to introduce that name to distinguish it from its older, broken utf8 character set, which only supports BMP characters (at most 3 bytes per character).

In other words, when communicating with MySQL you should always encode as UTF-8, but note that the database may not be able to store Unicode code points above U+FFFF unless you use utf8mb4 on the MySQL side.
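A quick way to see the BMP limit from Python: characters up to U+FFFF take at most 3 bytes in UTF-8, which is all MySQL's legacy utf8 can store, while emoji take 4. The helper names below are illustrative only, not part of any library.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_legacy_mysql_utf8(text: str) -> bool:
    """True if every code point is in the BMP (<= U+FFFF), i.e. needs at most
    3 bytes in UTF-8, which is the limit of MySQL's legacy 3-byte utf8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# 'a', e-acute, euro sign, grinning-face emoji
for sample in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(sample):04X}: {utf8_byte_length(sample)} byte(s), "
          f"legacy utf8 ok: {fits_legacy_mysql_utf8(sample)}")
```

Only the emoji reports 4 bytes and fails the check, which is exactly the case that needs utf8mb4.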

Generally speaking, you want to avoid manual encoding and decoding. Configure your connection and collation to handle Unicode for you. For MySQLdb this means setting charset='utf8' (which implies use_unicode=True and issues the SET NAMES and SET character_set_connection statements), or charset='utf8mb4' if you need characters outside the BMP such as emoji, and then treating all text on the Python side as Unicode text.
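To illustrate why manual encoding is risky, the sketch below simulates one common failure in pure Python: the application encodes text to UTF-8 by hand, and another layer (for example, a connection that assumes latin1) then treats those bytes as Latin-1 characters, so the text effectively gets encoded twice. This is a simulation, not a description of what any particular driver does internally, and the function names are hypothetical.

```python
def encode_twice(text: str) -> str:
    """Simulate double encoding: UTF-8 bytes misread as Latin-1 text."""
    utf8_bytes = text.encode("utf-8")    # manual encode in the application
    return utf8_bytes.decode("latin-1")  # a latin1 layer turns each byte into a character


def undo_double_encoding(garbled: str) -> str:
    """Reverse the simulation above; only valid if it really happened once."""
    return garbled.encode("latin-1").decode("utf-8")


original = "caf\u00e9 \U0001F600"
garbled = encode_twice(original)
print(repr(garbled))                              # mojibake such as 'cafÃ©' plus stray bytes
print(undo_double_encoding(garbled) == original)  # the damage is reversible here
```

The repair function is only safe when you know the data was double-encoded exactly once; applied to clean text containing accents or emoji it can raise UnicodeEncodeError or UnicodeDecodeError. Letting the driver do a single, consistent conversion avoids the problem entirely.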


I struggled to exchange the full UTF-8 character set correctly between Python and MySQL, for the sake of emoji and other characters beyond code point U+FFFF.

To make everything work, I had to do the following:

  • make sure utf8mb4 is used for CHAR, VARCHAR and TEXT columns in MySQL
  • enforce UTF-8 in Python
  • enforce UTF-8 between Python and MySQL
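For the first bullet, one way to check which columns are not yet utf8mb4 is to query MySQL's information_schema.COLUMNS table and filter the rows in Python. The query is standard MySQL; the helper name is hypothetical, and it assumes you pass it the rows from cursor.fetchall().

```python
# Lists the character set of every column in one table.
# CHARACTER_SET_NAME is NULL (None in Python) for non-text columns such as INT.
COLUMN_CHARSET_QUERY = """
    SELECT COLUMN_NAME, CHARACTER_SET_NAME
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s
"""


def non_utf8mb4_columns(rows):
    """Return names of text columns whose character set is not utf8mb4.

    `rows` is an iterable of (column_name, charset) pairs, e.g. the result
    of cursor.fetchall() after executing COLUMN_CHARSET_QUERY.
    """
    return [name for name, charset in rows
            if charset is not None and charset != "utf8mb4"]
```

Usage, with a hypothetical database and table name: cursor.execute(COLUMN_CHARSET_QUERY, ("mydb", "messages")) followed by print(non_utf8mb4_columns(cursor.fetchall())).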

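To enforce UTF-8 in Python, add the following line as the first or second line of your Python script (it declares the encoding of the source file itself, which matters for non-ASCII literals in Python 2; Python 3 already assumes UTF-8 source):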

 # -*- coding: utf-8 -*- 

To enforce UTF-8 between Python and MySQL, configure the MySQL connection as follows:

 import MySQLdb
 # Connect to MySQL.
 dbc = MySQLdb.connect(host='###', user='###', passwd='###', db='###', use_unicode=True)
 # Create a cursor.
 cursor = dbc.cursor()
 # Enforce UTF-8 for the connection.
 cursor.execute('SET NAMES utf8mb4')
 cursor.execute("SET CHARACTER SET utf8mb4")
 cursor.execute("SET character_set_connection=utf8mb4")
 # Do database stuff.
 # Commit data.
 dbc.commit()
 # Close cursor and connection.
 cursor.close()
 dbc.close()

This way, you do not need functions such as encode and utf8_encode.
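As an alternative to issuing the SET statements by hand, MySQLdb's connect() accepts a charset argument; passing charset='utf8mb4' asks the driver to set the connection character set itself. The sketch below assumes the mysqlclient/MySQLdb driver, and the helper names and the messages table are hypothetical. It also uses a parameterized query in place of the escape_string(...).encode('utf-8') chain from the question.

```python
def utf8mb4_connect_kwargs(host, user, passwd, db):
    """Keyword arguments for MySQLdb.connect() that use utf8mb4 on the
    connection, so Python str values (including emoji) round-trip."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8mb4",  # connection character set, set by the driver
        "use_unicode": True,   # return text columns as str, not bytes
    }


def save_message(host, user, passwd, db, message):
    """Insert one message with a parameterized query (no manual encoding
    or escaping). The `messages` table and column are hypothetical."""
    import MySQLdb  # imported here so the kwargs helper works without the driver

    conn = MySQLdb.connect(**utf8mb4_connect_kwargs(host, user, passwd, db))
    try:
        cursor = conn.cursor()
        # %s placeholders: the driver escapes and encodes `message` itself.
        cursor.execute("INSERT INTO messages (message) VALUES (%s)", (message,))
        conn.commit()
        cursor.close()
    finally:
        conn.close()
```

Because message stays a plain str all the way to cursor.execute(), there is exactly one encoding step, performed by the driver with the connection's character set.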


use_unicode=True did not work for me.

My solution:

  • in MySQL, change the whole database, the table encoding and the fields to utf8mb4
  • MySQLdb.connect(host='###' [...], charset='utf8')
  • dbCursor.execute('SET NAMES utf8mb4')
  • dbCursor.execute("SET CHARACTER SET utf8mb4")
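For the first step, the conversion can be expressed with MySQL's ALTER DATABASE ... CHARACTER SET and ALTER TABLE ... CONVERT TO CHARACTER SET statements. The sketch below only builds the SQL strings; the function name, the example database and table names, and the utf8mb4_unicode_ci collation default are assumptions to adjust for your schema.

```python
def utf8mb4_migration(database, tables, collation="utf8mb4_unicode_ci"):
    """Build SQL that sets a database's default charset to utf8mb4 and
    converts the given tables (including their text columns) to utf8mb4."""
    def quote(identifier):
        # Backtick-quote a MySQL identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    # ALTER DATABASE only changes the default for tables created later.
    statements = [
        f"ALTER DATABASE {quote(database)} CHARACTER SET utf8mb4 COLLATE {collation};"
    ]
    # CONVERT TO rewrites the existing columns of each table.
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote(table)} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


for sql in utf8mb4_migration("mydb", ["messages"]):
    print(sql)
```

Back up before running the statements: converting can change column types, and on older MySQL versions indexed VARCHAR(255) columns may exceed the index length limit once each character can take 4 bytes.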

You can also specify the desired encoding when connecting with MySQL Connector/Python (the mysql.connector module), as follows:

 mysql.connector.connect(host = '<host>', database = '<db>', user = '<user>', password = '<password>', charset = 'utf8') 

The values inside <> are your own data. Instead of 'utf8' you can also write 'utf8mb4', depending on the character set your MySQL database uses; for emoji you need 'utf8mb4'.

