PostgreSQL, Python, Jinja2 encoding

I have an encoding problem in my application and have not found a solution anywhere on the Internet.

Here is the scenario:

  • PostgreSQL UTF-8 encoded ( CREATE DATABASE xxxx WITH ENCODING 'UTF8' )

  • Python logic also with UTF-8 encoding ( # -*- coding: utf-8 -*- )

  • Jinja2 to render my HTML pages. Python and Jinja2 are used through Flask, which is the microframework I am using.

The header of my pages contains: <meta http-equiv="content-type" content="text/html; charset=utf-8"/>

Using psycopg2 to run a simple query and render the result in Jinja2, here is what I get:

 {% for company in list %}
   <li> {{ company }} </li>
 {% endfor %}

(1, 'Casa das M\xc3\xa1quinas', 'R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s')

(2, 'Ar do Z\xc3\xa9', 'Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s')

If I try to iterate over the individual fields:

 {% for company in list %}
   <li>
     {% for field in company %}
       <li> {{ field }} </li>
     {% endfor %}
   </li>
 {% endfor %}

I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

However, if I print the list fields before sending them to Jinja2, I get the expected result (which is also what PostgreSQL shows):

1 Casa das Máquinas R. Três, Mineiros - Goiás

2 Ar do Zé Av. Sétima, Mineiros - Goiás

When I get the error, Flask offers a "debug" option. This is where the code breaks:

 File "/home/anonimou/Desktop/flask/lib/python2.7/site-packages/jinja2/_markupsafe/_native.py", line 21, in escape
   return Markup(unicode(s)

In the debugger console I can also try:

 [console ready]
 >>> print s
 Casa das Máquinas
 >>> s
 'Casa das M\xc3\xa1quinas'
 >>> unicode(s)
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
 >>> s.decode('utf-8')
 u'Casa das M\xe1quinas'
 >>> s.encode('utf-8')
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
 >>> s.decode('utf-8').encode('utf-8')
 'Casa das M\xc3\xa1quinas'
 >>> print s.decode('utf-8').encode('utf-8')
 Casa das Máquinas
 >>> print s.decode('utf-8')
 Casa das Máquinas

I have already tried splitting the list and decoding or encoding the values in the Python code before sending them to Jinja2. I get the same error.

Sooo, not sure what I can do here. = (

Thanks in advance!

1 answer

The problem is that psycopg2 returns byte strings by default in Python 2:

When reading data from the database, in Python 2 the returned strings are usually 8-bit str objects encoded in the database client encoding
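To see why Jinja2 then fails, it may help to reproduce the decoding step in isolation. In Python 2, unicode(s) on a byte string implicitly decodes it as ASCII, which cannot handle the 0xc3 byte. The sketch below is written for Python 3, where bytes plays the role of Python 2's str; decode_with is a hypothetical helper introduced only for this illustration.

```python
# Python 3 sketch: `bytes` stands in for the Python 2 `str` that psycopg2 returns.
raw = u"Casa das M\u00e1quinas".encode("utf-8")  # b'Casa das M\xc3\xa1quinas'


def decode_with(codec, data):
    """Hypothetical helper: return the decoded text, or the error message on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc


# Roughly what Python 2's unicode(s) attempts: an implicit ASCII decode.
ascii_result = decode_with("ascii", raw)
# What actually matches the database client encoding.
utf8_result = decode_with("utf-8", raw)

print(ascii_result)
print(ascii(utf8_result))  # ascii() keeps the output safe on any terminal
```

The ASCII attempt fails at the same byte and position reported in the traceback, while the UTF-8 decode yields the expected text.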

So you can:

  • Manually decode all the data from UTF-8:

     # Decode the byte strings into unicode objects using
     # the encoding you know your database is using.
     # Each row is a tuple, so decode only its str fields.
     companies = [
         tuple(f.decode("utf-8") if isinstance(f, str) else f for f in company)
         for company in companies
     ]
     return render_template("companies.html", companies=companies)

or

  • Register the Unicode typecasters when you first import psycopg2, as described in the same section of the manual:

    Note: In Python 2, if you want to uniformly receive all your database input in Unicode, you can register the related typecasters globally as soon as Psycopg is imported:

     import psycopg2
     import psycopg2.extensions
     psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
     psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

    and then forget about this story.
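If you go with manual decoding, keep in mind that each row from the question is a tuple mixing integers and byte strings, so the decode has to be applied per field. A minimal sketch, assuming rows shaped like the ones in the question; decode_row is a hypothetical helper name, and the sample rows are copied from the question's output rather than fetched from a database. It checks for bytes, which is an alias of str in Python 2, so the same code should behave the same way on both versions.

```python
def decode_row(row, encoding="utf-8"):
    """Hypothetical helper: decode every byte-string field of a row, leave the rest alone."""
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in row
    )


# Sample rows as psycopg2 would return them by default in Python 2.
rows = [
    (1, b"Casa das M\xc3\xa1quinas", b"R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s"),
    (2, b"Ar do Z\xc3\xa9", b"Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s"),
]

# Every text field is now a unicode object that Jinja2 can escape safely.
companies = [decode_row(row) for row in rows]
```

The resulting companies list can be passed to render_template unchanged, and the nested loop over fields in the template no longer triggers the implicit ASCII decode.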
