Encoding issue when loading HTML using mechanize and Python 2.6

Question

 import mechanize

 browser = mechanize.Browser()
 page = browser.open(url)
 html = page.get_data()
 print html

The output contains some strange characters. I suppose the page is UTF-8 encoded, but Python does not know that and cannot display it correctly.

How can I convert this byte string to a unicode string, like this one?

 u = u'test'

+4

python encoding unicode utf-8 mechanize

luchaninov Sep 27 '10 at 14:03

3 answers

 u = html.decode('utf-8')
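The decode call above can be wrapped so that bad bytes do not abort the whole page. This is a minimal sketch, assuming the page really is UTF-8; `to_unicode` is a hypothetical helper name, and the `raw` value stands in for what `page.get_data()` would return.

```python
def to_unicode(raw, charset='utf-8'):
    """Decode a byte string into text (hypothetical helper, not part of mechanize)."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        # Fall back to substituting U+FFFD for bytes that are invalid in this charset
        return raw.decode(charset, 'replace')

# Stand-in for the bytes returned by page.get_data()
raw = u'caf\u00e9'.encode('utf-8')
text = to_unicode(raw)
```

If the decoded text still looks wrong, the bytes are probably not in the assumed charset, or they are not plain text at all (for example, a compressed body).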

+1

Ned Batchelder Sep 27 '10 at 14:23

You need to declare an encoding, for example:

 #!/usr/bin/python
 # -*- coding: iso-8859-15 -*-

need mechanization.

For more information, see PEP 263: http://www.python.org/dev/peps/pep-0263/
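As far as PEP 263 goes, the declaration tells the interpreter how to decode the source file itself, so it affects string literals written in that file rather than bytes fetched at run time. A small sketch of the difference, using a UTF-8 declaration and a hard-coded byte string as a stand-in for a downloaded page:

```python
# -*- coding: utf-8 -*-
# The declaration above governs how this file's own literals are read.
literal = u'é'

# Bytes that arrive from the network are unaffected by the declaration;
# they still need an explicit decode with the page's charset.
fetched = b'\xc3\xa9'   # UTF-8 bytes for the same character (illustrative data)
decoded = fetched.decode('utf-8')
```

Here `decoded` and `literal` end up as the same one-character unicode string, but only because `fetched` was decoded explicitly.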

+1

Yuda Prawira Oct 3 '10 at 12:30

luchaninov · Accepted Answer · 2010-09-27T15:19:03+0000

The response was gzipped.

 import gzip

 def ungzipResponse(r, b):
     headers = r.info()
     # .get() avoids a KeyError when the server sends no Content-Encoding header
     if headers.get('Content-Encoding') == 'gzip':
         gz = gzip.GzipFile(fileobj=r, mode='rb')
         html = gz.read()
         gz.close()
         headers["Content-type"] = "text/html; charset=utf-8"
         # Replace the compressed body with the decompressed one
         r.set_data(html)
         b.set_response(r)

 response = browser.open(url)
 ungzipResponse(response, browser)
 html = response.read()
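The same idea can be tried without mechanize or a network connection. This is only a sketch: `gunzip_if_needed` and `gzip_bytes` are hypothetical helper names, the sample page is made up, and the header value is passed in by hand instead of being read from a real response.

```python
import gzip
import io

def gunzip_if_needed(body, content_encoding):
    """Return body decompressed when the header value says gzip (hypothetical helper)."""
    if (content_encoding or '').lower() != 'gzip':
        return body
    gz = gzip.GzipFile(fileobj=io.BytesIO(body), mode='rb')
    try:
        return gz.read()
    finally:
        gz.close()

def gzip_bytes(data):
    """Compress data in memory, to simulate a gzipped HTTP body."""
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode='wb')
    try:
        gz.write(data)
    finally:
        gz.close()
    return buf.getvalue()

page = u'<p>caf\u00e9</p>'                      # made-up sample page
compressed = gzip_bytes(page.encode('utf-8'))   # what the server would send
html = gunzip_if_needed(compressed, 'gzip').decode('utf-8')
```

Decompression comes first and decoding second: calling `decode('utf-8')` on the compressed bytes is what produces the strange characters described in the question.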
