Encoding issue when loading HTML using mechanize and Python 2.6

    import mechanize

    browser = mechanize.Browser()
    page = browser.open(url)
    html = page.get_data()
    print html

The output shows some strange characters. I suppose this is a UTF-8 string, but Python does not know that and cannot display it correctly.

How can I convert this string to a unicode string, e.g.:

 u = u'test' 
3 answers

The response was gzipped:

    import gzip

    def ungzipResponse(r, b):
        headers = r.info()
        # .get() avoids a KeyError when the header is absent
        if headers.get('Content-Encoding') == 'gzip':
            gz = gzip.GzipFile(fileobj=r, mode='rb')
            html = gz.read()
            gz.close()
            # the decompressed body is assumed to be UTF-8 HTML
            headers["Content-type"] = "text/html; charset=utf-8"
            r.set_data(html)
            b.set_response(r)

    response = browser.open(url)
    ungzipResponse(response, browser)
    html = response.read()
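
To see the gzip step in isolation, here is a minimal sketch that does not touch the network. It compresses a sample UTF-8 string in memory, then decompresses and decodes it. The helper name `gunzip_bytes` and the sample text are made up for illustration; only the standard `gzip` and `io` modules are used.

```python
import gzip
import io


def gunzip_bytes(raw):
    """Decompress a gzip-encoded byte string (hypothetical helper)."""
    gz = gzip.GzipFile(fileobj=io.BytesIO(raw), mode='rb')
    try:
        return gz.read()
    finally:
        gz.close()


# Build a gzipped body in memory to stand in for a server response.
original = u'caf\xe9 \u2713'.encode('utf-8')
buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode='wb')
gz.write(original)
gz.close()
compressed = buf.getvalue()

body = gunzip_bytes(compressed)  # bytes again, still UTF-8 encoded
text = body.decode('utf-8')      # now a unicode string
```

Note that decompressing only gives back the encoded bytes; the `decode` call is still needed to get a unicode string.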

Decode the byte string to get a unicode object:

    u = html.decode('utf-8')
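
Decoding as UTF-8 assumes the page really is UTF-8. If that is not guaranteed, one option is to read the charset from the `Content-Type` header and fall back gracefully. The sketch below is only an illustration: `charset_from_content_type` and `to_unicode` are hypothetical helpers, and the UTF-8 default is an assumption, not something the server promises.

```python
def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset parameter of a Content-Type value, or default."""
    for part in (content_type or '').split(';')[1:]:
        name, _, value = part.strip().partition('=')
        if name.strip().lower() == 'charset' and value:
            return value.strip().strip('"\'')
    return default


def to_unicode(raw, content_type=None):
    """Decode raw bytes using the declared charset, with a lenient fallback."""
    charset = charset_from_content_type(content_type)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: fall back to UTF-8
        # and substitute U+FFFD for anything that does not decode.
        return raw.decode('utf-8', 'replace')


u = to_unicode(b'caf\xc3\xa9', 'text/html; charset=utf-8')
```

With mechanize, the header value would presumably come from the object returned by `response.info()`; I have not verified the exact accessor, so treat that part as an assumption.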

You need to declare an encoding, for example:

    #!/usr/bin/python
    # -*- coding: iso-8859-15 -*-

mechanize needs it.

For more information, see PEP 263 (source code encoding declarations): http://www.python.org/dev/peps/pep-0263/
