As an exercise, I built a small script that requests the Google Suggest JSON API. The code is pretty simple:
    import json
    import urllib

    query = 'a'
    url = "http://clients1.google.co.jp/complete/search?hl=ja&q=%s&json=t" % query
    response = urllib.urlopen(url)  # Python 2 API
    result = json.load(response)

The last line raises:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 0: invalid start byte

If I call read() on the response object, this is what I get:
'["a",["amazon","ana","au","apple","adobe","alc","\x83A\x83}\x83]\x83\x93","\x83A\x83\x81\x83u\x83\x8d","\x83A\x83X\x83N\x83\x8b","\x83A\x83\x8b\x83N"],["","","","","","","","","",""]]'
So the error occurs when Python tries to decode the string. This only happens with google.co.jp and Japanese. I tried the same code with other countries and languages and did not get the same problem: deserializing the response works fine.
- I checked the response headers and they always indicate utf-8 as the response encoding.
- I checked the JSON string with the online parser (http://json.parser.online.fr/) and again everything seems OK.
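
To narrow this down without hitting the network, here is a minimal sketch that tests a shortened copy of the read() output above against two candidate encodings. Shift_JIS is only a guess based on the repeated 0x83 lead bytes; nothing in the response headers confirms it. The helper name try_decode is made up for this example.

```python
import json

# Shortened copy of the raw body shown above (a few entries only).
raw = b'["a",["amazon","ana","\x83A\x83}\x83]\x83\x93","\x83A\x83\x8b\x83N"],["","","",""]]'


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0x83 cannot start a UTF-8 sequence, so this returns None.
as_utf8 = try_decode(raw, "utf-8")

# Assumption: the body is actually Shift_JIS despite the utf-8 header.
as_sjis = try_decode(raw, "shift_jis")

if as_sjis is not None:
    # json.loads accepts already-decoded text, so no codec guessing happens here.
    result = json.loads(as_sjis)
    print(result[1])
```

If the Shift_JIS guess holds, decoding the body explicitly before handing it to json.loads() sidesteps the utf-8 default that json.load() applies to byte strings.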
Any ideas on how to solve this problem? What makes json.load() choke on this response?
Thanks in advance.
raben