Python requests downloads the wrong sound file from Google Translate

I use the script below to download the pronunciation of the Chinese word 老師 from Google Translate, but when I run it, I get a different file from the one at this URL. I think this is an encoding problem, but since I specified UTF-8, I'm not sure what is going on.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import requests

url = "http://translate.google.com/translate_tts?tl=zh-CN&q=老師"

r = requests.get(url)

with open('test.mp3', 'wb') as test:
    test.write(r.content)

UPDATE:

As per @abarnert's suggestion, I checked that the file is saved as UTF-8 with the encoding declaration, and I tested the code using 'idna':

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import requests

url_1 = "http://translate.google.com/translate_tts?tl=zh-CN&q=老師"
url_2 = "http://translate.google.com/translate_tts?tl=zh-CN&q=\u8001\u5e2b"

r_1 = requests.get(url_1)
r_1_b = requests.get(url_1.encode('idna'))
r_2 = requests.get(url_2)
r_2_b = requests.get(url_2.encode('idna'))

# This downloads nonsense:
with open('r_1.mp3', 'wb') as test:
    test.write(r_1.content)

# This throws the error shown at the bottom:
with open('r_1_b.mp3', 'wb') as test:
    test.write(r_1_b.content)

# This parses the characters individually, producing
# a file consisting of "u, eight, zero..." in Mandarin
with open('r_2.mp3', 'wb') as test:
    test.write(r_2.content)

# This produces a sound file consisting of "u, eight, zero, zero..." in Mandarin
with open('r_2_b.mp3', 'wb') as test:
    test.write(r_2_b.content)

The error I get is:

Traceback (most recent call last):
  File "/home/MZ/Desktop/tts3.py", line 12, in <module>
    r_1_b = requests.get(url_1.encode('idna'))
  File "/usr/lib64/python2.7/encodings/idna.py", line 164, in encode
    result.append(ToASCII(label))
  File "/usr/lib64/python2.7/encodings/idna.py", line 76, in ToASCII
    label = nameprep(label)
  File "/usr/lib64/python2.7/encodings/idna.py", line 21, in nameprep
    newlabel.append(stringprep.map_table_b2(c))
  File "/usr/lib64/python2.7/stringprep.py", line 197, in map_table_b2
    b = unicodedata.normalize("NFKC", al)
TypeError: must be unicode, not str
[Finished in 15.3s with exit code 1]
ANSWER:

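You're using Python 2 on Linux (the traceback runs through /usr/lib64/python2.7, so despite the python3 shebang in your update, the script is actually being run by 2.7). In Python 3, your original code would work as written.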

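The problem is that you need a Unicode string, not a byte string, whenever the URL contains anything beyond ASCII. In Python 2, that means adding a u prefix (in Python 3 the u is unnecessary, because every string literal is already Unicode):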

url = u"http://translate.google.com/translate_tts?tl=zh-CN&q=老師"

or, equivalently, with escape sequences:

url_2 = u"http://translate.google.com/translate_tts?tl=zh-CN&q=\u8001\u5e2b"

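You don't have to encode anything to UTF-8 yourself: requests takes care of that when it builds the request, and UTF-8 is what Google expects.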
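As a rough illustration of what UTF-8 percent-encoding means for this URL, here is a small Python 3 sketch using only the standard library. It builds the query string with urllib.parse.urlencode, which encodes text values as UTF-8 by default. The helper name build_tts_url is hypothetical, and I'm assuming requests produces an equivalent query string from the same Unicode text, without having checked its exact output.

```python
from urllib.parse import urlencode

# Endpoint taken from the question.
BASE_URL = "http://translate.google.com/translate_tts"


def build_tts_url(text, lang="zh-CN"):
    """Hypothetical helper: return the TTS URL with `text` percent-encoded.

    urlencode() quotes each value using UTF-8 by default, so a Unicode
    string like "老師" becomes its UTF-8 bytes written as %XX escapes.
    """
    return BASE_URL + "?" + urlencode({"tl": lang, "q": text})


print(build_tts_url("老師"))
# http://translate.google.com/translate_tts?tl=zh-CN&q=%E8%80%81%E5%B8%AB
```

With requests, passing params={"tl": "zh-CN", "q": u"老師"} to requests.get is another way to let the library do this encoding instead of building the string by hand.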

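Without the u, you have a plain str, which in Python 2 is just a sequence of bytes (whatever bytes your editor saved for 老師), and something has to guess how to decode them. Depending on what requests and the libraries under it do with a byte string, that guess may be sys.getdefaultencoding(), which is normally "ascii" in Python 2 and can't decode those bytes at all, or some platform-dependent encoding, which on Mac and Linux is usually UTF-8. On Windows the guess is more likely to be something like "cp1252" or "big5", and then you get mojibake.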
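To see why the guessed encoding matters, this Python 3 sketch takes the UTF-8 bytes that a Python 2 plain literal "老師" would contain and decodes them the wrong ways. It only illustrates the codecs themselves; I'm not claiming this is exactly the code path requests follows.

```python
# The bytes a Python 2 plain literal "老師" holds when the source file
# is saved as UTF-8 (reproduced here in Python 3 with .encode()).
raw = "老師".encode("utf-8")
print(raw)  # b'\xe8\x80\x81\xe5\xb8\xab'

# "ascii", Python 2's default encoding, rejects every byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print("ascii failed:", exc.reason)  # ordinal not in range(128)

# A Windows code page "succeeds" but yields mojibake. cp1252 has no
# character at byte 0x81, so errors="replace" puts U+FFFD there.
mojibake = raw.decode("cp1252", errors="replace")
print("cp1252:", mojibake)
```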

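I'm not sure exactly which of these happens in your case. Testing with a UTF-8 source file, I get different results on Mac and on Linux; on Linux it comes out sounding like "eh eh eh" (perhaps the bytes E8 80 81 that encode 老 in UTF-8 are being read one per character as U+00E8, U+0080, U+0081?), and I'd expect yet another result on Windows.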
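The guess in parentheses can at least be checked arithmetically. This Python 3 sketch shows that UTF-8 encodes 老 as the bytes E8 80 81, and that reading them one byte per character (Latin-1) yields exactly U+00E8, U+0080, U+0081. It confirms the byte math, not what Google actually received.

```python
# UTF-8 encodes 老 (U+8001) as three bytes.
lao = "老".encode("utf-8")
print(lao.hex())  # e88081

# Latin-1 maps each byte to the code point with the same value, so reading
# those bytes as Latin-1 gives one character per byte.
chars = lao.decode("latin-1")
print(["U+%04X" % ord(c) for c in chars])  # ['U+00E8', 'U+0080', 'U+0081']
```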

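url_2 fails for a related reason. In 2.x, \u escapes are only recognized inside Unicode literals, so in a plain string \u8001 isn't an escape sequence at all; it's just the six characters \, u, 8, 0, 0, 1. That is what requests sends to Google, which dutifully reads them out one at a time.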
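Here is a Python 3 sketch of the difference, using a raw string to stand in for a Python 2 plain literal (in both cases the backslash is kept). urllib.parse.quote is used only to show what each version looks like once percent-encoded; I'm assuming requests escapes the backslash in a similar way, which I haven't verified.

```python
from urllib.parse import quote

# In a Python 2 plain literal, "\u8001" is not an escape: it stays as a
# backslash followed by "u8001". A Python 3 raw string reproduces that.
py2_style = r"\u8001\u5e2b"
print(len(py2_style))       # 12 (two groups of six characters)
print(list(py2_style[:6]))  # ['\\', 'u', '8', '0', '0', '1']

# In a Unicode literal, the same escapes are just the two characters 老師.
unicode_style = "\u8001\u5e2b"
print(unicode_style == "老師")  # True

# Percent-encoding shows the difference in the query string.
print(quote(py2_style))      # %5Cu8001%5Cu5e2b
print(quote(unicode_style))  # %E8%80%81%E5%B8%AB
```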

Add the u prefix, and both versions work.

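In Python 3, you don't need the u. (In 3.x, you would only get a byte string by explicitly writing b"...". Source files are also assumed to be UTF-8 by default, so unless your editor saved the file as something like Big5, the literal won't be mojibaked; and sys.getdefaultencoding() is UTF-8 rather than "ascii".)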
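Both Python 3 points can be checked directly with this sketch: sys.getdefaultencoding() reports UTF-8, and a bytes literal containing non-ASCII characters is rejected at compile time, so the Python 2 mistake can't even be written. compiles is a hypothetical helper written just for this example.

```python
import sys

# Python 3's default encoding for implicit conversions is UTF-8,
# not Python 2's "ascii".
print(sys.getdefaultencoding())  # utf-8


def compiles(source):
    """Hypothetical helper: return True if `source` is a valid expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A bytes literal may contain only ASCII characters, so the Python 2
# pattern of putting raw UTF-8 text into a byte string doesn't compile.
print(compiles('b"laoshi"'))  # True
print(compiles('b"老師"'))     # False
```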


