I am trying to port a script to Python 3 that pushes XML feeds to a Google Search Appliance (GSA). The original script is here:
https://developers.google.com/search-appliance/documentation/files/pushfeed_client.py.txt
After running 2to3.py and making a few minor adjustments to remove the remaining syntax errors, the script fails with this error:
(py33dev) d:\dev\workspace>python pushfeed_client.py --datasource="TEST1" --feedtype="full" --url="http://gsa:19900/xmlfeed" --xmlfilename="test.xml"
Traceback (most recent call last):
  File "pushfeed_client.py", line 108, in <module>
    main(sys.argv)
  File "pushfeed_client.py", line 56, in main
    result = urllib.request.urlopen(request_url)
  File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\Lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "C:\Python33\Lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "C:\Python33\Lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\Lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python33\Lib\urllib\request.py", line 1253, in do_open
    r = h.getresponse()
  File "C:\Python33\Lib\http\client.py", line 1147, in getresponse
    response.begin()
  File "C:\Python33\Lib\http\client.py", line 358, in begin
    version, status, reason = self._read_status()
  File "C:\Python33\Lib\http\client.py", line 340, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <!DOCTYPE html>
Why does it raise this exception when the server did send a response? Here is the full response from the GSA, captured by sniffing the session:
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 400 (Bad Request)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}
</style>
<a href=//www.google.com/><img src=//www.google.com/images/errors/logo_sm.gif alt=Google></a>
<p><b>400.</b> <ins>That's an error.</ins>
<p>Your client has issued a malformed or illegal request. <ins>That's all we know.</ins>
So the server did return an HTTP 400. I can reliably reproduce the problem whenever the XML payload contains a non-ASCII (UTF-8 encoded) character. It works flawlessly when the payload is plain ASCII. Here is the most minimal version of the code that reliably reproduces the problem:
import http.client

http.client.HTTPConnection.debuglevel = 1

with open("GSA_full_Feed.xml", encoding='utf-8') as xdata:
    payload = xdata.read()

content_length = len(payload)
feed_path = "xmlfeed"
content_type = "multipart/form-data; boundary=----------boundary_of_feed_data$"
headers = {"Content-type": content_type, "Content-length": content_length}

conn = http.client.HTTPConnection("gsa", 19900)
conn.request("POST", feed_path, body=payload.encode("utf-8"), headers=headers)
res = conn.getresponse()
print(res.read())
conn.close()
And here is an example of an XML payload that triggers the exception:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "gsafeed.dtd">
<gsafeed>
  <header>
    <datasource>TEST1</datasource>
    <feedtype>full</feedtype>
  </header>
  <group>
    <record action="add" mimetype="text/html" url="https://myschweetassurl.com">
      <metadata>
        <meta content="shit happens, then you die" name="description"/>
      </metadata>
      <content>wacky Umläut test of non utf-8 characters</content>
    </record>
  </group>
</gsafeed>
The only difference I can find between the Python 2 and Python 3 versions is the Content-Length header sent with each request. The Python 3 value is consistently smaller than the Python 2 one: 870 versus 873.
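To check whether a gap like that could come from counting characters rather than bytes, a small comparison along these lines might help. This is only a sketch under assumptions: the sample string is a stand-in and not my actual feed, and the names `sample`, `char_count` and `byte_count` are made up for illustration.

```python
# Hypothetical sample text (not the real feed). "\u00e4" is the
# a-umlaut character, which takes two bytes when encoded as UTF-8.
sample = "wacky Uml\u00e4ut test"

# len() on a str counts Unicode code points.
char_count = len(sample)

# len() on the encoded bytes counts what is actually sent on the wire.
byte_count = len(sample.encode("utf-8"))

print("characters:", char_count)
print("utf-8 bytes:", byte_count)
print("difference:", byte_count - char_count)
```

If the two numbers differ for the real payload by the same amount as the two Content-Length values, that would at least suggest where the mismatch comes from.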