BadStatusLine exception thrown when returning a response from a server in Python 3

I am trying to port to Python 3 the script that pushes XML feeds, found here:

https://developers.google.com/search-appliance/documentation/files/pushfeed_client.py.txt

After running 2to3.py and making a few minor adjustments to remove the remaining syntax errors, the script fails with this error:

    (py33dev) d:\dev\workspace>python pushfeed_client.py --datasource="TEST1" --feedtype="full" --url="http://gsa:19900/xmlfeed" --xmlfilename="test.xml"
    Traceback (most recent call last):
      File "pushfeed_client.py", line 108, in <module>
        main(sys.argv)
      File "pushfeed_client.py", line 56, in main
        result = urllib.request.urlopen(request_url)
      File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Python33\Lib\urllib\request.py", line 469, in open
        response = self._open(req, data)
      File "C:\Python33\Lib\urllib\request.py", line 487, in _open
        '_open', req)
      File "C:\Python33\Lib\urllib\request.py", line 447, in _call_chain
        result = func(*args)
      File "C:\Python33\Lib\urllib\request.py", line 1268, in http_open
        return self.do_open(http.client.HTTPConnection, req)
      File "C:\Python33\Lib\urllib\request.py", line 1253, in do_open
        r = h.getresponse()
      File "C:\Python33\Lib\http\client.py", line 1147, in getresponse
        response.begin()
      File "C:\Python33\Lib\http\client.py", line 358, in begin
        version, status, reason = self._read_status()
      File "C:\Python33\Lib\http\client.py", line 340, in _read_status
        raise BadStatusLine(line)
    http.client.BadStatusLine: <!DOCTYPE html>

Why does it raise this exception when the server does send a response? Here is the full response from the GSA, captured by sniffing the session:

    <!DOCTYPE html>
    <html lang=en>
    <meta charset=utf-8>
    <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
    <title>Error 400 (Bad Request)!!1</title>
    <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}
    </style>
    <a href=//www.google.com/><img src=//www.google.com/images/errors/logo_sm.gif alt=Google></a>
    <p><b>400.</b> <ins>That's an error.</ins>
    <p>Your client has issued a malformed or illegal request. <ins>That's all we know.</ins>

It returned HTTP 400. I can reliably reproduce the problem whenever the XML payload contains a non-ASCII (UTF-8 encoded) character. It works flawlessly when the payload is pure ASCII. Here is the most basic version of the code that reliably reproduces the problem:

    import http.client

    http.client.HTTPConnection.debuglevel = 1

    with open("GSA_full_Feed.xml", encoding='utf-8') as xdata:
        payload = xdata.read()

    content_length = len(payload)
    feed_path = "xmlfeed"
    content_type = "multipart/form-data; boundary=----------boundary_of_feed_data$"
    headers = {"Content-type": content_type, "Content-length": content_length}

    conn = http.client.HTTPConnection("gsa", 19900)
    conn.request("POST", feed_path, body=payload.encode("utf-8"), headers=headers)
    res = conn.getresponse()
    print(res.read())
    conn.close()

And here is an example of an XML payload that triggers the exception:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "gsafeed.dtd">
    <gsafeed>
      <header>
        <datasource>TEST1</datasource>
        <feedtype>full</feedtype>
      </header>
      <group>
        <record action="add" mimetype="text/html" url="https://myschweetassurl.com">
          <metadata>
            <meta content="shit happens, then you die" name="description"/>
          </metadata>
          <content>wacky Umläut test of non utf-8 characters</content>
        </record>
      </group>
    </gsafeed>

The only difference I can find between the Python 2 and Python 3 versions is the Content-length header of each request. The Python 3 value is consistently smaller than the Python 2 one: 870 versus 873.

1 answer

After a lot of packet sniffing, we found the cause of the problem and its solution: it is the way the Content-length header was being set. In my Python 3 port of the script, I had copied over the line that sets the content length, which was:

 headers['Content-length']=str(len(body)) 

This is incorrect! The right way is:

 headers['Content-length']=str(len(bytes(body, 'utf-8'))) 

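This is because the payload must be a bytes object, and Content-length counts bytes, not characters. Once the string is encoded, its length can differ from the length of the original string, since every non-ASCII character takes more than one byte in UTF-8. The request is then built from the encoded body: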

 return urllib.request.Request(theurl, bytes(body, 'utf-8'), headers) 
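To make the mismatch concrete, here is a minimal sketch of the character count versus the byte count. The helper name `content_length` and the sample string are assumptions made up for illustration; they are not part of the original script.

```python
def content_length(body: str) -> str:
    """Byte length of `body` once UTF-8 encoded (hypothetical helper)."""
    return str(len(body.encode("utf-8")))


# Illustrative sample; "\u00e4" is the a-umlaut character.
sample = "wacky Uml\u00e4ut test"

char_count = len(sample)                   # counts Unicode characters
byte_count = int(content_length(sample))   # counts bytes sent on the wire

# The a-umlaut is one character but two bytes in UTF-8,
# so the two counts differ by exactly one here.
print(char_count, byte_count)
```

With a pure-ASCII string the two numbers are equal, which is why the unported header logic looked fine until a non-ASCII character showed up.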

You can also safely skip setting the Content-length header by hand when using anything built on http.client.HTTPConnection. It has an internal method that checks whether a Content-length header is present and, if it is missing, sets it from the length of the request body, whatever form the body takes.
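A rough way to see this without a GSA at hand is to swap the socket for a recorder and look at what `http.client` would have sent. This is only a sketch: `RecordingSocket`, `RecordingConnection` and `automatic_content_length` are hypothetical names, and replacing the socket inside `connect()` is a testing trick that leans on the connection's `sock` attribute, not something to use in production code.

```python
import http.client


class RecordingSocket:
    """Stand-in for a real socket: stores whatever the client sends."""

    def __init__(self):
        self.sent = bytearray()

    def sendall(self, data):
        self.sent += data

    def close(self):
        pass


class RecordingConnection(http.client.HTTPConnection):
    """HTTPConnection that never touches the network (illustration only)."""

    def connect(self):
        self.sock = RecordingSocket()


def automatic_content_length(body: bytes) -> str:
    """Return the Content-Length value http.client adds for `body`."""
    conn = RecordingConnection("gsa", 19900)
    # No Content-length header is passed in; http.client fills it in.
    conn.request("POST", "/xmlfeed", body=body)
    raw = bytes(conn.sock.sent)
    conn.close()
    # Everything before the first blank line is the request head.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.lower() == "content-length":
            return value.strip()
    raise LookupError("no Content-Length header was sent")


payload = "Uml\u00e4ut".encode("utf-8")  # 6 characters, 7 bytes
print(automatic_content_length(payload), len(payload))
```

The header value matches the byte length of the encoded payload, so passing an already encoded body and leaving the header out avoids the mismatch entirely.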

The problem came down to the port: a subtle difference between Python 2 and 3 in how they handle and encode strings. The ASCII-only version worked only by coincidence, since every ASCII character encodes to exactly one byte in UTF-8, so the character count and the byte count happened to match.
