Inconsistent behavior with HTTP POST requests in Python

Question

I'm trying to make a POST request from a Python (WSGI) application to a NodeJS + Express application. The two are located on different servers.

The problem is that the result depends on which IP address I use (private network vs. public network): the urllib2 request over the public network succeeds, but the same request over the private network fails with a 502 Bad Gateway or URLError [32] Broken pipe.

The urllib2 code I'm using is this:

    import urllib2

    req = urllib2.Request(url, "{'some':'data'}", {'Content-Type' : 'application/json; charset=utf-8'})
    res = urllib2.urlopen(req)
    print res.read()  # the original read from an undefined name, f

I have also written the same request using requests:

    import requests

    r = requests.post(url, headers = {'Content-Type' : 'application/json; charset=utf-8'}, data = "{'some':'data'}")
    print r.text

With this I get a 200 OK response. This alternative method works on both networks.

I'd like to find out whether the urllib2 request needs some additional configuration that I don't know about, or whether I need to look into some network configuration that may be missing (I don't believe that's the case, since the alternative request method works, but I could definitely be wrong).

Any suggestions or pointers with this would be greatly appreciated. Thanks!


python http rest node.js urllib2

Juan Carlos Coto Feb 04 '13 at 20:57


1 answer

abarnert · Accepted Answer · 2013-02-05T19:30:35+0000

The problem is that, as Austin Phillips pointed out, the data parameter of the urllib2.Request constructor:

may be a string specifying additional data to send to the server ... data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.

By passing it JSON-encoded data instead of urlencoded data, you're confusing it somewhere.
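To make the difference concrete, here is a minimal sketch of the two encodings of the same payload. It assumes Python 3, where `urllib.urlencode` lives at `urllib.parse.urlencode`; the variable names are purely illustrative.

```python
import json
from urllib.parse import urlencode  # Python 3 location of urllib.urlencode

payload = {"some": "data"}

# The format the documentation describes for the data parameter.
form_body = urlencode(payload)

# The format the question is actually sending (modulo the quoting issue).
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

The two strings are not interchangeable: a server or proxy that expects one will not necessarily cope with the other.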

However, Request has an add_data method:

Set the Request data to data. This is ignored by all handlers except HTTP handlers, and there it should be a byte string, and will change the request to be POST rather than GET.

If you use this, you should probably also use add_header rather than passing the header in the constructor, although that doesn't seem to be mentioned specifically anywhere in the documentation.

So this should work:

    import urllib2

    req = urllib2.Request(url)
    req.add_data("{'some':'data'}")
    req.add_header('Content-Type', 'application/json; charset=utf-8')
    res = urllib2.urlopen(req)

In a comment you said:

The reason I don't want to just switch to requests without finding out why I'm seeing this problem is that there may be some deeper underlying issue that this points to, which could come back and cause harder-to-detect problems later on.

If you want to find deep underlying issues, you're not going to do that just by looking at your client-side source. The first step in answering "Why does X work but Y fail?" with network code is to find out exactly which bytes X and Y each send. You can then narrow down what the relevant difference is, and then figure out which part of your code is causing Y to send the wrong data in the relevant place.

You can do this by logging things on the service (if you control it), running Wireshark, etc., but netcat is the easiest way for simple cases. You will need to read man nc for your system (and on Windows you will need to get and install netcat before you can run it), because the syntax is different for each version, but telling it to listen on a local port is always a simple one-liner.

Then, in your client, change the URL to use localhost:12345 instead of the host name, and it will connect to netcat and send its HTTP request, which will be dumped to the terminal. You can then copy that, run nc HOST 80, and paste it in to see how the real server responds, and use that to narrow down the problem. Or, if you get stuck, at least you can copy and paste the data into your SO question.
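If netcat isn't handy, the same idea can be sketched in Python itself. The following is only an illustration under stated assumptions: it uses Python 3 (where `urllib.request` is the successor of `urllib2`), the helper names `capture_one_request` and `dump_post` are hypothetical, and it handles exactly one simple request with a Content-Length body. It listens on a local port, lets the client POST to it, and returns the raw bytes that were sent.

```python
import socket
import threading
import urllib.request


def capture_one_request(server_sock, captured):
    """Accept one connection, record the raw request bytes, send a minimal reply."""
    conn, _ = server_sock.accept()
    with conn:
        conn.settimeout(2.0)
        data = b""
        # Read until the end of the header block.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        head, _, body = data.partition(b"\r\n\r\n")
        # Use Content-Length to know how many body bytes to wait for.
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        while len(body) < length:
            chunk = conn.recv(4096)
            if not chunk:
                break
            body += chunk
        captured.append(head + b"\r\n\r\n" + body)
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")


def dump_post(body, headers):
    """Return the exact bytes urllib.request sends for a POST with this body."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.settimeout(5.0)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    captured = []
    worker = threading.Thread(target=capture_one_request, args=(server_sock, captured))
    worker.start()
    try:
        # An empty ProxyHandler keeps the request from being routed via a proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        req = urllib.request.Request("http://127.0.0.1:%d/" % port, data=body, headers=headers)
        opener.open(req, timeout=5).close()
    finally:
        worker.join()
        server_sock.close()
    return captured[0]


raw = dump_post(b'{"some": "data"}', {"Content-Type": "application/json; charset=utf-8"})
print(raw.decode("latin-1"))
```

Running the same capture against each client (for example, once with the urllib-based code and once with requests pointed at the same local port) gives two byte dumps that can be compared line by line.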

One last thing: this is almost certainly not relevant to your problem (because you're sending exactly the same data with requests and it works), but your data is not actually valid JSON, because it uses single quotes instead of double quotes. According to the docs, a string is defined as:

    string
        ""
        " chars "

(The docs also have a nice graphical representation.)

In general, apart from really simple test cases, you don't want to write JSON by hand. In many cases (including yours), all you have to do is replace "…" with json.dumps(…), so this isn't a serious hardship. So:

    import json
    import urllib2

    req = urllib2.Request(url)
    req.add_data(json.dumps({'some':'data'}))
    req.add_header('Content-Type', 'application/json; charset=utf-8')
    res = urllib2.urlopen(req)

So why does it work? Well, in JavaScript, single-quoted strings are legal, as are other things like backslash escapes that aren't valid in JSON, and any JS code that uses restricted eval (or, worse, raw eval) for parsing will accept it. And because so many people got used to writing bad JSON as a result, many browsers' native JSON parsers and many JSON libraries in other languages have workarounds that tolerate common errors. But you shouldn't rely on that.
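As a quick check of how a strict parser treats the hand-written string, here is a minimal sketch using Python's own json module (Python 3 syntax; the variable names are illustrative only). The single-quoted version is rejected, while the output of json.dumps round-trips cleanly.

```python
import json

hand_written = "{'some':'data'}"  # single quotes: not valid JSON

try:
    json.loads(hand_written)
    strict_ok = True
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    strict_ok = False
    print("rejected:", exc)

# Letting the library do the encoding produces double-quoted strings.
generated = json.dumps({'some': 'data'})
round_trip = json.loads(generated)
print(generated)
```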
