Python urllib3 and how to handle cookie support?

So I'm looking into urllib3 because it has connection pooling and is thread safe (so performance is better, especially for crawling), but the documentation is minimal, to say the least. urllib2 has build_opener, so you can do something like:

    #!/usr/bin/python
    import cookielib, urllib2

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    r = opener.open("http://example.com/")

But urllib3 has no build_opener method, so the only way I have figured out so far is to put the cookie into the headers manually:

    #!/usr/bin/python
    import urllib3

    http_pool = urllib3.connection_from_url("http://example.com")
    myheaders = {'Cookie': 'some cookie data'}
    r = http_pool.get_url("http://example.org/", headers=myheaders)

But I hope there is a better way, and that one of you can tell me what it is. Also, can someone tag this with "urllib3"?

+7
python urllib3
5 answers

You're right, there is no better way to do this right now. I would be more than happy to accept a patch if you have a congruent improvement.

One thing to keep in mind: urllib3's HTTPConnectionPool is meant to be a "pool of connections" to a specific host, not a stateful client. In that context, it makes sense to keep cookie tracking outside of the actual pool.
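One way to keep cookie state outside the pool, as described above, is a small tracker object that remembers cookies from Set-Cookie values and produces the Cookie header for the next request. The following is only a minimal sketch using the Python 3 standard library: the CookieTracker class and its method names are hypothetical, not part of urllib3, and it ignores the domain, path, and expiry rules that a real cookie jar would enforce.

```python
from http.cookies import SimpleCookie


class CookieTracker:
    """Hypothetical helper: remembers cookie name/value pairs between requests."""

    def __init__(self):
        self._cookies = {}

    def remember(self, set_cookie_values):
        """Store cookies from an iterable of raw Set-Cookie header values."""
        for raw in set_cookie_values:
            parsed = SimpleCookie()
            parsed.load(raw)
            for name, morsel in parsed.items():
                # Only the name and value are sent back; attributes such as
                # Path or HttpOnly are dropped in this simplified sketch.
                self._cookies[name] = morsel.value

    def request_headers(self, extra=None):
        """Return request headers, adding a Cookie header if any cookies are stored."""
        headers = dict(extra or {})
        if self._cookies:
            headers["Cookie"] = "; ".join(
                "{}={}".format(name, value) for name, value in self._cookies.items()
            )
        return headers


tracker = CookieTracker()
tracker.remember(["session=abc123; Path=/; HttpOnly", "theme=dark"])
print(tracker.request_headers({"User-Agent": "example"}))
```

With a recent urllib3, the raw values could come from response.headers.getlist("Set-Cookie"), and the result of tracker.request_headers() could be passed as headers= on the next request through the pool.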

  • shazow (author of urllib3)
+9

Isn't there a problem with multiple cookies?

Some servers return multiple Set-Cookie headers, but urllib3 stores the headers in a dict, and a dict does not allow multiple entries with the same key.

httplib2 has a similar problem.

Or maybe not: it turns out that the readheaders method of the HTTPMessage class in the httplib module, which is used by both urllib3 and httplib2, has the following comment:

  If multiple header fields with the same name occur, they are combined according to the rules in RFC 2616 sec 4.2:

  Appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is significant to the interpretation of the combined field value.

Thus, no headers are lost.

However, there is a problem if a header value itself contains commas. I still do not fully understand what happens in that case, but from skimming RFC 2616 ("Hypertext Transfer Protocol -- HTTP/1.1") and RFC 2965 ("HTTP State Management Mechanism") I get the impression that any commas within a header value are supposed to be quoted.
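To make the comma problem concrete, here is a small illustration in Python 3. The two Set-Cookie values are made up for the example; the second one carries an Expires date, which contains an unquoted comma. Joining the values with commas, as the comment above describes, and then splitting on commas again does not give back the original two values.

```python
# Two made-up Set-Cookie values, as a server might send them in separate headers.
set_cookie_values = [
    "session=abc123; Path=/",
    "theme=dark; Expires=Wed, 09 Jun 2021 10:18:14 GMT",
]

# Combined into a single field value, comma-separated, per RFC 2616 sec 4.2.
combined = ", ".join(set_cookie_values)

# A naive split on commas cuts the Expires date in half.
naive_parts = [part.strip() for part in combined.split(",")]

print(len(set_cookie_values))  # number of original header values
print(len(naive_parts))        # number of pieces after the naive split
for part in naive_parts:
    print(repr(part))
```

So no header is lost by combining, but the boundaries between cookies can no longer be recovered by splitting on commas alone.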

+2

You should use the requests library. It uses urllib3 but makes things like adding cookies trivial.

https://github.com/kennethreitz/requests

    import requests

    r1 = requests.get(url, cookies={'somename': 'somevalue'})
    print(r1.content)
+2

You need to set the 'Cookie' header, not 'Set-Cookie'; 'Set-Cookie' is set by the web server.

And Cookie is just one of the headers, so there is nothing wrong with setting it that way.

+1

You can use this code:

    import urllib3

    def getHtml(url):
        http = urllib3.PoolManager()
        r = http.request('GET', url, headers={
            'User-agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36',
            'Cookie': 'cookie_name=cookie_value',
        })
        return r.data  # response body (the HTML)

Replace cookie_name and cookie_value with your own cookie's name and value.

+1