Using certifi module with urllib2?

I have a problem loading HTTPS pages with the urllib2 module, which seems to be due to urllib2's inability to access the system certificate store.

To work around this, one possible solution is to download HTTPS pages with pycurl together with the certifi module. The following is an example:

    def download_web_page_with_curl(url_website):
        from pycurl import Curl, CAINFO, URL
        from certifi import where
        from cStringIO import StringIO

        response = StringIO()
        curl = Curl()
        curl.setopt(CAINFO, where())  # verify against certifi's CA bundle
        curl.setopt(URL, url_website)
        curl.setopt(curl.WRITEFUNCTION, response.write)
        curl.perform()
        curl.close()
        return response.getvalue()
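
For completeness, calling it looks like this (the URL is only an example):

    html = download_web_page_with_curl('https://www.google.com/')
    print(len(html))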

Is there a way to use certifi with urllib2 (in a manner comparable to the pycurl example above) that allows me to download HTTPS sites? Also, is there another urllib2-based solution that fixes the permissions issue without compromising security?

2 answers

I would recommend using requests, as in my other answer. However, to answer the original question of how to do this with urllib2:

    import urllib2
    import certifi

    def download_web_page_with_urllib2(url_website):
        t = urllib2.urlopen(url_website, cafile=certifi.where())
        return t.read()

    text = download_web_page_with_urllib2('https://www.google.com/')

The same error-checking guidelines as in my other answer apply here.
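
As a rough sketch of what that might look like with urllib2 (the wrapper name and the handling choices are my own, not part of the original answer):

    import urllib2
    import certifi

    def download_web_page_with_urllib2_checked(url_website):
        try:
            t = urllib2.urlopen(url_website, cafile=certifi.where())
            return t.read()
        except urllib2.HTTPError as e:
            # The server responded, but with an error status (404, 500, ...)
            print('HTTP error %d for %s' % (e.code, url_website))
        except urllib2.URLError as e:
            # DNS failure, refused connection, failed certificate verification, ...
            print('Could not reach %s: %s' % (url_website, e.reason))
        return None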


Expanding on my comment about using requests (which is built on urllib3):

    import requests

    def download_web_page_with_requests(url_website):
        r = requests.get(url_website)
        return r.text

It is much simpler than the alternatives and handles SSL verification properly regardless of the platform's own certificate list. If certifi is installed, requests will use it automatically; otherwise it quietly falls back to a more limited, possibly older set of bundled root certificates. If you want to be sure certifi is being used, you can say so explicitly:

    r = requests.get(url_website, verify=certifi.where())

Please note that the above code does not perform error checking, which you probably should do. I will therefore point out that requests.get() may raise a number of exceptions for invalid URLs, unreachable sites, communication errors, and failed certificate verification, so you should be prepared to catch and handle them. If it successfully talks to the server but the server returns a non-OK status code (for example, for a page that does not exist), no exception is raised, so you will also want to check that r.status_code == 200.
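
A minimal sketch of that error handling, assuming requests and certifi are installed (the function name and the choices of timeout and return value are my own):

    import requests
    import certifi

    def download_web_page_with_requests_checked(url_website):
        try:
            r = requests.get(url_website, verify=certifi.where(), timeout=10)
        except requests.exceptions.SSLError:
            # Certificate verification failed
            return None
        except requests.exceptions.RequestException:
            # Invalid URL, DNS failure, connection error, timeout, ...
            return None
        if r.status_code != 200:
            # The server answered, but not with the page we asked for
            return None
        return r.text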
