"WindowsError: [Error 5] Access denied" using urllib2

I get the message "WindowsError: [Error 5] Access is denied" when reading a website with urllib2.

    from urllib2 import urlopen, Request
    from bs4 import BeautifulSoup

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
    req = Request('https://' + url, headers=hdr)
    soup = BeautifulSoup(urlopen(req).read())

Full trace:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\urllib2.py", line 154, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Python27\lib\urllib2.py", line 431, in open
        response = self._open(req, data)
      File "C:\Python27\lib\urllib2.py", line 449, in _open
        '_open', req)
      File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
        result = func(*args)
      File "C:\Python27\lib\urllib2.py", line 1240, in https_open
        context=self._context)
      File "C:\Python27\lib\urllib2.py", line 1166, in do_open
        h = http_class(host, timeout=req.timeout, **http_conn_args)
      File "C:\Python27\lib\httplib.py", line 1258, in __init__
        context = ssl._create_default_https_context()
      File "C:\Python27\lib\ssl.py", line 440, in create_default_context
        context.load_default_certs(purpose)
      File "C:\Python27\lib\ssl.py", line 391, in load_default_certs
        self._load_windows_store_certs(storename, purpose)
      File "C:\Python27\lib\ssl.py", line 378, in _load_windows_store_certs
        for cert, encoding, trust in enum_certificates(storename):
    WindowsError: [Error 5] Access is denied

I tried running the script from the command line with administrator privileges, as suggested here, but this did not fix the problem.

Any suggestions for fixing this error?

2 answers

This appears to be an inconsistency in the Windows certificate store. httplib, which urllib2 calls internally, was recently changed from not verifying the server certificate to verifying it by default. You will therefore hit this problem in any Python script that relies on urllib or httplib and runs under your user profile.

However, something seems to be seriously wrong with your Windows certificate store. For you, httplib fails when it tries to enumerate the certificates in the store named CA (shown as Intermediate Certification Authorities in certmgr.msc), but succeeds for ROOT, which is the usual trusted root certificate store (see the comments on the question). I therefore suggest checking everything under Intermediate Certification Authorities in certmgr for recently added certificates, and checking the Windows logs for general errors.

What happens in your case is that urllib2 internally calls httplib, which then tries to set up the default SSL context with certificate verification enforced, and as part of that enumerates the trust anchors of your system by calling ssl.enum_certificates. This function is implemented in C as _ssl_enum_certificates_impl and internally calls the WinAPI functions CertOpenSystemStore and CertEnumCertificatesInStore. For the CA certificate store, one of these two WinAPI calls simply fails with access denied.
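To see which store fails on a given machine, the enumeration can be probed directly from Python. The following is a minimal sketch, not part of the original diagnosis: `probe_cert_stores` is a hypothetical helper name, and it assumes `ssl.enum_certificates` is available, which is only the case on Windows builds of Python.

```python
import ssl


def probe_cert_stores(store_names=("CA", "ROOT"), enum=None):
    """Try to enumerate each named Windows certificate store.

    Returns a dict mapping each store name to the number of certificates
    found, or to an error message if enumeration failed.
    `enum` defaults to ssl.enum_certificates (Windows only).
    """
    if enum is None:
        enum = getattr(ssl, "enum_certificates", None)
        if enum is None:
            raise RuntimeError("ssl.enum_certificates is not available on this platform")
    results = {}
    for name in store_names:
        try:
            results[name] = len(list(enum(name)))
        except OSError as exc:  # WindowsError is a subclass of OSError
            results[name] = "failed: {0}".format(exc)
    return results
```

On an affected machine one would expect the `CA` entry to carry the access-denied message while `ROOT` reports a certificate count, but that is an expectation based on the traceback, not a verified result.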

If you want to debug further, you can also try calling the WinAPI function CertOpenSystemStore manually with the LPCTSTR 'CA' as its argument and debug from that side, try other Windows certificate management tools, and/or contact Microsoft support.

There are also signs that others had similar problems interacting with this api call, see google: access denied CertOpenSystemStore

If you just want it to work without fixing the root cause, you could try the following workaround. It temporarily patches _windows_cert_stores so that the broken CA cert store is not included, or alternatively disables the trust-anchor loading logic completely. (All other ssl.SSLContext calls in the current process are affected by the patch while it is in place.)

Note that this effectively disables server certificate verification.

    import ssl
    from urllib2 import urlopen, Request

    # Patch the default store list to include only "ROOT", because "CA" is broken for you.
    ssl.SSLContext._windows_cert_stores = ("ROOT",)
    # Alternative: turn load_default_certs into a no-op instead.
    #ssl.SSLContext.load_default_certs = lambda s, x: None

    # Create a new SSL context that verifies neither certificates nor hostnames.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
    req = Request('https://' + url, headers=hdr)
    x = urlopen(req, context=ctx).read()

    # Undo the patch.
    ssl.SSLContext._windows_cert_stores = ("ROOT", "CA")
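One weakness of patching a class attribute by hand is that the undo step at the end of the workaround is skipped if `urlopen` raises. A possible variation, sketched here on the assumption that restoring the attribute matters to you, wraps the patch in a context manager. `temporary_attr` is a hypothetical helper name, not something provided by the `ssl` module.

```python
import contextlib


@contextlib.contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside a with-block, then restore the old value."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Runs even if the body raises, so the patch never leaks.
        setattr(obj, name, original)
```

It would be used as `with temporary_attr(ssl.SSLContext, '_windows_cert_stores', ('ROOT',)):` around the `create_default_context()` and `urlopen` calls from the workaround.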

I hope this information helps you solve the problem. Good luck.

+3
source

There are several potential problems with using the Windows certificate store. (I found that it is nearly impossible to use when your code runs under a service account without a full user profile.) The reasons are somewhat complicated, but there is no need to go into them, because there is a simpler solution. Disabling SSL verification, as already suggested, is a workaround, but probably not the best one if you care about the validity of the certificates presented.

Simply sidestep the problem by using a standalone certificate store. For Python, that is the certifi package, which is kept up to date. The easiest way to use it is through the Python requests package. Both should be readily available for most common Python distributions.

    import requests
    from bs4 import BeautifulSoup

    url = "www.google.com"
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
    r = requests.get('https://' + url, headers=hdr, verify=True)
    soup = BeautifulSoup(r.text)

Please note that requests.get() will throw an exception on invalid addresses, unreachable sites and failed certificate validation, so you should be prepared to catch those. When the connection succeeds and the certificate is verified but the page is not found (for example, a 404 error), you will not get an exception. You should therefore also check that r.status_code == 200 after the request. (30x redirects are followed automatically, so you won't see them as status codes unless you tell it not to follow them.) This check is omitted from the example for the sake of clarity.
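The status check described above might look roughly like this. It is a sketch, not the answerer's code: `fetch_html` is a hypothetical helper, the 10-second timeout is an arbitrary choice, and `requests` is imported lazily only so that the function can be exercised with a stand-in `get` callable.

```python
def fetch_html(url, headers=None, get=None, timeout=10):
    """Fetch https://<url> and return the body text, insisting on HTTP 200.

    Connection, DNS and certificate failures surface as exceptions raised
    by requests; a non-200 status is turned into a RuntimeError here.
    """
    if get is None:
        import requests  # lazy import; `get` may be replaced by a stub
        get = requests.get
    response = get('https://' + url, headers=headers, verify=True, timeout=timeout)
    if response.status_code != 200:
        raise RuntimeError("unexpected HTTP status {0}".format(response.status_code))
    return response.text
```

A caller would typically wrap `fetch_html(...)` in `try` / `except requests.exceptions.RequestException` to handle connection and certificate failures, and catch `RuntimeError` for the status check.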

Also note that you do not reference the certifi module explicitly. requests will use it if it is installed. If it is not installed, requests falls back to a more limited set of root certification authorities.
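If you would rather stay with urllib2, the certifi bundle can also be referenced explicitly. The sketch below assumes that passing `cafile` to `ssl.create_default_context` makes it load that file instead of the system stores. That matches a reading of the standard library source, but it has not been verified on an affected machine. `context_from_bundle` is a hypothetical name.

```python
import ssl


def context_from_bundle(cafile):
    """Build a verifying SSL context from an explicit CA bundle file.

    Because a cafile is supplied, the trusted certificates are loaded from
    that file rather than from the operating system's default stores.
    """
    return ssl.create_default_context(cafile=cafile)
```

With certifi installed, `context_from_bundle(certifi.where())` gives a context that can be passed as `urlopen(req, context=...)`, which keeps certificate verification enabled.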
