SSL Site Bypass with Scrapy

I need to scrape https://dms.psc.sc.gov/Web/dockets, which uses TLS v1.2, using the Scrapy framework. But when I request the URL, it fails to load and raises [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>].

A similar problem is discussed on GitHub at https://github.com/scrapy/scrapy/issues/981, but the suggestions there did not work for me. I have Scrapy v0.24.5 and Twisted >= 14.

When I crawl another site that also uses TLS v1.2, it works, but not https://dms.psc.sc.gov. How can I solve this problem?

4 answers

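Update (2016): a PR addressing this has been made to Scrapy.

So with a recent Scrapy release, the workaround should no longer be needed.

In general, if you have HTTPS problems with Scrapy, try upgrading:

  • Scrapy
  • Twisted, if yours is older than 14 (Twisted 14 and later handle SSL better).

If you cannot upgrade Scrapy or Twisted, a custom ScrapyClientContextFactory is the workaround. See the issue on GitHub.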


1. In settings.py, set:

DOWNLOADER_CLIENTCONTEXTFACTORY = 'testproject.CustomContext.CustomClientContextFactory'

2. Create CustomContext.py:

from OpenSSL import SSL
from twisted.internet.ssl import ClientContextFactory
from twisted.internet._sslverify import ClientTLSOptions
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory


class CustomClientContextFactory(ScrapyClientContextFactory):

    def getContext(self, hostname=None, port=None):
        ctx = ClientContextFactory.getContext(self)
        # Enable all workarounds to SSL bugs as documented by
        # http://www.openssl.org/docs/ssl/SSL_CTX_set_options.html
        ctx.set_options(SSL.OP_ALL)
        if hostname:
            # Set the hostname on the context (used for SNI)
            ClientTLSOptions(hostname, ctx)
        return ctx

With this I can scrape https sites on Windows, but on Ubuntu 14.04 it fails with the following error:

from twisted.internet._sslverify import ClientTLSOptions
exceptions.ImportError: cannot import name ClientTLSOptions


EDIT: Replace the import

from twisted.internet._sslverify import ClientTLSOptions

with:

try:
    # available since twisted 14.0
    from twisted.internet._sslverify import ClientTLSOptions
except ImportError:
    ClientTLSOptions = None
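
One caveat with this fallback: when the import fails, ClientTLSOptions is None, so an unconditional call to ClientTLSOptions(hostname, ctx) in getContext would raise a TypeError. A minimal sketch of a guard follows; the helper name apply_sni is hypothetical and is not part of Scrapy or Twisted.

```python
try:
    # available since twisted 14.0
    from twisted.internet._sslverify import ClientTLSOptions
except ImportError:
    # Older Twisted (or Twisted not installed): fall back to no SNI setup.
    ClientTLSOptions = None


def apply_sni(ctx, hostname):
    """Hypothetical helper: set the hostname on ctx only when possible.

    Returns True if ClientTLSOptions was applied, False if it was
    skipped (no hostname given, or ClientTLSOptions is unavailable).
    """
    if hostname and ClientTLSOptions is not None:
        ClientTLSOptions(hostname, ctx)
        return True
    return False
```

Inside getContext, the `if hostname:` block could then become a single call to apply_sni(ctx, hostname), so the factory still returns a usable context on Twisted versions older than 14.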

For anyone getting "TypeError: unbound method getContext() must be called with ClientContextFactory instance as first argument ...":

Replace ctx = ClientContextFactory.getContext(self)

with ctx = ScrapyClientContextFactory.getContext(self)

Vinodh Velumayil's answer is correct, but I had to change this line:

ctx = ClientContextFactory.getContext(self)

to:

inst = ClientContextFactory()
ctx = inst.getContext()