Create a unit test for a scrapy CrawlSpider method

Initial task

I wrote a CrawlSpider class (using the scrapy library) that relies on a lot of scrapy's asynchronous magic to work. Here it is, stripped down:

from bs4 import BeautifulSoup
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
from scrapy.spiders import CrawlSpider, Rule

from myproject.items import Item  # wherever the Item class lives


class MySpider(CrawlSpider):
    rules = [Rule(LinkExtractor(allow='myregex'), callback='parse_page')]
    # some other class attributes

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.response = None
        self.loader = None

    def parse_page_section(self):
        soup = BeautifulSoup(self.response.body, 'lxml')
        # Complicated scraping logic using BeautifulSoup
        self.loader.add_value(mykey, myvalue)

    # more methods parsing other sections of the page
    # also using self.response and self.loader

    def parse_page(self, response):
        self.response = response
        self.loader = ItemLoader(item=Item(), response=response)
        self.parse_page_section()
        # call other methods to collect more stuff
        return self.loader.load_item()

The class attribute rules tells my spider to follow certain links and to call the callback after each page is downloaded. My goal is to test the parsing method parse_page_section without starting the crawler or even making real HTTP requests.

What I tried

My first thought was the mock library, but mocking everything the spider depends on (...) quickly gets unwieldy. What I really want is to create a MySpider instance and call parse_page_section on it directly.

The obstacle is the response: parse_page_section reads self.response.body for BeautifulSoup, and the ItemLoader needs a response too. As a cheap substitute, I tried:

from argparse import Namespace

my_spider = MySpider()
my_spider.response = Namespace(body='<html>...</html>')

That is enough for BeautifulSoup, but not for ItemLoader, which expects a real response object. Besides, it feels like a hack.
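
For what it's worth, a fake scrapy.http.HtmlResponse should satisfy both, since it carries a real body and can back a selector. A minimal sketch, assuming MySpider sets the usual name class attribute; the import paths, URL and HTML string are placeholders:

from scrapy.http import HtmlResponse
from scrapy.loader import ItemLoader

from myproject.items import Item        # hypothetical path
from myproject.spiders import MySpider  # hypothetical path


def test_parse_page_section():
    # A real (but never downloaded) response: .body works for
    # BeautifulSoup, and ItemLoader can build its selector from it.
    html = b'<html><body><h1>My Title</h1></body></html>'
    response = HtmlResponse(url='http://example.com', body=html,
                            encoding='utf-8')

    spider = MySpider()
    spider.response = response
    spider.loader = ItemLoader(item=Item(), response=response)

    spider.parse_page_section()

    item = spider.loader.load_item()
    assert item  # assert on the fields parse_page_section should have filled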

But is this really the way to go? Does anybody unit test spiders written with this framework, and if so, how? All I want is to test my parsing logic without the crawling machinery.


Answer

Scrapy has a built-in answer to this: spider contracts. A contract is declared in a callback's docstring and lets you check the callback against a sample URL without writing a test harness. For example:

def parse(self, response):
    """ This function parses a sample response. Some contracts are mingled
    with this docstring.

    @url http://www.amazon.com/s?field-keywords=selfish+gene
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """

Run the checks with the scrapy check command: it builds a request from the @url line, runs the callback on the downloaded page and verifies each constraint.

If the built-in contracts are not enough, you can also write your own custom contracts, along the lines of the sketch below.
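
A minimal sketch of a custom contract, assuming the items expose a Title field; the class name, tag and module path are made up for illustration:

from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail


class HasTitleContract(Contract):
    """Fail the check when a returned item has no Title field.

    Use it in a callback docstring as: @has_title
    """
    name = 'has_title'  # the tag that follows '@' in the docstring

    def post_process(self, output):
        # 'output' is everything the callback produced: items and requests
        for element in output:
            if hasattr(element, 'get') and not element.get('Title'):
                raise ContractFail('item is missing a Title')

To enable it, register the class in settings.py, for example SPIDER_CONTRACTS = {'myproject.contracts.HasTitleContract': 10}, and scrapy check will pick up the new @has_title tag.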

