If we are talking specifically about testing spiders (rather than pipelines or loaders), then what we have done is feed the callback a "fake response" built from a local HTML file. Code example:
```python
import os

from scrapy.http import Request, TextResponse


def fake_response(file_name=None, url=None):
    """Create a fake Scrapy HTTP response from an HTML file."""
    if not url:
        url = 'http://www.example.com'

    request = Request(url=url)
    if file_name:
        # Relative paths are resolved against the directory of this module.
        if not file_name[0] == '/':
            responses_dir = os.path.dirname(os.path.realpath(__file__))
            file_path = os.path.join(responses_dir, file_name)
        else:
            file_path = file_name
        with open(file_path, 'r') as f:
            file_content = f.read()
    else:
        file_content = ''

    response = TextResponse(url=url, request=request, body=file_content,
                            encoding='utf-8')
    return response
```
Then, in the TestCase class, call the fake_response() function and pass the result to the parse() callback:
```python
from unittest.case import TestCase

# MySpider and fake_response are assumed to be importable from your project.


class MyTestCase(TestCase):
    def setUp(self):
        self.spider = MySpider()

    def test_parse(self):
        response = fake_response('input.html')
        item = self.spider.parse(response)
        self.assertEqual(item['title'], 'My Title')
```
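Note that in a real spider parse() is usually a generator that yields items or further requests, so the test may need to consume it first. A minimal sketch, assuming the same hypothetical MySpider yields exactly one item:

```python
    def test_parse_yields_item(self):
        response = fake_response('input.html')
        # parse() is typically a generator, so collect its output
        # before making assertions.
        results = list(self.spider.parse(response))
        self.assertEqual(len(results), 1)
        self.assertEqual(results[0]['title'], 'My Title')
```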
In addition, you should start using Item Loaders with input and output processors. This helps to achieve better modularity and, therefore, isolation: the spider simply yields item instances, while data extraction and cleanup are encapsulated inside the loader, which you test separately.
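As a sketch of what that separation might look like (the ArticleItem, ArticleLoader, and the test below are hypothetical examples, not from the original answer; older Scrapy versions expose the processors as scrapy.loader.processors instead of itemloaders.processors):

```python
from scrapy import Field, Item
from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose, TakeFirst


class ArticleItem(Item):
    title = Field()


class ArticleLoader(ItemLoader):
    default_item_class = ArticleItem
    # Output processor: keep only the first non-empty extracted value.
    default_output_processor = TakeFirst()
    # Input processor: strip whitespace from every extracted value.
    title_in = MapCompose(str.strip)


# The loader's processors can then be tested without a spider or a response:
def test_title_is_stripped():
    loader = ArticleLoader()
    loader.add_value('title', '  My Title \n')
    item = loader.load_item()
    assert item['title'] == 'My Title'
```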