Initial task
I am writing a CrawlSpider class (using the scrapy library) and rely on a lot of Scrapy's asynchronous magic to make it work. Here it is, stripped down:
from bs4 import BeautifulSoup
from scrapy import Item
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
from scrapy.spiders import CrawlSpider, Rule

class MySpider(CrawlSpider):
    name = 'myspider'  # every Scrapy spider needs a name
    rules = [Rule(LinkExtractor(allow='myregex'), callback='parse_page')]

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.response = None
        self.loader = None

    def parse_page_section(self):
        # The real method extracts values from the soup; pruned here.
        soup = BeautifulSoup(self.response.body, 'lxml')
        self.loader.add_value('mykey', 'myvalue')  # placeholder key/value

    def parse_page(self, response):
        # Callback invoked by the crawler for each link matched by the rule.
        self.response = response
        self.loader = ItemLoader(item=Item(), response=response)  # Item stands in for my real item class
        self.parse_page_section()
        return self.loader.load_item()
The class attribute rules tells my spider to follow certain links and to call parse_page once those pages have been downloaded. My goal is to test the parsing method parse_page_section without starting the crawler, or even making real HTTP requests.
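For concreteness, here is roughly the test I would like to end up with. This is only a sketch: as far as I know, scrapy.http.HtmlResponse can be built entirely offline from a string body, and the URL, fixture HTML, and test name below are just illustrative.

from scrapy.http import HtmlResponse

def test_parse_page_section():
    # Build a response in memory: no crawler, no network.
    fake_response = HtmlResponse(
        url='http://example.com/page',
        body=b'<html><body><p>fixture</p></body></html>',
        encoding='utf-8',
    )
    spider = MySpider()
    # parse_page stores the response, builds the loader and delegates to
    # parse_page_section; this assumes the real item class declares the
    # fields that parse_page_section adds.
    item = spider.parse_page(fake_response)
    assert item is not None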
What I tried
I discovered the mock library, but mocking everything involved (...) seemed like overkill; all I really want is to instantiate MySpider and call parse_page_section on it directly.
Looking at the method, it only touches the response through self.response.body (which is handed to BeautifulSoup), plus the ItemLoader. So, as a first step, I tried this:
from argparse import Namespace

my_spider = MySpider()
my_spider.response = Namespace(body='<html>...</html>')
my_spider.parse_page_section()
That fake object is enough for BeautifulSoup, but not for the ItemLoader: self.loader is still None, so the attempt fails as soon as add_value is reached.
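I suppose I could keep stubbing piece by piece: if I read the documentation correctly, an ItemLoader can be created without a response, and add_value still works in that case. A sketch of that variant (untested, and again assuming the real item class declares the loaded fields):

from scrapy import Item
from scrapy.loader import ItemLoader

my_spider.loader = ItemLoader(item=Item())  # no response or selector attached
my_spider.parse_page_section()
print(my_spider.loader.load_item())

But stubbing one attribute at a time feels brittle, which brings me to my question.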
Is there a proper way to unit-test parse_page_section in isolation? Any idea is welcome, even if it means redesigning the class. Thanks in advance.