The difference between BaseSpider and CrawlSpider

I am trying to understand the concepts of BaseSpider and CrawlSpider in web scraping. I read the docs, but there is no mention of BaseSpider. It would be very helpful if someone explained the differences between BaseSpider and CrawlSpider.


BaseSpider is the old name and has been deprecated since Scrapy 0.22 - use scrapy.Spider instead:

 import scrapy

 class MySpider(scrapy.Spider):
     # ...

scrapy.Spider is the simplest spider: it basically visits the URLs defined in start_urls or returned by start_requests(), and passes each response to parse().

Use CrawlSpider when you need the "crawl" behavior - extracting links and following them. From the docs:

This is the most commonly used spider for crawling regular websites, as it provides a convenient mechanism for following links by defining a set of rules. It may not be the best suited for your particular web sites or project, but it is generic enough for several cases, so you can start from it and override it as needed for more custom functionality, or just implement your own spider.
