BaseSpider existed previously and is now deprecated (since 0.22) - use scrapy.Spider instead:
    import scrapy

    class MySpider(scrapy.Spider):
        ...
scrapy.Spider is the simplest spider: it basically just visits the URLs defined in start_urls (or the requests returned by start_requests()).
Use CrawlSpider when you need the "crawl" behavior: extracting links and following them:
This is the most commonly used spider for crawling regular websites, as it provides a convenient mechanism for following links by defining a set of rules. It may not be the best fit for your particular sites or project, but it is generic enough for many cases, so you can start with it and override it as needed for more custom functionality, or just implement your own spider.
alecxe