BaseSpider existed previously and is now deprecated (since 0.22) - use scrapy.Spider instead:
    import scrapy

    class MySpider(scrapy.Spider):
        ...
scrapy.Spider is the simplest spider: it basically just visits the URLs defined in start_urls (or the requests returned by start_requests()).
Use CrawlSpider when you need the "crawl" behavior: extracting links and following them:
This is the most commonly used spider for crawling regular websites, as it provides a convenient mechanism for following links by defining a set of rules. It may not be the best fit for your particular sites or project, but it is generic enough for many cases, so you can start with it and override it as needed for more custom functionality, or just implement your own spider.
alecxe