I want to relaunch my Scrapy spider when the close reason is connected to my Internet connection (at night the Internet drops for about 5 minutes). When the Internet goes down, the spider closes after 5 retry attempts.
I am using this function inside my spider definition to try to restart the spider on close:
    def handle_spider_closed(spider, reason):
        relaunch = False
        for key in spider.crawler.stats._stats.keys():
            if 'DNSLookupError' in key:
                relaunch = True
                break

        if relaunch:
            spider = mySpider()
            settings = get_project_settings()
            crawlerProcess = CrawlerProcess(settings)
            crawlerProcess.configure()
            crawlerProcess.crawl(spider)
            spider.crawler.queue.append_spider(another_spider)
I tried a lot of things, such as re-instantiating the spider, but I get a `ReactorAlreadyRunning` error or something like that.
I also thought about executing the spider from a script and re-running it when the spider finishes, but that doesn't work either, because the reactor is still in use.
- My intention is to relaunch the spider after it closes (the spider closes because it has lost its Internet connection)
Does anyone know a good and easy way to do this?
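For context, one way to side-step the reactor problem entirely is to not restart anything inside the Scrapy process, but to drive the crawl from an outer wrapper script that relaunches it in a fresh subprocess; each relaunch gets a clean interpreter and therefore a fresh reactor. This is only a minimal sketch, not the asker's code: `run_spider_with_retries`, the spider name `myspider`, and the retry parameters are all hypothetical, and it assumes the crawl command exits non-zero on failure (a plain `scrapy crawl` normally exits 0, so you would need a small launcher script that calls `sys.exit(1)` when the close reason indicates a DNS failure).

```python
# Sketch: relaunch the crawl in a fresh process so the Twisted reactor
# always starts in a clean interpreter (no ReactorAlreadyRunning error).
import subprocess
import sys
import time


def run_spider_with_retries(cmd, max_restarts=5, delay=300):
    """Run `cmd`; if it exits non-zero (e.g. the spider closed because
    of DNS failures), wait `delay` seconds and launch it again."""
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True  # spider finished normally
        time.sleep(delay)  # give the connection time to come back
    return False  # gave up after exhausting the restarts


if __name__ == "__main__":
    # "myspider" is a placeholder spider name.
    run_spider_with_retries(["scrapy", "crawl", "myspider"])
```

The delay of 300 seconds matches the roughly 5-minute outages described above; tune it to however long the connection usually takes to come back.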