I have a Scrapy spider, but some of its requests sometimes disappear and never produce a response.
I noticed this by adding log messages just before yielding a request and just after receiving the corresponding response.
The spider paginates through result pages and, for each page, follows a link to parse the individual items.
Here is a piece of the code:
# Relevant imports (old-style Scrapy, matching the BaseSpider / log usage below)
from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy import log

class SampleSpider(BaseSpider):
    ....

    def parse_page(self, response):
        ...
        # log right before the request is yielded
        request = Request(target_link, callback=self.parse_item_general)
        request.meta['date_updated'] = date_updated
        self.log('parse_item_general_send {url}'.format(url=request.url), level=log.INFO)
        yield request

    def parse_item_general(self, response):
        # log as soon as the response reaches the callback
        self.log('parse_item_general_recv {url}'.format(url=response.url), level=log.INFO)
        sel = Selector(response)
        ...
I compared the number of occurrences of each log message and there are more parse_item_general_send entries than parse_item_general_recv entries.
There are no 400 or 500 errors in the final statistics; every response has status code 200. It seems that the requests just disappear.
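For reference, this is roughly how the two counts can be compared; a minimal sketch, assuming the crawl output was written to a file named spider.log (the file name and the count_markers helper are just examples, not part of the real project):

# Hypothetical helper for counting the two log markers; 'spider.log' is an
# assumed file name, not something that exists in the actual project.
def count_markers(path='spider.log'):
    sent = received = 0
    with open(path) as f:
        for line in f:
            if 'parse_item_general_send' in line:
                sent += 1
            elif 'parse_item_general_recv' in line:
                received += 1
    return sent, received

sent, received = count_markers()
print('sent={0} recv={1} missing={2}'.format(sent, received, sent - received))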
I also added these parameters to minimize possible errors:
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 0.8
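For clarity, this is a minimal sketch of where those values live, assuming the standard settings.py of a Scrapy project. The DUPEFILTER_DEBUG line is only an assumption on my part, not something already in my configuration; it is a standard Scrapy setting that, in recent versions, logs requests silently dropped as duplicates, which might show where the missing requests go.

# settings.py -- only the values relevant to this question
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # one request at a time per domain
DOWNLOAD_DELAY = 0.8                 # 0.8 s pause between requests

# Assumption / not yet tried: log every request dropped as a duplicate,
# to check whether the "missing" requests are being filtered out silently.
DUPEFILTER_DEBUG = True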
What could be causing these requests to disappear? Any advice is appreciated.
Tags: Python, Scrapy