Scrapy: how to debug lost requests

I have a Scrapy spider, but it sometimes loses requests: they are sent, yet no response ever reaches the callback.

I found this out by adding log messages before submitting a request and after receiving a response.

The spider iterates through pages and, on each page, follows a link to parse the individual items.

Here is a piece of code

from scrapy import log
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spider import BaseSpider

class SampleSpider(BaseSpider):
    ....
    def parse_page(self, response):
        ...
        request = Request(target_link, callback=self.parse_item_general)
        request.meta['date_updated'] = date_updated
        self.log('parse_item_general_send {url}'.format(url=request.url), level=log.INFO)
        yield request

    def parse_item_general(self, response):
        self.log('parse_item_general_recv {url}'.format(url=response.url), level=log.INFO)
        sel = Selector(response)
        ...

I compared the number of occurrences of each log message: there are more parse_item_general_send entries than parse_item_general_recv entries.

There are no 400 or 500 errors in the final statistics; every response has status code 200. It looks like the requests just disappear.
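
To make the comparison less dependent on grepping the log, the same counting can go through Scrapy's stats collector, and an errback can be attached so that failed downloads are recorded instead of vanishing. This is only a sketch: the callback names mirror the code above, while the stats keys, the extract_item_links() helper, and the on_error() errback are hypothetical names introduced here for illustration; it also assumes a Scrapy version where the spider has self.crawler available.

from scrapy.http import Request
from scrapy.spider import BaseSpider

class SampleSpider(BaseSpider):
    name = 'sample'

    def extract_item_links(self, response):
        # Placeholder: real link extraction from the page goes here.
        return []

    def parse_page(self, response):
        for target_link in self.extract_item_links(response):
            # Count every item request that gets scheduled.
            self.crawler.stats.inc_value('custom/item_requests_sent')
            yield Request(target_link,
                          callback=self.parse_item_general,
                          errback=self.on_error)

    def parse_item_general(self, response):
        # Count every response that actually reaches the callback.
        self.crawler.stats.inc_value('custom/item_responses_received')
        # ... item parsing elided ...

    def on_error(self, failure):
        # Download failures (DNS errors, timeouts, ...) end up here
        # instead of disappearing without a trace.
        self.crawler.stats.inc_value('custom/item_requests_failed')
        self.log('request failed: %s' % failure)

The custom/* counters then show up in the final stats dump at the end of the crawl, so the send/receive comparison no longer depends on counting log lines.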

I also added these parameters to minimize possible errors:

CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 0.8
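
Two more standard Scrapy settings can make silently dropped requests visible in the log; a sketch of what could sit alongside the values above in settings.py (DUPEFILTER_DEBUG is only available in Scrapy versions that support it):

LOG_LEVEL = 'DEBUG'       # keep DEBUG messages such as "Filtered duplicate request"
DUPEFILTER_DEBUG = True   # log every duplicate request, not only the first one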

In the same vein as Rho's answer, you can add the setting

DUPEFILTER_CLASS = 'scrapy.dupefilter.BaseDupeFilter'

to your "settings.py", which disables URL duplicate filtering entirely. By default, scrapy silently drops requests to URLs it has already visited, so the filtered requests never reach your callback and leave almost no trace in the log.

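If only a handful of requests are affected, an alternative to replacing the filter globally is to bypass it per request; dont_filter is a standard argument of Request. A sketch based on the parse_page code from the question:

request = Request(target_link,
                  callback=self.parse_item_general,
                  dont_filter=True)  # never dropped as a duplicate
request.meta['date_updated'] = date_updated
yield request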