How to debug a rule in CrawlSpider?

scrapy shell is a great tool for debugging an XPath expression, but is there any tool or method for debugging a rule in a CrawlSpider? In other words, how can I check that a rule matches the links I expect?

My rules:

rules = (
    Rule(SgmlLinkExtractor(allow=r'/search*', restrict_xpaths="//a[@id='pager_page_next']"),
         follow=False),
    #Rule(SgmlLinkExtractor(allow=r'/chart/[\d]+s$'), callback='parse_toplist_page', follow=True),
)

It doesn't match the links I wanted. How can I debug this? Is there an example?

2 answers

Have you tried the scrapy parse command?

 scrapy parse <URL> 

Where <URL> is the URL you want to check.

It will print all the links extracted from that URL, i.e. the links that would be followed.

You can use the --noitems argument to show only the links, and the --spider argument to specify the spider explicitly.

 scrapy parse <URL> --noitems --spider <MYSPIDER> 

For more information on debugging spiders, see http://doc.scrapy.org/en/latest/topics/debug.html

This answer was provided by Pablo Hoffman in a user group: https://groups.google.com/forum/?fromgroups=#!topic/scrapy-users/tOdk4Xw2Z4Y
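Beyond the parse command, you can also exercise a rule's link extractor by hand inside scrapy shell and see exactly which links it picks up. The following is only a minimal sketch, assuming the same SgmlLinkExtractor arguments as in the question and a placeholder URL:

 # inside: scrapy shell "http://example.com/search?page=1"   (placeholder URL)
 from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

 le = SgmlLinkExtractor(allow=r'/search*',
                        restrict_xpaths="//a[@id='pager_page_next']")

 # extract_links() returns the Link objects the rule would follow on this response
 for link in le.extract_links(response):
     print(link.url)

If the list comes back empty, try the allow pattern and the restrict_xpaths expression separately to see which one is filtering everything out.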


I don't believe there is one. I usually just run the spider and watch on the command line which URLs it visits. Sometimes I can't kill the program with Ctrl-C and have to open Task Manager and kill the whole command prompt. It is a pain.
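If you do end up test-crawling like that, one small trick (a sketch, not part of this answer) is to cap the crawl with Scrapy's CLOSESPIDER_PAGECOUNT setting so a misbehaving rule cannot run away and you rarely need to kill the process; myspider below is a placeholder spider name:

 scrapy crawl myspider -s CLOSESPIDER_PAGECOUNT=10 -s LOG_LEVEL=DEBUG

This stops the spider automatically after 10 downloaded pages, while the DEBUG log shows which links were actually followed.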

