How to debug a rule in CrawlSpider?

scrapy shell is a great tool for debugging an XPath expression, but is there any tool or method for debugging a rule in a CrawlSpider? In other words, how can I check that a rule matches the links I expect?

My rules:

rules = (
    Rule(SgmlLinkExtractor(allow=r'/search*', restrict_xpaths="//a[@id='pager_page_next']"),
         follow=False),
    #Rule(SgmlLinkExtractor(allow=r'/chart/[\d]+s$'), callback='parse_toplist_page', follow=True),
)

It doesn't match the links I wanted. How can I debug this? Is there an example?

2 answers

Have you tried the scrapy parse command?

 scrapy parse <URL> 

Where <URL> is the URL you want to check.

It will print all the links extracted from that URL, i.e. the links that would be followed.

You can use the --noitems argument to show only the links, and the --spider argument to specify the spider explicitly.

 scrapy parse <URL> --noitems --spider <MYSPIDER> 

For more information on debugging spiders, see http://doc.scrapy.org/en/latest/topics/debug.html

This answer was provided by Pablo Hoffman in a user group: https://groups.google.com/forum/?fromgroups=#!topic/scrapy-users/tOdk4Xw2Z4Y
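Beyond the parse command, you can also exercise a rule's link extractor by hand inside scrapy shell and see exactly which links it picks up. The following is only a minimal sketch, assuming the same SgmlLinkExtractor arguments as in the question and a placeholder URL:

 # inside: scrapy shell "http://example.com/search?page=1"   (placeholder URL)
 from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

 le = SgmlLinkExtractor(allow=r'/search*',
                        restrict_xpaths="//a[@id='pager_page_next']")

 # extract_links() returns the Link objects the rule would follow on this response
 for link in le.extract_links(response):
     print(link.url)

If the list comes back empty, try the allow pattern and the restrict_xpaths expression separately to see which one is filtering everything out.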


I don't believe there is one. I usually just run the spider and watch on the command line which URLs it visits. Sometimes I can't kill the program with Ctrl-C and have to open Task Manager and kill the whole command prompt. It is a pain.
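If you do end up test-crawling like that, one small trick (a sketch, not part of this answer) is to cap the crawl with Scrapy's CLOSESPIDER_PAGECOUNT setting so a misbehaving rule cannot run away and you rarely need to kill the process; myspider below is a placeholder spider name:

 scrapy crawl myspider -s CLOSESPIDER_PAGECOUNT=10 -s LOG_LEVEL=DEBUG

This stops the spider automatically after 10 downloaded pages, while the DEBUG log shows which links were actually followed.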

