Scrapy: limit links crawled

I just got Scrapy set up and running and it works fine, but I have two (noob) questions. I should say up front that I am completely new to scraping and spidering sites.

  • Can you limit the number of crawled links? I have a site that does not use pagination and simply lists many links (which I crawl) on its home page. It is wasteful for me to crawl all of these links when I really only need to crawl the first 10 or so.

  • How do you launch several spiders at once? Right now I use the command scrapy crawl example.com, but I also have spiders for example2.com and example3.com. I would like to launch all my spiders with a single command. Is this possible?

+5
2 answers

For #1: don't use the rules attribute to extract and follow links; write your own logic in the parse callback and yield or return Request objects, as sketched below.
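A minimal sketch of that idea, assuming a reasonably recent Scrapy (response.follow and .getall() need Scrapy 1.4+/1.8+); the link selector, the max_links cap, and parse_item are hypothetical placeholders you would adapt to the target site:

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example.com"
    start_urls = ["http://www.example.com/"]

    max_links = 10  # hypothetical cap on how many links to follow

    def parse(self, response):
        # Extract the links yourself instead of relying on a CrawlSpider
        # rule, then follow only the first `max_links` of them.
        links = response.css("a::attr(href)").getall()
        for href in links[:self.max_links]:
            yield response.follow(href, callback=self.parse_item)

    def parse_item(self, response):
        # Placeholder extraction: record the URL and page title.
        yield {"url": response.url, "title": response.css("title::text").get()}
```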

For #2: try scrapyd.
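For example, once scrapyd is running and your project is deployed to it, you can schedule each spider through its schedule.json endpoint. A sketch using the third-party requests library; the scrapyd address and the project name "myproject" are assumptions:

```python
import requests

SCRAPYD_URL = "http://localhost:6800"  # scrapyd's default port

# Schedule each spider in turn; "myproject" is a placeholder project name.
for spider in ("example.com", "example2.com", "example3.com"):
    response = requests.post(
        SCRAPYD_URL + "/schedule.json",
        data={"project": "myproject", "spider": spider},
    )
    print(spider, "->", response.json())
```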

+2

Credit goes to Shane, here: https://groups.google.com/forum/?fromgroups#!topic/scrapy-users/EyG_jcyLYmU

Using the CloseSpider extension should allow you to specify limits of this kind.

http://doc.scrapy.org/en/latest/topics/extensions.html#module-scrapy.contrib.closespider

I haven't tried it, since I didn't need it. It looks like you may also have to enable it as an extension (see the top of the same page) in your settings file.
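For reference, an untested sketch of what the settings.py entries could look like, going by the page linked above; the module path matches the older scrapy.contrib layout used in those docs, while on newer Scrapy versions the extension lives at scrapy.extensions.closespider.CloseSpider (and is typically enabled by default, so the EXTENSIONS entry may be unnecessary):

```python
EXTENSIONS = {
    "scrapy.contrib.closespider.CloseSpider": 500,
}

# Close the spider after crawling this many pages...
CLOSESPIDER_PAGECOUNT = 10
# ...or after scraping this many items (0 disables a limit).
CLOSESPIDER_ITEMCOUNT = 10
```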

0
