How to pass a custom argument to a Scrapy spider

I am trying to pass a user-defined argument to a scrapy spider. Can anyone suggest how to do this?

I read about the -a option somewhere, but have no idea how to use it.

python web-crawler scrapy
4 answers

Spider arguments are passed in the crawl command using the -a option. For example:

    scrapy crawl myspider -a category=electronics -a domain=system

Spiders can refer to arguments as attributes:

    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def __init__(self, category='', **kwargs):
            self.start_urls = [f'http://www.example.com/{category}']  # py36
            super().__init__(**kwargs)  # python3

        def parse(self, response):
            self.log(self.domain)  # system

Adapted from the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments
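One thing worth noting: values passed with -a always arrive as strings, so convert them yourself when you need another type. A minimal sketch (the limit argument and PagingSpider are made up for illustration):

    import scrapy

    class PagingSpider(scrapy.Spider):
        name = 'paging'

        def __init__(self, limit='10', **kwargs):
            super().__init__(**kwargs)
            # -a limit=25 arrives as the string '25', not the int 25
            self.limit = int(limit)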

Update 2013: added a second argument

Update 2015: adjusted the wording

Update 2016: use the new base class and add super(), thanks @Birla

Update 2017: use the Python 3 super():

    # previously
    super(MySpider, self).__init__(**kwargs)  # python2

Update 2018: as @eLRuLL points out, spiders can refer to arguments as attributes


The previous answers were correct, but you do not have to declare a constructor (__init__) every time you write a spider; you can just specify the parameters as before:

    scrapy crawl myspider -a parameter1=value1 -a parameter2=value2

and in your spider code you can simply use them as spider attributes:

    from scrapy import Spider

    class MySpider(Spider):
        name = 'myspider'

        def parse(self, response):
            ...
            if self.parameter1 == 'value1':   # this is True
                ...
            # or also
            if getattr(self, 'parameter2') == 'value2':   # this is also True
                ...

And it just works.
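The same attribute mechanism applies when you run the spider from a script rather than the command line: keyword arguments given to CrawlerProcess.crawl() are forwarded to the spider. A minimal sketch, assuming MySpider is importable from your project (the import path is hypothetical):

    from scrapy.crawler import CrawlerProcess

    from myproject.spiders import MySpider  # hypothetical import path

    process = CrawlerProcess()
    process.crawl(MySpider, parameter1='value1', parameter2='value2')
    process.start()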


To pass arguments with the crawl command:

    scrapy crawl myspider -a category='mycategory' -a domain='example.com'

To pass arguments when running on scrapyd, replace -a with -d (scrapyd's schedule.json also expects the project name):

    curl http://your.ip.address.here:port/schedule.json -d project=myproject -d spider=myspider -d category='mycategory' -d domain='example.com'
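For reference, here is a sketch of the same scrapyd call made from Python with the third-party requests library (the host, port, and project name are placeholders):

    import requests

    response = requests.post(
        'http://your.ip.address.here:port/schedule.json',
        data={
            'project': 'myproject',   # scrapyd schedules spiders per project
            'spider': 'myspider',
            'category': 'mycategory',
            'domain': 'example.com',
        },
    )
    print(response.json())  # e.g. {'status': 'ok', 'jobid': '...'}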

The spider will receive arguments in its constructor.

    from scrapy import Spider

    class MySpider(Spider):
        name = 'myspider'

        def __init__(self, category='', domain='', *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            self.category = category
            self.domain = domain

Scrapy sets all the passed arguments as spider attributes, so you can skip the init method entirely. Be sure to use the getattr method when reading those attributes, so that your code does not break when an argument is missing.

    from scrapy import Spider

    class MySpider(Spider):
        name = 'myspider'
        start_urls = ('https://httpbin.org/ip',)

        def parse(self, response):
            print(getattr(self, 'category', ''))
            print(getattr(self, 'domain', ''))
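Because getattr supplies a default, the spider runs whether or not the arguments were given:

    scrapy crawl myspider -a category=mycategory -a domain=example.com
    scrapy crawl myspider    # also fine; both prints fall back to ''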

Spider arguments are passed with the crawl command using the -a option. For example, if I want to pass the domain name as an argument to my spider, I will do this:

    scrapy crawl myspider -a domain="http://www.example.com"

And receive the arguments in the spider's constructor:

    from scrapy.spider import BaseSpider  # deprecated; use scrapy.Spider in current releases

    class MySpider(BaseSpider):
        name = 'myspider'

        def __init__(self, domain='', *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            self.start_urls = [domain]

...

it will work :)
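If one argument has to carry several start URLs, a common pattern is to pass a comma-separated value and split it in the constructor. A minimal sketch (the urls argument name and MultiSpider are my own choices):

    from scrapy import Spider

    class MultiSpider(Spider):
        name = 'multispider'

        def __init__(self, urls='', *args, **kwargs):
            super(MultiSpider, self).__init__(*args, **kwargs)
            # -a urls="http://a.example,http://b.example" -> two start URLs
            self.start_urls = urls.split(',') if urls else []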



