How to pass a custom argument to a Scrapy spider

I am trying to pass a user-defined argument to a scrapy spider. Can anyone suggest how to do this?

I read about the -a option somewhere, but have no idea how to use it.

python web-crawler scrapy
4 answers

Spider arguments are passed in the crawl command using the -a option. For example:

    scrapy crawl myspider -a category=electronics -a domain=system

Spiders can refer to arguments as attributes:

    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def __init__(self, category='', **kwargs):
            self.start_urls = [f'http://www.example.com/{category}']  # py36
            super().__init__(**kwargs)  # python3

        def parse(self, response):
            self.log(self.domain)  # system

Adapted from the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments
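One thing worth noting: values passed with -a always arrive as strings, so convert them yourself when you need another type. A minimal sketch (the limit argument and PagingSpider are made up for illustration):

    import scrapy

    class PagingSpider(scrapy.Spider):
        name = 'paging'

        def __init__(self, limit='10', **kwargs):
            super().__init__(**kwargs)
            # -a limit=25 arrives as the string '25', not the int 25
            self.limit = int(limit)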

Update 2013: added a second argument

Update 2015: adjusted the wording

Update 2016: use the new base class and add super(), thanks @Birla

Update 2017: use the Python 3 super():

    # previously
    super(MySpider, self).__init__(**kwargs)  # python2

Update 2018: as @eLRuLL points out, spiders can refer to arguments as attributes


The previous answers were correct, but you do not have to declare a constructor (__init__) every time you write a spider; you can just specify the parameters as before:

    scrapy crawl myspider -a parameter1=value1 -a parameter2=value2

and in your spider code you can simply use them as spider attributes:

    from scrapy import Spider

    class MySpider(Spider):
        name = 'myspider'

        def parse(self, response):
            ...
            if self.parameter1 == 'value1':   # this is True
                ...
            # or also
            if getattr(self, 'parameter2') == 'value2':   # this is also True
                ...

And it just works.
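The same attribute mechanism applies when you run the spider from a script rather than the command line: keyword arguments given to CrawlerProcess.crawl() are forwarded to the spider. A minimal sketch, assuming MySpider is importable from your project (the import path is hypothetical):

    from scrapy.crawler import CrawlerProcess

    from myproject.spiders import MySpider  # hypothetical import path

    process = CrawlerProcess()
    process.crawl(MySpider, parameter1='value1', parameter2='value2')
    process.start()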


To pass arguments with the crawl command:

    scrapy crawl myspider -a category='mycategory' -a domain='example.com'

To pass arguments when running on scrapyd, replace -a with -d (scrapyd's schedule.json also expects the project name):

    curl http://your.ip.address.here:port/schedule.json -d project=myproject -d spider=myspider -d category='mycategory' -d domain='example.com'
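For reference, here is a sketch of the same scrapyd call made from Python with the third-party requests library (the host, port, and project name are placeholders):

    import requests

    response = requests.post(
        'http://your.ip.address.here:port/schedule.json',
        data={
            'project': 'myproject',   # scrapyd schedules spiders per project
            'spider': 'myspider',
            'category': 'mycategory',
            'domain': 'example.com',
        },
    )
    print(response.json())  # e.g. {'status': 'ok', 'jobid': '...'}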

The spider will receive arguments in its constructor.

    from scrapy import Spider

    class MySpider(Spider):
        name = 'myspider'

        def __init__(self, category='', domain='', *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            self.category = category
            self.domain = domain

Scrapy sets all the passed arguments as spider attributes, so you can skip the init method entirely. Be sure to use the getattr method when reading those attributes, so that your code does not break when an argument is missing.

    from scrapy import Spider

    class MySpider(Spider):
        name = 'myspider'
        start_urls = ('https://httpbin.org/ip',)

        def parse(self, response):
            print(getattr(self, 'category', ''))
            print(getattr(self, 'domain', ''))
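Because getattr supplies a default, the spider runs whether or not the arguments were given:

    scrapy crawl myspider -a category=mycategory -a domain=example.com
    scrapy crawl myspider    # also fine; both prints fall back to ''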

Spider arguments are passed with the crawl command using the -a option. For example, if I want to pass the domain name as an argument to my spider, I will do this:

    scrapy crawl myspider -a domain="http://www.example.com"

And receive the arguments in the spider's constructor:

    from scrapy.spider import BaseSpider  # deprecated; use scrapy.Spider in current releases

    class MySpider(BaseSpider):
        name = 'myspider'

        def __init__(self, domain='', *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            self.start_urls = [domain]

...

it will work :)
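If one argument has to carry several start URLs, a common pattern is to pass a comma-separated value and split it in the constructor. A minimal sketch (the urls argument name and MultiSpider are my own choices):

    from scrapy import Spider

    class MultiSpider(Spider):
        name = 'multispider'

        def __init__(self, urls='', *args, **kwargs):
            super(MultiSpider, self).__init__(*args, **kwargs)
            # -a urls="http://a.example,http://b.example" -> two start URLs
            self.start_urls = urls.split(',') if urls else []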



