Launching headless Selenium with several spiders

I have many spiders that run in parallel under scrapyd. I am doing something like the following:

My question is: do I really need to start a display for each spider, and how does the driver know which display to use? Should I just run one global display and run multiple webdriver instances on it?

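    import scrapy
    from pydispatch import dispatcher
    from pyvirtualdisplay import Display
    from scrapy import signals
    from scrapy.selector import Selector
    from selenium import webdriver

    class MySpider(scrapy.Spider):
        name = "my_spider"  # hypothetical spider name

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Start Xvfb and Firefox once per spider, not on every parse() call.
            self.display = Display(visible=0, size=(1024, 768))
            self.display.start()
            self.driver = webdriver.Firefox()
            dispatcher.connect(self.spider_closed, signals.spider_closed)

        def spider_closed(self, spider):
            # Shut down the browser first, then the virtual display it runs on.
            self.driver.quit()
            self.display.stop()

        def parse(self, response):
            self.driver.get(response.url)
            page = Selector(text=self.driver.page_source)
            # doing all parsing etc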
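For context on the display question: the driver does not pick a display itself. Firefox is an X client, and it connects to whatever X server the DISPLAY environment variable names when the browser process starts. pyvirtualdisplay's Display.start() points DISPLAY at the new Xvfb server for the current Python process, so a webdriver launched afterwards inherits that display, and because scrapyd runs each job in its own process, each spider's setting stays separate. The minimal sketch below shows only the inheritance step, using the standard library: a child Python process stands in for geckodriver/Firefox, and the helper name display_seen_by_child and the display numbers are hypothetical.

```python
import os
import subprocess
import sys


def display_seen_by_child(display_var):
    """Start a child process with DISPLAY set and return the value it sees.

    Hypothetical helper: the child is a stand-in for geckodriver/Firefox,
    which would connect to the X server named by this value.
    """
    # pyvirtualdisplay overwrites os.environ["DISPLAY"] in the current process;
    # here a modified copy of the environment is passed to the child explicitly.
    env = dict(os.environ, DISPLAY=display_var)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['DISPLAY'])"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(display_seen_by_child(":1001"))  # prints ":1001"
```

Since the value lives in the process environment, starting two Display objects in one process leaves DISPLAY pointing at the most recently started one.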

I suggest using Splinter as the browser handler; it's a wrapper around Selenium. It solves exactly your problem, since the display handling is done by the package.

With a few more packages installed, you can also remove the Display entirely, which means the browser runs headless (no browser window opens, and it is much faster). Check out the Splinter docs for how to run it headless. I personally suggest the PhantomJS driver, although you will have to install the PhantomJS program separately, outside of Python.
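A minimal sketch of the headless approach, assuming Splinter is installed and Firefox plus geckodriver are available. PhantomJS development has been suspended and current Selenium releases no longer include a PhantomJS driver, so this swaps in headless Firefox through Splinter's headless=True option instead. The helper names make_splinter_browser and fetch_rendered_html are hypothetical; the browser factory is a parameter so the function can be exercised without launching a real browser.

```python
def make_splinter_browser():
    # Imported lazily so this module loads even where Splinter is not installed.
    from splinter import Browser

    # headless=True runs Firefox without a window, so no Xvfb display is needed.
    return Browser("firefox", headless=True)


def fetch_rendered_html(url, browser_factory=make_splinter_browser):
    """Load url in a browser and return the rendered HTML, always closing the browser."""
    browser = browser_factory()
    try:
        browser.visit(url)
        return browser.html
    finally:
        # Quit even if visit() raises, so no stray browser processes are left behind.
        browser.quit()
```

In a spider you would more likely create the browser once in __init__ and quit it on spider_closed, rather than per URL, to avoid paying the browser start-up cost on every response.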
