We have been using scrapy-splash to pass the scraped HTML source through the Splash JavaScript engine, which runs inside a Docker container.
To use Splash in a spider, we configure several required project settings and yield a Request with specific meta arguments:
    yield Request(url, self.parse_result, meta={
        'splash': {
            'args': {
                # set rendering arguments here
                'html': 1,
                'png': 1,

                # 'url' is prefilled from request url
            },

            # optional parameters
            'endpoint': 'render.json',  # optional; default is render.json
            'splash_url': '<url>',      # overrides SPLASH_URL
            'slot_policy': scrapyjs.SlotPolicy.PER_DOMAIN,
        }
    })
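For context, the "required project settings" are roughly the ones sketched below, following the scrapy-splash README. This is a minimal sketch, not our exact configuration: the `SPLASH_URL` value assumes Splash is reachable on `localhost:8050` (the default port exposed by the Docker image), and the module paths assume the current `scrapy_splash` package name rather than the older `scrapyjs` one.

```python
# settings.py (sketch) -- values below are assumptions, adjust to your setup.

# Assumed address of the Splash instance running in Docker.
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Saves space by deduplicating repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

With settings like these in place, the spider-side `Request` above is picked up by the Splash middleware.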
This works as documented. But how can we use scrapy-splash inside the Scrapy shell?
python web-scraping scrapy scrapy-shell scrapy-splash
alecxe