    def parse(self, response):
        for sel in response.xpath('//tbody/tr'):
            item = HeroItem()
            item['hclass'] = response.request.url.split("/")[8].split('-')[-1]
            item['server'] = response.request.url.split('/')[2].split('.')[0]
            item['hardcore'] = len(response.request.url.split("/")[8].split('-')) == 3
            item['seasonal'] = response.request.url.split("/")[6] == 'season'
            item['rank'] = sel.xpath('td[@class="cell-Rank"]/text()').extract()[0].strip()
            item['battle_tag'] = sel.xpath('td[@class="cell-BattleTag"]//a/text()').extract()[1].strip()
            item['grift'] = sel.xpath('td[@class="cell-RiftLevel"]/text()').extract()[0].strip()
            item['time'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
            item['date'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
            url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()

            yield Request(url, callback=self.parse_profile)

    def parse_profile(self, response):
        sel = Selector(response)
        item = HeroItem()
        item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
        return item
I scrape the whole table in the main parse method and take several fields from it. One of these fields is a URL, and I want to follow it to get a whole new set of fields. How can I pass my already created item object to the callback function so that the final item keeps all the fields?
As the code above shows, I can save either the fields found behind the URL (what the code does at the moment) or only the fields from the table (by simply writing yield item), but I can't yield a single object with all the fields together.
I tried this, but obviously it does not work:
    yield Request(url, callback=self.parse_profile(item))

    def parse_profile(self, response, item):
        sel = Selector(response)
        item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
        return item
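For reference, the attempt above calls parse_profile(item) immediately and hands its return value to Request, instead of handing over the function itself. One common way around this is to attach the half-filled item to the request and read it back in the callback. Scrapy's Request accepts a meta dict for this purpose, and the response exposes it as response.meta. The sketch below only mimics that flow in plain Python so it runs without Scrapy or a network: FakeRequest, FakeResponse, run and the weapons lookup are hypothetical stand-ins, not Scrapy classes or the asker's real data.

```python
# Framework-free sketch of the "carry the item in meta" pattern.
# FakeRequest and FakeResponse are hypothetical stand-ins for scrapy.Request
# and for the response object Scrapy passes to a callback.


class FakeRequest:
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback      # a callable, not the result of a call
        self.meta = dict(meta or {})  # extra data that travels with the request


class FakeResponse:
    def __init__(self, request, weapon):
        self.request = request
        self.meta = request.meta      # mirrors how response.meta is exposed
        self.weapon = weapon          # stands in for the XPath lookup


def parse(rows):
    for row in rows:
        item = {"rank": row["rank"], "battle_tag": row["battle_tag"]}
        # Attach the partially filled item to the request instead of
        # calling parse_profile(item) right here.
        yield FakeRequest(row["url"], callback=parse_profile, meta={"item": item})


def parse_profile(response):
    item = response.meta["item"]      # the same object created in parse()
    item["weapon"] = response.weapon
    return item


def run(rows, weapons):
    """Tiny stand-in for the engine: 'download' each request, then call back."""
    results = []
    for request in parse(rows):
        response = FakeResponse(request, weapons[request.url])
        results.append(request.callback(response))
    return results
```

In the spider from the question, the equivalent change would presumably be yield Request(url, callback=self.parse_profile, meta={'item': item}) in parse, and item = response.meta['item'] in place of item = HeroItem() in parse_profile.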
python callback arguments scrapy
vic