Python Scrapy cannot extract text from class
Consider this HTML:

    <header class="online">
      <img src="http://static.flv.com/themes/h5/img/iconos/online.png">
      <span>online</span>
      <img src="http://static.flv.com/themes/h5/img/iconos/ojo16.png"> 428
      <p>xxfantasia</p>
    </header>

I want to get the text inside the header (428, in this case). I used this:

    def parse(self, response):
        sel = Selector(response)
        cams = sel.css('header.online')
        for cam in cams:
            print cam.css('text').extract()

I think I used the correct CSS selector, but I got an empty result.
Any help?
1 answer
Standard CSS selectors have no syntax for extracting text content; they only match elements, so css('text') looks for a <text> element and finds nothing.
But Scrapy extends CSS selectors with the ::text pseudo-element, so you want cam.css('::text').extract(), which should give you the same result as cam.xpath('.//text()').extract().
Note: Scrapy also adds a functional pseudo-element, ::attr(attribute_name), for retrieving attribute values (which is also not possible with standard CSS selectors).