Python Scrapy cannot extract text from class

Question

Consider this HTML:

  <header class="online">
    <img src="http://static.flv.com/themes/h5/img/iconos/online.png">
    <span>online</span>
    <img src="http://static.flv.com/themes/h5/img/iconos/ojo16.png">
    428
    <p>xxfantasia</p>
  </header>

I want to get the text inside the header (428 in this case). I tried this:

  from scrapy.selector import Selector

  def parse(self, response):
      sel = Selector(response)
      cams = sel.css('header.online')
      for cam in cams:
          # prints an empty list
          print cam.css('text').extract()

I think I used the correct CSS selector, but I get an empty result.

Any help?

python css python-2.7 css-selectors scrapy

buly Feb 05 '14 at 11:09

1 answer

paul trmbrth · Accepted Answer · 2014-02-05T11:30:55+0000

Standard CSS selectors have no syntax for selecting text content.

However, Scrapy extends CSS selectors with the ::text pseudo-element, so you want cam.css('::text').extract(), which should give the same result as cam.xpath('.//text()').extract().

Note: Scrapy also adds a functional pseudo-element, ::attr(attribute_name), for retrieving attribute values (which is also not possible with standard CSS selectors).
