How to select the next node using scrapy

Question


My HTML looks like this:

 <h1>Text 1</h1>
 <div>Some info</div>
 <h1>Text 2</h1>
 <div>...</div>

I know how to extract the text of the h1 using Scrapy:

 content.select("//h1[contains(text(),'Text 1')]/text()").extract()

But my goal is to extract the content of <div>Some info</div>.

My problem is that I have no specific information about the div. All I know is that it comes right after <h1>Text 1</h1>. Can I use a selector to get the NEXT element in the tree, i.e. the element's sibling in the DOM tree?

Something like:

 a = content.select("//h1[contains(text(),'Text 1')]/text()")
 a.next("//div/text()").extract()
 # desired result: Some info


python dom html parsing scrapy

Skyfox Nov 04 '13 at 12:12


1 answer

kev · Accepted Answer · 2013-11-04T13:09:36+0000

Try this XPath:

 //h1[contains(text(), 'Text 1')]/following-sibling::div[1]/text()
