How to select the next node using scrapy

My HTML looks like this:

 <h1>Text 1</h1>
 <div>Some info</div>
 <h1>Text 2</h1>
 <div>...</div>

I understand how to extract information from h1 using scrapy:

 content.select("//h1[contains(text(),'Text 1')]/text()").extract() 

But my goal is to extract content from <div>Some info</div>

My problem is that I do not have any specific information about the div. All I know is that it comes right after <h1>Text 1</h1>. Can I use a selector to get the next element in the tree, that is, the element's following sibling in the DOM?

Something like:

 a = content.select("//h1[contains(text(),'Text 1')]/text()")
 a.next("//div/text()").extract()
 Some info
python dom html parsing scrapy
1 answer

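Try this XPath. The following-sibling axis selects siblings that come after the matched h1, and [1] keeps only the nearest div: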

 //h1[contains(text(), 'Text 1')]/following-sibling::div[1]/text() 