My HTML looks like this:
<h1>Text 1</h1>
<div>Some info</div>
<h1>Text 2</h1>
<div>...</div>
I understand how to extract the text of an h1 using Scrapy:
content.select("//h1[contains(text(),'Text 1')]/text()").extract()
But my goal is to extract the content of <div>Some info</div>.
My problem is that I do not have any specific information about the div. All I know is that it comes right after <h1>Text 1</h1>. Can I use a selector to get the next element in the tree, i.e. the following sibling in the DOM?
Something like:
a = content.select("//h1[contains(text(),'Text 1')]/text()")
a.next("//div/text()").extract()
# desired result: Some info
python dom html parsing scrapy
Skyfox