Data scrap via xpath from div that contains javascript in scrapy python

Question


I am working with Scrapy: I crawl a site and use XPath to extract items. However, some of the divs contain JavaScript. When my XPath goes down to the div (selected by id) that contains the JavaScript code, it returns an empty list. When the XPath stops before this div (the one containing JavaScript), it does extract the HTML data.

HTML code

 <div class="subContent2">
   <div id="contentDetails">
     <div class="eventDetails">
       <h2>
         <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
       </h2>
     </div>
   </div>
 </div>

Spider code

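 from scrapy.spider import BaseSpider
 from scrapy.selector import HtmlXPathSelector

 class ExampleSpider(BaseSpider):
     name = "example"
     allowed_domains = ["www.example.com"]
     start_urls = ["http://www.example.com/jkl/index.php"]

     def parse(self, response):
         hxs = HtmlXPathSelector(response)
         # Selects the eventDetails <div> nodes themselves, not the text inside them
         required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]')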
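To see what that expression selects, here is a rough stand-in that uses Python's standard-library xml.etree.ElementTree instead of Scrapy. This is an assumption made so the sketch runs without Scrapy installed. ElementTree supports only a subset of XPath, and the <html><body> wrapper exists only to turn the fragment into a single document. The path ends at <div class="eventDetails">, so the result is a list of element nodes, not the string "Some data".

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in <html><body> (assumed) so it parses as one document
PAGE = (
    '<html><body><div class="subContent2"> <div id="contentDetails"> '
    '<div class="eventDetails"> <h2> <a href="javascript:;" '
    'onclick="jdevents.getEvent(117032)">Some data</a> </h2> </div> '
    '</div> </div></body></html>'
)

# Same location path as the spider, written relative to the document root
DETAILS_PATH = (
    './/div[@class="subContent2"]/div[@id="contentDetails"]'
    '/div[@class="eventDetails"]'
)


def select_event_details(page):
    """Return the matching <div> elements (nodes, not their text)."""
    root = ET.fromstring(page)
    return root.findall(DETAILS_PATH)


matches = select_event_details(PAGE)
print(len(matches), matches[0].tag, matches[0].get("class"))  # 1 div eventDetails
```

The matched node's own .text is only the whitespace before <h2>. The link text sits two levels deeper.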

So how can I get the text ("Some data") from the anchor tag inside the h2 element shown above? Is there an alternative way to get data from elements containing JavaScript in Scrapy?


javascript python xpath scrapy

shiva krishna Jun 12 '12 at 12:08


1 answer

warvariuc · Answer 1 · 2012-06-12T13:55:51+0000


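In this case, the JavaScript code is not what stops you from getting the string "Some data". The JavaScript is only the value of the onclick attribute, and the link text is an ordinary HTML text node. Your expression selects the <div class="eventDetails"> element, not the text inside it.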
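To illustrate the point, the following sketch reads the <a> element from the question with the standard-library ElementTree (not Scrapy). The link text and the onclick string are independent: one is the element's text, the other is an attribute value. The EVENT_ID_RE pattern and the inspect_anchor helper are hypothetical, and they assume every handler has the form getEvent(<digits>).

```python
import re
import xml.etree.ElementTree as ET

# The <a> element from the question; the JavaScript is only an attribute value
ANCHOR = ('<a href="javascript:;" '
          'onclick="jdevents.getEvent(117032)">Some data</a>')

# Hypothetical pattern: assumes the handler always looks like getEvent(<digits>)
EVENT_ID_RE = re.compile(r"getEvent\((\d+)\)")


def inspect_anchor(markup):
    """Return (link text, onclick string, numeric event id or None)."""
    a = ET.fromstring(markup)
    onclick = a.get("onclick", "")
    found = EVENT_ID_RE.search(onclick)
    return a.text, onclick, int(found.group(1)) if found else None


print(inspect_anchor(ANCHOR))
# ('Some data', 'jdevents.getEvent(117032)', 117032)
```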

You need to either select the text subnode of the link:

 required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text()')
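Here is a rough standard-library analog of that selection. ElementTree has no text() step, so the sketch selects the <a> nodes with the same path and reads .text. That matches text() here only because the link has no child elements. link_texts is a hypothetical helper name. In Scrapy, calling .extract() on the selection above turns the selected text nodes into a list of strings.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in <html><body> (assumed) so it parses as one document
PAGE = (
    '<html><body><div class="subContent2"> <div id="contentDetails"> '
    '<div class="eventDetails"> <h2> <a href="javascript:;" '
    'onclick="jdevents.getEvent(117032)">Some data</a> </h2> </div> '
    '</div> </div></body></html>'
)

# Same path as the answer, minus the final text() step that ElementTree lacks
LINK_PATH = (
    './/div[@class="subContent2"]/div[@id="contentDetails"]'
    '/div[@class="eventDetails"]/h2/a'
)


def link_texts(page):
    """Collect the direct text of each matched <a>, similar to .../a/text()."""
    root = ET.fromstring(page)
    return [a.text for a in root.findall(LINK_PATH) if a.text]


print(link_texts(PAGE))  # ['Some data']
```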


or use the XPath string() function, which returns the text content of the first matched element:

 required_data = hxs.select('string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"])')
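In XPath 1.0, string() returns the string-value of the first node in the set. That value is all of the node's descendant text nodes concatenated in document order, whitespace included. The sketch below is a rough ElementTree analog that uses itertext(), and string_value is a hypothetical helper. With the spacing from the question's markup, the result carries leading and trailing spaces. You will usually need to strip it in Python or wrap the expression in normalize-space().

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in <html><body> (assumed) so it parses as one document
PAGE = (
    '<html><body><div class="subContent2"> <div id="contentDetails"> '
    '<div class="eventDetails"> <h2> <a href="javascript:;" '
    'onclick="jdevents.getEvent(117032)">Some data</a> </h2> </div> '
    '</div> </div></body></html>'
)

DETAILS_PATH = (
    './/div[@class="subContent2"]/div[@id="contentDetails"]'
    '/div[@class="eventDetails"]'
)


def string_value(page):
    """Rough analog of XPath string(path): text of the first match, or ''."""
    root = ET.fromstring(page)
    first = root.find(DETAILS_PATH)
    return "" if first is None else "".join(first.itertext())


raw = string_value(PAGE)
print(repr(raw))    # '  Some data  '
print(raw.strip())  # Some data
```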
