I am working with Scrapy: I crawl a site and use XPath to extract items. However, some of the divs contain JavaScript. My XPath works up to the div (identified by its id) that contains the JavaScript code; as soon as the path includes that div, it returns an empty list. If I leave that div out of the path, the HTML data is extracted fine.
HTML code
<div class="subContent2">
  <div id="contentDetails">
    <div class="eventDetails">
      <h2>
        <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
      </h2>
    </div>
  </div>
</div>
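The `javascript:;` href and the `onclick` handler are only attribute values. An XPath engine never runs them, so this markup is selectable as it stands. As a sanity check, here is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree` (not Scrapy's selectors) with a path that mirrors the spider's XPath, extended down to the anchor. It only works because this particular snippet happens to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# The snippet from the question; it is well-formed XML, so ElementTree can parse it.
SNIPPET = (
    '<div class="subContent2"> <div id="contentDetails"> '
    '<div class="eventDetails"> <h2> <a href="javascript:;" '
    'onclick="jdevents.getEvent(117032)">Some data</a> </h2> '
    '</div> </div> </div>'
)

root = ET.fromstring(SNIPPET)  # root element is div.subContent2

# ElementTree's limited XPath subset, mirroring the spider's path plus /h2/a.
# The onclick/href values are plain strings and are never executed.
link = root.find("./div[@id='contentDetails']/div[@class='eventDetails']/h2/a")

print(link.text)            # Some data
print(link.get("onclick"))  # jdevents.getEvent(117032)
```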
Spider code
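from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class ExampleSpider(BaseSpider):
    name = "example"
    domain_name = "www.example.com"
    start_urls = ["http://www.example.com/jkl/index.php"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Returns an empty list once the path includes div[@id="contentDetails"]
        required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]')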
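One possible cause, assumed here and not confirmed by the question: the `contentDetails` block may be inserted by JavaScript after the page loads. Scrapy does not execute JavaScript, so such content would appear in the browser's inspector but not in the HTML that Scrapy downloads. A quick check is to run the same extraction against the raw page source (what Scrapy receives in `response.body`, decoded to text). The sketch below uses only the standard-library `html.parser`, which tolerates HTML that is not valid XML. `EventLinkParser`, `event_link_texts`, and the `RAW` sample are hypothetical names for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EventLinkParser(HTMLParser):
    """Hypothetical helper: collects text of <a> in div.eventDetails > h2."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements as (tag, class) pairs
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if len(self.stack) < 3 or not data.strip():
            return
        (div, div_cls), (h2, _), (a, _) = self.stack[-3:]
        if (div, h2, a) == ("div", "h2", "a") and "eventDetails" in div_cls.split():
            self.texts.append(data.strip())


def event_link_texts(html):
    parser = EventLinkParser()
    parser.feed(html)
    parser.close()
    return parser.texts


# Markup as shown in the question (e.g. copied from the browser inspector).
RENDERED = ('<div class="subContent2"> <div id="contentDetails"> '
            '<div class="eventDetails"> <h2> <a href="javascript:;" '
            'onclick="jdevents.getEvent(117032)">Some data</a> </h2> '
            '</div> </div> </div>')
# Hypothetical raw source in which JavaScript has not yet inserted contentDetails.
RAW = '<div class="subContent2"> </div>'

print(event_link_texts(RENDERED))  # ['Some data']
print(event_link_texts(RAW))       # []
```

If the raw source behaves like `RAW`, no XPath will find the data in that response. It would then have to come from somewhere else, for example the request that `jdevents.getEvent` triggers (visible in the browser's network panel) or a tool that renders JavaScript.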
So how can I get the text ("Some data") from the anchor tag inside the h2 element shown above? Is there another way in Scrapy to extract data from elements that contain JavaScript?