Newbie: how to get past a Javascript onclick button to scrape a web page?

This is the link I want to scrape: http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U

The English version tab in the upper right corner displays the English version of the web page.

There is a button that I have to click in order to read the fund information on the web page. Until it is clicked, the view is blocked, and the scrapy shell always returns an empty [].

<div onclick="AgreeClick()" style="width:200px; padding:8px; border:1px black solid; background-color:#cccccc; cursor:pointer;">Confirmed</div> 

And the AgreeClick function behind it:

    function AgreeClick() {
        var cookieKey = "ListFundShowDisclaimer";
        SetCookie(cookieKey, "true", null);
        Get("disclaimerDiv").style.display = "none";
        Get("blankDiv").style.display = "none";
        Get("screenDiv").style.display = "none";
        //Get("contentTable").style.display = "block";
        ShowDropDown();
    }

How can I get around this onclick="AgreeClick()" handler so that I can scrape the page?
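
For reference, this is roughly what I am trying in the scrapy shell; the XPath below is just one example of a selector I tried:

    # scrapy shell "http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U"
    response.xpath('//table//td/text()').extract()   # always comes back as []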

2 answers

You cannot simply click the button from inside scrapy (see Click the button in Scrapy).

First of all, check whether the data you need is already present in the HTML itself (it may well be there in the background, merely hidden by the disclaimer overlay).
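
A quick way to check this is right in the scrapy shell. The substring and the XPath below are only guesses at what to look for on this page:

    # scrapy shell "http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U"
    'disclaimerDiv' in response.body            # on newer Scrapy/Python 3, use response.text instead
    len(response.xpath('//table').extract())    # are the fund tables in the raw HTML as well?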

Another option is selenium:

    from selenium import webdriver
    import time

    browser = webdriver.Firefox()
    browser.get("http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U")

    # click the "Confirmed" div inside the disclaimer block
    elem = browser.find_element_by_xpath('//*[@id="disclaimer"]/div/div')
    elem.click()

    time.sleep(0.2)

    # dump the whole page once the overlay is gone
    elem = browser.find_element_by_xpath("//*")
    print elem.get_attribute("outerHTML")

Another option is to use mechanize. It cannot execute js code, but, judging by the page source, AgreeClick just sets the ListFundShowDisclaimer cookie to true. Here is a starting point (not sure if it works):

    import cookielib
    import mechanize

    br = mechanize.Browser()

    # AgreeClick() only sets this cookie, so set it by hand before opening the page
    cj = cookielib.CookieJar()
    ck = cookielib.Cookie(version=0, name='ListFundShowDisclaimer', value='true',
                          port=None, port_specified=False,
                          domain='www.prudential.com.hk', domain_specified=False,
                          domain_initial_dot=False, path='/', path_specified=True,
                          secure=False, expires=None, discard=True, comment=None,
                          comment_url=None, rest={'HttpOnly': None}, rfc2109=False)
    cj.set_cookie(ck)
    br.set_cookiejar(cj)

    br.open("http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U")
    print br.response().read()

You can then parse the result with BeautifulSoup or whatever you prefer.
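
For example, here is a minimal BeautifulSoup sketch over the mechanize response; the assumption that the fund data sits in table rows is only a guess at the page structure:

    from bs4 import BeautifulSoup

    html = br.response().read()        # the page fetched by mechanize above
    soup = BeautifulSoup(html)         # pass a parser name, e.g. "html.parser", on newer bs4
    for row in soup.find_all("tr"):    # assume the fund data lives in table rows
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(cells)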


Use the spynner library for Python to emulate a browser and execute the client-side javascript.

    import spynner

    browser = spynner.Browser()
    url = "http://www.prudential.com/path/?args=values"
    browser.load(url)
    browser.runjs("AgreeClick();")   # run the page's own onclick handler
    markup = browser._get_html()

As you can see, you can programmatically use any Javascript function available in the page source.

If you also need to parse the results, I highly recommend BeautifulSoup.
