You cannot simply click the link from within Scrapy (see Click the button in Scrapy).
First of all, check whether the data you need is already present in the HTML (it may be there in the background, just hidden).
One option is selenium:
from selenium import webdriver
import time

browser = webdriver.Firefox()
browser.get("http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U")

# click through the disclaimer overlay
elem = browser.find_element_by_xpath('//*[@id="disclaimer"]/div/div')
elem.click()
time.sleep(0.2)

# dump the resulting page source
elem = browser.find_element_by_xpath("//*")
print elem.get_attribute("outerHTML")
Another option is to use mechanize. It cannot execute js code, but, according to the page source, the AgreeClick handler just sets the ListFundShowDisclaimer cookie to true. So you can set that cookie yourself before opening the page. Here is a starting point (not sure if it works):
import cookielib
import mechanize

br = mechanize.Browser()

# pre-set the cookie the AgreeClick handler would have set
cj = cookielib.CookieJar()
ck = cookielib.Cookie(version=0, name='ListFundShowDisclaimer', value='true',
                      port=None, port_specified=False,
                      domain='www.prudential.com.hk', domain_specified=False,
                      domain_initial_dot=False, path='/', path_specified=True,
                      secure=False, expires=None, discard=True, comment=None,
                      comment_url=None, rest={'HttpOnly': None}, rfc2109=False)
cj.set_cookie(ck)
br.set_cookiejar(cj)

br.open("http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U")
print br.response().read()
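If you are on Python 3 (where mechanize and cookielib are not available), the same cookie trick can be sketched with the standard library's urllib by sending the cookie as a plain header. This is an assumption-based sketch, not a tested scrape of the site:

```python
import urllib.request

# Python 3 sketch of the same idea: send the ListFundShowDisclaimer
# cookie directly instead of building a cookielib.Cookie object.
url = ("http://www.prudential.com.hk/PruServlet"
       "?module=fund&purpose=searchHistFund&fundCd=MMFU_U")
req = urllib.request.Request(url, headers={"Cookie": "ListFundShowDisclaimer=true"})

# Uncomment to actually fetch the page:
# html = urllib.request.urlopen(req).read().decode("utf-8", "replace")

print(req.get_header("Cookie"))  # ListFundShowDisclaimer=true
```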
You can then analyze the result with BeautifulSoup or whatever you prefer.
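As a minimal stand-in for that analysis step, here is a standard-library sketch that pulls table-cell text out of the returned HTML; the sample markup is hypothetical, since the real fund page's structure may differ:

```python
from html.parser import HTMLParser

# Collect the text of every <td> cell in an HTML document.
class CellExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.cells.append(data.strip())

# Hypothetical snippet standing in for br.response().read()
html = "<table><tr><td>MMFU_U</td><td>12.34</td></tr></table>"
parser = CellExtractor()
parser.feed(html)
print(parser.cells)  # ['MMFU_U', '12.34']
```

BeautifulSoup would let you do the same with much less code (e.g. `soup.find_all("td")`), but the stdlib version above runs without extra dependencies.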
alecxe