I have a browser instance that has opened a page. I would like to download and save all of the files it links to (they are PDF files). Does anyone know how to do this?
thanks
This may not be the answer you are looking for, but I use the lxml and requests libraries together for this kind of automated scraping:
Relevant lxml examples: http://lxml.de/lxmlhtml.html#examples (replace urllib with requests)
The requests library home page is http://docs.python-requests.org/en/latest/index.html
It is not as compact as mechanize, but it offers more control.
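As a rough sketch of that lxml + requests approach (PAGE_URL below is a hypothetical listing page, not one from the question):

import requests
from lxml import html

PAGE_URL = 'https://www.example.com/pdfs/'  # hypothetical page that links to the PDFs

# Fetch the listing page and parse it with lxml.html
page = requests.get(PAGE_URL)
tree = html.fromstring(page.content)
tree.make_links_absolute(PAGE_URL)  # turn relative hrefs into full URLs

# Download every link ending in .pdf and save it under its own filename
for href in tree.xpath('//a/@href'):
    if not href.lower().endswith('.pdf'):
        continue
    pdf = requests.get(href)
    filename = href.rsplit('/', 1)[-1]
    with open(filename, 'wb') as f:
        f.write(pdf.content)
    print('saved %s' % filename)

If the site needs cookies or a login, a requests.Session() can be used in place of the bare requests.get calls; it persists cookies across requests, much like a browser instance would.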
# Python 2 example using urllib2/cookielib and the old BeautifulSoup 3 import.
import urllib, urllib2, cookielib, re
# http://www.crummy.com/software/BeautifulSoup/ - required
from BeautifulSoup import BeautifulSoup

HOST = 'https://www.adobe.com/'

# Open the listing page with a cookie-aware opener.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
req = opener.open(HOST + 'pdf')
response = req.read()

# Find every <a> whose href contains ".pdf".
soup = BeautifulSoup(response)
pdfs = soup.findAll(name='a', attrs={'href': re.compile(r'\.pdf')})

for pdf in pdfs:
    if 'https://' not in pdf['href']:
        url = HOST + pdf['href']
    else:
        url = pdf['href']
    try:
        # http://docs.python.org/library/urllib.html#urllib.urlretrieve
        # Pass a filename, otherwise urlretrieve saves to a temp file.
        urllib.urlretrieve(url, url.split('/')[-1])
    except Exception, e:
        print 'cannot obtain url %s' % (url,)
        print 'from href %s' % (pdf['href'],)
        print e
    else:
        print 'downloaded file'
        print url