Download files with mechanize

I have a mechanize browser instance that has opened a page. I would like to download and save all of the links on it (they are PDF files). Does anyone know how to do this?

thanks

+4
2 answers

This may not be the answer you are looking for, but I have used the lxml and requests libraries for this kind of automated link scraping:

Relevant lxml examples: http://lxml.de/lxmlhtml.html#examples (replace urllib with requests)

And the requests library home page: http://docs.python-requests.org/en/latest/index.html

It is not as compact as mechanize, but it offers more control.
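A minimal sketch of that approach, assuming a hypothetical listing-page URL and that each PDF should be saved under its own file name in the current directory:

import os
import requests
from lxml import html

PAGE = 'https://www.adobe.com/pdf'  # hypothetical listing page

page = requests.get(PAGE)
tree = html.fromstring(page.content)
tree.make_links_absolute(PAGE)  # resolve relative hrefs against the page URL

for href in tree.xpath('//a/@href'):
    if href.lower().endswith('.pdf'):
        r = requests.get(href)
        with open(os.path.basename(href), 'wb') as f:
            f.write(r.content)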

+1
import urllib, urllib2, cookielib, re
# http://www.crummy.com/software/BeautifulSoup/ - required
from BeautifulSoup import BeautifulSoup

HOST = 'https://www.adobe.com/'

# Build an opener that keeps cookies across requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

req = opener.open(HOST + 'pdf')
response = req.read()
soup = BeautifulSoup(response)

# Find every anchor whose href points at a PDF
pdfs = soup.findAll(name='a', attrs={'href': re.compile(r'\.pdf')})

for pdf in pdfs:
    # Turn relative hrefs into absolute URLs
    if 'https://' not in pdf['href']:
        url = HOST + pdf['href']
    else:
        url = pdf['href']
    try:
        # http://docs.python.org/library/urllib.html#urllib.urlretrieve
        # Save the file under its own name in the current directory
        urllib.urlretrieve(url, url.split('/')[-1])
    except Exception, e:
        print 'cannot obtain url %s' % (url,)
        print 'from href %s' % (pdf['href'],)
        print e
    else:
        print 'downloaded file'
        print url
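Since the question starts from a mechanize browser instance, a minimal sketch doing the same thing with mechanize itself (the listing-page URL is the same hypothetical one as above) might look like:

import mechanize

br = mechanize.Browser()
br.open('https://www.adobe.com/pdf')  # hypothetical listing page

# links() can filter by a regex applied to the href
for link in br.links(url_regex=r'\.pdf'):
    # retrieve() works like urllib.urlretrieve, reusing the browser's cookies
    br.retrieve(link.absolute_url, link.absolute_url.split('/')[-1])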
+3
