Best way to load all artifacts listed in HTML5 cache.manifest file?

Question

I'm trying to see how an HTML5 application works, but every attempt to save the page from a WebKit browser (Chrome, Safari) captures some, but not all, of the resources listed in cache.manifest. Is there a library or piece of code that will parse the cache.manifest file and download all of the resources (images, scripts, CSS)?

(Source code moved to the answer... noob mistake >.<)

python html5 parsing

rockhowse 12 sept '11 at 10:40

1 answer

rockhowse · Accepted Answer · 2011-12-21T20:17:57+0000

I originally posted this as part of the question... (no newbie Stack Overflow poster EVER does that ;)

Since there was such a huge number of answers, here you go:

I managed to come up with the following Python script, but any input would be appreciated =) (This is my first stab at Python code, so there may well be a better way.)

    import os
    import urllib2
    import urllib

    cmServerURL = 'http://<serverURL>:<port>/<path-to-cache.manifest>'

    # download file code taken from stackoverflow
    # http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
    def loadURL(url, dirToSave):
        file_name = url.split('/')[-1]
        u = urllib2.urlopen(url)
        f = open(dirToSave, 'wb')
        meta = u.info()
        file_size = int(meta.getheaders("Content-Length")[0])
        print "Downloading: %s Bytes: %s" % (file_name, file_size)

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            # progress line; the trailing backspaces rewind the cursor so it updates in place
            status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            status = status + chr(8)*(len(status)+1)
            print status,

        f.close()

    # download the cache.manifest file
    # since this request doesn't include the Content-Length header we will use a different api =P
    urllib.urlretrieve(cmServerURL + 'cache.manifest', './cache.manifest')

    # open the cache.manifest and go through line-by-line checking for the existence of files
    f = open('cache.manifest', 'r')
    for line in f:
        # only lines containing a '/' are treated as resource paths
        filepath = line.split('/')
        if len(filepath) > 1:
            fileName = line.strip()

            # if the file doesn't exist, let's download it
            if not os.path.exists(fileName):
                print 'NOT FOUND: ' + line
                dirName = os.path.dirname(fileName)
                print 'checking directory: ' + dirName
                if not os.path.exists(dirName):
                    os.makedirs(dirName)
                else:
                    print 'directory exists'

                print 'downloading file: ' + cmServerURL + line,
                loadURL(cmServerURL + fileName, fileName)
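The script above targets Python 2 (urllib2) and only picks up lines that contain a '/'. For comparison, here is a rough Python 3 sketch of the same idea that also pays attention to the manifest's section headers (CACHE:, NETWORK:, FALLBACK:, SETTINGS:). The names parse_manifest and download_all are made up for this illustration, not part of any library, and the sketch assumes the manifest is UTF-8 and that every entry can be resolved relative to the manifest URL. I haven't run it against a real server, so treat it as a starting point.

```python
import os
import urllib.request
from urllib.parse import urljoin, urlparse

SECTION_HEADERS = ("CACHE:", "NETWORK:", "FALLBACK:", "SETTINGS:")


def parse_manifest(text):
    """Return the entries a browser would cache, in order of appearance.

    Hypothetical helper: takes CACHE entries plus the fallback page
    (second token) of each FALLBACK line; NETWORK and SETTINGS are skipped.
    """
    resources = []
    section = "CACHE"  # entries before any header belong to the CACHE section
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "CACHE MANIFEST":
            continue
        if line in SECTION_HEADERS:
            section = line[:-1]
            continue
        if section == "CACHE":
            resources.append(line)
        elif section == "FALLBACK":
            parts = line.split()
            if len(parts) == 2:
                resources.append(parts[1])
    return resources


def download_all(manifest_url, dest_dir="."):
    """Fetch the manifest, then every listed resource missing under dest_dir."""
    with urllib.request.urlopen(manifest_url) as resp:
        text = resp.read().decode("utf-8")
    for entry in parse_manifest(text):
        url = urljoin(manifest_url, entry)  # resolve relative entries
        rel_path = urlparse(url).path.lstrip("/")
        if not rel_path or rel_path.endswith("/"):
            continue  # no file name to save under; skipped in this sketch
        target = os.path.join(dest_dir, rel_path)
        if os.path.exists(target):
            continue
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, target)
```

To use it, call download_all() with the full URL of the cache.manifest file. Resolving each entry with urljoin means absolute and relative paths in the manifest both work, and files that already exist locally are left alone, same as in the original script.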
