Can someone help me parse an HTML file in Python to get the links for all the images in it?
Preferably without a third-party module.
Thanks!
You can use Beautiful Soup. I know you asked for a solution without a third-party module, but it is an ideal tool for parsing HTML.
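    # Python 2 with BeautifulSoup 3 (urllib2 does not exist in Python 3).
    import urllib2
    from BeautifulSoup import BeautifulSoup

    page = BeautifulSoup(urllib2.urlopen("http://www.url.com"))
    # Returns every <img> tag; each tag's link is in its "src" attribute.
    page.findAll('img')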
Using the Python standard library only:
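    from html.parser import HTMLParser

    class MyParse(HTMLParser):
        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) pairs for the tag
            if tag == "img":
                src = dict(attrs).get("src")  # avoids KeyError when src is missing
                if src:
                    print(src)

    h = MyParse()
    with open("index.html") as f:
        h.feed(f.read())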
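If you want the links back as a list rather than printed, here is a minimal sketch of the same standard-library idea. The names `ImageLinkCollector` and `image_links`, and the example.com base URL, are made up for illustration. It assumes you also want relative `src` values resolved against the page's URL, which the question does not actually ask for.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collect the src of every <img> tag (hypothetical helper class)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # assumption: used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />,
        # because the default handle_startendtag delegates here.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # With an empty base_url, urljoin returns src unchanged.
            self.links.append(urljoin(self.base_url, src))


def image_links(html, base_url=""):
    """Return the image links found in an HTML string, in document order."""
    parser = ImageLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup; the second <img> has no src and is skipped.
sample = '<p><img src="/a.png"><img alt="no source"><img src="b/c.jpg"/></p>'
print(image_links(sample, "http://example.com/dir/page.html"))
```

For a file on disk you would pass `open("index.html").read()` as the first argument and leave `base_url` empty if you want the links exactly as written in the page.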
It is generally accepted that lxml is faster than Beautiful Soup (ref). Its tutorial can be found here: (link). You can also take a look at https://stackoverflow.com/a/167444/2/.