Python - getting all images from html file

Can someone help me parse an HTML file in Python to get the links for all the images in it?

Preferably without a third-party module ...

Thanks!

+7
python image urllib
3 answers

You can use Beautiful Soup. I know you said without a third-party module, but it is an ideal tool for parsing HTML.

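    # Python 2 with BeautifulSoup 3; on Python 3 the equivalents are
    # urllib.request.urlopen and "from bs4 import BeautifulSoup".
    import urllib2
    from BeautifulSoup import BeautifulSoup

    page = BeautifulSoup(urllib2.urlopen("http://www.url.com"))
    page.findAll('img')  # list of all <img> tags on the page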
+9

Using the Python standard library only:

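    from html.parser import HTMLParser

    class MyParse(HTMLParser):
        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) pairs
            if tag == "img":
                # .get avoids a KeyError on <img> tags without src
                print(dict(attrs).get("src"))

    h = MyParse()
    with open("index.html") as f:
        h.feed(f.read())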
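If you want the links in a list rather than printed, and relative paths turned into full URLs, the same idea can be extended roughly like this. It is only a sketch: the names ImageCollector and extract_image_urls, the sample HTML, and the example.com base URL are made up for illustration and are not part of the answer above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the src of every <img> tag (hypothetical helper name)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # used to resolve relative src values
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:  # skip <img> tags that have no usable src
            self.images.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url=""):
    """Return the image URLs found in an HTML string, in document order."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


# Made-up sample input: one relative src, one tag without src, one absolute src.
sample = (
    '<p><img src="/a.png"><img alt="no src">'
    '<img src="http://cdn.example.com/b.jpg"/></p>'
)
print(extract_image_urls(sample, "http://www.example.com/page/index.html"))
```

With an empty base_url the src values come back unchanged, so this works for local files too.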
+10

lxml is generally considered faster than Beautiful Soup (ref). Its tutorial can be found here: (link). You can also take a look at https://stackoverflow.com/a/167444/2/.

+2