How to check whether a URL links to a web page or to a file in Python

Suppose I have links as follows:

http://example.com/index.html
http://example.com/stack.zip
http://example.com/setup.exe
http://example.com/news/

In the links above, the first and fourth point to web pages, and the second and third point to files.

These are just some examples of file links, i.e. .zip and .exe files, but there can be many other file types.

Is there any standard way to distinguish between a link to a file and a link to a web page? Thanks in advance.

+2
python url file web hyperlink
2 answers
    import urllib
    import mimetypes

    def guess_type_of(link, strict=True):
        link_type, _ = mimetypes.guess_type(link)
        if link_type is None and strict:
            u = urllib.urlopen(link)
            link_type = u.headers.gettype()  # or using: u.info().gettype()
        return link_type

Demo:

    links = [
        'http://stackoverflow.com/q/21515098/538284',  # an HTML page
        'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png',  # a PNG file
        'http://commons.wikimedia.org/wiki/File:Typing_example.ogv',  # an HTML page
        'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv',  # an OGV file
    ]

    for link in links:
        print(guess_type_of(link))

Output:

    text/html
    image/x-png
    text/html
    application/ogg
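
The code above uses the Python 2 API (`urllib.urlopen`, `headers.gettype()`). A rough Python 3 equivalent might look like the sketch below. It keeps the same idea: guess from the extension first and only contact the server when that fails. Two things here are assumptions rather than part of the original answer: the request is sent as `HEAD` instead of `GET` to avoid downloading the body (some servers reject `HEAD`), and a 10-second timeout is added. Note also that `get_content_type()` falls back to `text/plain` when the server sends no Content-Type header.

```python
import mimetypes
import urllib.request


def guess_type_of(link, strict=True):
    """Return a MIME type for link, or None if it cannot be determined."""
    # Cheap first pass: guess from the file extension in the URL.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # No recognizable extension: ask the server for its Content-Type.
        # HEAD is an assumption; fall back to a plain GET if a server refuses it.
        request = urllib.request.Request(link, method="HEAD")
        with urllib.request.urlopen(request, timeout=10) as response:
            link_type = response.headers.get_content_type()
    return link_type
```

With `strict=False` the function never touches the network, so `guess_type_of('http://example.com/news/', strict=False)` simply returns `None` because the URL has no file extension.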
+4
    >>> import urllib
    >>> mytest = urllib.urlopen('http://www.sec.gov')
    >>> mytest.headers.items()
    [('content-length', '20833'),
     ('expires', 'Sun, 02 Feb 2014 19:36:12 GMT'),
     ('server', 'SEC'),
     ('connection', 'close'),
     ('cache-control', 'max-age=0'),
     ('date', 'Sun, 02 Feb 2014 19:36:12 GMT'),
     ('content-type', 'text/html')]

mytest.headers.items() returns a list of tuples. In my example, the last element in the list gives the content type.

I'm not sure whether the length or order of the list varies, so rather than relying on position, you can iterate over it to find the tuple whose first item is 'content-type'.
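
The iteration described above could be sketched as follows. The helper names `find_content_type` and `is_web_page` are hypothetical, and treating only `text/html` and `application/xhtml+xml` as "web pages" is an assumption you may want to adjust. The header name is compared case-insensitively, and any parameter such as `; charset=utf-8` is stripped from the value.

```python
def find_content_type(header_items):
    """Return the bare MIME type from a list of (name, value) header tuples."""
    for name, value in header_items:
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower()
    return None


def is_web_page(content_type):
    """Assumption: only HTML and XHTML responses count as web pages."""
    return content_type in ("text/html", "application/xhtml+xml")


# Header list taken from the example output above (shortened).
headers = [
    ("content-length", "20833"),
    ("server", "SEC"),
    ("content-type", "text/html"),
]
content_type = find_content_type(headers)
print(content_type, is_web_page(content_type))
```

Anything for which `is_web_page` returns `False` (for example `application/zip`) would then be treated as a file link.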

+1