How to check whether a URL is a link to a web page or a link to a file in Python

Question

Suppose I have links as follows:

    http://example.com/index.html
    http://example.com/stack.zip
    http://example.com/setup.exe
    http://example.com/news/

In the above links, the first and fourth are links to web pages, and the second and third are links to files.

These are just some examples of file links, i.e. .zip and .exe files, but there can be many other file types.

Is there any standard way to distinguish between a link to a file and a link to a web page? Thanks in advance.

+2

python url file web hyperlink

Bishwash Feb 02 '14 at 19:26

2 answers

    import urllib
    mytest = urllib.urlopen('http://www.sec.gov')
    mytest.headers.items()
    [('content-length', '20833'),
     ('expires', 'Sun, 02 Feb 2014 19:36:12 GMT'),
     ('server', 'SEC'),
     ('connection', 'close'),
     ('cache-control', 'max-age=0'),
     ('date', 'Sun, 02 Feb 2014 19:36:12 GMT'),
     ('content-type', 'text/html')]

mytest.headers.items() returns a list of (name, value) tuples. In my example, the last element in the list describes the content type.

I'm not sure whether the length of the list varies, so you can iterate over it to find the tuple whose first element is 'content-type'.
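A rough sketch of that lookup, in case it helps. The helper name `find_content_type` is made up for illustration, and the sample list reuses a few of the tuples from the output above. The comparison is case-insensitive on the assumption that header names may come back with different capitalization depending on the Python version and the server.

```python
def find_content_type(header_items):
    """Return the value of the first 'content-type' header, or None if absent."""
    for name, value in header_items:
        # Compare case-insensitively; header names are not case-sensitive.
        if name.lower() == "content-type":
            return value
    return None


# Sample data copied from the headers shown above (no network access needed).
headers = [
    ("content-length", "20833"),
    ("server", "SEC"),
    ("content-type", "text/html"),
]

print(find_content_type(headers))  # text/html
```

A value starting with text/html suggests a web page; most other values suggest a file of some kind.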

+1

Pynewbie Feb 02 '14 at 19:38

Omid raha · Accepted Answer · 2014-02-02T20:31:16+0000

    import urllib
    import mimetypes


    def guess_type_of(link, strict=True):
        link_type, _ = mimetypes.guess_type(link)
        if link_type is None and strict:
            u = urllib.urlopen(link)
            link_type = u.headers.gettype()  # or using: u.info().gettype()
        return link_type

Demo:

    links = ['http://stackoverflow.com/q/21515098/538284',  # It's an HTML page
             'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png',  # It's a PNG file
             'http://commons.wikimedia.org/wiki/File:Typing_example.ogv',  # It's an HTML page
             'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv'  # It's an OGV file
             ]

    for link in links:
        print(guess_type_of(link))

Output:

    text/html
    image/x-png
    text/html
    application/ogg
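The code above is written for Python 2: `urllib.urlopen` and `gettype()` are not available in Python 3. A possible Python 3 adaptation is sketched below, not a drop-in guarantee. It keeps the same `guess_type_of` name and `strict` flag. Sending a HEAD request instead of a GET is my own assumption, made to avoid downloading the body, and some servers may not answer HEAD properly. Note also that `get_content_type()` falls back to text/plain when the server sends no Content-Type header.

```python
import mimetypes
import urllib.request


def guess_type_of(link, strict=True):
    """Guess the MIME type of a link, asking the server only if needed."""
    # First try the file extension; this needs no network access.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # Extension was not conclusive, so ask the server for its headers.
        # HEAD is an assumption here; fall back to GET if a server rejects it.
        request = urllib.request.Request(link, method="HEAD")
        with urllib.request.urlopen(request) as response:
            link_type = response.headers.get_content_type()
    return link_type


# Offline examples: with strict=False no request is ever sent.
print(guess_type_of("http://example.com/index.html", strict=False))
print(guess_type_of("http://example.com/news/", strict=False))
```

With `strict=False` a link such as http://example.com/news/ yields None because there is no extension to go on; with the default `strict=True` the function would contact the server to find out.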
