How to get the true URL of a file on the Internet (Python)

Question


I notice that sometimes audio files on the Internet have a “fake” URL.

http://garagaeband.com/3252243

This returns a 302 redirect to the real URL:

 http://garageband.com/michael_jackson4.mp3

My question is: when a file has a fake URL, how can I get the REAL URL from the headers?

This is my current code for reading the file's headers. I am not sure whether it gets me what I want. How do I parse the "real" URL from the response headers?

    import httplib

    # head is the host name, tail is the path on that host
    conn = httplib.HTTPConnection(head)
    conn.request("HEAD", tail)
    res = conn.getresponse()

This is a 302 redirect: http://www.garageband.com/mp3cat/.UZCMYiqF7Kum/01_No_pierdas_la_fuente_del_gozo.mp3

+6

python http linux unix http-headers

TIMEX Nov 17 '09 at 10:26


4 answers

Mark Pilgrim advises using httplib2 in "Dive Into Python 3" because it handles many things (including redirects) in a more reasonable way.

    >>> import httplib2
    >>> h = httplib2.Http()
    >>> response, content = h.request("http://garagaeband.com/3252243")
    >>> response["content-location"]
    "http://garageband.com/michael_jackson4.mp3"

+2

tosh Nov 17 '09 at 22:45


You should read the response, recognize that you received a 302 (Found), parse the real URL from the Location response header, and then fetch the resource using the new URI.
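As a rough sketch of those steps, assuming Python 3 (where `httplib` is named `http.client`): send a single HEAD request, check for a redirect status, and read the `Location` header. The helper names `head_once` and `redirect_target` are made up for this illustration and are not part of any library.

```python
import http.client
from urllib.parse import urlsplit, urljoin

# Status codes that carry a Location header pointing at another URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(url, status, location):
    """Return the absolute URL a redirect points to, or None if it is not a redirect."""
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the requested URL.
        return urljoin(url, location)
    return None


def head_once(url, timeout=10):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        res = conn.getresponse()
        # getheader() returns None when the header is absent.
        return res.status, res.getheader("Location")
    finally:
        conn.close()
```

Calling `status, location = head_once(url)` and then `redirect_target(url, status, location)` gives the next URL in the chain. A chain of several redirects would need this repeated in a loop.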

0

Jim garrison Nov 17 '09 at 10:31


I figured out the answer.

    import urllib2

    req = urllib2.Request('http://' + theurl)
    opener = urllib2.build_opener()
    f = opener.open(req)  # redirects are followed automatically
    print 'the real url is......' + f.url

0

TIMEX Nov 17 '09 at 23:16


Chris lacasse · Accepted Answer · 2009-11-17T22:35:47+0000

Use urllib.getUrl()

Edit: Sorry, I haven't done this for a while. The call is geturl() on the object returned by urlopen():

    import urllib
    urllib.urlopen(url).geturl()

For example:

    >>> import urllib2
    >>> f = urllib2.urlopen("http://tinyurl.com/oex2e")
    >>> f.geturl()
    'http://www.amazon.com/All-Creatures-Great-Small-Collection/dp/B00006G8FI'
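For Python 3, where `urllib2` was merged into `urllib.request`, a roughly equivalent sketch might look like the following. The function name `final_url` and its `opener` parameter are hypothetical, added only for illustration.

```python
import urllib.request


def final_url(url, opener=None, timeout=10):
    """Return the URL that `url` resolves to after redirects are followed."""
    # build_opener() installs HTTPRedirectHandler by default, so 3xx
    # responses are followed before open() returns.
    opener = opener or urllib.request.build_opener()
    # The body is never read here; the response is closed on leaving the block.
    with opener.open(url, timeout=timeout) as res:
        return res.geturl()
```

With the default opener, `final_url("http://tinyurl.com/oex2e")` would be expected to return the address the short link currently redirects to, provided that link is still live.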
