How to get the true URL of a file on the Internet (Python)

I notice that sometimes audio files on the Internet have a “fake” URL.

http://garagaeband.com/3252243 

Requesting it returns a 302 redirect to the real URL:

 http://garageband.com/michael_jackson4.mp3 

My question is: when a file is served from a fake URL, how can I get the REAL URL from the headers?

This is my current code for reading the file's headers. I'm not sure whether it retrieves what I need. How do I parse the "real" URL out of the response headers?

    import httplib

    conn = httplib.HTTPConnection(head)
    conn.request("HEAD", tail)
    res = conn.getresponse()

This is a 302 redirect: http://www.garageband.com/mp3cat/.UZCMYiqF7Kum/01_No_pierdas_la_fuente_del_gozo.mp3

4 answers

Use geturl() on the response object returned by urllib.urlopen().

Edit: sorry, I haven't done this for a while:

    import urllib
    urllib.urlopen(url).geturl()

For example:

    >>> import urllib2
    >>> f = urllib2.urlopen("http://tinyurl.com/oex2e")
    >>> f.geturl()
    'http://www.amazon.com/All-Creatures-Great-Small-Collection/dp/B00006G8FI'
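
The snippets above are Python 2. As a minimal sketch for Python 3, where urllib2.urlopen lives in urllib.request, the same idea could look like the following. The helper name final_url and the timeout value are illustrative assumptions, not part of the original answer.

```python
from urllib.request import urlopen


def final_url(url, timeout=10):
    """Fetch url, following any redirects, and return the URL finally reached.

    final_url is a hypothetical helper name; urlopen follows 3xx
    redirects by default, and geturl() reports where it ended up.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

Note that urlopen issues a GET, so the server starts sending the file itself; for large audio files, a HEAD request may be the cheaper choice.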

Mark Pilgrim advises using httplib2 in "Dive Into Python 3" because it handles many things (including redirects) in a more reasonable way.

    >>> import httplib2
    >>> h = httplib2.Http()
    >>> response, content = h.request("http://garagaeband.com/3252243")
    >>> response["content-location"]
    "http://garageband.com/michael_jackson4.mp3"

You should read the response, check that its status is 302 (Found), parse the real URL from the response headers (it is in the Location header), and then fetch the resource using the new URI.
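
A minimal sketch of those steps, building on the HEAD request from the question. It is written for Python 3, where httplib is named http.client. The function name resolve_redirect, the timeout, and the set of status codes treated as redirects are assumptions made for illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects here (an assumption of this sketch).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirect(url, timeout=10):
    """Send one HEAD request; return the redirect target, or url if none."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)

    # Rebuild the request target (path plus optional query string).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    try:
        conn.request("HEAD", path)
        res = conn.getresponse()
        location = res.getheader("Location")
    finally:
        conn.close()

    if res.status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the original URL.
        return urljoin(url, location)
    return url
```

This follows only a single hop. If the target redirects again, call the function repeatedly (with a cap on the number of hops) until the URL stops changing.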


I figured out the answer.

    import urllib2

    req = urllib2.Request('http://' + theurl)
    opener = urllib2.build_opener()
    f = opener.open(req)  # the default opener follows redirects
    print 'the real url is......' + f.url