Determining the redirect target of a URL in Python

I wrote a small parser using HTMLParser, and I would like to find out where a link redirects to. It is easiest to explain with an example:

My page contains a link, http://www.myweb.com?out=147, which redirects to http://www.mylink.com. I can easily parse http://www.myweb.com?out=147 out of the page, but I don't know how to get http://www.mylink.com.

+6
redirect python parsing
2 answers

You can use urllib2 (urllib.request in Python 3) and its HTTPRedirectHandler to find out where the URL redirects to. Here is a function that does this:

    import urllib2

    def get_redirected_url(url):
        opener = urllib2.build_opener(urllib2.HTTPRedirectHandler)
        request = opener.open(url)
        return request.url

    print get_redirected_url("http://google.com/")  # prints "http://www.google.com/"
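For Python 3, where urllib2 became urllib.request, a rough equivalent might look like the sketch below. It relies on urlopen following redirects by default, so the final URL of the response is the redirect target. The timeout parameter is an assumption added here and is not part of the function above.

```python
import urllib.request


def get_redirected_url(url, timeout=10):
    """Return the final URL reached after following any HTTP redirects.

    The timeout argument is an illustrative addition, not part of the
    original Python 2 function.
    """
    # urlopen installs HTTPRedirectHandler by default, so 3xx responses
    # are followed automatically before the response object is returned.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

Calling get_redirected_url("http://www.myweb.com?out=147") should then return the address the link ends up at, provided the server answers with an HTTP redirect rather than a JavaScript or meta-refresh one.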
+11

You cannot get the redirect URL by parsing the HTML source. HTTP redirects are initiated by the server, NOT by the client. You need to make an HTTP request to the URL in question and inspect the server's HTTP response: in particular, look for a 3xx redirect status code (301, 302, 303, 307 or 308; note that 304 means "Not Modified" and is not a redirect) and read the new URL from the Location header.
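As a minimal sketch of that check in Python 3, the function below sends a single request with http.client, which does not follow redirects on its own, and reports the status code together with the Location header. The names get_redirect_target and REDIRECT_CODES are illustrative and not from any library.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes that redirect the client to the URL in the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def get_redirect_target(url, timeout=10):
    """Send one GET request and return (status, target_url).

    target_url is None when the response is not a redirect.
    Hypothetical helper written for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target; an empty path must be sent as "/".
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        location = response.getheader("Location")
        if response.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the request URL.
            return response.status, urljoin(url, location)
        return response.status, None
    finally:
        conn.close()
```

This only looks at the first hop. If the target itself redirects again, the function would have to be called repeatedly until it returns None for the target.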

+3