Using wget to register shell script redirected URLs

I'm trying to find a way to use wget to record a list of redirected website URLs in a single file. For instance:

www.website.com/1234 now redirected to www.newsite.com/a2as4sdf6nonsense

and

www.website.com/1235 now redirected to www.newsite.com/ab6haq7ah8nonsense

Wget displays the redirect, but does not record the new location. I get this in the terminal:

 HTTP request sent, awaiting response... 301 Moved Permanently
 Location: http://www.newsite.com/a2as4sdf6

...

I would like to just write this new URL to a file.

I used something like this:

  for i in `seq 1 9999`; do wget http://www.website.com/$i -O output.txt; done

But this writes the source code of each web page to the file. I'm trying to get only the redirect information. In addition, I would like to append a new line to the same output file every time a new URL is received.

I would like the output to look something like:

  www.website.com/1234 www.newsite.com/a2as4sdf6nonsense
  www.website.com/1235 www.newsite.com/ab6haq7ah8nonsense

...

1 answer

This is not an ideal solution, but it works:

 wget http://tinyurl.com/2tx --server-response -O /dev/null 2>&1 |\
   awk '(NR==1){SRC=$3;} /^  Location: /{DEST=$2} END{ print SRC, DEST}'

wget is not an ideal tool for this; curl would be better.
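For example, here is a minimal curl sketch for the asker's case (reusing the URL pattern from the question and a hypothetical output file name; -s silences progress output, -L follows all redirects, -o /dev/null discards the page body, and -w prints the original and final URLs):

 for i in $(seq 1 9999); do
   # %{url_effective} is the URL curl ended up at after following redirects
   curl -sL -o /dev/null \
        -w "http://www.website.com/$i %{url_effective}\n" \
        "http://www.website.com/$i" >> output.txt
 done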

Here's how it works: we fetch the URL but redirect all of the output (the page content) to /dev/null. We ask wget to print the server's HTTP response headers (to get the Location header), then pass them to awk. Note that there may be several redirects; I assumed you want the last one. Awk takes the URL you requested from the first line (NR == 1) and the destination URL from each Location header. At the end, we print both SRC and DEST, as you wanted.
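To cover the asker's full use case, the same pipeline can be wrapped in the original loop, appending one line per URL to the output file (a sketch, assuming the URL pattern and output file name from the question):

 for i in $(seq 1 9999); do
   # discard the body, keep only the request line and Location headers
   wget "http://www.website.com/$i" --server-response -O /dev/null 2>&1 |\
     awk '(NR==1){SRC=$3} /^  Location: /{DEST=$2} END{print SRC, DEST}' >> output.txt
 done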
