Script to get HTTP status code of URL list?

I have a list of URLs that I need to check to see if they still work or not. I would like to write a bash script that does this for me.

I only need the returned HTTP status code, i.e. 200, 404, 500 and so on. Nothing more.

EDIT: Note that there is a problem if the page says "404 Not Found" but returns a 200 OK response. That is a misconfigured web server, but you may need to consider this case.

For more information, see the section "Verifying whether the URL has been redirected to a page containing the text '404'".
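For example, a minimal sketch of catching such a "soft 404" could look like this (the URL and the "404 not found" marker text are only placeholders, and the body check costs a second request):

 #!/bin/bash
 # Sketch only: flag pages that return 200 OK but whose body looks like a 404 page.
 # The URL and the "404 not found" marker text are placeholders.
 url="http://example.com/some/page"
 status=$(curl -o /dev/null --silent --write-out '%{http_code}' "$url")
 body=$(curl --silent "$url")
 if [ "$status" = "200" ] && printf '%s' "$body" | grep -qi "404 not found"; then
   echo "$url: 200, but the body looks like a 404 page"
 else
   echo "$url: $status"
 fi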

+82
May 26 '11 at 8:58 a.m.
7 answers

Curl has a special option, --write-out , for this:

 $ curl -o /dev/null --silent --head --write-out '%{http_code}\n' <url>
 200
  • -o /dev/null discards the normal output
  • --silent suppresses the progress meter
  • --head makes an HTTP HEAD request instead of GET
  • --write-out '%{http_code}\n' prints the required status code

To wrap this in a complete Bash script:

 #!/bin/bash
 while read LINE; do
   curl -o /dev/null --silent --head --write-out "%{http_code} $LINE\n" "$LINE"
 done < url-list.txt

(Attentive readers will notice that this uses one curl process per URL, which imposes fork and TCP connection setup penalties. It would be faster if several URLs were combined into a single curl call, but there is no space here to write out the monstrous repetition of options that curl requires for that.)
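Purely as an illustration, a sketch of what that repetition might look like for just two placeholder URLs: curl prints --write-out once per transfer, but -o /dev/null has to be repeated for every URL.

 # Two URLs in one curl process; -o /dev/null must be given once per URL.
 curl --silent --head --write-out '%{url_effective} %{http_code}\n' \
      -o /dev/null 'http://example.com/' \
      -o /dev/null 'http://example.org/'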

+183
May 26 '11 at 10:07
 wget --spider -S "http://url/to/be/checked" 2>&1 | grep "HTTP/" | awk '{print $2}' 

prints only the status code for you
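If it helps, a sketch of the same one-liner looped over a file of URLs (url-list.txt is just an example name; tail -n 1 is added because a redirect makes wget print more than one status line):

 #!/bin/bash
 # Print "URL status" for every URL in the list, using the wget one-liner above.
 while read -r url; do
   code=$(wget --spider -S "$url" 2>&1 | grep "HTTP/" | awk '{print $2}' | tail -n 1)
   echo "$url $code"
 done < url-list.txt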

+35
Feb 25 '12 at 10:40

Extending the answer already provided by Phil: adding parallelism to it is trivial in bash if you use xargs for the call.

Here is the code:

 xargs -n1 -P 10 curl -o /dev/null --silent --head --write-out '%{url_effective}: %{http_code}\n' < url.lst 

-n1: use only one value (from the list) as an argument to call curl

-P10: keep 10 curl processes running at any time (i.e. 10 parallel connections)

Check the --write-out option in the curl manual for more data you can extract with it (timings, etc.).

In case this helps someone, this is the call I'm using now:

 xargs -n1 -P 10 curl -o /dev/null --silent --head --write-out '%{url_effective};%{http_code};%{time_total};%{time_namelookup};%{time_connect};%{size_download};%{speed_download}\n' < url.lst | tee results.csv 

It simply dumps a bunch of data into a semicolon-separated CSV file that can be imported into any office tool.
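Since the question is about finding dead links, the resulting file can then be filtered for anything that did not come back as 200; a sketch assuming the results.csv layout produced above (the status code is the second semicolon-separated field):

 # Print every row whose second field (the HTTP status code) is not 200.
 awk -F';' '$2 != 200' results.csv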

+27
Mar 13 '14 at 13:20

This relies on wget, which is available almost everywhere, even on Alpine Linux.

 wget --server-response --spider --quiet "${url}" 2>&1 | awk 'NR==1{print $2}' 

The following are explanations:

--quiet

Disable Wget output.

Source - wget man pages

--spider

[...] it will not download the pages, just check that they are there. [...]

Source - wget man pages

--server-response

Print the headers sent by the HTTP servers and the responses sent by the FTP servers.

Source - wget man pages

What they don't say about --server-response is that these headers are printed to standard error (stderr), so you need to redirect them to standard output.

With the headers redirected to standard output, we can pipe them to awk to extract the HTTP status code. That code is:

  • the second ($2) non-blank group of characters: {$2}
  • in the very first line of the header: NR==1

And because we want to print this ... {print $2} .

 wget --server-response --spider --quiet "${url}" 2>&1 | awk 'NR==1{print $2}' 
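One caveat: if wget follows a redirect, --server-response prints a header block for every response, and NR==1 then gives you the first code (for example 301) rather than the final one. A sketch that keeps the last status line instead, under the same assumptions as above:

 # Remember the code from every "HTTP/..." status line and print the last one seen.
 wget --server-response --spider --quiet "${url}" 2>&1 | awk '$1 ~ /^HTTP\//{code=$2} END{print code}'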
+11
Nov 18 '18 at 5:25

Use curl to get only the HTTP header (not the whole file) and parse it:

 $ curl -I --stderr /dev/null http://www.google.co.uk/index.html | head -1 | cut -d' ' -f2
 200
+7
May 26 '11 at 9:25

wget -S -i file will give you the headers from each URL in the file.

Then filter the output through grep to get just the status lines.
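For example, something along these lines (the file name is a placeholder, and --spider is an addition so that nothing is actually downloaded):

 # Headers are printed on stderr, hence the 2>&1; keep only the "HTTP/... <code>" lines.
 wget -S --spider -i url-list.txt 2>&1 | grep "HTTP/"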

+4
May 26 '11 at 9:10

Due to https://mywiki.wooledge.org/BashPitfalls#Non-atomic_writes_with_xargs_-P (output from parallel jobs in xargs risks being mixed), I would use GNU Parallel instead of xargs to parallelize:

 cat url.lst | parallel -P0 -q curl -o /dev/null --silent --head --write-out '%{url_effective}: %{http_code}\n' > outfile 

In this particular case it may be safe to use xargs because the output is so short; the problem with using xargs is rather that if someone later changes the code to do something bigger, it will no longer be safe. Or if someone reads this question and thinks they can replace curl with something else, that may also not be safe.

0
Sep 07 '19 at 6:36


