I am trying to find all broken links on a webpage using Java. Here is the code:
    private static boolean isLive(String link) {
        HttpURLConnection urlconn = null;
        int res = -1;
        String msg = null;
        try {
            URL url = new URL(link);
            urlconn = (HttpURLConnection) url.openConnection();
            urlconn.setConnectTimeout(10000);
            urlconn.setRequestMethod("GET");
            urlconn.connect();
            String redirlink = urlconn.getHeaderField("Location");
            System.out.println(urlconn.getHeaderFields());
            if (redirlink != null && !url.toExternalForm().equals(redirlink))
                return isLive(redirlink);
            else
                return urlconn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return false;
        } finally {
            if (urlconn != null)
                urlconn.disconnect();
        }
    }

    public static void main(String[] s) {
        String link = "http://www.somefakesite.net";
        System.out.println(isLive(link));
    }
The code is listed at http://nscraps.com/Java/146-program-code-broken-link-checker.htm .
This code reports HTTP status 200 for every web page, including broken ones. For example, a request to http://www.somefakesite.net/ returns the following header fields:
{null=[HTTP/1.1 200 OK], Date=[Sun, May 15, 2011 18:51:29 GMT], Transfer-Encoding=[chunked], Keep-Alive=[timeout=4, max=100], Connection=[Keep-Alive], Content-Type=[text/html], Server=[Apache/2.2.15 (Win32) PHP/5.2.12], X-Powered-By=[PHP/5.2.9-1]}
Such sites do not exist, so how can I classify them as broken links?
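One possible explanation, which is an assumption and not something the headers above confirm, is that the DNS resolver answers even for names that do not exist, so the request reaches a catch-all server that replies 200. A minimal sketch to test that idea follows: it resolves the host and also a made-up name under the reserved `.invalid` domain, then compares the two addresses. The class name `DnsSanityCheck` and the helpers `resolve` and `looksNonexistent` are hypothetical names chosen for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

public class DnsSanityCheck {

    /** Returns the IP address the name resolves to, or null if it does not resolve. */
    public static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /**
     * Heuristic (an assumption, not a guarantee): treat the host as nonexistent
     * if it does not resolve at all, or if it resolves to the same address as a
     * name that cannot exist, which suggests a catch-all DNS answer.
     */
    public static boolean looksNonexistent(String hostAddress, String bogusAddress) {
        if (hostAddress == null) {
            return true;
        }
        return hostAddress.equals(bogusAddress);
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "www.somefakesite.net";
        // A random label under ".invalid" should never resolve on a well-behaved resolver.
        String bogusHost = UUID.randomUUID() + ".invalid";

        String hostAddress = resolve(host);
        String bogusAddress = resolve(bogusHost);

        System.out.println(host + " -> " + hostAddress);
        System.out.println(bogusHost + " -> " + bogusAddress);
        System.out.println("looks nonexistent: " + looksNonexistent(hostAddress, bogusAddress));
    }
}
```

If the made-up name resolves to an address, the 200 response is probably coming from whatever server sits behind that address and not from the site being checked. In that case, `isLive` could call a check like this before opening the connection. The comparison is only a heuristic: a resolver could return different catch-all addresses for different names.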
user754740