How to check a URL in C# (404 error)

I need to write a tool in C# that reports broken URLs. A URL should only be reported as broken if the user would see a 404 error in the browser. I believe there may be tricks needed to handle web servers that rewrite URLs. Here is what I have; as you can see, some URLs are checked incorrectly.

string url = ""; // TEST CASES //url = "http://newsroom.lds.org/ldsnewsroom/eng/news-releases-stories/local-churches-teach-how-to-plan-for-disasters"; //Prints "BROKEN", although this is getting re-written to good url below. //url = "http://beta-newsroom.lds.org/article/local-churches-teach-how-to-plan-for-disasters"; // Prints "GOOD" //url = "http://"; //Prints "BROKEN" //url = "google.com"; //Prints "BROKEN" althought this should be good. //url = "www.google.com"; //Prints "BROKEN" althought this should be good. //url = "http://www.google.com"; //Prints "GOOD" try { if (url != "") { WebRequest Irequest = WebRequest.Create(url); WebResponse Iresponse = Irequest.GetResponse(); if (Iresponse != null) { _txbl.Text = "GOOD"; } } } catch (Exception ex) { _txbl.Text = "BROKEN"; } 
+4
4 answers

Firstly, Irequest and Iresponse should not be named that way. They should be webRequest and webResponse, or even just request and response. A capital "I" prefix is conventionally reserved for interface names, not variable names.

To validate the URL, use UriBuilder to get a Uri. Then use HttpWebRequest and HttpWebResponse so that you can check the strongly typed status code on the response. Finally, you should be a little more informative about what was broken.

For reference, UriBuilder, HttpWebRequest, and HttpWebResponse are all documented in the .NET Framework reference on MSDN.

Example:

    // requires: using System.Net;
    try
    {
        if (!string.IsNullOrEmpty(url))
        {
            UriBuilder uriBuilder = new UriBuilder(url);
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uriBuilder.Uri);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            if (response.StatusCode == HttpStatusCode.NotFound)
            {
                _txbl.Text = "Broken - 404 Not Found";
            }
            else if (response.StatusCode == HttpStatusCode.OK)
            {
                _txbl.Text = "URL appears to be good.";
            }
            else // There are a lot of other status codes you could check for...
            {
                _txbl.Text = string.Format("URL might be ok. Status: {0}.", response.StatusCode);
            }
        }
    }
    catch (Exception ex)
    {
        _txbl.Text = string.Format("Broken - Other error: {0}", ex.Message);
    }
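One caveat worth adding (my note, not part of the answer above): HttpWebRequest.GetResponse() throws a WebException for 4xx/5xx responses rather than returning them, so in practice the NotFound branch is only reached by catching that exception and inspecting its Response. A hedged sketch:

    // Sketch only: the 404 check usually has to happen in the catch block.
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            _txbl.Text = "URL appears to be good. Status: " + response.StatusCode;
        }
    }
    catch (WebException we)
    {
        // For protocol errors the server's response is attached to the exception.
        HttpWebResponse errorResponse = we.Response as HttpWebResponse;
        if (errorResponse != null && errorResponse.StatusCode == HttpStatusCode.NotFound)
        {
            _txbl.Text = "Broken - 404 Not Found";
        }
        else
        {
            _txbl.Text = "Broken - " + we.Status;
        }
    }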
+6

Prepend http:// or https:// to the URL and pass it to the WebClient.OpenRead method. It will throw a WebException if the URL is broken.

    // requires: using System.IO; and using System.Net;
    private WebClient webClient = new WebClient();

    try
    {
        Stream strm = webClient.OpenRead(URL);
    }
    catch (WebException)
    {
        throw; // rethrow, preserving the original stack trace
    }
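For completeness, a small usage sketch (my addition, assuming _txbl is the TextBlock from the question): report the result instead of rethrowing, and inspect the WebException for the 404 case the question asks about.

    using (var webClient = new WebClient())
    {
        try
        {
            using (Stream strm = webClient.OpenRead(url))
            {
                _txbl.Text = "GOOD";
            }
        }
        catch (WebException we)
        {
            // we.Response can be cast to HttpWebResponse to check for a 404, as shown earlier.
            _txbl.Text = "BROKEN - " + we.Status;
        }
    }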
0

The problem is that most of the "should be good" cases are actually handled at the browser level, I believe. If you omit "http://" it is an invalid request, but the browser adds it for you.

So perhaps you could do a check similar to what a browser does (a rough sketch follows the list below):

  • Make sure "http://" is at the beginning
  • Make sure "www." is at the beginning of the host
-1

Use RegEx ...

    // requires: using System.Text.RegularExpressions;
    public static bool IsUrl(string Url)
    {
        string strRegex = "^(https?://)"
            + "?(([0-9a-z_!~*'().&=+$%-]+: )?[0-9a-z_!~*'().&=+$%-]+@)?" // user@
            + @"(([0-9]{1,3}\.){3}[0-9]{1,3}"                            // IP - 199.194.52.184
            + "|"                                                        // allows either IP or domain
            + @"([0-9a-z_!~*'()-]+\.)*"                                  // tertiary domain(s) - www.
            + @"([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\."                    // second level domain
            + "[a-z]{2,6})"                                              // first level domain - .com or .museum
            + "(:[0-9]{1,4})?"                                           // port number - :80
            + "((/?)|"                                                   // a slash isn't required if there is no file name
            + "(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
        Regex re = new Regex(strRegex);
        return re.IsMatch(Url);
    }

Taken from here: http://www.osix.net/modules/article/?id=586

There are many other URL regular expressions out there if you search around.
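A brief usage sketch (my addition): the regex only validates the URL's format, so by itself it cannot detect a 404; combine it with one of the HTTP checks from the earlier answers.

    // Format check first; an actual request is still needed to detect a 404.
    string url = "http://www.google.com";
    if (IsUrl(url))
    {
        // Looks well-formed - now issue an HTTP request (see earlier answers) to test for 404.
    }
    else
    {
        _txbl.Text = "BROKEN - malformed URL";
    }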

-1
