Removing HTML Comments

How could one remove comments from HTML files?

Some comments occupy only a single line, but I am sure I will also encounter comments that span several lines:

    <!-- Single line comment. -->

    <!-- Multi-
    ple line comment.
    Lots '""' ' " ` ~ |}{556 of !@#$%^&*()) lines
    in this comme-
    nt! -->
+8
html comments c# winforms
3 answers

You can use the Html Agility Pack, a .NET library. Here is a question on SO that explains how to use it: How to use the HTML Agility Pack

This is the C# code to remove the comments:

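    using System;
    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.Load("yourFile.htm");

    // Get all comment nodes using XPath.
    // SelectNodes returns null when nothing matches, so check before looping.
    HtmlNodeCollection comments = doc.DocumentNode.SelectNodes("//comment()");
    if (comments != null)
    {
        foreach (HtmlNode comment in comments)
        {
            comment.ParentNode.RemoveChild(comment);
        }
    }

    doc.Save(Console.Out); // writes the document without comments to the console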
+14

This function, with minor tweaks, should work:

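    private string RemoveHTMLComments(string input)
    {
        string output = string.Empty;

        // Every piece after the first starts just past a "<!--" marker.
        string[] temp = System.Text.RegularExpressions.Regex.Split(input, "<!--");
        foreach (string s in temp)
        {
            string str = string.Empty;
            if (!s.Contains("-->"))
            {
                // No closing marker in this piece, so keep it as it is.
                str = s;
            }
            else
            {
                // Drop everything up to and including the closing "-->".
                str = s.Substring(s.IndexOf("-->") + 3);
            }

            // Note: Trim() also removes whitespace around the kept text.
            if (str.Trim() != string.Empty)
            {
                output = output + str.Trim();
            }
        }

        return output;
    }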

Not sure if this is the best solution ...
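The split-and-rejoin idea above can also be written as a single regular-expression replacement. What follows is only a minimal sketch, not part of the original answer: the names `CommentStripper` and `RemoveComments` are hypothetical, and it assumes that comments are never nested and that `<!--` does not appear inside scripts or attribute values you want to keep.

```csharp
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // Lazy match from "<!--" to the nearest "-->".
    // RegexOptions.Singleline lets '.' match newlines, so comments
    // that span several lines are matched as well.
    private static readonly Regex CommentPattern =
        new Regex("<!--.*?-->", RegexOptions.Singleline);

    // Returns the input with every matched comment removed.
    public static string RemoveComments(string html)
    {
        return CommentPattern.Replace(html, string.Empty);
    }
}
```

Unlike the function above, this leaves the whitespace around removed comments untouched. For example, `CommentStripper.RemoveComments("a<!-- x\ny -->b")` returns `"ab"`.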

+4

Not the best solution out there, but a simple pass through the file, line by line. It should do the trick:

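    // Note: handles at most one comment per line; the text before and
    // after a comment ends up on separate output lines.
    List<string> output = new List<string>();
    bool flag = true; // true while we are outside a comment

    foreach (string line in System.IO.File.ReadAllLines("MyFile.html"))
    {
        int index = line.IndexOf("<!--");
        if (index >= 0) // >= 0 so a comment at the very start of a line is caught too
        {
            output.Add(line.Substring(0, index)); // keep the text before the comment
            flag = false;
        }

        if (flag)
        {
            output.Add(line);
        }

        if (line.Contains("-->"))
        {
            output.Add(line.Substring(line.IndexOf("-->") + 3)); // keep the text after the comment
            flag = true;
        }
    }

    System.IO.File.WriteAllLines("MyOutput.html", output);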
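The same line-by-line idea can be extended to cope with several comments on one line while keeping one output line per input line. This is a hedged sketch rather than the answer's own code: `LineCommentFilter` and `StripComments` are hypothetical names, and it assumes comments are not nested.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LineCommentFilter
{
    // Returns one output line per input line, with comment text removed.
    public static List<string> StripComments(IEnumerable<string> lines)
    {
        var output = new List<string>();
        bool inComment = false; // carried across lines for multi-line comments

        foreach (string line in lines)
        {
            var kept = new StringBuilder();
            int pos = 0;

            while (pos < line.Length)
            {
                if (inComment)
                {
                    int end = line.IndexOf("-->", pos, StringComparison.Ordinal);
                    if (end < 0) break;   // comment continues on the next line
                    pos = end + 3;        // skip past "-->"
                    inComment = false;
                }
                else
                {
                    int start = line.IndexOf("<!--", pos, StringComparison.Ordinal);
                    if (start < 0)
                    {
                        // The rest of the line is plain text.
                        kept.Append(line, pos, line.Length - pos);
                        break;
                    }
                    kept.Append(line, pos, start - pos); // text before the comment
                    pos = start + 4;                     // skip past "<!--"
                    inComment = true;
                }
            }

            output.Add(kept.ToString());
        }

        return output;
    }
}
```

It could be wired to the files used above with `System.IO.File.WriteAllLines("MyOutput.html", LineCommentFilter.StripComments(System.IO.File.ReadAllLines("MyFile.html")));`. Lines that consisted only of comment text come out as empty lines, so the line count of the file is preserved.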
+3