What is the best way to remove <br> tags from the end of a string?

The .NET web system I'm working on allows end users to enter HTML-formatted text in some situations. In some of these places we want to keep all the tags but strip any trailing <br> tags (while leaving any breaks inside the body of the text).

What is the best way to do this? (I can think of ways to do it, but I'm sure they are not the best.)

+6
string xhtml
7 answers

As @Mitch said:

    // using System.Text.RegularExpressions;

    /// <summary>
    /// Regular expression built for C# on: Thu, Sep 25, 2008, 02:01:36 PM
    /// Using Expresso Version: 2.1.2150, http://www.ultrapico.com
    ///
    /// A description of the regular expression:
    ///
    /// Match expression but don't capture it. [\<br\s*/?\>], any number of repetitions
    ///     \<br\s*/?\>
    ///         <
    ///         br
    ///         Whitespace, any number of repetitions
    ///         /, zero or one repetitions
    ///         >
    /// End of line or string
    /// </summary>
    public static Regex regex = new Regex(
        @"(?:\<br\s*/?\>)*$",
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled
    );

    // Replace returns a new string, so keep the result.
    text = regex.Replace(text, string.Empty);
+12

A small change to bdukes' code that should be faster, since it doesn't backtrack:

    public static Regex regex = new Regex(
        @"(?:\<br[^>]*\>)*$",
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled
    );

    // Replace returns a new string, so keep the result.
    text = regex.Replace(text, string.Empty);
+4

I'm sure this isn't the best way either, but it should work as long as there are no trailing spaces or anything else after the tags.

    // Strip one literal "<br>" (4 characters) at a time until none is left at the end.
    while (myHtmlString.EndsWith("<br>"))
    {
        myHtmlString = myHtmlString.Substring(0, myHtmlString.Length - 4);
    }
+3

I am trying to ignore the ambiguity in your original question and read it literally. Here is an extension method that overloads TrimEnd to take a string.

    static class StringExtensions
    {
        // Removes a single trailing occurrence of `remove`, if present.
        public static string TrimEnd(this string s, string remove)
        {
            if (s.EndsWith(remove))
            {
                return s.Substring(0, s.Length - remove.Length);
            }
            return s;
        }
    }

Here are some tests showing that it works:

    // using System.Diagnostics;
    Debug.Assert("abc".TrimEnd("<br>") == "abc");
    Debug.Assert("abc<br>".TrimEnd("<br>") == "abc");
    Debug.Assert("<br>abc".TrimEnd("<br>") == "<br>abc");

I want to point out that this solution is easier to read than regex, perhaps faster than regex (you should use a profiler rather than speculation if you're concerned about performance), and is useful for removing other things from line ends.

A regex becomes more appropriate if your problem is more general than what you stated (for example, if you want to remove <BR> and </BR>, deal with trailing spaces, or something else).
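The extension method above removes at most one trailing occurrence. If a run of repeated trailing tags has to go, one possible variant loops until the suffix no longer matches. This is only a sketch: the name `TrimEndAll` and the choice of a case-insensitive ordinal comparison are assumptions made for illustration, not part of the original answer.

```csharp
using System;

public static class StringTrimExtensions
{
    // Hypothetical variant: removes every trailing occurrence of `remove`,
    // comparing ordinally and ignoring case.
    public static string TrimEndAll(this string s, string remove)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(remove))
        {
            return s;
        }

        // Walk an index backwards instead of allocating a new string per pass.
        int end = s.Length;
        while (end >= remove.Length &&
               string.Compare(s, end - remove.Length, remove, 0, remove.Length,
                              StringComparison.OrdinalIgnoreCase) == 0)
        {
            end -= remove.Length;
        }
        return s.Substring(0, end);
    }
}
```

For example, `"abc<br><BR>".TrimEndAll("<br>")` would strip both tags, while a `<br>` inside the body of the text is left alone. It still does not handle whitespace or `<br />`, which is where a regex starts to pay off.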

+3

You can use a regular expression to find and remove the text, with the match anchored at the end of the string.
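As a rough sketch of that idea, the pattern below is anchored with `\z` (the very end of the string) and also tolerates whitespace around the trailing breaks. The class and method names (`TrailingBreaks`, `Strip`) are made up for illustration, and dropping the surrounding whitespace is an assumption that goes beyond what the question asked for.

```csharp
using System.Text.RegularExpressions;

public static class TrailingBreaks
{
    // Hypothetical helper: matches a trailing run of <br>, <br/> or <br /> tags,
    // plus any whitespace around them, anchored at the very end with \z.
    // Tags with attributes (e.g. <br clear="all">) are not matched.
    private static readonly Regex Trailing = new Regex(
        @"(?:\s*<br\s*/?>)+\s*\z",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string Strip(string html)
    {
        // Replace returns a new string; the input is left unchanged.
        return Trailing.Replace(html, string.Empty);
    }
}
```

Because the pattern requires at least one `<br>`, text that merely ends in whitespace is returned as-is, and breaks inside the body of the text are never touched.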

+2

You can also try (if the markup is likely to be a valid tree) something similar to:

    // using System; using System.Xml;
    string s = "<markup><div>Text</div><br /><br /></markup>";

    XmlDocument doc = new XmlDocument();
    doc.LoadXml(s);
    Console.WriteLine(doc.InnerXml);

    XmlElement markup = doc["markup"];
    int childCount = markup.ChildNodes.Count;

    // Walk the children from the end and drop <br> nodes until something else is hit.
    for (int i = childCount - 1; i >= 0; i--)
    {
        if (markup.ChildNodes[i].Name.ToLower() == "br")
        {
            markup.RemoveChild(markup.ChildNodes[i]);
        }
        else
        {
            break;
        }
    }

    Console.WriteLine("---");
    Console.WriteLine(markup.InnerXml);
    Console.ReadKey();

The above code is a bit "scratch", but if you cut and paste it into a console application and run it, it works. =)

+1

You can use a regex, or check whether the string ends with a break and remove it.

0
