How can I remove characters between < and > using regex in C#?
Using regex to parse HTML is not recommended.
Regular expressions are meant for regular patterns, and HTML's format is not regular (XHTML is stricter, since every tag must be closed). For example, an HTML file can be valid even when some closing tags are omitted, and that can break your code.
Use an HTML parser such as HtmlAgilityPack instead.
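HtmlAgilityPack is a third-party NuGet package. As a dependency-free sketch of the parser approach, assuming the input is well-formed XHTML (every tag closed), the built-in System.Xml.Linq parser can extract the text instead. This is a swapped-in technique, not HtmlAgilityPack's API, and it throws on ordinary HTML that is not well-formed, such as a missing closing tag. The XhtmlText class and ExtractText method are hypothetical names.

```csharp
using System.Xml.Linq;

// Hypothetical helper: extracts the text of a well-formed XHTML fragment.
// Assumption: the fragment is well-formed XML; otherwise XElement.Parse
// throws System.Xml.XmlException.
public static class XhtmlText
{
    public static string ExtractText(string fragment)
    {
        // Wrap in a root element so fragments with several top-level nodes still parse.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");
        // XElement.Value concatenates the text of all descendant text nodes.
        return root.Value;
    }
}
```

For example, ExtractText("<u>ragu</u>") returns "ragu". Because a real parser understands quoted attributes, a '>' inside an attribute value does not confuse it, while an unclosed tag such as "<p>unclosed" raises an XmlException.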
WARNING: do not use the following approach in real code.
That said, here is how to solve the problem with regex:
<.*> matches <, then zero or more characters (here u>ragu</u), up to the last >, so it removes the whole string.
Replace it with this regular expression: <.*?>
.* is greedy: it consumes as many characters as it can while still matching.
.*? is lazy: it consumes as few characters as possible.
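A minimal sketch contrasting the two patterns on the input from the question. The TagPatterns class and its method names are hypothetical; only Regex.Replace and Regex.Match come from .NET.

```csharp
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Greedy: .* runs to the LAST '>' in the input,
    // so "<u>ragu</u>" is matched as a single piece and removed entirely.
    public static string StripGreedy(string input) =>
        Regex.Replace(input, "<.*>", string.Empty);

    // Lazy: .*? stops at the FIRST '>' after each '<',
    // so "<u>" and "</u>" are matched separately and the text survives.
    public static string StripLazy(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);

    // The text of the first match, to show where each pattern stops.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;
}
```

With input "<u>ragu</u>", StripGreedy returns an empty string and StripLazy returns "ragu". FirstMatch shows why: the greedy pattern's first match is the whole string, while the lazy pattern's first match is just "<u>".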
Your code was almost correct; a small modification makes it work:

    using System.Text.RegularExpressions;

    string input = "<u>ragu</u>";
    string regex = @"<.*?>";
    string output = Regex.Replace(input, regex, string.Empty);

The output is "ragu".
EDIT: This solution may not be the best. As the-land-of-devils-srilanka pointed out in a comment, do not use regular expressions to parse HTML. See also "RegEx match open tags except XHTML self-contained tags".
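To illustrate why even the lazy pattern is fragile, here is a sketch that feeds it valid HTML whose quoted attribute value contains '>'. The input string and the FragileDemo class are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class FragileDemo
{
    // The same lazy pattern as in the answer: each match ends at the first '>'.
    public static string StripLazy(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);

    public static string Example()
    {
        // '>' is allowed inside a quoted attribute value, so this is valid HTML.
        string html = "<a title=\"1 > 0\">x</a>";
        // The first match ends at the '>' inside the attribute value,
        // so the rest of the opening tag leaks into the output.
        return StripLazy(html);
    }
}
```

Example() returns the text  0">x (with a leading space) rather than just x, because the regex has no notion of quoted attributes. An HTML parser does not have this problem.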