
How can I remove characters between < and > using regex in C#?

I have a string str = "<u>rag</u>". Now I want to get only the string "rag". How can I get it using regex?

My code is here.

The output I get is "" (an empty string).

Thanks in advance.

C# code:

    string input = "<u>ragu</u>";
    string regex = "(\\<.*\\>)";
    string output = Regex.Replace(input, regex, "");
+4
5 answers

Using regex to parse HTML is not recommended.

Regular expressions are meant for regularly occurring patterns, and HTML is not regular in its format (XHTML excepted). For example, an HTML file can be valid even when a closing tag is missing! This may break your code.
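To make the "may break your code" point concrete, here is a minimal sketch. The answer's own example is a missing closing tag; the case below is a different one, chosen for illustration: a `>` inside a quoted attribute value, which HTML allows. `StripTags` is a hypothetical helper name, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripFragilityDemo
{
    // Hypothetical helper: removes anything that looks like <...>
    // using the lazy pattern discussed in this thread.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<.*?>", "");
    }

    public static void Main()
    {
        // Simple markup: the tags are removed as expected.
        Console.WriteLine(StripTags("<u>rag</u>"));

        // A '>' inside a quoted attribute value ends the first match early,
        // so the rest of the tag leaks into the result.
        Console.WriteLine(StripTags("<a title=\"x > y\">link</a>"));
    }
}
```

The first call prints `rag`; the second leaves part of the attribute in the output instead of just `link`, which is the kind of surprise an HTML parser avoids.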

Use an HTML parser such as HtmlAgilityPack.


WARNING: do not attempt to use this in your code.

To solve the problem with regex anyway:

<.*> matches <, then zero or more characters (i.e. u>rag</u), up to the last >, so the whole string is matched and replaced.

You must replace it with this regular expression:

 <.*?> 

.* is greedy, that is, it consumes as many characters as it can match.

.*? is lazy, meaning it consumes as few characters as possible.
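A small sketch of the difference, using the input from the question. The class and method names (`GreedyVsLazyDemo`, `StripGreedy`, `StripLazy`) are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Greedy: .* runs on to the last '>', so a single match spans the whole string.
    public static string StripGreedy(string s)
    {
        return Regex.Replace(s, "<.*>", "");
    }

    // Lazy: .*? stops at the first '>', so each tag is matched on its own.
    public static string StripLazy(string s)
    {
        return Regex.Replace(s, "<.*?>", "");
    }

    public static void Main()
    {
        string input = "<u>ragu</u>";

        // Show what each pattern actually matches.
        Console.WriteLine(Regex.Match(input, "<.*>").Value);    // <u>ragu</u>
        foreach (Match tag in Regex.Matches(input, "<.*?>"))
        {
            Console.WriteLine(tag.Value);                       // <u>, then </u>
        }

        Console.WriteLine(StripGreedy(input).Length);           // 0 (empty string)
        Console.WriteLine(StripLazy(input));                    // ragu
    }
}
```

This is why the code in the question printed an empty string: the greedy pattern swallowed the text between the two tags along with the tags themselves.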

+4
    const string HTML_TAG_PATTERN = "<.*?>";
    // Regex.Replace returns a new string, so keep the result.
    string result = Regex.Replace(str, HTML_TAG_PATTERN, string.Empty);
+7

You do not need to use regex for this.

    string input = "<u>rag</u>".Replace("<u>", "").Replace("</u>", "");
    Console.WriteLine(input);
+1

Of course you can:

    string input = "<u>ragu</u>";
    string regex = "(\\<[/]?[a-z]\\>)";
    string output = Regex.Replace(input, regex, "");
0

Your code was almost correct; a small modification makes it work:

    string input = "<u>ragu</u>";
    string regex = @"<.*?\>";
    string output = Regex.Replace(input, regex, string.Empty);

The output is "ragu".

EDIT: This solution may not be the best. An interesting comment from the-land-of-devils-srilanka: do not use regular expressions to parse HTML. Indeed, see also "RegEx match open tags except XHTML self-contained tags".

0