How can I remove characters between < and > using regex in C#?

Question


I have a string str = "<u>rag</u>". I want to get only the string "rag". How can I do that with a regex?

Here is my C# code, but the output is an empty string (""):

    string input = "<u>ragu</u>";
    string regex = "(\\<.*\\>)";
    string output = Regex.Replace(input, regex, "");

Thanks in advance.

ragu Apr 10 '13 at 12:13

5 answers

    // Requires: using System.Text.RegularExpressions;
    const string HTML_TAG_PATTERN = "<.*?>";
    string output = Regex.Replace(str, HTML_TAG_PATTERN, string.Empty);
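Regex.Replace returns a new string rather than modifying its input, so the result has to be stored. A minimal sketch wrapping this pattern in a reusable helper; the names TagStripper and StripTags are hypothetical, chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Lazy pattern: "<", then as few characters as possible, then ">".
    // Each tag is therefore matched (and removed) on its own.
    const string HTML_TAG_PATTERN = "<.*?>";

    public static string StripTags(string input)
    {
        // Regex.Replace does not change 'input'; it returns a new string.
        return Regex.Replace(input, HTML_TAG_PATTERN, string.Empty);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(TagStripper.StripTags("<u>ragu</u>"));       // ragu
        Console.WriteLine(TagStripper.StripTags("<b><u>rag</u></b>")); // rag
    }
}
```

Because each tag is matched separately, nested tags such as <b><u>...</u></b> are removed as well.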

cosset Apr 10 '13 at 12:16

You do not need to use regex for this.

    string input = "<u>rag</u>".Replace("<u>", "").Replace("</u>", "");
    Console.WriteLine(input); // rag
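The same no-regex idea extends to a fixed list of known tags. A sketch, assuming the tags carry no attributes; KnownTagRemover and RemoveTags are hypothetical names. Note that string.Replace is case-sensitive, so "<U>" is not removed when asking for "u":

```csharp
using System;

static class KnownTagRemover
{
    // Removes the opening and closing form of each named tag using plain
    // string.Replace calls (ordinal, case-sensitive); no regex involved.
    // Only exact, attribute-free tags such as "<u>" and "</u>" are removed.
    public static string RemoveTags(string input, params string[] tagNames)
    {
        foreach (string name in tagNames)
        {
            input = input.Replace("<" + name + ">", "")
                         .Replace("</" + name + ">", "");
        }
        return input;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(KnownTagRemover.RemoveTags("<u>rag</u>", "u"));             // rag
        Console.WriteLine(KnownTagRemover.RemoveTags("<b><u>rag</u></b>", "u", "b")); // rag
        Console.WriteLine(KnownTagRemover.RemoveTags("<U>rag</U>", "u"));             // <U>rag</U>
    }
}
```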

Soner gönül Apr 10 '13 at 12:17

Of course you can:

    string input = "<u>ragu</u>";
    // "<", an optional "/", exactly one lowercase letter, then ">"
    string regex = "(\\<[/]?[a-z]\\>)";
    string output = Regex.Replace(input, regex, "");
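The character class [a-z] matches exactly one lowercase letter, so this pattern handles <u> and </u> but not <em> or <B>. A sketch of a slightly wider variant, still assuming tags without attributes; the pattern and the SimpleTagStripper name are illustrations, not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

static class SimpleTagStripper
{
    // "<", an optional "/", a letter, then any letters or digits, then ">".
    // Tags with attributes (e.g. <a href="x">) are deliberately not matched.
    static readonly Regex SimpleTag = new Regex("</?[a-zA-Z][a-zA-Z0-9]*>");

    public static string Strip(string input)
    {
        return SimpleTag.Replace(input, string.Empty);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(SimpleTagStripper.Strip("<u>ragu</u>"));                     // ragu
        Console.WriteLine(SimpleTagStripper.Strip("<em>it</em> and <H1>title</H1>"));  // it and title
        Console.WriteLine(SimpleTagStripper.Strip("<a href=\"x\">link</a>"));          // <a href="x">link
    }
}
```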

Piotr stapp Apr 10 '13 at 12:16

Your code was almost correct, a small modification makes it work:

    string input = "<u>ragu</u>";
    string regex = @"<.*?\>"; // lazy: stops at the first ">"
    string output = Regex.Replace(input, regex, string.Empty);

The output is "ragu".

EDIT: This solution may not be the best. An interesting comment from the-land-of-devils-srilanka: do not use regular expressions to parse HTML. Indeed, see also "RegEx match open tags except XHTML self-contained tags".

L-four Apr 10 '13 at 12:18

Anirudha · Accepted Answer · 2013-04-10T12:19:56+0000

Using regex to parse HTML is not recommended.

Regexes are meant for regularly occurring patterns, and HTML is not regular in its format (except for XHTML). For example, HTML documents are valid even when some closing tags are missing! This can break your code.
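One concrete way a tag-matching regex can go wrong: a ">" is allowed inside a quoted attribute value, and the lazy pattern stops at the first ">" it sees. The input below is a made-up example, and LazyTagStripper is a hypothetical name:

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyTagStripper
{
    // Same lazy pattern as in the answers above.
    public static string Strip(string input)
    {
        return Regex.Replace(input, "<.*?>", string.Empty);
    }
}

class Program
{
    static void Main()
    {
        // Made-up input: a ">" inside a quoted attribute value is legal HTML.
        string input = "<a title=\"x > y\">rag</a>";

        // The first match ends at the ">" inside the attribute value,
        // so the rest of the opening tag is left in the output.
        Console.WriteLine(LazyTagStripper.Strip(input)); // prints ' y">rag' (leading space)
    }
}
```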

Use an HTML parser such as HtmlAgilityPack.
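HtmlAgilityPack is a separate NuGet package. For input already known to be well-formed XML (for example XHTML fragments like the one in the question), the built-in System.Xml.Linq parser is one alternative. This sketch swaps in XElement rather than HtmlAgilityPack; it throws XmlException on markup that is not well-formed, such as a missing closing tag. MarkupText, TextOf and the wrapper element <root> are hypothetical:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class MarkupText
{
    // Parses the fragment as XML and returns its concatenated text content.
    // XElement.Parse needs a single root element, so the fragment is wrapped
    // in a hypothetical <root> element first.
    public static string TextOf(string fragment)
    {
        XElement root = XElement.Parse("<root>" + fragment + "</root>");
        return root.Value;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(MarkupText.TextOf("<u>rag</u>"));                 // rag
        Console.WriteLine(MarkupText.TextOf("<a title=\"x > y\">rag</a>")); // rag

        try
        {
            MarkupText.TextOf("<p>rag"); // unclosed tag: not well-formed XML
        }
        catch (XmlException e)
        {
            Console.WriteLine("Not well-formed: " + e.Message);
        }
    }
}
```

Unlike the regex, the parser handles the ">" inside the attribute value, but it rejects ordinary HTML that is not well-formed, which is where a dedicated HTML parser is needed.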

WARNING: Do not attempt to use this in your code.

To solve the problem with a regex anyway:

<.*> matches <, then zero or more characters (here u>rag</u), up to the last >, so the whole string is replaced and the result is empty.

You should replace it with this regular expression:

 <.*?>

.* is greedy: it consumes as many characters as it can while the overall match still succeeds.

.*? is lazy: it consumes as few characters as possible.
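A small sketch comparing the two quantifiers on the question's input; QuantifierDemo, FirstMatch and RemoveAll are hypothetical helper names:

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Text of the first match of 'pattern' in 'input' ("" if none).
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    // 'input' with every match of 'pattern' removed.
    public static string RemoveAll(string input, string pattern) =>
        Regex.Replace(input, pattern, string.Empty);
}

class Program
{
    static void Main()
    {
        string input = "<u>rag</u>";

        // Greedy: one match from the first "<" to the last ">".
        Console.WriteLine(QuantifierDemo.FirstMatch(input, "<.*>"));  // <u>rag</u>
        // Lazy: the first match stops at the first ">".
        Console.WriteLine(QuantifierDemo.FirstMatch(input, "<.*?>")); // <u>

        Console.WriteLine("[" + QuantifierDemo.RemoveAll(input, "<.*>") + "]");  // []
        Console.WriteLine("[" + QuantifierDemo.RemoveAll(input, "<.*?>") + "]"); // [rag]
    }
}
```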
