How to get a string from an HTML file?

How can I search for and retrieve a string from HTML files using C# in ASP.NET? This is the code:

 private string getHtml(string key)
 {
     StreamReader f = new StreamReader("path");
     string htmlTag = key;
     string str = f.ReadToEnd().ToString();
     Match m = Regex.Match(str, "<" + htmlTag + ">" + "(.*)" + "</" + htmlTag + ">", RegexOptions.Singleline);
     Console.WriteLine(m.Groups[0]);
     return str;
 }
2 answers

In your RegEx, try changing this:

 "(.*)" 

to this:

 "([^<]*)" 

That way, instead of matching ANY character, you match every character up to (but not including) the next less-than character (<), so the match cannot run past the closing tag.

You can also change this:

 "</" + htmlTag + ">" 

to this:

 "</ ?" + htmlTag + ">" 

This allows an optional space after the slash. (You can ignore this second suggestion if you have full control over the HTML documents and know exactly how they were written.)
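Putting both changes together, a minimal sketch might look like the following. The names `HtmlTagReader`, `GetTagText`, and `GetTagTextFromFile` are made up for illustration, and `Regex.Escape` and `RegexOptions.IgnoreCase` are my own additions, not part of the suggestions above. Note also that the method in the question returns `str` (the whole file) and prints `m.Groups[0]` (the whole match, tags included); this sketch returns capture group 1 instead, which is the text between the tags.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper class; names are illustrative only.
public static class HtmlTagReader
{
    // Returns the text between the first <tag> and </tag> pair,
    // or null when no such pair is found.
    public static string GetTagText(string html, string tag)
    {
        // Escape the tag name so it is treated literally (an added precaution).
        string name = Regex.Escape(tag);

        // [^<]* stops at the next '<'; "</ ?" tolerates one space after the slash.
        string pattern = "<" + name + ">([^<]*)</ ?" + name + ">";

        // IgnoreCase is an assumption: HTML tag names are case-insensitive.
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);

        // Groups[1] is the captured text; Groups[0] would include the tags.
        return m.Success ? m.Groups[1].Value : null;
    }

    // Reads the whole file, then searches it.
    public static string GetTagTextFromFile(string path, string tag)
    {
        return GetTagText(File.ReadAllText(path), tag);
    }
}
```

For example, `HtmlTagReader.GetTagText("<title>Hello</title>", "title")` returns `Hello`. The pattern still will not match an opening tag that carries attributes, or an element that contains nested tags.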


You can use the Html Agility Pack, available here: http://htmlagilitypack.codeplex.com/
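As a rough sketch of that approach, assuming the HtmlAgilityPack package is referenced in the project: the class and method names below are made up for illustration, and the tag name is assumed to be passed in lower case, since the XPath lookup is case-sensitive.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class; names are illustrative only.
public static class HtmlTagReaderHap
{
    // Returns the inner text of the first element named `tag`,
    // or null when the document contains no such element.
    public static string GetTagText(string html, string tag)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//tag" selects the first matching element anywhere in the document.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//" + tag);

        // SelectSingleNode returns null when nothing matches.
        return node == null ? null : node.InnerText;
    }
}
```

Unlike the regular expression, this also handles opening tags with attributes and elements that contain nested tags, because the document is parsed rather than pattern-matched.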
