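You're confusing HTML/XML escaping with UTF-8/Unicode encoding. They are separate things: escaping is how characters such as < and & are written inside markup, whereas UTF-8 is how characters are stored as bytes.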
If the page is valid XML, life will be simpler - you can just parse it like any other XML document and then fetch the relevant text nodes. All the XML escaping will have been "unescaped" by the time you receive the text.
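As a rough illustration of that point, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The document and element names are made up for the example; the only claim is that an XML parser hands back text nodes with the escaping already undone.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the title contains escaped markup characters
# and a numeric character reference (&#233; is "é").
xml_source = "<page><title>Caf&#233; &amp; Bar &lt;open&gt;</title></page>"

# Parse it like any other XML document.
root = ET.fromstring(xml_source)

# The text node comes back already unescaped; no separate decode step is needed.
title_text = root.find("title").text
print(title_text)
```

Note that nothing here depends on whether the bytes were UTF-8 or some other encoding; the parser deals with that separately from the escaping.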
If it's arbitrary - and possibly invalid - HTML, then life is a bit more complicated. You may want to normalize it into valid HTML first, then parse it and query the text nodes as before.
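One possible way to handle that case, sketched in Python: instead of a separate normalization step, this uses the standard library's lenient html.parser.HTMLParser, which tolerates unclosed tags and converts character references in text by default. The TextCollector class and the sample markup are hypothetical, and a real page may need a proper HTML tidying library instead.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text nodes of a possibly invalid page."""

    def __init__(self):
        # convert_charrefs=True (the default) unescapes references in text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called with the unescaped text found between tags.
        self.chunks.append(data)


# Invalid HTML: neither <p> nor <b> is ever closed.
messy_html = "<p>Tom &amp; Jerry<p>R&eacute;sum&eacute; <b>tips"

collector = TextCollector()
collector.feed(messy_html)
collector.close()  # flush any text still buffered at end of input

all_text = "".join(collector.chunks)
print(all_text)
```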
If you can give us a more concrete example, it will be easier to advise you.
The HtmlDecode method suggested in other answers may well be useful to you, but you should definitely understand what's going on first. For example, you may only want to decode some fragments of the HTML - if you decode the entire document, you could end up with text which looks like HTML tags but was actually just text in the original document.
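To make that pitfall concrete, here is a small sketch in Python, where html.unescape plays roughly the role that HtmlDecode does in .NET (that equivalence is an assumption for illustration, and the sample page is made up).

```python
import html

# Hypothetical page: the paragraph *talks about* the <b> tag, it does not use it.
page = "<p>Use the &lt;b&gt; tag for bold text.</p>"

# Decoding the whole document: the escaped text now looks like a real <b> tag,
# and can no longer be told apart from the genuine <p> markup around it.
decoded_page = html.unescape(page)
print(decoded_page)

# Decoding only the text fragment, after the markup has been separated out:
# the result is plain text that merely contains the characters "<b>".
fragment = "Use the &lt;b&gt; tag for bold text."
decoded_fragment = html.unescape(fragment)
print(decoded_fragment)
```

The decode step is the same in both cases; what differs is whether you still intend to treat the result as markup afterwards.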