C# string with HTML → length without HTML

I have a string containing HTML image tags, for example:

string str = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";

I would like to get the string length without images and the number of images. So, the result should be:

int strLenght = 111;
int imagesCount= 3;

Can you show me the most efficient way please?

Thanks.

+4
5 answers

I would suggest using a real HTML parser, for example HtmlAgilityPack. Then it is simple:

string html = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";

// Requires the HtmlAgilityPack package; Count() below needs "using System.Linq;".
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
int length = doc.DocumentNode.InnerText.Length;               // 114
int imageCount = doc.DocumentNode.Descendants("img").Count(); // 3

This is what DocumentNode.InnerText returns for your sample. You missed a few spaces, which is why the length is 114 rather than 111:

There is some nice  images in this  string. I would like to ask you  how Can I can I get the Lenght of the string?
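
If you really want the 111 from the question, one option is to collapse the double spaces left behind where each tag was removed. Below is a minimal sketch of that idea; the class WhitespaceDemo and the helper CollapseWhitespace are hypothetical names of mine, not part of HtmlAgilityPack, and the input is just the InnerText shown above pasted in as a literal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Hypothetical helper: replaces every run of whitespace with a single space.
    public static string CollapseWhitespace(string text)
    {
        return Regex.Replace(text, @"\s+", " ");
    }

    public static void Main()
    {
        // The InnerText from above; note the double space where each <img> tag used to be.
        string innerText = "There is some nice  images in this  string. I would like to ask you  how Can I can I get the Lenght of the string?";

        Console.WriteLine(innerText.Length);                     // 114
        Console.WriteLine(CollapseWhitespace(innerText).Length); // 111
    }
}
```

Whether collapsing whitespace is the right thing depends on what you want to count; if the original spacing matters, keep the 114.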
+3

I had a similar problem and wrote this method. You can use it to strip the HTML tags and then take the length of the result.

public static string StripHtmlTags(string source)
{
  if (string.IsNullOrEmpty(source))
  {
    return string.Empty;
  }

  var array = new char[source.Length];
  int arrayIndex = 0;
  bool inside = false;
  for (int i = 0; i < source.Length; i++)
  {
    char let = source[i];
    if (let == '<')
    {
      inside = true;
      continue;
    }

    if (let == '>')
    {
      inside = false;
      continue;
    }

    if (!inside)
    {
      array[arrayIndex] = let;
      arrayIndex++;
    }
  }

  return new string(array, 0, arrayIndex);
}

Usage:

int strLength = StripHtmlTags(str).Length;
+2

Add a COM reference to MSHTML (Microsoft HTML Object Library) and you can do:

var doc = (IHTMLDocument2)new HTMLDocument();
doc.write(str);

Console.WriteLine("Length: {0}", doc.body.innerText.Length);
Console.WriteLine("Images: {0}", doc.images.length);
+2

If you want to do this with a regular expression, as I mentioned in my comment above, use the following code:

var regex = new System.Text.RegularExpressions.Regex("<img[^>]*/>");
var plainString = regex.Replace(str, ""); 

// plainString.Length will be the string length without images
var cnt = regex.Matches(str).Count; // cnt will be the number of images
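
Note that the pattern above only matches tags that end in "/>". As a rough sketch (my own variation, not the code above), the pattern can be loosened so that it also accepts <img ...> without the closing slash and ignores letter case. The class ImgTagDemo and its method names are hypothetical. It is still not a real parser: an attribute value containing ">" would break it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgTagDemo
{
    // Assumption: an img tag is "<img", a word boundary, then anything up to the next ">".
    private static readonly Regex ImgTag =
        new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);

    // Number of img tags found in the input.
    public static int CountImages(string html)
    {
        return ImgTag.Matches(html).Count;
    }

    // Length of the input after every img tag has been removed.
    public static int LengthWithoutImages(string html)
    {
        return ImgTag.Replace(html, "").Length;
    }

    public static void Main()
    {
        string str = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";

        Console.WriteLine(LengthWithoutImages(str)); // 114
        Console.WriteLine(CountImages(str));         // 3
    }
}
```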
+1

I liked John Smith's solution, but I had to add Trim() at the end to match the result from MS Word.

Use this:

return new string(array, 0, arrayIndex).Trim();
0