Sort nodes by most images?

This may seem a little complicated, but I want to find all the <a> elements that contain an <img>, ordered so that the images sharing a node with the most other images are selected first.

For example, if my page looks like this:

http://img684.imageshack.us/img684/5678/imagechart.gif

If the blue squares are <div> elements and the pink squares are <img> elements, then the middle div contains the most images, so those images are selected first. Since they are not nested any deeper, they are simply taken in the order in which they appear on the page. Then the first div is selected (it contains the second-most images), and so on. Does that make sense?

We can think of it recursively. First, body is selected, since it will always contain the most images. Then each of its direct children is examined to see which one contains the most descendant images (not necessarily direct children), and we go into that node and repeat ...

3 answers

Current solution:

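    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    // SelectNodes returns null when nothing matches, so treat null as zero.
    private static int Count(HtmlNodeCollection nc)
    {
        return nc == null ? 0 : nc.Count;
    }

    private static void BuildList(HtmlNode node, ref List<HtmlNode> list)
    {
        // Visit the children with the most image links below them first.
        var sortedNodes = from n in node.ChildNodes
                          orderby Count(n.SelectNodes(".//a[@href and img]")) descending
                          select n;

        foreach (var n in sortedNodes)
        {
            // Note: this adds every <a>, including ones without an <img>.
            if (n.Name == "a") list.Add(n);
            else if (n.HasChildNodes) BuildList(n, ref list);
        }
    }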

Usage example:

    private static void ProcessDocument(HtmlDocument doc, Uri baseUri)
    {
        var linkNodes = new List<HtmlNode>(100);
        BuildList(doc.DocumentNode, ref linkNodes);
        // ...
    }

This is a little inefficient, because it recounts the same descendants many times, but it works well enough.
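One way to avoid the repeated counting is to count once per node and then reuse the stored totals while sorting. The sketch below is only an illustration of that idea: it uses LINQ to XML (System.Xml.Linq) instead of HtmlAgilityPack so that it runs without extra packages, it assumes the input is well-formed XML, and the names ImageLinkOrder, CountBelow and OrderedImageLinks are made up for this example. Unlike the code above, it collects only <a> elements that have an href and an <img> child.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ImageLinkOrder
{
    // Mirrors the XPath test a[@href and img]: an <a> with href and an <img> child.
    static bool IsImageLink(XElement e) =>
        e.Name.LocalName == "a" && e.Attribute("href") != null && e.Elements("img").Any();

    // Stores, for every element, how many image links lie strictly below it.
    // Each element is visited exactly once.
    static int CountBelow(XElement node, Dictionary<XElement, int> below)
    {
        int total = 0;
        foreach (var child in node.Elements())
            total += CountBelow(child, below) + (IsImageLink(child) ? 1 : 0);
        below[node] = total;
        return total;
    }

    static void BuildList(XElement node, Dictionary<XElement, int> below, List<XElement> list)
    {
        // OrderByDescending is a stable sort, so ties keep document order.
        foreach (var child in node.Elements().OrderByDescending(c => below[c]))
        {
            if (IsImageLink(child)) list.Add(child);
            else if (below[child] > 0) BuildList(child, below, list);
        }
    }

    public static List<XElement> OrderedImageLinks(XElement root)
    {
        var below = new Dictionary<XElement, int>();
        CountBelow(root, below);
        var list = new List<XElement>();
        BuildList(root, below, list);
        return list;
    }

    static void Main()
    {
        // Hypothetical sample page: the second div holds more image links.
        var body = XElement.Parse(@"<body>
  <div><a href='a1'><img/></a></div>
  <div><a href='b1'><img/></a><a href='b2'><img/></a></div>
</body>");

        foreach (var a in OrderedImageLinks(body))
            Console.WriteLine((string)a.Attribute("href"));
        // prints b1, b2, a1 (one per line)
    }
}
```

The dictionary is keyed by element reference, so the counting pass is linear in the number of elements, and the sort at each level only looks up stored values.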


You can try counting the images under each node and keeping the node with the highest count.

    using System.Xml;

    public static XmlNode FindNodeWithMostImages(XmlNodeList nodes)
    {
        var greatestImageCount = 0;
        XmlNode nodeWithMostImages = null;

        foreach (XmlNode node in nodes)
        {
            // .//img counts every descendant <img>; the earlier "*/child::img"
            // only matched <img> elements exactly two levels down.
            var currentNodeImageCount = node.SelectNodes(".//img").Count;

            if (currentNodeImageCount > greatestImageCount)
            {
                greatestImageCount = currentNodeImageCount;
                nodeWithMostImages = node;
            }
        }

        // null when no node contains any image
        return nodeWithMostImages;
    }

XPath 1.0 does not provide a way to sort node sets. You will need to combine XPath with something else.

Here is an example XSLT solution that selects all elements containing descendant <img> elements and sorts them by their count of descendant <img> elements, in descending order.

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <!--if only want <a>, then select //a[descendant::img] -->
        <xsl:for-each select="//*[descendant::img]">
          <xsl:sort select="count(descendant::img)" order="descending" />
          <!--Example output to demonstrate what elements have been selected-->
          <xsl:value-of select="name()"/>
          <xsl:text> has </xsl:text>
          <xsl:value-of select="count(.//img)" />
          <xsl:text> descendant images </xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

It was not clear to me from your question and examples whether you want to find any element with a descendant <img> or just <a> elements with a descendant <img>.

If you just want to find <a> elements with descendant <img> elements, then change the XPath in the for-each to select: //a[descendant::img]
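Since the question is about C#, the same "XPath to select, something else to sort" combination can also be sketched with LINQ. This is only an illustration under a few assumptions: it uses LINQ to XML with the XPathSelectElements extension from System.Xml.XPath instead of HtmlAgilityPack, it expects well-formed XML without namespaces, and the names SortByImageCount and Describe are invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class SortByImageCount
{
    // XPath picks the candidates; LINQ does the sorting XPath 1.0 cannot do.
    public static string[] Describe(XDocument doc) =>
        doc.XPathSelectElements("//*[descendant::img]")
           .Select(e => new { Name = e.Name.LocalName, Images = e.Descendants("img").Count() })
           .OrderByDescending(x => x.Images)
           .Select(x => $"{x.Name} has {x.Images} descendant images")
           .ToArray();

    static void Main()
    {
        // Hypothetical sample document.
        var doc = XDocument.Parse(
            "<body><div><img/></div><div><img/><img/></div></body>");

        foreach (var line in Describe(doc))
            Console.WriteLine(line);
        // body has 3 descendant images
        // div has 2 descendant images
        // div has 1 descendant images
    }
}
```

To restrict the result to links, the XPath string can be swapped for //a[descendant::img], as in the XSLT version.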
