
Getting a meta tag attribute with XPath using the HTML Agility Pack

 META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
 TITLE>Microsoft Corporation</TITLE>
 META http-equiv="PICS-Label" content="(PICS-1.1 &quot;http://www.rsac.org/ratingsv01.html&quot; l gen true r (n 0 s 0 v 0 l 0))" />
 META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" />
 META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." />
 META NAME="MS.LOCALE" CONTENT="EN-US" />
 META NAME="CATEGORY" CONTENT="home page" />

I would like to know what XPath expression I need to get the value of the CONTENT attribute of the CATEGORY meta tag using the HTML Agility Pack. (I deleted the first < of each line in the HTML above so that it would display when posted.)

+4
4 answers

For a long time, HtmlAgilityPack could not return an attribute value directly from an XPath query. You had to iterate over the list of meta nodes. Here is one way:

    var doc = new HtmlDocument();
    doc.LoadHtml(htmlString);
    var list = doc.DocumentNode.SelectNodes("//meta");
    if (list != null) // SelectNodes returns null when nothing matches
    {
        foreach (var node in list)
        {
            string content = node.GetAttributeValue("content", "");
        }
    }

However, there seems to be an experimental release whose XPath support lets you select attributes directly.

    doc.DocumentNode.SelectNodes("//meta/@content")

will return a list of HtmlAttribute objects.

+12

Thanks for the quick reply, Rohit Agarwal (I saw that you answered only a few hours after I asked, but I could not test it until today).

I originally implemented your suggestion as follows (it's in VB.NET):

    Dim result As String = webClient.DownloadString(url)
    Dim doc As New HtmlDocument()
    doc.LoadHtml(result)

    Dim list = doc.DocumentNode.SelectNodes("//meta")
    For Each node As HtmlNode In list
        Dim metaname As String = node.GetAttributeValue("name", String.Empty)
        If metaname <> String.Empty Then
            If metaname = "title" Then
                title = node.GetAttributeValue("content", String.Empty)
                ' more ElseIf branches
            End If
        End If
    Next node

However, I found that //meta[@name='title'] gives me the same result:

    Dim result As String = webClient.DownloadString(url)
    Dim doc As New HtmlDocument()
    doc.LoadHtml(result)

    title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)

Thanks for putting me on the right track =D

+2

If you want to display the title, description, and keywords meta tags, use:

    // metaTags is assumed to be the result of doc.DocumentNode.SelectNodes("//meta")
    if (metaTags != null)
    {
        foreach (var tag in metaTags)
        {
            if ((tag.Attributes["name"] != null) && (tag.Attributes["content"] != null))
            {
                Panel divPage = new Panel();
                divPage.InnerHtml = divPage.InnerHtml + "<br /> " + "<b> Page " + tag.Attributes["name"].Value + " </b>: " + tag.Attributes["content"].Value + "<br />";
            }
        }
    }

If you also want to read the og: tags from the page, add this code inside the loop after that:

    if ((tag.Attributes["property"] != null) && (tag.Attributes["content"] != null))
    {
        if (tag.Attributes["property"].Value == "og:image")
        {
            img.ImageUrl = tag.Attributes["content"].Value;
        }
    }

This has worked great for me ... I like this code :)

+2

Without error checking:

    doc.DocumentNode.SelectSingleNode("//meta[@name='description']").Attributes["content"].Value;

Of course, if the node is null or the content attribute is missing, this will throw an exception.
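
For what it's worth, a null-safe variant might look something like the sketch below. The `MetaReader` class and `GetMetaContent` method are made-up names for illustration and are not part of HtmlAgilityPack; only `HtmlDocument`, `LoadHtml`, `SelectSingleNode`, and `GetAttributeValue` are library calls. It assumes the meta name contains no single quote, because the name is spliced straight into the XPath string.

```csharp
using System;
using HtmlAgilityPack;

static class MetaReader
{
    // Hypothetical helper (not from the thread): returns the content attribute
    // of the first <meta> whose name attribute equals metaName, or the fallback.
    public static string GetMetaContent(string html, string metaName, string fallback = "")
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        // Assumption: metaName contains no single quote.
        var node = doc.DocumentNode.SelectSingleNode("//meta[@name='" + metaName + "']");

        // GetAttributeValue returns the fallback when the attribute is absent.
        return node == null ? fallback : node.GetAttributeValue("content", fallback);
    }
}
```

You would call it as `MetaReader.GetMetaContent(htmlString, "CATEGORY")`. Note that the comparison against the attribute value in the XPath is case-sensitive, so for the HTML in the question the name has to be passed as "CATEGORY", exactly as it appears in the page.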

0

Source: https://habr.com/ru/post/1315452/

