Getting a meta tag attribute with the HTML Agility Pack using XPath
META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
TITLE>Microsoft Corporation</TITLE>
META http-equiv="PICS-Label" content='(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))' />
META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what new; solutions; services; software; contests; corporate news;" />
META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft Web site. Find software, solutions, answers, support, and Microsoft news." />
META NAME="MS.LOCALE" CONTENT="EN-US" />
META NAME="CATEGORY" CONTENT="home page" />
I would like to know what XPath I need to get the value of the content attribute of the CATEGORY meta tag using the HTML Agility Pack. (I removed the leading < from each line of the HTML code so that it would display in the post.)
For a long time, HtmlAgilityPack was not able to query an attribute value directly. You had to iterate over the list of meta nodes. Here is one way:
var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
var list = doc.DocumentNode.SelectNodes("//meta");
if (list != null) // SelectNodes returns null when nothing matches
{
    foreach (var node in list)
    {
        string content = node.GetAttributeValue("content", "");
    }
}

But there seems to be an experimental version with XPath support that will allow you to do this.
With that version, doc.DocumentNode.SelectNodes("//meta/@content") would return a list of HtmlAttribute objects.
Thanks for the quick reply, Rohit Agarwal (I saw that you answered only a few hours after I asked, but I could not verify it until today).
I originally implemented your suggestion as follows (it's in VB.NET):
Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)

Dim list = doc.DocumentNode.SelectNodes("//meta")
For Each node As HtmlNode In list
    Dim metaname As String = node.GetAttributeValue("name", String.Empty)
    If metaname <> String.Empty Then
        If metaname = "title" Then
            title = node.GetAttributeValue("content", String.Empty)
        ' more ElseIf branches
        End If
    End If
Next
However, I found that //meta[@name='title'] will give me the same result:
Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)
title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)
Thanks for putting me on the right track =D
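For the CATEGORY tag from the original question, the same predicate approach can be sketched in C#. This is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name MetaReader, the helper GetContent, and the shortened sample markup are made up for illustration. It also assumes that HtmlAgilityPack exposes element and attribute names in lowercase to XPath while leaving attribute values untouched, which is why the XPath uses meta and @name in lowercase but matches 'CATEGORY' in upper case.

```csharp
using System;
using HtmlAgilityPack;

public static class MetaReader
{
    // Hypothetical helper: returns the content attribute of the first node
    // matched by the given XPath, or an empty string when nothing matches.
    public static string GetContent(HtmlDocument doc, string xpath)
    {
        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return string.Empty;
        }
        return node.GetAttributeValue("content", string.Empty);
    }

    public static void Main()
    {
        // Shortened sample markup modeled on the question; in practice this
        // would be the downloaded page source.
        string html =
            "<html><head>" +
            "<META NAME=\"MS.LOCALE\" CONTENT=\"EN-US\" />" +
            "<META NAME=\"CATEGORY\" CONTENT=\"home page\" />" +
            "</head><body></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The attribute value is compared case-sensitively, so it is written
        // exactly as it appears in the page.
        string category = GetContent(doc, "//meta[@name='CATEGORY']");
        Console.WriteLine(category);
    }
}
```

Returning an empty string for a missing tag mirrors the GetAttributeValue("content", String.Empty) default used above, so callers do not need a separate null check.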
If you want to display the title, description, and keywords meta tags, use:
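// metaTags is assumed to hold the result of doc.DocumentNode.SelectNodes("//meta")
if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        if ((tag.Attributes["name"] != null) && (tag.Attributes["content"] != null))
        {
            // divPage is assumed to be an existing control that exposes InnerHtml,
            // such as a <div runat="server">; a WebControls Panel has no InnerHtml
            divPage.InnerHtml = divPage.InnerHtml + "<br /> " + "<b> Page " + tag.Attributes["name"].Value + " </b>: " + tag.Attributes["content"].Value + "<br />";
        }
    }
}

If you also want to read the og: tags from the page, add this code inside the same loop, after that check: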
if ((tag.Attributes["property"] != null) && (tag.Attributes["content"] != null))
{
    if (tag.Attributes["property"].Value == "og:image")
    {
        // img is assumed to be an existing Image control on the page
        img.ImageUrl = tag.Attributes["content"].Value;
    }
}

This was a great experience ... I like this code :)