Why is the HTML Agility Pack HtmlDocument.DocumentNode NULL?

Question

I use the code below to rewrite the href attributes in an HTML stream.

I first load the full HTML page with this code (URL is the address of the web page):

HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = 
                         (HttpWebResponse)myHttpWebRequest.GetResponse();

Stream s = myHttpWebResponse.GetResponseStream();

Then I process it:

HtmlDocument doc = new HtmlDocument();

doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);

Here s is the HTML stream.

But I get an exception saying that doc.DocumentNode is null!

I have tried many sites, but doc.DocumentNode is null for all of them.

c# asp.net html-agility-pack

ahmadali shafiee Feb 04 '12 at 7:21

5 answers

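Use //a instead of /a.

In XPath, /a matches only an a element directly under the document root, whereas //a matches a elements anywhere in the document.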

Update:

Here is the full code using //a:

        var myHttpWebRequest = (HttpWebRequest)WebRequest.Create("http://google.com");
        var myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

        var s = myHttpWebResponse.GetResponseStream();

        var doc = new HtmlDocument();

        doc.Load(s);
        foreach (var link in doc.DocumentNode.SelectNodes("//a"))
        {
            var att = link.Attributes["href"].Value;
            link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;

            Console.WriteLine(link.Attributes["href"].Value);
        }

GolfWolf 04 . '12 9:05

See: HTML Agility Pack Null.

PraveenVenu 03 . '12 15:15

Try this:

HtmlDocument htmlDoc = new HtmlDocument
{
    OptionAddDebuggingAttributes = false,
    OptionAutoCloseOnEnd = true,
    OptionFixNestedTags = true,
    OptionReadEncoding = true
};
try
{
    // The response stream is forward-only, so it cannot be rewound with Seek.
    using (Stream reader = myHttpWebResponse.GetResponseStream())
    {
        htmlDoc.Load(reader, true);
    }
    HtmlNode node = htmlDoc.DocumentNode;
    if (node != null)
    {
        foreach (var href in node.Descendants("a").Select(x => x.Attributes["href"]))
        {
            if (href == null) continue; // skip anchors without an href
            href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
        }
    }
}
catch (Exception ex)
{
    // Report the failure instead of silently swallowing it.
    Console.Error.WriteLine(ex.Message);
}

HtmlAgilityPack version: 1.4.0

Sunil Raj 05 . '12 4:23

The XPath expression is the problem:

...doc.DocumentNode.SelectNodes("/a")    //incorrect
...doc.DocumentNode.SelectNodes("//a")   //correct
...doc.DocumentNode.SelectNodes(@"//a")  //also correct

The original code selects no nodes, so SelectNodes returns null. Check for this to avoid a failure on, say, a document that has no links at all (unlikely as that is :)

var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
    foreach (HtmlNode link in anchors)
    {
        /*do stuff*/
    } 
}

ov Mar 6 '12 at 8:58

Lb · Accepted Answer · 2012-03-03T17:45:07+0000

This works for me.

using(WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue;
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}

Also note the use of HttpUtility.UrlEncode, so that the original URL can be recovered correctly. Otherwise, some parameters of the source URL may cause problems.

Use HttpUtility.UrlDecode to decode it.
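
To illustrate the encode/decode round trip, here is a minimal sketch. The UrlWrapping class, its Wrap and Unwrap helpers, and the example.com URL are hypothetical names made up for this illustration; only HttpUtility.UrlEncode, HttpUtility.UrlDecode and the redirect prefix come from the answer above.

```csharp
using System;
using System.Web;

// Hypothetical helper class, not part of HTML Agility Pack or the original answer.
static class UrlWrapping
{
    // Redirect prefix taken from the code in the answer above.
    const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Encodes the original URL so its own '?', '&' and '=' characters
    // are not mistaken for parameters of the redirect page.
    public static string Wrap(string originalUrl)
    {
        return Prefix + HttpUtility.UrlEncode(originalUrl);
    }

    // Reverses Wrap: strips the prefix and decodes the remainder.
    public static string Unwrap(string wrappedUrl)
    {
        if (!wrappedUrl.StartsWith(Prefix, StringComparison.Ordinal))
            throw new ArgumentException("Not a wrapped URL.", nameof(wrappedUrl));
        return HttpUtility.UrlDecode(wrappedUrl.Substring(Prefix.Length));
    }
}

static class Program
{
    static void Main()
    {
        // Assumed sample URL with a space and two query parameters.
        string original = "http://example.com/search?q=html agility&page=2";
        string wrapped = UrlWrapping.Wrap(original);

        Console.WriteLine(wrapped);
        // Compares the decoded value with the original URL.
        Console.WriteLine(UrlWrapping.Unwrap(wrapped) == original);
    }
}
```

Without the encoding step, the &page=2 part of the sample URL would be read as a second parameter of default.aspx rather than as part of the url value. On the receiving ASP.NET page, Request.QueryString["url"] should already return the decoded value, so an explicit UrlDecode is mainly needed when the wrapped string is parsed by hand as above.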
