I just spent hours trying to get HtmlAgilityPack to pick up dynamic ajax content from a web page, going from one useless post to another until I found this one.
The answer is hidden in a comment under the first post, so I thought I should spell it out.
This is the method I used initially, which did not work:
    private void LoadTraditionalWay(String url)
    {
        WebRequest myWebRequest = WebRequest.Create(url);
        WebResponse myWebResponse = myWebRequest.GetResponse();
        Stream ReceiveStream = myWebResponse.GetResponseStream();
        Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
        TextReader reader = new StreamReader(ReceiveStream, encode);
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.Load(reader);
        reader.Close();
    }
WebRequest only downloads the raw HTML. It does not run the page's JavaScript, so the ajax requests that fill in the missing content are never made.
This is the solution that worked:
    private void LoadHtmlWithBrowser(String url)
    {
        webBrowser1.ScriptErrorsSuppressed = true;
        webBrowser1.Navigate(url);
        waitTillLoad(this.webBrowser1);

        // Take the rendered DOM from the browser and hand it to HtmlAgilityPack.
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowser1.Document.DomDocument;
        StringReader sr = new StringReader(documentAsIHtmlDocument3.documentElement.outerHTML);
        doc.Load(sr);
    }

    private void waitTillLoad(WebBrowser webBrControl)
    {
        WebBrowserReadyState loadStatus;
        int waittime = 100000;
        int counter = 0;

        // Phase 1: pump messages until the control reports that navigation
        // is under way, or until the counter runs out.
        while (true)
        {
            loadStatus = webBrControl.ReadyState;
            Application.DoEvents();
            if ((counter > waittime) ||
                (loadStatus == WebBrowserReadyState.Uninitialized) ||
                (loadStatus == WebBrowserReadyState.Loading) ||
                (loadStatus == WebBrowserReadyState.Interactive))
            {
                break;
            }
            counter++;
        }

        // Phase 2: pump messages until the page is complete and the control is idle.
        // There is no timeout here, so this loops forever if the page never completes.
        counter = 0;
        while (true)
        {
            loadStatus = webBrControl.ReadyState;
            Application.DoEvents();
            if (loadStatus == WebBrowserReadyState.Complete && webBrControl.IsBusy != true)
            {
                break;
            }
            counter++;
        }
    }
The idea is to load the page with a WebBrowser control, which does run the scripts that fetch the ajax content, wait until the page has fully loaded, and then use the Microsoft.mshtml library to pull out the rendered HTML and re-parse it with HtmlAgilityPack.
This was the only way I found to get at the dynamic data.
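As a variation on the polling loop above, the same wait can probably be expressed with the control's DocumentCompleted event instead of Application.DoEvents(). The following is only a sketch under assumptions: NavigateAsync and LoadRenderedAsync are hypothetical helper names that I made up, it must be called from the UI thread of a WinForms app, and it reads the markup through HtmlElement.OuterHtml rather than mshtml. Like the original, it only waits for the document to report Complete, so content that scripts fetch later may still be missing.

```csharp
using System.Threading.Tasks;
using System.Windows.Forms;

static class WebBrowserSketch
{
    // Hypothetical helper (not part of WinForms): starts navigation and returns
    // a task that completes once the top-level document reports Complete.
    public static Task NavigateAsync(this WebBrowser browser, string url)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // DocumentCompleted can also fire for frames, so keep waiting
            // until the control as a whole is in the Complete state.
            if (browser.ReadyState != WebBrowserReadyState.Complete)
            {
                return;
            }
            browser.DocumentCompleted -= handler;
            tcs.TrySetResult(true);
        };
        browser.DocumentCompleted += handler;
        browser.Navigate(url);
        return tcs.Task;
    }

    // Hypothetical helper: loads the page in the browser control, then
    // re-parses the rendered markup with HtmlAgilityPack.
    public static async Task<HtmlAgilityPack.HtmlDocument> LoadRenderedAsync(
        this WebBrowser browser, string url)
    {
        browser.ScriptErrorsSuppressed = true;
        await browser.NavigateAsync(url);

        // OuterHtml of the <html> element is the current DOM serialized as markup.
        HtmlElement root = browser.Document.GetElementsByTagName("html")[0];

        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(root.OuterHtml);
        return doc;
    }
}
```

From an async event handler on the form this would be called as `var doc = await webBrowser1.LoadRenderedAsync(url);`. Whether it is preferable to the DoEvents loop depends on the app; the main difference is that it does not spin the message loop by hand.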
Hope this helps someone.
Nick