HtmlAgilityPack.HtmlDocument Cookies

This concerns cookies that are set by JavaScript (for example, inside a script tag).

System.Windows.Forms.HtmlDocument executes these scripts, and the resulting cookies (for example, those set with document.cookie = ...) can be read through its Cookie property.

I assume that HtmlAgilityPack.HtmlDocument does not execute scripts. Is there an easy way to emulate this part of System.Windows.Forms.HtmlDocument (the cookie handling)?

Does anyone have a suggestion?

2 answers

When I need to use cookies and HtmlAgilityPack together, or just build custom requests (for example, setting the User-Agent property), here is what I do:

  • Create a class that encapsulates the request / response. Let's call it WebQuery.
  • Keep a private CookieCollection (in your case, public) inside this class.
  • Create a method inside the class that performs the request manually. The signature could be:

...

 public HtmlAgilityPack.HtmlDocument GetSource(string url); 

What do we need to do inside this method?

Using HttpWebRequest and HttpWebResponse, build the HTTP request manually (there are many examples of this online), then create an HtmlDocument instance and load the response stream into it (HtmlDocument.Load accepts a Stream).

Which stream should we use? The one returned by:

 httpResponse.GetResponseStream(); 

If you use HttpWebRequest to perform the request, you can set its CookieContainer property to the variable you declared before each visit to a new page. All cookies set by the sites you access are then stored in the CookieContainer held by the WebQuery class, provided you use a single WebQuery instance for all requests.
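The steps above can be sketched roughly as follows. This is a minimal sketch, not the answerer's actual code: the property name Cookies and the User-Agent string are assumptions made for illustration, and the cookie store is declared as a CookieContainer so it can be assigned directly to the request.

```csharp
using System.IO;
using System.Net;
using HtmlAgilityPack;

// Hypothetical sketch of the WebQuery class described above.
public class WebQuery
{
    // One container shared by every request made through this instance,
    // so cookies set by one response are sent with the next request.
    public CookieContainer Cookies { get; } = new CookieContainer();

    public HtmlDocument GetSource(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        // Placeholder value; set whatever headers the site expects.
        request.UserAgent = "Mozilla/5.0 (compatible; WebQuery sketch)";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            var doc = new HtmlDocument();
            doc.Load(stream); // parse the HTML straight from the response stream
            return doc;
        }
    }
}
```

Note that this only captures cookies sent in HTTP response headers. Cookies assigned by JavaScript through document.cookie are still not picked up, because no script is executed.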

I hope you find this explanation helpful. Bear in mind that with this approach you control the whole request yourself, so you can do whatever you need regardless of whether HtmlAgilityPack supports it.


I also worked with Rohit Agarwal's BrowserSession class together with HtmlAgilityPack. For me, subsequent calls to the Get function did not work, because new cookies were set every time. That is why I added some functions myself. (My solution is far from ideal - it's just a quick and dirty fix.) But it worked for me, and if you don't want to spend a lot of time researching BrowserSession, this is what I did:

The added / changed functions are as follows:

    class BrowserSession
    {
        private bool _isPost;
        private HtmlDocument _htmlDoc;
        // This is the new CookieContainer; it must be initialized before use
        public CookieContainer cookiePot = new CookieContainer();
        ...

        public string Get2(string url)
        {
            HtmlWeb web = new HtmlWeb();
            web.UseCookies = true;
            web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest2);
            web.PostResponse = new HtmlWeb.PostResponseHandler(OnAfterResponse2);
            HtmlDocument doc = web.Load(url);
            return doc.DocumentNode.InnerHtml;
        }

        public bool OnPreRequest2(HttpWebRequest request)
        {
            // Attach the shared container so the stored cookies are sent
            request.CookieContainer = cookiePot;
            return true;
        }

        protected void OnAfterResponse2(HttpWebRequest request, HttpWebResponse response)
        {
            // do nothing
        }

        private void SaveCookiesFrom(HttpWebResponse response)
        {
            if (response.Cookies.Count > 0)
            {
                if (Cookies == null)
                {
                    Cookies = new CookieCollection();
                }
                Cookies.Add(response.Cookies);
                cookiePot.Add(Cookies); // add the Cookies to the cookiePot
            }
        }
    }

What it does: it saves the cookies from the initial POST response into a CookieContainer and attaches that same container to every later request. I don't quite understand why the original version did not work, because its AddCookiesTo function appears to do the same thing: (if (Cookies != null && Cookies.Count > 0) request.CookieContainer.Add(Cookies);). In any case, it should work fine with these added functions.
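To illustrate why sharing one container is enough, here is a small sketch that involves no network access. The class name CookiePotDemo, the example.com URLs, and the cookie values are all made up for illustration; only CookieContainer, Cookie, and their Add and GetCookieHeader members come from System.Net.

```csharp
using System;
using System.Net;

public static class CookiePotDemo
{
    // Returns the Cookie header the container would send for the given URL.
    public static string HeaderFor(CookieContainer pot, string url)
    {
        return pot.GetCookieHeader(new Uri(url));
    }

    public static string Run()
    {
        var cookiePot = new CookieContainer();

        // Stand-in for a cookie received in the login response (hypothetical values).
        cookiePot.Add(new Uri("http://example.com/"), new Cookie("session", "abc123"));

        // A later request to another page on the same host reuses the container.
        return HeaderFor(cookiePot, "http://example.com/secondpage");
    }
}
```

Any request whose CookieContainer property points at this same object should send the stored session cookie, which is what assigning cookiePot in OnPreRequest2 relies on.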

It can be used as follows:

    // initial "Login-procedure"
    BrowserSession b = new BrowserSession();
    b.Get("http://www.blablubb/login.php");
    b.FormElements["username"] = "yourusername";
    b.FormElements["password"] = "yourpass";
    string response = b.Post("http://www.blablubb/login.php");

All subsequent calls should use Get2:

    response = b.Get2("http://www.blablubb/secondpageyouwannabrowseto");
    response = b.Get2("http://www.blablubb/thirdpageyouwannabrowseto");
    ...

I hope this helps when you come across the same problem.
