C# code to save an entire web page? (with images / formatting)

I am looking for sample C# code (I am using C# in Visual Studio 2008 Express) that can programmatically save an entire web page, given its URL, including images and formatting (e.g. CSS). The plan is that in a later step I would send the saved page on (I don't know how to do that yet) so that it can be viewed later in a browser.

Is there an example of the simplest approach, using .NET Framework methods, for saving an entire web page? The result could be a single page with a subdirectory for images, or some other layout. Basically, I want the same thing you get from a browser when you choose “save entire web page”.

+4
3 answers

The easiest way is to add a WebBrowser control to the application and point it at the page you want to save using the Navigate() method.

Then, when the document has loaded, call the ShowSaveAsDialog() method. The user can then save the page as a single file or as a file with images in a subdirectory.
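
A minimal sketch of this approach, assuming a Windows Forms application. The class name SavePageForm and the example URL are illustrative, not from the question; the dialog still needs the user to confirm the save.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form that loads one page and then opens the Save As dialog.
class SavePageForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public SavePageForm(string url)
    {
        browser.Dock = DockStyle.Fill;
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted also fires for each frame, so act only
        // when the top-level document has finished loading.
        if (!e.Url.Equals(browser.Url)) return;

        browser.DocumentCompleted -= OnDocumentCompleted;
        browser.ShowSaveAsDialog(); // the user picks the file name and format
    }

    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new SavePageForm("http://example.com/")); // placeholder URL
    }
}
```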

[Update]

Having now noticed the word “programmatically” in your question, I see that the above approach is not ideal: it requires either user interaction or knowledge of the Windows API to send input using SendKeys or similar.

There is nothing built into the .NET Framework that does everything you ask.

So my revised approach would be as follows:

  • Use System.Net.HttpWebRequest to get the main HTML document as a string or stream (easy).
  • Load this into an HtmlAgilityPack document, which you can then easily query to get lists of all image elements, stylesheet links, etc.
  • Then make a separate web request for each of these files and save them in a subdirectory.
  • Finally, update all relevant links in the main page to point to the items in the subdirectory.
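
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a finished implementation: it requires the third-party HtmlAgilityPack assembly to be referenced, the names PageSaver, Save, Localize, LocalName, "files" and "index.html" are all made up for the example, and WebClient (a thin wrapper over the same web request classes) is used for the per-file downloads to keep the code short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using HtmlAgilityPack; // third-party library, not part of the .NET Framework

public static class PageSaver
{
    // Saves pageUrl as "<targetDir>/index.html", with its images, scripts
    // and stylesheets in "<targetDir>/files". Names are illustrative only.
    public static void Save(string pageUrl, string targetDir)
    {
        var baseUri = new Uri(pageUrl);
        string filesDir = Path.Combine(targetDir, "files");
        Directory.CreateDirectory(filesDir); // also creates targetDir

        // Step 1: fetch the main HTML document as a stream.
        var doc = new HtmlDocument();
        var request = (HttpWebRequest)WebRequest.Create(baseUri);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            // Step 2: parse it so that resource references can be queried.
            doc.Load(stream);
        }

        // Steps 3 and 4: download each resource and repoint its link.
        var saved = new Dictionary<string, string>(); // absolute URL -> local name
        using (var client = new WebClient())
        {
            Localize(doc, "//img[@src]", "src", baseUri, filesDir, client, saved);
            Localize(doc, "//script[@src]", "src", baseUri, filesDir, client, saved);
            Localize(doc, "//link[@rel='stylesheet'][@href]", "href",
                     baseUri, filesDir, client, saved);
        }

        doc.Save(Path.Combine(targetDir, "index.html"));
    }

    static void Localize(HtmlDocument doc, string xpath, string attribute, Uri baseUri,
                         string filesDir, WebClient client, Dictionary<string, string> saved)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
        {
            // Decode entities such as &amp; before resolving the link.
            string link = HtmlEntity.DeEntitize(node.GetAttributeValue(attribute, ""));

            Uri resource;
            if (!Uri.TryCreate(baseUri, link, out resource)) continue;
            if (resource.Scheme != Uri.UriSchemeHttp &&
                resource.Scheme != Uri.UriSchemeHttps) continue; // skip data:, javascript:, etc.

            string localName;
            if (!saved.TryGetValue(resource.AbsoluteUri, out localName))
            {
                localName = LocalName(resource, saved.Count);
                try { client.DownloadFile(resource, Path.Combine(filesDir, localName)); }
                catch (WebException) { continue; } // keep the original link on failure
                saved[resource.AbsoluteUri] = localName;
            }
            node.SetAttributeValue(attribute, "files/" + localName);
        }
    }

    // Builds a file name from the last URL path segment, prefixed with an index.
    public static string LocalName(Uri resource, int index)
    {
        string name = resource.Segments[resource.Segments.Length - 1].Trim('/');
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');
        return index + "_" + name;
    }
}
```

A call such as PageSaver.Save("http://example.com/", @"C:\temp\saved") would then write the page and its files under that folder (both arguments are placeholders). The sketch only rewrites img, script and stylesheet link elements; images referenced from inside CSS files via url(...) are not followed, and the index prefix is a simple way to reduce the chance of two resources getting the same local file name.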

In essence, you will implement a very simple web browser. You may encounter problems with pages using JavaScript to dynamically change or request the contents of a page, but for most pages this should give acceptable results.

+6

From The Code Project: ZetaWebSpider

+1

This is definitely not elegant, but you can navigate a System.Windows.Forms.WebBrowser to the URL and then call its ShowSaveAsDialog() method to save the page.

0
