Get the final generated HTML source using C# or VB.NET

Using VB.NET or C#, how can I get the generated HTML source?

I can get the HTML source of a page with the code below, but it only returns what the server sent. It does not contain any of the HTML that JavaScript added dynamically in the browser. How can I get the final generated HTML source?

thanks

    WebRequest req = WebRequest.Create("http://www.asp.net");
    WebResponse res = req.GetResponse();
    StreamReader sr = new StreamReader(res.GetResponseStream());
    string html = sr.ReadToEnd();

If I try the code below, it returns the document without the content inserted by JavaScript:

    Public Class Form1
        Dim WB As WebBrowser = Nothing

        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            WB = New WebBrowser()
            Me.Controls.Add(WB)
            AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted
            WB.Navigate("mysite/Default.aspx")
        End Sub

        Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
            'Dim htmlcode As String = WebBrowser1.Document.Body.OuterHtml()
            Dim s As String = WB.DocumentText
        End Sub
    End Class

HTML returned:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <title></title>
    </head>
    <body>
        <form id="form1" runat="server">
            <div id="center_text_panel">
                //test text this text should be here
            </div>
        </form>
    </body>
    </html>
    <script type="text/javascript">
        document.getElementById("center_text_panel").innerText = "test text";
    </script>
3 answers

You can use WebKit.NET

Look here for official tutorials

It can not only grab the source, but also process JavaScript through the page load event.

 webKitBrowser1.Navigate(MyURL) 

Then handle the DocumentCompleted event and read the content:

    Dim documentContent As String = webKitBrowser1.DocumentText

Edit: This might be the best open source WebKit option: http://code.google.com/p/open-webkit-sharp/


Just put a WebBrowser control on your form and use this code:

    webBrowser1.Navigate("YourLink");

    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Or read each field or element, or use webBrowser1.DocumentText
        string htmlcode = webBrowser1.Document.Body.InnerHtml;
    }

Edited

To also get the HTML that is generated dynamically by JavaScript, you have two options:

  • Run the following code after the webBrowser1_DocumentCompleted event fires:
      // Requires: using System.Text;
      // Note: Document.All also contains nested elements, so the markup of
      // child elements is appended more than once.
      StringBuilder htmlcode = new StringBuilder();
      foreach (HtmlElement item in webBrowser1.Document.All)
      {
          htmlcode.Append(item.InnerHtml);
      }
  • Write JavaScript code that returns document.documentElement.innerHTML and use the InvokeScript function to return the result:
      var htmlcode = webBrowser1.Document.InvokeScript("javascriptcode");
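
For the second option, the page-side function could look roughly like the sketch below. The name getGeneratedHtml is made up for illustration, and this assumes you are able to add a script to the page you are loading (as with your own Default.aspx). It uses outerHTML rather than innerHTML so that the html tag itself is included. The optional doc parameter is only there so the function can be tried outside a browser.

```javascript
// Hypothetical helper: the name getGeneratedHtml is an assumption,
// not something from the original answer or from the WebBrowser API.
// Returns the markup of the live DOM, including anything scripts have added so far.
function getGeneratedHtml(doc) {
  // Called without arguments (e.g. through InvokeScript), fall back to the page's own document.
  var target = doc || document;
  return target.documentElement.outerHTML;
}
```

With that function on the page, the call from C# would presumably be webBrowser1.Document.InvokeScript("getGeneratedHtml"), which returns an object you can convert to a string. It only reflects what scripts have done up to the moment of the call.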

You can use this code:

 webBrowser1.Document.Body.OuterHtml 