I am trying to use System.Windows.Forms.HTMLDocument in a console application. First, is this possible? If so, how can I load a page from the Internet into it? I tried to use WebBrowser, but it threw:
Unhandled exception: System.Threading.ThreadStateException: ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment.
There seems to be a serious lack of tutorials on the HTMLDocument object (or Google just leads to useless results).
I then found mshtml.HTMLDocument.createDocumentFromUrl, but it threw:
Unhandled exception: System.Runtime.InteropServices.COMException (0x80010105): The server threw an exception. (Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT))
   at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)
   at mshtml.[...]
   at iget.Program.Main(String[] args)
What the hell? All I want is a list of <a> tags on the page. Why is it so hard?
For those who are interested, here is the solution I arrived at thanks to TrueWill. It uses HtmlAgilityPack rather than the WebBrowser control:
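    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Net;
    using System.IO;
    using HtmlAgilityPack;

    namespace iget
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Download the page and hand the response stream to HtmlAgilityPack.
                using (WebClient wc = new WebClient())
                using (Stream stream = wc.OpenRead("http://google.com"))
                {
                    HtmlDocument doc = new HtmlDocument();
                    doc.Load(stream);

                    // SelectNodes returns null (not an empty collection) when nothing matches.
                    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
                    if (links == null)
                        return;

                    // Print the href of every <a> tag that has one.
                    foreach (HtmlNode a in links)
                    {
                        Console.WriteLine(a.Attributes["href"].Value);
                    }
                }
            }
        }
    }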
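As for the original question of whether WebBrowser can be used from a console application at all: the ThreadStateException quoted earlier says the control needs a single-threaded apartment, and the control also needs a message loop before it will finish loading a page. The following is only a rough sketch of that route, not something tested against this exact setup. It assumes the project references the System.Windows.Forms assembly, reuses the same URL as above, and the readiness check inside the handler is my own guess at a reasonable way to skip per-frame events.

```csharp
using System;
using System.Windows.Forms;

class Program
{
    // STAThread puts the main thread in a single-threaded apartment,
    // which the WebBrowser ActiveX control requires.
    [STAThread]
    static void Main()
    {
        using (var browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true;

            browser.DocumentCompleted += (sender, e) =>
            {
                // DocumentCompleted can fire more than once (for example per frame);
                // assumption: waiting for ReadyState.Complete is enough here.
                if (browser.ReadyState != WebBrowserReadyState.Complete)
                    return;

                // browser.Document is a System.Windows.Forms.HtmlDocument.
                foreach (HtmlElement a in browser.Document.GetElementsByTagName("a"))
                {
                    Console.WriteLine(a.GetAttribute("href"));
                }

                // Stop the message loop started by Application.Run below.
                Application.ExitThread();
            };

            browser.Navigate("http://google.com");

            // WebBrowser needs a running message pump to raise its events.
            Application.Run();
        }
    }
}
```

For simply listing links, the HtmlAgilityPack version is lighter, since it needs neither an STA thread nor a message loop. The WebBrowser route would mainly be worth it if the page builds its links with script.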