C# .NET: Use HtmlDocument from a console application?

I am trying to use System.Windows.Forms.HtmlDocument in a console application. First, is this possible? If so, how can I load a page from the Internet into it? I tried to use WebBrowser, but it threw:

Unhandled Exception: System.Threading.ThreadStateException: ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment.

There seems to be a serious lack of tutorials on the HTMLDocument object (or Google just leads to useless results).


I just tried mshtml.HTMLDocument.createDocumentFromUrl, but it threw:

Unhandled Exception: System.Runtime.InteropServices.COMException (0x80010105): The server threw an exception. (Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT))
   at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)
   at mshtml.HTMLDocumentClass.createDocumentFromUrl(...)
   at iget.Program.Main(String[] args)

What the hell? All I want is a list of <a> tags on the page. Why is it so hard?


For anyone interested, here is the solution I arrived at thanks to TrueWill:

    using System;
    using System.IO;
    using System.Net;
    using HtmlAgilityPack;

    namespace iget
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Dispose the client and the response stream when done.
                using (WebClient wc = new WebClient())
                using (Stream stream = wc.OpenRead("http://google.com"))
                {
                    HtmlDocument doc = new HtmlDocument();
                    doc.Load(stream);

                    // SelectNodes returns null (not an empty collection) when nothing matches.
                    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
                    if (links == null)
                        return;

                    foreach (HtmlNode a in links)
                    {
                        Console.WriteLine(a.Attributes["href"].Value);
                    }
                }
            }
        }
    }
3 answers

Alternatively, you can use the free Html Agility Pack. It parses HTML and lets you query it with XPath or LINQ. I used an older version on a personal project and it worked great.
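A minimal sketch of the LINQ-style querying mentioned above, assuming the HtmlAgilityPack NuGet package is referenced. `LinkExtractor` and `GetHrefs` are hypothetical names; `LoadHtml`, `Descendants` and `GetAttributeValue` are Html Agility Pack members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class LinkExtractor // hypothetical helper name
{
    // Returns the href of every <a> element that has a non-empty href,
    // in document order.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a stream

        return doc.DocumentNode
            .Descendants("a")                                     // all <a> elements
            .Select(a => a.GetAttributeValue("href", string.Empty)) // "" when href is missing
            .Where(href => href.Length > 0)                       // drop anchors without href
            .ToList();
    }
}
```

Compared with the XPath query `//a[@href]`, the LINQ form never returns null, so no null check is needed before iterating.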

EDIT: You can also use the WebClient or WebRequest classes to load a web page; see my blog post "Web Scraper in .NET". (Note that I have not tried this in a console application.)
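A rough sketch of the two classes this EDIT mentions. `PageLoader` and its methods are hypothetical names. Both `WebClient` and `WebRequest` still work but are marked obsolete from .NET 6 on (compiler warning SYSLIB0014), where `HttpClient` is the recommended replacement. Because both go through `WebRequest.Create`, they also accept `file://` URIs, which makes them easy to try without a network.

```csharp
using System;
using System.IO;
using System.Net;

static class PageLoader // hypothetical helper name
{
    // Option 1: WebClient, the simplest API; one call returns the whole body.
    public static string LoadWithWebClient(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Option 2: WebRequest, lower level; you read the response stream yourself.
    public static string LoadWithWebRequest(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (var reader = new StreamReader(body))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Either string can then be handed to an HTML parser; `WebRequest` is the one to reach for if you need to set headers, timeouts or cookies per request.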


Add the [STAThread] attribute to your Main method:

    [STAThread]
    static void Main(string[] args)
    {
    }

That should fix the ThreadStateException.
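If you cannot (or would rather not) mark `Main` itself as `[STAThread]`, a common alternative is to run the COM-dependent work on a dedicated thread that you put into a single-threaded apartment yourself. This sketch assumes Windows: on Linux and macOS, `Thread.SetApartmentState(ApartmentState.STA)` throws `PlatformNotSupportedException`. `StaRunner` and `RunInSta` are hypothetical names. A `WebBrowser` created inside the delegate would additionally need a message loop on that thread to actually load a page, which this sketch does not attempt.

```csharp
using System;
using System.Threading;

static class StaRunner // hypothetical helper name
{
    // Runs `work` on a new STA thread, waits for it to finish,
    // and returns its result (or rethrows its failure on the caller's thread).
    public static T RunInSta<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; } // capture; exceptions don't cross threads
        });
        thread.SetApartmentState(ApartmentState.STA); // must be set before Start()
        thread.Start();
        thread.Join(); // block until the work is done

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
        return result;
    }
}
```

For example, `StaRunner.RunInSta(() => Thread.CurrentThread.GetApartmentState())` reports `STA`, whereas the same call on a console `Main` without the attribute reports `MTA`.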


If the page is XHTML, you can load it into an XDocument and pull out the anchor tags. Alternatively, if all you need is the anchor tags, you can extract them with a regular expression.
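Both approaches from this answer, sketched with framework types only (`System.Xml.Linq` and `System.Text.RegularExpressions`). `AnchorScan`, `HrefsFromXhtml` and `HrefsFromRegex` are hypothetical names, and the regex pattern is illustrative rather than taken from the answer: it handles only simple, quoted `href` attributes. `XDocument.Parse` throws on anything that is not well-formed XML, so the first approach only works for genuine XHTML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class AnchorScan // hypothetical helper name
{
    // Approach 1: XHTML is XML, so XDocument can parse it. XHTML elements live
    // in the http://www.w3.org/1999/xhtml namespace, so match on LocalName.
    public static List<string> HrefsFromXhtml(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "a")
            .Select(e => (string)e.Attribute("href")) // null when href is absent
            .Where(href => href != null)
            .ToList();
    }

    // Approach 2: a case-insensitive regex over raw markup. A rough heuristic:
    // it captures simple quoted href values and can misread unusual markup.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    public static List<string> HrefsFromRegex(string html)
    {
        return HrefPattern.Matches(html)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value) // group 1 is the quoted value
            .ToList();
    }
}
```

For arbitrary real-world pages, which are rarely well-formed XHTML, a tolerant HTML parser is usually the safer choice than either of these.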

