An alternative to the Html Agility Pack is CsQuery, a C# port of jQuery of which I am the main author. It lets you use CSS selectors and the full jQuery-style API to access and manipulate the DOM, which many people find easier than XPath. In addition, its HTML parser is designed to serve several purposes, and there are several options for parsing HTML: as a complete document (missing html and body tags are added, and any orphaned content is moved inside the body); as a content block (it is not wrapped as a complete document, but optional tags that are still mandatory in the DOM, such as tbody, are added, just as browsers do); and as a true fragment, where no tags are created (for example, if you are just working with building blocks).
See "Creating a new DOM" in the CsQuery documentation for details.
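As a rough illustration of the three parsing modes, here is a minimal sketch. It assumes the CsQuery NuGet package is referenced and that CQ.CreateDocument, CQ.Create and CQ.CreateFragment map to the document, content block and fragment modes respectively (for input like this, my understanding is that CQ.Create treats the markup as a content block). The class name and the Count helper are made up for the example.

```csharp
using System;
using CsQuery;

public static class ParsingModesSketch
{
    // Hypothetical helper: number of elements in the DOM matching a CSS selector.
    public static int Count(CQ dom, string selector)
    {
        return dom[selector].Length;
    }

    public static void Main()
    {
        // A table written without the optional tbody tag.
        string html = "<table><tr><td>cell</td></tr></table>";

        // Full document: html and body wrappers are expected to be added.
        CQ asDocument = CQ.CreateDocument(html);

        // Content block: no document wrapper, but tags the DOM requires
        // (such as tbody) are expected to be generated.
        CQ asContent = CQ.Create(html);

        // Fragment: intended for building blocks, with no extra wrapping.
        CQ asFragment = CQ.CreateFragment(html);

        // Render() serializes the DOM back to HTML so the results can be compared.
        Console.WriteLine("document: " + asDocument.Render());
        Console.WriteLine("content:  " + asContent.Render());
        Console.WriteLine("fragment: " + asFragment.Render());

        Console.WriteLine("body in document: " + Count(asDocument, "body"));
        Console.WriteLine("tbody in content: " + Count(asContent, "tbody"));
    }
}
```

Comparing the three rendered strings side by side is probably the quickest way to see which mode fits a given input.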
In addition, the CsQuery HTML parser was designed to follow the HTML5 specification for optional closing tags. For example, the closing p tag is optional, but specific rules determine when the element should be closed. To produce the same DOM as a browser, the parser must implement the same rules. CsQuery does this to provide a high degree of compatibility with the browser DOM for a given source.
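To make the optional closing tag behavior concrete, here is a small sketch, again assuming the CsQuery package is referenced. The class and method names are made up for the example. The input omits both closing p tags; if the parser applies the HTML5 rules as described, the second opening p tag should implicitly close the first, giving two sibling paragraphs rather than one nested inside the other.

```csharp
using System;
using CsQuery;

public static class OptionalCloseTagSketch
{
    // Hypothetical helper: parse the markup and count the p elements in the result.
    public static int CountParagraphs(string html)
    {
        CQ dom = CQ.Create(html);
        return dom["p"].Length;
    }

    public static void Main()
    {
        // Neither paragraph has a closing tag.
        string html = "<div><p>first<p>second</div>";

        Console.WriteLine(CountParagraphs(html));

        // Render() shows where the parser placed the implied closing tags.
        Console.WriteLine(CQ.Create(html).Render());
    }
}
```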
Using CsQuery is very straightforward, for example:
    CQ docFromString = CQ.Create(htmlString);
    CQ docFromWeb = CQ.CreateFromUrl(someUrl);

    // there are other methods for asynchronous web gets, creating from files, streams, etc.

    // css selector: the indexer [] is like jQuery $(..)
    CQ lastCellInFirstRow = docFromString["table tr:first-child td:last-child"];

    // Text() is a jQuery method returning text contents of selection
    string textOfCell = lastCellInFirstRow.Text();
Finally, CsQuery indexes documents by class, id, attribute, and tag, which makes selectors very fast compared to the Html Agility Pack.
Jamie Treworgy