Parsing a small web page with xml2 raises XML_PARSE_HUGE error

Question

Recently, a user of my rNOMADS package in R began to receive unexpected errors:

Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]

We tracked the problem to this command:

 html.tmp <- xml2::read_html("http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120")

Judging by the link, the page being parsed is no larger than others that work fine, and far smaller than the 1 megabyte limit that should require the XML_PARSE_HUGE option. Besides,

 xml2::read_html

does not actually have an XML_PARSE_HUGE parameter. The only other potential solution, described here, is not suitable for an official R package.

What is the reason for this error, and is it possible to resolve it without resorting to solutions outside the official CRAN repository?

r xml-parsing web-scraping libxml2

glossarch Nov 20 '15 at 4:39

1 answer

glossarch · Answer 1 · 2015-12-31T00:29:56+0000

The best I can do so far is to install shabbychef's forked version of xml2, which forces XML_PARSE_HUGE. You can install this version of xml2 with

 library(drat)
 drat:::add("shabbychef")    # register shabbychef's drat repository
 install.packages('xml2')    # installs the forked xml2 from that repository

For now, please use this workaround if you encounter XML_PARSE_HUGE errors in rNOMADS.
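
For readers who can update xml2 from CRAN, the following is a minimal sketch rather than a tested fix for rNOMADS. It assumes a later xml2 release than the one discussed above, in which read_html() accepts an options argument and "HUGE" is one of the accepted values (it corresponds to libxml2's XML_PARSE_HUGE). The nested test document and the variable names are made up for illustration, so that the example runs without network access.

```r
library(xml2)

# Hypothetical test input: 300 nested <div> elements, deeper than the
# 256-level depth named in the error message.
depth <- 300
deep_html <- paste0(
  "<html><body>",
  strrep("<div>", depth), "leaf", strrep("</div>", depth),
  "</body></html>"
)

# Keep read_html()'s documented default options and add "HUGE",
# which asks libxml2 to relax its hard-coded parser limits.
doc <- read_html(
  deep_html,
  options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE")
)

# Count the <div> elements that made it into the parsed tree.
n_div <- length(xml_find_all(doc, "//div"))
```

Under the same assumption, the call from the question would take the NOMADS URL as its first argument and the same options vector, with no forked package needed.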
