First of all, I apologize for opening a new question: my reputation does not yet allow me to comment on other people's answers, in particular on the two SO posts I found. So please bear with this older guy :-)
I am trying to read a list of about 100 plain-text files, ranging in size from 90 KB to 2 MB, and then use the qdap package to compute some statistics on the text I extract from them, namely counting sentences, words, etc. The files contain page source obtained earlier from the original pages with RSelenium::remoteDriver$getPageSource() and saved to disk with write(pgSource, fileName.txt). I read the files back in a loop using:
pgSource <- readChar(file.path(fPath, fileNames[i]), nchars = 1e6)
doc <- read_html(pgSource)
which, for some of the files, throws:
Error in eval(substitute(expr), envir, enclos) : Excessive depth in document: 256 use XML_PARSE_HUGE option
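For completeness, the whole loop looks roughly like this; fPath and fileNames are placeholders for my actual folder and file vector:

library(xml2)

fPath <- "saved_pages"                               # folder with the saved page sources
fileNames <- list.files(fPath, pattern = "\\.txt$")  # one file per scraped page

for (i in seq_along(fileNames)) {
  # read the saved page source back in as a single character string
  pgSource <- readChar(file.path(fPath, fileNames[i]), nchars = 1e6)
  # parse it with xml2; this is the call that fails on some of the larger files
  doc <- read_html(pgSource)
  # ... qdap statistics on the text extracted from doc go here ...
}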
I found these posts, SO33819103 and SO31419409, which point to similar issues, but I cannot fully understand how to apply the workaround by @shabbychef suggested in both posts, i.e. the snippet posted by @glossarch in the first link above:
library(drat)
drat:::add("shabbychef")
install.packages('xml2')
library("xml2")
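If I understand those posts correctly, the point of the patched xml2 is to let libxml2's XML_PARSE_HUGE flag be passed through the options argument of read_html, so after the install above the parse call would presumably become something like this (my assumption, not something stated explicitly in the posts):

# assumed usage of the patched xml2: add "HUGE" (XML_PARSE_HUGE) to the
# default HTML parse options so deeply nested documents are accepted
doc <- read_html(pgSource, options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE"))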
EDIT: I noticed that an earlier script of mine, which scraped data from web pages in real time via their URLs, did not run into this problem. The code was essentially the same; the only difference is that there I called doc <- read_html(pgSource) on the page source taken directly from the RSelenium remoteDriver, not on source read back from a file.
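For comparison, the live version that worked looked roughly like this, where remDr is my already opened RSelenium remoteDriver:

remDr$navigate(url)                       # URL of the page being scraped
pgSource <- remDr$getPageSource()[[1]]    # page source as a single string
doc <- read_html(pgSource)                # no XML_PARSE_HUGE error in this path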
What I would like to ask this gentle community is whether I am following the correct steps to download and install xml2 after adding the shabbychef drat repository, or whether some additional step is needed, as suggested in SO17154308. Any help or suggestions are welcome. Thanks.