Handling htmlParse errors ("failed to load HTTP resource")

I am trying to scrape a series of pages. However, from time to time my loop stops because htmlParse fails with "failed to load HTTP resource". The page does not load in my browser either, so this is not a problem with the code.

However, it is tedious to restart the process by hand and add an exception for every page that triggers the error. I wonder if there is a way to write an if condition for this, something like: if an error occurs, skip to the next iteration of the loop.

I looked at the help page for htmlParse and found that it has an error argument, but I could not figure out how to use it. Any ideas for my if condition?

The following is a reproducible example:

    # Install (if necessary) and load the required packages
    if (!require(RCurl))  { install.packages("RCurl");  library(RCurl) }
    if (!require(XML))    { install.packages("XML");    library(XML) }
    if (!require(seqinr)) { install.packages("seqinr"); library(seqinr) }

    listofTabelas <- list()  # one table per successfully parsed page

    for (i in 575:585) {
      currentPage <- i  # current page of the search
      # Link to be fetched
      link <- paste("http://www.cnj.jus.br/improbidade_adm/visualizar_condenacao.php?seq_condenacao=",
                    currentPage, sep = "")
      doc <- htmlParse(link, encoding = "UTF-8")  # this will preserve characters
      tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
      if (length(tables) != 0) {
        tabela2 <- as.data.frame(tables[10])
        tabela2[, 1] <- gsub("\\n", " ", tabela2[, 1])
        tabela2[, 2] <- gsub("\\n", " ", tabela2[, 2])
        tabela2[, 2] <- gsub("\\t", " ", tabela2[, 2])
        listofTabelas[[i]] <- tabela2
        tabela1 <- do.call("rbind", listofTabelas)
        names(tabela1) <- c("Variaveis", "status")
      }
    }
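A minimal sketch of the "skip to the next iteration on error" idea, using base R's tryCatch() around the call that fails. To keep it runnable offline, fetch_page() is a hypothetical stand-in that throws the same kind of error htmlParse() raises for a page that does not load; in the real loop you would call htmlParse(link, encoding = "UTF-8") in its place. The error handler is an ordinary function, not the loop body, so instead of calling next inside it, the handler returns NULL and the loop checks for that.

```r
# Hypothetical stand-in for htmlParse(link, ...): fails for some pages,
# mimicking the "failed to load HTTP resource" error.
fetch_page <- function(i) {
  if (i %in% c(578, 582)) stop("failed to load HTTP resource")
  paste("contents of page", i)
}

results <- list()      # parsed pages, named by page number
failed <- integer(0)   # pages that could not be loaded

for (i in 575:585) {
  doc <- tryCatch(
    fetch_page(i),  # real loop: htmlParse(link, encoding = "UTF-8")
    error = function(e) {
      message("Skipping page ", i, ": ", conditionMessage(e))
      NULL          # tell the loop that this page failed
    }
  )
  if (is.null(doc)) {
    failed <- c(failed, i)
    next            # go straight to the next page
  }
  results[[as.character(i)]] <- doc
}
```

After the loop, failed lists the pages to retry later, and the rest of the loop body (readHTMLTable() and so on) only runs for pages that loaded.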
1 answer

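You might be better off using the httr package: it lets you check the HTTP status code of each response and move on to the next page when a request fails.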

    library(httr)
    library(XML)

    url <- "http://www.cnj.jus.br/improbidade_adm/visualizar_condenacao.php"
    for (i in 575:585) {
      # Build the query string ?seq_condenacao=i on the page URL
      response <- GET(url, query = list(seq_condenacao = as.character(i)))
      if (status_code(response) != 200) {
        # HTTP request failed!!
        # do some stuff...
        print(paste("Failure:", i, "Status:", status_code(response)))
        next
      }
      # htmlParse() needs the page text, not the response object
      doc <- htmlParse(content(response, as = "text", encoding = "UTF-8"),
                       asText = TRUE, encoding = "UTF-8")
      # do some other stuff
      print(paste("Success:", i, "Status:", status_code(response)))
    }
    # Output when this answer was written:
    # [1] "Success: 575 Status: 200"
    # [1] "Success: 576 Status: 200"
    # [1] "Success: 577 Status: 200"
    # [1] "Success: 578 Status: 200"
    # [1] "Success: 579 Status: 200"
    # [1] "Success: 580 Status: 200"
    # [1] "Success: 581 Status: 200"
    # [1] "Success: 582 Status: 200"
    # [1] "Success: 583 Status: 200"
    # [1] "Success: 584 Status: 200"
    # [1] "Success: 585 Status: 200"
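The status check only helps when the server actually answers. If the request itself fails (for example a timeout or a host that cannot be reached), GET() throws an R error and the loop still stops. A sketch that combines both checks, assuming httr's http_error(), status_code(), content() and timeout() plus XML's htmlParse(asText = TRUE); fetch_condenacao() and scrape_pages() are hypothetical names, and the 30-second timeout is an arbitrary choice.

```r
library(httr)
library(XML)

# Hypothetical helper: returns the parsed page, or NULL when the request
# cannot be completed (connection error, timeout, ...) or the server
# answers with an HTTP error status (4xx/5xx).
fetch_condenacao <- function(i,
                             base_url = "http://www.cnj.jus.br/improbidade_adm/visualizar_condenacao.php") {
  response <- tryCatch(
    GET(base_url, query = list(seq_condenacao = as.character(i)), timeout(30)),
    error = function(e) {
      message("Page ", i, ": request failed (", conditionMessage(e), ")")
      NULL
    }
  )
  if (is.null(response)) return(NULL)
  if (http_error(response)) {
    message("Page ", i, ": HTTP status ", status_code(response))
    return(NULL)
  }
  htmlParse(content(response, as = "text", encoding = "UTF-8"),
            asText = TRUE, encoding = "UTF-8")
}

# Hypothetical driver: keeps every page that parsed, skips the rest.
scrape_pages <- function(pages) {
  docs <- list()
  for (i in pages) {
    doc <- fetch_condenacao(i)
    if (is.null(doc)) next  # move on to the next page
    docs[[as.character(i)]] <- doc
  }
  docs
}
```

Calling scrape_pages(575:585) returns a list of parsed documents named by page number; pages that failed are reported with message() and left out, so one bad page no longer stops the whole run.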