RSelenium coding example
Here is an example of stand-alone code using the website the question links to.
Note: Please do not run this code as-is.
Why? A thousand Stack Overflow users hitting the same website at once is effectively a DDoS attack.
Background
The code below installs RSelenium for you. Before running it, you need to:
- Install Firefox
- Add Selenium IDE Plugin
- Install RStudio [Recommendation]
- Create a project and open the code file below
The code below takes you from the second page [http://appscvs.supercias.gob.ec/portaldeinformacion/consulta_cia_param.zul] to the last page, where the information you are interested in ...
Useful links:
If you are interested in using RSelenium, I highly recommend that you read the following links. Thanks to John Harrison for developing the RSelenium package.
Code example
    # We want to make this as easy as possible to use
    # So we need to install required packages for the user...
    #
    if (!require(RSelenium)) install.packages("RSelenium")
    if (!require(XML)) install.packages("XML")
    if (!require(RJSONIO)) install.packages("RJSONIO")
    if (!require(stringr)) install.packages("stringr")

    # Data
    #
    mainPage <- "http://appscvs.supercias.gob.ec/portalInformacion/sector_societario.zul"
    businessPage <- "http://appscvs.supercias.gob.ec/portaldeinformacion/consulta_cia_param.zul"

    # StartServer
    # We assume RSelenium is not set up, so we check if the RSelenium
    # server is available; if not, we install the RSelenium server.
    checkForServer()

    # OK. Now we start the server
    RSelenium::startServer()
    remDr <- RSelenium::remoteDriver$new()

    # We assume the user has installed Firefox and the Selenium IDE
    # https:
Now we are on the landing page [See picture]
Retrieving table values ...
The next step is to extract the table values. To do this, we select the data rows with the .z-listitem CSS selector. We can then check whether any data rows are present. They are, so we can extract the returned values and populate a list or data frame.
Here is the result:
    > lineText <- stringr::str_split(modalWindow[[1]]$getElementText()[1], '\n')
    > lineText
    [[1]]
    [1] "10"
    [2] "OPERACIONES DE INGRESO CON PARTES RELACIONADAS EN PARAÍSOS FISCALES, JURISDICCIONES DE MENOR IMPOSICIÓN Y REGÍMENES FISCALES PREFERENTES"
    [3] "0.00"
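The split text can then be turned into a data frame. The following is a minimal sketch, not the original answer's code: `rows_to_df` is a hypothetical helper, it uses base R `strsplit` instead of `stringr`, and it assumes every row's text has exactly three newline-separated fields (code, description, value), as in the output above. The sample input is made up.

```r
# Hypothetical helper: convert the text of each table row into one data frame row.
# Assumption: each row's text is "code\ndescription\nvalue".
rows_to_df <- function(row_texts) {
  parts <- strsplit(row_texts, "\n", fixed = TRUE)
  # Drop any row that does not split into exactly three fields
  parts <- parts[lengths(parts) == 3]
  data.frame(
    code        = vapply(parts, `[`, character(1), 1),
    description = vapply(parts, `[`, character(1), 2),
    value       = as.numeric(vapply(parts, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# Made-up sample rows in the same shape as the output above
sample_rows <- c("10\nEXAMPLE DESCRIPTION\n0.00", "11\nANOTHER ROW\n12.50")
df <- rows_to_df(sample_rows)
```

In a live session, the input would presumably come from calling `getElementText()` on each element returned by `remDr$findElements(using = 'css selector', ".z-listitem")`.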
Working with hidden data
Selenium WebDriver, and therefore RSelenium, only interacts with the visible elements of a web page. If we try to read the entire table, we get back only those table elements that are visible (not hidden).
We can work around this by scrolling to the bottom of the table. The scroll action makes the table populate its remaining rows, and then we can extract the complete table.
    # Select the .z-listbox-body
    modalWindow <- remDr$findElements(using = 'css selector', ".z-listbox-body")
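The scroll itself is not shown above. One possible way to do it, offered as a sketch rather than the original code, is to run a small piece of JavaScript through `remDr$executeScript()`. The helper name `scroll_listbox_to_bottom` is hypothetical, and it is an assumption that a single scroll to the bottom is enough for this page to render every row.

```r
# Sketch: scroll the listbox body to its bottom so that hidden rows get rendered.
# `remDr` is assumed to be an open RSelenium remoteDriver session.
scroll_listbox_to_bottom <- function(remDr, selector = ".z-listbox-body") {
  # The selector is passed as a script argument and read via arguments[0]
  script <- paste(
    "var b = document.querySelector(arguments[0]);",
    "if (b) { b.scrollTop = b.scrollHeight; }"
  )
  remDr$executeScript(script, args = list(selector))
  invisible(script)
}

# Usage in a live session (not run here):
# scroll_listbox_to_bottom(remDr)
# rows <- remDr$findElements(using = 'css selector', ".z-listitem")
```

If the table loads rows in batches, the call may need to be repeated until the number of `.z-listitem` elements stops growing.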
What the code does.
The above code example should be self-sufficient. By this I mean that it installs everything it needs, including the necessary packages. After installing the R packages it depends on, the code calls checkForServer(); if the Selenium server is not installed, that call installs it. This may take some time.
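The install-if-missing step can also be written as one small function. This is only a sketch of the same idea; `ensure_packages` is a hypothetical name, and it checks with `requireNamespace()` instead of `require()`, so it does not attach the packages.

```r
# Sketch: install whichever of the requested packages are not yet available.
# Returns (invisibly) the names of the packages that were missing.
ensure_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) install.packages(missing)
  invisible(missing)
}

# Usage for this answer's dependencies (not run here):
# ensure_packages(c("RSelenium", "XML", "RJSONIO", "stringr"))
```

The packages still need to be loaded with `library()` afterwards, or called with the `pkg::` prefix as the code above does for RSelenium and stringr.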
My recommendation is that you step through the code line by line, since I have not added any delays (which you would want in production). Also note that I have not optimized for speed, but rather for clarity [from my point of view] ...
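For the delays mentioned above, one common approach is to poll for a condition instead of sleeping for a fixed time. The sketch below is an illustration under assumptions, not part of the original code: `wait_until` is a hypothetical helper, and the timeout and interval defaults are arbitrary.

```r
# Sketch: poll `condition` until it returns TRUE or `timeout` seconds have passed.
# Returns TRUE if the condition was met, FALSE if the wait timed out.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage in a live session (not run here): wait for table rows to appear
# wait_until(function() {
#   length(remDr$findElements(using = 'css selector', ".z-listitem")) > 0
# })
```

Polling like this keeps the script from racing ahead of the page while avoiding long fixed sleeps when the page responds quickly.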
The code has been tested and works on:
- Mac OS X 10.11.5
- RStudio 0.99.893
- R version 3.2.4 (2016-03-10) - "Very Secure Dishes"
