R Shiny application: keep in memory or use noSQL?

I am running a tiny web application built with the R Shiny framework. The tool does not do much: it simply filters data frames according to parameters chosen in the user interface. The problem is as follows: when a user accesses the application via HTTP, it takes a long time to start, because the data I load in global.R is quite big (~5 GB). After the initial launch the application runs smoothly, and repeated access within a certain time window is also fast (the application seems to stay completely in memory for several minutes). Since I have enough RAM and my data does not change while users interact with it, I am asking myself whether I can keep the whole application in memory. Can this be forced? My server runs CentOS 6. The problem is also not the file system or the hard drive: I created a RAM disk for loading the data, but the performance gain was negligible. So the bottleneck appears to be R itself while reading the data.
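For illustration, a minimal sketch of the kind of setup described above; the file name and the filtering column are placeholders, not the actual application:

# global.R (sketch) -- executed once when the R process for the app starts;
# this is where the ~5 GB object is read, hence the slow first request
big_data <- readRDS("data/big_data.rds")   # hypothetical file

# server.R (sketch) -- each request only does cheap in-memory filtering
library(shiny)
server <- function(input, output) {
  output$tbl <- renderTable({
    # 'group' and input$grp are placeholder names
    big_data[big_data$group == input$grp, ]
  })
}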

Now I have two ideas that might solve the problem:

  • As mentioned above, is it possible to keep the complete application in memory?
  • Do not store the data as R objects; instead use a fast in-memory noSQL database, for example Redis.

Perhaps one of you has experience with loading big data like this. I would be grateful if we could start a discussion. If possible, I would like to avoid external software such as Redis, so that everything stays as simple as possible.

All the best,

Mario

r nosql shiny redis in-memory-database
2 answers

I have no experience with noSQL databases, but here is how I combine Shiny with an Oracle database to speed up my applications:

The user inputs are translated into an SQL query, which is sent to an extremely fast database, and only the output of this query is read into R. In many cases (especially if the SQL includes a GROUP BY clause), this reduces the number of observations read from several million to a few hundred, so loading the data is very fast.

In the example below, the user first selects questionnaires and a date range. This generates an SQL statement that filters the relevant observations and counts the frequency of answers per question and questionnaire. These frequencies are read into R and displayed as a data table in the Shiny application.

library(shiny)
library(ROracle)
library(DT)

# Connect to the Oracle database
drv <- dbDriver("Oracle")
con <- dbConnect(drv, username = "...", password = "...", dbname = "...")

# Fill the questionnaire selector with the distinct values from the database
query <- "select distinct questionnaire from ... order by questionnaire"
questionnaire.list <- dbGetQuery(con, query)$questionnaire

ui <- fluidPage(
  selectInput("questionnaire_inp", "Questionnaire",
              choices = questionnaire.list, selected = questionnaire.list, multiple = TRUE),
  dateRangeInput("daterange_inp", "Date range", start = "2016-01-01", end = Sys.Date()),
  dataTableOutput("tbl")
)

server <- function(input, output) {
  output$tbl <- renderDataTable({
    # Build the SQL from the user inputs: the database filters and aggregates,
    # so only the resulting frequencies are read into R
    query <- paste0(
      "select questionnaire, question, answer, count(*) from ... where title in (",
      paste0(shQuote(input$questionnaire_inp), collapse = ","),
      ") and date between to_date('", input$daterange_inp[1], "','YYYY-MM-DD')",
      " and to_date('", input$daterange_inp[2], "','YYYY-MM-DD')",
      " group by questionnaire, question, answer")
    dt <- dbGetQuery(con, query)
    datatable(dt)
  })
}

shinyApp(ui = ui, server = server)

You can set the idle timeout to a longer value. I'm not sure whether an infinite value is possible (a sufficiently long value should have the same effect).
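If the application is served by Shiny Server, the relevant directive should be app_idle_timeout in the server configuration. A sketch, assuming a default installation (paths and values are examples; whether 0 fully disables the timeout may depend on the Shiny Server version):

# /etc/shiny-server/shiny-server.conf (sketch)
run_as shiny;

server {
  listen 3838;

  location / {
    site_dir /srv/shiny-server;
    log_dir /var/log/shiny-server;

    # keep idle R processes alive much longer than the default few seconds
    app_idle_timeout 86400;
  }
}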

Other approaches that do not involve a database could be:

  • Use data.table's fread if you are reading a CSV file; it can be several times faster than read.csv, and specifying the column classes can increase the speed even further (see the sketch after this list).

  • Or use the binary .RDS format, which should be fast and smaller on disk, and therefore faster to read.

Whether you use .RDS or .Rdata makes little difference in this respect.
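A rough sketch of both suggestions; the file and column names are placeholders:

library(data.table)

# Fast CSV import; supplying colClasses avoids the type-guessing pass
dt <- fread("big_data.csv",
            colClasses = c(id = "integer", group = "character", value = "numeric"))

# One-off conversion to a binary file; afterwards global.R only needs readRDS()
saveRDS(dt, "big_data.rds")
dt <- readRDS("big_data.rds")   # usually much faster than re-parsing the CSV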
