Packaging a big data set

Column-wise storage in the package's inst/extdata directory, as proposed by Jan, is now implemented in the dfunbind package.

I use the data-raw idiom to make the whole analysis, from raw data to results, reproducible. To do this, the data sets are first wrapped in R packages, which can then be loaded with library().

One of the data sets I use is quite large: about 8 million cases with approximately 80 attributes. For my current analysis I need only a small part of the attributes, but I would still like to package the entire data set.

Now, if it is simply packaged as a data frame (for example, with devtools::use_data()), it will be loaded completely the first time it is accessed. What would be the best way to package this kind of data so that it can be loaded lazily at the column level? (Only the columns that I actually access are loaded; the rest happily remain on disk and do not take up RAM.) Would the ff package help? Can someone point me to a working example?

1 answer

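I think I would store the data in inst/extdata. Then write a couple of functions in your package that read and return parts of that data. Inside those functions, you can get the path to the data with: system.file("extdata", "yourfile", package = "yourpackage").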
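
One detail worth knowing when writing such reader functions: system.file() does not fail for a missing file; it returns an empty string unless mustWork = TRUE is set. The short sketch below uses the stats package, which ships with R, only so that it runs anywhere; the file name "no-such-file" is made up for illustration.

```r
# A file that exists in an installed package yields its full path.
path <- system.file("DESCRIPTION", package = "stats")
found <- file.exists(path)

# A file that does not exist yields "" instead of an error.
missing_path <- system.file("extdata", "no-such-file", package = "stats")
is_empty <- identical(missing_path, "")

# With mustWork = TRUE the silent "" becomes an error we can catch.
strict <- tryCatch(
  system.file("extdata", "no-such-file", package = "stats", mustWork = TRUE),
  error = function(e) "not found"
)

print(c(found = found, is_empty = is_empty))
print(strict)
```

Passing mustWork = TRUE inside a reader function gives a clearer failure than letting readRDS() choke on an empty path.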

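The question then is which format to store the data in so that only the requested columns have to be read. There are several options: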

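  • sqlite: store the data in an sqlite database. You can then access it using the RSQLite package.
  • ff: store the data in ff objects (for example, save with save.ffdf from ffbase; load with load.ffdf). ff does not handle character fields well (they are stored as factors). Also, in theory, the files are not portable across platforms, although as long as you stay on Intel platforms you should be fine.
  • CSV: store the data in a plain CSV file. You can then access the file lazily using the LaF package. Performance will probably be lower than with ff, but it may be good enough.
  • RDS: store each column in a separate RDS file (using saveRDS) and load the columns with readRDS. The advantage is that you do not depend on any additional R packages. It is also fast. The disadvantage is that you cannot select rows (but you do not seem to need that).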
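
To give a feel for the sqlite option, here is a minimal sketch that writes a data frame to an SQLite file and reads back only two columns. It assumes the DBI and RSQLite packages are installed (the demo is skipped otherwise); the table name "cars", the temporary file, and the use of mtcars are illustrative choices, not part of the answer above.

```r
# Runs only if DBI and RSQLite are available; both are assumptions here.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  db_file <- tempfile(fileext = ".sqlite")
  con <- DBI::dbConnect(RSQLite::SQLite(), db_file)

  # Store the full data set once (in a package this file would live in inst/extdata).
  DBI::dbWriteTable(con, "cars", mtcars)

  # Read only the columns that are needed; identifiers are quoted for safety.
  cols <- c("mpg", "cyl")
  quoted <- as.character(DBI::dbQuoteIdentifier(con, cols))
  sql <- paste("SELECT", paste(quoted, collapse = ", "), "FROM cars")
  selected <- DBI::dbGetQuery(con, sql)

  DBI::dbDisconnect(con)
  str(selected)
}
```

Unlike the per-column RDS layout, this approach would also allow row selection through a WHERE clause, at the cost of two extra package dependencies.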

Personally, I would go with the RDS solution.

RDS

For example, the following code creates a demo package containing the iris data set:

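# Load only the requested columns of a data set stored as one RDS file per column.
load_data <- function(dataset, columns) {
  result <- vector("list", length(columns))
  for (i in seq_along(columns)) {
    col <- columns[i]
    # The package name must match the package created below ("lazyload").
    fn <- system.file("extdata", dataset, paste0(col, ".RDS"), package = "lazyload")
    result[[i]] <- readRDS(fn)
  }
  names(result) <- columns
  as.data.frame(result)
}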

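# Write each column of `data` to its own RDS file under <package>/inst/extdata/<name>.
store_data <- function(package, name, data) {
  # "extdata" (not "exdata") so that load_data() finds the files after installation.
  dir <- file.path(package, "inst", "extdata", name)
  dir.create(dir, recursive = TRUE)
  for (col in names(data)) {
    saveRDS(data[[col]], file.path(dir, paste0(col, ".RDS")))
  }
}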

packagename <- "lazyload"
package.skeleton(packagename, "load_data")
store_data(packagename, "iris", iris)

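After building and installing the package (you will need to fix the generated documentation first, for example by deleting it), you can do: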

library(lazyload)
data <- load_data("iris", "Sepal.Width")

Only Sepal.Width is loaded into memory.

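Of course, this is a very simple implementation of load_data: there is no error handling, and it assumes that the requested data set and all requested columns exist.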
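
As a rough sketch of how load_data could be hardened, the variant below lists the columns available on disk and stops with a clear message for unknown data sets or columns. The names list_columns and load_columns are hypothetical, and the root argument stands in for the installed directory (something like system.file("extdata", package = "lazyload")) so that the example runs without building a package.

```r
# Hypothetical helper: names of the columns stored for a data set.
list_columns <- function(root, dataset) {
  dir <- file.path(root, dataset)
  if (!dir.exists(dir)) stop("unknown data set: ", dataset)
  sub("\\.RDS$", "", list.files(dir, pattern = "\\.RDS$"))
}

# Hypothetical helper: load selected columns, checking that they exist first.
# By default all columns are loaded (in alphabetical file order).
load_columns <- function(root, dataset, columns = list_columns(root, dataset)) {
  unknown <- setdiff(columns, list_columns(root, dataset))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  result <- lapply(columns, function(col) {
    readRDS(file.path(root, dataset, paste0(col, ".RDS")))
  })
  names(result) <- columns
  as.data.frame(result)
}

# Demo in a temporary directory instead of an installed package.
root <- file.path(tempdir(), "extdata")
dir.create(file.path(root, "iris"), recursive = TRUE, showWarnings = FALSE)
for (col in names(iris)) {
  saveRDS(iris[[col]], file.path(root, "iris", paste0(col, ".RDS")))
}

sepal <- load_columns(root, "iris", c("Sepal.Width", "Species"))
str(sepal)
```

Checking against the files on disk keeps the package free of a separate column index, at the price of one directory listing per call.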
