Predict memory usage in R

I downloaded a huge file (~300 MB) from the UCI Machine Learning dataset repository.

Is there a way to predict the memory needed for a dataset before loading it into R's memory?

I have googled a lot, but everything I could find was about calculating memory with the R profiler and several other packages only after the objects have already been loaded into R.

+4
3 answers

Based on the R Programming course, you can calculate the approximate memory usage from the number of rows and columns in the data (you can get this information from the codebook / metadata file):

memory required = no. of columns * no. of rows * 8 bytes per numeric value

So, for example, a dataset with 1,500,000 rows and 120 numeric columns needs roughly 1,500,000 * 120 * 8 bytes, which is about 1.34 GB of memory.

Keep in mind this is only a rough estimate: the whole dataset has to fit in RAM, and R typically needs extra headroom while reading it in.
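
As a minimal sketch (my own illustration, not part of the original answer), the rule of thumb is a one-liner in R; reusing the object_size class here is just a trick to get human-readable units:

rows <- 1500000
cols <- 120
est_bytes <- rows * cols * 8   # 8 bytes per double-precision value
# Borrow R's object_size formatting for a readable unit
print(structure(est_bytes, class = "object_size"), units = "auto")  # ~1.3 Gb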

+5

If the data is in a csv file, you can read in only the first 1000 rows, measure their size with object.size, count the total number of lines with wc, and scale up:

# Memory used by the first 1000 rows once parsed into a data frame
top.size <- object.size(read.csv("simulations.csv", nrow=1000))
# Total number of lines in the file, via wc -l
lines <- as.numeric(gsub("[^0-9]", "", system("wc -l simulations.csv", intern=TRUE)))
# Scale the sample up to the full file
size.estimate <- lines / 1000 * top.size

Of course, size.estimate is only as good as the assumption that the rest of the csv looks like the first 1000 rows; if later rows are wider (longer strings, for instance), top.size will underestimate them. In that case, read a larger or more representative sample than 1000 rows.
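
If wc is not available (for example on Windows), here is a base-R-only sketch of the same idea: it scales the in-memory size of the sample by the ratio of the file size on disk to the bytes the sample occupies on disk. The file name simulations.csv is the same hypothetical file as above.

sample_n <- 1000
# Read the header plus the first 1000 data rows as raw text
sample_lines <- readLines("simulations.csv", n = sample_n + 1)
# Bytes those lines occupy on disk (text plus one newline each)
sample_bytes <- sum(nchar(sample_lines, type = "bytes")) + length(sample_lines)
# Memory used once the sample is parsed into a data frame
top.size <- object.size(read.csv(text = sample_lines))
# Scale by the ratio of total file size to sample size on disk
size.estimate <- as.numeric(top.size) * file.size("simulations.csv") / sample_bytes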

+4

R has an object.size() function that provides an estimate of the memory used to store an R object. You can use the following:

predict_data_size <- function(numeric_size, number_type = "numeric") {
  if (number_type == "integer") {
    byte_per_number <- 4
  } else if (number_type == "numeric") {
    byte_per_number <- 8  # 8 bytes per double-precision number
  } else {
    stop(sprintf("Unknown number_type: %s", number_type))
  }
  estimate_size_in_bytes <- numeric_size * byte_per_number
  # Borrow the object_size class so print() formats the result nicely
  class(estimate_size_in_bytes) <- "object_size"
  print(estimate_size_in_bytes, units = "auto")
}
# Example
# Matrix (rows=2000000, cols=100)
predict_data_size(2000000*100, "numeric") # 1.5 Gb
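
As a quick sanity check (my own addition, not from the original answer), the prediction can be compared against object.size() on an actual matrix; the measured size is slightly larger because of R's object header:

m <- matrix(rnorm(1e6), nrow = 10000, ncol = 100)
predict_data_size(1e6, "numeric")      # predicted: ~7.6 Mb
print(object.size(m), units = "auto")  # actual: ~7.6 Mb plus a small header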
0
