How to save an R list object to a database?

Suppose I have a list of R objects which are themselves lists. Each has a fixed structure: data, a model fitted to that data, and some attributes identifying the data. One example is time series of certain economic indicators in specific countries. So each element of my list object has the following components:

data - the historical time series for an economic indicator

country - the name of the country, for example, USA

name - the indicator name, for example, GDP

model - the ARIMA order found by auto.arima, in a suitable format; this can itself be a list.
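For concreteness, one element of such a list might be constructed like this (a minimal sketch; the values and the model representation are purely illustrative):

 obj <- list(
   data    = ts(rnorm(40), start = c(2000, 1), frequency = 4),  # quarterly series
   country = "USA",
   name    = "GDP",
   model   = list(order = c(1, 1, 0))  # e.g. the order returned by auto.arima
 )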

This is just an example. As I said, suppose I have a number of such objects in a list, and I would like to save them in some suitable format. The obvious solution is to just use save , but it does not scale well to a large number of objects: if I only wanted to inspect a subset of the objects, I would still have to load all of them into memory.

If my data were a data.frame , I could save it to a database. If I wanted to work with a specific subset of the data, I would use SELECT and rely on the database to deliver the required subset; I have liked SQLite for this. Can I reproduce this for the list object described above, perhaps using some fancy database like MongoDB? Or do I need to think about how to convert my list into several related tables?
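In code, that data.frame workflow looks roughly like this (a sketch assuming the RSQLite package; the table name and indicators_df are illustrative):

 library(RSQLite)
 con <- dbConnect(SQLite(), "indicators.sqlite")
 dbWriteTable(con, "indicators", indicators_df)  # indicators_df: a hypothetical data.frame
 usa <- dbGetQuery(con, "SELECT * FROM indicators WHERE country = 'USA'")
 dbDisconnect(con)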

My motivation for this is to be able to easily create various reports on the fitted models. I can write a set of functions that each produce a report for a given object, and then just lapply over my list of objects, as sketched below. Ideally I would also like to parallelize this process, but that is a separate problem.
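For example (a sketch; report_fun is a hypothetical report-generating function and models is the list of objects):

 reports <- lapply(models, report_fun)
 # a possible parallel variant on Unix-alikes, via the base parallel package
 reports <- parallel::mclapply(models, report_fun, mc.cores = 4)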

3 answers

I think I explained the basics of this somewhere earlier; the bottom line is that:

  • R has built-in support for serialization and deserialization, so you can take any existing R object and turn it into a binary or text serialization. My digest package uses this to turn a serialization into a hash via various hash functions.

  • R has all the database connectivity you need.

Now, which form and database layout are suitable will depend on your specifics. But (as usual) nothing in R is stopping you :)
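A minimal sketch of the serialization route, assuming the RSQLite package (the table layout and names are illustrative, not a prescribed design): serialize each object into a raw vector, store it as a BLOB next to queryable metadata, and SELECT only what you need.

 library(RSQLite)
 con <- dbConnect(SQLite(), "models.sqlite")
 dbExecute(con, "CREATE TABLE IF NOT EXISTS models (country TEXT, name TEXT, object BLOB)")

 # obj is one element of the list described in the question
 raw_obj <- serialize(obj, connection = NULL)  # serialize to a raw vector
 dbExecute(con, "INSERT INTO models VALUES (:country, :name, :object)",
           params = list(country = obj$country, name = obj$name,
                         object = list(raw_obj)))

 # later: fetch and deserialize only the matching subset
 res <- dbGetQuery(con,
   "SELECT object FROM models WHERE country = 'USA' AND name = 'GDP'")
 restored <- unserialize(res$object[[1]])
 dbDisconnect(con)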


This question has been inactive for a long time. Since I ran into the same issue recently, I want to add what I have learned. I read three requirements in the question:

  • the data should be stored in a suitable structure
  • scalability in terms of size and access time
  • the ability to efficiently read only subsets of the data

Besides a relational database, you can also use the HDF5 file format, which is designed to store large numbers of possibly large objects. The choice depends on the type of data and on how it will be accessed.

Relational databases should be preferred if:

  • the atomic data elements are small
  • different data elements share the same structure
  • there is no way to anticipate which subsets of the data will be requested
  • convenient transfer of the data from one computer to another is not required, or the computers that need the data have access to the database

The HDF5 format should be preferred if:

  • the atomic data elements are themselves large objects (for example, matrices)
  • the data elements are heterogeneous and cannot be combined into a table-like representation
  • the data is mostly read in chunks that are known in advance
  • the data should be easy to move from one computer to another

In addition, one can distinguish between relational and hierarchical relationships, where the latter are contained in the former. In an HDF5 file, pieces of information can be arranged hierarchically, for example:

 /Germany/GDP/model/...
 /Germany/GNP/data
 /Austria/GNP/model/...
 /Austria/GDP/data

The rhdf5 package for working with HDF5 files is available on Bioconductor. General information about the HDF5 format is available from the HDF Group (https://www.hdfgroup.org).
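A minimal sketch of that layout with rhdf5 (group and dataset names follow the example above; the series itself is illustrative):

 library(rhdf5)  # install via BiocManager::install("rhdf5")
 h5createFile("indicators.h5")
 h5createGroup("indicators.h5", "Germany")
 h5createGroup("indicators.h5", "Germany/GDP")

 gdp_series <- as.numeric(ts(rnorm(40), start = 2000))  # illustrative data
 h5write(gdp_series, "indicators.h5", "Germany/GDP/data")

 # read back only this one dataset, without loading the rest of the file
 x <- h5read("indicators.h5", "Germany/GDP/data")
 H5close()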


I'm not sure whether this is the same thing, but I have had good experience working with time series objects using:

 str() 

Maybe you can take a look at that.

