Difference between ff and filehash package in R

Question

I have a data frame of 25 columns and ~1M rows split across 12 files. I need to import them and then use a package such as reshape to do some data management. Each file is too large to load comfortably, so I am looking for some kind of "non-RAM" solution for importing and processing the data. I do not currently need to run any regressions; I only need descriptive statistics.

I searched a bit and found two packages: ff and filehash. I read the filehash manual first and it seems simple: a little extra code is needed when importing the data frame into a file, and the rest looks like ordinary R operations.
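For reference, the workflow described above looks roughly like the following. This is only a minimal sketch, assuming the filehash functions `dumpDF` and `db2env`; the data frame, its column names, and the temporary database path are made up for illustration and stand in for one of the real files.

```r
library(filehash)

# Small stand-in for one of the large files (hypothetical data).
df <- data.frame(id = 1:5, value = c(2.5, 3.0, 4.5, 1.0, 6.0))

# Write each column of the data frame to an on-disk database.
db_path <- tempfile("df_db_")
db <- dumpDF(df, dbName = db_path)

# Expose the stored columns through an environment; a column is read
# from disk only when it is accessed.
env <- db2env(db)

# From here on, ordinary R expressions work on the stored columns.
col_mean <- with(env, mean(value))
col_summary <- with(env, summary(value))
print(col_mean)
```

Only the columns that an expression actually touches are pulled into memory, which is what makes this approach attractive for descriptive statistics on a few columns at a time.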

I have not tried ff yet. It comes with many different classes, and I wonder whether it is worth investing the time to understand ff before starting my actual work. On the other hand, the filehash package appears to have been static for some time and is little discussed, so I wonder whether filehash has become less popular or even obsolete.

Can someone help me choose which package to use? Or can someone explain the differences between them and their pros and cons? Thanks.

update 01

I am currently using filehash to import a data frame, and I understand that a data frame imported with filehash is effectively read-only: any further changes to the data frame are not saved back to the file unless I save it again. That is not very convenient in my opinion, since I have to remind myself to do the saving. Any comments on this?
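To illustrate what I mean, here is a minimal sketch of the fetch, modify, and write-back cycle, assuming the filehash functions `dbCreate`, `dbInit`, `dbInsert`, and `dbFetch`; the key name `"df"` and the toy data are hypothetical.

```r
library(filehash)

# Create and open a throwaway database (hypothetical path).
db_path <- tempfile("fh_db_")
dbCreate(db_path)
db <- dbInit(db_path)

# Store a small data frame under the key "df".
dbInsert(db, "df", data.frame(x = 1:3))

# dbFetch returns an in-memory copy; editing it does not touch the file.
df <- dbFetch(db, "df")
df$x <- df$x * 10L
unchanged <- dbFetch(db, "df")$x   # still the originally stored values

# The change reaches disk only after an explicit write-back.
dbInsert(db, "df", df)
updated <- dbFetch(db, "df")$x
```

The extra `dbInsert` call at the end is the step I have to remember every time I change the data.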

import r bigdata filehash

lokheart Mar 29 '12 at 2:47

No one has answered this question yet.
