Reading in large text files in R

I want to read in a large .ido file that has a little over 110,000,000 rows and 8 columns. The columns consist of 2 integer columns and 6 logical columns, and the file uses "|" as the delimiter. I tried read.big.matrix and it took forever. I also tried dumpDf and it ran out of RAM. I tried ff, which I had heard was a good package, but I'm struggling with bugs. I would like to do some analysis on this table if I can read it in somehow. If anyone has any suggestions, that would be great. Regards, Lorcan

2 answers

Thanks for all your suggestions. I managed to find out why it wasn't working. I'll post the answer here along with the suggestions so that nobody else repeats my silly mistake.

First of all, the data I was given contained some errors, so I was doomed to fail from the very beginning. I did not know this until a colleague ran into the same problem in another piece of software. One of the integer columns contained a few letters, so when read.table.ffdf tried to read in the data set it got confused. In any case, I was given another sample of the data, 16,000,000 rows and 8 columns with correct records, and it worked perfectly. The code I ran looks like this and takes about 30 seconds to read in the file:

    setwd("D:/data test")
    library(ff)
    ffdf1 <- read.table.ffdf(file = "test.ido", header = TRUE, sep = "|")
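
In hindsight, declaring the column types up front would have made the bad rows surface immediately instead of silently confusing the import. Below is a minimal sketch of that idea: read.table.ffdf passes extra arguments such as colClasses on to read.table, and the vector here assumes the two integer columns come before the six logical ones, which may not match the real file layout.

    library(ff)
    # sketch only: adjust the colClasses order to the actual column layout
    ffdf1 <- read.table.ffdf(file = "test.ido", header = TRUE, sep = "|",
                             colClasses = c(rep("integer", 2), rep("logical", 6)))
    # a stray letter in an "integer" column now stops the import with a clear
    # error instead of being silently misinterpreted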

Thank you all for your time, and if you have any questions about the answer, feel free to ask and I will do my best to help.


Do you really need all the data for your analysis? Perhaps you can aggregate your data set (say, go from minute values to daily averages). This aggregation only needs to be done once, and hopefully it can be done in chunks. That way you do not need to load all your data into memory at once.
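
To illustrate the aggregation idea, here is a minimal sketch on a small made-up data set; the column names timestamp and value are hypothetical and will differ from whatever is in the .ido file.

    # two days of fake minute-level data (purely illustrative)
    minute_data <- data.frame(
      timestamp = as.POSIXct("2012-01-01 00:00:00", tz = "UTC") + 60 * (0:2879),
      value     = rnorm(2880)
    )
    minute_data$day <- as.Date(minute_data$timestamp)

    # one row per day instead of 1440 rows per day
    daily_means <- aggregate(value ~ day, data = minute_data, FUN = mean)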

Reading in chunks can be done using scan; the important arguments are skip and n. Alternatively, you could put your data into a database and extract the chunks that way. You could even use functions from the plyr package to process the pieces in parallel; see this blog post for an example.
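
As a rough illustration of the chunked approach with scan, the sketch below assumes the same "|"-delimited layout described in the question (2 integer and 6 logical columns, with a header line); the chunk size and the per-chunk summary are illustrative choices, not part of the original answer.

    # read 1,000,000 rows at a time and accumulate a simple per-chunk summary
    what_cols  <- c(rep(list(integer()), 2), rep(list(logical()), 6))
    chunk_rows <- 1e6
    skip       <- 1   # skip the header line
    total_true <- 0

    repeat {
      chunk <- scan("test.ido", what = what_cols, sep = "|",
                    skip = skip, nlines = chunk_rows, quiet = TRUE)
      n_read <- length(chunk[[1]])
      if (n_read == 0) break
      # example summary: count TRUEs in the first logical column (column 3)
      total_true <- total_true + sum(chunk[[3]], na.rm = TRUE)
      skip <- skip + n_read
    }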

