Store big data in redis via R

I have a number of large data frames in R that I plan to store using Redis. I am completely new to Redis, but read about it today and have been using the R package rredis.

I played around with small data and saved and retrieved small data frames using the redisSet() and redisGet() functions. However, when it came to saving my larger data frames (the largest of which is 4.3 million rows and 365 MB when saved as a .RData file) using redisSet('bigDF', bigDF), I get the following error message:

 Error in doTryCatch(return(expr), name, parentenv, handler) :
   ERR Protocol error: invalid bulk length
 In addition: Warning messages:
 1: In writeBin(v, con) : problem writing to connection
 2: In writeBin(.raw("\r\n"), con) : problem writing to connection

Presumably this is due to the amount of data being saved. I know that redisSet writes the data frame as a single string, which may not be the best way to handle large data frames. Does anyone know of a better way?

EDIT: I reproduced the error by creating a very large dummy data frame:

 bigDF <- data.frame(
   'lots' = rep('lots', 40000000),
   'of'   = rep('of',   40000000),
   'data' = rep('data', 40000000),
   'here' = rep('here', 40000000)
 )

Running redisSet('bigDF', bigDF) gives me the error:

  Error in .redisError("Invalid agrument") : Invalid agrument 

the first time I run it; if I run it again immediately afterwards, I get the error

 Error in doTryCatch(return(expr), name, parentenv, handler) :
   ERR Protocol error: invalid bulk length
 In addition: Warning messages:
 1: In writeBin(v, con) : problem writing to connection
 2: In writeBin(.raw("\r\n"), con) : problem writing to connection

Thanks.

1 answer

In short: you cannot. Redis can store a maximum of 512 MB of data in a string value, and your demo data frame is larger than that once serialized (note that serialize() output is uncompressed, unlike an .RData file, so even a 365 MB file on disk can blow past the limit):

 > length(serialize(bigDF, connection = NULL)) / 1024 / 1024
 [1] 610.352

Technical background:

serialize is called inside the package's internal .cerealize function, which redisSet invokes via rredis:::.redisCmd:

 > rredis:::.cerealize
 function (value)
 {
     if (!is.raw(value))
         serialize(value, ascii = FALSE, connection = NULL)
     else value
 }
 <environment: namespace:rredis>
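Note that .cerealize passes raw vectors through untouched, so one partial workaround is to serialize and compress the object yourself before handing it to redisSet, assuming the compressed payload fits under 512 MB. A minimal sketch (the round trip assumes your rredis version hands the raw bytes back from redisGet when they are not a valid R serialization):

 # Serialize and gzip-compress the data frame ourselves; since
 # .cerealize leaves raw vectors untouched, these exact bytes are
 # what Redis will store.
 raw_gz <- memCompress(serialize(bigDF, connection = NULL), type = "gzip")
 length(raw_gz) / 1024^2   # check the compressed size is under 512 MB
 redisSet('bigDF', raw_gz)

 # Reverse the process on retrieval.
 bigDF2 <- unserialize(memDecompress(redisGet('bigDF'), type = "gzip"))

Whether this helps depends entirely on how well your data compresses; highly repetitive columns like the demo frame shrink a lot, real data may not.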

Off-topic: why do you need to store such a large dataset in Redis? Redis is meant for small key-value pairs. On the other hand, I have had some success storing large R datasets in CouchDB and MongoDB (with GridFS) by adding the compressed RData as an attachment.
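If the data really has to live in Redis, another option is to split the data frame across several keys so that each chunk serializes to well under the 512 MB limit. A rough sketch using only redisSet/redisGet; chunkedRedisSet and chunkedRedisGet are hypothetical helper names, not part of rredis:

 library(rredis)
 redisConnect()

 # Hypothetical helper: store a data frame as row-wise chunks under
 # keys <key>:1 .. <key>:n, each small enough to serialize safely.
 chunkedRedisSet <- function(key, df, chunks = 10) {
   rows <- seq_len(nrow(df))
   idx  <- split(rows, cut(rows, chunks, labels = FALSE))
   redisSet(paste0(key, ':chunks'), length(idx))
   for (i in seq_along(idx))
     redisSet(paste0(key, ':', i), df[idx[[i]], , drop = FALSE])
 }

 # Hypothetical counterpart: fetch each chunk and reassemble the rows.
 chunkedRedisGet <- function(key) {
   n <- as.integer(redisGet(paste0(key, ':chunks')))
   do.call(rbind, lapply(seq_len(n), function(i) redisGet(paste0(key, ':', i))))
 }

With ten chunks, the 610 MB demo frame comes out at roughly 61 MB per key, comfortably under the limit.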
