I have a Ruby hash that grows to about 10 megabytes when written to a file with Marshal.dump. After gzip compression it is about 500 kilobytes.
Iterating over and modifying this hash is very fast in Ruby (fractions of a millisecond). Even copying it is very fast.
The problem is that I need to share the data in this hash between Ruby on Rails processes. To do this using the Rails cache (file_store or memcached), I need to Marshal.dump the hash first, but this incurs a 1000 millisecond delay when serializing and a 400 millisecond delay when deserializing.
Ideally, I would like to be able to save and load this hash from every process in less than 100 milliseconds.
One idea is to create a separate Ruby process that stores the hash and provides an API for the other processes to modify or query the data in it, but I want to avoid this unless I'm sure there is no other way to access this object quickly.
Is there a way to share this hash between processes more directly, without serializing and deserializing it?
Here is the code that I use to generate a hash similar to the one I'm working with:
@a = []
0.upto(500) do |r|
  @a[r] = []
  0.upto(10_000) do |c|
    if rand(10) == 0
      @a[r][c] = 1 # 10% chance of being 1
    else
      @a[r][c] = 0
    end
  end
end

@c = Marshal.dump(@a) # 1000 milliseconds
Marshal.load(@c)      # 400 milliseconds
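For anyone reproducing the numbers, the round-trip cost and the compression ratio can both be measured directly with the stdlib. A minimal sketch using `Benchmark` and `Zlib`; the matrix is reduced in size here so it runs quickly, and the timings will differ from the question's:

```ruby
require 'benchmark'
require 'zlib'

# Reduced-size version of the 0/1 matrix above (50 x 1_000 instead of
# 500 x 10_000) so the example runs quickly; scale it back up to
# reproduce the timings from the question.
a = Array.new(50) { Array.new(1_000) { rand(10) == 0 ? 1 : 0 } }

dumped = nil
dump_ms = Benchmark.realtime { dumped = Marshal.dump(a) } * 1000
load_ms = Benchmark.realtime { Marshal.load(dumped) } * 1000

# Zlib's deflate is the same algorithm gzip uses, so this approximates
# the 10 MB -> 500 KB figure from the question.
compressed = Zlib::Deflate.deflate(dumped)

puts format('dump: %.2f ms, load: %.2f ms', dump_ms, load_ms)
puts format('raw: %d bytes, deflated: %d bytes', dumped.bytesize, compressed.bytesize)
```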
Update:
Since my initial question did not receive many answers, I assume there is no solution as easy as I had hoped.
I am currently considering two options:
- Create a Sinatra application to store this hash, with an API to modify/access it.
- Create a C application that does the same as #1, but much faster.
The scope of my problem has increased, so the hash may become larger than in my original example, which means #2 may be necessary. But I have no idea where to start with writing a C application that exposes the appropriate API.
A good walkthrough of how best to implement #1 or #2 would make for the best answer.
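Whichever transport ends up carrying it (Sinatra over HTTP for #1, or a C daemon for #2), the server process needs roughly the same small API around the matrix. A minimal Ruby sketch of that interface; the class and method names (`SharedMatrix`, `get`/`set`) are illustrative, not from the original post:

```ruby
# The interface either server (option #1 or #2) would expose to clients.
# Bulk operations (row, count_ones) matter here: one round trip to the
# server instead of thousands of per-cell calls.
class SharedMatrix
  def initialize(rows, cols)
    @data = Array.new(rows) { Array.new(cols, 0) }
  end

  def get(row, col)
    @data.fetch(row).fetch(col)
  end

  def set(row, col, value)
    @data[row][col] = value
  end

  # Return a whole row in one call.
  def row(r)
    @data.fetch(r)
  end

  # Example of a server-side computation: count the 1 cells.
  def count_ones
    @data.sum { |row| row.count(1) }
  end
end

m = SharedMatrix.new(3, 4)
m.set(1, 2, 1)
puts m.count_ones # => 1
```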
Update 2
I ended up implementing this as a standalone application written in Ruby 1.9 that exposes a DRb interface for communicating with the application instances. I use the Daemons gem to spawn the DRb instance when the web server starts. On startup, the DRb application loads the necessary data from the database, and it then communicates with its clients to return results and stay up to date. It is working quite well now. Thanks for the help!
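For others landing here, the shape of the DRb setup described above can be sketched with just the stdlib. This is a minimal illustration, not the author's actual code: the `MatrixStore` class and its `get`/`set` methods are assumptions, and in the real deployment the server and client halves run in separate processes (the daemon and the Rails workers):

```ruby
require 'drb/drb'

# The object the daemon exports; every Rails process gets a DRb proxy to it.
# Only method arguments and return values cross the wire, never the whole
# multi-megabyte structure.
class MatrixStore
  def initialize
    @data = Hash.new(0)
  end

  def get(row, col)
    @data[[row, col]]
  end

  def set(row, col, value)
    @data[[row, col]] = value
  end
end

# Server side (the standalone daemon spawned by the Daemons gem).
# Port 0 lets the OS pick a free port; a real deployment would use a
# fixed, well-known URI.
DRb.start_service('druby://localhost:0', MatrixStore.new)

# Client side (inside a Rails process): obtain a proxy and call it like
# a local object.
store = DRbObject.new_with_uri(DRb.uri)
store.set(3, 7, 1)
puts store.get(3, 7) # => 1
```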
performance c ruby ruby-on-rails serialization
Gdeglin