Ruby concurrency / asynchronous processing (with a simple use case)

Question

I have explored the possibilities for parallel/asynchronous processing in Ruby and read many articles and blog posts. I looked at EventMachine, Fibers, Revactor, Reia, etc. Unfortunately, I could not find a simple, effective (and non-IO-blocking) solution for this very simple use case:

    File.open('somelogfile.txt') do |file|
      while line = file.gets      # (R) Read from IO
        line = process_line(line) # (P) Process the line
        write_to_db(line)         # (W) Write the output to some IO (DB or file)
      end
    end

As you can see, my little script does three things: it reads (R), processes (P) and writes (W). Suppose, for simplicity, that each operation takes exactly 1 unit of time (for example, 10 ms). The current code then does something like this (for 5 lines):

    Time:       123456789012345 (15 units in total)
    Operations: RPWRPWRPWRPWRPW

But I would like it to do something like this:

    Time:       1234567 (7 units in total)
    Operations: RRRRR
                 PPPPP
                  WWWWW
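The two totals follow from simple arithmetic, assuming (as the question does) that every stage takes exactly one unit. Run one line at a time, n lines through k stages cost n * k units. Pipelined, the first line needs k units to pass all stages and every further line finishes one unit later, giving n + k - 1. The helper names below are made up for illustration:

```ruby
# Time units for n lines through k stages (R, P, W), assuming every
# stage of every line takes exactly one unit.

# One line at a time: each line goes through all k stages before the next starts.
def sequential_units(n, k)
  n * k
end

# Pipelined: the first line needs k units to clear all stages; after that,
# one finished line leaves the pipeline every unit.
def pipelined_units(n, k)
  n + k - 1
end

puts sequential_units(5, 3) # prints 15
puts pipelined_units(5, 3)  # prints 7
```

With 5 lines and 3 stages this reproduces the 15 and 7 units of the two diagrams.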

Obviously, I could start three processes (reader, processor and writer), pass the lines read by the reader to a processor queue, and then pass the processed lines to a writer queue (all coordinated, for example, with RabbitMQ). But the use case is so simple that this just doesn't feel right.
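For comparison, the same reader/processor/writer layout can live inside one Ruby process, with threads standing in for the three processes and stdlib `SizedQueue` objects standing in for RabbitMQ. This is a swapped-in technique shown for illustration, not what the question proposes. `process_line` and `write_to_db` are hypothetical stand-ins, and `:done` is an arbitrary end-of-stream marker. Under MRI's GIL only the blocking IO of R and W overlaps with P; the P stage itself gets no extra CPU.

```ruby
require 'stringio'

# Hypothetical stand-ins for the question's helpers.
def process_line(line)
  line.upcase
end

def write_to_db(line, sink)
  sink << line
end

# Reader -> processor -> writer, each stage in its own thread, linked by
# bounded queues. :done marks the end of the stream.
def run_pipeline(input_io, sink)
  to_process = SizedQueue.new(100) # bounded so the reader cannot run far ahead
  to_write   = SizedQueue.new(100)

  reader = Thread.new do
    while (line = input_io.gets)   # (R)
      to_process << line
    end
    to_process << :done
  end

  processor = Thread.new do
    while (line = to_process.pop) != :done
      to_write << process_line(line) # (P)
    end
    to_write << :done
  end

  writer = Thread.new do
    while (line = to_write.pop) != :done
      write_to_db(line, sink)      # (W)
    end
  end

  [reader, processor, writer].each(&:join)
  sink
end

p run_pipeline(StringIO.new("a\nb\nc\n"), []) # => ["A\n", "B\n", "C\n"]
```

Because each stage is a single thread reading from a FIFO queue, output order matches input order.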

Any tips on how this can be done (without switching from Ruby to Erlang, Clojure or Scala)?

ruby asynchronous concurrency fiber eventmachine

Dim Oct 25 '10 at 12:14

2 answers

Jeh · Answer 1 · 2010-10-25T17:41:47+0000

If you need it to be truly parallel (within one process), I believe you will have to use JRuby to get true native threads without a GIL.

You could use something like DRb to distribute the processing across multiple processes/cores, but for your use case that is a bit much. Instead, you can try connecting several processes with pipes:

 $ cat somelogfile.txt | ruby ./proc-process | ruby ./proc-store

In this scenario, each part is its own process that can run in parallel, and the processes communicate via STDIN/STDOUT. This is perhaps the easiest (and fastest) approach to your problem.

    # proc-process
    while line = $stdin.gets do
      # do cpu intensive stuff here
      $stdout.puts "data to be stored in DB"
      $stdout.flush # this is important: hand each line to the next stage now instead of buffering it
    end

    # proc-store
    while line = $stdin.gets do
      write_to_db(line)
    end
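If you would rather launch the stages from a single Ruby script instead of a shell pipeline, the same process-per-stage idea can be sketched with `fork` and `IO.pipe`. This assumes a Unix-like system (no `fork` on Windows), and `process_line` is a hypothetical CPU-bound step. Since the processor is a separate OS process, it runs in parallel with the parent even under MRI's GIL.

```ruby
# Hypothetical CPU-bound step.
def process_line(line)
  line.strip.reverse
end

# Feed lines to a forked child that processes them and sends results back.
def forked_pipeline(lines)
  to_child_r, to_child_w     = IO.pipe
  from_child_r, from_child_w = IO.pipe

  pid = fork do
    # Child: keep only the ends it uses, so EOF is detected correctly.
    to_child_w.close
    from_child_r.close
    while (line = to_child_r.gets)
      from_child_w.puts process_line(line)
    end
    from_child_w.close # flushes buffered output
    exit!(0)           # skip at_exit handlers inherited from the parent
  end

  # Parent: close the ends that belong to the child.
  to_child_r.close
  from_child_w.close

  # Write input from a thread so a full pipe cannot deadlock the parent.
  feeder = Thread.new do
    lines.each { |l| to_child_w.puts l }
    to_child_w.close # child sees EOF
  end

  results = from_child_r.each_line.map(&:chomp)
  feeder.join
  from_child_r.close
  Process.wait(pid)
  results
end

p forked_pipeline(%w[abc def]) # => ["cba", "fed"]
```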

Mark Thomas · Answer 2 · 2010-10-25T13:06:04+0000

Check out peach ( http://peach.rubyforge.org/ ). Running "each" in parallel could not be simpler. However, as the documentation says, you need to run under JRuby to take advantage of native JVM threads.
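peach's own API is not reproduced here. To give a rough idea of what a parallel "each" does, here is a stdlib-only sketch built around a hypothetical `parallel_map` helper (not part of peach). It splits the array into chunks and maps each chunk in its own `Thread`. Under MRI the GIL limits the gain to overlapping blocking IO; CPU-bound blocks only run truly in parallel on an interpreter without a GIL, such as JRuby.

```ruby
# Split items into roughly equal chunks, map each chunk in its own thread,
# and concatenate the results in the original order.
def parallel_map(items, threads: 4, &block)
  return [] if items.empty?

  slice_size = (items.size / threads.to_f).ceil
  workers = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.map(&block) }
  end
  workers.flat_map(&:value) # Thread#value joins and returns the block's result
end

squares = parallel_map([1, 2, 3, 4, 5]) { |x| x * x }
p squares # => [1, 4, 9, 16, 25]
```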

Check out Jörg Mittag's answer to this SO question for details on the multithreading capabilities of the various Ruby interpreters.
