The fastest way to parse a large file in Ruby

I have a simple text file of about 150 MB. My code reads every line, and if it matches certain regular expressions, the line is written to an output file. But right now it takes a long time (several minutes) just to iterate over all the lines of the file, doing it like this:

    File.open(filename).each do |line|
      # do some stuff
    end

I know it is the loop over the lines of the file that takes the time, because even if I do nothing with the data in "# do some stuff", it still takes a lot of time.

I know that some Unix programs can parse large files like this almost instantly (grep, for example), so I wonder why Ruby (MRI 1.9) takes so long to read the file, and whether there is a way to make it faster.
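For reference, a minimal, self-contained sketch of the workflow described above (the input filename, output filename, and regular expression are placeholders). File.foreach iterates line by line just like File.open(...).each, but closes the file automatically:

    # Sketch only: filter matching lines into an output file, streaming line by line.
    # "bigfile.txt", "output.txt" and the pattern are placeholder names.
    pattern = /ERROR|WARN/

    File.open("output.txt", "w") do |out|
      File.foreach("bigfile.txt") do |line|
        out.write(line) if line =~ pattern
      end
    end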

+7
3 answers
    File.readlines(filename).each do |line|
      # do stuff with each line
    end

This reads the entire file into an array of strings. It should be much faster, but it requires holding the whole file in memory.
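Applied to the task in the question, a hedged sketch of this approach (filenames and pattern are placeholders):

    # Sketch only: read all lines at once, filter, then write the matches.
    # Uses more memory than streaming; names below are placeholders.
    pattern = /ERROR/
    matches = File.readlines("input.txt").grep(pattern)
    File.open("output.txt", "w") { |f| f.write(matches.join) }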

+3

It is not a fair comparison with grep, because grep is a highly tuned utility that only scans the data and does not store any of it. When you read that file in Ruby, you end up allocating memory for every line and then releasing it during the garbage collection cycle. grep is a lean, mean regex-processing machine.

You may find that you can achieve the speed you want by using an external program such as grep, called via system or through a pipe:

    `grep ABC bigfile`.split(/\n/).each do |line|
      # ... (executed for each matching line) ...
    end
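If the matching output is itself large, a pipe avoids collecting it all into one string first. A sketch, assuming grep is available on the system ("ABC" and "bigfile" are placeholders):

    # Sketch only: stream grep's output line by line through a pipe
    # instead of capturing everything with backticks.
    IO.popen(["grep", "ABC", "bigfile"]) do |io|
      io.each_line do |line|
        # ... handle each matching line ...
      end
    end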
+4

You have to read it into memory and then parse it. Of course, it depends on what you are looking for. Do not expect miraculous performance from Ruby, especially compared to C/C++ programs that have been optimized over the last 30 years ;-)

-2
