What is the fastest way to read a large file in Ruby?

I have seen answers to this question, but I could not work out which of them would perform fastest. These are the approaches I saw; which is best?

  • Read one line at a time using each or each_line
  • Read one line at a time using gets
  • Read it all into an array of strings using readlines, then iterate with each
  • Use grep (not sure what to do with grep ...)
  • Use sed (not sure what to do with sed ...)
  • Something else?
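
The three pure-Ruby options in the list can be compared directly on your own data. The following is a minimal benchmark sketch using Ruby's standard benchmark library; the method names (count_each_line, count_gets, count_readlines) are illustrative, and the timings you get will depend on the file, the OS disk cache and your Ruby version.

```ruby
require "benchmark"

# Count lines with IO#each_line (streams the file; one line in memory at a time).
def count_each_line(path)
  count = 0
  File.open(path) { |f| f.each_line { count += 1 } }
  count
end

# Count lines with IO#gets in an explicit loop (also streams the file).
def count_gets(path)
  count = 0
  File.open(path) do |f|
    count += 1 while f.gets # gets returns nil at end of file
  end
  count
end

# Count lines with IO.readlines (loads every line into one array first).
def count_readlines(path)
  count = 0
  File.readlines(path).each { count += 1 }
  count
end

# Only benchmark when a file path is given on the command line, e.g.
#   ruby bench.rb big_input.txt
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  path = ARGV[0]
  Benchmark.bm(10) do |x|
    x.report("each_line") { count_each_line(path) }
    x.report("gets")      { count_gets(path) }
    x.report("readlines") { count_readlines(path) }
  end
end
```

All three should return the same count; the readlines variant is the one whose memory use grows with the file size.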

Also, would it be better to just use a different language, or is Ruby fine?

EDIT:

Details: Each line contains something like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2 ... idn attr1_n attr2_n" (n is very large), and I need to insert them into the database. For this example row, I will need to insert n rows into the database.
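
For the line format described above, one possible way to turn a line into rows is to split on whitespace and take the fields three at a time. This is a minimal sketch, assuming fields never contain spaces; each_row_batch and its batch_size parameter are hypothetical names, and the actual database insert is left to whatever library you use.

```ruby
# Hypothetical helper: splits "id attr1 attr2 id attr1 attr2 ..." into
# [id, attr1, attr2] triples and yields them in batches of +batch_size+,
# so a very long line never becomes one enormous INSERT.
def each_row_batch(line, batch_size = 1000)
  return enum_for(__method__, line, batch_size) unless block_given?

  fields = line.split # note: this still holds all fields of the line in memory
  unless (fields.size % 3).zero?
    raise ArgumentError, "expected a multiple of 3 fields, got #{fields.size}"
  end

  # each_slice(3) groups fields into rows; the outer each_slice groups rows into batches.
  fields.each_slice(3).each_slice(batch_size) { |batch| yield batch }
end
```

For example, each_row_batch("id1 a b id2 c d id3 e f", 2) yields [["id1", "a", "b"], ["id2", "c", "d"]] and then [["id3", "e", "f"]]; each batch could then go to a single multi-row insert.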

+6
2 answers

Ruby will most likely use the same or very similar low-level code (written in C) to do the actual reading from disk for the first three options, so they should perform similarly. Given this, you should choose whichever is most convenient for you; being able to do so is what makes languages such as Ruby so useful! You will be reading a large amount of data from disk, so I would suggest using each_line and processing each line as you read it.
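
Processing each line as it is read can look like the sketch below. File.foreach opens the file, yields one line at a time and closes it afterwards, so memory use stays roughly constant regardless of file size. The name each_record and the blank-line skipping are illustrative choices, not part of the answer.

```ruby
# Stream a file one line at a time; only the current line is held in memory.
# Returns an Enumerator when called without a block.
def each_record(path)
  return enum_for(__method__, path) unless block_given?

  File.foreach(path) do |line|
    line = line.chomp
    next if line.empty? # skip blank lines
    yield line
  end
end
```

Because it returns an Enumerator without a block, each_record(path).first(10) stops reading after the first ten non-blank lines instead of scanning the whole file.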

I would not recommend bringing grep, sed or any other similar external utilities into the picture unless you have a good reason, as they will make your code less portable and expose you to failures that can be difficult to diagnose.
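
If all you need from grep or sed is filtering or substitution, the same thing can be done in-process with standard Ruby methods, which avoids depending on an external binary. A minimal sketch; grep_file and sed_file are hypothetical helper names.

```ruby
# In-process equivalent of `grep PATTERN file`: returns a lazy enumerator of
# matching lines (without trailing newlines), reading the file as it goes.
def grep_file(path, pattern)
  File.foreach(path).lazy.grep(pattern).map(&:chomp)
end

# In-process equivalent of `sed 's/PATTERN/REPLACEMENT/g' file`,
# writing each substituted line to +out+ (standard output by default).
def sed_file(path, pattern, replacement, out = $stdout)
  File.foreach(path) { |line| out.write(line.gsub(pattern, replacement)) }
end
```

Call .to_a on the result of grep_file to collect every match, or .first(n) to stop early.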

+4

If you use Ruby, you need not worry about performance. The language lends itself to an iterative approach to reading a file line by line, and that works very well. As long as you use the language the way it was designed, you can let the interpreter writers worry about performance. Job done.

If you need some specific readLargeFileFast method, it should be because reading the file really is holding your program back. In that case, write a C program to do the reading and run it with popen as a separate process from your Ruby code. You could call it read_large.c and (possibly) use command-line arguments to tell it how to behave.
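
Reading the output of such a helper from Ruby could look like this sketch built on IO.popen. The read_large program and its arguments are hypothetical (it is the C program suggested above, which you would have to write); each_line_from is likewise an illustrative name. Passing the command as an array runs it directly rather than through a shell.

```ruby
# Run an external command and stream its standard output line by line,
# much like reading a file. Raises if the command exits unsuccessfully.
def each_line_from(command)
  return enum_for(__method__, command) unless block_given?

  IO.popen(command) do |io|
    io.each_line { |line| yield line.chomp }
  end
  # With a block, IO.popen waits for the child and sets $? when the block ends.
  raise "#{command.first} failed: #{$?}" unless $?.success?
end

# Hypothetical usage with the compiled helper:
#   each_line_from(["./read_large", "data.txt"]) { |line| puts line }
```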

This reflects the idea that scripting languages are meant for fast development, not fast execution. A developer can be very productive by rapidly "prototyping" a program in something like Ruby, and only later rewriting the components that warrant low-level code. Often, however, once it works as a script, you don't need to do anything more.

The Ruby docs describe how to start a separate process and treat it as a file (IO.popen). It is easy! A good starting point is the introductory section on modularity in The Art of Unix Programming. The book also has a great example of using the standard Unix stream editor, sed, which you could probably use from Ruby right now.
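
Treating a process as a file works in both directions: IO.popen with mode "r+" lets you write to the child's standard input and read its standard output. A minimal sketch, assuming a sed executable is on the PATH; run_sed is a hypothetical name.

```ruby
# Send +input+ to sed's standard input, then read everything sed prints.
# Requires a `sed` executable on the PATH.
def run_sed(script, input)
  output = IO.popen(["sed", script], "r+") do |io|
    io.write(input)
    io.close_write # signal end-of-file so sed finishes its work
    io.read
  end
  raise "sed failed: #{$?}" unless $?.success?
  output
end

# Example: run_sed("s/attr/field/g", "id1 attr1_1 attr2_1\n")
#   returns "id1 field1_1 field2_1\n"
```

Writing all input before reading is fine for small text, but with large inputs the pipe buffers can fill up in both directions and deadlock; in that case write from a separate thread while reading, or have sed read the file directly.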

If you need to parse or edit a lot of text, note that many interpreters and editors have been built around sed's functionality. sed can also save you the effort of writing something super-efficient yourself if you don't know C. A good introduction is Introduction to SED by Bruce Barnett.

+2