The most efficient way to write data to a file

I want to write 2 TB of data to a single file; in the future it may be petabytes.

The data consists entirely of '1' characters. For example, 2 TB of data looks like "1111111111111......11111" (each byte is the character "1").

The following is my way:

 File.open("data",File::RDWR||File::CREAT) do |file|
   2*1024*1024*1024*1024.times do
     file.write('1')
   end
 end

This is meant to call file.write once per byte, about 2.2 trillion times for 2 TB. From a Ruby point of view, is there a better way to implement it?

+6
4 answers

You have a few problems:

  • File::RDWR||File::CREAT always evaluates to File::RDWR, because || is the logical OR and returns its first truthy operand. You mean File::RDWR|File::CREAT ( | , not || ).

  • 2*1024*1024*1024*1024.times do runs the loop 1024 times, and then multiplies the loop's return value by the numbers on the left. You mean (2*1024*1024*1024*1024).times do .
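Both points can be checked in a few lines. This is only a minimal sketch: the method names are made up for illustration, and it uses braces instead of do ... end to keep the block attached to times.

```ruby
# `||` is logical OR: it returns its first truthy operand, and every
# Integer is truthy in Ruby, so File::CREAT is silently dropped.
def logical_flags
  File::RDWR || File::CREAT
end

# `|` is bitwise OR: it combines both flag bits into one mode value.
def bitwise_flags
  File::RDWR | File::CREAT
end

# A method call binds tighter than `*`, so the block runs n times and
# only then is the return value of `times` (which is n) multiplied by 2.
def unparenthesized(n)
  runs = 0
  value = 2 * n.times { runs += 1 }
  [runs, value]
end

# With parentheses the product is computed first, then iterated.
def parenthesized(n)
  runs = 0
  (2 * n).times { runs += 1 }
  runs
end

puts(logical_flags == File::RDWR)                      # true
puts((bitwise_flags & File::CREAT) == File::CREAT)     # true
p unparenthesized(3)                                   # [3, 6]
p parenthesized(3)                                     # 6
```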

As for your question, I get a significant speedup by writing 1024 bytes at a time:

 File.open("data",File::RDWR|File::CREAT) do |file|
   buf = "1" * 1024
   (2*1024*1024*1024).times do
     file.write(buf)
   end
 end

You can experiment and find a better buffer size than 1024.
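One way to run that experiment is to time a small trial write for several candidate sizes. This is a rough sketch, not a tuned benchmark: write_ones and compare_chunk_sizes are hypothetical helper names, the 16 MiB trial size and the three chunk sizes are arbitrary assumptions, and the timings depend heavily on the disk and the OS cache.

```ruby
require "benchmark"
require "tmpdir"

# Write total_bytes copies of the byte "1" to path, chunk_size bytes per call.
def write_ones(path, total_bytes, chunk_size)
  chunk = "1" * chunk_size
  full_chunks, remainder = total_bytes.divmod(chunk_size)
  File.open(path, "wb") do |file|
    full_chunks.times { file.write(chunk) }
    # Write the leftover bytes when total_bytes is not a multiple of chunk_size.
    file.write("1" * remainder) if remainder > 0
  end
end

# Time a trial write in a temporary directory for each candidate chunk size.
def compare_chunk_sizes(total_bytes, chunk_sizes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "data")
    chunk_sizes.each do |size|
      seconds = Benchmark.realtime { write_ones(path, total_bytes, size) }
      puts format("%8d-byte chunks: %.3f s", size, seconds)
    end
  end
end

compare_chunk_sizes(16 * 1024 * 1024, [1024, 64 * 1024, 1024 * 1024])
```

For the real file you would call write_ones once with the full size and whichever chunk size came out best in the trial.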

+7

I don't know which OS you are using, but the fastest approach would be to use the system's copy command to concatenate files into one large file, which you can script. For example, start with a string like "1" and echo it to a file:

 echo "1" > file1 

You can then concatenate this file with itself into a new file in a matter of seconds. On Windows you must use the /b option (binary copy) to do this.

 copy /b file1+file1 file2 

gives you a file2 of 12 bytes (including CR)

 copy /b file2+file2 file1 

gives you 24 bytes, etc.

I will leave the math to you (and the fun of doing this in Ruby), but you will reach your size quickly, and probably faster than with the accepted answer.
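The math is short enough to sketch in Ruby. The 6-byte starting size is inferred from the 12-byte figure above, and doublings_needed is a hypothetical helper name.

```ruby
# Count how many times a file of start_bytes must be doubled
# to reach at least target_bytes.
def doublings_needed(start_bytes, target_bytes)
  size = start_bytes
  count = 0
  while size < target_bytes
    size *= 2
    count += 1
  end
  count
end

two_tib = 2 * 1024**4
puts doublings_needed(6, two_tib)  # => 39
```

Note that 39 doublings overshoot the target (6 * 2**39 bytes is about 3 TiB), so hitting exactly 2 TB would need a different starting size or a final partial copy.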

0

A related answer: if you want to write binary zeros of any size, just use the dd command (Linux / Mac):

 dd if=/dev/zero of=output_file bs=128K count=8000 

bs is the block size (the number of bytes to read/write at a time); count is the number of blocks. The above line writes 1 GB of zeros to output_file in just 10 seconds on my machine:

 1048576000 bytes (1.0 GB) copied, 10.275 s, 102 MB/s 
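The figures in that output line can be checked with a little arithmetic, sketched here in Ruby. The helper names are made up for illustration, and the 2 TiB estimate at the end is only an extrapolation that assumes the same rate could be sustained.

```ruby
# bs=128K means 128 * 1024 bytes per block; count=8000 blocks.
def dd_total_bytes(block_size, count)
  block_size * count
end

# Throughput in decimal megabytes per second, the unit dd reports.
def throughput_mb_per_s(bytes, seconds)
  bytes / seconds / 1_000_000.0
end

# Hours needed to write total_bytes at a given MB/s rate.
def hours_at_rate(total_bytes, mb_per_s)
  total_bytes / (mb_per_s * 1_000_000.0) / 3600
end

total = dd_total_bytes(128 * 1024, 8000)
rate = throughput_mb_per_s(total, 10.275)
puts total       # => 1048576000
puts rate.round  # => 102

# Extrapolation only: time for 2 TiB if this rate held for the whole write.
puts hours_at_rate(2 * 1024**4, rate).round(1)
```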

Maybe someone is interested!

0

All ones? Then there is no need to write them; just write how many there are.

 file.write( 2*1024*1024*1024*1024 ) 

Simple, huh?

-2

Source: https://habr.com/ru/post/922414/

