Improve write speed for high-speed file copy?

I am trying to find the fastest way to code a file copy routine that copies a large file to RAID 5 hardware.

The average file size is about 2 GB.

There are 2 Windows boxes (both running Win2k3). The first box is the source, where the large file is located, and the second box has the RAID 5 storage.

http://blogs.technet.com/askperf/archive/2007/05/08/slow-large-file-copy-issues.aspx

The link above clearly explains why Windows copy, robocopy, and other common copy utilities suffer in write performance. Hence, I have written a C/C++ program that uses the CreateFile, ReadFile, and WriteFile APIs with the NO_BUFFERING and WRITE_THROUGH flags. The program mimics ESEUTIL.exe in the sense that it uses 2 threads, one for reading and one for writing. The reader thread reads 256 KB from the source and fills a buffer. When 16 such 256 KB blocks are full, the writer thread writes the contents of the buffer to the destination file. As you can see, the writer thread writes 8 MB of data in one shot. The program allocates 32 such 8 MB blocks, so writing and reading can happen in parallel. Details of ESEUtil.exe can be found in the link above. Note: I am taking care of the data alignment issues that come with NO_BUFFERING.

I used benchmarking utilities such as ATTO and found that our RAID 5 hardware has a write speed of 44 MB per second when writing 8 MB chunks of data, which is about 2.57 GB per minute.

But my program can only reach 1.4 GB per minute.

Can anyone help me determine what the problem is? Are there faster APIs than CreateFile, ReadFile, and WriteFile?

+6
c++ windows copy raid
7 answers

You should use async I/O for maximum performance. That means opening the file with FILE_FLAG_OVERLAPPED and using the LPOVERLAPPED argument of WriteFile. You may or may not get better performance with FILE_FLAG_NO_BUFFERING; you will have to test to find out.

FILE_FLAG_NO_BUFFERING generally gives you more consistent speeds and better streaming behavior, and it avoids polluting your disk cache with data that you may not need again, but it is not necessarily faster overall.

You should also test to find the best size for each block of I/O. In my experience, there is a huge performance difference between copying a file 4 KB at a time and copying it 1 MB at a time.

In my last testing of this (a few years ago), I found that block sizes below about 64 KB were dominated by overhead, and overall throughput continued to improve with larger block sizes up to about 512 KB. I would not be surprised if today you needed block sizes larger than 1 MB to get maximum throughput.

The numbers you are using now seem reasonable, but may not be optimal. I am also fairly sure that FILE_FLAG_WRITE_THROUGH prevents use of the on-disk cache and will therefore cost you a fair amount of performance.

You also need to be aware that copying files using CreateFile/WriteFile will not copy metadata such as timestamps or alternate data streams on NTFS. You will have to deal with these things yourself.

Actually replacing CopyFile with your own code is quite a bit of work.

Addendum:

I should probably mention that when I tried this with software RAID 0 on Windows NT 3.0 (about 10 years ago), the speed was VERY sensitive to the alignment of the buffer in memory. It turned out that at the time the SCSI drivers had to use a special algorithm to do DMA from a scatter/gather list when a DMA transfer spanned more than 16 physical regions of memory (64 KB). Guaranteeing optimal performance required physically contiguous allocations, which is something only drivers can request. This was basically a workaround for a bug in the DMA controller of a popular chipset of the time, and it is unlikely to still be a problem.

BUT, I would still strongly recommend that you test ALL power-of-2 block sizes from 32 KB to 32 MB to find out which is fastest. You might also consider testing whether some buffers are consistently faster than others; it is not unheard of.
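A sweep over the block sizes suggested above could be organized along these lines. This is only a sketch: `power_of_two_sizes` and `time_copy_ms` are hypothetical helper names, and the copy routine itself is passed in as a callable so that any implementation can be timed.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Block sizes min_bytes, 2*min_bytes, 4*min_bytes, ... not exceeding max_bytes.
// Assumes min_bytes is itself a power of two.
std::vector<std::size_t> power_of_two_sizes(std::size_t min_bytes, std::size_t max_bytes) {
    std::vector<std::size_t> sizes;
    if (min_bytes == 0 || min_bytes > max_bytes) return sizes;
    for (std::size_t s = min_bytes;; s *= 2) {
        sizes.push_back(s);
        if (s > max_bytes / 2) break;  // the next doubling would pass max_bytes
    }
    return sizes;
}

// Runs copy_with_block_size(block_bytes) once and returns the elapsed
// wall-clock time in milliseconds.
template <typename CopyFn>
double time_copy_ms(CopyFn&& copy_with_block_size, std::size_t block_bytes) {
    auto start = std::chrono::steady_clock::now();
    copy_with_block_size(block_bytes);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

`power_of_two_sizes(32 * 1024, 32 * 1024 * 1024)` yields the 11 sizes from 32 KB to 32 MB. Timing each size several times, in a different order each pass, helps separate the effect of block size from caching and from other activity on the machine.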

+6

A while back, I wrote a blog post about asynchronous file I/O and how it often ends up being synchronous unless you do everything just right ( http://www.lenholgate.com/blog/2008/02/when-are-asynchronous-file-writes-not-asynchronous.html ).

The key point is that even if you use FILE_FLAG_OVERLAPPED and FILE_FLAG_NO_BUFFERING, you still need to pre-extend the file so that your asynchronous writes do not have to extend the file as they go; for security reasons, extending a file is always synchronous. To pre-extend the file you need to do the following:

  • Enable the SE_MANAGE_VOLUME_NAME privilege.
  • Open the file.
  • Seek to the desired file length using SetFilePointerEx().
  • Set the end of the file using SetEndOfFile().
  • Set the end of the valid data within the file using SetFileValidData().
  • Close the file.

Then...

  • Open the file for writing.
  • Issue the writes.
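The pre-extension steps listed above might look something like the following. Treat it as a sketch under assumptions rather than a tested implementation: the helper names `enable_manage_volume_privilege` and `preextend_file` are invented here, error reporting is reduced to a bool, and the code is only meaningful on Windows (other platforms get a stub that reports failure).

```cpp
#ifdef _WIN32
#include <windows.h>

// Step 1: enable SE_MANAGE_VOLUME_NAME on the current process token.
static bool enable_manage_volume_privilege() {
    HANDLE token = nullptr;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token)) {
        return false;
    }
    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    // AdjustTokenPrivileges can return success without granting the
    // privilege, so GetLastError() is checked as well.
    bool ok = LookupPrivilegeValue(nullptr, SE_MANAGE_VOLUME_NAME,
                                   &tp.Privileges[0].Luid)
           && AdjustTokenPrivileges(token, FALSE, &tp, 0, nullptr, nullptr)
           && GetLastError() == ERROR_SUCCESS;
    CloseHandle(token);
    return ok;
}

// Steps 2 to 6: open the file, seek to the final length, set the end of
// the file, mark the whole range as valid data, and close the handle.
bool preextend_file(const wchar_t* path, long long bytes) {
    if (!enable_manage_volume_privilege()) return false;
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr, OPEN_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;
    LARGE_INTEGER size;
    size.QuadPart = bytes;
    bool ok = SetFilePointerEx(h, size, nullptr, FILE_BEGIN)
           && SetEndOfFile(h)
           && SetFileValidData(h, bytes);
    CloseHandle(h);
    return ok;
}
#else
// Not on Windows: the APIs above do not exist, so this sketch reports failure.
bool preextend_file(const wchar_t*, long long) { return false; }
#endif
```

After `preextend_file` succeeds, the file would be reopened with FILE_FLAG_OVERLAPPED (and FILE_FLAG_NO_BUFFERING if desired) for the actual writes. Note that SetFileValidData skips zero-filling, so whatever was previously on disk in that range is readable until it is overwritten; that is why the privilege is required.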
+2

How fast can you read the source file if you don't write to the destination?

Is the source file fragmented? Fragmented reads can be an order of magnitude slower than contiguous reads. You can use the contig utility to make the file contiguous:

http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx

How fast is the network connecting the two machines?

Did you try to just write dummy data without reading it first, like ATTO?

Do you have multiple requests to read or write in flight at the same time?

What is the stripe size of your RAID-5 array? Writing a full stripe at a time is the fastest way to write to RAID-5.
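As a rough illustration of what a full stripe means for buffer sizing: in RAID-5 each stripe holds one strip per disk, one of which is parity, so the data portion is (disks - 1) strips. The sketch below uses hypothetical helper names, and the disk count and strip size in the example are made-up values; the real ones come from the controller configuration.

```cpp
#include <cstddef>

// Bytes of user data in one full RAID-5 stripe: one strip per disk,
// minus the one strip per stripe that holds parity.
// Returns 0 for fewer than 3 disks, which is not a valid RAID-5 set.
std::size_t full_stripe_bytes(std::size_t disks, std::size_t strip_bytes) {
    return disks < 3 ? 0 : (disks - 1) * strip_bytes;
}

// Largest whole number of full stripes that fits in buffer_bytes, in bytes.
std::size_t round_down_to_full_stripes(std::size_t buffer_bytes, std::size_t stripe_bytes) {
    return stripe_bytes == 0 ? 0 : (buffer_bytes / stripe_bytes) * stripe_bytes;
}
```

For example, with 4 disks and a 64 KB strip, a full stripe carries 192 KB of data, and an 8 MB buffer rounds down to 42 full stripes (8,257,536 bytes). A power-of-two write size is therefore not automatically a whole number of stripes, and the writes also need to start on a stripe boundary to avoid partial-stripe updates.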

0

Just remember that a hard disk buffers the data coming from and going to the platters. Most disk drives will try to optimize read requests to keep the platters rotating and minimize head movement. The drives try to absorb as much data from the host as they can before writing to the platters, so that the host can be disconnected as soon as possible.

Your performance also depends on the I/O bus traffic on the PC, as well as the traffic between the drive and the host. There are other factors to take into account, such as system tasks and programs running "simultaneously". You may not be able to achieve the exact performance that the measuring tool reports. And remember that these timings have a margin of error because of the overhead mentioned above.

If your platform has DMA controllers, try using them.

0

If write speed is important, why not consider RAID 0 for your hardware configuration?

  • The client wants RAID 5.
  • RAID 5 is preferred over RAID 0 because of its better fault tolerance.
  • The client is satisfied with what RAID 5 can offer. The question here is: benchmarking the hardware with ATTO gives a write speed of 2.57 GB per minute (8 MB chunk writes), so why can't a copy tool get close to that? Something like 2 GB per minute is what we are aiming for. So far, we have managed to achieve only 1.5 GB per minute.
0

The right way to do this is with unbuffered, fully asynchronous I/O. You will want to issue multiple I/Os to keep a queue going. This allows the file system, the driver, and the RAID-5 subsystem to manage the I/Os more efficiently.

You can also open multiple files and issue reads and writes to multiple files.

NOTE: The optimal number of outstanding I/Os and how you interleave reads and writes will depend heavily on the storage subsystem itself. Your program should be highly parameterized so that you can tune it.

Note: I believe Robocopy has been improved; have you tried it?

0

I did some tests and have some results. The tests were performed on 100 Mbps and 1 Gbps NICs. The source machine is a Win2K3 server (SATA), and the target machine is a Win2K3 server (RAID 5).

I conducted 3 tests:

1) Network Reader: This program just reads files over the network. Its purpose is to find the maximum network read speed. I perform non-buffered reads using CreateFile and ReadFile.

2) Disk Writer: This program benchmarks the RAID 5 write speed by writing data. Non-buffered writes are performed using CreateFile and WriteFile.

3) Blitz Copy: This program is the file copy engine. It copies files over the network. The logic of this program was discussed in the original question. I use synchronous I/O with NO_BUFFERING reads and writes. The APIs used are CreateFile, ReadFile, and WriteFile.


The following are the results:

NETWORK READER:

100 Mbps NIC

Took 148344 ms to read 768 MB with a block size of 8 KB
Took 89359 ms to read 768 MB with a block size of 64 KB
Took 82625 ms to read 768 MB with a block size of 128 KB
Took 79594 ms to read 768 MB with a block size of 256 KB
Took 78687 ms to read 768 MB with a block size of 512 KB
Took 79078 ms to read 768 MB with a block size of 1024 KB
Took 78594 ms to read 768 MB with a block size of 2048 KB
Took 78406 ms to read 768 MB with a block size of 4096 KB
Took 78281 ms to read 768 MB with a block size of 8192 KB

1 Gbps NIC

Took 206203 ms to read 5120 MB (5 GB) with a block size of 8 KB
Took 77860 ms to read 5120 MB with a block size of 64 KB
Took 74531 ms to read 5120 MB with a block size of 128 KB
Took 68656 ms to read 5120 MB with a block size of 256 KB
Took 64922 ms to read 5120 MB with a block size of 512 KB
Took 66312 ms to read 5120 MB with a block size of 1024 KB
Took 68688 ms to read 5120 MB with a block size of 2048 KB
Took 64922 ms to read 5120 MB with a block size of 4096 KB
Took 66047 ms to read 5120 MB with a block size of 8192 KB

DISK WRITER:

Writes performed on RAID 5 with NO_BUFFERING and WRITE_THROUGH

Writing 2048 MB (2 GB) of data with a block size of 4 MB took 68328 ms
Writing 2048 MB of data with a block size of 8 MB took 55985 ms
Writing 2048 MB of data with a block size of 16 MB took 49569 ms
Writing 2048 MB of data with a block size of 32 MB took 47281 ms

Writes performed on RAID 5 with NO_BUFFERING only

Writing 2048 MB (2 GB) of data with a block size of 4 MB took 57484 ms
Writing 2048 MB of data with a block size of 8 MB took 52594 ms
Writing 2048 MB of data with a block size of 16 MB took 49125 ms
Writing 2048 MB of data with a block size of 32 MB took 46360 ms

Write performance deteriorates linearly as the block size decreases, and the WRITE_THROUGH flag introduces some performance hit.

BLITZ COPY:

1 Gbps NIC, copying 60 GB of files with NO_BUFFERING

Time to complete the copy: 2236735 ms, that is, 37.2 minutes. The speed is ~97 GB per hour.

100 Mbps NIC, copying 60 GB of files with NO_BUFFERING

Time to complete the copy: 7337219 ms, that is, 122 minutes. The speed is ~30 GB per hour.

I tried using Jeffrey Richter's 10-FileCopy program, which uses async I/O with NO_BUFFERING, but the results were bad. I think the reason may be its 256 KB block size; 256 KB writes on RAID 5 are awfully slow.

Comparison with robocopy:

100 Mbps NIC: Blitz Copy and robocopy both run at ~30 GB per hour.

1 Gbps NIC: Blitz Copy runs at ~97 GB per hour, and robocopy at ~50 GB per hour.

0