dd: How do you calculate the optimal block size?

How do you calculate the optimal block size when running dd? I've researched it a bit, but I haven't found anything that suggests how it should be done.

I get the impression that a larger block size would speed dd up... is that true?

I'm going to dd two identical 500 GB, 7200 RPM Hitachi hard drives on a box with an Intel Core i3 processor and 4 GB of 1333 MHz DDR3 RAM, so I'm trying to figure out what block size to use. (I'm going to boot Ubuntu 10.10 x86 from a flash drive and run dd from that.)

+98
linux dd
May 28 '11 at 13:06
6 answers

The optimal block size depends on various factors, including the operating system (and its version) and the various hardware buses and disks involved. Several Unix-like systems (including Linux and at least some flavors of BSD) define an st_blksize member in struct stat, which reports what the kernel considers the optimal block size:

#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    struct stat stats;

    if (!stat("/", &stats))
    {
        printf("%u\n", stats.st_blksize);
    }
}
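
If you'd rather not compile a C program, the same hint is usually available from the shell on Linux with GNU coreutils; a minimal sketch, assuming GNU stat (the %o format prints the optimal I/O transfer size hint, i.e. st_blksize):

# Print the kernel's preferred I/O block size for the filesystem holding /
stat -c %o /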

The best approach may be to experiment: copy a gigabyte with various block sizes and time each run. (Remember to clear the kernel buffer cache before each run: echo 3 > /proc/sys/vm/drop_caches.)
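
A minimal sketch of that experiment, assuming a Linux environment with root access; /dev/sdX is a placeholder for the source drive, and each pass reads 1 GiB and discards it:

#!/bin/bash
# Rough timing of dd at a few block sizes; adjust the list to taste.
SIZE=$((1024 * 1024 * 1024))            # read 1 GiB per pass
for BS in 4096 65536 1048576 16777216; do
  sync
  echo 3 > /proc/sys/vm/drop_caches     # drop the page cache (requires root)
  echo "bs=$BS"
  dd if=/dev/sdX of=/dev/null bs=$BS count=$((SIZE / BS)) 2>&1 | tail -n 1
done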

However, as a rule, I've found that a sufficiently large block size lets dd do a good job, and the difference between, say, 64 KiB and 1 MiB is minor compared to the difference between 4 KiB and 64 KiB. (Though, admittedly, it has been a while since I did this. These days I default to a mebibyte, or just let dd pick the size.)
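
For a straight disk-to-disk clone like the one in the question, that usually amounts to something like the following (a sketch only; /dev/sdX and /dev/sdY are placeholders for the source and destination drives, so double-check them before running anything):

# Copy the whole source drive to the destination drive in 1 MiB blocks
dd if=/dev/sdX of=/dev/sdY bs=1M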

+81
May 28 '11 at 13:21

As others have said, there is no universally correct block size; what is optimal for one situation or one piece of hardware may be terribly inefficient for another. Also, depending on the health of the disks, it may be preferable to use a different block size than whatever is "optimal".

One thing that is fairly reliable on modern hardware is that the default block size of 512 bytes tends to be almost an order of magnitude slower than a more optimal alternative. When in doubt, I've found that 64K is a pretty solid modern default. Although 64K is usually not the optimal block size, in my experience it tends to be much more efficient than the default. 64K also has a fairly solid history of being reliably performant: you can find a message from the Eug-Lug mailing list, circa 2002, recommending a 64K block size here: http://www.mail-archive.com/eug-lug@efn.org/msg12073.html

To determine the optimal output block size, I wrote the following script, which tests writing a 128M test file with dd at a range of block sizes, from the default of 512 bytes up to a maximum of 64M. Be warned, this script uses dd internally, so use it with caution.

dd_obs_test.sh:

#!/bin/bash

# Since we're dealing with dd, abort if any errors occur
set -e

TEST_FILE=${1:-dd_obs_testfile}
TEST_FILE_EXISTS=0
if [ -e "$TEST_FILE" ]; then TEST_FILE_EXISTS=1; fi
TEST_FILE_SIZE=134217728

if [ $EUID -ne 0 ]; then
  echo "NOTE: Kernel cache will not be cleared between tests without sudo. This will likely cause inaccurate results." 1>&2
fi

# Header
PRINTF_FORMAT="%8s : %s\n"
printf "$PRINTF_FORMAT" 'block size' 'transfer rate'

# Block sizes of 512b 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M 64M
for BLOCK_SIZE in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864
do
  # Calculate number of segments required to copy
  COUNT=$(($TEST_FILE_SIZE / $BLOCK_SIZE))

  if [ $COUNT -le 0 ]; then
    echo "Block size of $BLOCK_SIZE estimated to require $COUNT blocks, aborting further tests."
    break
  fi

  # Clear kernel cache to ensure more accurate test
  [ $EUID -eq 0 ] && [ -e /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

  # Create a test file with the specified block size
  DD_RESULT=$(dd if=/dev/zero of=$TEST_FILE bs=$BLOCK_SIZE count=$COUNT conv=fsync 2>&1 1>/dev/null)

  # Extract the transfer rate from dd's STDERR output
  TRANSFER_RATE=$(echo $DD_RESULT | \grep --only-matching -E '[0-9.]+ ([MGk]?B|bytes)/s(ec)?')

  # Clean up the test file if we created one
  if [ $TEST_FILE_EXISTS -ne 0 ]; then rm $TEST_FILE; fi

  # Output the result
  printf "$PRINTF_FORMAT" "$BLOCK_SIZE" "$TRANSFER_RATE"
done

View on GitHub

I have only tested this script on Debian (Ubuntu) and OS X Yosemite, so it may need some tweaking to work on other flavors of Unix.

By default, the command will create a test file named dd_obs_testfile in the current directory. Alternatively, you can use a custom test file by specifying a path after the script name:

 $ ./dd_obs_test.sh /path/to/disk/test_file 

The output of the script is a list of the tested block sizes and their corresponding transfer rates, like so:

$ ./dd_obs_test.sh
block size : transfer rate
       512 : 11.3 MB/s
      1024 : 22.1 MB/s
      2048 : 42.3 MB/s
      4096 : 75.2 MB/s
      8192 : 90.7 MB/s
     16384 : 101 MB/s
     32768 : 104 MB/s
     65536 : 108 MB/s
    131072 : 113 MB/s
    262144 : 112 MB/s
    524288 : 133 MB/s
   1048576 : 125 MB/s
   2097152 : 113 MB/s
   4194304 : 106 MB/s
   8388608 : 107 MB/s
  16777216 : 110 MB/s
  33554432 : 119 MB/s
  67108864 : 134 MB/s

(Note: the units of the transfer rate vary by OS.)

To test the optimal read block size, you can use more or less the same process, but instead of reading from /dev/zero and writing to the disk, you read from the disk and write to /dev/null. A script to do this might look like so:

dd_ibs_test.sh:

#!/bin/bash

# Since we're dealing with dd, abort if any errors occur
set -e

TEST_FILE=${1:-dd_ibs_testfile}
if [ -e "$TEST_FILE" ]; then TEST_FILE_EXISTS=$?; fi
TEST_FILE_SIZE=134217728

# Exit if file exists
if [ -e $TEST_FILE ]; then
  echo "Test file $TEST_FILE exists, aborting."
  exit 1
fi
TEST_FILE_EXISTS=1

if [ $EUID -ne 0 ]; then
  echo "NOTE: Kernel cache will not be cleared between tests without sudo. This will likely cause inaccurate results." 1>&2
fi

# Create test file
echo 'Generating test file...'
BLOCK_SIZE=65536
COUNT=$(($TEST_FILE_SIZE / $BLOCK_SIZE))
dd if=/dev/urandom of=$TEST_FILE bs=$BLOCK_SIZE count=$COUNT conv=fsync > /dev/null 2>&1

# Header
PRINTF_FORMAT="%8s : %s\n"
printf "$PRINTF_FORMAT" 'block size' 'transfer rate'

# Block sizes of 512b 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M 64M
for BLOCK_SIZE in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864
do
  # Clear kernel cache to ensure more accurate test
  [ $EUID -eq 0 ] && [ -e /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

  # Read test file out to /dev/null with specified block size
  DD_RESULT=$(dd if=$TEST_FILE of=/dev/null bs=$BLOCK_SIZE 2>&1 1>/dev/null)

  # Extract transfer rate
  TRANSFER_RATE=$(echo $DD_RESULT | \grep --only-matching -E '[0-9.]+ ([MGk]?B|bytes)/s(ec)?')

  printf "$PRINTF_FORMAT" "$BLOCK_SIZE" "$TRANSFER_RATE"
done

# Clean up the test file if we created one
if [ $TEST_FILE_EXISTS -ne 0 ]; then rm $TEST_FILE; fi

View on GitHub

An important difference in this case is that the test file is a file that gets written by the script. Do not point this command at an existing file, or the existing file will be overwritten!

For my specific hardware and configuration, I found that 128K was the most optimal input block size on a spinning hard drive and that 32K was most optimal on an SSD.

Although this answer covers most of my findings, I've run into this situation often enough that I wrote a blog post about it: http://blog.tdg5.com/tuning-dd-block-size/ You can find more details about the tests I ran there.

+61
Jan 05 '15 at 2:01

I found my optimal block size to be 8 MB (equal to the disk cache?). I needed to wipe (some say: wash) the empty space on a disk before creating a compressed image of it. I used:

cd /media/DiskToWash/
dd if=/dev/zero of=zero bs=8M; rm zero

I experimented with values from 4K to 100M.

After letting dd run for a while, I killed it (Ctrl+C) and read the output:

36+0 records in
36+0 records out
301989888 bytes (302 MB) copied, 15.8341 s, 19.1 MB/s

Since dd reports the I/O rate (19.1 MB/s in this case), it is easy to see whether the value you chose performs better or worse than the previous one.
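
As an aside, with GNU dd on Linux you don't actually have to kill the process to see this line: sending it SIGUSR1 makes it print its current statistics and keep running (BSD and macOS dd respond to SIGINFO, i.e. Ctrl+T, instead). A small sketch:

# Ask every running dd to report its progress without stopping it
kill -USR1 $(pgrep -x dd)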

My results:

bs=    I/O rate
---------------
  4K   13.5 MB/s
 64K   18.3 MB/s
  8M   19.1 MB/s   <--- winner!
 10M   19.0 MB/s
 20M   18.6 MB/s
100M   18.6 MB/s

Note: to check the size of the disk cache/buffer, you can use sudo hdparm -i /dev/sda
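
On Linux, the kernel also exposes the drive's block-size hints through sysfs, which can be a useful cross-check; a sketch, assuming the disk is /dev/sda:

cat /sys/block/sda/queue/logical_block_size    # e.g. 512
cat /sys/block/sda/queue/physical_block_size   # e.g. 512 or 4096
cat /sys/block/sda/queue/optimal_io_size       # 0 means the device reports no preference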

+10
Aug 2 '14 at 16:13

You could try using dd-opt, a small utility that I wrote.

(Improvements / refinements are welcome!)

+4
Apr 18 '18

It is completely system-dependent. You should experiment to find the optimum. Try starting with bs=8388608 (since Hitachi hard drives have an 8 MB cache).
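
For what it's worth, GNU dd accepts size suffixes, so bs=8M requests the same 8388608-byte block size; the device names here are only placeholders:

# 8 MiB blocks, matching the drive's cache size
dd if=/dev/sdX of=/dev/sdY bs=8M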

+3
May 28 '11 at 13:24
source share
  • for best performance, use the largest block size that RAM can accommodate (fewer I/O calls to the OS)
  • for better accuracy and data recovery, set the block size to the native sector size of the input

When dd copies data with the conv=noerror,sync options, any errors it encounters result in the remainder of the block being replaced with zero bytes. Larger block sizes copy more quickly, but each time an error is encountered, the remainder of that block is lost.
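
A sketch of that trade-off for a rescue copy (the device and file names are placeholders, not a recommendation for the question's healthy drives):

# Large blocks: fast, but each read error zero-fills a larger chunk of the image
dd if=/dev/sdX of=disk.img bs=1M conv=noerror,sync

# Sector-sized blocks: much slower, but only the bad sector itself is zero-filled
dd if=/dev/sdX of=disk.img bs=512 conv=noerror,sync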

source

0
Nov 03 '13 at 13:12


