Optimizing writes with O_DIRECT

I am writing an application that needs to write data to disk very quickly. I hit my target write performance, which is great.

However, I noticed that writing to disk eats a lot of CPU time: one core is maxed out, another is at 80%, and the other two sit at 10-20%. I have heard that O_DIRECT can reduce CPU consumption by avoiding the copy into kernel space before the data is written to disk.

I wrote a small test program that confirms this: CPU usage drops to about 50% of a single core, which is much better.

However, I never get the same bandwidth as with regular buffered writes, and to get anywhere close I had to use a really large write size (something like 130 MB!).

So my questions are:

  • Is there a better way than O_DIRECT to reduce the CPU cost of writes? or
  • How can I get throughput similar to what the kernel achieves with buffered writes?

My environment is Linux, the target is a RAID 50 array, and I can batch records until I reach the optimal write size. There will be only one writer at a time.
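
For reference, a minimal sketch of the kind of O_DIRECT test I mean (the file name, block size and write size below are placeholders, not my exact code). O_DIRECT requires the buffer address, the transfer size and the file offset to be suitably aligned (typically to the logical block size), so the buffer comes from posix_memalign:

```c
/*
 * Minimal O_DIRECT write loop (illustrative sizes; error handling trimmed).
 * Buffer address, write size and offset must all be aligned for O_DIRECT.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096                    /* assumed logical block size */
#define CHUNK (4 * 1024 * 1024)       /* assumed write size: 4 MB   */

int main(void)
{
    int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    void *buf;
    if (posix_memalign(&buf, ALIGN, CHUNK) != 0)
        return 1;
    memset(buf, 'x', CHUNK);

    /* write 1 GB as CHUNK-sized, aligned pieces */
    for (off_t off = 0; off < ((off_t)1 << 30); off += CHUNK)
        if (pwrite(fd, buf, CHUNK, off) != CHUNK)
            return 1;

    free(buf);
    close(fd);
    return 0;
}
```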

+4
3 answers

To quote this page:

With O_DIRECT the kernel will do DMA directly to/from the physical memory pointed to by the user-space buffer passed as a parameter to the read/write system calls. So no CPU and memory bandwidth are spent on copies between user-space memory and the kernel cache, and no kernel CPU time is spent on cache management (such as cache lookups, per-page locks, and so on).

Basically, you trade bandwidth for CPU usage when you use O_DIRECT. The kernel stops optimizing the bandwidth for you, and in return you get predictable behavior and complete control.

In short: with O_DIRECT you have to do the caching and the other throughput optimizations yourself. The huge write size you needed no longer seems so strange.

I don't know of other methods, but then I'm not a Linux guru. Feel free to ask :)

+2

Have you tried mmap plus msync? I don't know whether it is faster or uses less CPU, but since it is a completely different approach to I/O (basically the kernel does the I/O for you), it could be an interesting avenue to explore.
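
A rough sketch of that approach (the file name and size are placeholders): size the file with ftruncate, map it with MAP_SHARED, fill the mapping, then msync to flush the dirty pages.

```c
/*
 * mmap + msync sketch (illustrative size and file name; minimal error handling).
 * The kernel performs the actual write-back; MS_SYNC blocks until the flush
 * of the given range has completed.
 */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 256 * 1024 * 1024;          /* 256 MB, assumed */
    int fd = open("testfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;
    if (ftruncate(fd, len) != 0)                   /* size the file first */
        return 1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    memset(p, 'x', len);                           /* "write" by touching memory */

    if (msync(p, len, MS_SYNC) != 0)               /* flush dirty pages to disk */
        return 1;

    munmap(p, len);
    close(fd);
    return 0;
}
```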

0

You will somehow need to keep a larger amount of I/O in flight at once and submit it at the optimal size. When the kernel buffers your writes together, a number of advantages appear:

  • Adjacent I/Os can be merged into larger I/Os. That saves per-request overhead: instead of submitting 8 small 4 KB I/Os to the disk, the kernel can submit one 32 KB I/O (for example). A user-space sketch of doing the same thing yourself follows this list.
  • It opens up the possibility of parallel submission. If the kernel has, say, 256 KB of your data buffered, it can send it down as 8 simultaneous I/Os, achieving a higher queue depth.
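
To illustrate the first point when you bypass the page cache: a hypothetical user-space batching sketch (names and sizes are made up) that copies small records into one large aligned staging buffer and issues a single big write whenever the buffer fills.

```c
/*
 * Hypothetical batching helper: small records are staged in a large aligned
 * buffer (allocated with posix_memalign) and written with one big O_DIRECT
 * write once the buffer is full. Sizes and names are illustrative.
 */
#include <string.h>
#include <unistd.h>

#define BATCH_SIZE (8 * 1024 * 1024)   /* assumed "optimal" write size */

struct batcher {
    int    fd;       /* file opened with O_DIRECT             */
    char  *buf;      /* BATCH_SIZE bytes from posix_memalign  */
    size_t used;     /* bytes currently staged                */
    off_t  offset;   /* next file offset to write             */
};

/* stage one record; flush the staging buffer whenever it becomes full */
static int batch_write(struct batcher *b, const void *rec, size_t len)
{
    while (len > 0) {
        size_t n = BATCH_SIZE - b->used;
        if (n > len)
            n = len;
        memcpy(b->buf + b->used, rec, n);
        b->used += n;
        rec = (const char *)rec + n;
        len -= n;

        if (b->used == BATCH_SIZE) {   /* one large, aligned write */
            if (pwrite(b->fd, b->buf, BATCH_SIZE, b->offset) != BATCH_SIZE)
                return -1;
            b->offset += BATCH_SIZE;
            b->used = 0;
        }
    }
    return 0;
}
```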

So,

Is there a better way than O_DIRECT to reduce the CPU cost of writes?

Yes: submit larger I/Os, at the optimal size your disk prefers.

How can I get throughput similar to what the kernel gets?

Ideally, do the above (submit I/Os at the optimal size), make sure as many I/Os as your disk likes are in flight at once (for example by submitting asynchronously, or from multiple threads/processes if your submission calls block), and submit the I/O in disk LBA order. A slightly less optimal trick is to submit huge I/Os and let the kernel split them up to get the parallelism, but that is, well, less optimal.
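
One way to keep several O_DIRECT writes in flight at once on Linux is the kernel AIO interface through libaio (a sketch under assumptions: the queue depth, sizes and file name are made up; build with -laio):

```c
/*
 * Sketch: keep QDEPTH O_DIRECT writes in flight with Linux AIO (libaio).
 * Queue depth, sizes and the file name are illustrative assumptions.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QDEPTH 8
#define CHUNK  (1 * 1024 * 1024)     /* assumed per-I/O size: 1 MB */

int main(void)
{
    int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    io_context_t ctx = 0;
    if (io_setup(QDEPTH, &ctx) != 0)             /* create the AIO context */
        return 1;

    struct iocb iocbs[QDEPTH], *ptrs[QDEPTH];
    void *bufs[QDEPTH];

    for (int i = 0; i < QDEPTH; i++) {
        if (posix_memalign(&bufs[i], 4096, CHUNK) != 0)
            return 1;
        memset(bufs[i], 'x', CHUNK);
        /* each request writes its own chunk at a consecutive offset */
        io_prep_pwrite(&iocbs[i], fd, bufs[i], CHUNK, (long long)i * CHUNK);
        ptrs[i] = &iocbs[i];
    }

    /* submit all QDEPTH writes together so they are in flight at once */
    if (io_submit(ctx, QDEPTH, ptrs) != QDEPTH)
        return 1;

    /* reap completions until all writes have finished */
    struct io_event events[QDEPTH];
    int done = 0;
    while (done < QDEPTH) {
        int n = io_getevents(ctx, 1, QDEPTH - done, events, NULL);
        if (n < 0)
            return 1;
        done += n;
    }

    io_destroy(ctx);
    close(fd);
    return 0;
}
```

In a real writer you would refill and resubmit completed requests instead of waiting for the whole batch, so the queue never drains.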

0
