Blanket statements that blame DMA for large buffer alignment requirements are wrong.
Hardware DMA transfers are generally aligned on 4 or 8 bytes, since the PCI bus can physically transfer 32 or 64 bits at a time. Beyond this basic alignment, DMA hardware is designed to work with any address it is given.
However, the hardware deals with physical addresses, while the OS deals with virtual memory addresses (a protected-mode construct on the x86 CPU). This means that a buffer that is contiguous in process address space may not be contiguous in physical RAM. Unless care is taken to create physically contiguous buffers, a DMA transfer has to be split at VM page boundaries (typically 4K, possibly 2M).
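As a rough illustration (not anything O_DIRECT-specific), here is a minimal sketch that counts how many 4K page boundaries a buffer crosses; each crossing is a point where the transfer may need a separate scatter-gather segment, because the next virtual page can map to an unrelated physical frame. The 4K page size and the example addresses are assumptions for illustration only.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL  /* typical x86 page size; 2M with huge pages */

/* Count how many page boundaries the buffer [addr, addr+len) crosses.
 * Each crossing is a point where a DMA transfer may have to be split,
 * since the next virtual page can map to a different physical frame. */
static unsigned long page_crossings(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first_page = addr / PAGE_SIZE;
    uintptr_t last_page  = (addr + len - 1) / PAGE_SIZE;
    return (unsigned long)(last_page - first_page);
}

int main(void)
{
    /* A 512B buffer that straddles a page boundary forces the
     * transfer to be split into two segments. */
    printf("%lu\n", page_crossings(4096 - 100, 512)); /* prints 1 */
    printf("%lu\n", page_crossings(0, 512));          /* prints 0 */
    return 0;
}
```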
As for the claim that buffers have to match the disk sector size, this is completely untrue; the DMA hardware is entirely oblivious to the physical sector size of the hard disk.
Under Linux 2.4, O_DIRECT required 4K alignment; under 2.6 this was relaxed to 512B. In either case, it was probably a design decision to prevent single-sector updates from crossing VM page boundaries and therefore requiring split DMA transfers. (An arbitrarily placed 512B buffer has roughly a 1/8 chance of crossing a 4K page boundary.)
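For concreteness, here is a minimal sketch of an O_DIRECT read with an aligned buffer. The file name "testfile", the transfer size, and the 512B alignment are illustrative assumptions; many filesystems and kernels effectively require page (4096B) alignment, so that is the safer choice in practice.

```c
#define _GNU_SOURCE          /* needed for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t align = 512;   /* 2.6-era minimum; 4096 is safer */
    const size_t len   = 4096;

    int fd = open("testfile", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, align, len) != 0) {
        close(fd);
        return 1;
    }

    /* With O_DIRECT the buffer address, the file offset and the
     * transfer length must all satisfy the alignment requirement,
     * otherwise the read fails with EINVAL. */
    ssize_t n = read(fd, buf, len);
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes\n", n);

    free(buf);
    close(fd);
    return 0;
}
```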
So while the OS is to blame rather than the hardware, we can see why page-aligned buffers are more efficient.
Edit: Of course, if we are writing large buffers anyway (100KB), the number of VM page boundaries crossed will be practically the same whether we align to 512B or not. So the main case being optimized by 512B alignment is single-sector transfers.
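A quick back-of-the-envelope check of that claim, using the same boundary-counting idea as the earlier sketch (the specific start offsets are arbitrary assumptions):

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

int main(void)
{
    size_t len = 100 * 1024;             /* 102400-byte transfer */
    uintptr_t page_aligned   = 0;        /* starts on a page boundary */
    uintptr_t sector_aligned = 7 * 512;  /* arbitrary 512B-aligned start */

    unsigned long a = (unsigned long)
        ((page_aligned + len - 1) / PAGE_SIZE - page_aligned / PAGE_SIZE);
    unsigned long b = (unsigned long)
        ((sector_aligned + len - 1) / PAGE_SIZE - sector_aligned / PAGE_SIZE);

    /* prints "24 vs 25": practically the same number of splits */
    printf("page boundaries crossed: %lu vs %lu\n", a, b);
    return 0;
}
```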