MMIO read/write latency

I'm finding that the MMIO read/write latency is unreasonably long. I hope someone can give me some advice.

In kernel space, I wrote a simple program to read a 4-byte value from the BAR0 region of a PCIe device. The device is an Intel 10G PCIe NIC plugged into a PCIe x16 slot on my Xeon E5 server. I use rdtsc to measure the time between the start and the end of the MMIO read; the code snippet looks like this:

void __iomem *vaddr;
unsigned long init, end;
u32 ret;

vaddr = ioremap_nocache(0xf8000000, 128);  /* 0xf8000000 is BAR0 of the device */
rdtscl(init);
ret = readl(vaddr);
rmb();
rdtscl(end);

I expected the elapsed time between init and end to be less than 1us; after all, the time the data spends on the PCIe link itself should only be a few nanoseconds. However, my test results show that it takes about 5.5us to read a single MMIO register of the PCIe device. I wonder whether this is reasonable. I also changed my code to add a read memory barrier (rmb), but I still get a delay of about 5us.

This paper discusses PCIe latency measurement; it is usually less than 1us: www.cl.cam.ac.uk/~awm22/.../miller2009motivating.pdf. Do I need any special configuration, of the kernel or the device, to get lower MMIO access latency? Or does anyone have experience doing this?

Tags: linux, linux-device-driver, pci-e, pci-bus
2 answers

5usec is pretty good! Do this in a loop and gather statistics, and you may well find much larger values.
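Something along these lines (a minimal sketch reusing the rdtscl/readl approach from the question; the loop count is arbitrary, and on newer kernels you would use rdtsc() or ktime instead of the old rdtscl macro):

#include <linux/kernel.h>
#include <linux/io.h>
#include <asm/msr.h>   /* rdtscl() on older x86 kernels, as in the question */

/* Sketch: repeat the readl() and keep min/avg/max cycle counts.
 * Assumes vaddr was obtained with ioremap_nocache() as in the question. */
static void measure_mmio_read(void __iomem *vaddr)
{
	unsigned long start, end, delta;
	unsigned long min = ~0UL, max = 0, total = 0;
	int i, loops = 1000;          /* arbitrary sample count */
	u32 val;

	for (i = 0; i < loops; i++) {
		rdtscl(start);
		val = readl(vaddr);   /* non-posted MMIO read */
		rmb();
		rdtscl(end);

		delta = end - start;
		total += delta;
		if (delta < min)
			min = delta;
		if (delta > max)
			max = delta;
	}

	pr_info("MMIO read cycles: min=%lu avg=%lu max=%lu (last val=0x%x)\n",
		min, total / loops, max, val);
}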

There are several reasons for this. BARs are usually mapped non-cached and non-prefetchable - check yours with pci_resource_flags(). If the BAR is marked cacheable, then cache coherency - the process of making sure all CPUs cache the same value - could be one problem.
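For example, a quick way to dump those flags (assuming pdev is the NIC's struct pci_dev, e.g. from your driver's probe callback, and BAR index 0 matches the BAR0 in the question):

#include <linux/pci.h>

/* Sketch: report whether BAR0 is prefetchable or cacheable. */
static void check_bar0_flags(struct pci_dev *pdev)
{
	unsigned long flags = pci_resource_flags(pdev, 0);

	pr_info("BAR0 flags: %s%s%s\n",
		(flags & IORESOURCE_MEM)       ? "MEM "          : "",
		(flags & IORESOURCE_PREFETCH)  ? "PREFETCHABLE " : "",
		(flags & IORESOURCE_CACHEABLE) ? "CACHEABLE"     : "");
}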

Second, an MMIO read is always a non-posted transaction. The CPU has to stall until it is granted the bus, and then stall a bit longer until the data arrives on that bus. The BAR is made to look like memory, but in reality it is not, and the stall may be an uninterruptible busy-wait; it is unproductive nonetheless. So I would expect the worst-case latency to be much higher than 5us, even before you start considering task preemption.
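If you want to see the posted vs. non-posted difference directly, a rough sketch like the one below could help. It reuses vaddr and rdtscl from the question and assumes offset 0 is a register that is safe to write - on a real NIC, pick an offset you know is harmless to touch:

/* Rough sketch: a posted MMIO write usually completes in far fewer cycles
 * than a non-posted MMIO read, because the CPU does not wait for a
 * completion from the device. */
static void compare_mmio_write_read(void __iomem *vaddr)
{
	unsigned long start, end;
	u32 val;

	rdtscl(start);
	writel(0x0, vaddr);      /* posted: no completion required */
	wmb();
	rdtscl(end);
	pr_info("MMIO write cycles: %lu\n", end - start);

	rdtscl(start);
	val = readl(vaddr);      /* non-posted: waits for the completion TLP */
	rmb();
	rdtscl(end);
	pr_info("MMIO read cycles: %lu (val=0x%x)\n", end - start, val);
}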


If the NIC has to go out over the network, possibly through switches, to get the data from a remote host, then 5.5us is a reasonable read time. If you are reading a register on the local PCIe device itself, it should be less than 1us. I have no experience with the Intel 10G NIC, but I have worked with Infiniband and custom cards.

