NIC Descriptor Concept

I am trying to understand the concept of Rx and Tx descriptors used in network driver code.

  • Are these descriptors in software (RAM) or in hardware (on the NIC)?
  • How are they filled?

EDIT: In the Realtek card driver code, I have the following structure:

    struct Desc {
        uint32_t opts1;
        uint32_t opts2;
        uint64_t addr;
    };

    txd->addr  = cpu_to_le64(mapping);
    txd->opts2 = cpu_to_le32(opts2);
    txd->opts1 = cpu_to_le32(opts1 & ~DescOwn);

So, what are opts1 and opts2, and is a bit such as DescOwn specific to the card? Are they defined by the manufacturer in the datasheet?

Thanks Nayan

1 answer

Quick answer:

  • They are software constructs whose layout is defined by the NIC hardware, so that both sides understand them and can talk to each other.
  • They are populated either by the driver (for example, an RX descriptor pointing to a freshly allocated empty buffer) or by the NIC (for example, an RX writeback). See below for more details.

More Architectural Details:

Note: I assume you are familiar with the ring data structure and with the concept of DMA. https://en.wikipedia.org/wiki/Circular_buffer

https://en.wikipedia.org/wiki/Direct_memory_access

Consider the RX path first. After receiving a packet, the NIC converts the electrical / optical / radio signal on the wire into binary data bytes. Then the NIC needs to inform the OS that it has received something. In the old days this was done with interrupts, and the OS would read the bytes from a predefined location on the NIC into RAM. However, this is slow, because 1) the CPU has to participate in moving the data from the NIC to RAM, and 2) there can be many packets, hence many interrupts, which can be too much for the CPU to handle. Then DMA came along and solved the first problem. In addition, people developed polling-mode drivers (or hybrid mode, as in Linux NAPI), so the CPU could be freed from interrupt handling and could poll many packets at once, thereby solving the second problem.
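
To make the hybrid (interrupt plus polling) idea concrete, here is a minimal sketch. It is not real Linux / NAPI code; struct nic, schedule_poll() and the nic_* helpers are made-up placeholders, but the flow is the same: the interrupt only kicks off a poll routine that drains many packets at once.

    /* Hypothetical sketch of hybrid interrupt + poll (NAPI-style) RX handling.
     * struct nic and all helper functions below are placeholders. */
    struct nic;

    void nic_disable_rx_interrupts(struct nic *nic);
    void nic_enable_rx_interrupts(struct nic *nic);
    int  nic_rx_packet_available(struct nic *nic);
    void process_one_packet(struct nic *nic);
    void schedule_poll(struct nic *nic);

    void rx_interrupt_handler(struct nic *nic)
    {
        nic_disable_rx_interrupts(nic); /* stop taking one interrupt per packet */
        schedule_poll(nic);             /* defer the work to the poll routine   */
    }

    void rx_poll(struct nic *nic, int budget)
    {
        int done = 0;

        /* Drain up to 'budget' packets in one go instead of one per interrupt. */
        while (done < budget && nic_rx_packet_available(nic)) {
            process_one_packet(nic);
            done++;
        }

        if (done < budget)              /* ring drained: re-arm interrupts */
            nic_enable_rx_interrupts(nic);
        /* otherwise stay in polling mode; rx_poll() will be called again */
    }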

A descriptor is a mechanism that helps the NIC do DMA easily. As its name implies, it describes a packet. It does not contain the packet data itself (for NICs, as far as I know), but rather describes where the data is.
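
As a minimal, made-up illustration of such an RX descriptor data structure (field names and widths here are invented; the real layout is dictated by each NIC's datasheet):

    #include <stdint.h>

    /* Illustrative only: a generic RX descriptor. It holds the DMA address
     * of a packet buffer plus some status, never the packet bytes themselves. */
    struct generic_rx_desc {
        uint64_t buf_addr;   /* physical (DMA) address of the packet buffer     */
        uint16_t buf_len;    /* size of that buffer in bytes                    */
        uint16_t pkt_len;    /* filled in by the NIC: actual packet length      */
        uint32_t status;     /* filled in by the NIC: done / error / offload bits */
    };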

Back to the RX story. The NIC has turned the signal into bytes and would like to DMA them into RAM. But before that, the NIC has to know where to DMA to, since it cannot just drop data at some random RAM location the CPU would not know about; that would not be safe.

Thus, during RX queue initialization, the NIC driver preallocates a number of packet buffers, as well as an array of packet descriptors. It initializes each packet descriptor according to the layout defined by the NIC.

I will take the Intel XL710 driver as example code (some variables have been renamed for easier understanding); the XL710 descriptor format looks like this:

    struct i40e_rx_queue {
        struct packet_buffer_pool *pool;        /* packet pool                    */
        volatile i40e_16byte_rx_desc *rx_ring;  /* RX ring of descriptors         */
        struct packet_buffer *pkt_addr_backup;  /* save a copy of packet buffer
                                                 * address for writeback
                                                 * descriptor reuse               */
        ....
    }

    union i40e_16byte_rx_desc {
        struct {
            __le64 pkt_addr;  /* Packet buffer address, points to a free packet
                               * buffer in packet_buffer_pool                    */
            __le64 hdr_addr;  /* Header buffer address, normally isn't used      */
        } read;  /* initialized by driver */
        struct {
            struct {
                struct {
                    union {
                        __le16 mirroring_status;
                        __le16 fcoe_ctx_id;
                    } mirr_fcoe;
                    __le16 l2tag1;
                } lo_dword;
                union {
                    __le32 rss;         /* RSS Hash                 */
                    __le32 fd_id;       /* Flow director filter id  */
                    __le32 fcoe_param;  /* FCoE DDP Context id      */
                } hi_dword;
            } qword0;
            struct {
                /* ext status/error/pktype/length */
                __le64 status_error_len;
            } qword1;
        } wb;  /* writeback by NIC */
    };


  • The driver allocates a number of packet buffers in RAM (tracked in the packet_buffer_pool data structure).

     pool = alloc_packet_buffer_pool(buffer_size=2048, num_buffer=512); 
  • The driver places each packet buffer address in a descriptor field, for example

     rx_ring[i].read.pkt_addr = pool.get_free_buffer(); 
  • The driver tells the NIC the base address of rx_ring, its length, and its head / tail. This way the NIC knows which descriptors are free (and therefore which packet buffers pointed to by those descriptors are free). The driver does this by writing the information into NIC registers (whose addresses are fixed and documented in the NIC datasheet).

     rx_ring_addr_reg = &rx_ring;
     rx_ring_len_reg  = sizeof(rx_ring);
     rx_ring_head     = 0;  /* meaning all free at start */
     /* rx_ring_tail is a register in the NIC, as the NIC updates it */
  • Now the NIC knows that descriptors rx_ring[{x, y, z}] are free, and that new packet data can be placed at {x, y, z}.pkt_addr. It goes ahead and DMAs new packets into {x, y, z}.pkt_addr. At the same time, the NIC may pre-process (offload) part of the packet handling (for example, checksum verification, VLAN tag extraction), so it also needs somewhere to leave that information for the software. Here the descriptors are reused for that purpose (see the second struct in the descriptor union, the writeback layout). The NIC then advances the rx_ring tail pointer offset, indicating that a new descriptor has been written back by the NIC. [Note that because the descriptors are reused for the pre-processing results, the driver must save {x, y, z}.pkt_addr in a backup data structure.]

     /* below is done in hardware, shown just for illustration purposes */
     if (rx_ring_head != rx_ring_tail) {  /* ring not full */
         copy(rx_ring[rx_ring_tail].read.pkt_addr, raw_packet_data);
         result = do_offload_processing(raw_packet_data);
         if (result & BAD_CHECKSUM)
             rx_ring[rx_ring_tail].wb.qword1.status_error_len |= RX_BAD_CHECKSUM_ERROR;
         rx_ring_tail++;  /* in reality the NIC sets a Descriptor Done flag in the
                           * writeback descriptor so the driver can figure out the
                           * current position, saving a PCIe register write */
     }
  • The driver reads the new tail pointer offset and discovers {x, y, z} with new packets. It reads the packets from pkt_addr_backup[{x, y, z}] and the associated pre-processing results.

  • When the upper-layer software is done with the packets, {x, y, z} are returned to rx_ring, and the ring head pointer is updated to indicate that the descriptors are free again (a rough driver-side sketch of these last two steps follows below).
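
A rough driver-side sketch of those last two steps. This is not the real i40e code: RX_DESC_DONE, extract_length(), hand_packet_to_stack(), pool_get_free_buffer(), buffer_dma_addr() and the next_to_clean bookkeeping are all invented placeholders; only le64_to_cpu()/cpu_to_le64() are the usual kernel byte-order helpers.

    /* Hypothetical driver-side RX cleanup loop -- for illustration only. */
    void rx_clean(volatile union i40e_16byte_rx_desc *rx_ring,
                  struct packet_buffer **pkt_backup,
                  struct packet_buffer_pool *pool,
                  uint32_t *next_to_clean, uint32_t ring_size, int budget)
    {
        while (budget--) {
            uint32_t i = *next_to_clean;
            uint64_t qword1 = le64_to_cpu(rx_ring[i].wb.qword1.status_error_len);

            if (!(qword1 & RX_DESC_DONE))     /* NIC has not written this one back yet */
                break;

            /* The writeback overwrote read.pkt_addr, so use the saved copy. */
            hand_packet_to_stack(pkt_backup[i], extract_length(qword1));

            /* Refill the slot with a fresh buffer and re-arm the descriptor. */
            pkt_backup[i] = pool_get_free_buffer(pool);
            rx_ring[i].read.pkt_addr = cpu_to_le64(buffer_dma_addr(pkt_backup[i]));
            rx_ring[i].read.hdr_addr = 0;

            *next_to_clean = (i + 1) % ring_size;
        }
        /* Finally, tell the NIC how far we have cleaned (by updating the ring
         * head register), so it knows these descriptors and buffers are free again. */
    }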

This completes the RX path. The TX path is pretty much the reverse: the upper layer produces a packet, the driver copies the packet data into the packet_buffer_pool and points tx_ring[x].buffer_addr at it. The driver also prepares some TX offload flags (for example, hardware checksumming, TSO) in the TX descriptor. The NIC reads the TX descriptor and DMAs tx_ring[x].buffer_addr from RAM into the NIC.
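
A sketch of the TX side in the same spirit. Every name here (generic_tx_desc, tx_queue, copy_into_buffer(), write_tx_tail_register(), the cmd_len layout) is invented for illustration and not taken from a real driver; the kernel's __leXX types and cpu_to_leXX helpers are assumed.

    /* Hypothetical TX descriptor and send routine, for illustration only. */
    struct generic_tx_desc {
        __le64 buffer_addr;  /* DMA address of the packet data            */
        __le32 cmd_len;      /* packet length plus command / offload bits */
        __le32 status;       /* written back by the NIC once transmitted  */
    };

    struct tx_queue {
        struct generic_tx_desc *tx_ring;
        struct packet_buffer_pool *pool;
        uint32_t next_to_use;
        uint32_t ring_size;
    };

    int xmit_packet(struct tx_queue *q, const void *data, uint32_t len,
                    uint32_t offload_flags /* e.g. checksum / TSO bits */)
    {
        struct generic_tx_desc *txd = &q->tx_ring[q->next_to_use];
        struct packet_buffer *buf = pool_get_free_buffer(q->pool);

        copy_into_buffer(buf, data, len);       /* copy packet into DMA-able memory */
        txd->buffer_addr = cpu_to_le64(buffer_dma_addr(buf));
        txd->cmd_len     = cpu_to_le32(len | offload_flags);

        q->next_to_use = (q->next_to_use + 1) % q->ring_size;
        write_tx_tail_register(q->next_to_use); /* tell the NIC there is work to do */
        return 0;
    }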

This information is usually found in the NIC datasheet, for example the Intel XL710 controller datasheet, chapters 8.3 and 8.4, RX / TX Data Path:

http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-datasheet.html

You can also check open-source driver code (in the Linux kernel, or a userspace library such as a DPDK PMD), which will contain the descriptor structure definitions.

By the way, I suggest you also tag the question with "network".

- Edit 1 -

For the follow-up question about the Realtek driver: yes, these bits are specific to the NIC. The hint is lines like

  desc->opts1 = cpu_to_le32(DescOwn | RingEnd | cp->rx_buf_sz); 

DescOwn is a bit that, when set, tells the NIC that it now owns this descriptor and its associated buffer. The driver also needs to convert the value from the CPU's byte order (maybe a PowerPC, which is big-endian) to little-endian, which is what the NIC agrees on.
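
As a sketch of the ownership handshake (the bit positions below follow the Linux 8139cp driver's definitions; the RTL8139C+ datasheet is the authoritative source, and the two helper functions are invented for illustration, reusing struct Desc from the question and the kernel's cpu_to_le*/le*_to_cpu helpers):

    /* Ownership bits as defined in the Linux 8139cp driver. */
    #define DescOwn  (1U << 31)  /* descriptor (and its buffer) is owned by the NIC */
    #define RingEnd  (1U << 30)  /* last descriptor in the ring: wrap around here   */

    /* Sketch only: hand an RX descriptor (back) to the NIC. */
    static void give_desc_to_nic(struct Desc *desc, uint64_t mapping,
                                 uint32_t rx_buf_sz, int is_last)
    {
        desc->addr  = cpu_to_le64(mapping);   /* DMA address of the buffer */
        desc->opts2 = cpu_to_le32(0);
        /* Set DescOwn last, so the NIC never sees a half-initialized descriptor. */
        desc->opts1 = cpu_to_le32(DescOwn | (is_last ? RingEnd : 0) | rx_buf_sz);
    }

    /* Sketch only: has the NIC finished with this descriptor? */
    static int desc_done(const struct Desc *desc)
    {
        /* The NIC clears DescOwn when it is done; opts1 then also carries the
         * received length and status bits (see the RTL8139C+ datasheet). */
        return !(le32_to_cpu(desc->opts1) & DescOwn);
    }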

You can find the relevant information in http://realtek.info/pdf/rtl8139cp.pdf (search for DescOwn, for example). It is not laid out the same way as the XL710 datasheet, but it at least contains all the register / descriptor information.

- Edit 2 -

The NIC descriptor layout is very vendor-specific. As shown above, this Intel NIC uses the same RX descriptor ring both for providing buffers to the NIC and for the NIC to write back RX information. There are other implementations, such as split RX submission / completion queues (more common in NVMe technology). For example, some Broadcom NICs have a single submission ring (to provide buffers to the NIC) and multiple completion rings. The NIC places received packets into different completion rings, e.g. per traffic-class priority, so the driver can service the most important packets first. (See the RX ring diagram in the BCM5756M NIC programmer's guide.)
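
As a purely illustrative sketch of such a split design (all structure and field names below are made up, not Broadcom's actual layout):

    #include <stdint.h>

    /* Illustrative only: a split-ring RX design (submission + completion). */

    struct rx_buffer_desc {            /* submission ring entry: driver -> NIC */
        uint64_t buf_addr;             /* DMA address of an empty packet buffer  */
        uint16_t buf_len;
        uint16_t index;                /* lets a completion refer back to this buffer */
    };

    struct rx_completion_desc {        /* completion ring entry: NIC -> driver */
        uint16_t buffer_index;         /* which submitted buffer holds the packet */
        uint16_t pkt_len;
        uint32_t status;               /* errors, checksum result, VLAN, ...      */
    };

    struct rx_queues {
        struct rx_buffer_desc     *submission_ring;     /* one ring feeds buffers */
        struct rx_completion_desc *completion_ring[4];  /* e.g. one per traffic class,
                                                          * so high-priority packets
                                                          * can be serviced first */
    };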

- Edit 3 -

I usually find Intel's NIC datasheets the most open and informative about their design. A very brief summary of the Tx / Rx flow is given in the Intel 82599 family datasheet, Section 1.8, Architecture and Basic Operations.

