TCP receive window size is larger than net.core.rmem_max

I am running iperf measurements between two servers connected via a 10 Gbit/s link. I am trying to match the maximum window size I observe with the system configuration parameters.

In particular, I noticed that the maximum window size is 3 MiB. However, I cannot find the corresponding value in the system configuration.

Running sysctl -a, I get the following values:

 net.ipv4.tcp_rmem = 4096 87380 6291456
 net.core.rmem_max = 212992

The last value tells us that the maximum receive buffer size is 6 MiB. However, TCP allocates twice the requested size for bookkeeping, so the maximum receive window should be 3 MiB, which matches what I measured. From man tcp:

Note that TCP actually allocates twice the size of the buffer requested in the setsockopt(2) call, and so a succeeding getsockopt(2) call will not return the same size of buffer as requested in the setsockopt(2) call. TCP uses the extra space for administrative purposes and internal kernel structures, and the /proc file values reflect the larger sizes compared to the actual TCP windows.
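That doubling is easy to observe directly. Below is a minimal sketch (assuming Linux; the 64 KiB request is an arbitrary illustration): asking for a 64 KiB receive buffer and reading it back typically reports 128 KiB.

 #include <stdio.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <unistd.h>

 int main(void)
 {
     int fd = socket(AF_INET, SOCK_STREAM, 0);
     int req = 64 * 1024;            /* requested receive buffer size */
     int got = 0;
     socklen_t len = sizeof(got);

     setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
     getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);

     /* On Linux this prints got = 131072: twice the requested 65536 */
     printf("requested %d, kernel reports %d\n", req, got);
     close(fd);
     return 0;
 }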

However, the second value, net.core.rmem_max, says that the maximum receive buffer size cannot exceed 208 KiB (212992 bytes). And this should be a hard limit, according to man tcp:

tcp_rmem max: the maximum size of the receive buffer used by each TCP socket. This value does not override the global net.core.rmem_max. This is not used to limit the size of the receive buffer declared using SO_RCVBUF on a socket.

So why do I observe a maximum window size larger than the limit specified in net.core.rmem_max?

NB: I also calculated the bandwidth-delay product, window_size = bandwidth x RTT, which also comes out to about 3 MiB (10 Gbit/s times the measured RTT), cross-checking the traffic capture.
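The question does not state the measured RTT, but it can be inferred from the numbers given: for a 3 MiB window to equal the bandwidth-delay product of a 10 Gbit/s link, the round-trip time must be roughly 2.5 ms.

 window = bandwidth x RTT
 => RTT = window / bandwidth
        = (3 x 2^20 B x 8 bit/B) / (10 x 10^9 bit/s)
        ~= 2.5 ms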

2 answers

A quick search through the kernel source turned this up:

https://github.com/torvalds/linux/blob/4e5448a31d73d0e944b7adb9049438a09bc332cb/net/ipv4/tcp_output.c

in tcp_select_initial_window():

 if (wscale_ok) {
     /* Set window scaling on max possible window
      * See RFC1323 for an explanation of the limit to 14
      */
     space = max_t(u32, sysctl_tcp_rmem[2], sysctl_rmem_max);
     space = min_t(u32, space, *window_clamp);
     while (space > 65535 && (*rcv_wscale) < 14) {
         space >>= 1;
         (*rcv_wscale)++;
     }
 }

max_t returns the larger of its two arguments, so whichever limit is bigger wins here; with the values above, that is the tcp_rmem max.
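As a sanity check, here is a standalone sketch (not kernel code) that replays that halving loop with space = 6291456, the tcp_rmem max from the question; it selects a window scale of 7:

 #include <stdio.h>
 #include <stdint.h>

 int main(void)
 {
     uint32_t space = 6291456;   /* net.ipv4.tcp_rmem max from the question */
     int rcv_wscale = 0;

     /* Same halving loop as in tcp_select_initial_window() */
     while (space > 65535 && rcv_wscale < 14) {
         space >>= 1;
         rcv_wscale++;
     }

     /* Prints: rcv_wscale = 7, base window = 49152 */
     printf("rcv_wscale = %d, base window = %u\n", rcv_wscale, space);
     return 0;
 }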

The only other reference to sysctl_rmem_max is where it is used to limit the SO_RCVBUF argument (in net/core/sock.c).

All other TCP code only consults sysctl_tcp_rmem.

So, without digging deeper into the code, you can conclude that a larger net.ipv4.tcp_rmem will override net.core.rmem_max in all cases except when SO_RCVBUF is set explicitly (a check that can be bypassed with SO_RCVBUFFORCE).
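A minimal sketch of that last point (assumptions: Linux, and the 8 MiB request is arbitrary; SO_RCVBUFFORCE requires CAP_NET_ADMIN, so the second call only succeeds as root):

 #include <stdio.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <unistd.h>

 int main(void)
 {
     int fd = socket(AF_INET, SOCK_STREAM, 0);
     int req = 8 * 1024 * 1024;  /* 8 MiB, well above rmem_max */
     int got = 0;
     socklen_t len = sizeof(got);

     /* Plain SO_RCVBUF: the kernel silently clamps to net.core.rmem_max */
     setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
     getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
     printf("SO_RCVBUF:      asked %d, got %d\n", req, got);

     /* SO_RCVBUFFORCE bypasses the rmem_max check (needs CAP_NET_ADMIN) */
     if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE, &req, sizeof(req)) == 0) {
         getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
         printf("SO_RCVBUFFORCE: asked %d, got %d\n", req, got);
     } else {
         perror("SO_RCVBUFFORCE");
     }
     close(fd);
     return 0;
 }

With the sysctl values from the question, the first call should report about 425984 (twice the 212992 rmem_max cap), while the forced call should report the full 16 MiB (twice the 8 MiB request).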


net.ipv4.tcp_rmem takes precedence over net.core.rmem_max, according to https://serverfault.com/questions/734920/difference-between-net-core-rmem-max-and-net-ipv4-tcp-rmem :

It seems that the tcp setting will take precedence over the common max setting


But I agree with you: this seems to contradict what is written in man tcp, and I can reproduce your findings. Maybe the documentation is wrong? If you find out, please comment!

