We have run into a problem where, after a while, a particular socket connection stalls and the client-side TCP stack keeps retransmitting [ACK] packets.
The topology is as follows:
Client A -- Switch A -- Router A:NAT -- .. Internet .. -- Router B:NAT -- Switch B -- Server B
Here are the packets captured by Wireshark:
A) Server
1. 8013 > 6757 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55
2. 6757 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0
B) Client
// packets 3 and 4 are exactly the same as packets 1 and 2
3. 8013 > 13000 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55
4. 13000 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0
5. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17 [TCP Retransmission]
6. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17
8013 is the server port, and 6757 is the client NAT port.
Why does the client's TCP stack keep sending [ACK] packets to tell the server that it has received packet 1 (see packets 4, 5 and 6), even though the server has already received one [ACK] packet (see packet 2)? Neither side of the connection closes the socket when the problem occurs.
After packet 6, the connection is lost, and we can't send anything to the server through this socket anymore.
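To make the numbers in the capture easier to follow, here is a minimal sketch of the sequence/acknowledgment arithmetic. The helper names `ack_for_segment` and `segment_is_acked` are made up for this illustration, and the values are the relative sequence numbers shown by Wireshark. Packet 1 (Seq=56, Len=55) is covered by Ack=111, which is what packets 2, 4, 5 and 6 carry. The client's 17-byte segment (Seq=132, Len=17) would need Ack=149 from the server, and no such ACK appears in the capture above.

```c
#include <assert.h>
#include <stdint.h>

/* Acknowledgment number that covers a data segment: the sequence number
 * of the next byte expected. Unsigned arithmetic wraps modulo 2^32, as
 * TCP sequence numbers do. SYN and FIN flags, which also consume one
 * sequence number each, are ignored in this sketch. */
uint32_t ack_for_segment(uint32_t seq, uint32_t len)
{
    return seq + len;
}

/* Nonzero if the given ACK number covers the whole segment (seq, len).
 * The signed difference keeps the comparison valid across wrap-around
 * on the usual two's-complement platforms. */
int segment_is_acked(uint32_t ack, uint32_t seq, uint32_t len)
{
    return (int32_t)(ack - ack_for_segment(seq, len)) >= 0;
}
```

With the capture's values, `segment_is_acked(111, 56, 55)` is true (the server's data is acknowledged), while `segment_is_acked(132, 132, 17)` is false (the Ack=132 seen from the server does not cover the client's 17 bytes).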
Pseudocode:
// client
serverAddr.port = htons(8013);
serverAddr.ip = inet_addr(publicIPB);
connect(fdA, serverAddr, ...);
// server
listenfd = socket(, SOCK_STREAM, );
localAddr.port = htons(8013);
localAddr.ip = htonl(INADDR_ANY);
bind(localAddr...)
listen(listenfd, 100);
...
// using select model
select(fdSet, NULL, NULL, NULL);
for (...) {
    if (FD_ISSET(listenfd)) { ... }
    ...
}
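For reference, the server side of the pseudocode could look roughly like the following in compilable C. This is only a sketch under assumptions: the function names `make_listener` and `accept_with_timeout` are invented here, the timeout passed to `select()` is an addition (the pseudocode waits without one), and the real server's per-connection handling is not shown.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to INADDR_ANY:port and start listening,
 * with the same backlog of 100 as in the pseudocode.
 * Returns the listening descriptor, or -1 on failure. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in local = {0};
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0 ||
        listen(fd, 100) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for a pending connection on listenfd using
 * select(). Returns the accepted descriptor, or -1 on timeout or error. */
int accept_with_timeout(int listenfd, int timeout_ms)
{
    assert(listenfd >= 0 && listenfd < FD_SETSIZE);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listenfd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor in the set plus one. */
    int ready = select(listenfd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0 || !FD_ISSET(listenfd, &readfds))
        return -1;

    return accept(listenfd, NULL, NULL);
}
```

A caller would use `make_listener(8013)` once and then call `accept_with_timeout()` in a loop, adding each accepted descriptor to its own read set.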
UPDATE
UP1. Here are the specific steps to reproduce the problem.
1. Take three computers: PC1, PC2 and PC3. All three are behind Router A, and the server is behind Router B.
2. Take two users: U1 and U2. U1 logs in from PC1 and U2 from PC3. Each of them establishes a TCP connection to the server. U1 can now send data over its TCP connection to the server, and the server forwards all of that data to U2. Everything works fine up to this point.
Denote the socket descriptor of the server-side endpoint of the TCP connection between U1 and the server as U1-OldSocketFd.
3. Without logging U1 out, unplug the network cable from PC1. U1 then logs in from PC2, which establishes a new TCP connection to the server.
Denote the socket descriptor of the server-side endpoint of this new TCP connection between U1 and the server as U1-NewSocketFd.
4. On the server side, when it updates its session for U1, it calls close(U1-OldSocketFd).
4.1. About 30 seconds after step 3, we found that U1 is NOT able to send any data to the server through its new TCP connection.
4.2. If, in step 3, the server does not call close(U1-OldSocketFd) immediately (within the same second in which the new connection between U1 and the server is established) but instead calls it more than 70-80 seconds later, then everything works fine.
UP2. Router B uses Port Forwarding on port 8013.
UP3. Some parameters of the Linux OS on which the Server is running.
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
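For anyone who wants to confirm these settings on their own server, the values can be read from the corresponding files under /proc/sys. The following is a minimal sketch; `read_sysctl_int` is a helper name invented for this example. It assumes the usual mapping of `net.ipv4.tcp_tw_reuse` to `/proc/sys/net/ipv4/tcp_tw_reuse` (and likewise for `tcp_tw_recycle`). On newer kernels the `tcp_tw_recycle` file may not exist, in which case the function simply reports failure.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a /proc/sys-style file.
 * Returns 0 and stores the value in *value on success, or -1 if the
 * file cannot be opened or does not start with an integer. */
int read_sysctl_int(const char *path, int *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, `read_sysctl_int("/proc/sys/net/ipv4/tcp_tw_reuse", &v)` should leave `v` equal to 1 on the server described above.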