Network sockets: bytes stuck in the send queue for 15 minutes. Why?

I have a Java program running on Windows (a Citrix machine) that sends a request to a Java application server on Linux. This is a fairly common dispatch mechanism.

The Windows Java program (call it W) opens a socket to listen on a port assigned by the OS, say 1234, to receive the results. Then it calls the "send" service on the server with the "business request". This service splits the request and dispatches it to other servers (call them S1 ... Sn), and synchronously returns the number of tasks to the client.
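The listening side of W might be set up along these lines. This is only a sketch of the technique (the class name is hypothetical, and passing port 0 is just the standard way to ask the OS for a free ephemeral port, like the 4373 seen later):

```java
import java.net.ServerSocket;

public class EphemeralPortListener {
    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS to pick a free ephemeral port for us.
        try (ServerSocket listener = new ServerSocket(0)) {
            int port = listener.getLocalPort();
            System.out.println("Listening on port " + port);
            // W would pass this port number along with the business request,
            // so the S servers know where to send their results.
        }
    }
}
```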

In my tests, 13 jobs are sent to several servers; within 2 seconds all the servers have finished processing their tasks and try to send the results back to the W socket.

In the logs, I can see that 9 jobs are received by W (this number varies from test to test). So I'm trying to find the 4 remaining jobs. If I run netstat on the Windows machine, I see that 4 sockets are open:

    TCP  W:4373  S5:48197  ESTABLISHED
    TCP  W:4373  S5:48198  ESTABLISHED
    TCP  W:4373  S6:57642  ESTABLISHED
    TCP  W:4373  S7:48295  ESTABLISHED

If I take a thread dump of W, I see 4 threads trying to read from these sockets, apparently stuck in java.net.SocketInputStream.socketRead0(Native Method).
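For what it's worth, a blocked socketRead0 like this is simply what an InputStream.read() with no data looks like. A minimal reproduction (self-contained over loopback, with a read timeout added so it fails fast instead of hanging; the 2-second value is illustrative):

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimedRead {
    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(0);
             Socket client = new Socket("localhost", listener.getLocalPort());
             Socket server = listener.accept()) {
            // Without setSoTimeout, this read would block in socketRead0
            // indefinitely, exactly as seen in the thread dump.
            server.setSoTimeout(2000);
            InputStream in = server.getInputStream();
            try {
                in.read(); // nothing was sent, so this times out after 2 s
            } catch (SocketTimeoutException e) {
                System.out.println("read timed out instead of hanging forever");
            }
        }
    }
}
```

Setting SO_TIMEOUT on W's sockets would not fix the underlying delay, but it would make the stall visible in the application instead of a silent 15-minute hang.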

If I run netstat on each of the S servers, I see that some bytes are still sitting in the send queue. The byte count does not change for 15 minutes. (Below is netstat output collected from the different machines.)

    Proto Recv-Q Send-Q Local Address  Foreign Addr  State
    tcp        0   6385 S1:48197       W:4373        ESTABLISHED
    tcp        0   6005 S1:48198       W:4373        ESTABLISHED
    tcp        0   6868 S6:57642       W:4373        ESTABLISHED
    tcp        0   6787 S7:48295       W:4373        ESTABLISHED

If I take thread dumps on the servers, those threads are also stuck in java.net.SocketInputStream.socketRead0(Native Method). I would expect them to be stuck in a write; maybe they are waiting for an ACK? (I'm not sure whether that would even surface in Java; shouldn't the TCP stack handle ACKs on its own?)

Now the really strange part: after 15 minutes (and it is always exactly 15 minutes), the results arrive, the sockets are closed, and everything continues as normal.

This setup has always worked before. Recently the S servers moved to another data center, so W and S are no longer in the same data center. In addition, S now sits behind a firewall. All ports between S and W are supposedly authorized (so I'm told). The real mystery is the exact 15-minute delay. I wondered whether this could be some kind of DDoS protection?

I am not a network expert, so I asked for help, but no one has been able to solve it. I spent 30 minutes with someone who captured packets with Wireshark (formerly Ethereal), but for security reasons I am not allowed to look at the capture myself; he has to analyze it and get back to me. I asked for the firewall logs; same story.

I am not root or administrator on any of these machines, and I'm not sure what to do next. I don't expect a full solution from you guys, but any ideas on how to make progress would be wonderful!

+4
5 answers

If it used to work fine on your local network, I wouldn't assume this is a programming problem (notwithstanding the comments about flush()).

Is the network connection between the two machines otherwise healthy? Can you transfer similar amounts of data via (say) FTP without a problem? Can you reproduce the problem by knocking up a bare-bones client/server pair that just sends data blocks of the appropriate size? In other words, is there actually a good network connection between W and S?

Another question: you now have a firewall in the path. Could this be a bottleneck that wasn't there before? (Not sure how that would explain a constant 15-minute delay, though.)

The final question: what TCP configuration parameters are set (on both W and S; I'm thinking of OS-level settings)? Is there anything that could suggest or produce the 15-minute figure?

Not sure if this will help.

+3

Are you missing a flush() on the S side after sending the response?

+1

Right. If you are using a BufferedOutputStream, you need to call flush() unless you fill the buffer up to its maximum size.
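A small demonstration of the point, writing to an in-memory sink instead of a socket (the payload string and 8 KB buffer size are arbitrary): a short write smaller than the buffer never reaches the underlying stream until flush() is called.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;

public class FlushDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192);

        out.write("result payload".getBytes()); // 14 bytes: stays in the buffer
        System.out.println("before flush: " + sink.size() + " bytes reached the sink");

        out.flush(); // forces the buffered bytes down to the underlying stream
        System.out.println("after flush:  " + sink.size() + " bytes reached the sink");
    }
}
```

Note, though, that a missing flush() would leave the bytes in the JVM's buffer; the netstat output in the question shows them already in the kernel's Send-Q, i.e. past any application-level buffering.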

+1

Besides what Brian said, you could also check the following:

1) Run tcpdump on one of the servers and look at the sequence of messages from the start of the job until the delayed completion. This will tell you which side is causing the delay (W or S). Check for retransmissions, lost packets, etc.

2) Is there any packet fragmentation on the path between W and S?

3) What are the network load conditions on the servers where the bytes are stuck? Is heavy load causing output errors, leaving the send queues unable to drain? (There could also be a NIC driver bug where, after some error condition, the NIC's buffers are not flushed or transmission cannot resume, and the condition is eventually cleared by some watchdog.)

Additional information on the points above would help.

+1

Are you sure the threads stuck in reads are the same threads that sent the data? Is it possible that the threads actually involved are blocked on some other action, and your stack dump shows other, innocent threads that just happen to be doing socket I/O? It's been a while since I worked with Java, but I vaguely remember the JVM using sockets for IPC.

I would look at the whole thread dump on the host to see whether one of the threads is the intended receiver but is instead doing something else for 15 minutes.

The fact that it works in one environment but not another usually points to an application timing bug rather than a data center problem.

0
