I have a Thrift API served by a Java application running on Linux. I use a .NET client to connect to the API and perform operations.
The first few calls to the service work without errors, but then (seemingly at random) a call hangs. If I kill my client and try to reconnect, either the service hangs again or my client fails with the following error:
    Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
       at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       at Thrift.Transport.TStreamTransport.Read(Byte[] buf, Int32 off, Int32 len)
       (etc.)
When I use JConsole to get a thread dump, the server is sitting in accept():

    "Thread-1" prio=10 tid=0x00002aaad457a800 nid=0x79c7 runnable [0x00000000434af000]
       java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
        - locked <0x00000005c0fef470> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:462)
        at java.net.ServerSocket.accept(ServerSocket.java:430)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:113)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
        at org.apache.thrift.server.TSimpleServer.serve(TSimpleServer.java:63)
netstat on the server shows connections to the service port in the TIME_WAIT state, which eventually disappear a few minutes after I kill the client (as expected).
The code that sets up the Thrift service is as follows:
    int port = thriftServicePort;
    String host = thriftServiceHost;
    InetAddress adr = InetAddress.getByName(host);
    InetSocketAddress address = new InetSocketAddress(adr, port);
    TServerTransport serverTransport = new TServerSocket(address);
    TServer server = new TSimpleServer(
        new TServer.Args(serverTransport).processor((org.apache.thrift.TProcessor) processor));
    server.serve();
Note that we use the TServerSocket constructor that accepts an explicit host name or IP address. I suspect that I should switch to the constructor that takes only a port (which ultimately binds to InetAddress.anyLocalAddress() ). Alternatively, I suppose, I could configure the service to bind to the "wildcard" address ("0.0.0.0").
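To make the wildcard-bind idea concrete, here is a minimal sketch using only java.net, so it runs without Thrift on the classpath. The class and method names (WildcardBindSketch, bindWildcard) are made up for illustration. As far as I understand, the Thrift equivalent would be the TServerSocket(int port) constructor, but I have not confirmed that this changes the behaviour described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper class, not part of Thrift or of the service code above.
class WildcardBindSketch {

    /** Binds a plain ServerSocket to the wildcard address on the given port (0 picks any free port). */
    static ServerSocket bindWildcard(int port) {
        try {
            ServerSocket socket = new ServerSocket();
            // InetSocketAddress(int) uses the wildcard address,
            // so the socket listens on every local interface.
            socket.bind(new InetSocketAddress(port));
            return socket;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = bindWildcard(0)) {
            System.out.println("bound to wildcard: " + socket.getInetAddress().isAnyLocalAddress());
            System.out.println("local port: " + socket.getLocalPort());
        }
    }
}
```

With a wildcard bind, connections arriving through the SSH tunnel (which typically terminate on the loopback interface of the server side) would be accepted regardless of which name the service host resolves to.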
I should mention that the service is not hosted on the open Internet. It is hosted on a private network, and I use SSH tunneling to reach it. Therefore, the host name to which the service is bound does not resolve on my local network (although I can make the initial connection through the tunnel). I wonder whether this is something similar to the RMI TCP callback problem?
Is there a technical explanation for what is happening (if this is a common problem) or additional troubleshooting steps that I can take?
UPDATE
Today we had the same problem, but this time jstack showed that the Thrift server was blocked indefinitely reading from the input stream:

    "Thread-1" prio=10 tid=0x00002aaad43fc000 nid=0x60b3 runnable [0x0000000041741000]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
        at org.apache.thrift.server.TSimpleServer.serve(TSimpleServer.java:70)
So it seems we need to set the "client timeout" in the TServerSocket constructor. But why would this also cause the application to refuse connections when it was blocked in accept()?
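To illustrate what I mean by a client timeout, here is a minimal sketch with plain java.net sockets (no Thrift). It assumes the timeout ends up as a socket read timeout (setSoTimeout) on each accepted connection; the class and method names (ReadTimeoutSketch, readTimesOut) are invented for the example. A client connects but never sends anything, which is roughly the situation in the second thread dump.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Thrift or of the service code above.
class ReadTimeoutSketch {

    /** Returns true if a read from a connected but silent client gives up after timeoutMs. */
    static boolean readTimesOut(int timeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             // The client connects and then stays silent, like a stalled peer.
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Without this call, the read below would block indefinitely.
            accepted.setSoTimeout(timeoutMs);
            try {
                accepted.getInputStream().read();
                return false; // data or end-of-stream arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true; // the read was abandoned after timeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

If I read the Thrift sources correctly, TSimpleServer handles one connection at a time on the serving thread, so while that thread is stuck in a read like the one above it is not calling accept() for anyone else. That would be consistent with the hang, although it does not by itself explain the first dump, where the thread was in accept().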