Java socketRead0 Problem

I am developing a web crawler with HtmlUnit and have set all the necessary timeouts, but I noticed that the application freezes when the server of a site being crawled stops responding. This is what I get when I take a thread dump with Java VisualVM:

java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:88)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:429)
    at java.net.Socket.connect(Socket.java:525)
    at com.gargoylesoftware.htmlunit.SocksSocketFactory.connectSocket(SocksSocketFactory.java:89)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:776)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:152)
    at app.plugin.core.net.QHttpWebConnection.getResponse(QHttpWebConnection.java:30)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1439)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1358)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:307)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)

This is really disappointing since I do not control these servers. This issue seriously affects the performance of my application.

Questions:

  • How can I solve this problem?
  • Is there a way to get a list of the socket connections opened by a Java application and use it to terminate a socket, for example one whose server has already closed the connection?
+9
java sockets
3 answers

I believe that when a thread is inside a native method, the stack trace reports RUNNABLE even if the call is actually blocked waiting for some event. Essentially, I don't believe the JVM has any way of knowing what a native method is doing, so it labels these calls RUNNABLE. I have seen this with socketRead0() and socketAccept(), both of which usually block.
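One way to check this on your own JVM is to park a thread in a blocking socket call and ask for its state. The sketch below is only an illustration: the class name `BlockedStateDemo` and the fixed 300 ms pause are assumptions, and the reported state may depend on the JVM version. If the observation above holds, it prints RUNNABLE even though the thread is doing nothing but waiting.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class BlockedStateDemo {

    /** Parks a thread in ServerSocket.accept() and returns the state the JVM reports for it. */
    static Thread.State stateWhileBlockedInAccept() {
        // Port 0 lets the OS pick a free port on the loopback interface.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    server.accept(); // no client ever connects, so this call blocks
                } catch (IOException expectedOnClose) {
                    // closing the server socket at the end of the try block unblocks accept()
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();
            Thread.sleep(300); // crude: give the thread time to enter accept()
            return acceptor.getState();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // This is the same state a thread dump would show for the blocked thread.
        System.out.println("state while blocked in accept(): " + stateWhileBlockedInAccept());
    }
}
```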

You need to set a timeout that is long enough that a request is not abandoned when the server is merely busy, but short enough that it fails when the server is not responding at all. Your application should also be written to use multiple threads. I would try starting a dozen or more threads and have each one wait up to five or ten seconds for a response. The overhead of that many threads is negligible. When writing a web spider, you should also remember not to bombard a server with too many requests.
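The multi-threaded approach described above could be sketched roughly as follows. This is a minimal illustration, not the crawler's real code: `BoundedCrawl`, `fakeFetch`, and the example URLs are hypothetical, and the "fetch" only sleeps so that the example runs without a network. Note that `invokeAll` applies a single deadline to the whole batch rather than to each request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BoundedCrawl {

    /** Hypothetical stand-in for a page fetch: sleeps instead of doing network I/O. */
    static Callable<String> fakeFetch(String url, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return "ok " + url;
        };
    }

    /** Runs all fetches on a fixed pool and gives the whole batch one deadline. */
    static List<String> crawl(List<Callable<String>> fetches, int threads, long deadlineMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> results = new ArrayList<>();
        try {
            // invokeAll cancels (interrupts) every task still running at the deadline
            // and returns the futures in the same order as the submitted tasks.
            for (Future<String> f : pool.invokeAll(fetches, deadlineMillis, TimeUnit.MILLISECONDS)) {
                try {
                    results.add(f.get());
                } catch (CancellationException e) {
                    results.add("timed out");
                } catch (ExecutionException e) {
                    results.add("failed: " + e.getCause());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        List<Callable<String>> fetches = Arrays.asList(
                fakeFetch("http://a.example/", 50),
                fakeFetch("http://b.example/", 60_000)); // simulates a hung server
        System.out.println(crawl(fetches, 12, 1000));
    }
}
```

Cancelling works here because `Thread.sleep` responds to interruption. A thread stuck in a blocking `java.net` socket read generally does not, so a deadline like this complements a socket-level timeout rather than replacing it.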

+10

Here's a blog post that is possibly related: http://javaeesupportpatterns.blogspot.fi/2011/04/javanetsocketinputstreamsocketread0.html

In short, the solution is to set a socket timeout. The default value is 0, which means no timeout. How exactly to set it depends on the library, in this case evidently com.gargoylesoftware.htmlunit. At a quick glance, the right method may be com.gargoylesoftware.htmlunit.WebClient.setTimeout.
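To show what the timeout changes at the socket level, here is a small sketch using plain `java.net` rather than HtmlUnit. The class name `SoTimeoutDemo` and the deliberately silent local server are assumptions made for the demonstration; how HtmlUnit passes its own timeout down to the socket is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutDemo {

    /**
     * Connects to a local server that never sends data, then reads with the
     * given SO_TIMEOUT. Returns true if the read gave up with a
     * SocketTimeoutException instead of blocking.
     */
    static boolean readTimesOut(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; the server never accepts or writes,
        // but the OS still completes the TCP handshake for the queued connection.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Connect timeout: bounds how long connect() itself may block.
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), 1000);
            // Read timeout: with the default of 0, the read below would block forever.
            client.setSoTimeout(readTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or end of stream arrived, which is not expected here
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The connect timeout and the read timeout are separate settings; a client that sets only one of them can still block in the other phase.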

+6

If your Java server runs on Windows, a last resort is SysInternals TCPView.

http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx

It shows a list of all processes with their local and remote ports, including those of your Java application. Select the correct connection and close it; the Java thread will then throw an exception and terminate.

Of course, there is a risk of closing the wrong connection. After all, this method is a last resort.

Update, August 23, 2019:

TCPView runs slowly with a large number of connections.

A much faster alternative is CurrPorts (from NirSoft): https://www.nirsoft.net/utils/cports.html

+1
