I wrote the following test code:
    @Test
    public void testLeakWithGrizzly() throws Throwable {
        ExecutorService executor = Executors.newFixedThreadPool(N_THREADS);
        Set<Future<Void>> futures = new HashSet<>();
        InetSocketAddress inetSocketAddress = new InetSocketAddress(localhostAddress, 111);
        for (int i = 0; i < N_THREADS; i++) {
            Future<Void> future = executor.submit(new GrizzlyConnectTask(
                    inetSocketAddress, requests, bindFailures,
                    successfulOpens, failedOpens, successfulCloses, failedCloses));
            futures.add(future);
        }
        for (Future<Void> future : futures) {
            future.get(); //block
        }
        Thread.sleep(1000); //let everything calm down
        reporter.report();
        throw causeOfDeath;
    }

    private static class GrizzlyConnectTask implements Callable<Void> {
        private final InetSocketAddress address;
        private final Meter requests;
        private final Meter bindFailures;
        private final Counter successfulOpens;
        private final Counter failedOpens;
        private final Counter successfulCloses;
        private final Counter failedCloses;

        public GrizzlyConnectTask(InetSocketAddress address, Meter requests, Meter bindFailures,
                                  Counter successfulOpens, Counter failedOpens,
                                  Counter successfulCloses, Counter failedCloses) {
            this.address = address;
            this.requests = requests;
            this.bindFailures = bindFailures;
            this.successfulOpens = successfulOpens;
            this.failedOpens = failedOpens;
            this.successfulCloses = successfulCloses;
            this.failedCloses = failedCloses;
        }

        @Override
        public Void call() throws Exception {
            while (!die) {
                TCPNIOTransport transport = null;
                boolean opened = false;
                try {
                    transport = TCPNIOTransportBuilder.newInstance().build();
                    transport.start();
                    transport.connect(address).get(); //block
                    opened = true;
                    successfulOpens.inc(); //successful open
                    requests.mark();
                } catch (Throwable t) {
                    //noinspection ThrowableResultOfMethodCallIgnored
                    Throwable root = getRootCause(t);
                    if (root instanceof BindException) {
                        bindFailures.mark(); //ephemeral port exhaustion
                        continue;
                    }
                    causeOfDeath = t;
                    die = true;
                } finally {
                    if (!opened) {
                        failedOpens.inc();
                    }
                    if (transport != null) {
                        try {
                            transport.shutdown().get(); //block
                            successfulCloses.inc(); //successful close
                        } catch (Throwable t) {
                            failedCloses.inc();
                            System.err.println("while trying to close transport");
                            t.printStackTrace();
                        }
                    } else {
                        //no transport == successful close
                        successfulCloses.inc();
                    }
                }
            }
            return null;
        }
    }
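The snippet references a few statics that aren't shown here (die, causeOfDeath, getRootCause); the real definitions are in the repo linked below, but roughly, treating the exact values as illustrative:

    // sketch of the shared statics the task above uses; the actual definitions
    // are in the linked repo, so treat this as illustrative
    private static final int N_THREADS = 10;          // illustrative value
    private static volatile boolean die = false;      // flipped on the first fatal error
    private static volatile Throwable causeOfDeath;   // the fatal error itself

    // walk the cause chain down to the innermost exception
    private static Throwable getRootCause(Throwable t) {
        Throwable root = t;
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        return root;
    }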
on my linux laptop, this dies after ~5 minutes with the following exception:
    java.io.IOException: Too many open files
        at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
        at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:130)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:68)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
        at org.glassfish.grizzly.nio.Selectors.newSelector(Selectors.java:62)
        at org.glassfish.grizzly.nio.SelectorRunner.create(SelectorRunner.java:109)
        at org.glassfish.grizzly.nio.NIOTransport.startSelectorRunners(NIOTransport.java:256)
        at org.glassfish.grizzly.nio.NIOTransport.start(NIOTransport.java:475)
        at net.radai.LeakTest$GrizzlyConnectTask.call(LeakTest.java:137)
        at net.radai.LeakTest$GrizzlyConnectTask.call(LeakTest.java:111)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
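The exception means the process hit its file descriptor limit. To watch the descriptor count climb while the test runs, a watchdog thread can poll /proc/self/fd (linux-only sketch, not part of the test above):

    import java.io.File;

    // linux-only: each entry under /proc/self/fd is one open descriptor of this JVM
    public final class FdWatch {
        static int openFdCount() {
            String[] entries = new File("/proc/self/fd").list();
            return entries == null ? -1 : entries.length;
        }

        public static void main(String[] args) throws InterruptedException {
            while (true) {
                System.out.println("open fds: " + openFdCount());
                Thread.sleep(5_000);
            }
        }
    }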
success / failure counters are as follows:
    -- Counters --------------------------------------------------------------------
    failedCloses
                 count = 0
    failedOpens
                 count = 40999
    successfulCloses
                 count = 177177
    successfulOpens
                 count = 136178

    -- Meters ----------------------------------------------------------------------
    bindFailures
                 count = 40998
             mean rate = 153.10 events/second
         1-minute rate = 144.61 events/second
         5-minute rate = 91.12 events/second
        15-minute rate = 39.56 events/second
    requests
                 count = 136178
             mean rate = 508.54 events/second
         1-minute rate = 547.38 events/second
         5-minute rate = 442.76 events/second
        15-minute rate = 391.53 events/second
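(For context: the counters/meters are Dropwizard Metrics, wired up roughly like this; the names match the report above, and the exact setup is in the linked repo.)

    import com.codahale.metrics.ConsoleReporter;
    import com.codahale.metrics.Counter;
    import com.codahale.metrics.Meter;
    import com.codahale.metrics.MetricRegistry;

    // sketch of the metrics wiring; treat as illustrative
    class Metrics {
        static final MetricRegistry registry = new MetricRegistry();
        static final Meter requests = registry.meter("requests");
        static final Meter bindFailures = registry.meter("bindFailures");
        static final Counter successfulOpens = registry.counter("successfulOpens");
        static final Counter failedOpens = registry.counter("failedOpens");
        static final Counter successfulCloses = registry.counter("successfulCloses");
        static final Counter failedCloses = registry.counter("failedCloses");
        static final ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
    }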
which tells me that:
- there were no close failures (failedCloses = 0)
- all connections were either not created or were successfully closed (136178 + 40999 = 177177)
- all open failures were caused by ephemeral port exhaustion, except the last one (40999 = 40998 + 1)
full github code here - https://github.com/radai-rosenblatt/oncrpc4j-playground/blob/master/src/test/java/net/radai/LeakTest.java
So am I somehow abusing the grizzly API, or is this a real leak? (Note - I'm using grizzly 2.3.12, which I know is not the latest; upgrading would require convincing people, so I want to be sure this is not a user error on my end.)
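In case it is user error: the only other teardown I'm aware of on the transport is shutdownNow(). A sketch of the cleanup path using it instead of the graceful shutdown().get(), assuming the 2.3.x API, to check whether the leak is tied to graceful shutdown:

    import java.io.IOException;
    import org.glassfish.grizzly.nio.transport.TCPNIOTransport;

    // untested variant of the cleanup in the finally block above, using the
    // forceful shutdownNow() instead of the graceful shutdown().get()
    public final class ForcefulClose {
        static void closeTransport(TCPNIOTransport transport) {
            if (transport == null) {
                return;
            }
            try {
                transport.shutdownNow(); // synchronous, forceful close
            } catch (IOException e) {
                System.err.println("while trying to close transport");
                e.printStackTrace();
            }
        }
    }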
EDIT - this thing leaks even when nothing is thrown. Cutting down to a single thread and adding a 2 ms sleep between iterations, it still leaked ~800 pipes over 50 minutes.
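To count the leaked pipes specifically (on linux each NIO epoll selector holds an epoll descriptor plus a wakeup pipe pair, so leaked selectors show up as pipe:[...] entries), a sketch along the lines of the fd counter above:

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // linux-only: counts descriptors whose /proc/self/fd symlink resolves to pipe:[...]
    public final class PipeWatch {
        static long openPipeCount() throws IOException {
            long pipes = 0;
            try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
                for (Path fd : fds) {
                    try {
                        if (Files.readSymbolicLink(fd).toString().startsWith("pipe:")) {
                            pipes++;
                        }
                    } catch (IOException ignored) {
                        // the fd may have been closed between listing and resolving it
                    }
                }
            }
            return pipes;
        }

        public static void main(String[] args) throws Exception {
            while (true) {
                System.out.println("open pipes: " + openPipeCount());
                Thread.sleep(5_000);
            }
        }
    }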