Lwt file descriptor leak: is it a bug or my code?

Question

(Cross-posted to the Lwt GitHub issue tracker.)

I have reduced my usage to this code sample, which leaks file descriptors.

Say you have:

    #require "lwt.unix"
    open Lwt.Infix

    let echo ic oc = Lwt_io.(write_chars oc (read_chars ic))

    let program =
      let server_address = Unix.(ADDR_INET (inet_addr_loopback, 2000)) in
      let other_addr = Unix.(ADDR_INET (inet_addr_loopback, 2001)) in
      let server = Lwt_io.establish_server server_address begin fun (tcp_ic, tcp_oc) ->
          Lwt_io.with_connection other_addr begin fun (nc_ic, nc_oc) ->
            Lwt_io.printl "Created connection" >>= fun () ->
            echo tcp_ic nc_oc <&> echo nc_ic tcp_oc >>= fun () ->
            Lwt_io.printl "finished"
          end
          |> Lwt.ignore_result
        end
      in
      fst (Lwt.wait ())

    let () = Lwt_main.run program

and then you create a simple server using

nc -l 2001

and then run the OCaml code using utop example.ml

and then open the client

    nc localhost 2000
    blah blah
    ^C

Then, looking at the connections for port 2000 using lsof, we see

    ocamlrun 71109 Edgar 6u IPv4 0x7ff3e309cb80aead 0t0 TCP 127.0.0.1:callbook (LISTEN)
    ocamlrun 71109 Edgar 7u IPv4 0x7ff3e309c9dc8ead 0t0 TCP 127.0.0.1:callbook->127.0.0.1:54872 (CLOSE_WAIT)

In fact, every invocation of nc localhost 2000 leaves another CLOSE_WAIT record in the lsof output.

Eventually the system runs out of file descriptors, which, MOST annoyingly, does not crash the program but makes Lwt just hang.

I can't tell whether I am doing something wrong or whether this is a genuine bug. Either way it is a serious problem for me: I run out of file descriptors within 10 hours ...

EDIT: It seems to me that the problem is that one side of the connection is closed and the other is not. I would think that with_connection should clean up and close whenever either side closes, i.e. whenever nc_ic or nc_oc closes.

EDIT II: I have tried every way of manually closing the channels with Lwt_io.close, but I still get the CLOSE_WAIT state.

EDIT III: I even used Lwt_unix.close on a raw fd passed to with_connection through its optional fd argument, with similarly bad results.

EDIT IV: Most insidiously, if I use Lwt_daemon.daemonize, this problem seems to go away.

+7

ocaml ocaml-lwt

Edgar aroutiounian Jan 12 '16 at 0:03

2 answers

First, it's not clear why you are using join <&> instead of choose <?>. I think that the connection should be closed as soon as either party wants to close it.

Regarding CLOSE_WAIT: this is a half-closed connection between the utop server and the nc client.

A TCP connection consists of two half-connections, and they close independently. The half-connection from the nc client to the utop server was closed by nc when you pressed Ctrl-C. But you must explicitly close the opposite half-connection on the server side by closing the output channel. I am not sure why Lwt_io.establish_server does not close it automatically. Perhaps this is a design problem.
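To illustrate the half-close outside Lwt, here is a minimal sketch using only the Unix library that ships with OCaml. The function name read_after_peer_close is made up for this example, and the ephemeral loopback port is an assumption chosen so the sketch does not clash with ports 2000 and 2001 from the question. The client closes first, the server sees end-of-file, and the server's descriptor stays open (in CLOSE_WAIT) until the server closes it itself.

```ocaml
(* Build with the unix library, for example:
   ocamlfind ocamlopt -package unix -linkpkg half_close.ml *)

(* Hypothetical helper: returns what the server-side read reports
   once the client has closed its end of the connection. *)
let read_after_peer_close () =
  (* Listen on an ephemeral loopback port (port 0 lets the OS choose one). *)
  let listener = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.bind listener (Unix.ADDR_INET (Unix.inet_addr_loopback, 0));
  Unix.listen listener 1;
  let addr = Unix.getsockname listener in
  (* The client connects and then closes, like nc after Ctrl-C. *)
  let client = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.connect client addr;
  let (conn, _peer) = Unix.accept listener in
  Unix.close client;
  (* The peer's close shows up as a zero-length read (end-of-file).
     From here on, conn is the half that is still open on our side. *)
  let n = Unix.read conn (Bytes.create 16) 0 16 in
  (* Only this close releases the descriptor on the server side. *)
  Unix.close conn;
  Unix.close listener;
  n

let () = Printf.printf "read returned %d\n" (read_after_peer_close ())
```

If the Unix.close conn line is left out, the descriptor stays allocated for the lifetime of the process, which is the leak described in the question.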

This works for me on CentOS 7:

    Lwt_io.printl "Created connection" >>= fun () ->
    echo tcp_ic nc_oc <?> echo nc_ic tcp_oc >>= fun () ->
    Lwt_io.close tcp_oc >>= fun () ->
    Lwt_io.printl "finished"

In addition, there is a simplified code snippet to reproduce the problem:

    #require "lwt.unix"

    let program =
      let server_address = Unix.(ADDR_INET (inet_addr_loopback, 2000)) in
      let _server = Lwt_io.establish_server server_address begin fun (ic, oc) ->
          (* Lwt_io.close oc |> Lwt.ignore_result; *)
          ()
        end
      in
      fst (Lwt.wait ())

    let () = Lwt_main.run program

Run nc localhost 2000 several times to get connections in the CLOSE_WAIT state. Uncomment the code to fix the problem.

+5

Stas Jan 13 '16 at 7:29

antron · Accepted Answer · 2017-04-20T00:39:45+0000

The underlying problem, at the time this question was asked, was that Lwt_io.establish_server made no effort to close the file descriptors associated with tcp_ic and tcp_oc. Although this could (and should) have been handled by users closing them manually, it was strange and unexpected behavior.

The new Lwt_io.establish_server, available since Lwt 3.0.0, tries to close tcp_ic and tcp_oc automatically. To make that possible, its callback has a slightly different signature: the callback must return a promise, which you should resolve once tcp_ic and tcp_oc are no longer needed. (EDIT) In practice, this means that you simply write the callback in natural Lwt style, and the completion of the last Lwt operation closes the channels.

The new API also calls Lwt.async internally to launch your callback, so you do not need to call it, or Lwt.ignore_result, yourself.
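As a rough sketch of what the question's program might look like with the new API: this assumes Lwt 3.0.0, where, as far as I recall, Lwt_io.establish_server takes a callback returning unit Lwt.t and itself returns a promise for the server. The names proxy_client and program are made up for this example. I believe later Lwt releases expose the same behavior under Lwt_io.establish_server_with_client_address, whose callback also receives the client address, so check the Lwt_io documentation for your installed version.

```ocaml
(* Sketch only. Assumes Lwt 3.0.0 and the lwt.unix package, e.g.:
   ocamlfind ocamlopt -package lwt.unix -linkpkg proxy.ml *)
open Lwt.Infix

let echo ic oc = Lwt_io.(write_chars oc (read_chars ic))

let server_address = Unix.(ADDR_INET (inet_addr_loopback, 2000))
let other_addr = Unix.(ADDR_INET (inet_addr_loopback, 2001))

(* Hypothetical callback name. It returns a promise; when that promise
   resolves, establish_server closes tcp_ic and tcp_oc for us. *)
let proxy_client (tcp_ic, tcp_oc) =
  Lwt_io.with_connection other_addr (fun (nc_ic, nc_oc) ->
    Lwt_io.printl "Created connection" >>= fun () ->
    (* <?> resolves as soon as either direction finishes. *)
    echo tcp_ic nc_oc <?> echo nc_ic tcp_oc >>= fun () ->
    Lwt_io.printl "finished")

(* No Lwt.ignore_result or Lwt.async here: the server launches the
   callback itself. The final promise never resolves, so the server
   keeps running. *)
let program () =
  Lwt_io.establish_server server_address proxy_client >>= fun _server ->
  fst (Lwt.wait ())
```

To start it, run Lwt_main.run (program ()) with nc -l 2001 listening, as in the question.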

You can still close tcp_ic and tcp_oc manually in the callback, for example to write your own error handlers, which can be as complex as you like. The second, automatic close performed internally by the new Lwt_io.establish_server will not have any harmful effect.

The new API was the eventual result of a parallel discussion of this problem in Lwt issue #208.

If someone wants the old, painful behavior, perhaps in order to reproduce the problem in the question, the old API remains available for some time under the name Lwt_io.Versioned.establish_server_1.
