A few points that I would like to touch upon.
1) According to this document, here is what you need in order to use keepalive on Linux:
Linux has built-in support for keepalive. You need to enable TCP/IP networking in order to use it. You also need procfs support and sysctl support to be able to configure the kernel parameters at runtime.
The procedures involving keepalive use three user-driven variables:
tcp_keepalive_time
> the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
tcp_keepalive_intvl
> the interval between subsequent keepalive probes, regardless of what the connection has exchanged in the meantime
tcp_keepalive_probes
> the number of unacknowledged probes to send before considering the connection dead and notifying the application layer
Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support for most of them by following the instructions explained later in this document.
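A minimal sketch of what requesting keepalive through setsockopt() can look like. SO_KEEPALIVE turns keepalive on for one socket; the Linux-specific TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options override tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes for that socket only. The helper name enable_keepalive() and the values passed to it are illustrative assumptions, not something taken from the HOWTO.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>   /* setsockopt, SOL_SOCKET, SO_KEEPALIVE */

/* Hypothetical helper: enable keepalive on fd and override the
 * system-wide defaults for this socket only. Times are in seconds.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int probes)
{
    int on = 1;

    /* Without this, the kernel never sends keepalive probes on fd. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* Per-socket override of tcp_keepalive_time. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    /* Per-socket override of tcp_keepalive_intvl. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    /* Per-socket override of tcp_keepalive_probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

You would call it right after socket() or accept(); for example, enable_keepalive(fd, 60, 10, 5) would start probing after 60 seconds of idle time and give up after 5 unanswered probes sent 10 seconds apart. If you only set SO_KEEPALIVE, the system-wide sysctl values apply instead.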
It looks like you are already doing the setsockopt() part. Check the current values of these variables on your system to make sure they are correct, or at least make sense.
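Note that tcp_keepalive_time and tcp_keepalive_intvl are expressed in seconds, not milliseconds, while tcp_keepalive_probes is a plain count. You can read them with, for example, cat /proc/sys/net/ipv4/tcp_keepalive_time or sysctl net.ipv4.tcp_keepalive_time.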
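Checking these values from C is just a matter of reading the procfs files under /proc/sys/net/ipv4/. A minimal sketch; the helper names read_sysctl_long() and print_keepalive_settings() are hypothetical. On a stock kernel the output is typically 7200, 75 and 9, but distributions and administrators can change them.

```c
#include <stdio.h>

/* Read a single integer from a procfs/sysctl file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time. Returns -1 on any error. */
static long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the three system-wide keepalive settings. */
static void print_keepalive_settings(void)
{
    static const char *const names[] = {
        "tcp_keepalive_time",   /* seconds */
        "tcp_keepalive_intvl",  /* seconds */
        "tcp_keepalive_probes", /* count */
    };
    char path[128];

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        snprintf(path, sizeof path, "/proc/sys/net/ipv4/%s", names[i]);
        printf("%s = %ld\n", names[i], read_sysctl_long(path));
    }
}
```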
tcp_keepalive_time
I would expect you want a small value here, roughly meaning "send the first probe soon after the last data packet".
tcp_keepalive_intvl
I assume the value of this variable should be smaller than the time TCP would otherwise take to drop the connection.
tcp_keepalive_probes
This may be the "magic value" that makes or breaks your application: if the number of unacknowledged probes is too large, epoll_wait() may effectively never return.
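To see why these three values matter, here is a back-of-the-envelope sketch of how long an idle connection whose peer has silently disappeared can go unnoticed: the first probe goes out after tcp_keepalive_time, and the connection is declared dead after tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl apart. The function name is hypothetical, and the formula ignores retransmission timeouts that apply when unacknowledged data is in flight.

```c
/* Rough worst-case time, in seconds, before the kernel reports an idle
 * connection with a vanished peer as dead: the idle time before the
 * first probe, plus one interval for each unanswered probe. */
static long keepalive_detect_seconds(long time_s, long intvl_s, long probes)
{
    return time_s + probes * intvl_s;
}
```

With the usual defaults this gives 7200 + 9 × 75 = 7875 seconds, a little over two hours, during which an epoll_wait() call with no timeout would report nothing for that socket.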
The document covers the keepalive implementation in Linux kernels 2.4.x and 2.6.x, as well as how to write TCP keepalive-enabled applications in C.
http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
2) Make sure you do not pass -1 as the timeout argument to epoll_wait(), because that causes epoll_wait() to block indefinitely.
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
The timeout argument specifies the minimum number of milliseconds that epoll_wait() will block. (This interval will be rounded up to the system clock granularity, and kernel scheduling delays mean that the blocking interval may overrun by a small amount.) Specifying a timeout of -1 causes epoll_wait() to block indefinitely, while specifying a timeout equal to zero causes epoll_wait() to return immediately, even if no events are available.
From the manual page: http://linux.die.net/man/2/epoll_wait
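A minimal sketch of calling epoll_wait() with a finite timeout instead of -1, so the event loop regains control periodically even if keepalive has not yet flagged a dead peer. The helper name wait_with_timeout() and the choice to retry on EINTR are assumptions of this sketch, not something the man page prescribes.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Wait up to timeout_ms for events on epfd, retrying if a signal
 * interrupts the call. Returns the number of ready events, 0 if the
 * timeout expired with nothing ready, or -1 on a real error.
 * Note: each retry restarts the full timeout (kept simple on purpose). */
static int wait_with_timeout(int epfd, struct epoll_event *events,
                             int maxevents, int timeout_ms)
{
    int n;

    do {
        n = epoll_wait(epfd, events, maxevents, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

For example, with timeout_ms = 1000 a return value of 0 means a second passed with no activity, which is a natural place for application-level idle checks that do not depend on the keepalive settings at all.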