I am experiencing intermittent latency when reading from a POSIX socket (RHEL 6, x86_64, C++, compiled with icpc). The code is designed so that the user can supply an absolute deadline (as opposed to a relative timeout) that is used across several calls to recv. I call pselect to make sure data is readable before calling recv.
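For context, the intended usage pattern is roughly the following (a hypothetical caller for illustration; Message::wait is shown further down, and the buffer handling here is made up):

// Read until 'expected' bytes arrive or the absolute deadline passes.
// 'deadline' is a CLOCK_REALTIME timestamp, not a relative timeout.
ssize_t readWithDeadline(Message &msg, int socket, char *buf,
                         size_t expected, const timespec &deadline) {
    size_t total = 0;
    while (total < expected) {
        if (!msg.wait(socket, deadline)) {
            return -1;  // timed out or failed while waiting
        }
        ssize_t n = recv(socket, buf + total, expected - total, 0);
        if (n <= 0) {
            return -1;  // peer closed the connection or recv failed
        }
        total += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(total);
}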
Usually this works as expected (it waits for the data without exceeding the deadline, and introduces no noticeable delay when data is already available for recv). However, I have a user who periodically (~50% of the time) gets the application into a state where pselect blocks for ~400-500 ms even though data is available on the socket. Looking at /proc/net/tcp, I can see the data sitting in the RX queue, and I can watch the application slowly draining it. If I bypass the pselect call and just call recv, the behavior is the same (minus the shorter delays, indicating recv also blocks when it shouldn't). Once the application enters this state, it stays that way (every pselect/recv experiences a consistent delay).
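For what it's worth, the queued data can also be confirmed from inside the process with a FIONREAD ioctl just before blocking; a minimal sketch of that check (diagnostic only, assuming the same socket descriptor):

#include <sys/ioctl.h>
#include <cstdio>

// Diagnostic only: report how many bytes the kernel has queued on 'socket'
// at the moment we are about to block in pselect/recv.
static void logQueuedBytes(int socket) {
    int queued = 0;
    if (ioctl(socket, FIONREAD, &queued) == 0) {
        fprintf(stderr, "bytes queued on socket %d: %d\n", socket, queued);
    }
}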
I have spent several hours searching here and on other sites. This is the closest similar problem I could find, but there was no resolution ...
http://developerweb.net/viewtopic.php?id=7458
Has anyone encountered similar behavior? I'm at a loss as to what to do next. I instrumented the code to verify where the delay is occurring. (Edit: we actually just confirmed that the entire method below was slow, not any specific system call within it.) It seems to be a kernel/OS problem, but I'm not sure where to look. Here is the code ...
bool
Message::wait(int socket, const timespec & deadline) {
    // A zero deadline means "wait indefinitely"; skip the timeout logic.
    if (deadline.tv_sec == 0 && deadline.tv_nsec == 0) {
        return true;
    }

    timespec currentTime;
    clock_gettime(CLOCK_REALTIME, &currentTime);
    if (VirtualClock::cmptime(currentTime, deadline) >= 0) {
        LOG_WARNING("Timed out waiting to receive data");
        m_timedOut = true;
        return false;
    }

    // Convert the absolute deadline into a relative timeout for pselect.
    timespec timeout;
    memset(&timeout, 0, sizeof(timeout));
    timeout.tv_nsec = VirtualClock::nsecs(currentTime, deadline);
    VirtualClock::fixtime(timeout);

    fd_set descSet;
    FD_ZERO(&descSet);
    FD_SET(socket, &descSet);
    int result = pselect(socket + 1, &descSet, NULL, NULL, &timeout, NULL);
    if (result == -1) {
        m_error = errno;
        LOG_ERROR("Failed to wait for data: %d, %s",
                  m_error, strerror(m_error));
        return false;
    } else if (result == 0 || !FD_ISSET(socket, &descSet)) {
        LOG_WARNING("Timed out waiting to receive data");
        m_timedOut = true;
        return false;
    }
    return true;
}
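The "whole method is slow" observation from the edit above came from wrapping the call with CLOCK_MONOTONIC timestamps, roughly like this (a sketch; the 100 ms threshold is arbitrary):

#include <time.h>
#include <cstdio>

// Rough timing helper used to confirm where the delay occurs.
// CLOCK_MONOTONIC avoids wall-clock adjustments skewing the measurement.
static long elapsedMillis(const timespec &start, const timespec &end) {
    return (end.tv_sec - start.tv_sec) * 1000L
         + (end.tv_nsec - start.tv_nsec) / 1000000L;
}

// Usage around the suspect call:
//   timespec t0, t1;
//   clock_gettime(CLOCK_MONOTONIC, &t0);
//   bool ok = msg.wait(socket, deadline);
//   clock_gettime(CLOCK_MONOTONIC, &t1);
//   if (elapsedMillis(t0, t1) > 100) {
//       fprintf(stderr, "wait() took %ld ms\n", elapsedMillis(t0, t1));
//   }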
VirtualClock is a time-related utility class used here to compare and adjust timespec values (i.e., it does not itself introduce any delays). Any insight into this behavior would be appreciated.
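For completeness, here are simplified equivalents of the VirtualClock helpers used above (a sketch of their behavior, not the actual source):

// cmptime: -1/0/1 comparison of two timespecs.
static int cmptime(const timespec &a, const timespec &b) {
    if (a.tv_sec != b.tv_sec) return a.tv_sec < b.tv_sec ? -1 : 1;
    if (a.tv_nsec != b.tv_nsec) return a.tv_nsec < b.tv_nsec ? -1 : 1;
    return 0;
}

// nsecs: total nanoseconds from 'from' to 'to'.
static long long nsecs(const timespec &from, const timespec &to) {
    return (to.tv_sec - from.tv_sec) * 1000000000LL
         + (to.tv_nsec - from.tv_nsec);
}

// fixtime: normalize so tv_nsec is in [0, 1e9) with the excess in tv_sec.
static void fixtime(timespec &t) {
    while (t.tv_nsec >= 1000000000L) {
        t.tv_nsec -= 1000000000L;
        ++t.tv_sec;
    }
}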