Possible reasons for select() blocking on a socket

I have a Jabber server application and a separate Jabber client application, both written in C++.

When a client receives and sends a lot of messages (more than 20 per second), select simply hangs and never returns.

According to netstat, the socket is still connected on Linux, and tcpdump shows that messages are still being sent to the client, but select never returns.

Here is the code that calls select:

    bool ConnectionTCPBase::dataAvailable( int timeout )
    {
      if( m_socket < 0 )
        return true; // let recv() catch the closed fd

      fd_set fds;
      struct timeval tv;

      FD_ZERO( &fds );
      // the following causes a C4127 warning in VC++ Express 2008 and possibly other versions.
      // however, the reason for the warning can't be fixed in gloox.
      FD_SET( m_socket, &fds );

      tv.tv_sec = timeout / 1000000;
      tv.tv_usec = timeout % 1000000;

      return ( ( select( m_socket + 1, &fds, 0, 0, timeout == -1 ? 0 : &tv ) > 0 )
               && FD_ISSET( m_socket, &fds ) != 0 );
    }

And the backtrace of the hung thread from gdb:

    Thread 2 (Thread 0x7fe226ac2700 (LWP 10774)):
    #0  0x00007fe224711ff3 in select () at ../sysdeps/unix/syscall-template.S:82
    #1  0x00000000004706a9 in gloox::ConnectionTCPBase::dataAvailable (this=0xcaeb60, timeout=<value optimized out>) at connectiontcpbase.cpp:103
    #2  0x000000000046c4cb in gloox::ConnectionTCPClient::recv (this=0xcaeb60, timeout=10) at connectiontcpclient.cpp:131
    #3  0x0000000000471476 in gloox::ConnectionTLS::recv (this=0xd1a950, timeout=648813712) at connectiontls.cpp:89
    #4  0x00000000004324cc in glooxd::C2S::recv (this=0xc5d120, timeout=10) at c2s.cpp:124
    #5  0x0000000000435ced in glooxd::C2S::run (this=0xc5d120) at c2s.cpp:75
    #6  0x000000000042d789 in CNetwork::run (this=0xc56df0) at src/Network.cpp:343
    #7  0x000000000043115f in threading::ThreadManager::threadWorker (data=0xc56e10) at src/ThreadManager.cpp:15
    #8  0x00007fe2249bc9ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
    #9  0x00007fe22471970d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
    #10 0x0000000000000000 in ?? ()

Do you know what might cause select to stop noticing incoming messages, even though they are still being sent? Is there a limit in Linux on sending and receiving a large number of messages over a socket?

Thanks.

1 answer

There are several possibilities.

Exceeding FD_SETSIZE

Your code checks for a negative file descriptor, but not for one that exceeds the upper limit, FD_SETSIZE (usually 1024). Whenever that happens, your code

  • corrupts its own stack
  • passes an empty fd_set to select, which will lead to a hang

Assuming you don't actually need that many concurrently open file descriptors, the fix is probably to find and remove a file descriptor leak, especially in the code further up the stack that handles closing abandoned descriptors.
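The missing upper-bound check can be sketched as follows. This is a minimal illustration, not gloox code: fitsInFdSet and waitReadable are hypothetical names, and the timeout is assumed to be in microseconds, as in the question.

```cpp
#include <cassert>
#include <sys/select.h>

// Hypothetical helper (not part of gloox): true only if fd can be stored
// in an fd_set without writing past the end of the structure.
bool fitsInFdSet( int fd )
{
  return fd >= 0 && fd < FD_SETSIZE;
}

// Sketch of a select()-based readiness check with the upper bound enforced.
// Returns 1 if readable, 0 on timeout, -1 on error or unusable descriptor.
int waitReadable( int fd, int timeout_us )
{
  if( !fitsInFdSet( fd ) )
    return -1; // FD_SET() on this fd would write outside the fd_set

  fd_set fds;
  FD_ZERO( &fds );
  FD_SET( fd, &fds );

  struct timeval tv;
  tv.tv_sec = timeout_us / 1000000;
  tv.tv_usec = timeout_us % 1000000;

  // A timeout of -1 means "block until readable", as in the question's code.
  int rc = select( fd + 1, &fds, 0, 0, timeout_us == -1 ? 0 : &tv );
  if( rc < 0 )
    return -1;
  return ( rc > 0 && FD_ISSET( fd, &fds ) ) ? 1 : 0;
}
```

If waitReadable starts returning -1 for descriptors at or above FD_SETSIZE, that is a strong hint that descriptors are leaking somewhere.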

There is a suspicious comment in your code that indicates a possible leak:

 // let recv() catch the closed fd 

If this comment means that someone sets m_socket to -1 and hopes that recv will catch the closed socket and close it, then who knows, perhaps we end up closing -1 rather than the real socket. (Note the difference between closing at the network level and closing at the file descriptor level; the latter requires a separate close call.)
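To illustrate the ordering that avoids such a leak, here is a small sketch. closeSocket and isOpen are hypothetical helpers, not gloox functions; the point is only that the descriptor is released before its number is overwritten with -1.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: true if fd currently refers to an open descriptor.
bool isOpen( int fd )
{
  return fd >= 0 && fcntl( fd, F_GETFD ) != -1;
}

// Hypothetical helper: release the descriptor first, then mark it invalid.
// Overwriting fd with -1 before calling close() would leak the descriptor.
void closeSocket( int& fd )
{
  if( fd >= 0 )
    ::close( fd ); // frees the descriptor slot in this process
  fd = -1;         // only now forget the number
}
```

Calling closeSocket a second time is harmless, because the stored value is already -1 and close is skipped.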

This could also be addressed by switching to poll, but there are several other limits imposed by the operating system that make that route rather difficult.
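For comparison, a poll-based readiness check might look like the sketch below. poll takes an array of descriptors rather than a fixed-size bitmap, so FD_SETSIZE does not apply to it. dataAvailablePoll is a hypothetical name, and note that poll takes its timeout in milliseconds, whereas the question's code uses microseconds.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h> // pipe/write/close, used only when trying the sketch out

// Sketch: poll()-based variant of the readiness check.
// timeout_ms == -1 blocks indefinitely; 0 returns immediately.
bool dataAvailablePoll( int fd, int timeout_ms )
{
  if( fd < 0 )
    return true; // mirror the question's code: let recv() report the closed fd

  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN | POLLPRI; // POLLPRI also covers urgent (out-of-band) data
  pfd.revents = 0;

  int rc = poll( &pfd, 1, timeout_ms );

  // Treat hangup and error as "available" so that the caller's recv() sees them.
  return rc > 0 && ( pfd.revents & ( POLLIN | POLLPRI | POLLHUP | POLLERR ) ) != 0;
}
```

This only removes the fd_set size limit. A descriptor leak would still eventually hit the per-process limit on open files.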

Out-of-band data

You say that the server "sends" data. If this means that data is sent using the send call (as opposed to write), use strace to check the flags argument of send. If the MSG_OOB flag is set, the data arrives as out-of-band data, and your select call will not notice it unless you also pass a copy of fds as the fourth (exceptfds) parameter:

    fd_set fds_copy = fds;
    select( m_socket + 1, &fds, 0, &fds_copy, timeout == -1 ? 0 : &tv )

Starved process

If the machine is heavily overloaded, the server is running without any blocking calls and with real-time priority (use top to check this), and the client is not, the client may be starved of CPU time.

Paused process

A client can theoretically be stopped with SIGSTOP. You would probably know if that were the case, having pressed Ctrl-Z somewhere (which sends the closely related SIGTSTP) or having some specific process that controls the client beyond just launching it.
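On Linux, one way to check whether the client has been stopped is to read the State: line of /proc/<pid>/status, where T indicates a stopped process. The sketch below assumes a mounted /proc filesystem; processState is a hypothetical helper name.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <unistd.h>

// Sketch, Linux only: return the one-letter state from /proc/<pid>/status,
// e.g. 'R' (running), 'S' (sleeping), 'T' (stopped), or '?' if unavailable.
char processState( pid_t pid )
{
  std::ifstream in( "/proc/" + std::to_string( pid ) + "/status" );
  std::string line;
  while( std::getline( in, line ) )
  {
    if( line.compare( 0, 6, "State:" ) == 0 )
    {
      // The letter follows the label after some whitespace.
      std::string::size_type pos = line.find_first_not_of( " \t", 6 );
      return pos == std::string::npos ? '?' : line[pos];
    }
  }
  return '?'; // no such process, or /proc not available
}
```

The same letter appears in the S column of top and in the STAT column of ps, so the check can also be done from a shell without writing any code.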
