The listen backlog, as Peter said, is the queue that the operating system uses to store connections that have been accepted by the TCP stack but not yet accepted by your program. Conceptually, when a client connects it's placed into this queue until your Accept() code removes it and hands it to your program.
So the backlog is a setting that you can use to help your server handle peaks in connection attempts. Note that this is about peaks in concurrent connection attempts and is in no way related to the maximum number of concurrent connections your server can support. For example, if your server receives 10 new connections per second, it's unlikely that tuning the listen backlog will have any effect, even if those connections are long-lived and your server supports 10,000 concurrent connections (assuming your server isn't maxing out the CPU servicing the existing connections!). However, if a server occasionally experiences short periods where it's accepting 1,000 new connections per second, then you can probably prevent some connections from being rejected by setting a listen backlog that provides a longer queue and therefore gives your server more time to call Accept() for each connection.
As for pros and cons: the pro is that you can handle peaks in concurrent connection attempts better, and the corresponding con is that the operating system has to allocate more space for the listen backlog queue because it's larger. So it's a trade-off of performance against resources.
Personally, I make the listen backlog something that can be tuned externally via a configuration file.
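As a minimal sketch of that idea (the file name `server_config.json` and the key `listen_backlog` are just placeholders, not a standard convention), in Python it might look like:

```python
import json
import socket

# Hypothetical config file and key; adjust to whatever config scheme you use.
with open("server_config.json") as f:
    config = json.load(f)

backlog = int(config.get("listen_backlog", 10))  # fall back to a modest default

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))
server.listen(backlog)  # the externally tunable backlog ends up here
```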
How and when you call Listen() and Accept() depends on the style of socket code you're using. With synchronous code you call Listen() once, with a value, say 10, for your backlog, and then call Accept() in a loop. The call to Listen() sets up an endpoint that your clients can connect to and conceptually creates the pending-connection queue of the specified size. Calling Accept() removes a pending connection from the listen queue, sets up a socket for application use, and passes it to your code as a newly established connection. If the time your code spends calling Accept(), handling the new connection, and looping back around to call Accept() again is longer than the gap between concurrent connection attempts, then you'll start to accumulate entries in the listen backlog queue.
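To make the synchronous style concrete, here's a minimal Python sketch (the answer isn't tied to any particular language; the port and the `handle()` function are placeholders):

```python
import socket

BACKLOG = 10  # how many pending connections the OS will queue for us

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))
server.listen(BACKLOG)  # establishes the endpoint and the pending-connection queue

while True:
    # accept() pops one pending connection off the listen queue;
    # anything that arrives while we're busy below waits in the backlog.
    conn, addr = server.accept()
    handle(conn)  # hypothetical handler; if this is slow, the backlog fills up
    conn.close()
```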
With asynchronous sockets this can be a little different: if you're using async accepts you listen once, as before, and then post several (again, configurable) asynchronous accepts. As each of these completes you handle the new connection and post a new asynchronous accept. This way you have a pending listen queue and a pending accept queue, and so you can accept connections faster (the async accepts are serviced on thread-pool threads, so you no longer have a single tight accept loop). This is usually more scalable and gives you two knobs to tune for handling more concurrent connection attempts.
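A rough Python analogue of "post several async accepts" (assumptions: the thread count, port, and `handle()` are placeholders, and threads stand in for the thread-pool callbacks of an async socket API) is to keep several threads blocked in accept() on the same listening socket:

```python
import socket
import threading

BACKLOG = 10          # pending-connection queue, as before
PENDING_ACCEPTS = 4   # how many accepts to keep outstanding at once (tunable)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))
server.listen(BACKLOG)

def acceptor():
    # Each thread acts like one posted accept: it pulls a connection off the
    # listen queue, handles it, then immediately goes back to accepting.
    while True:
        conn, addr = server.accept()
        try:
            handle(conn)  # hypothetical handler
        finally:
            conn.close()

for _ in range(PENDING_ACCEPTS):
    threading.Thread(target=acceptor, daemon=True).start()

threading.Event().wait()  # keep the main thread alive
```

With several acceptors draining the listen queue in parallel, a burst of connection attempts is less likely to overflow the backlog than with a single accept loop.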