We hit the same problem with our server: it could not keep up under a load of about 100 connections per second. Restarting the server helped temporarily, but under sustained load the clients just kept piling up retries. This shows up most often when the nodes (server and client) are far apart on the network.
We upgraded the hardware and tuned the server configuration: raising the open-file limit (on Linux), allowing the maximum possible range of ports, and increasing the maximum number of threads and connections allowed on the web server. Setting sensible socket (SO) and connection timeouts, managing client resources (connections) carefully, and shutting clients down cleanly gave us good control. Reducing other HTTP/HTTPS traffic to the server, such as heartbeats, user access to front-end applications, and some clients refreshing their caches, improved the situation further.
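As a rough illustration of two of the steps above, here is a minimal Python sketch, assuming a Linux host: it raises the process's open-file limit to its hard maximum (the equivalent of `ulimit -n`; raising the hard limit itself needs root or `limits.conf`), and wraps client connections with an explicit timeout and a guaranteed close so sockets are not leaked during retry storms. The `fetch` helper, its parameters, and the timeout values are illustrative, not from our actual setup.

```python
import resource
import socket

# Raise the soft open-file limit to the hard limit (Linux; unprivileged
# processes may raise soft up to hard, but not beyond it).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

def fetch(host, port=80, timeout=5.0):
    """Illustrative client: connect with an explicit connect timeout,
    set a read/write (SO) timeout, and always close the socket, even
    on errors, so the client does not hold connections open."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.settimeout(timeout)  # SO timeout for reads/writes
        conn.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        return conn.recv(1024)  # the `with` block closes the socket for us
```

The key point is that every client connection has both a timeout and a deterministic close; without those, a slow or distant server turns each stalled request into a leaked file descriptor.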
However, we still see the same problem on less powerful machines and on machines running Windows 7.