We have a publisher application that sends data using multicast. The application is extremely performance-sensitive (we optimize at the microsecond level). Applications that listen to this published data can be (and often are) on the same machine as the publishing application.
Recently, we noticed an interesting phenomenon: the time spent in sendto() grows in proportion to the number of listeners on the machine.
For example, say that with no listeners, the base time for our sendto() call is 5 µs. Each additional listener increases the sendto() call time by about 2 µs. So with 10 listeners, a sendto() call now takes 2 * 10 + 5 = 25 µs.
This suggests to me that the sendto() call blocks until the data has been copied to every listener.
Analysis of the listening side confirms this. With 10 listeners, each listener receives the data about two microseconds later than the previous one (i.e., the first listener receives the data after about 5 µs, and the last after about 23-25 µs).
Is there a way, either at the application level or at the system level, to change this behavior? Something like a non-blocking / asynchronous sendto()? Or at least one that blocks only until the message is copied into kernel memory, so it can return without waiting on all the listeners?