We have an embedded board based on ARM9 running Linux 2.6.32.20. The device is a video camera whose associated capture/compression hardware places the data in an input FIFO in the ARM's memory, which the ARM then reads from user space. We also have a driver for this encoder.
A thread in the application-level code checks this user-space FIFO and, when there is data, sends it out through a socket. To avoid the overhead of this thread having to poll the user-space FIFO for data, we have a very simple read() call into the driver, which actually just sleeps until there is data in the FIFO (nothing is really "read" into the buffer supplied to the read() call). The read() call then returns, and the thread continues to read data from the FIFO until it is empty, and then sleeps again by calling the fake read().
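The shape of that wake-up loop can be sketched as follows. This is a simplified illustration, not the actual application code: `frame_fifo`, `fifo_push`, `fifo_pop`, `wait_for_data`, `service_once` and `discard_frame` are made-up names, the FIFO is modelled as a plain ring of frame lengths, and any blocking file descriptor (a pipe, for instance) stands in for the driver. With the real driver the read() transfers nothing; here the one byte read is simply thrown away.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define FIFO_SLOTS 16

/* Hypothetical stand-in for the user-space FIFO: a ring of frame lengths. */
struct frame_fifo {
    size_t len[FIFO_SLOTS];
    unsigned head;              /* next slot to read  */
    unsigned tail;              /* next slot to write */
};

int fifo_empty(const struct frame_fifo *f) { return f->head == f->tail; }

/* No overflow check: this sketch assumes fewer than FIFO_SLOTS pending frames. */
void fifo_push(struct frame_fifo *f, size_t n)
{
    f->len[f->tail % FIFO_SLOTS] = n;
    f->tail++;
}

size_t fifo_pop(struct frame_fifo *f)
{
    assert(!fifo_empty(f));
    return f->len[f->head++ % FIFO_SLOTS];
}

/* Placeholder sender; the real thread would write the frame to a socket. */
void discard_frame(size_t len) { (void)len; }

/* Sleep inside read() until the descriptor signals data; retry on EINTR. */
int wait_for_data(int wake_fd)
{
    char dummy;
    ssize_t n;
    do {
        n = read(wake_fd, &dummy, sizeof dummy);
    } while (n < 0 && errno == EINTR);
    return n < 0 ? -1 : 0;
}

/* One cycle: block, then drain the FIFO. Returns frames sent, or -1 on error. */
int service_once(int wake_fd, struct frame_fifo *f, void (*send_frame)(size_t))
{
    int sent = 0;
    if (wait_for_data(wake_fd) < 0)
        return -1;
    while (!fifo_empty(f)) {
        send_frame(fifo_pop(f));
        sent++;
    }
    return sent;
}
```

The thread would call `service_once()` in an endless loop, so it only runs when the descriptor wakes it.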
This system is quite efficient, as measured by how many network streams can be transmitted before frame drops are detected. But we have determined that using the fake read() call causes the Linux "top" utility to report a lot of CPU usage by our application.
We created 2 versions of the application: one that works as described above, and another that is identical except that it never calls the fake read() and instead polls the FIFO using usleep() calls. When we look at the CPU usage reported by "top" for the two cases, each sending 5 streams, we get:
1) read() version: CPU 12%
2) usleep() version: CPU 4%
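For comparison, the polling variant boils down to something like the sketch below. Again this is only an illustration under assumptions: `frame_fifo` is reduced to a pending-frame counter, and `poll_for_data`, the sleep interval and the `max_polls` cut-off are hypothetical (the cut-off exists only so the example terminates).

```c
#define _DEFAULT_SOURCE         /* for usleep() with strict -std= flags */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the user-space FIFO: just a pending-frame count. */
struct frame_fifo {
    unsigned pending;
};

/*
 * Sleep in fixed steps until the FIFO has data.
 * Returns the number of sleeps taken, or -1 if the FIFO is still empty
 * after max_polls sleeps.
 */
int poll_for_data(const volatile struct frame_fifo *f,
                  useconds_t interval_us, int max_polls)
{
    int polls = 0;

    assert(f != NULL);
    while (f->pending == 0) {
        if (polls == max_polls)
            return -1;
        usleep(interval_us);    /* thread wakes on a timer, data or not */
        polls++;
    }
    return polls;
}
```

Unlike the blocking version, this thread wakes up on every interval whether or not a frame has arrived.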
Of course, polling is actually less efficient, and if we ignore what "top" says and instead simply measure the number of simultaneous network streams the two versions can transmit before we see frame drops, then version 1 above wins.
We have verified that the read() call above works correctly. If some kind of error caused the read() call to return immediately even when there is no data in the FIFO, then the thread would be doing expensive continuous polling. But this is not the case; the read() call makes the thread run exactly 30 times per second, as it should.
We thought that there might be some shortcut taken by our busybox version of "top", but the same results are reflected in the source numbers in /proc/<pid>/stat, which top uses to calculate the displayed numbers.
The issue must be some limitation in how the Linux kernel itself collects the numbers shown in /proc/<pid>/stat.
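To look at the raw counters without going through top, the per-process stat line can be parsed directly. The sketch below assumes the field layout documented in proc(5), where utime and stime are fields 14 and 15 and are counted in clock ticks; `parse_stat_times` and `read_stat_times` are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract utime and stime (clock ticks) from one /proc/<pid>/stat line.
 * The command name in field 2 may contain spaces or ')', so parsing
 * starts after the last ')'. Returns 0 on success, -1 on malformed input.
 */
int parse_stat_times(const char *line, unsigned long *utime,
                     unsigned long *stime)
{
    const char *p;

    assert(line && utime && stime);
    p = strrchr(line, ')');
    if (!p)
        return -1;

    /* Skip fields 3..13 (state through cmajflt), then read 14 and 15. */
    if (sscanf(p + 1,
               " %*c %*d %*d %*d %*d %*d %*lu %*lu %*lu %*lu %*lu %lu %lu",
               utime, stime) != 2)
        return -1;
    return 0;
}

/* Read the first line of a stat file such as "/proc/self/stat". */
int read_stat_times(const char *path, unsigned long *utime,
                    unsigned long *stime)
{
    char buf[1024];
    char *ok;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    ok = fgets(buf, sizeof buf, fp);
    fclose(fp);
    return ok ? parse_stat_times(buf, utime, stime) : -1;
}
```

Sampling these two values twice and dividing the difference by `sysconf(_SC_CLK_TCK)` gives the CPU seconds charged to the process over that interval.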
If someone understands why this is so, please point me in the right direction. Thanks!