Linux "top" utility for ARM report numbers that we checked to be incorrect. What for?

We have an embedded board based on an ARM9 running Linux 2.6.32.20. The device is a video camera whose capture/compression hardware places its data into an input FIFO in the ARM's memory, which the ARM then reads from user space. We also have a driver for this hardware encoder.

A thread in the application-level code polls this FIFO from user space and, when there is data, sends it out through a socket. To avoid the overhead of having this thread poll the FIFO for data, we added a very simple read() call to the driver, which really just blocks until there is some data in the FIFO (nothing is actually "read" into the buffer supplied to the read() call). When this read() call returns, the thread goes on reading data from the FIFO until it is empty, and then blocks again by issuing the dummy read() call.
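
For illustration, the reader thread presumably looks something like the sketch below. The device path and the FIFO/socket helpers (fifo_empty(), send_next_frame()) are hypothetical names, not the application's actual ones:

    /* Sketch of the reader thread described above -- all names are made up. */
    #include <fcntl.h>
    #include <unistd.h>

    extern int  fifo_empty(void);      /* placeholder: user-space FIFO check */
    extern void send_next_frame(void); /* placeholder: pop a frame, write to socket */

    static void *reader_thread(void *arg)
    {
        int fd = open("/dev/encoder", O_RDONLY);  /* hypothetical device node */
        char dummy;

        (void)arg;
        for (;;) {
            /* Blocks inside the driver until the FIFO has data; nothing is
             * actually copied into 'dummy'. */
            read(fd, &dummy, 1);

            /* Drain the FIFO over the socket, then block again. */
            while (!fifo_empty())
                send_next_frame();
        }
        return NULL;
    }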

This scheme is quite effective, as measured by how many network streams can be transmitted before frame drops are detected. But we have found that using the dummy read() call causes the Linux "top" utility to report a lot of CPU usage by our application.

We built two versions of the application: one that works as described above, and another that is identical except that it never makes the dummy read() call, instead polling the FIFO from user space with usleep() calls. Looking at the CPU usage reported by "top" for the two cases, each sending 5 streams, we get:

1) read() version: 12% CPU
2) usleep() version: 4% CPU
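
For comparison, the usleep() variant presumably replaces the blocking read() with a timed poll along these lines (the 10 ms interval is a guess, and the helpers are the same hypothetical ones as above):

    /* Sketch of the polling variant. */
    #include <unistd.h>

    extern int  fifo_empty(void);
    extern void send_next_frame(void);

    static void poll_loop(void)
    {
        for (;;) {
            while (!fifo_empty())
                send_next_frame();
            usleep(10000);  /* 10 ms; the real interval is not stated */
        }
    }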

Of course, polling is actually less efficient, and if we ignore what "top" says and instead simply measure the number of simultaneous network streams the two versions can transmit before we see frames drop, version 1 above wins.

We have verified that the read() call described above behaves correctly. If some bug caused the read() call to return immediately even when there is no data in the FIFO, the thread would be doing expensive continuous polling. But that is not the case; the read() call makes the thread wake exactly 30 times per second, as it should.
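
One simple way to check that rate is to count read() returns and print the count once a second, e.g. with a sketch like this:

    #include <stdio.h>
    #include <time.h>

    /* Call once per read() return; prints how many wake-ups occurred in
     * each one-second interval -- here it should hover around 30. */
    static void count_wakeup(void)
    {
        static long   count;
        static time_t last;
        time_t now = time(NULL);

        if (now != last && count) {
            fprintf(stderr, "wake-ups/sec: %ld\n", count);
            count = 0;
        }
        last = now;
        count++;
    }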

We wondered whether our BusyBox build of the "top" applet might be taking some shortcut, but the same results show up in the raw numbers in /proc/<pid>/stat, which top uses to calculate the figures it displays.

The problem must therefore be some limitation in how the Linux kernel itself collects the numbers shown in /proc/<pid>/stat.
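
For anyone wanting to reproduce this: the per-process counters behind top's display are the utime and stime fields (fields 14 and 15) of /proc/<pid>/stat, measured in clock ticks. A minimal sketch that samples them:

    #include <stdio.h>
    #include <string.h>

    /* Fetch utime/stime (in clock ticks) from /proc/<pid>/stat -- the raw
     * counters that top converts into a CPU percentage. */
    static int read_cpu_ticks(int pid, unsigned long *ut, unsigned long *st)
    {
        char path[64], buf[512], *p;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/stat", pid);
        f = fopen(path, "r");
        if (!f)
            return -1;
        p = fgets(buf, sizeof(buf), f) ? strrchr(buf, ')') : NULL;
        fclose(f);
        if (!p)
            return -1;
        /* After "pid (comm)" come the state and ten more fields, then
         * utime and stime. */
        return sscanf(p + 2,
                      "%*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
                      ut, st) == 2 ? 0 : -1;
    }

Sample it twice, divide the tick deltas by sysconf(_SC_CLK_TCK) and the elapsed wall-clock time, and you should reproduce the percentage that top displays.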

If someone understands why this is so, please point me in the right direction. Thanks!

+4
2 answers

I can GUARANTEE that top is not lying to you. If it says your process is using 12% of the CPU, then your process is using 12% of the CPU. There are no two ways about it.

Obviously, a usleep() call is not going to take much CPU time, because all it does is put the process to sleep for (at least) the requested amount of time; it is probably on the order of 100 cycles per call. A read() does a lot more, so I am not surprised that it takes more CPU time, especially when you consider everything it has to do.

A read() has to:

  • Make sure your descriptor is valid.
  • Verify that the buffer pointer and length are valid.
  • Copy the request parameters from user space into kernel space.
  • Put the read request into the appropriate data structures.
  • Look up the appropriate device and driver for the request.
  • Issue a read request to your driver.
  • The driver goes to sleep [assuming no data].
  • The driver wakes up the process [if there is data].
  • Copy the data that was read, and its length, back to user space.
  • Return to the caller.
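
The "driver goes to sleep / wakes the process up" steps are where the blocking happens. Purely as an illustration (this is not the questioner's actual driver), a "dummy" blocking read() in the driver might be built on a wait queue:

    /* Hypothetical driver-side read(): block until the FIFO has data,
     * copy nothing, return 0 bytes. All names are made up. */
    #include <linux/fs.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(enc_waitq);
    extern int enc_fifo_level(void);   /* placeholder: bytes in the hardware FIFO */

    static ssize_t enc_read(struct file *filp, char __user *buf,
                            size_t count, loff_t *ppos)
    {
        if (wait_event_interruptible(enc_waitq, enc_fifo_level() > 0))
            return -ERESTARTSYS;       /* interrupted by a signal */
        return 0;                      /* the caller only wanted the wake-up */
    }

The encoder's interrupt handler would then call wake_up_interruptible(&enc_waitq) as data arrives.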

Compare this to usleep:

  • Go to sleep.
  • Wake up.
  • Return to the user.

Of course, going to sleep is not a trivial operation, and neither is waking up. But those steps are the same in both cases, and while it is asleep the process uses no CPU at all.

You can easily see how much overhead a read() carries by reading from /dev/zero with usleep() calls in between. /dev/zero is a device that returns immediately with the supplied buffer filled with zeros.
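
A minimal sketch of that measurement (iteration count and buffer size chosen arbitrarily):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    /* Time N reads from /dev/zero to estimate the CPU cost of a read();
     * add a usleep() in the loop to mimic the camera application's pattern. */
    int main(void)
    {
        char buf[4096];
        struct rusage a, b;
        long i, n = 100000;
        int fd = open("/dev/zero", O_RDONLY);

        getrusage(RUSAGE_SELF, &a);
        for (i = 0; i < n; i++)
            read(fd, buf, sizeof(buf));
        getrusage(RUSAGE_SELF, &b);

        printf("approx CPU us per read(): %.2f\n",
               ((b.ru_utime.tv_sec - a.ru_utime.tv_sec
                 + b.ru_stime.tv_sec - a.ru_stime.tv_sec) * 1e6
                + (b.ru_utime.tv_usec - a.ru_utime.tv_usec)
                + (b.ru_stime.tv_usec - a.ru_stime.tv_usec)) / n);
        close(fd);
        return 0;
    }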

Alternatively, you can try something like oprofile to profile the application and see where the time is actually spent.

But I am sure your top is not lying.

+1

In fact, the "guarantee" is too strong a word to use here. I am currently testing performance testing of two different ARM-based boards using 3.8 and 3.7 kernels, and BOTH are reporting strange things from above. One uses the spare core that the company shipped, and Arch-Linux the other. I do not believe that these kernels were β€œcracked” in any way. The hardware just doesn't play well. One of them reports 0% CPU usage for each process, although the total number at the top shows 50% inactivity. Another shows the process associated with the dma driver, always using 100% CPU. Therefore, in my very limited experience, the problem was at the very beginning.

+1
