Microsecond-granularity event scheduling in POSIX

I am trying to determine the finest granularity at which I can accurately schedule tasks to execute in C/C++. At the moment I can reliably schedule tasks to occur every 5 microseconds, but I am trying to see whether I can lower this further.

Any advice on how to achieve this, or on whether it is possible at all, would be greatly appreciated.

Since I know timer granularity is often OS-dependent: I am currently working on Linux, but would move to Windows if its timer granularity is better (although I don't believe it is, based on what I have found about QueryPerformanceCounter).

All measurements are performed on bare metal (no VM). /proc/timer_info confirms nanosecond timer resolution for my processor (but I know that does not translate into nanosecond alarm resolution).

Current

My current code can be found here as a Gist

Currently I am able to execute a request every 5 microseconds (5,000 nanoseconds) with less than 1% of requests arriving late. When late arrivals do occur, they are typically behind by only one cycle (5,000 nanoseconds).

I am currently doing 3 things:

Setting the process to real-time priority (some of this was pointed out by @Spudd86 here)

    #include <sched.h>
    #include <string.h>

    struct sched_param schedparm;
    memset(&schedparm, 0, sizeof(schedparm));
    schedparm.sched_priority = 99; // highest rt priority
    sched_setscheduler(0, SCHED_FIFO, &schedparm);

Minimizing the timer slack

    #include <sys/prctl.h>

    prctl(PR_SET_TIMERSLACK, 1);

Using timerfds (part of the Linux 2.6 kernel)

    #include <sys/timerfd.h>
    #include <strings.h>

    int timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
    struct itimerspec timspec;
    bzero(&timspec, sizeof(timspec));
    timspec.it_interval.tv_sec = 0;
    timspec.it_interval.tv_nsec = nanosecondInterval;
    timspec.it_value.tv_sec = 0;
    timspec.it_value.tv_nsec = 1;
    timerfd_settime(timerfd, 0, &timspec, 0);
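
The Gist itself is not reproduced here, but the wait loop over the timerfd typically looks something like the sketch below; handle_request() and record_late_arrival() are hypothetical placeholders for the actual benchmarking work, not names from my code.

    #include <stdint.h>
    #include <unistd.h>

    // read() on a timerfd blocks until the timer expires and returns the
    // number of expirations since the last read, so a value greater than 1
    // means we woke up late by (expirations - 1) cycles.
    for (;;) {
        uint64_t expirations = 0;
        if (read(timerfd, &expirations, sizeof(expirations)) != sizeof(expirations))
            break;                                // error or interrupted read
        if (expirations > 1)
            record_late_arrival(expirations - 1); // placeholder bookkeeping
        handle_request();                         // placeholder for the real work
    }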

Possible improvements

  • Dedicate a processor core to this process?
  • Use a non-blocking timerfd so that I can busy-loop instead of blocking (the busy loop will burn more CPU, but it may also react to the alarm more quickly); see the sketch after this list
  • Use an external embedded device to do the dispatching (I can't imagine why this would be better)
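
For the first two ideas, a rough sketch of what I have in mind is below. This is untested, and the CPU number, handle_request() and the use of TFD_NONBLOCK are my own assumptions about how it would be wired up:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <sched.h>
    #include <stdint.h>
    #include <sys/timerfd.h>
    #include <unistd.h>

    extern void handle_request(void); // hypothetical dispatch hook

    static void run_pinned_spin_loop(long nanosecondInterval)
    {
        // Pin the calling process to CPU 2 (arbitrary choice) so the
        // scheduler never migrates it while it busy-waits.
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(2, &mask);
        sched_setaffinity(0, sizeof(mask), &mask);

        // Non-blocking timerfd: read() fails with EAGAIN until the timer
        // fires, so we can spin on it instead of sleeping in the kernel.
        int timerfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
        struct itimerspec timspec = {0};
        timspec.it_interval.tv_nsec = nanosecondInterval;
        timspec.it_value.tv_nsec = 1; // arm the timer immediately
        timerfd_settime(timerfd, 0, &timspec, 0);

        for (;;) {
            uint64_t expirations;
            if (read(timerfd, &expirations, sizeof(expirations)) > 0)
                handle_request();     // timer fired (possibly more than once)
            else if (errno != EAGAIN)
                break;                // real error, bail out
            // otherwise the timer is not due yet; keep spinning
        }
    }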

Why

I am currently working on a workload generator for a benchmarking engine. The workload generator simulates an arrival rate (X requests per second, etc.) using a Poisson process. From the Poisson process I can determine the relative times at which requests must be issued by the benchmarking engine.

So, for example, at 10 requests per second, we might have requests issued at t = 0.02, 0.04, 0.05, 0.056, 0.09 s.

These requests need to be scheduled in advance and then executed at those times. As the number of requests per second increases, the granularity required for scheduling the requests becomes finer (thousands of requests per second requires sub-millisecond accuracy). As a result, I am trying to figure out how to scale this system further.
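
For reference, the inter-arrival times of a Poisson process are exponentially distributed, so a schedule like the one above can be produced with something like the sketch below (the rate, seed and use of rand() are illustrative only, not my actual generator):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        double rate = 10.0;   // X requests per second (illustrative)
        double t = 0.0;       // absolute time of the next request, in seconds

        srand(42);            // fixed seed for a reproducible example
        for (int i = 0; i < 5; i++) {
            // Inverse-transform sampling of an Exp(rate) inter-arrival time.
            double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0); // u in (0,1)
            t += -log(u) / rate;
            printf("request %d at t = %.6f s\n", i, t);
        }
        return 0;
    }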

2 answers

You are pretty much at the limit of what vanilla Linux will offer you, and it is well beyond what it can guarantee. Adding the real-time patches to your kernel and tuning for full pre-emption will help you get more reliable latency guarantees. I would also remove any dynamic memory allocation from your timing-critical code: malloc and friends can (and will) stall for a not-insignificant (in real-time terms) period of time if they have to reclaim memory from the I/O cache. I would also consider removing swap from that machine to help guarantee performance. Dedicating a processor to your task will help to avoid context-switch times, but, again, it is no guarantee.
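
As one concrete step in that direction (my suggestion, not something shown in the question), lock the process's pages into RAM at startup so the timing-critical loop never takes a major page fault or waits on swap:

    #include <stdio.h>
    #include <sys/mman.h>

    // Lock all current and future pages into RAM. Do this once at startup,
    // before entering the timing-critical section.
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");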

I would also suggest being cautious about that level of sched_priority; at 99 you are above various important pieces of Linux itself, which can lead to very strange effects.


What you are going to get by building a real-time kernel is more reliable guarantees (i.e. lower maximum latency) on the time between an I/O or timer event being handled by the kernel and control being passed to your application in response. This comes at the price of lower throughput, and you may notice an increase in your best-case latencies.

However, the only reason to use OS timers to schedule events with high precision is if you are afraid of burning CPU cycles in a loop while you wait for your next due event. OS timers (especially on MS Windows) are not reliable for highly granular timing events, and depend heavily on the sort of timer/HPET hardware available in your system.

When I need highly accurate event scheduling, I use a hybrid method. First, I measure the worst-case latency of sleeping: that is, the largest difference between the time I requested to sleep for and the actual clock time after waking. Let's call this difference "D". (You can actually do this on the fly during normal running, by tracking "D" every time you sleep, with something like "D = (D * 7 + lastD) / 8" to produce a running average.)

Then never request a sleep beyond "N - D * 2", where "N" is the time of the next event. Once within "D * 2" of the next event, enter a spin loop and wait for "N" to arrive.

This eats a lot more CPU cycles, but depending on the accuracy you require you may be able to get away with a "sched_yield()" in your spin loop, which is kinder to your system.
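
A minimal sketch of that hybrid approach, assuming CLOCK_MONOTONIC and the 7/8 smoothing described above (the function name wait_until_ns() and the initial guess for "D" are mine):

    #include <sched.h>
    #include <stdint.h>
    #include <time.h>

    // Current CLOCK_MONOTONIC time in nanoseconds.
    static int64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    // Running estimate of the sleep overshoot "D", updated after every sleep.
    static int64_t D = 100000; // initial guess of 100 us (assumption)

    // Wait until the absolute time target_ns: sleep coarsely up to
    // (target - 2*D), then spin (yielding) for the final stretch.
    static void wait_until_ns(int64_t target_ns)
    {
        int64_t remaining = target_ns - now_ns();
        if (remaining > 2 * D) {
            int64_t sleep_ns = remaining - 2 * D;
            struct timespec req = {
                .tv_sec  = sleep_ns / 1000000000LL,
                .tv_nsec = sleep_ns % 1000000000LL,
            };
            int64_t before = now_ns();
            nanosleep(&req, NULL);
            // Track how much the sleep overshot and fold it into "D".
            int64_t lastD = (now_ns() - before) - sleep_ns;
            if (lastD > 0)
                D = (D * 7 + lastD) / 8;
        }
        // Busy-wait for the remainder; sched_yield() is kinder to the system.
        while (now_ns() < target_ns)
            sched_yield();
    }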

