I am trying to determine the granularity at which I can accurately schedule tasks in C/C++. At the moment I can reliably schedule tasks every 5 microseconds, but I am trying to find out whether I can lower this further.
Any advice on how to achieve this, or on whether it is possible at all, would be greatly appreciated.
I know that timer granularity often depends on the OS. I am currently running on Linux, but I would use Windows if its timing granularity were better (although I do not believe it is, based on what I have found for QueryPerformanceCounter).
I perform all measurements on bare metal (no VM). /proc/timer_info confirms nanosecond timer resolution for my CPU (but I know that this does not translate into nanosecond alarm resolution).
Current
My current code can be found here as a Gist
Currently, I can execute a request every 5 microseconds (5000 nanoseconds) with fewer than 1% late arrivals. When late arrivals do occur, they are usually only one cycle (5000 nanoseconds) behind.
I am doing 3 things at the moment:
Setting the process to real-time priority (some of this was pointed out by @Spudd86 here)
    struct sched_param schedparm;
    memset(&schedparm, 0, sizeof(schedparm));
    schedparm.sched_priority = 99;
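The snippet above only fills in the `sched_param` structure; the call that applies it is not shown. A minimal sketch of the full step, assuming the `SCHED_FIFO` policy (the policy is my assumption, and `enable_realtime` is a hypothetical helper name), could look like this:

```c
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: request a real-time policy for the calling process.
 * SCHED_FIFO is an assumption; the question does not say which policy is used.
 * Returns 0 on success, -1 on failure (typically EPERM without root or
 * CAP_SYS_NICE). */
int enable_realtime(void)
{
    struct sched_param schedparm;
    memset(&schedparm, 0, sizeof(schedparm));

    /* Ask the kernel for the maximum instead of hard-coding 99. */
    schedparm.sched_priority = sched_get_priority_max(SCHED_FIFO);

    /* pid 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &schedparm) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

Calling `enable_realtime()` once at startup, before the timer is armed, should be enough. Checking the return value matters because the call fails silently otherwise and the process keeps running under the default policy.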
Minimizing timer slack
prctl(PR_SET_TIMERSLACK, 1);
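To confirm that the slack value was actually accepted, it can be read back with `PR_GET_TIMERSLACK`. The sketch below is only an illustration, and `set_timer_slack_ns` is a hypothetical helper name. Note that, according to the prctl man page, timer slack is not applied to threads running under a real-time scheduling policy, so this setting may only matter while the process is still under the default policy.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper: set the calling thread's timer slack (nanoseconds)
 * and return the value the kernel reports afterwards, or -1 on error. */
long set_timer_slack_ns(unsigned long slack_ns)
{
    if (prctl(PR_SET_TIMERSLACK, slack_ns, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_TIMERSLACK)");
        return -1;
    }
    /* PR_GET_TIMERSLACK returns the current slack as the call's result. */
    return (long)prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0);
}
```

For example, `set_timer_slack_ns(1)` requests the same 1 ns slack as the line above and reports what the kernel stored.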
Using timerfds (part of the Linux 2.6 kernel)
    int timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
    struct itimerspec timspec;
    bzero(&timspec, sizeof(timspec));
    timspec.it_interval.tv_sec = 0;
    timspec.it_interval.tv_nsec = nanosecondInterval;
    timspec.it_value.tv_sec = 0;
    timspec.it_value.tv_nsec = 1;
    timerfd_settime(timerfd, 0, &timspec, 0);
Possible improvements
- Dedicating a processor to this process?
- Using a non-blocking timerfd so that I can spin in a tight loop instead of blocking (a tight loop wastes more CPU, but it may also respond to an alarm more quickly)
- Using an external embedded device to do the triggering (I can't imagine why this would be better)
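The first two ideas in the list can be sketched together: pin the process to one CPU with `sched_setaffinity`, then poll a timerfd created with `TFD_NONBLOCK` in a tight loop. This is only an illustration of the idea, not a measured improvement; `pin_to_cpu` and `spin_periodic` are hypothetical helper names.

```c
#define _GNU_SOURCE /* for sched_setaffinity and the CPU_* macros */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/timerfd.h>
#include <unistd.h>

/* Hypothetical helper: restrict the calling process to a single CPU.
 * Returns 0 on success, -1 on failure. */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: poll a non-blocking periodic timerfd until
 * `wakeups` successful reads have been seen. Returns the total number
 * of expirations observed, or -1 on error. Burns a full CPU while waiting. */
long spin_periodic(long interval_ns, int wakeups)
{
    assert(interval_ns > 0 && interval_ns < 1000000000L);

    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
    if (fd < 0) {
        perror("timerfd_create");
        return -1;
    }

    struct itimerspec spec;
    memset(&spec, 0, sizeof(spec));
    spec.it_interval.tv_nsec = interval_ns;
    spec.it_value.tv_nsec = 1;
    if (timerfd_settime(fd, 0, &spec, NULL) != 0) {
        perror("timerfd_settime");
        close(fd);
        return -1;
    }

    long total = 0;
    int seen = 0;
    while (seen < wakeups) {
        uint64_t expirations = 0;
        ssize_t n = read(fd, &expirations, sizeof(expirations));
        if (n == (ssize_t)sizeof(expirations)) {
            total += (long)expirations; /* timer fired since last read */
            seen++;
        } else if (n < 0 && (errno == EAGAIN || errno == EINTR)) {
            continue; /* not expired yet: keep spinning */
        } else {
            perror("read");
            close(fd);
            return -1;
        }
    }

    close(fd);
    return total;
}
```

A typical use would be `pin_to_cpu(1)` followed by `spin_periodic(5000, n)`. Pinning alone does not stop the kernel from scheduling other work on that CPU, so whether this lowers the late-arrival rate would have to be measured.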
Why
I am currently working on a workload generator for benchmarking. The workload generator simulates an arrival rate (X requests/second, etc.) using a Poisson process. From the Poisson process, I can determine the relative times at which requests should be issued by the benchmarking engine.
So, for example, at 10 requests per second, we may have requests issued at t = 0.02, 0.04, 0.05, 0.056, 0.09 s.
These requests must be scheduled in advance and then executed. As the number of requests per second increases, the granularity needed to schedule them becomes finer (thousands of requests per second require sub-millisecond accuracy). As a result, I am trying to figure out how to scale this system further.