I want to count (more or less) the exact number of instructions executed by some piece of code. In addition, I want to receive a signal after a certain number of instructions have been executed.
For this purpose I use the overflow signal behavior provided by perf_event_open.
I use the second of the two methods the manpage offers for getting overflow signals:
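Overflow handling
Events can be set to notify when a threshold is crossed, indicating an overflow. Overflow conditions can be captured by monitoring the event file descriptor with poll(2), select(2), or epoll(7). Alternatively, the overflow events can be captured via a signal handler, by enabling I/O signaling on the file descriptor; see the discussion of the F_SETOWN and F_SETSIG operations in fcntl(2).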
[...]
The other way is by use of the PERF_EVENT_IOC_REFRESH ioctl. This ioctl adds to a counter that decrements each time the event overflows. When nonzero, POLL_IN is indicated, but once the counter reaches 0 POLL_HUP is indicated and the underlying event is disabled.
Further explanation of the PERF_EVENT_IOC_REFRESH ioctl:
PERF_EVENT_IOC_REFRESH
Non-inherited overflow counters can use this to enable a counter for a number of overflows specified by the argument, after which it is disabled. Subsequent calls of this ioctl add the argument value to the current count. An overflow notification with POLL_IN set will happen on each overflow until the count reaches 0; when that happens a notification with POLL_HUP set is sent and the event is disabled. Using an argument of 0 is considered undefined behavior.
A very minimal example would look like this:
```c
#define _GNU_SOURCE 1

#include <asm/unistd.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>  // ioctl
#include <unistd.h>     // syscall, getpid, read, close, _exit

long perf_event_open(struct perf_event_attr* event_attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, event_attr, pid, cpu, group_fd, flags);
}

static void perf_event_handler(int signum, siginfo_t* info, void* ucontext) {
    if(info->si_code != POLL_HUP) {
        // Only POLL_HUP should happen.
        _exit(EXIT_FAILURE);  // exit() is not async-signal-safe
    }

    ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1);
}

int main(int argc, char** argv)
{
    // Configure signal handler
    struct sigaction sa;
    memset(&sa, 0, sizeof(struct sigaction));
    sa.sa_sigaction = perf_event_handler;
    sa.sa_flags = SA_SIGINFO;

    // Setup signal handler
    if (sigaction(SIGIO, &sa, NULL) < 0) {
        fprintf(stderr,"Error setting up signal handler\n");
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    // Configure perf_event_attr struct
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;  // Count retired hardware instructions
    pe.disabled = 1;        // Event is initially disabled
    pe.sample_type = PERF_SAMPLE_IP;
    pe.sample_period = 1000;
    pe.exclude_kernel = 1;  // excluding events that happen in the kernel-space
    pe.exclude_hv = 1;      // excluding events that happen in the hypervisor

    pid_t pid = 0;          // measure the current process/thread
    int cpu = -1;           // measure on any cpu
    int group_fd = -1;
    unsigned long flags = 0;

    int fd = perf_event_open(&pe, pid, cpu, group_fd, flags);
    if (fd == -1) {
        fprintf(stderr, "Error opening leader %llx\n", pe.config);
        perror("perf_event_open");
        exit(EXIT_FAILURE);
    }

    // Setup event handler for overflow signals
    fcntl(fd, F_SETFL, O_NONBLOCK|O_ASYNC);
    fcntl(fd, F_SETSIG, SIGIO);
    fcntl(fd, F_SETOWN, getpid());

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);     // Reset event counter to 0
    ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);   // Enable until the next overflow

    // Start monitoring

    long loopCount = 1000000;
    long c = 0;
    long i = 0;

    // Some sample payload (compile without optimization, e.g. -O0,
    // or the compiler may remove this loop).
    for(i = 0; i < loopCount; i++) {
        c += 1;
    }

    // End monitoring

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);   // Disable event

    long long counter;
    if (read(fd, &counter, sizeof(long long)) != sizeof(long long)) {  // Read event counter value
        perror("read");
        exit(EXIT_FAILURE);
    }

    printf("Used %lld instructions\n", counter);

    close(fd);
}
```
So basically I do the following:
- Set up a signal handler for SIGIO.
- Create a new performance counter with perf_event_open (it returns a file descriptor).
- Use fcntl to add signal-sending behavior to the file descriptor.
- Reset and enable the counter with PERF_EVENT_IOC_RESET and PERF_EVENT_IOC_REFRESH.
- Run the payload loop to execute many instructions.
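The fcntl step in the list above can be factored into a small helper with error checking. This is a sketch; the helper name `route_overflow_signal` is hypothetical. As in the example, `F_SETOWN` with `getpid()` makes the signal process-directed, and the nonzero `F_SETSIG` value is what makes the kernel fill in `si_code` (`POLL_IN`/`POLL_HUP`) and `si_fd` in the handler's `siginfo_t`.

```c
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: have the kernel send `signo` to this process whenever
 * `fd` (e.g. a perf event file descriptor) reports an I/O event.
 * Returns 0 on success, -1 on failure (errno set by fcntl).
 */
int route_overflow_signal(int fd, int signo)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        perror("fcntl(F_GETFL)");
        return -1;
    }
    /* O_ASYNC turns on signal-driven I/O for the descriptor. */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK | O_ASYNC) == -1) {
        perror("fcntl(F_SETFL)");
        return -1;
    }
    /* A nonzero F_SETSIG value makes the kernel fill in si_code and si_fd. */
    if (fcntl(fd, F_SETSIG, signo) == -1) {
        perror("fcntl(F_SETSIG)");
        return -1;
    }
    /* Process-directed: any thread of this process may receive the signal. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1) {
        perror("fcntl(F_SETOWN)");
        return -1;
    }
    return 0;
}
```

In the example program this would replace the three unchecked fcntl calls with `if (route_overflow_signal(fd, SIGIO) == -1) exit(EXIT_FAILURE);`. In a multi-threaded program, `F_SETOWN_EX` with `F_OWNER_TID` can direct the signal at one specific thread instead of the whole process.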
While the payload loop runs, at some point 1000 instructions (sample_period) will have been executed. According to the perf_event_open manpage, this causes an overflow, which then decrements the internal counter. As soon as this counter reaches zero, a signal with POLL_HUP is sent and the underlying event is disabled. Since the counter was set to 1, this already happens at the first overflow; the handler then re-arms it with another PERF_EVENT_IOC_REFRESH.
When the signal is sent, the control flow of the current process/thread is interrupted and the signal handler is executed. Scenario:
- 1000 instructions have been executed.
- The event is automatically disabled and a signal is sent.
- The signal is delivered immediately, the control flow of the process is interrupted and the signal handler is executed.
This scenario would imply two things:
- The final number of instructions counted would always equal that of an identical example that does not use signals at all.
- The instruction pointer saved for the signal handler (accessible via ucontext) would point directly to the instruction that caused the overflow.
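Reading the saved instruction pointer from the handler's `ucontext` argument is architecture-specific. The sketch below assumes Linux with glibc and covers only x86-64 (`gregs[REG_RIP]`) and AArch64 (`uc_mcontext.pc`); `interrupted_ip`, `last_ip` and `install_ip_recorder` are hypothetical names. Recording this address in the overflow handler and comparing it with the addresses of the payload loop is one way to see where the signal actually landed.

```c
#define _GNU_SOURCE 1
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* Returns the user-space instruction pointer saved when the signal arrived,
 * or 0 on architectures this sketch does not handle. */
uintptr_t interrupted_ip(void *ucontext)
{
    ucontext_t *uc = (ucontext_t *)ucontext;
#if defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0;
#endif
}

/* Written by the handler, read afterwards. Only volatile sig_atomic_t is
 * strictly portable; a word-sized store is fine on the two targets above. */
volatile uintptr_t last_ip;

static void record_ip_handler(int signum, siginfo_t *info, void *ucontext)
{
    (void)signum;
    (void)info;
    last_ip = interrupted_ip(ucontext);
}

/* Installs record_ip_handler for `signo`; returns sigaction()'s result. */
int install_ip_recorder(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_ip_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```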
In other words, the signal behavior could be considered synchronous.
These are exactly the semantics I need.
However, as far as I know, such a signal is generally delivered asynchronously, so some time may pass before it is delivered and the signal handler runs. That could be a problem for me.
For example, consider the following scenario:
- 1000 instructions have been executed.
- The event is automatically disabled and a signal is sent.
- Some further instructions are executed.
- The signal is delivered, the control flow of the process is interrupted and the signal handler is executed.
This scenario would imply two things:
- The final number of instructions counted would be lower than in an identical example that does not use signals at all.
- The instruction pointer saved for the signal handler would point to the instruction that caused the overflow or to any instruction after it.
So far I have tested the above example and have not observed any missing instructions, which would support the first scenario.
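One way to probe this empirically is to run the same payload twice, once with overflow signals re-armed in the handler and once without any signals, and compare the counts, as in the two scenarios above. The sketch below is a hypothetical test harness (`measure_payload`, `payload` and `overflow_handler` are illustrative names, not part of any API). It reuses the attribute setup from the question, returns -1 when `perf_event_open` or the PMU is unavailable (common in VMs and containers), and uses a `volatile` accumulator so the compiler cannot drop the loop.

```c
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile sig_atomic_t hup_signals; /* POLL_HUP signals received */
static volatile sig_atomic_t rearm;       /* should the handler re-arm? */

static void overflow_handler(int signum, siginfo_t *info, void *uc)
{
    (void)signum;
    (void)uc;
    if (info->si_code != POLL_HUP)
        return; /* with REFRESH 1, only POLL_HUP is expected */
    hup_signals++;
    if (rearm)
        ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1); /* allow one more overflow */
}

static void payload(long loop_count)
{
    volatile long c = 0; /* volatile keeps the loop at any optimization level */
    for (long i = 0; i < loop_count; i++)
        c += 1;
}

/* Counts user-space instructions retired while payload(loop_count) runs.
 * If with_signals is nonzero, an overflow signal is requested every `period`
 * instructions and re-armed in the handler. The number of POLL_HUP signals
 * is stored in *signals_out. Returns -1 if the counter cannot be set up. */
long long measure_payload(int with_signals, unsigned long long period,
                          long loop_count, int *signals_out)
{
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof pe);
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof pe;
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;
    pe.sample_period = period; /* same overflow interrupts in both modes */
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;

    int fd = (int)syscall(SYS_perf_event_open, &pe, (pid_t)0, -1, -1, 0UL);
    if (fd == -1)
        return -1; /* no PMU access (VM, container, perf_event_paranoid, ...) */

    if (with_signals) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = overflow_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGIO, &sa, NULL) == -1 ||
            fcntl(fd, F_SETFL, O_NONBLOCK | O_ASYNC) == -1 ||
            fcntl(fd, F_SETSIG, SIGIO) == -1 ||
            fcntl(fd, F_SETOWN, getpid()) == -1) {
            close(fd);
            return -1;
        }
    }

    hup_signals = 0;
    rearm = with_signals;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    if (with_signals)
        ioctl(fd, PERF_EVENT_IOC_REFRESH, 1); /* enabled until the next overflow */
    else
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);  /* no overflow limit, no signals */

    payload(loop_count);

    rearm = 0; /* a late signal must not re-enable the counter */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = -1;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);

    if (signals_out)
        *signals_out = hup_signals;
    return count;
}
```

Even in the ideal (synchronous) scenario the two counts may not match exactly: with exclude_kernel = 1, the user-space instructions the handler executes after its REFRESH ioctl has re-enabled the event are counted too. A check such as count >= signals * period only confirms that every POLL_HUP corresponds to at least one full period.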
However, I would really like to know whether I can rely on this assumption. What happens in the kernel?