perf_event_open overflow signal

I want to count (more or less) the exact number of instructions executed by some piece of code. In addition, I want to receive a signal after a specific number of instructions has been executed.

For this purpose I use the overflow signal behavior provided by perf_event_open.

I use the second way the man page offers for receiving overflow signals:

Signal overflow

Events can be set to deliver a signal when a threshold is crossed. The signal handler is set up using the poll(2), select(2), epoll(2) and fcntl(2), system calls.

[...]

Alternately, you can use the PERF_EVENT_IOC_REFRESH ioctl. This ioctl adds to a counter that decrements each time the event overflows. When nonzero, a POLL_IN signal is sent on overflow, but once the value reaches 0, a signal is sent of type POLL_HUP and the underlying event is disabled.

Further explanation of PERF_EVENT_IOC_REFRESH ioctl:

PERF_EVENT_IOC_REFRESH

Non-inherited overflow counters can use this to enable a counter for a number of overflows specified by the argument, after which it is disabled. Subsequent calls of this ioctl add the argument value to the current count. A signal with POLL_IN set will happen on each overflow until the count reaches 0; when that happens a signal with POLL_HUP set is sent and the event is disabled. Using an argument of 0 is considered undefined behavior.

A very minimal example would look like this:

    #define _GNU_SOURCE 1

    #include <asm/unistd.h>
    #include <fcntl.h>
    #include <linux/perf_event.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    long perf_event_open(struct perf_event_attr* event_attr, pid_t pid, int cpu,
                         int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, event_attr, pid, cpu, group_fd, flags);
    }

    static void perf_event_handler(int signum, siginfo_t* info, void* ucontext) {
        if(info->si_code != POLL_HUP) {
            // Only POLL_HUP should happen.
            exit(EXIT_FAILURE);
        }

        // Re-arm the event for one more overflow
        ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1);
    }

    int main(int argc, char** argv)
    {
        // Configure signal handler
        struct sigaction sa;
        memset(&sa, 0, sizeof(struct sigaction));
        sa.sa_sigaction = perf_event_handler;
        sa.sa_flags = SA_SIGINFO;

        // Setup signal handler
        if (sigaction(SIGIO, &sa, NULL) < 0) {
            fprintf(stderr,"Error setting up signal handler\n");
            perror("sigaction");
            exit(EXIT_FAILURE);
        }

        // Configure perf_event_attr struct
        struct perf_event_attr pe;
        memset(&pe, 0, sizeof(struct perf_event_attr));
        pe.type = PERF_TYPE_HARDWARE;
        pe.size = sizeof(struct perf_event_attr);
        pe.config = PERF_COUNT_HW_INSTRUCTIONS;     // Count retired hardware instructions
        pe.disabled = 1;        // Event is initially disabled
        pe.sample_type = PERF_SAMPLE_IP;
        pe.sample_period = 1000;
        pe.exclude_kernel = 1;  // excluding events that happen in the kernel-space
        pe.exclude_hv = 1;      // excluding events that happen in the hypervisor

        pid_t pid = 0;  // measure the current process/thread
        int cpu = -1;   // measure on any cpu
        int group_fd = -1;
        unsigned long flags = 0;

        int fd = perf_event_open(&pe, pid, cpu, group_fd, flags);
        if (fd == -1) {
            fprintf(stderr, "Error opening leader %llx\n", pe.config);
            perror("perf_event_open");
            exit(EXIT_FAILURE);
        }

        // Setup event handler for overflow signals
        fcntl(fd, F_SETFL, O_NONBLOCK|O_ASYNC);
        fcntl(fd, F_SETSIG, SIGIO);
        fcntl(fd, F_SETOWN, getpid());

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);     // Reset event counter to 0
        ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);   // Arm for one overflow; this also enables the event

        // Start monitoring

        long loopCount = 1000000;
        long c = 0;
        long i = 0;

        // Some sample payload.
        for(i = 0; i < loopCount; i++) {
            c += 1;
        }

        // End monitoring

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);   // Disable event

        long long counter;
        read(fd, &counter, sizeof(long long));  // Read event counter value

        printf("Used %lld instructions\n", counter);

        close(fd);
    }

So basically I do the following:

  • Configure a SIGIO signal handler.
  • Create a new performance counter using perf_event_open (which returns a file descriptor).
  • Use fcntl to add signal-sending behavior to the file descriptor.
  • Run a payload loop that executes many instructions.

While the payload loop runs, at some point 1000 instructions (the sample_period) will have been executed. According to the perf_event_open man page, this triggers an overflow, which decrements the internal counter. As soon as this counter reaches zero, "a signal is sent of type POLL_HUP and the underlying event is disabled."

When a signal is delivered, the control flow of the current process/thread is interrupted and the signal handler runs. The ideal scenario would be:

  • 1000 instructions have been executed.
  • The event is automatically disabled and a signal is sent.
  • The signal is delivered immediately, the control flow of the process is interrupted and the signal handler runs.

This scenario would imply two things:

  • The final instruction count would always equal that of an equivalent example that does not use signals at all.
  • The instruction pointer saved for the signal handler (accessible through ucontext) would point directly at the instruction that caused the overflow.
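For reference, this is roughly how a handler can get at that saved instruction pointer. It is only a sketch under stated assumptions: the register access is glibc-specific and covers x86-64 and AArch64 only, and the names ip_handler, install_ip_handler, last_ip and handler_runs are invented for this illustration. Note that the value found in the ucontext is the address at which the interrupted thread will resume, whatever the kernel's delivery timing turns out to be.

```c
#define _GNU_SOURCE 1   /* for REG_RIP in <sys/ucontext.h> (glibc) */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* Resume address seen by the most recent handler run (0 = not available). */
static volatile uintptr_t last_ip;
static volatile sig_atomic_t handler_runs;

/* SA_SIGINFO handler: pulls the saved program counter out of the ucontext. */
static void ip_handler(int signum, siginfo_t *info, void *ucontext)
{
    ucontext_t *uc = (ucontext_t *)ucontext;
    (void)signum;
    (void)info;
#if defined(__x86_64__) && defined(REG_RIP)
    last_ip = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__aarch64__)
    last_ip = (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    last_ip = 0;   /* other architectures are not handled in this sketch */
#endif
    handler_runs++;
}

/* Installs ip_handler for signo; returns 0 on success, -1 on error. */
static int install_ip_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = ip_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

In the program above, the same extraction could be done inside perf_event_handler for SIGIO; comparing the recorded addresses across runs is one way to see how far delivery lags behind the overflow.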

In other words, the signal delivery could be considered synchronous.

This is the perfect semantics for what I want to achieve.

However, as far as I know, signals set up this way are generally delivered asynchronously, and some time may pass before the signal is delivered and the handler runs. That could be a problem for me.

For example, consider the following scenario:

  • 1000 instructions have been executed.
  • The event is automatically disabled and a signal is sent.
  • Some more instructions are executed.
  • The signal is delivered, the control flow of the process is interrupted and the signal handler runs.

This scenario would imply two things:

  • The final instruction count would be lower than that of an example that does not use signals at all.
  • The instruction pointer saved for the signal handler would point at the instruction that caused the overflow or at any instruction after it.

So far I have tested the example above and have not observed any missed instructions, which would support the first scenario.

However, I would really like to know whether I can rely on this assumption. What happens in the kernel?

linux signals perf
1 answer

I want to count (more or less) the exact number of instructions executed by some piece of code. In addition, I want to receive a signal after a specific number of instructions has been executed.

You have two tasks that may conflict with each other. When you want to count (get the exact number of some hardware event), just use the performance monitoring unit (PMU) of your CPU in counting mode (do not set sample_period / sample_freq in the perf_event_attr structure) and place the measurement code in your target program (as was done in your example). In this mode, according to the perf_event_open man page, no overflows are generated (the counters exposed by perf are 64 bits wide and do not overflow in practice unless they are preloaded with a small negative value, as is done in sampling mode):

Overflows are generated only by sampling events (sample_period must have a nonzero value).
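In terms of the attribute structure, the difference between the two modes comes down to whether sample_period is set. A minimal sketch (the helper name init_instructions_attr is made up for this illustration; the field values mirror the ones used in the question):

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Fills *pe for counting retired user-space instructions.
 * sample_period == 0: pure counting mode, no overflows are generated.
 * sample_period  > 0: sampling mode, one overflow every sample_period events.
 */
static void init_instructions_attr(struct perf_event_attr *pe,
                                   unsigned long long sample_period)
{
    memset(pe, 0, sizeof(*pe));
    pe->type = PERF_TYPE_HARDWARE;
    pe->size = sizeof(*pe);
    pe->config = PERF_COUNT_HW_INSTRUCTIONS;
    pe->disabled = 1;        /* start disabled, enable around the target code */
    pe->exclude_kernel = 1;
    pe->exclude_hv = 1;
    if (sample_period > 0) {
        pe->sample_period = sample_period;
        pe->sample_type = PERF_SAMPLE_IP;   /* record the IP on each overflow */
    }
}
```

Calling it with 0 gives the configuration used for plain counting; calling it with 1000 gives the sampling configuration from the question.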

To count only part of the program, use ioctls on the fd returned by perf_event_open, as described in the man page:

perf_event ioctl calls - Various ioctls act on perf_event_open() file descriptors: PERF_EVENT_IOC_ENABLE ... PERF_EVENT_IOC_DISABLE ... PERF_EVENT_IOC_RESET

You can read the current value with rdpmc (on x86) or with the read syscall on the fd, as in the short example from the man page:

    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/perf_event.h>
    #include <asm/unistd.h>

    static long
    perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                    int cpu, int group_fd, unsigned long flags)
    {
        int ret;

        ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
                      group_fd, flags);
        return ret;
    }

    int
    main(int argc, char **argv)
    {
        struct perf_event_attr pe;
        long long count;
        int fd;

        memset(&pe, 0, sizeof(struct perf_event_attr));
        pe.type = PERF_TYPE_HARDWARE;
        pe.size = sizeof(struct perf_event_attr);
        pe.config = PERF_COUNT_HW_INSTRUCTIONS;
        pe.disabled = 1;
        pe.exclude_kernel = 1;
        pe.exclude_hv = 1;

        fd = perf_event_open(&pe, 0, -1, -1, 0);
        if (fd == -1) {
            fprintf(stderr, "Error opening leader %llx\n", pe.config);
            exit(EXIT_FAILURE);
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        printf("Measuring instruction count for this printf\n");
        /* Place target code here instead of printf */

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        read(fd, &count, sizeof(long long));

        printf("Used %lld instructions\n", count);

        close(fd);
    }

In addition, I want to receive a signal after a specific number of instructions has been executed.

Do you really want to receive a signal, or do you just need the instruction pointer at every 1000 instructions executed? If you want to collect instruction pointers, use perf_event_open in sampling mode, but do it from another program so that the collection code itself is not measured. It will also disturb your target program less if you do not take a signal on every overflow (which costs a large number of kernel/tracer interactions and switches to and from the kernel), and instead use the ability of perf_events to collect several overflow events into a single mmap buffer and poll on that buffer. On an overflow interrupt from the PMU, the perf interrupt handler is called to save the instruction pointer into the buffer; the counter is then reset and the program resumes execution. In your example, the perf interrupt handler wakes your program, which makes several system calls and returns to the kernel, and only then does the kernel restart the target code, so the overhead per sample is greater than with mmap and parsing the buffer.

With the precise_ip flag you can activate the advanced sampling mode of your PMU, in which the instruction address is saved directly by the hardware with low skid. This requires a PMU that has such a mode, like PEBS and PREC_DIST on Intel x86 / em64t for some counters such as INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED (and, with a simple hack, cycles too), or like IBS on AMD x86 / amd64 (see the paper on PEBS and IBS). Some very advanced PMUs can do the sampling in hardware, storing the overflow information of several events in a row with automatic counter reset and without software interrupts (precise_ip is described to some extent in the same paper).
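A rough sketch of the mmap approach, under several assumptions: for brevity it samples the calling thread rather than a separate target process, it drains the buffer once after the payload instead of polling while the payload runs, and the names sample_ips, ring_copy, busy_loop and DATA_PAGES are invented for this illustration. Samples that do not fit into the buffer are dropped by the kernel in this setup.

```c
#define _GNU_SOURCE 1
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DATA_PAGES 8   /* assumption: ring size in pages, must be a power of two */

/* Hypothetical payload, used only for illustration. */
static void busy_loop(void)
{
    volatile long c = 0;
    for (long i = 0; i < 1000000; i++)
        c += 1;
}

/* Copies len bytes starting at ring position pos, handling wrap-around. */
static void ring_copy(void *dst, const char *data, uint64_t size,
                      uint64_t pos, size_t len)
{
    uint64_t off = pos % size;
    size_t first = len < size - off ? len : (size_t)(size - off);
    memcpy(dst, data + off, first);
    memcpy((char *)dst + first, data, len - first);
}

/*
 * Samples the calling thread every `period` retired user-space instructions
 * while payload() runs and stores up to max_ips sampled instruction pointers
 * in ips[]. Returns the number of PERF_RECORD_SAMPLE records found, or -1 if
 * the event or the buffer could not be set up.
 */
static long sample_ips(void (*payload)(void), uint64_t period,
                       uint64_t *ips, size_t max_ips)
{
    struct perf_event_attr pe;
    long page = sysconf(_SC_PAGESIZE), n = 0;
    size_t map_len = (size_t)page * (1 + DATA_PAGES);

    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.sample_period = period;
    pe.sample_type = PERF_SAMPLE_IP;   /* each sample carries only the IP */
    pe.disabled = 1;
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;

    int fd = (int)syscall(SYS_perf_event_open, &pe, 0, -1, -1, 0UL);
    if (fd == -1)
        return -1;
    /* First page is metadata, the remaining pages are the ring buffer. */
    void *base = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }
    struct perf_event_mmap_page *meta = base;
    const char *data = (const char *)base + page;
    uint64_t size = (uint64_t)page * DATA_PAGES;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    payload();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* Acquire-load data_head so the record bytes are visible before parsing. */
    uint64_t head = __atomic_load_n(&meta->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = meta->data_tail;
    while (tail < head) {
        struct perf_event_header hdr;
        ring_copy(&hdr, data, size, tail, sizeof(hdr));
        if (hdr.size == 0)
            break;                      /* malformed record, stop parsing */
        if (hdr.type == PERF_RECORD_SAMPLE) {
            uint64_t ip;                /* body is a single u64 for PERF_SAMPLE_IP */
            ring_copy(&ip, data, size, tail + sizeof(hdr), sizeof(ip));
            if ((size_t)n < max_ips)
                ips[n] = ip;
            n++;
        }
        tail += hdr.size;
    }
    /* Tell the kernel that everything up to head has been consumed. */
    __atomic_store_n(&meta->data_tail, head, __ATOMIC_RELEASE);

    munmap(base, map_len);
    close(fd);
    return n;
}
```

A call such as sample_ips(busy_loop, 1000, ips, 16) returns -1 when perf events are unavailable (for example, restricted by perf_event_paranoid or missing in a virtual machine). A tracer attached to another process would pass that process's pid to perf_event_open and wait on the fd with poll(2) between drains.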

I don't know whether the perf_events subsystem and your CPU allow two perf_events tasks at the same time: counting events in the target process while another process samples it. With an advanced PMU this may be possible in hardware, and perf_events in a modern kernel may allow it. But you give no details about your kernel version or your CPU vendor and family, so we can't answer this part.
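One way to find out on a given machine is simply to try attaching both kinds of event to the same task. The sketch below does only that; the helper names open_instructions_event and probe_count_plus_sample are made up for this illustration. Two successful opens show that the kernel accepts the combination, not that both events are on the hardware at the same time; if counters are scarce the kernel may time-multiplex them.

```c
#define _GNU_SOURCE 1
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens a user-space instructions event on pid; period 0 means counting mode. */
static int open_instructions_event(pid_t pid, unsigned long long period)
{
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;
    if (period > 0) {
        pe.sample_period = period;
        pe.sample_type = PERF_SAMPLE_IP;
    }
    return (int)syscall(SYS_perf_event_open, &pe, pid, -1, -1, 0UL);
}

/*
 * Tries to attach one counting event and one sampling event to the same task
 * (pid 0 = the calling thread). Returns 1 if both opens succeed, 0 if only
 * one does, and -1 if neither does.
 */
static int probe_count_plus_sample(pid_t pid)
{
    int count_fd = open_instructions_event(pid, 0);
    int sample_fd = open_instructions_event(pid, 1000);
    int opened = (count_fd != -1) + (sample_fd != -1);

    if (count_fd != -1)
        close(count_fd);
    if (sample_fd != -1)
        close(sample_fd);
    return opened == 2 ? 1 : (opened == 1 ? 0 : -1);
}
```

To check whether an event was actually scheduled the whole time, the read format flags PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING can be requested and the two times compared.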

You can also try other APIs to access the PMU, such as PAPI or likwid ( https://github.com/RRZE-HPC/likwid ). Some of them can directly read PMU registers (sometimes MSR) and can allow sampling at the same time as counting.
