Handling TLB Misses

I want to see which pages are being accessed by my program. One way is to use mprotect with a SIGSEGV handler to record the pages as they are accessed. However, this incurs the overhead of setting the protection bits for all the memory pages I am interested in.
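For reference, the mprotect approach can be sketched roughly as follows. This is only a minimal illustration, assuming a single anonymous region and a single thread; the names `track_region` and `page_was_touched` are made up for this sketch. It also calls mprotect from inside the signal handler, which works on Linux in practice but is not guaranteed async-signal-safe by POSIX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical tracker state: one region, one flag per page. */
static char *region_base;
static size_t region_pages;
static size_t page_size;
static volatile unsigned char *touched;

/* On the first access to a protected page: record it, then unprotect it
   so the faulting instruction can be restarted and succeed. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region_base;
    if (addr < base || addr >= base + region_pages * page_size)
        _exit(1); /* a genuine crash outside the tracked region */
    size_t idx = (addr - base) / page_size;
    touched[idx] = 1;
    /* Assumption: mprotect is usable from a handler on Linux. */
    mprotect(region_base + idx * page_size, page_size,
             PROT_READ | PROT_WRITE);
}

/* Map `pages` inaccessible pages and install the handler.
   Returns the region base, or NULL on failure. */
char *track_region(size_t pages) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    touched = calloc(pages, 1);
    if (!touched)
        return NULL;
    void *p = mmap(NULL, pages * page_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        free((void *)touched);
        return NULL;
    }
    region_base = p;
    region_pages = pages;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;
    return region_base;
}

/* 1 if page `idx` of the region has been accessed, else 0. */
int page_was_touched(size_t idx) {
    assert(idx < region_pages);
    return touched[idx];
}
```

Each page faults at most once, so the cost is one mprotect call per touched page plus the initial mapping. Starting from PROT_READ instead of PROT_NONE would record only writes (dirtied pages) rather than every access.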

The second method that comes to mind is to invalidate the Translation Lookaside Buffer (TLB) at the beginning and then record the misses. On each miss, I would note down the page of the memory address that missed. Now the question is how to handle TLB misses in user space for a Linux program.

And if you know an even faster method than TLB misses or mprotect for recording dirtied memory pages, kindly let me know. Also, I only want a solution for x86.

+4
3 answers

The TLB is transparent to a user-space program; at most you can count TLB misses with a performance counter (without addresses).

+6

I want to see which pages are being accessed by my program.

You can simulate a CPU and get this data. Options:

  • 1) valgrind is a user-space dynamic binary translator with good instrumentation support. Try the cachegrind tool: it emulates even the L1/L2 caches. You can also try to write a new tool that logs all memory accesses (for example, with page granularity).
  • 2) qemu is a dynamic translator with both system-wide and user-space-only modes. There is no instrumentation in stock qemu, as far as I know.
  • 3) bochs is a system-wide CPU emulator (very slow). You can easily hack its memory access code to get a memory log.
  • 4) PTLsim - www.ptlsim.org/papers/PTLsim-ISPASS-2007.pdf

However, this incurs the overhead of setting the protection bits for all the memory pages.

Is this too much overhead?

Now the question is how to handle TLB misses in user space for a Linux program.

You cannot handle a miss either in user space or in kernel space (on x86 and many other popular platforms). This is because most platforms handle TLB misses in hardware: the MMU (part of the CPU/chipset) walks the page tables and obtains the physical address transparently. Only if certain bits are set, or when the address region is not mapped, is a page fault interrupt generated and delivered to the kernel.

In addition, it seems there is no way to dump the TLB on modern processors (but the 386DX was able to do this).

You can try to detect a TLB miss by the delay it introduces. But this delay may be hidden by the out-of-order start of the TLB lookup.

In addition, most hardware events (memory accesses, TLB accesses, TLB hits, TLB misses) are counted by the hardware performance monitoring unit (this part of the processor is used by VTune, CodeAnalyst and oprofile). Unfortunately, these are only global event counters, and you cannot have more than 2-4 events active at a time. The good news is that you can set a perfmon counter to interrupt when a given count is reached. You will then receive (via the interrupt) the address of the instruction ($eip) at which the count was reached. Thus, you can find TLB-miss-heavy hot spots with this hardware (it is present in every modern x86 processor, both Intel and AMD).
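As a rough illustration of counting (not handling) TLB misses, here is a minimal sketch that assumes the Linux `perf_event_open` system call, which is not mentioned above but exposes these hardware counters to a process. The helper names `count_dtlb_misses` and `stride_read` are invented for this example, and the call may fail in virtual machines, in containers, or under a restrictive `perf_event_paranoid` setting.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Count data-TLB read misses of the calling thread while work(arg) runs.
   Returns 0 and stores the count in *misses, or -1 if the counter is
   unavailable (permissions, VM, unsupported event). */
int count_dtlb_misses(void (*work)(void *), void *arg, uint64_t *misses) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = PERF_COUNT_HW_CACHE_DTLB |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;       /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1; /* user-space misses only */
    attr.exclude_hv = 1;

    /* pid = 0 (this thread), cpu = -1 (any), no group, no flags */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    ssize_t n = read(fd, misses, sizeof *misses);
    close(fd);
    return n == (ssize_t)sizeof *misses ? 0 : -1;
}

/* Hypothetical workload: read one byte per 4 KiB page of a buffer. */
struct span {
    volatile char *base;
    size_t len;
};

void stride_read(void *arg) {
    struct span *s = arg;
    for (size_t off = 0; off < s->len; off += 4096)
        (void)s->base[off];
}
```

This yields only a count, with no addresses, which matches the limitation described above. The same structure also has a `sample_period` field for overflow-based sampling, which corresponds to the interrupt-on-count mechanism and gives instruction addresses for hot spots.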

+6

Take a look at the /proc/PID/maps file for your process. According to the documentation at http://www.kernel.org/doc/Documentation/filesystems/proc.txt , /proc/PID/maps provides a memory map for each process. From this map you can find out which pages your program accesses. However, it seems you want to know which of them are dirty pages. Although I'm not sure how to find the exact list of dirty pages, you can find out how many pages are dirty by looking at the Private_Dirty and Shared_Dirty fields in /proc/PID/smaps and dividing them by the page size. Please note that this method is pretty fast. I believe that a rough idea of which pages are dirty can be obtained by polling /proc/PID/maps periodically.
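The smaps calculation could look roughly like this. It is a sketch that assumes the usual `Private_Dirty:` and `Shared_Dirty:` line format with values in kB; the function name `count_dirty_pages` is made up for this example.

```c
#include <stdio.h>
#include <unistd.h>

/* Sum Private_Dirty and Shared_Dirty (reported in kB) over all mappings
   listed in an smaps file and convert the total to a page count.
   Returns -1 if the file cannot be opened. */
long count_dirty_pages(const char *smaps_path) {
    FILE *f = fopen(smaps_path, "r");
    if (!f)
        return -1;

    long page_kb = sysconf(_SC_PAGESIZE) / 1024; /* e.g. 4 for 4 KiB pages */
    long dirty_kb = 0;
    long kb;
    char line[512];

    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "Private_Dirty: %ld kB", &kb) == 1 ||
            sscanf(line, "Shared_Dirty: %ld kB", &kb) == 1)
            dirty_kb += kb;
    }
    fclose(f);
    return dirty_kb / page_kb;
}
```

Calling `count_dirty_pages("/proc/self/smaps")` gives the count for the current process; for another process, substitute its PID in the path. This gives only a total, not the addresses of the dirty pages.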

-1
