Capturing user space assembly with ftrace and kprobes (using virtual address translation)?

Apologies for the long post; I had trouble making it shorter. It may also be better suited to Unix & Linux Stack Exchange, but I'll try SO first, since there is an ftrace tag here.

In any case, I would like to observe when the machine instructions of a user program are executed, within a complete function_graph capture made with ftrace. One problem is that I need this for an older kernel:

    $ uname -a
    Linux mypc 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 i686 i386 GNU/Linux

... and this release has no UPROBES, which, as "Uprobes in 3.5 [LWN.net]" notes, should be able to do something similar. (As long as I don't have to patch the original kernel, I would like to try an out-of-tree kernel module, as "User-Space Probes (Uprobes) [chunghwan.com]" seems to demonstrate; but as far as I can see from "0: buffers based on Inode [LWN.net]", 2.6 would probably need a full patch.)

However, this version does have /sys/kernel/debug/kprobes and /sys/kernel/debug/tracing/kprobe_events, and Documentation/trace/kprobetrace.txt implies that a kprobe can be set directly at an address, even though I cannot find an example anywhere of how this is used.
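For what it's worth, kprobetrace.txt gives the definition syntax as "p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS]", where MEMADDR is the address at which the probe is inserted. It therefore has to be a kernel text address. Below is a minimal sketch of building such a line for a raw address. The helper name format_kprobe_def and the event name myprobe are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a kprobe_events definition "p:EVENT 0xADDR".
 * ADDR must be a kernel text address for the probe to be accepted.
 * Returns the number of characters written, or -1 on a bad event name
 * or a too-small buffer. */
int format_kprobe_def(char *buf, size_t len, const char *event,
                      unsigned long addr)
{
    int n;

    assert(buf != NULL && event != NULL);
    if (event[0] == '\0' || strchr(event, ' ') != NULL)
        return -1;                      /* event names cannot contain spaces */
    n = snprintf(buf, len, "p:%s 0x%08lx", event, addr);
    if (n < 0 || (size_t)n >= len)
        return -1;                      /* output would be truncated */
    return n;
}
```

The resulting string would then be appended with something like echo 'p:myprobe 0xc10bcf60' >> /sys/kernel/debug/tracing/kprobe_events. Using ">" instead truncates the file, which clears the existing probes. The event only records once it is enabled via events/kprobes/myprobe/enable.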

In any case, I would still not be sure which addresses to use. As a small example, say I want to trace the start of the main function of the wtest.c program (see below). I can compile it and get the machine instruction listing like this:

    $ gcc -g -O0 wtest.c -o wtest
    $ objdump -S wtest | less
    ...
    08048474 <main>:
    int main(void)
    {
     8048474:   55                      push   %ebp
     8048475:   89 e5                   mov    %esp,%ebp
     8048477:   83 e4 f0                and    $0xfffffff0,%esp
     804847a:   83 ec 30                sub    $0x30,%esp
     804847d:   65 a1 14 00 00 00       mov    %gs:0x14,%eax
     8048483:   89 44 24 2c             mov    %eax,0x2c(%esp)
     8048487:   31 c0                   xor    %eax,%eax
      char filename[] = "/tmp/wtest.txt";
    ...
      return 0;
     804850a:   b8 00 00 00 00          mov    $0x0,%eax
    }
    ...

I would set up ftrace logging with this script:

    sudo bash -c '
    KDBGPATH="/sys/kernel/debug/tracing"
    echo function_graph > $KDBGPATH/current_tracer
    echo funcgraph-abstime > $KDBGPATH/trace_options
    echo funcgraph-proc > $KDBGPATH/trace_options
    echo 0 > $KDBGPATH/tracing_on
    echo > $KDBGPATH/trace
    echo 1 > $KDBGPATH/tracing_on ; ./wtest ; echo 0 > $KDBGPATH/tracing_on
    cat $KDBGPATH/trace > wtest.ftrace
    '

You can see part of the resulting (rather complicated) ftrace log in "Monitoring the recording of a hard disk in kernel space (with drivers / modules) - Unix and Linux Stack Exchange", which is where I took the example from.

Basically, I need a printout in this ftrace log when the first instructions of main execute on (any) CPU, say the instructions at 0x8048474, 0x8048475, 0x8048477, 0x804847a, 0x804847d, 0x8048483 and 0x8048487. The problem is that, as far as I understand from "Anatomy of a Program in Memory" by Gustavo Duarte, these are virtual addresses, as seen from the point of view of the process itself (and I take it that /proc/PID/maps shows the same perspective)... And apparently kprobe_events needs a physical address?
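wtest is a non-PIE executable, which was gcc's default on distributions of that era. For such a binary, the addresses objdump prints are the same virtual addresses the process runs at, so they should fall inside the executable's r-xp line in /proc/PID/maps. Here is a small sketch of that check on a single maps line. maps_line_contains is a hypothetical helper, and the sample line used with it is made up:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse one /proc/PID/maps line of the form
 * "start-end perms offset dev inode [path]".
 * Returns 1 if addr lies in [start, end), 0 if not, -1 on a parse error.
 * On a hit, *exec is set to 1 when the mapping has the 'x' permission. */
int maps_line_contains(const char *line, unsigned long addr, int *exec)
{
    unsigned long start, end;
    char perms[5];                      /* e.g. "r-xp" plus NUL */

    assert(line != NULL && exec != NULL);
    if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
        return -1;
    if (addr < start || addr >= end)
        return 0;
    *exec = (perms[2] == 'x');
    return 1;
}
```

In practice each line would come from fgets() on /proc/<pid>/maps. A hit with exec set confirms that 0x08048474 is user-space text of that process: a user virtual address, not something kprobes can use directly.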

So my idea is this: if I can find the physical addresses corresponding to the virtual addresses in the program's disassembly (for example, by writing a kernel module that takes a pid and an address and returns the physical address through procfs), I could set those addresses up as a kind of "tracepoint" via /sys/kernel/debug/tracing/kprobe_events in the script above, and hopefully get them in the ftrace log. Could this work in principle?

I found one problem with this in "Linux (Ubuntu), C language: Virtual to Physical Address Translation - Stack Overflow":

In user code, you cannot find the physical address corresponding to a virtual address. This information is simply not exported outside the kernel. It can even change at any time, especially if the kernel decides to swap out part of your process's memory....
Pass the virtual address to the kernel using a system call / procfs and use vmalloc_to_pfn. Return the physical address through procfs / registers.

However, vmalloc_to_pfn does not seem trivial either:

x86 64 - vmalloc_to_pfn returns a 32-bit address on a Linux 32 system. Why does it chop off the higher bits of the physical PAE address? - stack overflow

VA: 0xf8ab87fc PA using vmalloc_to_pfn: 0x36f7f7fc. But I really expect: 0x136f7f7fc.
...
The physical address lies between 4 and 5 GB. But I can't get the exact physical address; I only get a chopped-off 32-bit address. Is there any other way to get the true physical address?
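The numbers in that quote are exactly what shifting a PFN in a 32-bit type produces. 0x136f7f << 12 needs 33 bits, so the top bit is lost and 0x136f7f7fc becomes 0x36f7f7fc. That this is the cause in the linked question is an assumption. The sketch below shows the difference, using hypothetical helper names and assuming 4 KiB pages:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12   /* assumed 4 KiB pages */

/* Hypothetical helper showing the bug: the shift happens in 32 bits,
 * so any bit above 4 GiB silently wraps away. */
uint32_t phys_from_pfn_32(uint32_t pfn, uint32_t offset)
{
    return (pfn << PAGE_SHIFT_4K) | offset;
}

/* Hypothetical helper showing the fix: widen the PFN before shifting. */
uint64_t phys_from_pfn_64(uint32_t pfn, uint32_t offset)
{
    assert(offset < (UINT32_C(1) << PAGE_SHIFT_4K));
    return ((uint64_t)pfn << PAGE_SHIFT_4K) | offset;
}
```

In kernel code, the equivalent would be to cast the PFN to a 64-bit type (u64 or a 64-bit phys_addr_t) before shifting it by PAGE_SHIFT.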

So I'm not sure how reliably I can retrieve physical addresses for kprobes to track, all the more so because "it can even change at any time". Still, I'd like to hope that, since the program is small and trivial, there is a reasonable chance its memory won't move during tracing, which would allow a correct capture. (So even if I need to run the debug script several times, I'd be fine with getting a "correct" capture once in every 10, or even 100, runs.)

Note that I need the output via ftrace, so that the timestamps are expressed in the same domain (see "Reliable Linux kernel timestamps (or adjustment thereof) with both usbmon and ftrace? - Stack Overflow" for an illustration of the timestamp problem). So even if I could come up with a gdb script to run and trace the program from user space while an ftrace capture is taken at the same time, I would like to avoid that, since gdb's own overhead would show up in the ftrace log.
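One in-domain alternative exists if instrumenting wtest.c itself is acceptable. Text written to the tracing directory's trace_marker file goes into the same ring buffer, with ftrace's own timestamp, and function_graph should show it as a /* ... */ comment line. The cost is a write() system call, which will itself show up in the trace. It also marks source-level points rather than individual instructions. Below is a minimal sketch; trace_mark is a hypothetical helper, and it assumes debugfs is mounted at /sys/kernel/debug:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write msg to an ftrace marker file, normally
 * /sys/kernel/debug/tracing/trace_marker (assumed debugfs mount point).
 * Returns the number of bytes written, or -1 on error. */
ssize_t trace_mark(const char *marker_path, const char *msg)
{
    ssize_t n;
    int fd;

    assert(marker_path != NULL && msg != NULL);
    fd = open(marker_path, O_WRONLY);
    if (fd < 0)
        return -1;
    n = write(fd, msg, strlen(msg));
    close(fd);
    return n;
}
```

For example, trace_mark("/sys/kernel/debug/tracing/trace_marker", "wtest: main entered\n") could be the first statement of main(). Keeping the file descriptor open for the whole run would avoid repeated open() calls in the trace.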

So in short:

  • Is it worth pursuing the approach of obtaining physical addresses (possibly with a separate kernel module) from the virtual addresses in the executable's disassembly, so that they can be used for a kprobe_event registered with ftrace? If so, are there any examples of kernel modules that could be used for this kind of address translation?
  • Could I instead use a kernel module to "register" a callback/handler function that runs when a specific memory address is executed? Then I could simply use trace_printk in that function to get output in the ftrace log (or, even without it, the handler function's name should appear in the ftrace log), and it looks like there wouldn't be much overhead ...

In fact, Jim Keniston's 2007 posting "utrace-based uprobes" on the systemtap mailing list includes "11. Uprobes Example" (added to Documentation/uprobes.txt), which seems to be exactly that: a kernel module that registers a handler function. Unfortunately, it uses linux/uprobes.h, and I only have kprobes.h in my /usr/src/linux-headers-2.6.38-16/include/linux/. Also, on my system even systemtap complains that CONFIG_UTRACE is not enabled (see this comment)... So if there is any other approach I could use to get the debug trace I want, without having to recompile the kernel to get uprobes, it would be great to know...


wtest.c:

    #include <stdio.h>
    #include <fcntl.h>  // open, O_CREAT, O_RDWR, S_IRUSR, ...
    #include <unistd.h> // write, close

    int main(void)
    {
        char filename[] = "/tmp/wtest.txt";
        char buffer[] = "abcd";
        int fd;
        mode_t perms = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;

        fd = open(filename, O_RDWR|O_CREAT, perms);
        write(fd, buffer, 4);
        close(fd);

        return 0;
    }
debugging linux linux-kernel ftrace
1 answer

Obviously, this would be much simpler with the built-in uprobes on 3.5+ kernels. But getting uprobes on my 2.6.38 kernel would require very extensive patches, which I could not isolate into a separate kernel module to avoid patching the kernel. So this is what I can report for a standalone module on 2.6.38. (Since I am still unsure about many things, I would still like to see an answer that corrects any misunderstandings in this post.)

I think I've gotten somewhere, but not with kprobes. I'm not sure, but it seems I managed to get the physical addresses. However, the kprobes documentation says "@ADDR: Fetch memory at ADDR (ADDR should be in kernel)", and the physical addresses I get are below the kernel boundary 0xc0000000 (but then again, doesn't 0xc0000000 usually refer to the virtual memory layout?).

So instead I used a hardware breakpoint. The module is below, but caveat emptor: it behaves randomly and can sometimes cause a kernel oops! Compiling the module and running in bash:

    $ sudo bash -c 'KDBGPATH="/sys/kernel/debug/tracing"
    echo function_graph > $KDBGPATH/current_tracer
    echo funcgraph-abstime > $KDBGPATH/trace_options
    echo funcgraph-proc > $KDBGPATH/trace_options
    echo 8192 > $KDBGPATH/buffer_size_kb
    echo 0 > $KDBGPATH/tracing_on
    echo > $KDBGPATH/trace'
    $ sudo insmod ./callmodule.ko && sleep 0.1 && sudo rmmod callmodule && \
      tail -n25 /var/log/syslog | tee log.txt && \
      sudo cat /sys/kernel/debug/tracing/trace >> log.txt

... I get a log. I want to trace the first two instructions of main() in wtest, which for me are:

    $ objdump -S wtest/wtest | grep -A3 'int main'
    int main(void)
    {
     8048474:   55                      push   %ebp
     8048475:   89 e5                   mov    %esp,%ebp
     8048477:   83 e4 f0                and    $0xfffffff0,%esp

... at virtual addresses 0x08048474 and 0x08048475. In the syslog output I could get, say:

    ...
    [ 1106.383011] callmodule: parent task a: f40a9940 c: kworker/u:1 p: [14] s: stopped
    [ 1106.383017] callmodule: - wtest [9404]
    [ 1106.383023] callmodule: Trying to walk page table; addr task 0xEAE90CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
    [ 1106.383029] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @ (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
    [ 1106.383049] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @ (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
    [ 1106.383067] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @ (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
    [ 1106.383083] callmodule: physaddr : (0x080483c0 ->) 0x639ec3c0 : (0x08048474 ->) 0x639ec474
    [ 1106.383106] callmodule: 0x08048474 id [3]
    [ 1106.383113] callmodule: 0x08048475 id [4]
    [ 1106.383118] callmodule: (( 0x08048000 is_vmalloc_addr 0 virt_addr_valid 0 ))
    [ 1106.383130] callmodule: cont pid task a: eae90ca0 c: wtest p: [9404] s: runnable
    [ 1106.383147] initcall callmodule_init+0x0/0x1000 [callmodule] returned with preemption imbalance
    [ 1106.518074] callmodule: < exit

... which means the module mapped virtual address 0x08048474 to physical address 0x639ec474. However, physical addresses are not used for hardware breakpoints. There, we can pass the virtual address directly to register_user_hw_breakpoint; we do also need to pass the task_struct of the process. With this, I can get something like this in the ftrace output:

    ...
    597.907256 | 1) wtest-5339 | | handle_mm_fault() {
    ...
    597.907310 | 1) wtest-5339 | + 35.627 us | }
    597.907311 | 1) wtest-5339 | + 46.245 us | }
    597.907312 | 1) wtest-5339 | + 56.143 us | }
    597.907313 | 1) wtest-5339 | 1.039 us | up_read();
    597.907317 | 1) wtest-5339 | 1.285 us | native_get_debugreg();
    597.907319 | 1) wtest-5339 | 1.075 us | native_set_debugreg();
    597.907322 | 1) wtest-5339 | 1.129 us | native_get_debugreg();
    597.907324 | 1) wtest-5339 | 1.189 us | native_set_debugreg();
    597.907329 | 1) wtest-5339 | | () {
    597.907333 | 1) wtest-5339 | | /* callmodule: hwbp hit: id [3] */
    597.907334 | 1) wtest-5339 | 5.567 us | }
    597.907336 | 1) wtest-5339 | 1.123 us | native_set_debugreg();
    597.907339 | 1) wtest-5339 | 1.130 us | native_get_debugreg();
    597.907341 | 1) wtest-5339 | 1.075 us | native_set_debugreg();
    597.907343 | 1) wtest-5339 | 1.075 us | native_get_debugreg();
    597.907345 | 1) wtest-5339 | 1.081 us | native_set_debugreg();
    597.907348 | 1) wtest-5339 | | () {
    597.907350 | 1) wtest-5339 | | /* callmodule: hwbp hit: id [4] */
    597.907351 | 1) wtest-5339 | 3.033 us | }
    597.907352 | 1) wtest-5339 | 1.105 us | native_set_debugreg();
    597.907358 | 1) wtest-5339 | 1.315 us | down_read_trylock();
    597.907360 | 1) wtest-5339 | 1.123 us | _cond_resched();
    597.907362 | 1) wtest-5339 | 1.027 us | find_vma();
    597.907364 | 1) wtest-5339 | | handle_mm_fault() {
    ...

... where the lines corresponding to the assembly instructions are marked with the breakpoint id. Fortunately, they come immediately one after the other, as expected; however, ftrace also captured some debug-register accesses (native_get_debugreg/native_set_debugreg) between them. In any case, this is what I wanted to see.
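For comparison, the same kind of execute breakpoint can also be requested from user space with the perf_event_open(2) system call. This avoids the module and its unexported symbols. PERF_TYPE_BREAKPOINT should be available on 2.6.38, having been added around 2.6.33. Note that this is a swapped-in technique, not what the module above does. The returned file descriptor only counts hits, although the kernel's debug-exception path should presumably still appear in a simultaneous function_graph trace (untested). A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>

/* Hypothetical helper: fill attr for an execute breakpoint at a user
 * virtual address. As noted in this post, x86 only accepts
 * HW_BREAKPOINT_X when bp_len == sizeof(long). */
void init_exec_bp(struct perf_event_attr *attr, unsigned long addr)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_BREAKPOINT;
    attr->size = sizeof(*attr);
    attr->bp_type = HW_BREAKPOINT_X;
    attr->bp_addr = addr;
    attr->bp_len = sizeof(long);
    attr->exclude_kernel = 1;   /* only the user-space instruction */
}

/* Hypothetical helper: glibc has no perf_event_open() wrapper, so call
 * the syscall directly. cpu = -1 follows the task on any CPU.
 * Returns a perf file descriptor, or -1 with errno set. */
int open_exec_bp(pid_t pid, unsigned long addr)
{
    struct perf_event_attr attr;

    init_exec_bp(&attr, addr);
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0);
}
```

open_exec_bp(pid, 0x08048474UL) would have to be called while wtest is still stopped, as the module does via SIGSTOP. Reading 8 bytes from the descriptor afterwards gives the hit count.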

Here are some notes about the module:

  • Most of the module comes from "Run/call user space program, and get its pid, from a kernel module", where the user process is started and its pid obtained
    • Since we need the task_struct (not just the pid), I save both here (which is redundant)
  • Where function symbols are not exported: if the symbol is in kallsyms, I use a pointer to its address; other functions are copied from the kernel source
  • I did not know how to start the user-space process in a stopped state, so after spawning it I send SIGSTOP (which by itself seems unreliable at this point) and set its state to __TASK_STOPPED
    • I still sometimes get the status "runnable" where I don't expect it. However, when init fails, I noticed that wtest lingers in the process list long after it should have finished, so I think this does work.
  • To get the absolute/physical addresses, I followed a Stack Overflow answer on walking a process's page tables in Linux to get to the page corresponding to a virtual address. Then, digging through the kernel sources, I found page_to_phys() to get the address (internally via the page frame number); LDD3 ch. 15 helps with understanding the relationship between the pfn and the physical address.
    • Since here I expect to get a physical address, I do not use PAGE_SHIFT; instead I compute the offsets directly from the objdump disassembly. I'm not 100% sure, but this seems to be right.
    • Note (see also "How to get the struct page from any address in the Linux kernel") that the module output says the virtual address 0x08048000 is neither is_vmalloc_addr nor virt_addr_valid. I think this tells me that neither vmalloc_to_pfn() nor virt_to_page() could be used to get to its physical address!?
  • Setting up kprobes for ftrace from kernel space is fairly complicated (it requires copying functions)
    • An attempt to set a kprobe at the physical addresses I get (e.g. 0x639ec474) always fails with "Unable to insert probe (-22)" (-22 is -EINVAL)
    • To check whether the format is parsed at all, I tried the kallsyms address of the tracing_on() function (0xc10bcf60) in the code below. That seems to work, because it triggers a fatal "BUG: scheduling while atomic" (apparently we shouldn't set breakpoints in module_init?). The error is fatal in that it makes the kprobes directory disappear from the ftrace debugfs directory
    • Just creating a kprobe does not make it appear in the ftrace log; it also has to be enabled. I have code that should enable it, but I never tried it because of the previous error
  • Finally, the breakpoint setup comes from "Watch a variable (memory address) change in Linux kernel, and print stack trace when it changes?"
    • I never saw an example of setting an execute hardware breakpoint. It kept failing for me until, by searching the kernel source, I found that for HW_BREAKPOINT_X, attr.bp_len has to be set to sizeof(long)
    • If I try to printk the attr variable (from _init or from the handler), something gets seriously messed up, and whatever variable I try to print afterwards, I get the value 0x5 (or 0x48) for it (?!)
    • Since I use one handler function for both breakpoints, the only reliable piece of information that survives from _init to the handler and can distinguish the two seems to be bp->id
    • These ids are assigned automatically, and it seems they are not reused after breakpoints are unregistered (I don't unregister them, to avoid extra ftrace printouts).
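On the offset calculation: "offset = taddr - start_code" gives the right answer here only because start_code (0x08048000) is page-aligned and main lies in the same 4 KiB page. The usual kernel idiom is page_to_phys(page) + (addr & ~PAGE_MASK), with the page looked up for that very address. A user-space sketch of the arithmetic, with hypothetical helper names and 4 KiB pages assumed:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096UL   /* assumed page size */

/* Hypothetical helper: physical address = physical base of the page that
 * holds vaddr, plus vaddr's offset within that page. */
uint64_t phys_of(uint64_t page_phys_base, unsigned long vaddr)
{
    assert((page_phys_base & (PAGE_SIZE_4K - 1)) == 0);
    return page_phys_base + (vaddr & (PAGE_SIZE_4K - 1));
}

/* Hypothetical helper: only when both addresses share a page does
 * "vaddr - start_code" equal the in-page offset used above. */
int same_page(unsigned long a, unsigned long b)
{
    return (a & ~(PAGE_SIZE_4K - 1)) == (b & ~(PAGE_SIZE_4K - 1));
}
```

With the values from the log, phys_of(0x639ec000, 0x08048474) is 0x639ec474, matching the module's output. An address on the next page, such as 0x080491a0, would need its own page-table walk.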

As for the randomness, I think it's because the process does not start in a stopped state, so by the time it is stopped it may already be in a different state (or, quite possibly, I'm missing some lock somewhere). Anyway, you can also expect this in syslog:

    [ 1661.815114] callmodule: Trying to walk page table; addr task 0xEAF68CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
    [ 1661.815319] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
    [ 1661.815837] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
    [ 1661.816846] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0

... that is, even with the correct task pointer (judging by start_code), the physical address obtained is just 0x0. Sometimes you get the same result, but with start_code: 0x00000000 ->end_code: 0x00000000. And sometimes the task_struct cannot be obtained even though the pid can:

    [ 833.380417] callmodule:c: pid 7663
    [ 833.380424] callmodule: everything all right; pid 7663 (7663)
    [ 833.380430] callmodule: p is NULL - exiting
    [ 833.516160] callmodule: < exit

Well, hopefully someone will comment and clarify some of this module's behavior :)
Hope this helps someone,
Cheers!

Makefile:

    EXTRA_CFLAGS=-g -O0
    obj-m += callmodule.o

    all:
    	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

    clean:
    	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

callmodule.c:

    #include <linux/module.h>
    #include <linux/slab.h>      // kzalloc
    #include <linux/syscalls.h>  // SIGCHLD, ... sys_wait4, ...
    #include <linux/kallsyms.h>  // kallsyms_lookup, print_symbol
    #include <linux/highmem.h>   // 'kmap_atomic' (via pte_offset_map)
    #include <asm/io.h>          // page_to_phys (arch/x86/include/asm/io.h)

    struct subprocess_infoB; // forward declare

    // globals - to avoid intervening too much in the return of call_usermodehelperB.
    // They are filled in by __call_usermodehelperB, because call_usermodehelper_exec()
    // frees the subprocess_info (call_usermodehelper_freeinfo) before returning
    // for UMH_WAIT_EXEC/UMH_WAIT_PROC, so infoB must not be used after it returns.
    static int callmodule_pid;
    static struct task_struct *callmodule_task;
    static unsigned long long callmodule_last_physaddr;

    #define TRY_USE_KPROBES 0 // 1 // enable/disable kprobes usage code

    #include <linux/kprobes.h> // enable_kprobe

    // for hardware breakpoint:
    #include <linux/perf_event.h>
    #include <linux/hw_breakpoint.h>

    // define a modified struct (with extra fields) here:
    struct subprocess_infoB {
        struct work_struct work;
        struct completion *complete;
        char *path;
        char **argv;
        char **envp;
        int wait; //enum umh_wait wait;
        int retval;
        int (*init)(struct subprocess_info *info);
        void (*cleanup)(struct subprocess_info *info);
        void *data;
        pid_t pid;
        struct task_struct *task;
    };

    struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
                                                         char **envp, gfp_t gfp_mask);

    static inline int
    call_usermodehelper_fnsB(char *path, char **argv, char **envp,
                             int wait, //enum umh_wait wait,
                             int (*init)(struct subprocess_info *info),
                             void (*cleanup)(struct subprocess_info *), void *data)
    {
        struct subprocess_info *info;
        struct subprocess_infoB *infoB;
        gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
        int ret;

        populate_rootfs_wait();

        infoB = call_usermodehelper_setupB(path, argv, envp, gfp_mask);
        if (infoB == NULL) // check before the first dereference of infoB
            return -ENOMEM;
        printk(KBUILD_MODNAME ":a: pid %d\n", infoB->pid);
        info = (struct subprocess_info *) infoB;

        call_usermodehelper_setfns(info, init, cleanup, data);
        printk(KBUILD_MODNAME ":b: pid %d\n", infoB->pid);

        // this must be called first, before the pid is populated (by __call_usermodehelperB);
        // after it returns, info/infoB have been freed, so only the globals are used:
        ret = call_usermodehelper_exec(info, wait);

        printk(KBUILD_MODNAME ":c: pid %d\n", callmodule_pid);

        return ret;
    }

    static inline int call_usermodehelperB(char *path, char **argv, char **envp,
                                           int wait) //enum umh_wait wait)
    {
        return call_usermodehelper_fnsB(path, argv, envp, wait,
                                        NULL, NULL, NULL);
    }

    static void __call_usermodehelperB(struct work_struct *work)
    {
        struct subprocess_infoB *sub_infoB =
            container_of(work, struct subprocess_infoB, work);
        int wait = sub_infoB->wait; // enum umh_wait wait = sub_info->wait;
        pid_t pid;
        struct subprocess_info *sub_info;
        // hack - declare function pointers
        int (*ptrwait_for_helper)(void *data);
        int (*ptr____call_usermodehelper)(void *data);
        int killret;
        struct task_struct *spawned_task;

        // assign function pointers to verbatim addresses as obtained from /proc/kallsyms
        // (these addresses are only valid for this particular kernel build)
        ptrwait_for_helper = (void *)0xc1065b60;
        ptr____call_usermodehelper = (void *)0xc1065ed0;

        sub_info = (struct subprocess_info *)sub_infoB;

        if (wait == UMH_WAIT_PROC)
            pid = kernel_thread((*ptrwait_for_helper), sub_info, //(wait_for_helper, sub_info,
                                CLONE_FS | CLONE_FILES | SIGCHLD);
        else
            pid = kernel_thread((*ptr____call_usermodehelper), sub_info, //(____call_usermodehelper, sub_info,
                                CLONE_VFORK | SIGCHLD);

        spawned_task = pid_task(find_vpid(pid), PIDTYPE_PID);

        // stop/suspend/pause task
        killret = kill_pid(find_vpid(pid), SIGSTOP, 1);

        if (spawned_task != NULL) {
            // does this stop the process really?
            spawned_task->state = __TASK_STOPPED;
            printk(KBUILD_MODNAME ": : exst %d exco %d exsi %d diex %d inex %d inio %d\n",
                   spawned_task->exit_state, spawned_task->exit_code, spawned_task->exit_signal,
                   spawned_task->did_exec, spawned_task->in_execve, spawned_task->in_iowait);
        }
        printk(KBUILD_MODNAME ": : (kr: %d)\n", killret);
        printk(KBUILD_MODNAME ": : pid %d (%p) (%s)\n", pid, spawned_task,
               (spawned_task != NULL) ? ((spawned_task->state == -1) ? "unrunnable" :
                   ((spawned_task->state == 0) ? "runnable" : "stopped")) : "null");

        // grab and save the pid (and task_struct) here; also in the globals,
        // since sub_info is freed once call_usermodehelper_exec() returns:
        sub_infoB->pid = pid;
        sub_infoB->task = spawned_task;
        callmodule_pid = pid;
        callmodule_task = spawned_task;

        switch (wait) {
        case UMH_NO_WAIT:
            call_usermodehelper_freeinfo(sub_info);
            break;

        case UMH_WAIT_PROC:
            if (pid > 0)
                break;
            /* FALLTHROUGH */
        case UMH_WAIT_EXEC:
            if (pid < 0)
                sub_info->retval = pid;
            complete(sub_info->complete);
        }
    }

    struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
                                                         char **envp, gfp_t gfp_mask)
    {
        struct subprocess_infoB *sub_infoB;

        sub_infoB = kzalloc(sizeof(struct subprocess_infoB), gfp_mask);
        if (!sub_infoB)
            goto out;

        INIT_WORK(&sub_infoB->work, __call_usermodehelperB);
        sub_infoB->path = path;
        sub_infoB->argv = argv;
        sub_infoB->envp = envp;
    out:
        return sub_infoB;
    }

    #if TRY_USE_KPROBES
    // copy from /kernel/trace/trace_probe.c (is unexported)
    int traceprobe_command(const char *buf, int (*createfn)(int, char **))
    {
        char **argv;
        int argc, ret;

        argc = 0;
        ret = 0;
        argv = argv_split(GFP_KERNEL, buf, &argc);
        if (!argv)
            return -ENOMEM;

        if (argc)
            ret = createfn(argc, argv);

        argv_free(argv);

        return ret;
    }

    // copy from kernel/trace/trace_kprobe.c?v=2.6.38 (is unexported)
    #define TP_FLAG_TRACE   1
    #define TP_FLAG_PROFILE 2
    typedef void (*fetch_func_t)(struct pt_regs *, void *, void *);
    struct fetch_param {
        fetch_func_t fn;
        void *data;
    };
    typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, void *);
    enum {
        FETCH_MTD_reg = 0,
        FETCH_MTD_stack,
        FETCH_MTD_retval,
        FETCH_MTD_memory,
        FETCH_MTD_symbol,
        FETCH_MTD_deref,
        FETCH_MTD_END,
    };
    /* Fetch type information table */
    struct fetch_type {
        const char *name;         /* Name of type */
        size_t size;              /* Byte size of type */
        int is_signed;            /* Signed flag */
        print_type_func_t print;  /* Print functions */
        const char *fmt;          /* Format string */
        const char *fmttype;      /* Name in format file */
        /* Fetch functions */
        fetch_func_t fetch[FETCH_MTD_END];
    };
    struct probe_arg {
        struct fetch_param fetch;
        struct fetch_param fetch_size;
        unsigned int offset;            /* Offset from argument entry */
        const char *name;               /* Name of this argument */
        const char *comm;               /* Command of this argument */
        const struct fetch_type *type;  /* Type of this argument */
    };
    struct trace_probe {
        struct list_head list;
        struct kretprobe rp;            /* Use rp.kp for kprobe use */
        unsigned long nhit;
        unsigned int flags;             /* For TP_FLAG_* */
        const char *symbol;             /* symbol name */
        struct ftrace_event_class class;
        struct ftrace_event_call call;
        ssize_t size;                   /* trace entry size */
        unsigned int nr_args;
        struct probe_arg args[];
    };
    static int probe_is_return(struct trace_probe *tp)
    {
        return tp->rp.handler != NULL;
    }
    static int probe_event_enable(struct ftrace_event_call *call)
    {
        struct trace_probe *tp = (struct trace_probe *)call->data;

        tp->flags |= TP_FLAG_TRACE;
        if (probe_is_return(tp))
            return enable_kretprobe(&tp->rp);
        else
            return enable_kprobe(&tp->rp.kp);
    }
    #define KPROBE_EVENT_SYSTEM "kprobes"
    #endif // TRY_USE_KPROBES

    // <<<<<<<<<<<<<<<<<<<<<<

    static struct page *walk_page_table(unsigned long addr, struct task_struct *intask)
    {
        pgd_t *pgd;
        pte_t *ptep, pte;
        pud_t *pud;
        pmd_t *pmd;

        struct page *page = NULL;
        struct mm_struct *mm = intask->mm;

        callmodule_last_physaddr = 0ULL; // reset here, in case of early exit

        printk(KBUILD_MODNAME ": walk_ 0x%lx ", addr);

        pgd = pgd_offset(mm, addr);
        if (pgd_none(*pgd) || pgd_bad(*pgd))
            goto out;
        printk(KBUILD_MODNAME ": Valid pgd ");

        pud = pud_offset(pgd, addr);
        if (pud_none(*pud) || pud_bad(*pud))
            goto out;
        printk(": Valid pud");

        pmd = pmd_offset(pud, addr);
        if (pmd_none(*pmd) || pmd_bad(*pmd))
            goto out;
        printk(": Valid pmd");

        ptep = pte_offset_map(pmd, addr);
        if (!ptep)
            goto out;
        pte = *ptep;
        // pte_offset_map() may use kmap_atomic() (CONFIG_HIGHPTE); without the matching
        // pte_unmap(), preemption stays disabled ("returned with preemption imbalance")
        pte_unmap(ptep);

        // a non-present (empty) PTE would otherwise yield the struct page of PFN 0
        if (!pte_present(pte)) {
            printk(": pte not present");
            goto out;
        }

        page = pte_page(pte);
        if (page) {
            callmodule_last_physaddr = (unsigned long long)page_to_phys(page);
            printk(": page frame struct is @ %p; *virtual (page_address) @ %p (is_vmalloc_addr %d virt_addr_valid %d virt_to_phys 0x%llx) page_to_pfn %lx page_to_phys 0x%llx",
                   page, page_address(page), is_vmalloc_addr((void*)page_address(page)),
                   virt_addr_valid(page_address(page)),
                   (unsigned long long)virt_to_phys(page_address(page)),
                   page_to_pfn(page), callmodule_last_physaddr);
        }

    out:
        printk("\n");
        return page;
    }

    // 2.6.38 signature of perf_overflow_handler_t (the int nmi argument was removed in 3.1)
    static void sample_hbp_handler(struct perf_event *bp, int nmi,
                                   struct perf_sample_data *data,
                                   struct pt_regs *regs)
    {
        trace_printk(KBUILD_MODNAME ": hwbp hit: id [%llu]\n", bp->id);
        //~ unregister_hw_breakpoint(bp);
    }

    // ----------------------

    static int __init callmodule_init(void)
    {
        int ret = 0;
        char userprog[] = "/path/to/wtest";
        char *argv[] = {userprog, "2", NULL };
        char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
        struct task_struct *p;
        struct task_struct *par;
        struct task_struct *pc;
        struct task_struct *sib; // loop cursor, so that p keeps pointing at wtest
        struct list_head *children_list_head;
        struct list_head *cchildren_list_head;
        char *state_str;
        unsigned long offset, taddr;
        int (*ptr_create_trace_probe)(int argc, char **argv);
        struct trace_probe* (*ptr_find_probe_event)(const char *event, const char *group);
        //int (*ptr_probe_event_enable)(struct ftrace_event_call *call); // not exported, copy
    #if TRY_USE_KPROBES
        char trcmd[256] = "";
        struct trace_probe *tp;
    #endif //TRY_USE_KPROBES
        struct perf_event *sample_hbp, *sample_hbpb;
        struct perf_event_attr attr, attrb;

        printk(KBUILD_MODNAME ": > init %s\n", userprog);

        // addresses from /proc/kallsyms of this particular kernel build:
        ptr_create_trace_probe = (void *)0xc10d5120;
        ptr_find_probe_event = (void *)0xc10d41e0;
        print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065b60 is %s\n", 0xc1065b60); // shows wait_for_helper+0x0/0xb0
        print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065ed0 is %s\n", 0xc1065ed0); // shows ____call_usermodehelper+0x0/0x90
        print_symbol(KBUILD_MODNAME ": symbol @ 0xc10d5120 is %s\n", 0xc10d5120); // shows create_trace_probe+0x0/0x590

        ret = call_usermodehelperB(userprog, argv, envp, UMH_WAIT_EXEC);
        if (ret != 0)
            printk(KBUILD_MODNAME ": error in call to usermodehelper: %i\n", ret);
        else
            printk(KBUILD_MODNAME ": everything all right; pid %d (%d)\n",
                   callmodule_pid, callmodule_task ? callmodule_task->pid : -1);

        tracing_on(); // earlier, so trace_printk of handler is caught!

        // find the task:
        rcu_read_lock();
        p = pid_task(find_vpid(callmodule_pid), PIDTYPE_PID);
        rcu_read_unlock();
        if (p == NULL) {
            printk(KBUILD_MODNAME ": p is NULL - exiting\n");
            return 0;
        }
        state_str = (p->state == -1) ? "unrunnable" : ((p->state == 0) ? "runnable" : "stopped");
        printk(KBUILD_MODNAME ": pid task a: %p c: %s p: [%d] s: %s\n",
               p, p->comm, p->pid, state_str);

        // find parent task:
        par = p->parent;
        if (par == NULL) {
            printk(KBUILD_MODNAME ": par is NULL - exiting\n");
            return 0;
        }
        state_str = (par->state == -1) ? "unrunnable" : ((par->state == 0) ? "runnable" : "stopped");
        printk(KBUILD_MODNAME ": parent task a: %p c: %s p: [%d] s: %s\n",
               par, par->comm, par->pid, state_str);

        // iterate through parent (and our task's) child processes:
        rcu_read_lock(); // read_lock(&tasklist_lock);
        list_for_each(children_list_head, &par->children) {
            sib = list_entry(children_list_head, struct task_struct, sibling);
            printk(KBUILD_MODNAME ": - %s [%d] \n", sib->comm, sib->pid);
            if (sib->pid == callmodule_pid) {
                list_for_each(cchildren_list_head, &sib->children) {
                    pc = list_entry(cchildren_list_head, struct task_struct, sibling);
                    printk(KBUILD_MODNAME ": - - %s [%d] \n", pc->comm, pc->pid);
                }
            }
        }
        rcu_read_unlock(); //~ read_unlock(&tasklist_lock);

        // NOTE: here p == callmodule_task !!
        if (callmodule_task == NULL || callmodule_task->mm == NULL) {
            printk(KBUILD_MODNAME ": no task/mm to walk - exiting\n");
            return 0;
        }
        printk(KBUILD_MODNAME ": Trying to walk page table; addr task 0x%X ->mm ->start_code: 0x%08lX ->end_code: 0x%08lX \n",
               (unsigned int) callmodule_task, callmodule_task->mm->start_code, callmodule_task->mm->end_code);
        walk_page_table(0x08048000, callmodule_task);
        // 080483c0 is start of .text; 08048474 start of main; for objdump -S wtest
        walk_page_table(0x080483c0, callmodule_task);
        walk_page_table(0x08048474, callmodule_task);
        if (callmodule_last_physaddr != 0ULL) {
            printk(KBUILD_MODNAME ": physaddr ");
            taddr = 0x080483c0; // .text
            offset = taddr - callmodule_task->mm->start_code;
            printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_last_physaddr + offset);
            taddr = 0x08048474; // main
            offset = taddr - callmodule_task->mm->start_code;
            printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_last_physaddr + offset);
            printk("\n");
    #if TRY_USE_KPROBES
            // can't use this here (BUG: scheduling while atomic, if probe inserts)
            //~ sprintf(trcmd, "p:myprobe 0x%08llx", callmodule_last_physaddr + offset);
            // try symbol for c10bcf60 - tracing_on
            sprintf(trcmd, "p:myprobe 0x%08llx", (unsigned long long)0xc10bcf60);
            ret = traceprobe_command(trcmd, ptr_create_trace_probe); //create_trace_probe);
            printk("%s -- ret: %d\n", trcmd, ret);
            // try find probe and enable it (compiles, but untested):
            tp = ptr_find_probe_event("myprobe", KPROBE_EVENT_SYSTEM);
            if (tp != NULL)
                probe_event_enable(&tp->call);
    #endif //TRY_USE_KPROBES
        }

        hw_breakpoint_init(&attr);
        attr.bp_len = sizeof(long); //HW_BREAKPOINT_LEN_1;
        attr.bp_type = HW_BREAKPOINT_X;
        attr.bp_addr = 0x08048474; // main
        sample_hbp = register_user_hw_breakpoint(&attr, sample_hbp_handler, p);
        // check for an error pointer before dereferencing sample_hbp
        if (IS_ERR((void __force *)sample_hbp)) {
            ret = PTR_ERR((void __force *)sample_hbp);
            printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n", ret);
            //~ return ret;
        } else {
            printk(KBUILD_MODNAME ": 0x08048474 id [%llu]\n", sample_hbp->id);
        }

        hw_breakpoint_init(&attrb);
        attrb.bp_len = sizeof(long);
        attrb.bp_type = HW_BREAKPOINT_X;
        attrb.bp_addr = 0x08048475; // second instruction of main
        sample_hbpb = register_user_hw_breakpoint(&attrb, sample_hbp_handler, p);
        if (IS_ERR((void __force *)sample_hbpb)) {
            ret = PTR_ERR((void __force *)sample_hbpb);
            printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n", ret);
            //~ return ret;
        } else {
            printk(KBUILD_MODNAME ": 0x08048475 id [%llu]\n", sample_hbpb->id);
        }

        printk(KBUILD_MODNAME ": (( 0x08048000 is_vmalloc_addr %d virt_addr_valid %d ))\n",
               is_vmalloc_addr((void*)0x08048000), virt_addr_valid(0x08048000));

        kill_pid(find_vpid(callmodule_pid), SIGCONT, 1); // resume/continue/restart task
        state_str = (p->state == -1) ? "unrunnable" : ((p->state == 0) ? "runnable" : "stopped");
        printk(KBUILD_MODNAME ": cont pid task a: %p c: %s p: [%d] s: %s\n",
               p, p->comm, p->pid, state_str);

        return 0;
    }

    static void __exit callmodule_exit(void)
    {
        tracing_off(); //corresponds to the user space /sys/kernel/debug/tracing/tracing_on file
        printk(KBUILD_MODNAME ": < exit\n");
    }

    module_init(callmodule_init);
    module_exit(callmodule_exit);
    MODULE_LICENSE("GPL");