Proper use of Linux inotify - reopen every time?

I am debugging a system load problem that a client encounters on their production system, and they made a test application that simulates the workload in order to reproduce the problem:

[graph of the system load while the test application runs]

In this particular workload, one of the things the test application does is this:

 while(1)
     initialize inotify
     watch a directory for events
     receive event
     process event
     remove watch
     close inotify fd

Oddly enough, the high system load comes from the close() of the inotify fd:

 inotify_init()                        = 4 <0.000020>
 inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.059537>
 write(1, "Child [032] sleeping\n", 21) = 21 <0.000012>
 read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.b8tlfT\0\0", 512) = 32 <0.231012>
 inotify_rm_watch(4, 1)                = 0 <0.000044>
 close(4)                              = 0 <0.702530>
 open("/mnt/tmp/msys_sim/QUEUES/Child_032", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000031>
 lseek(4, 0, SEEK_SET)                 = 0 <0.000010>
 getdents(4, /* 3 entries */, 32768)   = 88 <0.000048>
 getdents(4, /* 0 entries */, 32768)   = 0 <0.000009>
 write(1, "Child [032] dequeue [SrcFile.b8t"..., 37) = 37 <0.000011>
 unlink("/mnt/tmp/msys_sim/QUEUES/Child_032/SrcFile.b8tlfT") = 0 <0.059298>
 lseek(4, 0, SEEK_SET)                 = 0 <0.000011>
 getdents(4, /* 2 entries */, 32768)   = 48 <0.000038>
 getdents(4, /* 0 entries */, 32768)   = 0 <0.000009>
 close(4)                              = 0 <0.000012>
 inotify_init()                        = 4 <0.000020>
 inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385>
 write(1, "Child [032] sleeping\n", 21) = 21 <0.000903>
 read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423>
 inotify_rm_watch(4, 1)                = 0 <0.000012>
 close(4)                              = 0 <0.528736>

What can cause close() calls to take such a huge amount of time? I can identify two possible causes:

  • Closing and re-initializing the inotify instance on every iteration, which should not be necessary.
  • There are 256K (flat) files in the SOURCES directory under /mnt/tmp/msys_sim, and individual files from it get linked into /mnt/tmp/msys_sim/QUEUES/Child_032, but SOURCES itself is never opened by the process described above (see the sketch after this list).
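For reference, the producer side of the workload does roughly the following (a simplified sketch of how I understand the setup, not the client's actual code; the exact paths and the use of link() are assumptions on my part):

/* Sketch only: hard-link one of the flat files from SOURCES into the
 * per-child queue directory, which triggers IN_CREATE for the watcher.
 * Paths and the use of link() are assumptions, not the real test code. */
#include <unistd.h>
#include <err.h>

int main(void)
{
    /* file name taken from the strace output above */
    const char *src = "/mnt/tmp/msys_sim/SOURCES/SrcFile.b8tlfT";
    const char *dst = "/mnt/tmp/msys_sim/QUEUES/Child_032/SrcFile.b8tlfT";

    if (link(src, dst) < 0)
        err(1, "link");
    return 0;
}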

Is this a wrong way of using inotify? What can I point to in order to say "you are doing it wrong"?


Output of perf top (this is what I was looking for!):

 Events: 109K cycles
 70.01%  [kernel]  [k] _spin_lock
 24.30%  [kernel]  [k] __fsnotify_update_child_dentry_flags
  2.24%  [kernel]  [k] _spin_unlock_irqrestore
  0.64%  [kernel]  [k] __do_softirq
  0.60%  [kernel]  [k] __rcu_process_callbacks
  0.46%  [kernel]  [k] run_timer_softirq
  0.40%  [kernel]  [k] rcu_process_gp_end

Sweet! I suspect a spinlock is being contended somewhere, and the whole system gets very laggy whenever this happens.

3 answers

Typically, an inotify loop looks like this (in pseudo-code):

 initialize inotify
 watch a directory | file for events
 while (receive event) {
     process event
 }
 [ remove watch ]
 close inotify fd

There is no need to remove the watch and re-initialize inotify on every iteration.
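In C, a minimal sketch of that pattern might look like the following (the watched path is taken from the question; error handling and the event processing are placeholders):

#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>
#include <err.h>

int main(void)
{
    /* Buffer for one or more inotify events, aligned as the man page suggests. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len;

    /* Initialize inotify and add the watch exactly once. */
    int fd = inotify_init();
    if (fd < 0)
        err(1, "inotify_init");
    if (inotify_add_watch(fd, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) < 0)
        err(1, "inotify_add_watch");

    /* Keep reading events from the same descriptor instead of re-opening it. */
    while ((len = read(fd, buf, sizeof(buf))) > 0) {
        ssize_t i = 0;
        while (i < len) {
            struct inotify_event *ev = (struct inotify_event *)&buf[i];
            if (ev->len)
                printf("created: %s\n", ev->name);   /* process the event */
            i += sizeof(struct inotify_event) + ev->len;
        }
    }

    close(fd);   /* only when we are done watching for good */
    return 0;
}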


I tried to duplicate your problem, but I do not get the same results as you. Still, yes, this is the wrong way to use inotify: you normally initialize inotify once and then read/poll on the inotify descriptor.

I ran this under strace -T and the close() times never came anywhere near yours.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <err.h>
#include <sysexits.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/inotify.h>
#include <errno.h>

#define WATCHDIR "./watched"

/* Child process: repeatedly create and unlink a file in the watched
 * directory to generate IN_CREATE events for the parent. */
void child_run(void)
{
    printf("Child spawned..\n");
    int fd;

    if (chdir(WATCHDIR))
        err(EX_OSERR, "Cannot chdir in child");

    /* Care not if this fails.. */
    unlink("myfile.dat");

    while (1) {
        fd = open("myfile.dat", O_CREAT|O_EXCL, S_IRUSR|S_IWUSR);
        if (fd < 0) {
            warn("Cannot create necessary file.. sleeping");
            sleep(1);
        }
        close(fd);
        fd = -1;
        if (unlink("myfile.dat") < 0)
            err(EX_OSERR, "Cannot unlink file in watched directory");
    }
}

int main()
{
    int watch_fd = -1;
    int watched = -1;
    struct inotify_event ev[128];

    memset(ev, 0, sizeof(ev));

    if (mkdir(WATCHDIR, S_IRWXU) < 0) {
        if (errno != EEXIST)
            err(EX_OSERR, "Cannot create directory");
    }

    if (fork() == 0) {
        child_run();
        exit(0);
    }

    /* Parent: mimic the reported pattern -- re-initialize inotify, add the
     * watch, read one batch of events, remove the watch and close the
     * descriptor on every iteration. */
    while (1) {
        if ((watch_fd = inotify_init1(IN_CLOEXEC)) < 0)
            err(EX_OSERR, "Cannot init inotify");
        if ((watched = inotify_add_watch(watch_fd, WATCHDIR, IN_CREATE)) < 0)
            err(EX_OSERR, "Cannot add watched directory");
        if (read(watch_fd, ev, sizeof(ev)) < 0)
            err(EX_OSERR, "Cannot read from watcher");
        if (inotify_rm_watch(watch_fd, watched) < 0)
            err(EX_OSERR, "Cannot remove watch");
        close(watch_fd);
    }

    return 0;
}

If you run this on that host, do you get the same kind of close() times?


I found the smoking gun. From kernel profiling (perf top was what I was looking for):

 Events: 109K cycles
 70.01%  [kernel]  [k] _spin_lock
 24.30%  [kernel]  [k] __fsnotify_update_child_dentry_flags
  2.24%  [kernel]  [k] _spin_unlock_irqrestore
  0.64%  [kernel]  [k] __do_softirq
  0.60%  [kernel]  [k] __rcu_process_callbacks
  0.46%  [kernel]  [k] run_timer_softirq
  0.40%  [kernel]  [k] rcu_process_gp_end

Spending 70% of our time in _spin_lock (remember, we suspected that this might be the reason) explains all the symptoms. The second entry in the list is probably the actual culprit:

http://lxr.free-electrons.com/source/fs/notify/fsnotify.c?a=sh#L52

Without a careful code analysis, it looks like, with the provided test case, this code ends up iterating over all 262K directory entries in SOURCES while holding a kernel spinlock. That behavior is probably unintended and is triggered by the improper use of the inotify API.
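A small standalone test along these lines (hypothetical, not part of the client's test application) can be used to check this: add a watch on a directory that contains a large number of files, make sure the dentry cache is populated first (for example by running ls -l on the directory, since the flag update appears to walk only cached child dentries), and time how long close() on the inotify descriptor takes. If the walk described above is the problem, the close() time should grow with the number of entries:

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/inotify.h>
#include <err.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <directory with many files>\n", argv[0]);
        return 1;
    }

    int fd = inotify_init();
    if (fd < 0)
        err(1, "inotify_init");
    if (inotify_add_watch(fd, argv[1], IN_CREATE) < 0)
        err(1, "inotify_add_watch");

    /* Time the close(); this is the call that stalls in the strace above. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    close(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("close() on the inotify fd took %.3f ms\n", ms);
    return 0;
}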

Remounting the filesystem (with the test still running) makes it behave better:

 Events: 38K cycles
 20.41%  [kernel]  [k] _spin_lock
 17.43%  [kernel]  [k] _spin_unlock_irqrestore
 12.40%  [kernel]  [k] __fsnotify_update_child_dentry_flags
  6.44%  [kernel]  [k] run_timer_softirq
  5.65%  [kernel]  [k] __do_softirq
  5.18%  [kernel]  [k] update_shares
  5.02%  [kernel]  [k] __rcu_process_callbacks

But still not perfect.

