Which is faster, 'find -exec' or 'find | xargs -0'?

In my web application, I render pages with a PHP script and then generate static HTML files from them. The static HTML is served to users for better performance. The HTML files eventually become stale and need to be deleted.

I am deciding between two ways to write the eviction script.

The first uses a single find command, for example:

find /var/www/cache -type f -mmin +10 -exec rm \{} \; 

The second form pipes through xargs, something like:

 find /var/www/cache -type f -mmin +10 -print0 | xargs -0 rm 

The first form calls rm for each file found, and the second form simply sends all the file names to one rm (but the list of files can be very long).
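The difference in process count can be seen with a small sketch. It runs in a throwaway directory rather than the real cache, the file names are made up, and echo stands in for rm so that each invocation shows up as one output line.

```shell
# Scratch directory with five empty files (hypothetical names).
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$tmp/file$i"; done

# -exec ... \; starts the command once per file: one output line per file.
per_file=$(find "$tmp" -type f -exec echo {} \; | wc -l | tr -d ' ')

# xargs packs the names into as few command lines as it can.
batched=$(find "$tmp" -type f -print0 | xargs -0 echo | wc -l | tr -d ' ')

echo "invocations with -exec \\;: $per_file"
echo "invocations with xargs:    $batched"

rm -rf "$tmp"
```

With five short names, xargs fits everything into a single command line, while the -exec form runs the command five times.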

Which form will be faster?

In my case, the cache directory is shared between several web servers, so all this is done via NFS, if that matters for this problem.

+6
unix shell find xargs
3 answers

I expect the xargs version to be slightly faster, since you are not spawning a process for each file name. But I would be surprised if there were much difference in practice. If you are worried about the long list xargs passes to each rm invocation, you can use -n with xargs to limit the number of arguments per invocation. In any case, xargs knows the maximum command-line length and will not exceed it.
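One way to cap how many file names go to each command is the -n option of xargs. Here is a minimal sketch, using a scratch directory and echo in place of rm; the limit of 3 is an arbitrary value chosen for illustration.

```shell
# Scratch directory with seven empty files (hypothetical names).
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do : > "$tmp/f$i"; done

# -n 3: at most three file names per command line,
# so seven files are split into runs of 3, 3 and 1.
runs=$(find "$tmp" -type f -print0 | xargs -0 -n 3 echo | wc -l | tr -d ' ')

echo "command invocations: $runs"

rm -rf "$tmp"
```

Without -n, xargs still splits the list by itself once a command line would exceed the system limit, so the option is only needed when a smaller batch size is wanted.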

+6

The xargs version is dramatically faster with a lot of files than the -exec version as you posted it. This is because rm is executed once for each file you want to remove, while xargs lumps as many files as possible together into one rm command.

With tens or hundreds of thousands of files, it can be the difference between a minute or less and the better part of an hour.

You can get the same behavior with -exec by terminating the command with "+" instead of "\;". This form is only available in newer versions of find.

The following two examples are roughly equivalent:

 find . -print0 | xargs -0 rm
 find . -exec rm \{} +
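To check that the two batched forms end up doing the same thing, here is a small sketch that runs each one against its own scratch directory. The directory contents and file names are made up for the demonstration.

```shell
# Two scratch directories with identical contents (hypothetical names).
a=$(mktemp -d)
b=$(mktemp -d)
for i in 1 2 3 4; do
    : > "$a/page$i.html"
    : > "$b/page$i.html"
done

# Form 1: xargs batches the names into as few rm calls as possible.
find "$a" -type f -print0 | xargs -0 rm

# Form 2: -exec ... + lets find do the batching itself.
find "$b" -type f -exec rm {} +

# Count what is left in each directory.
left_a=$(find "$a" -type f | wc -l | tr -d ' ')
left_b=$(find "$b" -type f | wc -l | tr -d ' ')
echo "left after xargs: $left_a, left after -exec +: $left_b"

rm -rf "$a" "$b"
```

Both directories end up empty. The -type f test is added here so that rm is never handed a directory name.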

Note that the xargs version will run somewhat faster (by a few percent) on a multiprocessor system, since some of the work can be parallelized: find can keep scanning while rm is deleting. This is especially true if a lot of computation is involved.

+13

The find command has a built-in -delete option, maybe it can also be useful? http://lists.freebsd.org/pipermail/freebsd-questions/2004-July/051768.html
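For illustration, here is a sketch of what the -delete variant might look like. It assumes a find that supports -mmin and -delete (GNU and BSD find both do) and runs against a scratch directory, with one file backdated so that it counts as stale.

```shell
# Scratch directory with one stale and one fresh file (hypothetical names).
tmp=$(mktemp -d)
: > "$tmp/stale.html"
: > "$tmp/fresh.html"

# Backdate one file so it is well over 10 minutes old.
touch -t 200001010000 "$tmp/stale.html"

# Keep the tests first and -delete last: find evaluates the expression
# left to right, so a leading -delete would remove everything it visits.
find "$tmp" -type f -mmin +10 -delete

remaining=$(ls "$tmp")
echo "$remaining"   # prints: fresh.html

# Against the cache from the question this would be:
#   find /var/www/cache -type f -mmin +10 -delete

rm -rf "$tmp"
```

Since find removes the files itself, no rm process is started at all.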

+2