A quick way to count the number of files in a directory containing hundreds of thousands of files

I work on a Solaris system that processes a large number of files and stores their information in a database (yes, I know that querying the database is the fastest way to get information about the number of files we have). I need a quick way to track files as they move through the system on their way into the database.

I am currently using a Perl script that reads a directory into an array, then takes the size of the array and sends it to the monitoring script. Unfortunately, as our system grows, this monitor is getting slower and slower.

I am looking for a much faster method, so that the monitor does not pause for 15-20 seconds between updates while it counts the files in every directory involved.

I am relatively confident that my bottleneck is reading the directory into an array.

I don't need any information about the files themselves, neither sizes nor file names, just the number of files in a directory.

In my code I skip hidden files and the text files I use to store configuration information. It would be great if that behavior were preserved, but it is not required.

I found some references to counting inodes with C code, or something along those lines, but I'm not very experienced in that area.

I would like to make this monitor as close to real time as possible.

The perl code I use looks like this:

    opendir(DIR, $currentDir) or die "Cannot open directory: $!";
    @files = grep ! m/^\./ && ! /config_file/, readdir DIR;  # skip hidden files and config files
    closedir(DIR);
    $count = @files;
unix directory perl count solaris
2 answers

What you are doing right now reads the entire directory (more or less) into memory, only to throw that content away once it has been counted. Avoid this by streaming the directory instead:

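    my $count = 0;
    opendir(my $dh, $curDir) or die "opendir($curDir): $!";
    # defined() makes sure an entry named "0" does not end the loop early
    while (defined(my $de = readdir($dh))) {
        next if $de =~ /^\./ or $de =~ /config_file/;  # skip hidden files and config files
        $count++;
    }
    closedir($dh);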
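If you want to check whether streaming really helps on your system, a rough timing harness along these lines might do. This is only a sketch: the sub names `count_slurp` and `count_stream` are made up for illustration, the directory is taken from the first command-line argument (defaulting to the current directory), and `Time::HiRes` is assumed to be available, as it is in most core Perl installs.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Approach from the question: build the full list, then count it.
sub count_slurp {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my @files = grep { !/^\./ && !/config_file/ } readdir($dh);
    closedir($dh);
    return scalar @files;
}

# Streaming approach: look at one entry at a time and keep only a counter.
sub count_stream {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my $count = 0;
    while (defined(my $entry = readdir($dh))) {
        next if $entry =~ /^\./ or $entry =~ /config_file/;
        $count++;
    }
    closedir($dh);
    return $count;
}

# Directory to measure: first argument, or the current directory.
my $dir = @ARGV ? $ARGV[0] : '.';

for my $case ([slurp => \&count_slurp], [stream => \&count_stream]) {
    my ($name, $code) = @$case;
    my $start = time();
    my $n     = $code->($dir);
    printf "%-6s %d entries in %.3f s\n", $name, $n, time() - $start;
}
```

Keep in mind that whichever variant runs second may benefit from the directory already being cached by the OS, so run it a few times and in both orders before drawing conclusions.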

Importantly, do not use glob() in any form. glob() will stat() every entry, which is an expense you don't need.

Depending on what your OS or file system offers, there may be more sophisticated and lighter-weight ways of doing this (Linux, by way of comparison, offers inotify), but streaming the directory as above is about as good as you'll get portably.
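Since the monitor has to count several directories on every pass, the streaming loop could be wrapped in a small helper and called once per directory. The following is a minimal sketch, not a drop-in replacement for your script: `count_files` is a hypothetical name, and taking the directory list from the command line is just an assumption for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: stream one directory and count the entries that
# are neither hidden files nor configuration files.
sub count_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my $count = 0;
    while (defined(my $entry = readdir($dh))) {
        next if $entry =~ /^\./ or $entry =~ /config_file/;
        $count++;
    }
    closedir($dh);
    return $count;
}

# Assumed usage: every directory named on the command line is counted
# and reported on its own line.
for my $dir (@ARGV) {
    printf "%s: %d\n", $dir, count_files($dir);
}
```

Memory use stays flat no matter how many entries a directory holds, because only the counter is kept between iterations.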


Keep it short.

    @files = readdir(DIR);   # list context, so every entry is returned
    $count = @files - 2;     # the -2 is because readdir counts "." and ".." as directory entries
    print "$count files found\n";
    exit;

1 file found
