I am putting together a disk usage report that uses File::Find to collect cumulative sizes in a directory tree.
What I get (easily) from File::Find is the name of the directory.
e.g.:
/path/to/user/username/subdir/anothersubdir/etc
I run File::Find to collect the sizes below:
/path/to/user/username
And then create a report of the total size of that directory and each of its subdirectories.
I currently have:
# For each file, add its size to the containing directory and to every
# ancestor directory, by repeatedly stripping the last path component.
while ($dir_tree) {
    $results{$dir_tree} += $blocks * $block_size;
    my @path_arr = split( "/", $dir_tree );
    pop(@path_arr);
    $dir_tree = join( "/", @path_arr );
}
(And yes, I know this is not very nice.)
The purpose of this is that, as I stat each file, I add its size to the current node and to every parent node in the tree.
This is enough to generate:
username,300M
username/documents,150M
username/documents/excel,50M
username/documents/word,40M
username/work,70M
username/fish,50M
username/some_other_stuff,30M
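For what it's worth, the report itself is just a dump of that flat hash, roughly along these lines (the hash name and the megabyte rounding here are illustrative):

# Illustrative sketch: print each accumulated path and its size in MB.
# A lexicographic sort puts each parent before its children.
for my $dir ( sort keys %results ) {
    printf "%s,%dM\n", $dir, $results{$dir} / ( 1024 * 1024 );
}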
But I would now like to render this as JSON, along these lines:
{ "name" : "username", "size" : "307200", "children" : [ { "name" : "documents", "size" : "153750", "children" : [ { "name" : "excel", "size" : "51200" }, { "name" : "word", "size" : "81920" } ] } ] }
This is because I intend to visualise this structure with D3, based on the D3 Zoomable Circle Pack example.
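For clarity, and assuming I end up encoding with the core JSON::PP module, the Perl structure I would need to hand to the encoder would presumably look something like this (names and sizes copied from the example above):

use JSON::PP;

# Illustrative target shape only: the nested hash/array structure that
# would serialise to the JSON above. Building this from the flat data
# is the part I am missing.
my $tree = {
    name     => 'username',
    size     => 307200,
    children => [
        {   name     => 'documents',
            size     => 153750,
            children => [
                { name => 'excel', size => 51200 },
                { name => 'word',  size => 81920 },
            ],
        },
    ],
};

print JSON::PP->new->pretty->encode($tree);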
So my question is this: what is the best way to map my data so that I keep the cumulative (and ideally also the non-cumulative) size information, but as a hierarchy of hashes?
I have been thinking in terms of a cursor approach (using File::Spec this time):
use File::Spec;

my $data   = {};
my $cursor = $data;
foreach my $element ( File::Spec->splitdir($File::Find::dir) ) {
    # add this file's size at the current level, then descend to the
    # child node for the next path component
    $cursor->{size} += $blocks * $block_size;
    $cursor = $cursor->{$element} //= {};
}
Although ... this doesn't quite build the data structure I'm looking for, not least because you basically have to walk the hash keys afterwards in order to do the "collapse" part of the process.
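To illustrate: as far as I can tell, the cursor descent gives me an intermediate structure where the directory names themselves are the hash keys, roughly like the sketch below (trimmed to start at the username level), which would then still need to be walked and rewritten into the name/children/size form:

# Illustrative: the shape the cursor-style descent builds, with directory
# names as hash keys alongside the accumulated 'size' entries.
my $intermediate = {
    size     => 307200,
    username => {
        size      => 307200,
        documents => {
            size  => 153750,
            excel => { size => 51200 },
            word  => { size => 81920 },
        },
    },
};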
Is there a better way to do this?
Edit: here is a more complete example of what I already have:
#!/usr/bin/env perl
use strict;
use warnings;

use File::Find;
use Data::Dumper;

my $block_size = 1024;

sub collate_sizes {
    my ( $results_ref, $starting_path ) = @_;

    # Trim the last path component so that e.g. "/home/user1" becomes
    # "/home/", leaving "user1/..." as the keys in the results hash.
    $starting_path =~ s,/\w+$,/,;

    if ( -f $File::Find::name ) {
        print "$File::Find::name is a file\n";
        my ( $dev,  $ino,   $mode,  $nlink, $uid,     $gid, $rdev,
             $size, $atime, $mtime, $ctime, $blksize, $blocks )
            = stat($File::Find::name);

        my $dir_tree = $File::Find::dir;
        $dir_tree =~ s|^$starting_path||g;

        # Credit this file's size to its directory and every ancestor.
        while ($dir_tree) {
            print "Updating $dir_tree\n";
            $$results_ref{$dir_tree} += $blocks * $block_size;
            my @path_arr = split( "/", $dir_tree );
            pop(@path_arr);
            $dir_tree = join( "/", @path_arr );
        }
    }
}

my @users = qw ( user1 user2 );
foreach my $user (@users) {
    my $path = "/home/$user";
    print $path;
    my %results;
    File::Find::find(
        {   wanted   => sub { collate_sizes( \%results, $path ) },
            no_chdir => 1
        },
        $path
    );
    print Dumper \%results;
}
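For reference, the Dumper output from this is just a flat hash keyed by relative path, along these lines (the values are simply the example sizes from earlier, converted to bytes, and purely illustrative):

$VAR1 = {
          'user1'                   => 314572800,
          'user1/documents'         => 157286400,
          'user1/documents/excel'   => 52428800,
          'user1/documents/word'    => 41943040,
          'user1/work'              => 73400320,
          'user1/fish'              => 52428800,
          'user1/some_other_stuff'  => 31457280
        };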
And yes - I know this is not portable and does a few somewhat unpleasant things; part of what I'm doing here is trying to improve on that. (But for now, this is a Unix-based homedir structure, so that's fine.)