File path to JSON data structure

I am writing a disk-usage report that uses File::Find to collect cumulative sizes in a directory tree.

What I get (easily) from File::Find is the name of the directory.

e.g.:

 /path/to/user/username/subdir/anothersubdir/etc 

I run File::Find to collect the sizes beneath:

 /path/to/user/username 

And create a report on the total size of the directory and each of the subdirectories.

I currently have:

    while ($dir_tree) {
        $results{$dir_tree} += $blocks * $block_size;
        my @path_arr = split( "/", $dir_tree );
        pop(@path_arr);
        $dir_tree = join( "/", @path_arr );
    }

(And yes, I know this isn't very nice.)

The purpose of this is that when I stat each file, I add its size to the current node and to each parent node in the tree.
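That rollup step can be sketched in isolation. This is a minimal, self-contained version of the same loop; the paths and byte counts are made up for illustration:

```perl
use strict;
use warnings;

# Add a file's byte count to its directory and to every ancestor
# directory, keyed by relative path in %$results_ref.
sub rollup {
    my ( $results_ref, $dir_tree, $bytes ) = @_;
    while ($dir_tree) {
        $results_ref->{$dir_tree} += $bytes;
        my @path_arr = split m{/}, $dir_tree;
        pop @path_arr;
        $dir_tree = join '/', @path_arr;
    }
}

my %results;
rollup( \%results, 'username/documents/excel', 1024 );
rollup( \%results, 'username/documents/word',  2048 );

# Each ancestor now holds the cumulative total of everything beneath it:
#   username                 => 3072
#   username/documents       => 3072
#   username/documents/excel => 1024
#   username/documents/word  => 2048
```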

This is enough to generate:

    username,300M
    username/documents,150M
    username/documents/excel,50M
    username/documents/word,40M
    username/work,70M
    username/fish,50M
    username/some_other_stuff,30M

But I would now like to render this as JSON, along these lines:

    {
        "name" : "username",
        "size" : "307200",
        "children" : [
            {
                "name" : "documents",
                "size" : "153750",
                "children" : [
                    { "name" : "excel", "size" : "51200" },
                    { "name" : "word",  "size" : "81920" }
                ]
            }
        ]
    }

This is because I intend to visualize this structure with D3, based on the D3 Zoomable Circle Pack example.
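As a side note, that target JSON is just a nested Perl hash of hashes and arrays; once it is built, a JSON encoder emits it directly. A minimal sketch using the core JSON::PP module (the non-core JSON module used in the answers below has the same interface), with the sizes taken from the example above:

```perl
use strict;
use warnings;
use JSON::PP;

# Each node is a hash with name, size, and an optional 'children'
# array of further nodes - exactly the shape D3 expects.
my $tree = {
    'name'     => 'username',
    'size'     => '307200',
    'children' => [
        {
            'name'     => 'documents',
            'size'     => '153750',
            'children' => [
                { 'name' => 'excel', 'size' => '51200' },
                { 'name' => 'word',  'size' => '81920' },
            ],
        },
    ],
};

# canonical() sorts hash keys so the output is stable across runs.
my $json = JSON::PP->new->pretty->canonical->encode($tree);
print $json;
```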

So my question is this: what is the cleanest way to map my data so that I have cumulative (and ideally also non-cumulative) size information, but in a hierarchy of hashes?

I have been thinking in terms of a cursor approach (using File::Spec this time):

    use File::Spec;

    my $data;
    my $cursor = \$data;
    foreach my $element ( File::Spec->splitdir($File::Find::dir) ) {
        $cursor->{size} += $blocks * $block_size;
        $cursor = $cursor->{$element};
    }

Although... this is not quite building the data structure I'm looking for, not least because you then have to search the hash by key in order to do the "rollup" part of the process.

Is there a better way to do this?

Edit: here is a more complete example of what I already have:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use File::Find;
    use Data::Dumper;

    my $block_size = 1024;

    sub collate_sizes {
        my ( $results_ref, $starting_path ) = @_;
        $starting_path =~ s,/\w+$,/,;
        if ( -f $File::Find::name ) {
            print "$File::Find::name is a file\n";
            my ( $dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
                 $atime, $mtime, $ctime, $blksize, $blocks )
                = stat($File::Find::name);

            my $dir_tree = $File::Find::dir;
            $dir_tree =~ s|^$starting_path||g;
            while ($dir_tree) {
                print "Updating $dir_tree\n";
                $$results_ref{$dir_tree} += $blocks * $block_size;
                my @path_arr = split( "/", $dir_tree );
                pop(@path_arr);
                $dir_tree = join( "/", @path_arr );
            }
        }
    }

    my @users = qw ( user1 user2 );
    foreach my $user (@users) {
        my $path = "/home/$user";
        print $path;
        my %results;
        File::Find::find(
            {   wanted   => sub { collate_sizes( \%results, $path ) },
                no_chdir => 1
            },
            $path
        );
        print Dumper \%results;

        # would print this to a file in the homedir - to STDOUT for convenience
        foreach my $key ( sort { $results{$b} <=> $results{$a} } keys %results ) {
            print "$key => $results{$key}\n";
        }
    }

And yes, I know this is not portable and does a few other somewhat unpleasant things; part of what I'm doing here is trying to improve on it. (But for now it's a Unix-based homedir structure, so that's fine.)

2 answers

If you do your own directory scan instead of using File::Find, you naturally get the right structure.

    sub _scan {
        my ( $qfn, $fn ) = @_;
        my $node = { name => $fn };
        lstat($qfn) or die $!;
        my $size   = -s _;
        my $is_dir = -d _;
        if ($is_dir) {
            my @child_fns = do {
                opendir( my $dh, $qfn ) or die $!;
                grep !/^\.\.?\z/, readdir($dh);
            };
            my @children;
            for my $child_fn (@child_fns) {
                my $child_node = _scan( "$qfn/$child_fn", $child_fn );
                $size += $child_node->{size};
                push @children, $child_node;
            }
            $node->{children} = \@children;
        }
        $node->{size} = $size;
        return $node;
    }

The rest of the code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    no warnings 'recursion';

    use File::Basename qw( basename );
    use JSON           qw( encode_json );

    ...

    sub scan { _scan( $_[0], basename( $_[0] ) ) }

    print( encode_json( scan( $ARGV[0] // '.' ) ) );

In the end, I did it as follows:

Within the File::Find wanted sub, collate_sizes:

    my $cursor = $data;
    foreach my $element (
        File::Spec->splitdir( $File::Find::dir =~ s/^$starting_path//r ) )
    {
        $cursor->{$element}->{name} = $element;
        $cursor->{$element}->{size} += $blocks * $block_size;
        $cursor = $cursor->{$element}->{children} //= {};
    }

This creates a hash of hashes keyed by subdirectory name. (The name element is probably redundant, but never mind.)
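As a standalone illustration of that cursor loop, here is the same logic wrapped in a hypothetical helper (`add_path`) and fed made-up relative paths and sizes, rather than values coming from File::Find:

```perl
use strict;
use warnings;
use File::Spec;

# Walk each path component, accumulating the size at every level and
# descending into the (auto-vivified) 'children' hash as we go.
sub add_path {
    my ( $data, $rel_dir, $bytes ) = @_;
    my $cursor = $data;
    foreach my $element ( File::Spec->splitdir($rel_dir) ) {
        $cursor->{$element}->{name} = $element;
        $cursor->{$element}->{size} += $bytes;
        $cursor = $cursor->{$element}->{children} //= {};
    }
}

my $data = {};
add_path( $data, 'username/documents/excel', 51200 );
add_path( $data, 'username/documents/word',  81920 );

# Every ancestor accumulates the sizes below it:
#   $data->{username}{size} is 133120, as is
#   $data->{username}{children}{documents}{size}, while
#   ...{documents}{children}{excel}{size} stays 51200.
```

Because every hash level is keyed by directory name, repeated calls for files in the same directory simply keep adding to the existing nodes.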

And then turn it into JSON output (using the JSON module):

    my $json_structure = {
        'name'     => $user,
        'size'     => $data->{$user}->{size},
        'children' => [],
    };

    process_data_to_json( $json_structure, $data->{$user}->{children} );

    open( my $json_out, '>', "homedir.json" ) or die $!;
    print {$json_out} to_json( $json_structure, { pretty => 1 } );
    close($json_out);

    sub process_data_to_json {
        my ( $json_cursor, $data_cursor ) = @_;
        if ( ref $data_cursor eq "HASH" ) {
            foreach my $key ( keys %{$data_cursor} ) {
                print "Traversing $key\n";
                my $newelt = {
                    'name' => $key,
                    'size' => $data_cursor->{$key}->{size},
                };
                push( @{ $json_cursor->{children} }, $newelt );
                process_data_to_json( $newelt,
                    $data_cursor->{$key}->{children} );
            }
        }
    }
