Perl
This code counts how often each value occurs in every column and prints a sorted report for each column:
# columnvalues.pl
while (<>) {
    @Fields = split /\s+/;
    for $i ( 0 .. $#Fields ) {
        $result[$i]{$Fields[$i]}++;
    }
}
for $j ( 0 .. $#result ) {
    print "column $j:\n";
    @values = keys %{$result[$j]};
    @sorted = sort { $result[$j]{$b} <=> $result[$j]{$a} || $a cmp $b } @values;
    for $k ( @sorted ) {
        print " $k $result[$j]{$k}\n";
    }
}
Save the text as columnvalues.pl
Run it as: perl columnvalues.pl files*
Description
In the top-level while loop:
* Loop through each line of combined input files
* Split string into @Fields array
* For each column, increment the counter for that value in the @result array of hashes
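As a minimal sketch of what this first loop builds, the snippet below runs the same counting logic over three made-up lines held in memory. The sample data and the helper name count_columns are illustrative assumptions, not part of the original script or of the input files from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: same counting logic as the while loop,
# but reading from a list of strings instead of <>.
sub count_columns {
    my @lines = @_;
    my @result;    # $result[column]{value} = number of occurrences
    for my $line (@lines) {
        my @fields = split /\s+/, $line;
        for my $i ( 0 .. $#fields ) {
            $result[$i]{ $fields[$i] }++;
        }
    }
    return @result;
}

# Made-up sample data, not the files from the question.
my @result = count_columns( "a b c\n", "a d c\n", "z b e\n" );

print "column 0, value a: $result[0]{a}\n";
print "column 1, value b: $result[1]{b}\n";
print "column 2, value e: $result[2]{e}\n";
```

Each element of @result is a hash reference, so $result[0]{a} reads as "how many times a appeared in column 0".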
In the top-level for loop:
* Loop over the @result array
* Print column number
* Get the values used in this column
* Sort the values by number of occurrences, most frequent first
* Break ties by sorting on the value itself (e.g. b vs g vs m vs z)
* Iterate over the column's hash in the sorted order
* Print each value and its number of occurrences
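The two-level sort can be tried in isolation. This is a small sketch using made-up counts for a single column; the %count hash is a stand-in for one element of the result array.

```perl
use strict;
use warnings;

# Made-up counts for a single column (value => occurrences).
my %count = ( m => 1, b => 1, d => 3, r => 2, g => 1 );

# Primary key: count, descending ($b is compared before $a).
# Secondary key: the value itself, ascending, used only when counts tie.
my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

print " $_ $count{$_}\n" for @sorted;
```

The || works because <=> returns 0 for equal counts, which is false, so the cmp comparison is consulted only for ties. Here d and r come first, followed by b, g and m in alphabetical order.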
Results based on sample input files provided by @Dennis
column 0:
 a 3
 z 3
 t 1
 v 1
 w 1
column 1:
 d 3
 r 2
 b 1
 g 1
 m 1
 z 1
column 2:
 c 4
 a 3
 e 2
.csv input
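If your input files are .csv, change /\s+/ to /,/ and add chomp; before the split. Unlike /\s+/, the /,/ pattern does not consume the trailing newline, so without chomp it stays attached to the last field.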
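A minimal sketch of the comma-separated variant, assuming simple CSV with no quoted fields or embedded commas (those would need a real parser such as the Text::CSV module). The sample lines are made up.

```perl
use strict;
use warnings;

# Made-up CSV lines; the real script would read them with while (<>).
my @lines = ( "a,b,c\n", "a,d,c\n", "z,b,e\n" );

my @result;
for my $line (@lines) {
    chomp $line;                      # drop the newline before splitting
    my @fields = split /,/, $line;
    $result[$_]{ $fields[$_] }++ for 0 .. $#fields;
}

# Without chomp the keys of the last column would be "c\n" and "e\n".
print join( " ", map { "$_=$result[2]{$_}" } sort keys %{ $result[2] } ), "\n";
```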
Obfuscation
In an ugly contest, Perl is especially well equipped.
This one-liner does the same:
perl -lane 'for $i (0..$#F){$g[$i]{$F[$i]}++};END{for $j (0..$#g){print "$j:";for $k (sort{$g[$j]{$b}<=>$g[$j]{$a}||$a cmp $b} keys %{$g[$j]}){print " $k $g[$j]{$k}"}}}' files*
Chris Koknat Sep 16 '15 at 22:37