Best way to do basic statistics in the shell?

A modern Unix shell environment comes with so many goodies that the thing I need is almost always already installed on my machine, or a quick download away; the only problem is finding it. In this case, I am trying to find basic statistical operations.

For example, right now I am prototyping a crawler-based application. Thanks to wget plus some other goodies, I now have a few hundred thousand files. So that I can estimate the cost of doing this with billions of files, I would like to get the mean and median of the file sizes over a certain limit. For example:

    % ls -l | perl -ne '@a=split(/\s+/); next if $a[4] <100; print $a[4], "\n"' > sizes
    % median sizes
    % mean sizes

Of course, I could code my own median and mean in a bit of perl or awk. But isn't there some noob-friendly package that does this and much more?

+6
unix shell r statistics
1 answer

Can you install R? Then littler and its r command can help:

    ~/svn/littler/examples$ ls -l . | awk '!/^total/ {print $5}'
    87
    1747
    756
    988
    959
    871
    ~/svn/littler/examples$ ls -l . | awk '!/^total/ {print $5}' | ./fsizes.r
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         87     785     915     901     981    1750

      The decimal point is 3 digit(s) to the right of the |

      0 | 1
      0 | 89
      1 | 00
      1 | 7

    ~/svn/littler/examples$ cat fsizes.r
    #!/usr/bin/r -i

    fsizes <- as.integer(readLines())
    print(summary(fsizes))
    stem(fsizes)

This is an example we had done before, hence the use of the R function summary(), which includes median() and mean(), as well as the ASCII-art stem plot produced by stem(). Generalizing this to just calling median() or mean() is, of course, straightforward.
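For comparison, if installing R is not an option, the do-it-yourself awk route mentioned in the question could look roughly like the sketch below. The function name stats and its output format are made up for illustration; it is not part of littler or of any standard package, and it assumes one number per line on standard input.

```shell
#!/bin/sh
# stats: hypothetical helper (not from littler or any standard package).
# Reads one number per line on stdin and prints the count, mean, and median.
stats() {
    # Sort numerically first so awk can pick the middle value(s) by index.
    sort -n | awk '
        NF { v[++n] = $1; sum += $1 }   # skip blank lines, keep sorted values
        END {
            if (n == 0) { print "no data"; exit 1 }
            # Odd n: the middle value. Even n: mean of the two middle values.
            median = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
            printf "n=%d mean=%g median=%g\n", n, sum / n, median
        }'
}
```

It would be used as ls -l . | awk '!/^total/ {print $5}' | stats. On the six file sizes listed above, this gives a median of 915 and a mean of 901.333, which agrees with the summary() output. Unlike the R version, it gives no quartiles and no stem plot.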

+8