Best way to do basic statistics in the shell?

Question

So many goodies come with a modern Unix shell environment that the thing I need is almost always already installed on my machine or a quick download away; the only trouble is finding it. In this case, I am looking for basic statistical operations.

For example, right now I am prototyping a crawler-based application. Thanks to wget plus some other goodies, I now have several hundred thousand files. So that I can estimate the cost of doing this with billions of files, I would like to get the mean and median of the file sizes above a certain limit. For example:

    % ls -l | perl -ne '@a=split(/\s+/); next if $a[4] <100; print $a[4], "\n"' > sizes
    % median sizes
    % mean sizes

Of course, I could code my own median and mean in a bit of perl or awk. But isn't there some noob-friendly package that does this and much more?
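For reference, the hand-rolled route mentioned above is short. What follows is only a minimal sketch, assuming a file with one number per line such as the sizes file from the example; the function names mean and median are hypothetical, chosen to match the command names used there, and are not existing tools.

```shell
# Hypothetical helpers matching the command names used in the example.
# Each takes one argument: a file with one number per line.

# mean: sum all values and divide by the count.
mean() {
    awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }' "$1"
}

# median: sort numerically, then take the middle value
# (or the average of the two middle values for an even count).
median() {
    sort -n "$1" | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

The median helper holds every value in an awk array, which should be acceptable for a few hundred thousand sizes but is a reason to prefer a real statistics package for much larger inputs.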

unix shell r statistics

William Pietri Nov 09 '10 at 20:06

1 answer

Dirk Eddelbuettel · Accepted Answer · 2010-11-09T20:19:04+0000

Can you install R? Then littler and its r command can help:

    ~/svn/littler/examples$ ls -l . | awk '!/^total/ {print $5}'
    87
    1747
    756
    988
    959
    871
    ~/svn/littler/examples$ ls -l . | awk '!/^total/ {print $5}' | ./fsizes.r
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         87     785     915     901     981    1750

      The decimal point is 3 digit(s) to the right of the |

      0 | 1
      0 | 89
      1 | 00
      1 | 7

    ~/svn/littler/examples$ cat fsizes.r
    #!/usr/bin/r -i

    fsizes <- as.integer(readLines())
    print(summary(fsizes))
    stem(fsizes)

This is an example we have used before, hence the R function summary(), which covers both median() and mean(), as well as the ASCII-art stem plot from stem(). A variant that just calls median() or mean() is of course quite simple.
