There are so many goodies that come with a modern Unix shell environment that the thing I need is almost always already installed on my machine, or a quick download away; the only trouble is finding it. In this case, I'm trying to find basic statistical operations.
For example, right now I'm prototyping a crawler-based application. Thanks to wget plus some other goodies, I now have a few hundred thousand files. So that I can estimate the cost of doing this with billions of files, I'd like to get the mean and median of the file sizes over a certain limit. For example:
% ls -l | perl -ne '@a=split(/\s+/); next if $a[4] <100; print $a[4], "\n"' > sizes
% median sizes
% mean sizes
Of course, I could code my own median and mean bits in a bit of perl or awk. But isn't there some noob-friendly package that does this and a lot more besides?
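For reference, the roll-your-own version would look roughly like the sketch below. It assumes the input has one number per line, as in the sizes file from the example. The function names mean and median are hypothetical, chosen only to match the commands in the example; they are not standard utilities.

```shell
# Hypothetical helpers: neither "mean" nor "median" is a standard command.
# Each reads one number per line from the named file(s), or from stdin.
# Results use awk's default number formatting, so large non-integer
# values are rounded to six significant digits.

mean() {
    # Sum the first field of every line, then divide by the line count.
    awk '{ sum += $1; n++ } END { if (n) print sum / n }' "$@"
}

median() {
    # Sort numerically, then pick the middle value, or the average of
    # the two middle values when the count is even.
    sort -n "$@" | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

With these defined in the current shell, `median sizes` and `mean sizes` work as written in the example. The median version holds every value in memory, which is fine for a few hundred thousand sizes but is one more reason to prefer a proper package.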
unix shell r statistics
William Pietri