Unix - random row selection based on column values

I have a file with about 1000 lines that looks like this:

 ABC C5A 1
 CFD D5G 4
 E1E FDF 3
 CFF VBV 1
 FGH F4R 2
 K8K F9F 3
 ... etc

I would like to select 100 random rows, but with 10 rows for each value of the third column (that is, 10 random rows from all rows with a value of "1" in column 3, 10 random rows from all rows with a value of "2" in column 3, and so on).

Is this possible with bash?

2 answers

If you can use awk, you can do this with a one-liner:

 sort -R file | awk '{if (count[$3] < 10) {count[$3]++; print $0}}' 
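
The one-liner above shuffles the whole file, then lets awk keep only the first 10 lines it sees for each value of column 3. A minimal sketch of the same idea wrapped in a function, assuming GNU shuf is available; the function name sample_per_group and its column and count arguments are hypothetical, added here only for illustration:

```shell
# Hypothetical helper: print at most N random lines per distinct value of a column.
# Usage: sample_per_group FILE [COLUMN] [N]   (COLUMN defaults to 3, N to 10)
sample_per_group() {
    # Shuffle all lines, then keep a line only while its group
    # (the value in the chosen column) has been seen fewer than N times.
    shuf "$1" | awk -v col="${2:-3}" -v n="${3:-10}" 'seen[$col]++ < n'
}
```

For the file in the question, `sample_per_group file 3 10 > randomFile` should give up to 10 rows per value of column 3. Groups with fewer than 10 rows are returned in full.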

For each number, grep the lines that end with it, then pick 10 of them at random using shuf -n 10.

 for i in {1..10}; do grep " ${i}$" file | shuf -n 10; done > randomFile 

If you don't have shuf, use sort -R to shuffle the lines and take the first 10:

 for i in {1..10}; do grep " ${i}$" file | sort -R | head -n 10; done > randomFile 
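
The loops above assume the values in column 3 are exactly 1 through 10 and that each value is the last thing on its line. A possible variation, sketched under the assumption that GNU shuf is available, reads the distinct values from the file itself and compares the third field directly; the function name sample_by_col3 is hypothetical:

```shell
# Hypothetical helper: sample N random lines for each distinct value in column 3.
# Usage: sample_by_col3 FILE [N]   (N defaults to 10)
sample_by_col3() {
    # List the distinct column-3 values actually present in the file.
    awk 'NF >= 3 { print $3 }' "$1" | sort -u |
    while IFS= read -r value; do
        # Compare the third field directly instead of pattern-matching
        # the end of the line, then pick up to N of the matching lines.
        awk -v v="$value" 'NF >= 3 && $3 == v' "$1" | shuf -n "${2:-10}"
    done
}
```

Called as `sample_by_col3 file 10 > randomFile`, this reads the file once per distinct value, which is fine for a file of about 1000 lines.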