Unix - random row selection based on column values

Question


I have a file with ~ 1000 lines that looks like this:

ABC C5A 1
CFD D5G 4
E1E FDF 3
CFF VBV 1
FGH F4R 2
K8K F9F 3
... etc

I would like to select 100 random rows, with 10 rows for each value in the third column (that is, 10 random rows from all rows with a value of “1” in column 3, 10 random rows from all rows with a value of “2” in column 3, and so on).

Is this possible with bash?

+4

unix bash random

Abdel Feb 25 '13 at 10:49


2 answers

First grep all the lines that end with a specific number, then shuffle them and keep 10 using shuf -n 10.

 for i in {1..10}; do grep " ${i}$" file | shuf -n 10; done > randomFile

If you don't have shuf, use sort -R to sort them randomly:

 for i in {1..10}; do grep " ${i}$" file | sort -R | head -10; done > randomFile
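The same per-value loop can be sketched as a reusable function. This is only an illustration, assuming GNU shuf is installed: the function name sample_per_value, its arguments, and the temporary demo file are hypothetical and not part of the answer above. It matches column 3 with awk rather than a grep pattern anchored at the end of the line, so trailing whitespace does not break the match.

```shell
#!/usr/bin/env bash
# sample_per_value FILE N VALUE...
# Print up to N random rows of FILE for each VALUE found in column 3.
# Hypothetical helper for illustration; assumes GNU shuf is available.
sample_per_value() {
    local file="$1" n="$2"
    shift 2
    local v
    for v in "$@"; do
        # Select rows whose third field equals v, then pick n of them at random.
        awk -v v="$v" '$3 == v' "$file" | shuf -n "$n"
    done
}

# Demo data (made up): 5 rows for each of the values 1, 2, 3 in column 3.
demo=$(mktemp)
for v in 1 2 3; do
    for r in 1 2 3 4 5; do
        echo "A$r B$r $v"
    done
done > "$demo"

# Take 2 random rows per value and store them next to the demo file.
sample_per_value "$demo" 2 1 2 3 > "$demo.out"
```

For the file in the question, the call would presumably look like sample_per_value file 10 1 2 3 4 5 6 7 8 9 10 > randomFile.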

+7

dogbane Feb 25 '13 at 10:56


user000001 · Accepted Answer · 2013-02-25T11:08:05+0000

If you can use awk, you can do the same with a one-liner:

 sort -R file | awk '{if (count[$3] < 10) {count[$3]++; print $0}}'
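One way to sanity-check the one-liner is to tally how many selected rows carry each column-3 value. The sketch below makes some assumptions: the demo file and the counts variable are invented for illustration, and sort -R requires GNU sort.

```shell
# Build a demo file (made up): 30 distinct rows for each of the values 1..3 in column 3.
demo=$(mktemp)
awk 'BEGIN { for (v = 1; v <= 3; v++) for (r = 1; r <= 30; r++) print "X" r, "Y" r, v }' > "$demo"

# Shuffle the file, then keep only the first 10 rows seen for each column-3 value.
sort -R "$demo" | awk '{if (count[$3] < 10) {count[$3]++; print $0}}' > "$demo.sample"

# Tally the selected rows per column-3 value.
counts=$(awk '{ n[$3]++ } END { for (v = 1; v <= 3; v++) print v, n[v] }' "$demo.sample")
echo "$counts"
```

Since the demo has 30 rows per value, each tally should come out as 10. A value with fewer than 10 rows in the input would simply contribute all of its rows.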
