Compare consecutive rows and multiple columns in awk and randomly select one of the repeating rows

Question

I read the question "Compare consecutive lines in awk / (or python) and randomly select one of the repeating lines". Now I have a follow-up: how do I change the code if I want to make this comparison not only on the x value, but also on the y value, or on more columns? Maybe something like:

if ($1 != prev) && ($2 != prev) ???

In other words, I want to check whether both the x value AND the y value of the current line match those of the following consecutive lines.

Data:

    #xyz
    1 1 11
    10 10 12
    10 10 17
    4 4 14
    20 20 15
    20 88 16
    20 99 17
    20 20 22
    5 5 19
    10 10 20

The result should look like this:

    #xyz
    1 1 11
    10 10 17
    4 4 14
    20 20 15
    20 88 16
    20 99 17
    20 20 22
    5 5 19
    10 10 20

or (due to random selection)

    #xyz
    1 1 11
    10 10 12
    4 4 14
    20 20 15
    20 88 16
    20 99 17
    20 20 22
    5 5 19
    10 10 20

Code from the question linked above, which handles the x values but does NOT also check the y values (the AND condition):

    $ cat tst.awk
    function prtBuf(    idx) {
        if (cnt > 0) {
            idx = int((rand() * cnt) + 1)
            print buf[idx]
        }
        cnt = 0
    }

    BEGIN { srand() }
    $1 != prev { prtBuf() }
    { buf[++cnt]=$0; prev=$1 }
    END { prtBuf() }

bash awk sed

Jojo Jul 22 '16 at 23:09

1 answer

Andrzej pronobis · Accepted Answer · 2016-07-23T17:42:10+0000

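This should do it. A group of repeating lines ends as soon as either column differs from the previous line, so the two tests are joined with || rather than &&: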

    function prtBuf(idx) {
        if (cnt > 0) {
            idx = int((rand() * cnt) + 1)
            print buf[idx]
        }
        cnt = 0
    }

    BEGIN { srand() }
    $1 != prev1 || $2 != prev2 { prtBuf() }
    { buf[++cnt]=$0; prev1=$1; prev2=$2 }
    END { prtBuf() }
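
For the "or more columns" part of the question, one possible generalization is to join the leading columns into a single key and compare that key between consecutive lines. This is only a sketch under assumptions: the file names `tst_multi.awk` and `data.txt`, the variable `nkeys`, and the helper `makeKey` are made up for illustration and are not part of the original answer.

```shell
# Write the awk program to a file (the file name is arbitrary).
cat > tst_multi.awk <<'EOF'
# Print one randomly chosen line from the buffered group, then reset the buffer.
function prtBuf(    idx) {
    if (cnt > 0) {
        idx = int(rand() * cnt) + 1
        print buf[idx]
    }
    cnt = 0
}

# Join the first nkeys fields into one string; SUBSEP keeps "1 11" and "11 1" distinct.
function makeKey(    i, key) {
    key = ""
    for (i = 1; i <= nkeys; i++)
        key = key SUBSEP $i
    return key
}

# nkeys is a hypothetical parameter; default to comparing the first two columns.
BEGIN { srand(); if (nkeys == "") nkeys = 2 }

{ key = makeKey() }
key != prevKey { prtBuf() }              # any key column changed: the group has ended
{ buf[++cnt] = $0; prevKey = key }
END { prtBuf() }
EOF

# Sample data from the question.
printf '%s\n' '#xyz' '1 1 11' '10 10 12' '10 10 17' '4 4 14' '20 20 15' \
    '20 88 16' '20 99 17' '20 20 22' '5 5 19' '10 10 20' > data.txt

# Compare on the first two columns (x and y).
awk -v nkeys=2 -f tst_multi.awk data.txt
```

With `nkeys=2` and the sample data this prints ten lines, keeping either `10 10 12` or `10 10 17` from the one repeated pair. The key is compared as a string, so values such as `10` and `10.0` would be treated as different.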
