How to remove duplicate rows based on column value?

Question

Given the following table

 123456.451 entered-auto_attendant
 123456.451 duration:76 real:76
 139651.526 entered-auto_attendant
 139651.526 duration:62 real:62
 139382.537 entered-auto_attendant

Using a bash shell script on Linux, I would like to delete all rows that repeat a value already seen in column 1 (the long number). That number is not fixed; it varies from row to row.

I tried with

awk '{a[$3]++}!(a[$3]-1)' file

 sort -u | uniq

But neither gives the result I want, which is to compare all the values in the first column, delete the duplicates, and show the following:

 123456.451 entered-auto_attendant
 139651.526 entered-auto_attendant
 139382.537 entered-auto_attendant

+7

linux bash awk delete-row

user3494949 Apr 3 '14 at 21:55

4 answers

uniq compares entire lines by default. Since your lines are not identical, none of them are removed.

You can use sort to conveniently sort by the first field, and also remove duplicates of it:

 sort -t ' ' -k 1,1 -u file

-t ' ' : fields are separated by spaces
-k 1,1 : look only at the first field
-u : remove duplicates
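
As a minimal sketch of the options above, the following writes the question's five sample rows to a temporary file and runs the same sort command on it. The function name dedupe_sorted and the temporary file are illustrative assumptions, not part of the original answer. Note that the result comes back sorted by the first field rather than in input order, and which of the duplicate rows survives is up to the sort implementation.

```shell
#!/bin/sh
# Hypothetical helper: keep one row per distinct first field,
# with the output sorted by that field.
dedupe_sorted() {
    sort -t ' ' -k 1,1 -u "$1"
}

# Sample rows from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

dedupe_sorted "$sample"
```

If the original row order matters, an awk-based approach is usually the better fit, since it does not need to sort.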

Also, you may have seen the awk '!a[$0]++' trick for deduplicating whole lines. You can apply the same deduplication to the first column only with awk '!a[$1]++' .
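
To see why that idiom works, here is an expanded sketch of the same logic. The post-increment returns the count before the increment, so it is zero only the first time a key appears, and negating it prints just that first row. The function name dedupe_first_seen and the array name seen are illustrative. Unlike sort -u, this keeps the rows in input order.

```shell
#!/bin/sh
# Hypothetical helper: print the first row seen for each value of column 1,
# preserving input order. Behaves like: awk '!a[$1]++'
dedupe_first_seen() {
    awk '{
        if (seen[$1] == 0)   # first time this key appears
            print
        seen[$1]++           # remember it for later rows
    }' "$@"
}

# Sample rows from the question, fed on standard input.
printf '%s\n' \
    '123456.451 entered-auto_attendant' \
    '123456.451 duration:76 real:76' \
    '139651.526 entered-auto_attendant' \
    '139651.526 duration:62 real:62' \
    '139382.537 entered-auto_attendant' |
    dedupe_first_seen
```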

+2

that other guy Apr 3 '14 at 10:03

Using awk:

 awk '!($1 in a){a[$1]++; next} $1 in a' file

 123456.451 duration:76 real:76
 139651.526 duration:62 real:62

+1

anubhava Apr 3 '14 at 10:02

Try this command:

 awk '!x[$1]++ { print $1, $2 }' file

+1

Yogesh deore Jul 22 '16 at 8:34

Kent · Accepted Answer · 2014-04-03T22:58:09+0000

You did not give the expected output. Does this work for you?

  awk '!a[$1]++' file

with your data, output:

 123456.451 entered-auto_attendant
 139651.526 entered-auto_attendant
 139382.537 entered-auto_attendant

and this one-liner prints only the rows whose column 1 value is unique:

  awk '{a[$1]++;b[$1]=$0}END{for(x in a)if(a[x]==1)print b[x]}' file

output:

 139382.537 entered-auto_attendant
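
One caveat with the END-block approach is that for (x in a) visits keys in an unspecified order, so with several unique keys the rows may not come out in input order. As an alternative sketch (not the one-liner from this answer), a two-pass awk that reads the file twice keeps the input order. The function name only_unique_keys and the array name count are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print only rows whose column 1 value occurs exactly once,
# in input order. Reads the file twice, so it needs a regular file, not a pipe.
only_unique_keys() {
    awk 'NR == FNR { count[$1]++; next }   # pass 1: count each key
         count[$1] == 1                    # pass 2: print rows whose key occurred once
    ' "$1" "$1"
}

# Sample rows from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

only_unique_keys "$sample"
```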
