Unix tool to remove duplicate lines from file

I have a tool that generates tests and predicts the output. The idea is that if I have a failure, I can compare the prediction with the actual output and see where they diverge. The problem is that the actual output contains some lines twice, which confuses diff. I want to remove the duplicates so that I can easily compare the two. Basically, I want something like sort -u, but without the sorting.

Is there any unix command line tool that can do this?

+13
command-line unix duplicates
Apr 14 '09 at 7:51
5 answers

uniq(1)

SYNTAX

uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

Discard all but one of consecutive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
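
Note that uniq only collapses runs of adjacent identical lines, which matters for this question (the input below is a made-up sample fed in via printf):

 printf 'a\na\nb\na\n' | uniq
 # prints: a, b, a -- the non-adjacent duplicate 'a' survives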

Or, if you want to remove non-adjacent duplicate lines, this Perl fragment will do it:

 while(<>) { print $_ if (!$seen{$_}); $seen{$_}=1; } 
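
Saved as a script, the fragment can be run directly on a file (the file names here are only illustrative):

 perl remove_dups.pl input.txt > deduped.txt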
+18
Apr 14 '09 at 7:53

Complementing the uniq answers, which work great if you don't mind sorting your file first: if you need to remove non-adjacent duplicate lines (or if you want to remove duplicates without reordering your file), the following Perl one-liner will do it (stolen from here):

 cat textfile | perl -ne '$H{$_}++ or print' 
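
The cat is not strictly necessary; perl can read the file itself, with the same behaviour (textfile is the example name from the answer above):

 perl -ne '$H{$_}++ or print' textfile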
+24
Apr 14 '09 at 8:09

If you are only interested in removing adjacent duplicate lines, use uniq.

If you want to remove all duplicate lines, not just adjacent ones, then it's trickier.
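
One common way to handle that case (not part of this answer, just a sketch using awk's associative arrays; input.txt is a placeholder name) is to keep only the first occurrence of each line, without sorting:

 awk '!seen[$0]++' input.txt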

+1
Apr 14 '09 at 7:53

This is what I came up with while I was waiting for an answer here (though the first (and accepted) answer arrived in about 2 minutes). I used this substitution in Vim:

 %s/^\(.*\)\n\1$/\1/ 

Which means: find places where a line is immediately followed by an identical line, and replace the pair with just the text captured from the first line.

uniq is definitely simpler, though.

+1
Apr 14 '09 at 8:03

Here is an awk implementation, in case the environment doesn't have or allow Perl (I haven't seen one yet!). PS: if a line has more than one duplicate, this prints duplicate output.

 awk '{
     # Cut out the key on which duplicates are to be determined.
     key = substr($0,2,14)
     # If the key has not been seen before, store it in the array; else print it.
     if ( ! s[key] )
         s[key] = 1;
     else
         print key;
 }'
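
To run it against a file (input.txt is only an illustrative name): note that, as written, the script keys on characters 2-15 of each line and prints the duplicated keys rather than a de-duplicated file, so the substr() call would need adjusting (e.g. key = $0) to match your data:

 awk '{ key = substr($0,2,14); if (!s[key]) s[key] = 1; else print key }' input.txt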
+1
Jul 18 '18


