Unix tool to remove duplicate lines from file

I have a tool that generates tests and predicts the output. The idea is that if I have a failure, I can compare the prediction with the actual output and see where they diverge. The problem is that the actual output contains some lines twice, which confuses diff. I want to remove the duplicates so that I can easily compare the two. Basically, I want something like sort -u, but without the sorting.

Is there any unix command line tool that can do this?

+13
command-line unix duplicates
Apr 14 '09 at 7:51
5 answers

uniq(1)

SYNTAX

uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

Discard all but one of consecutive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
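
Note that uniq only collapses runs of adjacent identical lines, which matters for this question (the input below is a made-up sample fed in via printf):

 printf 'a\na\nb\na\n' | uniq
 # prints: a, b, a -- the non-adjacent duplicate 'a' survives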

Or, if you want to remove non-adjacent duplicate lines, this Perl fragment will do it:

 while(<>) { print $_ if (!$seen{$_}); $seen{$_}=1; } 
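
Saved as a script, the fragment can be run directly on a file (the file names here are only illustrative):

 perl remove_dups.pl input.txt > deduped.txt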
+18
Apr 14 '09 at 7:53

Complementing the uniq answers, which work great if you don't mind sorting your file first: if you need to remove non-adjacent duplicate lines (or if you want to remove duplicates without reordering your file), the following Perl one-liner will do it (stolen from here):

 cat textfile | perl -ne '$H{$_}++ or print' 
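
The cat is not strictly necessary; perl can read the file itself, with the same behaviour (textfile is the example name from the answer above):

 perl -ne '$H{$_}++ or print' textfile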
+24
Apr 14 '09 at 8:09

If you are only interested in removing adjacent duplicate lines, use uniq.

If you want to remove all duplicate lines, not just adjacent ones, then it's trickier.
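
One common way to handle that case (not part of this answer, just a sketch using awk's associative arrays; input.txt is a placeholder name) is to keep only the first occurrence of each line, without sorting:

 awk '!seen[$0]++' input.txt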

+1
Apr 14 '09 at 7:53

This is what I came up with while I was waiting for an answer here (though the first (and accepted) answer arrived in about 2 minutes). I used this substitution in Vim:

 %s/^\(.*\)\n\1$/\1/ 

Which means: find places where a line is immediately followed by an identical line, and replace the pair with just the text captured from the first line.

uniq is definitely simpler, though.

+1
Apr 14 '09 at 8:03

Here is an awk implementation, in case the environment doesn't have or allow Perl (I haven't seen one yet!). PS: if a line has more than one duplicate, this prints duplicate output.

 awk '{
     # Cut out the key on which duplicates are to be determined.
     key = substr($0,2,14)
     # If the key has not been seen before, store it in the array; else print it.
     if ( ! s[key] )
         s[key] = 1;
     else
         print key;
 }'
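
To run it against a file (input.txt is only an illustrative name): note that, as written, the script keys on characters 2-15 of each line and prints the duplicated keys rather than a de-duplicated file, so the substr() call would need adjusting (e.g. key = $0) to match your data:

 awk '{ key = substr($0,2,14); if (!s[key]) s[key] = 1; else print key }' input.txt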
+1
Jul 18 '18


