How do you keep a file's format when using the uniq command (in shell)?

Question


To use the uniq command, you must first sort the file.

But in the file that I have, the order of the information is important, so how can I keep the file's original format (its line order) but still get rid of duplicate content?

+6

sorting unix file shell duplicates

Dennis Mar 13 '09 at 15:07


7 answers

This awk keeps the first occurrence of each line. It uses the same algorithm as the other answers:

 awk '!($0 in lines) { print $0; lines[$0]; }'

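This variant only needs to store the duplicated lines in awk (rather than all lines). The pass variable tells the two inputs apart; comparing FNR with NR would misfire when uniq -d prints nothing, because the file itself would then be treated as the first input:

 sort file | uniq -d | awk 'pass == 1 { dups[$0]; next } !($0 in dups) || !lines[$0]++' pass=1 - pass=2 file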
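The two-pass idea above can be spelled out as a commented sketch. The function name `dedupe_two_pass` is made up for illustration. The sketch uses awk command-line assignments (`pass=1`, `pass=2`) to tell the two inputs apart, so it still works when `uniq -d` prints nothing.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example).
# $1 = input file; the de-duplicated lines go to stdout.
dedupe_two_pass() {
    sort "$1" | uniq -d |
    awk '
        # First input ("-", the uniq -d output): remember duplicated lines.
        pass == 1 { dups[$0]; next }
        # Second input (the file itself): lines that are never duplicated
        # always print; duplicated lines print only the first time.
        !($0 in dups) || !seen[$0]++
    ' pass=1 - pass=2 "$1"
}
```

Usage would look like `dedupe_two_pass infile > outfile`. Only the duplicated lines are held in awk's memory, which is the point of this variant.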

+4

Johannes Schaub - litb Mar 13 '09 at 15:18


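There is also a "line number, double sort" method: number the lines, sort on the content to drop duplicates, sort back on the line number, then strip the numbers. Here nl -b a numbers blank lines too, and the explicit tab separator keeps nl's padding out of the sort key. POSIX does not say which copy sort -u keeps; GNU sort documents that it outputs the first of each run of equal lines.

  nl -b a -n ln file | sort -t "$(printf '\t')" -u -k 2 | sort -t "$(printf '\t')" -k 1,1n | cut -f 2-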

+4

ashawley Mar 13 '09 at 15:41


You can run uniq -d on a sorted version of the file to find the duplicate lines, and then run a script over the original file that says:

 if this_line is in duplicate_lines {
     if not i_have_seen[this_line] {
         output this_line
         i_have_seen[this_line] = true
     }
 } else {
     output this_line
 }

+1

chaos Mar 13 '09 at 15:15


Using only uniq and grep:

Create d.sh:

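 #!/bin/sh
 # Usage: ./d.sh infile   (the result is written to infile_out)
 sort "$1" | uniq > "${1}_uniq"
 : > "${1}_out"
 # Read line by line; a for loop over $(cat ...) would split on words.
 while IFS= read -r line; do
     # -F fixed string, -x whole line: regex characters are taken literally
     grep -m1 -Fx -e "$line" "${1}_uniq" >> "${1}_out"
     grep -v -Fx -e "$line" "${1}_uniq" > "${1}_uniq2"
     mv "${1}_uniq2" "${1}_uniq"
 done < "$1"
 rm "${1}_uniq"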

Example:

 ./d.sh infile

+1

Wadih M. Mar 13 '09 at 16:08


You could use something terrible and O(n^2) like this (pseudocode):

 file2 = EMPTY_FILE
 for each line in file1:
     if not line in file2:
         file2.append(line)

This is potentially quite slow, especially if it is implemented at the Bash level. But if your files are short enough, it will probably work fine and be quick to implement (the "not line in file2" test is then just a grep -v, etc.).
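A minimal shell sketch of the pseudocode above, for illustration only. The function name `dedupe_quadratic` and its two-argument interface are assumptions of this example. It tests membership with `grep -qFx` (quiet, fixed-string, whole-line match) rather than `grep -v`, so lines containing regex characters are compared literally.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example).
# $1 = input file (file1), $2 = output file (file2).
dedupe_quadratic() {
    : > "$2"    # file2 = EMPTY_FILE
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Rescan the output file for every input line: O(n^2) overall.
        if ! grep -qFx -e "$line" "$2"; then
            printf '%s\n' "$line" >> "$2"
        fi
    done < "$1"
}
```

Usage would look like `dedupe_quadratic infile outfile`. Every input line triggers a scan of the output file, so this is only reasonable for short files.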

Otherwise, you could of course write a dedicated program that uses a more advanced in-memory data structure to speed it up.

0

unwind Mar 13 '09 at 15:12


 sort file | uniq | while IFS= read -r line; do
     grep -n -m1 -Fx -e "$line" file >> out
 done
 sort -n out

Sort first.

For each unique value, grep for the first match (-m1) and record its line number (-n).

Sort the result numerically (-n) by line number.

You can then delete the line numbers with sed or awk.
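The steps above could be combined into one pipeline, sketched below. The function name `dedupe_by_line_number` is made up for this example, and the sketch strips the `N:` prefix with `cut` rather than sed or awk. It assumes a grep that supports `-m1`, as GNU and BSD grep do.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example).
# $1 = input file; the de-duplicated lines go to stdout.
dedupe_by_line_number() {
    sort "$1" | uniq |
    while IFS= read -r line; do
        # -n prefixes the match with "N:", -m1 stops at the first match,
        # -F -x compare the whole line as a fixed string.
        grep -n -m1 -Fx -e "$line" "$1"
    done |
    sort -t: -k1,1n |   # restore the original order by line number
    cut -d: -f2-        # drop the "N:" prefix, keeping colons in the line
}
```

Usage would look like `dedupe_by_line_number infile > outfile`. This runs one grep per unique line, so it is slow on files with many distinct lines.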

0

Steve B. Mar 13 '09 at 15:21


Dimitre radoulov · Accepted Answer · 2009-03-13T15:37:11+0000

Another awk version:

awk '!_[$0]++' infile
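For readers unfamiliar with the idiom, here is a small sketch of how the one-liner behaves. The array name is arbitrary (`_` in the answer, `seen` below), and the helper name `dedupe_keep_order` and the sample input are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example): print each distinct
# line once, in order of first appearance. Reads the named files, or stdin.
dedupe_keep_order() {
    # seen[$0]++ evaluates to 0 (false) the first time a line appears and
    # to a positive count afterwards, so !seen[$0]++ is true only for the
    # first occurrence; with no action given, awk prints the matching line.
    awk '!seen[$0]++' "$@"
}

# Sample input, invented for illustration.
printf '%s\n' pear apple pear banana apple | dedupe_keep_order
```

With the sample input this prints pear, apple and banana, one per line. Every distinct line is held in memory, which is usually fine unless the file is very large.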
