Remove duplicates from a variable without sorting

I have a variable that contains the following words, separated by spaces.

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana" 

How can I remove the duplicates without sorting?

 # Something like this.
 new_variable="apple lemon papaya avocado grapes mango banana"

I found a script somewhere that removes the duplicates, but it also sorts the contents.

 # Not something like this.
 new_variable=$(echo "$variable"|tr " " "\n"|sort|uniq|tr "\n" " ")
 echo $new_variable
 apple avocado banana grapes lemon mango papaya
8 answers
 new_variable=$(awk 'BEGIN{RS=ORS=" "}!a[$0]++' <<<"$variable")

Here's how it works:

RS (the input record separator) is set to a single space, so that awk treats each fruit in $variable as a record instead of a field. The de-duplication without sorting happens in !a[$0]++. Since awk supports associative arrays, it uses the current record ($0) as the key into the array a. If this key has not been seen yet, a[$0] evaluates to 0 (awk's default value for unset elements), which is then negated to return true. I then rely on the fact that awk will print $0 by default if the expression returns true and no {action} is given. Finally, a[$0] is incremented, so that this key can never return true again, and therefore repeated values are never printed. ORS (the output record separator) is also set to a space to mimic the input format.
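
To watch the counter at work, the following sketch feeds a short, made-up word list to awk one word per line (so the default newline RS applies) and prints the value of a[$0] before the increment next to the decision the `!a[$0]++` pattern would make. The variable names `before` and `trace` are illustrative only.

```shell
# Feed a hypothetical word list, one word per line.
trace=$(printf '%s\n' apple lemon apple papaya lemon |
    awk '{
        before = a[$0]++        # value before the post-increment
        printf "%s count_before=%d printed=%s\n", $0, before, (before ? "no" : "yes")
    }')

printf '%s\n' "$trace"
# apple count_before=0 printed=yes
# lemon count_before=0 printed=yes
# apple count_before=1 printed=no
# papaya count_before=0 printed=yes
# lemon count_before=1 printed=no
```

Only the records whose counter was still 0 are the ones the one-liner prints.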

A more explicit, easier-to-read version of this command that produces the same output:

 awk 'BEGIN{RS=ORS=" "}{ if (a[$0] == 0){ a[$0] += 1; print $0}}' 

Gotta love awk =)
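
One caveat with the here-string form: `<<<` appends a newline to the input, so the last record awk sees is the final word followed by a newline. The result therefore ends with a newline and a space, and a repeated final word would not be recognized as a duplicate. A minimal sketch that avoids this by splitting into lines first and joining afterwards, assuming `tr`, `awk` and `paste` are available (the array name `seen` is illustrative):

```shell
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

# 1. tr -s: turn each run of spaces into a single newline (one word per line)
# 2. awk:   keep only the first occurrence of each line
# 3. paste: join the remaining lines back together with single spaces
new_variable=$(printf '%s\n' "$variable" | tr -s ' ' '\n' | awk '!seen[$0]++' | paste -sd ' ' -)

printf '%s\n' "$new_variable"
# apple lemon papaya avocado grapes mango banana
```

Because the joining is done by paste, the result has no trailing separator to clean up.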

EDIT

If you needed to do this in pure Bash 2.1+, I would suggest the following:

 #!/bin/bash
 variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
 temp="$variable"
 new_variable="${temp%% *}"
 while [[ "$temp" != ${new_variable##* } ]]; do
     temp=${temp//${temp%% *} /}
     new_variable="$new_variable ${temp%% *}"
 done
 echo $new_variable;
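
The loop relies on three Bash parameter expansions. The sketch below isolates them on a shorter, hypothetical string. Note that the `${temp//word /}` deletion matches anywhere in the string, so the approach assumes that no word ends with another word; the second example shows what happens otherwise.

```shell
#!/usr/bin/env bash
temp="apple lemon apple papaya"      # hypothetical sample input

first=${temp%% *}                    # drop the longest suffix matching " *" -> apple
last=${temp##* }                     # drop the longest prefix matching "* " -> papaya
rest=${temp//"$first" /}             # delete every "apple "                 -> lemon papaya

printf '%s\n' "$first" "$last" "$rest"

# Caveat: the deletion is a plain substring match, not a whole-word match.
tricky="pineapple apple kiwi"
mangled=${tricky//apple /}           # also eats the tail of "pineapple "    -> pinekiwi
printf '%s\n' "$mangled"
```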

This version of the pipeline works while maintaining the original order:

 variable=$(echo "$variable" | tr ' ' '\n' | nl | sort -u -k2 | sort -n | cut -f2-) 

Pure Bash:

 variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
 declare new_value=''
 for item in $variable; do
     # Compare whole words. A bare [[ ! $new_value =~ $item ]] would also
     # match substrings (e.g. "apple" inside "pineapple").
     if [[ " $new_value " != *" $item "* ]]; then   # first time?
         new_value="$new_value $item"
     fi
 done
 new_value=${new_value:1}   # remove leading blank
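
For longer lists, a lookup table avoids rescanning the accumulated string for every word. A minimal sketch, assuming Bash 4.0 or later (associative arrays via `declare -A`); the names `seen` and `result` are illustrative and not part of the answer above:

```shell
#!/usr/bin/env bash
# Assumes Bash 4.0+ for associative arrays.
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

declare -A seen=()      # word -> 1 once it has been emitted
result=()               # unique words in first-seen order

for word in $variable; do               # unquoted on purpose: split on whitespace
    if [[ -z ${seen[$word]+x} ]]; then  # true only if the key is not set yet
        seen[$word]=1
        result+=("$word")
    fi
done

new_variable="${result[*]}"             # join with the first character of IFS (a space)
printf '%s\n' "$new_variable"
# apple lemon papaya avocado grapes mango banana
```

Each word is looked up by exact key, so one word being a substring of another does not matter here.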

In pure, portable sh:

 words="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
 seen=
 for word in $words; do
     case $seen in
         $word\ * | *\ $word | *\ $word\ * | $word)
             # already seen
             ;;
         *)
             seen="$seen $word"
             ;;
     esac
 done
 echo $seen
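
The four case patterns cover a word at the start, at the end, in the middle, or as the whole of $seen. Padding both sides of the list with a space reduces this to a single pattern. A sketch of that variant wrapped in a function, still using only POSIX sh features; `dedupe_words` and `unique` are hypothetical names chosen for illustration:

```shell
# dedupe_words: print the unique words of $1 in first-seen order.
dedupe_words() {
    seen=
    for word in $1; do                  # unquoted on purpose: split on whitespace
        case " $seen " in
            *" $word "*) ;;             # already present as a whole word
            *) seen="${seen:+$seen }$word" ;;
        esac
    done
    printf '%s\n' "$seen"
}

words="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
unique=$(dedupe_words "$words")
printf '%s\n' "$unique"
# apple lemon papaya avocado grapes mango banana
```

The `${seen:+$seen }` expansion adds the separating space only when the list is non-empty, so the result needs no trimming.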

Shell:

 declare -a arr
 variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
 set -- $variable
 count=0
 for c in $@
 do
     flag=0
     for((i=0;i<=${#arr[@]}-1;i++))
     do
         if [ "${arr[$i]}" == "$c" ] ;then
             flag=1
             break
         fi
     done
     if [ "$flag" -eq 0 ] ; then
         arr[$count]="$c"
         count=$((count+1))
     fi
 done
 for((i=0;i<=${#arr[@]}-1;i++))
 do
     echo "result: ${arr[$i]}"
 done

Result of running it:

 linux# ./myscript.sh
 result: apple
 result: lemon
 result: papaya
 result: avocado
 result: grapes
 result: mango
 result: banana

Or, if you want to use gawk:

 awk 'BEGIN{RS=ORS=" "} (!($0 in a) ){a[$0];print}' 

Z Shell:

 % variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
 % print ${(zu)variable}
 apple lemon papaya avocado grapes mango banana

Another awk solution:

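 #!/bin/bash
 variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
 # RT (the text that matched RS) is a gawk extension.
 variable=$(printf '%s\n' "$variable" | awk -v RS='[[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
 # Strips everything from the last comma onward; a no-op for this input.
 variable="${variable%,*}"
 echo "$variable"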

Output:

 apple lemon papaya avocado grapes mango banana 

Perl Solution:

perl -le 'for (@ARGV){ $h{$_}++ }; for (keys %h){ print $_ }' $variable

@ARGV - the list of arguments, i.e. the words from $variable
The first loop walks @ARGV and counts each word in the hash %h, using the loop variable $_
The second loop walks the keys of %h and prints each one

 grapes
 avocado
 apple
 lemon
 banana
 mango
 papaya

This variant prints the result with counts, sorted first by frequency ($h{$a} <=> $h{$b}) and then alphabetically ($a cmp $b):

perl -le 'for (@ARGV){ $h{$_}++ }; for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" }' $variable

 1	banana
 1	grapes
 1	mango
 2	apple
 2	avocado
 2	lemon
 2	papaya

This variant gives the same result as the previous one.
However, instead of a shell variable, the input is a file named fruits with one fruit per line:

perl -lne '$h{$_}++; END{ for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" } }' fruits
