How to find unique characters on an input line?

Is there a way to extract unique characters from each line?

I know that I can find unique lines of a file using

sort -u file 

I would like to find the unique characters on each line (something like sort -u, but applied to each line).

To clarify: given this input:

 111223234213
 111111111111
 123123123213
 121212122212

I would like to get this output:

 1234
 1
 123
 12
+7
bash grep awk sed
7 answers

Using sed

 sed ':;s/\(.\)\(.*\)\1/\1\2/;t' file 

Basically, what it does is capture a character and check whether it appears again later on the line, also capturing everything in between. It then replaces the whole match, second occurrence included, with just the first character followed by whatever was between the two occurrences.

t is a test: it jumps back to the label : if the previous s/// command succeeded. This repeats until the s/// command no longer matches, which means only unique characters remain.

; just separates the commands.

 1234
 1
 123
 12

Keeps order.
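To check the loop on a couple of sample lines (here an explicit label a is used, equivalent to the empty label above but accepted by more sed variants):

```shell
# Squeeze each line down to its unique characters, keeping first occurrences.
printf '112233\nabcabc\n' | sed ':a;s/\(.\)\(.*\)\1/\1\2/;ta'
# 123
# abc
```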

+5

It does not keep things in the original order, but this awk one-liner works:

 awk '{for(i=1;i<=length($0);i++){a[substr($0,i,1)]=1} for(i in a){printf("%s",i)} print "";delete a}' input.txt 

Broken out for easier reading, it can be a standalone script:

 #!/usr/bin/awk -f
 {
     # Step through the line, assigning each character as a key.
     # Repeated keys overwrite each other.
     for (i = 1; i <= length($0); i++) {
         a[substr($0, i, 1)] = 1;
     }

     # Print items in the array.
     for (i in a) {
         printf("%s", i);
     }

     # Print a newline after we've gone through our items.
     print "";

     # Get ready for the next line.
     delete a;
 }

Of course, the same concept can be implemented quite easily in pure bash as well:

 #!/usr/bin/env bash
 while read s; do
     declare -A a
     while [ -n "$s" ]; do
         a[${s:0:1}]=1
         s=${s:1}
     done
     printf "%s" "${!a[@]}"
     echo ""
     unset a
 done < input.txt

Note that this depends on bash 4, because of the associative array. It also tends to produce things in the original order, since bash happens to keep the array keys closer to insertion order than awk does.
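Since the order in which a shell iterates associative-array keys is not actually guaranteed, here is a sketch of a variant that builds the output string directly, and therefore preserves first-appearance order by construction (plain POSIX sh, no bash 4 needed):

```shell
# For each line, append a character to the output only on its first appearance.
while IFS= read -r s; do
    out=
    while [ -n "$s" ]; do
        c=$(printf '%.1s' "$s")   # first character (POSIX)
        case $out in
            *"$c"*) ;;            # already seen: skip
            *) out=$out$c ;;      # first occurrence: keep
        esac
        s=${s#?}                  # drop the first character
    done
    printf '%s\n' "$out"
done < input.txt
```

The case check scans the partial output for each character, which is quadratic in line length but fine for short lines.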

And I see you already have a sed-based solution from Jose, although it involves a bunch of extra piped commands. :)

The last tool you mentioned was grep . I'm fairly sure you can't do this in traditional grep, but perhaps some brave soul could build a perl-regexp (i.e. grep -P ) variant using -o and backreferences. They'll need more coffee than I have right now, though.

+3

Another solution

 while read line; do grep -o . <<< "$line" | sort -u | paste -s -d '\0' -; done < file 

grep -o . converts the line into a column of single characters

sort -u sorts the characters and removes duplicates
paste -s -d '\0' - converts the column of characters back into a single line
- as the file-name argument tells paste to read from standard input
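Putting the stages together on one sample line (note the output is in sorted order, not order of appearance):

```shell
# One character per line, deduplicated and sorted, then joined back together.
printf '111223234213\n' | grep -o . | sort -u | paste -s -d '\0' -
# 1234
```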

+3

One way using perl :

 perl -F -lane 'print do { my %seen; grep { !$seen{$_}++ } @F }' file 

Results:

 1234
 1
 123
 12
+2

This awk should work:

 awk -F '' '{delete a; for(i=1; i<=NF; i++) a[$i]; for (j in a) printf "%s", j; print ""}' file
 1234
 1
 123
 12

Here:

-F '' splits the input into individual characters, giving us one character in $1 , $2 , etc.

Note: with non-GNU awk, set FS in a BEGIN block instead:

 awk 'BEGIN{FS=""} {delete a; for(i=1; i<=NF; i++) a[$i]; for (j in a) printf "%s", j; print ""}' file 
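Because for (j in a) iterates in an unspecified order, the output may come out scrambled. A sketch of an order-preserving variant that appends each character the first time it is seen (the empty FS and whole-array delete are common extensions, as above):

```shell
printf '111223234213\n' |
awk 'BEGIN{FS=""} {delete seen; out=""; for(i=1;i<=NF;i++) if(!($i in seen)){seen[$i]; out=out $i}; print out}'
# 1234
```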
+1

This may work for you (GNU sed):

 sed 's/\B/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g' file 

Split each line into a series of one-character lines, sort those lines uniquely, then join the result back into a single line.
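A quick check on a single line. Note this is GNU-sed-specific: the e flag executes the pattern space as a shell command, \n in a replacement is a GNU extension, and s/.*/ relies on GNU sed letting . match the newlines embedded in the pattern space:

```shell
printf '112233\n' | sed 's/\B/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g'
# 123
```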

+1

A unique-and-sorted alternative to the others, using sed and GNU tools:

 sed 's/\(.\)/\1\n/g' file | sort | uniq 

which produces one character per line. If you want them all on the same line, just do:

 sed 's/\(.\)/\1\n/g' file | sort | uniq | sed ':a;N;$!ba;s/\n//g;' 

This has the advantage that characters are displayed in sorted order, rather than in order of appearance.
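A quick check on one line (GNU sed, because of the \n in the replacement). Keep in mind that on a multi-line file the sort runs across all lines at once, so the unique sets of different lines end up merged:

```shell
printf '112233\n' | sed 's/\(.\)/\1\n/g' | sort | uniq | sed ':a;N;$!ba;s/\n//g'
# 123
```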

0
