How to find unique characters on an input line?

Is there a way to extract unique characters from each line?

I know that I can find unique lines of a file using

sort -u file 

I would like to find the unique characters on each line (something like sort -u, but applied to each line).

To clarify: given this input:

 111223234213
 111111111111
 123123123213
 121212122212

I would like to get this output:

 1234
 1
 123
 12
+7
bash grep awk sed
7 answers

Using sed

 sed ':;s/\(.\)\(.*\)\1/\1\2/;t' file 

Basically, what it does is capture a character and check whether it appears again later on the line, also capturing everything in between. It then replaces the whole match, second occurrence included, with just the first character followed by whatever was between the two occurrences.

t is a test: it jumps back to the label : if the previous s/// command succeeded. This repeats until the s/// command no longer matches, which means only unique characters remain.

; just separates the commands.

 1234
 1
 123
 12

Keeps order.
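To check the loop on a couple of sample lines (here an explicit label a is used, equivalent to the empty label above but accepted by more sed variants):

```shell
# Squeeze each line down to its unique characters, keeping first occurrences.
printf '112233\nabcabc\n' | sed ':a;s/\(.\)\(.*\)\1/\1\2/;ta'
# 123
# abc
```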

+5

It does not keep things in the original order, but this awk one-liner works:

 awk '{for(i=1;i<=length($0);i++){a[substr($0,i,1)]=1} for(i in a){printf("%s",i)} print "";delete a}' input.txt 

Broken out for easier reading, it can be a standalone script:

 #!/usr/bin/awk -f
 {
     # Step through the line, assigning each character as a key.
     # Repeated keys overwrite each other.
     for (i = 1; i <= length($0); i++) {
         a[substr($0, i, 1)] = 1;
     }

     # Print items in the array.
     for (i in a) {
         printf("%s", i);
     }

     # Print a newline after we've gone through our items.
     print "";

     # Get ready for the next line.
     delete a;
 }

Of course, the same concept can be implemented quite easily in pure bash as well:

 #!/usr/bin/env bash
 while read s; do
     declare -A a
     while [ -n "$s" ]; do
         a[${s:0:1}]=1
         s=${s:1}
     done
     printf "%s" "${!a[@]}"
     echo ""
     unset a
 done < input.txt

Note that this depends on bash 4, because of the associative array. It also tends to produce things in the original order, since bash happens to keep the array keys closer to insertion order than awk does.
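Since the order in which a shell iterates associative-array keys is not actually guaranteed, here is a sketch of a variant that builds the output string directly, and therefore preserves first-appearance order by construction (plain POSIX sh, no bash 4 needed):

```shell
# For each line, append a character to the output only on its first appearance.
while IFS= read -r s; do
    out=
    while [ -n "$s" ]; do
        c=$(printf '%.1s' "$s")   # first character (POSIX)
        case $out in
            *"$c"*) ;;            # already seen: skip
            *) out=$out$c ;;      # first occurrence: keep
        esac
        s=${s#?}                  # drop the first character
    done
    printf '%s\n' "$out"
done < input.txt
```

The case check scans the partial output for each character, which is quadratic in line length but fine for short lines.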

And I see you already have a sed-based solution from Jose, although it involves a bunch of extra piped commands. :)

The last tool you mentioned was grep . I'm fairly sure you can't do this in traditional grep, but perhaps some brave soul could build a perl-regexp (i.e. grep -P ) variant using -o and backreferences. They'll need more coffee than I have right now, though.

+3

Another solution

 while read line; do grep -o . <<< "$line" | sort -u | paste -s -d '\0' -; done < file 

grep -o . converts the line into a column of single characters

sort -u sorts the characters and removes duplicates
paste -s -d '\0' - converts the column of characters back into a single line
- as the file-name argument tells paste to read from standard input
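Putting the stages together on one sample line (note the output is in sorted order, not order of appearance):

```shell
# One character per line, deduplicated and sorted, then joined back together.
printf '111223234213\n' | grep -o . | sort -u | paste -s -d '\0' -
# 1234
```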

+3

One way using perl :

 perl -F -lane 'print do { my %seen; grep { !$seen{$_}++ } @F }' file 

Results:

 1234
 1
 123
 12
+2

This awk should work:

 awk -F '' '{delete a; for(i=1; i<=NF; i++) a[$i]; for (j in a) printf "%s", j; print ""}' file
 1234
 1
 123
 12

Here:

-F '' splits the input into individual characters, giving us one character in $1 , $2 , etc.

Note: with non-GNU awk, set FS in a BEGIN block instead:

 awk 'BEGIN{FS=""} {delete a; for(i=1; i<=NF; i++) a[$i]; for (j in a) printf "%s", j; print ""}' file 
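Because for (j in a) iterates in an unspecified order, the output may come out scrambled. A sketch of an order-preserving variant that appends each character the first time it is seen (the empty FS and whole-array delete are common extensions, as above):

```shell
printf '111223234213\n' |
awk 'BEGIN{FS=""} {delete seen; out=""; for(i=1;i<=NF;i++) if(!($i in seen)){seen[$i]; out=out $i}; print out}'
# 1234
```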
+1

This may work for you (GNU sed):

 sed 's/\B/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g' file 

Split each line into a series of one-character lines, sort those lines uniquely, then join the result back into a single line.
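A quick check on a single line. Note this is GNU-sed-specific: the e flag executes the pattern space as a shell command, \n in a replacement is a GNU extension, and s/.*/ relies on GNU sed letting . match the newlines embedded in the pattern space:

```shell
printf '112233\n' | sed 's/\B/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g'
# 123
```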

+1

A unique-and-sorted alternative to the others, using sed and GNU tools:

 sed 's/\(.\)/\1\n/g' file | sort | uniq 

which produces one character per line. If you want them all on the same line, just do:

 sed 's/\(.\)/\1\n/g' file | sort | uniq | sed ':a;N;$!ba;s/\n//g;' 

This has the advantage that characters are displayed in sorted order, rather than in order of appearance.
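A quick check on one line (GNU sed, because of the \n in the replacement). Keep in mind that on a multi-line file the sort runs across all lines at once, so the unique sets of different lines end up merged:

```shell
printf '112233\n' | sed 's/\(.\)/\1\n/g' | sort | uniq | sed ':a;N;$!ba;s/\n//g'
# 123
```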

0
