Sort by unique values of multiple fields in a UNIX shell script

I am new to Unix and would like to be able to do the following, but I don't know how.

Take a text file with lines such as:

TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester

And print this:

TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester

I would like the script to be able to find all the lines for each TR value that have a unique Line value.

thanks

1 answer

Since you're apparently OK with an arbitrary choice of dir, day, TI, and stn for each kept line, you can write:

 sort -u -t ';' -k 1,1 -k 6,6 -s < input_file > output_file 

Explanation:

  • The sort utility, "sort lines of text files," lets you sort, compare, and merge lines from files. (See the GNU Coreutils documentation.)
  • The -u or --unique option, "output only the first of an equal run," tells sort that if two input lines compare equal, you want only one of them.
  • The -k POS1[,POS2] or --key=POS1[,POS2] option, "start a key at POS1 (origin 1), end it at POS2 (default end of line)," tells sort which "keys" we want to sort by. In our case, -k 1,1 means that one key consists of the first field (from field 1 through field 1), and -k 6,6 means that another key consists of the sixth field (from field 6 through field 6).
  • The -t SEP or --field-separator=SEP option tells sort that we want to use SEP, in our case ';', to separate and count fields. (Otherwise sort would assume that fields are separated by whitespace, and in our case it would treat the entire line as a single field.)
  • The -s or --stable option, "stabilize sort by disabling last-resort comparison," tells sort that we want to compare lines only in the way we specified; if two lines have the same "keys" as described above, they are considered equivalent even if they differ in other respects. Since we use -u, this means that one of them will be dropped. (If we did not use -u, it would mean that sort would not reorder them relative to each other.)
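To see the options working together, here is a minimal sketch that runs the same sort command on the six sample lines from the question. The scratch directory and the workdir, input_file, and output_file variables are placeholders introduced for illustration, and mktemp -d is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Hypothetical scratch location; the file names are placeholders.
workdir=$(mktemp -d)
input_file="$workdir/input_file"
output_file="$workdir/output_file"

# The six sample lines from the question, one record per line.
cat > "$input_file" <<'EOF'
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
EOF

# Keep one line per distinct (field 1, field 6) pair, i.e. per TR/Line pair.
sort -u -t ';' -k 1,1 -k 6,6 -s < "$input_file" > "$output_file"

cat "$output_file"
```

Four lines should remain, one per TR/Line combination. Note that they come out ordered by TR and then by Line (P234 before P567, lowell before worcester), not in the order they first appeared in the input.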