How to sort by multiple fields with different field separators

I want to sort a file by multiple fields and multiple field separators. Please help. Here is my example data file:

    $ cat Data3
    My Text|50002/100/43
    My Message|50001/100/7
    Help Text|50001/100/7
    Help Message|50002/100/11
    Text Message|50001/100/63
    Visible Text|50001/100/52
    Invisible Text|50002/100/1

The first field delimiter is the pipe symbol, and the second field delimiter is / . I want to sort this data first by the second field, and then by the last field (split by / ). My sorted data should look like this:

    Help Text|50001/100/7
    My Message|50001/100/7
    Visible Text|50001/100/52
    Text Message|50001/100/63
    Invisible Text|50002/100/1
    Help Message|50002/100/11
    My Text|50002/100/43

Using sort -k2,2n -t'|' I can sort by field 2 ( 50001/50002 ), but then within this value, how can I sort the last field (separated by / )?

4 answers

The simplest trick for this data set is to treat the second column as a version number.

    $ cat Data3 | sort -k2,2V -t'|'
    Help Text|50001/100/7
    My Message|50001/100/7
    Visible Text|50001/100/52
    Text Message|50001/100/63
    Invisible Text|50002/100/1
    Help Message|50002/100/11
    My Text|50002/100/43

However, this does not work for every input. It works here only because the middle component of the second column (100) is the same on every line.
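To illustrate the caveat, here is a minimal sketch with a hypothetical two-line sample (not from the question's data) in which the middle component differs. It assumes GNU sort, since -V is a GNU extension. Version sort compares the middle component before the last one, so the result is not ordered by the last field.

```shell
# Hypothetical sample: the middle component (200 vs 100) differs between lines.
sample='A|50001/200/1
B|50001/100/9'

# Version sort compares 50001, then the middle component, then the last one,
# so line B (middle 100) is placed before line A (middle 200), even though
# A has the smaller last field.
version_sorted=$(printf '%s\n' "$sample" | sort -t'|' -k2,2V)
printf '%s\n' "$version_sorted"
```

If the requirement is strictly "field 2, then the last field", this ordering is wrong for such input.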

You could do what fedorqui suggested and run sort twice, making the second pass a stable sort. From the man page: -s, --stable (stabilize sort by disabling last-resort comparison)

First sort by the secondary criterion. Then run a stable sort on the primary criterion, which preserves the existing order of lines that share the same primary key.

    $ cat Data3 | sort -k3,3n -t'/' | sort -k2,2n -t'|' -s
    Help Text|50001/100/7
    My Message|50001/100/7
    Visible Text|50001/100/52
    Text Message|50001/100/63
    Invisible Text|50002/100/1
    Help Message|50002/100/11
    My Text|50002/100/43
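To see why -s matters in the second pass, the sketch below runs that pass with and without it on the question's data. The variable names are only illustrative, and LC_ALL=C is an assumption added here to keep the tie-breaking byte-wise and predictable.

```shell
data='My Text|50002/100/43
My Message|50001/100/7
Help Text|50001/100/7
Help Message|50002/100/11
Text Message|50001/100/63
Visible Text|50001/100/52
Invisible Text|50002/100/1'

# Pass 1: order by the secondary key (last /-separated field).
by_last=$(printf '%s\n' "$data" | LC_ALL=C sort -t'/' -k3,3n)

# Pass 2 without -s: lines with equal primary keys fall back to a
# whole-line comparison, which discards the order from pass 1.
unstable=$(printf '%s\n' "$by_last" | LC_ALL=C sort -t'|' -k2,2n)

# Pass 2 with -s: lines with equal primary keys keep their pass-1 order.
stable=$(printf '%s\n' "$by_last" | LC_ALL=C sort -t'|' -k2,2n -s)

printf 'without -s:\n%s\n\nwith -s:\n%s\n' "$unstable" "$stable"
```

Without -s the 50001 group comes out in alphabetical order of the text, so "Text Message" lands before "Visible Text"; with -s the group stays ordered by the last field.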

In this case you are in luck, since -k2,2n -t'|' treats the second column, "50001/100/7", as a number, which most likely parses as 50001. You could run into strange behavior if the components were separated by commas rather than slashes and your environment used a different locale. For example, my environment defaults to en_US.UTF-8, which behaves as follows.

    $ cat Data3 | tr '/' ',' | sort -k3,3n -t',' | LC_NUMERIC=en_US.UTF-8 sort -k2,2n -t'|' -s
    Help Text|50001,100,7
    My Message|50001,100,7
    Invisible Text|50002,100,1
    Visible Text|50001,100,52
    Text Message|50001,100,63
    Help Message|50002,100,11
    My Text|50002,100,43

Whereas you would expect this:

    $ cat Data3 | tr '/' ',' | sort -k3,3n -t',' | LC_NUMERIC=C sort -k2,2n -t'|' -s
    Help Text|50001,100,7
    My Message|50001,100,7
    Visible Text|50001,100,52
    Text Message|50001,100,63
    Invisible Text|50002,100,1
    Help Message|50002,100,11
    My Text|50002,100,43
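One way to make the numeric key independent of the caller's environment is to pin the locale for the sort command itself. The sketch below uses a hypothetical two-line sample and LC_ALL=C, which takes precedence over LC_NUMERIC and the other LC_* variables.

```shell
# Hypothetical comma-separated sample.
sample='x|50001,100,52
y|50002,100,1'

# In the C locale a comma is not a thousands separator, so the numeric
# key stops at the first comma and the lines compare as 50001 vs 50002.
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t'|' -k2,2n)
printf '%s\n' "$c_sorted"
```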

The following code works for me as long as the text contains no extra '|' characters.

tr '|' '/' | sort -n -t '/' -k2,2 -k4,4 | sed -re 's/^([^/]*)\/(.*)$/\1|\2/'


You can use this (inefficient but simple) script:

    #!/usr/bin/perl
    # Split each line on | or /, then compare field 2, field 4, and the text.
    print sort {
        my @ka = split /[|\/]/, $a;
        my @kb = split /[|\/]/, $b;
        $ka[1] <=> $kb[1] || $ka[3] <=> $kb[3] || $ka[0] cmp $kb[0]
    } <>;

You can omit the final term, || $ka[0] cmp $kb[0] , if you do not need lines with equal values to be sorted by the text message.


A little awk trick:

    $ cat Data3 | awk -F'[|/]' '{print $2"\t"$4"\t"$0}' | sort -k1 -k2 -n | cut -f3-
    Help Text|50001/100/7
    My Message|50001/100/7
    Visible Text|50001/100/52
    Text Message|50001/100/63
    Invisible Text|50002/100/1
    Help Message|50002/100/11
    My Text|50002/100/43
  • use awk with both delimiters, -F'[|/]' , to print the sort keys $2"\t"$4 in front of the input line $0
  • then do one sort with two keys, -k1 -k2 (note: not the same as -k1,2 )
  • then cut the keys off again to get back the input line

This approach works for many scenarios.
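The same decorate, sort, undecorate steps can be wrapped in a small function. This is only a sketch: sort_by_keys is a hypothetical name, and it differs from the one-liner above by giving sort an explicit tab separator and bounded keys (-k1,1n -k2,2n), and by pinning LC_ALL=C so tie-breaking is byte-wise. It assumes the text itself contains no tab characters.

```shell
# sort_by_keys: hypothetical helper, reads the named files or stdin.
sort_by_keys() {
    tab=$(printf '\t')
    # Decorate: prepend field 2 and field 4 (split on | or /) as tab-separated keys.
    awk -F'[|/]' -v OFS='\t' '{ print $2, $4, $0 }' "$@" |
        # Sort numerically on key 1, then key 2; ties fall back to the whole line.
        LC_ALL=C sort -t "$tab" -k1,1n -k2,2n |
        # Undecorate: drop the two key columns, keeping the original line.
        cut -f3-
}

# Usage with a subset of the question's data on stdin.
printf '%s\n' \
    'My Text|50002/100/43' \
    'Text Message|50001/100/63' \
    'Visible Text|50001/100/52' \
    'Invisible Text|50002/100/1' | sort_by_keys
```

Adding more keys means printing more fields in the awk step and extending the -k list and the cut offset to match.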
