Is there a way to ignore header lines in UNIX sorting?

I have a fixed-width file that I am trying to sort using a UNIX sorting utility (Cygwin, in my case).

The problem is that there is a two-line header at the top of the file, which is sorted at the bottom of the file (since each header line starts with a colon).

Is there a way to tell sort either to "pass the first two lines through unsorted", or to use an ordering that puts the colon lines at the top? The rest of the lines always start with a 6-digit number (which is actually the key that I sort on), if that helps.

Example:

 :0:12345
 :1:6:2:3:8:4:2
 010005TSTDOG_FOOD01
 500123TSTMY_RADAR00
 222334NOTALINEOUT01
 477821USASHUTTLES21
 325611LVEANOTHERS00

should sort to:

 :0:12345
 :1:6:2:3:8:4:2
 010005TSTDOG_FOOD01
 222334NOTALINEOUT01
 325611LVEANOTHERS00
 477821USASHUTTLES21
 500123TSTMY_RADAR00
+83
command-line sorting unix
Jan 28 '13 at 12:49
12 answers
 (head -n 2 <file> && tail -n +3 <file> | sort) > newfile 

The parentheses create a subshell, wrapping up stdout so that you can pipe or redirect it as if it had come from a single command.
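
As a rough sketch, the subshell grouping can be wrapped in a small function that takes the header length as a parameter. The name `sort_keep_header` and its arguments are made up for illustration, and it assumes the input is a regular file that can be read twice:

```shell
# Hypothetical helper: keep the first N lines of a regular file in place
# and sort the remaining lines. N defaults to 2, as in the question.
sort_keep_header() {
    file=$1
    n=${2:-2}
    # The subshell groups both commands so their output forms one stream.
    # "&&" binds more loosely than "|", so only the tail output is sorted.
    ( head -n "$n" "$file" && tail -n "+$((n + 1))" "$file" | sort )
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' ':0:12345' ':1:6:2:3:8:4:2' \
    '010005TSTDOG_FOOD01' '500123TSTMY_RADAR00' '222334NOTALINEOUT01' \
    '477821USASHUTTLES21' '325611LVEANOTHERS00' > "$sample"
sort_keep_header "$sample" 2
rm -f "$sample"
```

Because the file is opened twice, this form does not work when the data arrives on a pipe.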

+97
Jan 28 '13 at 13:03

If you don't mind using awk, you can take advantage of awk's built-in pipe abilities.

eg.

 extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}' 

This prints the first two lines verbatim and pipes the rest through sort.

Note that this has a very specific advantage: it can selectively sort part of a piped input. All the other suggested methods will only sort regular files that can be read several times. This works on anything.
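
A minimal sketch of the same idea with the header length as a parameter. The function name `sort_after_header` is hypothetical, plain `sort` is used here instead of `sort -r`, and the explicit `fflush()` and `close()` calls are a defensive assumption so that the header reaches stdout before sort writes anything:

```shell
# Hypothetical helper: pass the first N lines of stdin through unchanged
# and sort the rest. N defaults to 2.
sort_after_header() {
    n=${1:-2}
    awk -v n="$n" '
        NR <= n { print; fflush(); next }   # header: print and flush right away
        { print | "sort" }                  # body: feed a single sort process
        END { close("sort") }               # wait for sort to finish writing
    '
}

printf '%s\n' ':0:12345' ':1:6:2:3:8:4:2' \
    '500123TSTMY_RADAR00' '010005TSTDOG_FOOD01' | sort_after_header 2
```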

+45
Mar 09 '14 at 11:54

Here is a version that works on piped data:

 (read -r; printf "%s\n" "$REPLY"; sort) 

If your header has multiple lines:

 (for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort) 

This solution is from here.
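
A sketch of the multi-line variant as a reusable function, avoiding `seq` and the bash-specific `$REPLY`. The name `sort_skip_header` and its argument order are assumptions made for illustration:

```shell
# Hypothetical helper: copy the first N lines of stdin unchanged, sort the rest.
# Usage: some_command | sort_skip_header N [sort options]
sort_skip_header() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # IFS= and -r keep leading whitespace and backslashes intact.
        IFS= read -r line || break
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    # sort inherits whatever input the loop did not consume.
    sort "$@"
}

printf '%s\n' ':0:12345' ':1:6:2:3:8:4:2' \
    '500123TSTMY_RADAR00' '010005TSTDOG_FOOD01' '222334NOTALINEOUT01' |
    sort_skip_header 2
```

Any extra arguments go straight to sort, so `sort_skip_header 1 -r` would keep one header line and reverse the order of the rest.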

+27
Dec 08 '14 at 23:11

You can use tail -n +3 <file> | sort ... (tail will output the file contents starting from the third line).

+6
Jan 28 '13 at 12:56
 head -2 <your_file> && nawk 'NR>2' <your_file> | sort 

Example:

 > cat temp
 10
 8
 1
 2
 3
 4
 5
 > head -2 temp && nawk 'NR>2' temp | sort -r
 10
 8
 5
 4
 3
 2
 1
+4
Jan 28 '13 at 13:13

Only 2 lines of code are required ...

 head -1 test.txt > a.tmp; tail -n+2 test.txt | sort -n >> a.tmp; 

Numeric data requires -n. For alpha sorting, -n is not required.

Example file:
$ cat test.txt

heading
8
5
100
1
-1

Result:
$ cat a.tmp

heading
-1
1
5
8
100
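
To illustrate the point about -n, here is a small sketch comparing the two modes on the same numbers. `LC_ALL=C` is set on the plain sort only to make the byte-wise ordering predictable; that setting is an addition, not part of the answer above:

```shell
# Numeric sort: values are compared as numbers.
printf '%s\n' 8 5 100 1 -1 | sort -n
# prints -1, 1, 5, 8, 100 (one per line)

# Plain sort in the C locale: lines are compared byte by byte as text.
printf '%s\n' 8 5 100 1 -1 | LC_ALL=C sort
# prints -1, 1, 100, 5, 8 (one per line)
```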

+3
Feb 01 '15 at 21:05

Here is a bash function that takes exactly the same arguments as sort. It supports both files and pipes.

 function skip_header_sort() {
     if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
         local file=${@: -1}
         set -- "${@:1:$(($#-1))}"
     fi
     awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
 }

How it works: this line checks whether there is at least one argument and whether the last argument is a file.

  if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then 

This saves the file name in a separate variable, since we are about to drop the last argument.

  local file=${@: -1} 

Here we remove the last argument, since we do not want to pass it to sort as an argument.

  set -- "${@:1:$(($#-1))}" 

Finally, we run the awk part, passing the arguments (minus the last one, if it was a file) on to sort inside awk. This was originally proposed by Dave and modified to accept sort arguments. We rely on the fact that $file is empty when the input is piped, so it is ignored.

  awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file 

An example using a comma-separated file:

 $ cat /tmp/test
 A,B,C
 0,1,2
 1,2,0
 2,0,1

 # SORT NUMERICALLY SECOND COLUMN
 $ skip_header_sort -t, -nk2 /tmp/test
 A,B,C
 2,0,1
 0,1,2
 1,2,0

 # SORT REVERSE NUMERICALLY THIRD COLUMN
 $ cat /tmp/test | skip_header_sort -t, -nrk3
 A,B,C
 0,1,2
 2,0,1
 1,2,0
+1
Feb 14 '18 at 22:37

With Python:

 import sys

 HEADER_ROWS = 2

 for _ in range(HEADER_ROWS):
     sys.stdout.write(next(sys.stdin))
 for row in sorted(sys.stdin):
     sys.stdout.write(row)
0
Oct 21 '14 at 12:28

Here's a bash shell function derived from the other answers. It handles both files and pipes. The first argument is the file name, or '-' for stdin. The remaining arguments are passed to sort. A few examples:

 $ hsort myfile.txt
 $ head -n 100 myfile.txt | hsort -
 $ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r

Shell Function:

 hsort () {
     if [ "$1" == "-h" ]; then
         echo "Sort a file or standard input, treating the first line as a header."
         echo "The first argument is the file or '-' for standard input. Additional"
         echo "arguments to sort follow the first argument, including other files."
         echo "File syntax : $ hsort file [sort-options] [file...]"
         echo "STDIN syntax: $ hsort - [sort-options] [file...]"
         return 0
     elif [ -f "$1" ]; then
         local file=$1
         shift
         (head -n 1 $file && tail -n +2 $file | sort $*)
     elif [ "$1" == "-" ]; then
         shift
         (read -r; printf "%s\n" "$REPLY"; sort $*)
     else
         >&2 echo "Error. File not found: $1"
         >&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'"
         return 1
     fi
 }
0
Jan 27 '15 at 7:26

This is the same as Jan Sherbin's answer, but here is my implementation:

 cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
 head -1 filetmp.tc > file.tc;
 tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
0
Mar 05 '16 at 7:56

In simple cases, sed can do the job elegantly:

  your_script | (sed -u 1q; sort) 

or equivalently

  cat your_data | (sed -u 1q; sort) 

The key is 1q: print the first line (the header) and quit, leaving the remaining data for sort.

For the example given, 2q will do the job.

The -u switch (unbuffered) is necessary for those sed implementations (notably GNU sed) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.

-1
May 15 '19 at 14:31
 cat file_name.txt | sed 1d | sort 

This will do what you want.

-4
Mar 09 '16 at 12:22