How to sort a tab-delimited file by the length of column K

Question

I have a tab-delimited file that looks like this:

 >NODE 28 length 23 cov 11.043478 ACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG
 >NODE 32 length 21 cov 13.857142 ACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG
 >NODE 33 length 28 cov 14.035714 TAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC

How can I use Unix sort to order the records by the length of the DNA sequence ([ATCG]) at the end of each line?

sorting linux unix bash awk

neversaint Jun 23 '10 at 2:05

4 answers

If the length is in the 4th column, sort -n -k4 should do the trick.
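A minimal sketch of that suggestion, assuming the records are space-separated exactly as in the question. Writing the key as -k4,4n limits it to field 4 alone, whereas plain -k4 runs from field 4 to the end of the line. Note that this orders by the reported length value, which in the sample is not the character count of the sequence (NODE 28 reports 23, but its sequence has 43 bases); for these three records the two orders happen to agree. The function name sort_by_length_field and the file nodes.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper name; sorts on field 4 (the number after "length").
sort_by_length_field() {
    # -k4,4n: the key starts and ends at field 4 and is compared numerically
    sort -k4,4n "$@"
}

# Sample records copied from the question (space-separated; nodes.txt is a hypothetical file).
cat > nodes.txt <<'EOF'
>NODE 28 length 23 cov 11.043478 ACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG
>NODE 32 length 21 cov 13.857142 ACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG
>NODE 33 length 28 cov 14.035714 TAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC
EOF

# Prints NODE 32 (21), then NODE 28 (23), then NODE 33 (28).
sort_by_length_field nodes.txt
```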

If you need to determine the length yourself, you are looking for a preprocessing step before sorting: for example, a small Python script that prints the length of the 7th space-separated field (the sequence) as an extra first or last column, which sort can then use as its key.
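A minimal sketch of that preprocessing step, written in awk rather than Python to stay with the tools used elsewhere in this thread. It assumes the file really is tab-delimited, as the title says, and that the DNA sequence is the last field on every line. Each line is prefixed with the sequence length and a tab, sorted on that added field, and then the prefix is stripped with cut. The function name sort_by_seq_length and the file nodes.tsv are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sort a tab-delimited file by the length of its last field.
# Assumption: fields are separated by single tab characters and the DNA
# sequence is the last field on every line.
sort_by_seq_length() {
    tab=$(printf '\t')
    # Decorate: print the length of the last field, a tab, then the original line.
    # Sort: numeric comparison on that added first field only.
    # Undecorate: cut keeps fields 2 onward, i.e. the original line.
    awk -F '\t' -v OFS='\t' '{ print length($NF), $0 }' "$@" |
        sort -t "$tab" -k1,1n |
        cut -f2-
}

# The question's records, assumed here to use tabs between all seven fields.
printf '>NODE\t28\tlength\t23\tcov\t11.043478\tACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG\n' > nodes.tsv
printf '>NODE\t32\tlength\t21\tcov\t13.857142\tACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG\n' >> nodes.tsv
printf '>NODE\t33\tlength\t28\tcov\t14.035714\tTAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC\n' >> nodes.tsv

# Sequence lengths are 43, 41 and 48, so this prints NODE 32, NODE 28, NODE 33.
sort_by_seq_length nodes.tsv
```

To list the longest sequences first, reverse the key with -k1,1nr.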

Slartibartfast Jun 23 '10 at 2:11

  awk '{print length($NF) $0|"sort -n"}' file | sed 's/^.[^>]*>/>/'

ghostdog74 Jun 23 '10 at 2:58

With Perl:

 perl -e' print sort { length +($a =~ /(\S+)$/)[0] <=> length +($b =~ /(\S+)$/)[0] } <>' infile

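With GNU awk (WHINY_USERS is an old, undocumented gawk switch that makes for (L in l) visit the indices in sorted order and may not be honored by newer releases; gawk 4.0 and later provide PROCINFO["sorted_in"] for this. Because the array is keyed by length, only the last of several lines with the same sequence length is printed):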

 WHINY_USERS= gawk 'END { for (L in l) print l[L] } { l[sprintf("%15s", length($NF))] = $0 }' infile

Dimitre radoulov Jun 23 '10 at 15:09

josephj1989 · Accepted Answer · 2010-06-23T02:24:37+0000

This pipeline computes the length as well. (My Unix is a little rusty; I have been doing other things for a while.)

 $ awk '{ printf("%d %s\n", length($NF), $0) }' junk.lst | sort -n -k1,1 | sed 's/^[0-9]* //'
