Using Perl, how can I sort an array using the value of the number inside each element of the array?

Let's say I have an array, @theArr, which contains about 1000 elements, such as:

```
'12 16 sj.1012804p1012831.93.gz'
'12 16 sj.1012832p1012859.94.gz'
'12 16 sj.1012860p1012887.95.gz'
'12 16 sj.1012888p1012915.96.gz'
'12 16 sj.1012916p1012943.97.gz'
'12 16 sj.875352p875407.01.gz'
'12 16 sj.875408p875435.02.gz'
'12 16 sj.875436p875535.03.gz'
'12 16 sj.875536p875575.04.gz'
'12 16 sj.875576p875603.05.gz'
'12 16 sj.875604p875631.06.gz'
'12 16 sj.875632p875659.07.gz'
'12 16 sj.875660p875687.08.gz'
'12 16 sj.875688p875715.09.gz'
'12 16 sj.875716p875743.10.gz'
...
```

If the first set of numbers (between "sj." and "p") were always six digits, this would not be a problem. But once the numbers roll over to seven digits, the default sort stops working, because the larger 7-digit numbers end up ahead of the smaller 6-digit ones.
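A minimal sketch of why this happens, using two of the sample elements: with no comparison block, Perl's `sort` compares elements as strings, so the first differing character decides, and `1` (from `1012804`) sorts before `8` (from `875352`).

```perl
use strict;
use warnings;

# Two elements from the question: one 7-digit and one 6-digit "sj." number.
my @theArr = (
    '12 16 sj.875352p875407.01.gz',
    '12 16 sj.1012804p1012831.93.gz',
);

# No comparison block: elements are compared as strings (like cmp).
# After the common prefix "12 16 sj.", '1' sorts before '8',
# so the 7-digit number comes first.
my @sorted = sort @theArr;

print "$_\n" for @sorted;
```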

Is there any way to tell Perl to sort by this number inside the string in each element of the array?

4 answers

It sounds like you need a Schwartzian Transform:

```perl
#!/usr/bin/perl

use strict;
use warnings;

my @a = <DATA>;

print map  { $_->[1] }                 #get the original value back
      sort { $a->[0] <=> $b->[0] }     #sort arrayrefs numerically on the sort value
      map  { /sj\.(.*?)p/; [$1, $_] }  #build arrayref of the sort value and orig
      @a;

__DATA__
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.875604p875631.06.gz
12 16 sj.875632p875659.07.gz
12 16 sj.875660p875687.08.gz
12 16 sj.875688p875715.09.gz
12 16 sj.875716p875743.10.gz
```
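The same transform can be wrapped in a reusable subroutine. This is a sketch under assumptions: the helper name `sort_by_sj` is hypothetical, the key is restricted to digits (`\d+` rather than `.*?`), and an element without a `sj.<digits>p` number makes it die with a message instead of continuing without a sort key.

```perl
use strict;
use warnings;

# Hypothetical helper: Schwartzian Transform keyed on the digits
# between "sj." and "p".
sub sort_by_sj {
    return
        map  { $_->[1] }                  # 3. unwrap the original string
        sort { $a->[0] <=> $b->[0] }      # 2. compare the cached keys numerically
        map  {                            # 1. pair each string with its key
            my ($key) = /sj\.(\d+)p/
                or die "No sj.<digits>p number in: $_\n";
            [ $key, $_ ];
        } @_;
}

my @theArr = (
    '12 16 sj.1012804p1012831.93.gz',
    '12 16 sj.875352p875407.01.gz',
    '12 16 sj.875716p875743.10.gz',
    '12 16 sj.1012916p1012943.97.gz',
);

my @sorted = sort_by_sj(@theArr);
print "$_\n" for @sorted;
```

Each regex match happens once per element in step 1; the `sort` block in step 2 only compares the cached numbers.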

You can use a regular expression to extract the number from each element inside the block that you pass to `sort`:

```perl
my @newArray = sort {
    my ($anum, $bnum);
    $a =~ /sj\.([0-9]+)p/;
    $anum = $1;
    $b =~ /sj\.(\d+)p/;
    $bnum = $1;
    $anum <=> $bnum
} @theArr;
```

However, Chas. Owens's solution is better, because it runs the regex match only once per element, whereas this version runs two matches for every comparison `sort` makes.
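To make the efficiency difference concrete, here is a rough sketch that counts regex matches for both approaches on the 15 sample elements from the question. The counter variables are illustrative only, and the exact number of comparisons depends on Perl's internal sort algorithm; what is fixed is that the in-block version matches twice per comparison, while the transform matches once per element.

```perl
use strict;
use warnings;

# The 15 sample elements from the question.
my @theArr = map { "12 16 sj.$_.gz" } qw(
    1012804p1012831.93 1012832p1012859.94 1012860p1012887.95
    1012888p1012915.96 1012916p1012943.97 875352p875407.01
    875408p875435.02   875436p875535.03   875536p875575.04
    875576p875603.05   875604p875631.06   875632p875659.07
    875660p875687.08   875688p875715.09   875716p875743.10
);

my ($block_matches, $st_matches) = (0, 0);

# Regex inside the sort block: two matches per comparison.
my @by_block = sort {
    $block_matches += 2;
    my ($x) = $a =~ /sj\.(\d+)p/;
    my ($y) = $b =~ /sj\.(\d+)p/;
    $x <=> $y
} @theArr;

# Schwartzian Transform: one match per element.
my @by_st =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { my ($k) = /sj\.(\d+)p/; $st_matches++; [ $k, $_ ] }
    @theArr;

printf "elements: %d, in-block matches: %d, transform matches: %d\n",
    scalar @theArr, $block_matches, $st_matches;
```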


Here's an example that sorts them in ascending order, assuming you don't care too much about efficiency:

```perl
use strict;
use warnings;

my @theArr = split(/\n/, <<END_SAMPLE);
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
END_SAMPLE

my @sortedArr = sort compareBySJ @theArr;

print "Before:\n" . join("\n", @theArr) . "\n";
print "After:\n" . join("\n", @sortedArr) . "\n";

sub compareBySJ {
    # Capture the values to compare, against the expected format
    # NOTE: This could be inefficient for large, unsorted arrays
    # since you'll be matching the same strings repeatedly
    my ($aVal) = $a =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $a";
    my ($bVal) = $b =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $b";

    # Return the numerical comparison of the values (ascending order)
    return $aVal <=> $bVal;
}
```

Outputs:

```
Before:
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
After:
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
```

Yes. `sort` accepts an optional comparison routine that it uses to compare two elements at a time. It can be given either as a code block or as the name of a subroutine.
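A short sketch of both forms applied to the question's data; the subroutine name `by_sj` is hypothetical. In either form, `$a` and `$b` are the two elements being compared, and the routine must return a negative, zero, or positive number, which is exactly what `<=>` produces.

```perl
use strict;
use warnings;

my @theArr = (
    '12 16 sj.1012804p1012831.93.gz',
    '12 16 sj.875352p875407.01.gz',
);

# Form 1: a code block. The list slice (...)[0] takes the first capture.
my @with_block = sort {
    ($a =~ /sj\.(\d+)p/)[0] <=> ($b =~ /sj\.(\d+)p/)[0]
} @theArr;

# Form 2: the name of a subroutine (hypothetical name by_sj).
sub by_sj { ($a =~ /sj\.(\d+)p/)[0] <=> ($b =~ /sj\.(\d+)p/)[0] }
my @with_sub = sort by_sj @theArr;

print "$_\n" for @with_sub;
```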

There is an example in the documentation for `sort` (`perldoc -f sort`) that is similar to what you want to do:

```perl
# inefficiently sort by descending numeric compare using
# the first integer after the first = sign, or the
# whole record case-insensitively otherwise
@new = sort {
    ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
        ||
    uc($a) cmp uc($b)
} @old;
```
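Adapted to the question's data (a sketch, not taken from the documentation), the same pattern can key on the number between `sj.` and `p`. Putting `$b` on the left gives descending order, as in the documentation example, and the `||` fallback here compares the whole strings with `cmp` when two numbers are equal.

```perl
use strict;
use warnings;

my @theArr = (
    '12 16 sj.875352p875407.01.gz',
    '12 16 sj.1012916p1012943.97.gz',
    '12 16 sj.875716p875743.10.gz',
);

# Descending by the "sj." number ($b on the left),
# falling back to a plain string comparison on ties.
my @desc = sort {
    ($b =~ /sj\.(\d+)p/)[0] <=> ($a =~ /sj\.(\d+)p/)[0]
        ||
    $a cmp $b
} @theArr;

print "$_\n" for @desc;
```

Drop the `$b`/`$a` swap to get ascending order instead.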
