I am trying to filter a large set of strings (protein sequences) based on their composition.
I wrote a group of three subroutines to take care of this, but I ran into two problems, one minor and one major. The minor problem is that when I use pairwise from List::MoreUtils, I get warnings that $a and $b are used only once and that they are uninitialized. As far as I can tell I am calling the function correctly (based on its CPAN entry and some examples from the web). The major problem is the error "Can't use string ("17/32") as a HASH ref while "strict refs" in use..."
It seems that this can only happen if the foreach loop in &comp stores the hash values as strings instead of evaluating the division. I am sure I have made a rookie mistake, but I cannot find the answer on the web. The first time I ever looked at Perl code was last Wednesday...
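One possible reading of the error, sketched here with hypothetical stand-in subs (flat_hash and hash_ref are not part of the code in this post): the expression inside %{ ... } is evaluated in scalar context, so a sub that ends with "return %hash" hands back the hash's scalar value rather than a reference. On perls older than 5.26 that value is a "used/allocated buckets" string such as "17/32"; on newer perls it is the key count. Under strict refs, dereferencing such a string dies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sub that returns a hash as a flat list.
sub flat_hash {
    my %h = ( A => 0.5, R => 0.25, N => 0.25 );
    return %h;
}

# Hypothetical variant that returns a reference to the hash instead.
sub hash_ref {
    my %h = ( A => 0.5, R => 0.25, N => 0.25 );
    return \%h;
}

# %{ ... } puts flat_hash() in scalar context, so it yields a plain
# scalar (bucket string or key count), not a hash reference.
my $ok = eval { my @v = values %{ flat_hash() }; 1 };
my $err = $@;
print "dereferencing the flat return failed: $err" unless $ok;

# Dereferencing a real reference works as intended.
my @vals = values %{ hash_ref() };
print scalar(@vals), " values via the reference\n";
```

If this is what is happening, the division in the loop is evaluated normally and the string comes from the hash itself, not from its values.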
use List::Util;
use List::MoreUtils;
my @alphabet = (
    'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
    'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V'
);
my $gapchr = '-';
sub getcounts {
    my %counts = ();
    foreach my $chr (@alphabet) {
        $counts{$chr} = ( $_[0] =~ tr/$chr/$chr/ );
    }
    $counts{'gap'} = ( $_[0] =~ tr/$gapchr/$gapchr/ );
    return %counts;
}
sub comp {
    my %comp = getcounts( $_[0] );
    foreach my $chr (@alphabet) {
        $comp{$chr} = $comp{$chr} / ( length( $_[0] ) - $comp{'gap'} );
    }
    return %comp;
}
sub dcomp {
    my @dcomp = pairwise { $a - $b } @{ values( %{ comp( $_[0] ) } ) }, @{ values( %{ comp( $_[1] ) } ) };
    @dcomp = apply { $_ ** 2 } @dcomp;
    my $dcomp = sqrt( sum( 0, @dcomp ) ) / 20;
    return $dcomp;
}
Thanks so much for any answers or tips!