How to find the nearest neighbors for each item in the list?

Question

I have two sets of integers, A and B (the size of A is less than or equal to the size of B), and I want to answer the question: "How close is A to B?". The way I want to answer it is to measure how far you have to go from a given a in A to find a b in B.

The specific measure I want works as follows: for each a, find the closest b. The only catch is that once I match a b with an a, I can no longer use that b to match any other a. (EDIT: the algorithm I am trying to implement always prefers the shorter match. So if b is the nearest neighbor of more than one a, choose the a closest to b. I am not sure what to do if more than one a has the same distance to b. Right now I choose the a that precedes b, but this is rather arbitrary and not necessarily optimal.) The final product of the measure for these sets is a histogram showing the number of pairs on the vertical axis and the distance of the pairs along the x axis.

So, if A = {1, 3, 4} and B = {1, 5, 6, 7}, I get the following a,b pairs: 1,1, 4,5, 3,6. For this data, the histogram should show one pair with a distance of zero, one pair with a distance of 1, and one pair with a distance of 3.

(The actual size of these sets has an upper bound of approximately 100,000 elements, and I read them from disk already sorted from low to high. The integers range from 1 to about 20,000,000. EDIT: also, the elements of A and B are unique, i.e. no element is repeated.)
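To pin down the measure on the small example, here is a minimal brute-force sketch in Perl. It is not the asker's implementation: the sub names `greedy_pairs` and `distance_histogram` are made up for illustration, and ties are broken by sorting on distance, then on a, then on b, which is one way to read "choose the a that precedes b". It compares every a with every b, so it is only suitable for checking small inputs, not sets of 100,000 elements.

```perl
use strict;
use warnings;

# Greedy "shortest match first" pairing by brute force.
# Assumes the elements within each set are unique, as stated in the question.
sub greedy_pairs {
    my ($set_a, $set_b) = @_;

    # Every candidate pair as [distance, a, b].
    my @candidates;
    for my $x (@$set_a) {
        for my $y (@$set_b) {
            push @candidates, [ abs($x - $y), $x, $y ];
        }
    }

    # Shortest distance first; ties go to the smaller a, then the smaller b
    # (an assumed tie-break, see the note above).
    my @sorted = sort {
        $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
    } @candidates;

    # Take each pair whose a and b are both still unmatched.
    my (%used_a, %used_b, @pairs);
    for my $c (@sorted) {
        my ($d, $x, $y) = @$c;
        next if $used_a{$x} || $used_b{$y};
        $used_a{$x} = $used_b{$y} = 1;
        push @pairs, [ $x, $y, $d ];
    }
    return \@pairs;
}

# Count how many pairs were matched at each distance.
sub distance_histogram {
    my ($pairs) = @_;
    my %hist;
    $hist{ $_->[2] }++ for @$pairs;
    return \%hist;
}

my $pairs = greedy_pairs([1, 3, 4], [1, 5, 6, 7]);
print join(' ', map { "$_->[0],$_->[1]" } @$pairs), "\n";   # 1,1 4,5 3,6

my $hist = distance_histogram($pairs);
print "distance $_: $hist->{$_} pair(s)\n" for sort { $a <=> $b } keys %$hist;
```

On the example sets this reproduces the pairs listed above and a histogram with one pair each at distances 0, 1 and 3.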

, , . Perl, .

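First, I read the elements of A and B into a hash, so that if a number, say 5, appears in both A and B, then $hash{5} = {a=>1, b=>1}, and 5 appears only once in the hash. (If it appears only in A, $hash{5} = {a=>1}.)
Next, I iterate over A to find the hash elements that are in both A and B; these are the pairs at distance zero.
Then I sort the hash keys and link each hash element to its nearest neighbors, like a linked list, so that a hash element looks like $hash{6} = {b=>1, previous=>4, next=>8}. The linked list does not know whether the next and previous elements are in A or B.
Then I loop over pair distances starting at d=1, find all pairs at distance d, mark them, and remove them from the list, until no elements of A are left.

The loop looks like this: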

for ($d=1; @a > 0; $d++) {
    @left = ();
    foreach $a (@a) {
        $next = $a;
        # find closest b ahead of $a, stop searching if you pass $d
        while (exists $hash{$next}{next} && $next - $a < $d) {
            $next = $hash{$next}{next};
        }
        if ($hash{$next}{b} && $next - $a == $d) {
            # found a pair at distance $d
            mark_in_measure($a, $next);
            remove_from_linked_list($next);
            remove_from_linked_list($a);
            next;
        }

        # do same thing looking behind $a
        $prev = $a;
        ...

        # you didn't find a match for $a
        push @left, $a;
    }
    @a = @left;
}
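The loop above relies on a hash that doubles as a linked list, and on a `remove_from_linked_list` helper that the question does not show. The following is a sketch of how both might look, under stated assumptions: `build_linked_hash` is a hypothetical name, the helper here takes the hash reference as an extra first argument, and the distance-zero step is left out for brevity.

```perl
use strict;
use warnings;

# One hash entry per distinct value, flagged with a and/or b,
# then linked to its neighbours in sorted order.
sub build_linked_hash {
    my ($set_a, $set_b) = @_;
    my %hash;
    $hash{$_}{a} = 1 for @$set_a;
    $hash{$_}{b} = 1 for @$set_b;

    my @keys = sort { $a <=> $b } keys %hash;
    for my $i (0 .. $#keys) {
        $hash{ $keys[$i] }{previous} = $keys[ $i - 1 ] if $i > 0;
        $hash{ $keys[$i] }{next}     = $keys[ $i + 1 ] if $i < $#keys;
    }
    return \%hash;
}

# Unlink one value and reconnect its neighbours. The next/previous keys are
# deleted at the ends of the list so that "exists $hash{$x}{next}" stays accurate.
sub remove_from_linked_list {
    my ($hash, $value) = @_;
    my $node = delete $hash->{$value} or return;
    my ($prev, $next) = @{$node}{qw(previous next)};

    if (defined $prev) {
        if (defined $next) { $hash->{$prev}{next} = $next }
        else               { delete $hash->{$prev}{next} }
    }
    if (defined $next) {
        if (defined $prev) { $hash->{$next}{previous} = $prev }
        else               { delete $hash->{$next}{previous} }
    }
    return;
}

my $hash = build_linked_hash([4], [6, 8]);
# $hash->{6} is now { b => 1, previous => 4, next => 8 }
remove_from_linked_list($hash, 6);
print "4 -> $hash->{4}{next}\n";   # 4 -> 8
```

Building the list costs one sort of the combined keys, and each removal is constant time, so the cost of the approach is dominated by the repeated scans in the distance loop rather than by the list maintenance.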

, B, A; , , , ( ). , , - .

+5

language-agnostic algorithm nearest-neighbor

flies 12 . '11 15:50

2

#!/usr/bin/perl

use strict;
use warnings FATAL => 'all';
use diagnostics;  

# http://www.hungarianalgorithm.com/solve.php?c=3-2-6-22--7-2-2-18--13-8-4-12--23-18-14-2&random=1
# https://www.topcoder.com/community/data-science/data-science-tutorials/assignment-problem-and-hungarian-algorithm/
# http://www.cse.ust.hk/~golin/COMP572/Notes/Matching.pdf

my @mat;
my @out_mat;

my $spaces    = 6;
my $precision = 0;

my $N = 10;
my $M = 12;
my $r = 100;

my @array1; my @array2;

for my $i (1..$N) {
    push @array1, sprintf( "%.${precision}f",  rand($r)  );
}

for my $i (1..$M) {
    push @array2, sprintf( "%.${precision}f",  rand($r)  );
}

#@array1 = ( 1, 3, 4);      # $mat[i]->[j] = abs( array1[i] - array2[j] )
#@array2 = ( 1, 5, 6, 7);

#                1     5     6     7
#     1     [    0*    4     5     6 ]
#     3     [    2     2*    3     4 ]
#     4     [    3     1     2*    3 ]

my $min_size  = $#array1 < $#array2 ? $#array1 : $#array2;
my $max_size  = $#array1 > $#array2 ? $#array1 : $#array2;

for (my $i = 0; $i < @array1; $i++){
   my @weight_function;
   for (my $j = 0; $j < @array2; $j++){
      my $dif = sprintf( "%.${precision}f", abs ($array1[$i] - $array2[$j])  );
      #my $dif = sprintf( "%.${precision}f", ($array1[$i] - $array2[$j])**2  ); 
      push @weight_function, $dif;
   }
   push @mat, \@weight_function;
}


# http://cpansearch.perl.org/src/TPEDERSE/Algorithm-Munkres-0.08/lib/Algorithm/Munkres.pm

Algorithm::Munkres::assign(\@mat,\@out_mat);


print "\n\@out_mat index  = [";
for my $index (@out_mat) {
   printf("%${spaces}d", $index);
}
print " ]\n";

print "\@out_mat values = [";

my %hash;
for my $i (0 .. $max_size){
   my $j = $out_mat[$i];
   last if ( $i > $min_size and $#array1 < $#array2 );
   next if ( $j > $min_size and $#array1 > $#array2 );
   my $dif = $mat[$i]->[$j];
   printf( "%${spaces}.${precision}f", $dif );
   $hash{ $dif } { $i } { 'index_array1' } = $i;
   $hash{ $dif } { $i } { 'index_array2' } = $j;
   $hash{ $dif } { $i } { 'value_array1' } = $array1[$i];
   $hash{ $dif } { $i } { 'value_array2' } = $array2[$j]; 
}

print " ]\n\n";


my $soma_da_dif = 0;

foreach my $min_diferenca ( sort { $a <=> $b } keys %hash ){
   foreach my $k ( sort { $a <=> $b } keys %{$hash{$min_diferenca}} ){
      $soma_da_dif += $min_diferenca;
      my $index_array1 = $hash{ $min_diferenca } { $k } { 'index_array1' };
      my $index_array2 = $hash{ $min_diferenca } { $k } { 'index_array2' };
      my $value_array1 = $hash{ $min_diferenca } { $k } { 'value_array1' };
      my $value_array2 = $hash{ $min_diferenca } { $k } { 'value_array2' };
      printf( "   index (%${spaces}.0f,%${spaces}.0f), values (%${spaces}.${precision}f,%${spaces}.${precision}f), dif = %${spaces}.${precision}f\n", 
              $index_array1, $index_array2, $value_array1, $value_array2, $min_diferenca );

   }
}
print "\n\nSum = $soma_da_dif\n";





#-------------------------------------------------#
#------------------ New-Package ------------------# 

{ # start scope block

package Algorithm::Munkres;

use 5.006;
use strict;
use warnings;

require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw( assign );
our $VERSION = '0.08';

...
... <---- copy all the 'package Algorithm::Munkres' here
...

return $minval;
}

1;  # don't forget to return a true value from the file

} # end scope block
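For a quick sanity check of the assignment approach without the Algorithm::Munkres code, the small commented-out example (1, 3, 4 against 1, 5, 6, 7) can be solved by exhaustive search. This is only a sketch: `cost_matrix` and `brute_force_min_cost` are hypothetical names, and the search is exponential, so it is meant for tiny inputs only.

```perl
use strict;
use warnings;

# Cost matrix as in the script above: $mat->[i][j] = abs(rows[i] - cols[j]).
sub cost_matrix {
    my ($rows, $cols) = @_;
    my @mat;
    for my $x (@$rows) {
        push @mat, [ map { abs($x - $_) } @$cols ];
    }
    return \@mat;
}

# Try every assignment of rows to distinct columns and return the smallest
# total cost (undef if there are fewer columns than rows).
sub brute_force_min_cost {
    my ($mat, $row, $used) = @_;
    $row  //= 0;
    $used //= {};
    return 0 if $row == @$mat;

    my $best;
    for my $col (0 .. $#{ $mat->[$row] }) {
        next if $used->{$col};
        $used->{$col} = 1;
        my $rest = brute_force_min_cost($mat, $row + 1, $used);
        delete $used->{$col};
        next unless defined $rest;
        my $total = $mat->[$row][$col] + $rest;
        $best = $total if !defined $best || $total < $best;
    }
    return $best;
}

my $mat = cost_matrix([1, 3, 4], [1, 5, 6, 7]);
print "minimum total distance = ", brute_force_min_cost($mat), "\n";   # 4
```

The starred entries in the comment matrix (0 + 2 + 2) and the greedy pairs from the question (0 + 1 + 3) both total 4, but they give different histograms. A minimum-sum assignment is therefore not guaranteed to reproduce the "always prefer the shorter match" measure.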

+1

Claudio Fsr 14 . '16 2:47

hugomg · Accepted Answer · 2011-10-12T16:59:13+0000

, ( ).

O (N ^ 3), , , .
