I have two sets of integers A and B (the size of A is less than or equal to the size of B), and I want to answer the question: "How close is A to B?". The way I want to answer this question is to measure how far you have to go from a given a in A to find a b in B.
The specific measure I want does the following: for each a, find the closest b, the only catch being that once I pair a b with an a, I can no longer use that b to match any other a. (EDIT: the algorithm I am trying to implement always prefers the shorter match. So if b is the nearest neighbor of more than one a, choose the a closest to b. I am not sure what to do if more than one a has the same distance to b; right now I choose the a that precedes b, but this is rather arbitrary and not necessarily optimal.) The final product of the measure for these sets is a histogram showing the number of pairs on the vertical axis and the distance between the paired elements on the x axis.
So if A = {1, 3, 4} and B = {1, 5, 6, 7}, I get the following a,b pairs: 1,1, 4,5, 3,6. For this data, the histogram should show one pair with a distance of zero, one pair with a distance of 1, and one pair with a distance of 3.
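To make the rule concrete, here is a minimal brute-force sketch of the measure as I read it from the description above. It is only a reference for small inputs, not an efficient solution: it tries every distance in turn and does two hash lookups per unmatched element. The name `greedy_pairs` and the choice to look ahead before looking back are my assumptions, picked so that the a preceding b wins a tie.

```perl
use strict;
use warnings;

# Brute-force reference for the measure: try distances 0, 1, 2, ... and pair
# every still-unmatched element of A with an unmatched element of B that lies
# at exactly that distance. Returns the pairs and a distance => count histogram.
sub greedy_pairs {
    my ($A, $B) = @_;                       # array refs of unique integers
    my %free_b = map { $_ => 1 } @$B;       # elements of B not yet used
    my @todo   = sort { $a <=> $b } @$A;    # elements of A still unmatched
    my (@pairs, %hist);

    for (my $d = 0; @todo && %free_b; $d++) {
        my @left;
        for my $x (@todo) {
            # Look ahead first, so that on a tie the a that precedes b wins.
            my ($match) = grep { $free_b{$_} } ($x + $d, $x - $d);
            if (defined $match) {
                delete $free_b{$match};     # this b can no longer be reused
                push @pairs, [$x, $match];
                $hist{$d}++;
            } else {
                push @left, $x;             # retry at the next distance
            }
        }
        @todo = @left;
    }
    return (\@pairs, \%hist);
}

my ($pairs, $hist) = greedy_pairs([1, 3, 4], [1, 5, 6, 7]);
print join(', ', map { "$_->[0],$_->[1]" } @$pairs), "\n";
print "distance $_: $hist->{$_} pair(s)\n" for sort { $a <=> $b } keys %$hist;
```

On the example sets this prints the pairs 1,1, 4,5, 3,6 and a histogram with one pair each at distances 0, 1 and 3.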
(The actual size of these sets has an upper bound of approximately 100,000 elements, and I read them from disk already sorted from low to high. The integers range from 1 to about 20,000,000. EDIT: also, the elements of A and B are unique, i.e. no element is repeated.)
I am working in Perl.
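First, I read A and B into a hash, so that if a number, say 5, appears in both A and B, then $hash{5} = {a=>1, b=>1}, and so on for every number. (If it were only in A, $hash{5} = {a=>1}.)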
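A minimal sketch of the hash layout suggested by the `$hash{5} = {a=>1, b=>1}` example. The helper name `build_hash` and the input values are made up for illustration; the real data is read from disk.

```perl
use strict;
use warnings;

# Hypothetical helper: one hash entry per distinct value, flagged with the
# set(s) it belongs to.
sub build_hash {
    my ($A, $B) = @_;                 # array refs of unique integers
    my %hash;
    $hash{$_}{a} = 1 for @$A;
    $hash{$_}{b} = 1 for @$B;
    return \%hash;
}

# Made-up inputs: 5 is in both sets, 3 only in A, 6 only in B.
my $hash = build_hash([3, 5], [5, 6]);
for my $key (sort { $a <=> $b } keys %$hash) {
    print "$key: ", join(',', sort keys %{ $hash->{$key} }), "\n";
}
```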
Next, I go through A and handle the zero-distance pairs, i.e. the numbers that are in both A and B.
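Then I sort the hash keys and make each element point to its nearest neighbors, like a linked list, so that for example $hash{6} = {b=>1, previous=>4, next=>8}. The list mixes elements of A and B.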
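The `previous`/`next` layout shown in the `$hash{6}` example could be built roughly as follows. This is a sketch under assumptions: `link_neighbors` is a name I made up, and this body for `remove_from_linked_list` is only one possible implementation (it takes the hash as an explicit argument, unlike the call in my loop).

```perl
use strict;
use warnings;

# Link the sorted keys of the hash into a doubly linked list through the
# `previous` and `next` fields of each entry.
sub link_neighbors {
    my ($hash) = @_;
    my @keys = sort { $a <=> $b } keys %$hash;
    for my $i (1 .. $#keys) {
        $hash->{ $keys[$i - 1] }{next}   = $keys[$i];
        $hash->{ $keys[$i] }{previous}   = $keys[$i - 1];
    }
}

# One possible remove_from_linked_list: delete the entry and make its two
# neighbors point at each other (or drop the dangling link at either end).
sub remove_from_linked_list {
    my ($hash, $key) = @_;
    my $node = delete $hash->{$key} or return;
    my ($prev, $next) = @{$node}{qw(previous next)};
    if (defined $prev) {
        if (defined $next) { $hash->{$prev}{next} = $next }
        else               { delete $hash->{$prev}{next} }
    }
    if (defined $next) {
        if (defined $prev) { $hash->{$next}{previous} = $prev }
        else               { delete $hash->{$next}{previous} }
    }
}

# Made-up data: 4 and 8 are assumed to be in A, 6 in B.
my %hash = (4 => { a => 1 }, 6 => { b => 1 }, 8 => { a => 1 });
link_neighbors(\%hash);
print "6: previous=$hash{6}{previous}, next=$hash{6}{next}\n";
remove_from_linked_list(\%hash, 6);
print "4: next=$hash{4}{next}\n";
```

Removal is constant time once the list is built, which is the point of keeping the links in the hash itself.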
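Then I loop over pair distances starting at d=1, find all pairs with distance d, mark them and remove them from the list, until no elements of A are left.
The loop looks like this: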
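    for (my $d = 1; @a > 0; $d++) {
        my @left = ();
        # $x rather than $a, which Perl reserves for sort blocks
        foreach my $x (@a) {
            # walk forward until the end of the list or at least $d away from $x
            my $next = $x;
            while (exists $hash{$next}{next} && $next - $x < $d) {
                $next = $hash{$next}{next};
            }
            # a match must be in B and exactly $d away
            if ($hash{$next}{b} && $next - $x == $d) {
                mark_in_measure($x, $next);
                remove_from_linked_list($next);
                remove_from_linked_list($x);
                next;
            }
            my $prev = $x;
            ...
            # no partner at this distance: retry with the next $d
            push @left, $x;
        }
        @a = @left;
    }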
, B, A; , , , ( ). , , - .