I noticed that the similarity operator you have defined on your matrices satisfies the axioms of a metric. That is:
- D(M1, M2) = 0 if and only if M1 = M2,
- D(M1, M2) ≥ 0 for any M1, M2,
- D(M1, M2) = D(M2, M1), and
- D(M1, M3) ≤ D(M1, M2) + D(M2, M3) (the triangle inequality).
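Since your actual distance function isn't shown, here is a hypothetical example: if the distance between two equal-shaped matrices is taken to be the number of entries where they differ (a Hamming-style distance, which I'm assuming purely for illustration), the four axioms can be spot-checked directly:

```python
# Hypothetical discrete metric on matrices: count of differing entries.
# This is an illustrative stand-in for the asker's actual distance function.

def matrix_distance(a, b):
    return sum(x != y
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

m1 = [[1, 0], [0, 1]]
m2 = [[1, 1], [0, 1]]
m3 = [[0, 1], [1, 0]]

assert matrix_distance(m1, m1) == 0                        # D(M, M) = 0
assert matrix_distance(m1, m2) > 0                         # D >= 0, and > 0 for distinct matrices
assert matrix_distance(m1, m2) == matrix_distance(m2, m1)  # symmetry
assert (matrix_distance(m1, m3)
        <= matrix_distance(m1, m2) + matrix_distance(m2, m3))  # triangle inequality
```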
As a result, one way you could store all your data is in a metric tree, a type of data structure for storing objects in a metric space that makes it easy to find all the elements "close" to some query element.
Your data has the added benefit of forming a discrete metric space, meaning the distance function you described always returns an integer. That is, you will never have two matrices at a distance of 1.5 from each other, and you will never get two matrices at a distance of π.
Therefore, you may want to store your matrices in a BK-tree. BK-trees are most often used to store strings, but in principle they can hold elements from any discrete metric space. This lets you search for the nearest neighbors of a given matrix quite efficiently (usually without having to compare against every matrix in your collection), although admittedly it does not give you a simple linear traversal over all of your elements.
Intuitively, a BK-tree is built as follows. Pick any matrix as the root node. Then compare every other matrix in the collection against the root and group them into subtrees based on their distance from the root: all matrices at distance 1 go in one subtree, all at distance 2 in another, and so on. Then recursively subdivide each of those subtrees the same way. Thanks to the triangle inequality, when you later search the tree for neighbors of a query matrix, you only need to descend into subtrees whose distance label is close to the query's distance from the current node, which lets a simple recursive algorithm prune most of the tree.
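The construction and search described above can be sketched as follows. This is a minimal illustration, not a production implementation; `matrix_distance` is a hypothetical stand-in for your actual integer-valued distance function.

```python
# Minimal BK-tree sketch for elements in a discrete metric space.
# Assumes `distance` returns a non-negative integer and satisfies the
# metric axioms (in particular the triangle inequality).

def matrix_distance(a, b):
    """Hypothetical discrete metric: number of entries where the matrices differ."""
    return sum(x != y
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

class BKTree:
    def __init__(self, distance):
        self.distance = distance
        self.root = None  # each node is (element, {edge_distance: child_node})

    def insert(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            elem, children = node
            d = self.distance(item, elem)
            if d == 0:
                return  # already stored
            if d in children:
                node = children[d]  # descend into the subtree at that distance
            else:
                children[d] = (item, {})
                return

    def search(self, query, radius):
        """Return all stored items within `radius` of `query`."""
        results = []
        if self.root is None:
            return results
        stack = [self.root]
        while stack:
            elem, children = stack.pop()
            d = self.distance(query, elem)
            if d <= radius:
                results.append(elem)
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - radius, d + radius] can possibly contain a match.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return results
```

For example, inserting a handful of matrices and then calling `search(query, 1)` returns every stored matrix within distance 1 of the query, typically visiting only a fraction of the nodes.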
Hope this helps!