A fast (er) way to map a function to a database

I am working on a project in which I have a function in the image, described as a set of X and Y coordinates (5-10 points for each function) that are unique to this function. I also have a database with thousands of functions, each of which has the same type of descriptor. The result is as follows:

myFeature: (x1,y1), (x2,y2), (x3,y3)... myDatabase: Feature1: (x1,y1), (x2,y2), (x3,y3)... Feature2: (x1,y1), (x2,y2), (x3,y3)... Feature3: (x1,y1), (x2,y2), (x3,y3)... ... 

I want to find the best match for myFeature in myDatabase functions.

What is the fastest way to map these functions? Currently, I step every function into the database and compare every single point:

 bestScore = 0 for each feature in myDatabase: score = 0 for each point descriptor in MyFeature: find minimum distance from the current point to the... points describing the current feature in the database if the distance < threshold: there is a match to the current point in the target feature score += 1 if score > bestScore: save feature as new best match 

This search works, but it clearly painfully slows down in large databases. Does anyone know a faster method for this type of search, or at least if there is a way to quickly exclude functions that obviously won't match the handle?

+4
source share
2 answers

Create a bit set (an array of 1 and 0) from each function.

Create such a bitmask for your search criteria, and then just use a bitmask and compare the search mask with your functions.

With this approach, you can transfer most of the work to the routines responsible for maintaining the material. In addition, the creation of bitmasks should not be so computationally intensive.

If you just want to eliminate features that are completely out of sync, then your mask creation algorithm should take care of this and create bitmasks a bit fuzzy.

The easiest way to create such masks is probably to create a matrix the size of the matrix of your functions and put one in each coordinate set for this function, and zero in every coordinate that doesn't exist. Then we turn this matrix into a one-dimensional row. Compare the string function then with the search mask in different ways.

This is similar to how bitmap indexes work with large databases (e.g. oracle, for example), but with a different intent and without a complete bitmap of all database rows in memory.

The strength of this is in bitwise comparisons.

On a 32-bit computer, you can do 32 comparisons per instruction, if you can just do one with integers in a point comparison. This gives even more bonuses for floating point operations, depending on the architecture.

+2
source

This generally looks like a spatial index problem. This is not my field, but you probably need to create some kind of tree index, such as a quadrant, which you can use to easily search for functions. You can find links to this article on Wikipedia: http://en.wikipedia.org/wiki/Spatial_index

This may be a problem that you can easily implement in an existing spatial database. It is very similar to GIS in its description.

One thing you can do is calculate the gravity point for each function and use it to slightly reduce the search space (one-dimensional search is much easier to create an index), but it has the disadvantage of being just a heuristic (depending on the shape of your function, the point severity may be in strange places).

+1
source

All Articles