R Tree 50,000 feet review?

I am working on a school project that involves taking a lat / long point and searching for the five best closest points in a known list of places. The list should be stored in memory with the caution that we must select the “appropriate data structure”, that is, we cannot just store all the places in the array and compare the distances one by one in a linear fashion. Teacher suggested grouping location data in the US state to prevent distance calculations for places that are clearly too far away. I think I can do better.

From my online research, it seems that R-Tree or one of its variants might be a neat solution. Unfortunately, this proposal went so far as to understand with the help of real technology, since the literature is simply too dense for my non-academic head.

  • Can someone give me a very high overview of how this process is designed to populate the R-tree with lat / long data, and then cross the tree to find these 5 nearest neighbors of a given point?

  • In addition, the project is in C, and I don’t need to reinvent the wheel on this, so if you used the existing C implementation with the open source CR Tree, I would be interested to know about your experience.

UPDATE: This blog post describes a simple search algorithm for a geographically separated space (for example, the PR quadrant). Hope this helps the future reader.

+6
c gis r-tree
source share
2 answers

Have you considered alternative data structures? I believe that instead of the R-tree, Point Quadtree will be more efficient for your needs. Demographic demos provide some demos for a list of possible structure data, including the R-tree and Point Quadtree. Hope this gives understanding.

+7
source share

Square trees

A square tree occupies a square of space and divides it into four children and a half sizes along the X and Y axis.

+---+---+ | | | Each square is a child | | | of the parent; when you +---+---+ get to leaves a node has | | | a single point or a list | | | of points. +---+---+ 

This data structure is recursive and you look for points by checking which child holds the point until you hit the sheet. The sheet either has one member (a point with coordinates X, Y), or a list of members, depending on the implementation. If you fill in the node, you divide it by 4 and give it to the children. In fact, the data structure is a generalization of the binary tree, so it is not necessarily balanced.

Balancing a quad tree may not be necessary for your purposes and is left as an exercise for the reader - try to find a "balanced quad tree" on the Internet

Note that this data structure cannot index elements that may overlap, but if you only save points, this will not be a problem.

Search for nearest neighbors in a square tree

Above my head, here's a quick and dirty algorithm for finding the closest neighbors "n" to your point. This is not necessarily optimally effective, but it will be quite simple to implement. If anyone has a link to the best, feel free to post it in the comments or responses.

  • Find the square node tree containing your item, keeping a list of your parents.

  • Click all the points in the node in the priority queue based on their distance from your base point (i.e., according to the length of the hypotenuse according to the Pythagorean theorem). depending on the implementation, there may be one or more per node. For a simple implementation of the priority queue, the data structure is a binary heap search.

  • If any of the "n" points is farther, then add the contents of its neighbors to the edges of the bounding box. those. if your base point is close to the edge of the bounding box, it is possible that neighboring nodes in the tree may contain points that are closer than the points found in your bounding box. To do this, you need to back up the tree, so you need to keep track of your parent nodes.

  • When all the "nearest" points are closer to the edges of your bounding box, you know that there could not be neighbors that you missed. Therefore, the "n" nearest points in this field should be your "n" nearest neighbors.

+5
source share

All Articles