Increased SELECT query performance with large 3D dataset

I have a large dataset (about 1.9 million rows) of three-dimensional points that I select. The statement you use most often is like:

SELECT * FROM points WHERE x > 100 AND x < 200 AND y > 100 AND y < 200 AND z > 100 AND z < 200 AND otherParameter > 10 

I have pointers to x, y and z, as well as to another parameter. I also tried adding an index with several parts to x, y, z, but that did not help.

Any tips on how to make this SELECT query faster?

+6
sql mysql
source share
2 answers

B-Tree indexes are not very helpful for such a query.

What you need is an R-Tree index and a minimal bounding box query on it.

Unfortunately, MySQL does not support R-Tree indexes over 3d points, only 2d . However, you can create an index of, say, X and Y together, which will be more selective if only one of the B-Tree indices is for X and Y :

 ALTER TABLE points ADD xy POINT; UPDATE points SET xy = Point(x, y); ALTER TABLE points MODIFY xy POINT NOT NULL; CREATE SPATIAL INDEX sx_points_xy ON points (xy); SELECT * FROM points WHERE MBRContains(LineString(Point(100, 100), Point(200, 200), xy) AND z BETWEEN 100 and 200 AND otherParameter > 10; 

This is only possible if your table is MyISAM .

+6
source share

I don't have a mySQL test, but I wonder how effective its INTERSECT is:

  select points.* from points join ( select id from points where x > 100 AND x < 200 intersect select id from points where y > 100 AND y < 200 intersect select id from points where z > 100 AND z < 200 ) as keyset on points.id = keyset.id 

It is not necessary to recommend it, but it is something to try, especially if you have separate indices on x, y and z.

EDIT: Since mySQl does not support INTERSECT, the above query can be rewritten using JOINS inline views. Each view will contain a set of keys, and each view will have the advantage of separate indexes that you placed on x, y, and z. Performance will depend on the number of keys returned and the intersect / join algorithm.

First, I tested the intersection approach (in SQLite) to find out if there are ways to improve performance in spatial queries without using their R-Tree module. INTERSECT was actually slower than using one non-composite index on one of the spatial values, and then scanning a subset of the base table to get other spatial values. But the results may vary depending on the size of the database. After the table has reached gigantic size, and the I / O disk becomes more important as a performance factor, it may be more efficient to cross discrete sets of keys, each of which was created from the index, than to check the underlying subequent table against the original sample from the index .

-one
source share