I have a table containing about 500 points, and I'm looking for duplicates within a tolerance. The query below takes less than a second and returns 500 rows. Most of them have zero distance because each point is matched with itself (PointA = PointB).
    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    SELECT PointA.ObjectId AS ObjectIDa,
           PointA.Name AS PTNameA,
           PointA.[Description] AS PTdescA,
           PointB.ObjectId AS ObjectIDb,
           PointB.Name AS PTNameB,
           PointB.[Description] AS PTdescB,
           ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) DIST
    FROM CadData.Survey.SurveyPoint PointA
    JOIN [CadData].Survey.SurveyPoint PointB
        ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
If I use the commented lines below, I get 14 rows, but the execution time increases to 14 seconds. That is not a big deal now, but it will be once my point table grows to 10 thousand rows.
I apologize in advance if the answer is already out there. I really did look, but, being new, I get lost reading posts that go over my head.
ADDENDUM: ObjectID is a bigint and the PK for the table, so I realized that I can change the condition to:

    AND PointA.ObjectID > PointB.ObjectID
Now it takes half the time and gives me half the results (7 rows in 7 seconds). I also no longer get mirrored pairs (for example, point 4 reported as close to point 8, and then point 8 reported as close to point 4). However, performance still concerns me: the table will become very large, so any slowdown now will turn into a real problem later.
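To illustrate the mirrored-pair point, here is a minimal sketch on a throwaway table variable. Everything in it is an assumption made for illustration: the `@Pts` table, the three sample points, the `geometry` column type, and the use of `<>` as the self-match filter are hypothetical and are not taken from the real `SurveyPoint` table.

```sql
-- Hypothetical sample data: points 4 and 8 are about 0.022 apart, point 9 is far away.
DECLARE @TOL AS REAL = 0.05;
DECLARE @Pts TABLE (ObjectId BIGINT PRIMARY KEY, [Geometry] GEOMETRY);

INSERT INTO @Pts (ObjectId, [Geometry]) VALUES
    (4, geometry::Point(100.00, 200.00, 0)),
    (8, geometry::Point(100.01, 200.02, 0)),
    (9, geometry::Point(500.00, 500.00, 0));

-- "<>" removes self-matches but keeps both orderings: two rows, (4, 8) and (8, 4).
SELECT A.ObjectId AS ObjectIDa, B.ObjectId AS ObjectIDb
FROM @Pts A
JOIN @Pts B
    ON A.ObjectId <> B.ObjectId
WHERE A.[Geometry].STDistance(B.[Geometry]) < @TOL;

-- ">" keeps each unordered pair exactly once: one row, (8, 4).
SELECT A.ObjectId AS ObjectIDa, B.ObjectId AS ObjectIDb
FROM @Pts A
JOIN @Pts B
    ON A.ObjectId > B.ObjectId
WHERE A.[Geometry].STDistance(B.[Geometry]) < @TOL;
```

The strict inequality halves the number of candidate pairs, which matches the halved row count and run time, but it still compares every remaining pair, so it does not change how the cost grows with the table size.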
ADDENDUM 2: Changing the order of JOIN and AND (or WHERE, as suggested), as shown below, makes no difference.
    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    SELECT PointA.ObjectId AS ObjectIDa,
           PointA.Name AS PTNameA,
           PointA.[Description] AS PTdescA,
           PointB.ObjectId AS ObjectIDb,
           PointB.Name AS PTNameB,
           PointB.[Description] AS PTdescB,
           ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) DIST
    FROM CadData.Survey.SurveyPoint PointA
    JOIN [CadData].Survey.SurveyPoint PointB
        ON PointA.ObjectId < PointB.ObjectID
    WHERE PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    ORDER BY ObjectIDa
I find it fascinating that I can increase @TOL to a larger value, which returns more than 100 rows, without any change in performance, even though that takes a lot of computation. But then adding simple A