Adding a Simple AND After JOIN Kills Performance

I have a table containing about 500 points, and I'm looking for duplicates within a tolerance. The query below takes less than a second and gives me 500 rows. Most of them have zero distance because each point is matched with itself (PointA = PointB).

    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    SELECT
        PointA.ObjectId AS ObjectIDa,
        PointA.Name AS PTNameA,
        PointA.[Description] AS PTdescA,
        PointB.ObjectId AS ObjectIDb,
        PointB.Name AS PTNameB,
        PointB.[Description] AS PTdescB,
        ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
    FROM CadData.Survey.SurveyPoint PointA
    JOIN [CadData].Survey.SurveyPoint PointB
        ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    -- AND
    -- PointA.ObjectId <> PointB.ObjectID
    ORDER BY ObjectIDa

If I uncomment the two commented-out lines, I get 14 rows, but the execution time increases to 14 seconds. Not such a big deal until my point table expands to 10 thousand rows.

I apologize in advance if the answer is already out there. I really did look, but, being new, I get lost reading posts that go over my head.

ADDENDUM: ObjectID is bigint and PK for the table, so I realized that I can change the statement to

 AND PointA.ObjectID > PointB.ObjectID 

Now it takes half the time and gives me half the results (7 rows in 7 seconds). I also no longer get mirrored duplicates (point 4 is close to point 8, so point 8 is close to point 4). However, performance still concerns me: the table will get very large, so any performance issue will become a real problem.

ADDENDUM 2: Swapping which condition goes in the JOIN and which goes in the AND (or WHERE, as suggested), as shown below, makes no difference.

    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    SELECT
        PointA.ObjectId AS ObjectIDa,
        PointA.Name AS PTNameA,
        PointA.[Description] AS PTdescA,
        PointB.ObjectId AS ObjectIDb,
        PointB.Name AS PTNameB,
        PointB.[Description] AS PTdescB,
        ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
    FROM CadData.Survey.SurveyPoint PointA
    JOIN [CadData].Survey.SurveyPoint PointB
        ON PointA.ObjectId < PointB.ObjectID
    WHERE PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    ORDER BY ObjectIDa

I find it fascinating that I can raise the @TOL value to something larger, which returns more than 100 rows, with no change in performance, even though that takes a lot of computation. But then adding simple A

4 answers

This is a fun question.

It is not surprising that you get a significant performance improvement by changing "<>" to ">".

As others have noted, the trick is to make the most of your indexes. With ">", the server should easily be able to restrict the search to a specific range of your PK, so it avoids looking backward at pairs it has already checked going forward.
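For the distance predicate itself, one index worth trying is a spatial index on the Geometry column. Nobody in this thread tested it, so treat the following as a sketch under assumptions: it assumes the column is of type geometry (not geography), the index name is made up, and the BOUNDING_BOX values are placeholders that would have to be replaced with the real extent of the survey data. Whether the optimizer actually picks the index for this self-join is something only the execution plan can confirm.

```sql
USE CadData;

-- Hypothetical index name; bounding box values are placeholders,
-- replace them with the real min/max X and Y of the survey points.
CREATE SPATIAL INDEX SIX_SurveyPoint_Geometry
ON Survey.SurveyPoint ([Geometry])
USING GEOMETRY_GRID
WITH (BOUNDING_BOX = (0, 0, 10000, 10000));
```

A spatial index requires the table to have a clustered primary key, which the bigint ObjectID PK may already provide.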

This improvement will scale: it will keep helping as you add rows. But you are right to worry that it does not stop the amount of work from growing. You are correct that if more rows have to be scanned, it will take longer. And that is the case here, because we always want to compare every point against every other point.
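To put rough numbers on that growth, here is the pair count for a table of n points, using the 500 and 10 thousand row figures from the question. The symbols are just labels for this illustration: N_all counts every (A, B) combination the unrestricted join considers, and N_half counts the combinations left when the ObjectID comparison uses ">".

```latex
N_{\text{all}}(n) = n^2, \qquad N_{\text{half}}(n) = \frac{n(n-1)}{2}

n = 500:    \quad N_{\text{all}} = 250{,}000,     \quad N_{\text{half}} = 124{,}750
n = 10{,}000: \quad N_{\text{all}} = 100{,}000{,}000, \quad N_{\text{half}} = 49{,}995{,}000
```

So ">" roughly halves the comparisons at any size, but going from 500 to 10 thousand points still multiplies them by about 400.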

If the first part performs well with just the TOL check, have you thought about separating out the second part completely?

Change the first part to dump its results into a temp table, like this:

    -- @TOL declared as in the question
    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    SELECT
        PointA.ObjectId AS ObjectIDa,
        PointA.Name AS PTNameA,
        PointA.[Description] AS PTdescA,
        PointB.ObjectId AS ObjectIDb,
        PointB.Name AS PTNameB,
        PointB.[Description] AS PTdescB,
        ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
    INTO #AllDuplicatesWithRepeats
    FROM CadData.Survey.SurveyPoint PointA
    JOIN [CadData].Survey.SurveyPoint PointB
        ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    ORDER BY ObjectIDa

Then you can write a straightforward query, like the one below, that skips the repeats. It is nothing special, but against the small set in the temp table it should be very fast.

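    SELECT *
    FROM #AllDuplicatesWithRepeats d1
    LEFT JOIN #AllDuplicatesWithRepeats d2
        ON  d1.ObjectIDa = d2.ObjectIDb
        AND d1.ObjectIDb = d2.ObjectIDa
        -- Tie-break added: the temp table holds every pair in both directions
        -- (and each point paired with itself), so without this condition every
        -- row finds its mirror and the query returns nothing.
        AND d2.ObjectIDa <= d2.ObjectIDb
    WHERE d2.ObjectIDb IS NULL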

The optimizer is probably doing something different behind the scenes when you add ObjectID to the comparison. Compare the execution plans of the two versions of the query to see whether they differ, for example an index seek versus a table scan. If they do, consider experimenting with query hints.
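One way to do that comparison is to turn on the statistics options and run both variants in the same session. This is only a sketch: the queries are cut down to COUNT(*) to keep it short, which is a simplification that can itself change the plan, so the full SELECT lists from the question are the better thing to paste in.

```sql
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement
SET STATISTICS IO ON;    -- logical reads per table
SET STATISTICS XML ON;   -- returns the actual execution plan as XML

DECLARE @TOL AS REAL
SET @TOL = 0.05

-- Variant 1: distance check only
SELECT COUNT(*)
FROM CadData.Survey.SurveyPoint PointA
JOIN CadData.Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL;

-- Variant 2: distance check plus the ObjectID comparison
SELECT COUNT(*)
FROM CadData.Survey.SurveyPoint PointA
JOIN CadData.Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    AND PointA.ObjectId <> PointB.ObjectID;

SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The two plans can then be compared operator by operator, looking at which join type and which index access each variant uses.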

As a workaround, you can always use a subquery:

    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    SELECT ObjectIDa, PTNameA, PTdescA, ObjectIDb, PTNameB, PTdescB, DIST
    FROM (
        SELECT
            PointA.ObjectId AS ObjectIDa,
            PointA.Name AS PTNameA,
            PointA.[Description] AS PTdescA,
            PointB.ObjectId AS ObjectIDb,
            PointB.Name AS PTNameB,
            PointB.[Description] AS PTdescB,
            ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
        FROM CadData.Survey.SurveyPoint PointA
        JOIN [CadData].Survey.SurveyPoint PointB
            ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
        -- AND
        -- PointA.ObjectId <> PointB.ObjectID
    ) Subquery
    WHERE ObjectIDa <> ObjectIDb
    ORDER BY ObjectIDa

Try putting PointA.ObjectId <> PointB.ObjectID in a WHERE clause between the JOIN and the ORDER BY.

Like this:

    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    SELECT
        PointA.ObjectId AS ObjectIDa,
        PointA.Name AS PTNameA,
        PointA.[Description] AS PTdescA,
        PointB.ObjectId AS ObjectIDb,
        PointB.Name AS PTNameB,
        PointB.[Description] AS PTdescB,
        ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
    FROM CadData.Survey.SurveyPoint PointA
    JOIN [CadData].Survey.SurveyPoint PointB
        ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    WHERE PointA.ObjectId <> PointB.ObjectID
    ORDER BY ObjectIDa

With kudos to @Mike_M, here is the edited SELECT, which runs in 2 seconds.

    -- @TOL declared as in the question
    DECLARE @TOL AS REAL
    SET @TOL = 0.05

    -- Step 1: collect every pair within tolerance, repeats included
    SELECT
        PointA.ObjectId AS ObjectIDa,
        PointA.Name AS PTNameA,
        PointA.[Description] AS PTdescA,
        PointB.ObjectId AS ObjectIDb,
        PointB.Name AS PTNameB,
        PointB.[Description] AS PTdescB,
        ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
    INTO #AllDuplicatesWithRepeats
    FROM CadData.Survey.SurveyPoint PointA
    JOIN [CadData].Survey.SurveyPoint PointB
        ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    ORDER BY ObjectIDa

    -- Step 2: keep each pair once and drop points matched with themselves
    SELECT *
    FROM #AllDuplicatesWithRepeats d1
    WHERE d1.ObjectIDa < d1.ObjectIDb