I have two entities in my database that are linked by a many-to-many relationship. What would be the best way to list which entities have the most in common with each other?
I tried using COUNT(*) over an INTERSECT, but the query takes too long when it runs for each record in my database (there are about 20 thousand records). While the query executes, CPU utilization reaches 100% and the database runs into locking problems.
Here is what I tried. My tables look something like this:
CREATE TABLE Movies (
    Id INT PRIMARY KEY,
    Title VARCHAR(255)
);

CREATE TABLE Tags (
    Id INT PRIMARY KEY,
    [Desc] VARCHAR(255)  -- DESC is a reserved word, so it is bracket-quoted
);

CREATE TABLE TagMovies (
    Movie_Id INT,
    Tag_Id INT,
    PRIMARY KEY (Movie_Id, Tag_Id),
    FOREIGN KEY (Movie_Id) REFERENCES Movies(Id),
    FOREIGN KEY (Tag_Id) REFERENCES Tags(Id)
);
This is the query I wrote to list them (it works, but it's terribly slow). Usually I also limit the result with TOP 1 and add a WHERE clause to get a specific set of related data.
SELECT bk.Id, rh.Id
FROM Movies bk
CROSS APPLY (
    SELECT TOP 15 b.Id,
        (
            SELECT COUNT(*)
            FROM (
                SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = bk.Id
                INTERSECT
                SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = b.Id
            ) Q1
        ) AS Amount
    FROM Movies b
    WHERE b.Id <> bk.Id
    ORDER BY Amount DESC
) rh
Explanation: movies have tags, and a user can look for movies similar to the ones they selected, where similarity is based on how many tags the movies have in common.
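To make the expected result concrete, here is a minimal sketch using Python's built-in sqlite3 module and a few invented sample rows (not my real data). It counts shared tags with a self-join on TagMovies instead of INTERSECT. I have not measured whether this form performs any better on SQL Server, and unlike my query it leaves out pairs of movies that have no tags in common. The function name shared_tag_counts is just a placeholder for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE TagMovies (
        Movie_Id INTEGER,
        Tag_Id   INTEGER,
        PRIMARY KEY (Movie_Id, Tag_Id)
    );
    INSERT INTO Movies VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO TagMovies VALUES
        (1, 10), (1, 11), (1, 12),
        (2, 10), (2, 11),
        (3, 12);
""")


def shared_tag_counts(conn):
    """Return (movie_id, other_id, shared_tags) for each pair sharing a tag."""
    return conn.execute("""
        SELECT a.Movie_Id, b.Movie_Id, COUNT(*) AS Amount
        FROM TagMovies a
        JOIN TagMovies b
          ON b.Tag_Id = a.Tag_Id          -- same tag ...
         AND b.Movie_Id <> a.Movie_Id     -- ... on a different movie
        GROUP BY a.Movie_Id, b.Movie_Id
        ORDER BY a.Movie_Id, Amount DESC, b.Movie_Id
    """).fetchall()


rows = shared_tag_counts(conn)
for movie_id, other_id, amount in rows:
    print(movie_id, other_id, amount)
```

With this sample data, movies 1 and 2 share two tags, movies 1 and 3 share one, and movies 2 and 3 share none, so the pair (2, 3) does not appear in the output at all.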