Finding similar items based on a many-to-many relationship

I have two entities in my database that are linked by a many-to-many relationship. What would be the best way to list which entities have the most items in common?

I tried using COUNT(*) over an INTERSECT, but the query takes too long for each record in my database (there are about 20 thousand records). While the query runs, CPU utilization reaches 100% and the database runs into locking problems.

Here is the code that shows what I tried:

My tables look something like this:

    /* 20k records */
    create table Movie(
        Id INT PRIMARY KEY,
        Title varchar(255)
    );

    /* 200-300 records */
    create table Tags(
        Id INT PRIMARY KEY,
        [Desc] varchar(255) /* DESC is a reserved word, so the name is bracket-quoted */
    );

    /* 200,000-300,000 records */
    create table TagMovies(
        Movie_Id INT,
        Tag_Id INT,
        PRIMARY KEY (Movie_Id, Tag_Id),
        FOREIGN KEY (Movie_Id) REFERENCES Movie(Id),
        FOREIGN KEY (Tag_Id) REFERENCES Tags(Id)
    );

This is the query I wrote to list them (it works, but it is terribly slow). Usually I also filter with TOP 1 and add a WHERE clause to get a specific set of related data.

    SELECT bk.Id, rh.Id
    FROM Movies bk
    CROSS APPLY (
        SELECT TOP 15
            b.Id,
            /* Tags Score */
            (
                SELECT COUNT(*)
                FROM (
                    SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = bk.Id
                    INTERSECT
                    SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = b.Id
                ) Q1
            ) as Amount
        FROM Movies b
        WHERE b.Id <> bk.Id
        ORDER BY Amount DESC
    ) rh

Explanation: movies have tags, and a user can look for movies similar to the ones they selected, where "similar" means sharing many of the same tags.

+6
2 answers

Hmm, just an idea, and maybe I misunderstood the question, but this query should return the movies that best match a given movie by tags (here, the movie with Id 1):

    SELECT m.id, m.title,
           GROUP_CONCAT(DISTINCT t.Descr SEPARATOR ', ') as tags,
           count(*) as matches
    FROM stack.Movie m
    LEFT JOIN stack.TagMovies tm ON m.Id = tm.Movie_Id
    LEFT JOIN stack.Tags t ON tm.Tag_Id = t.Id
    WHERE m.id != 1
      AND tm.Tag_Id IN (SELECT Tag_Id FROM stack.TagMovies tm WHERE tm.Movie_Id = 1)
    GROUP BY m.id
    ORDER BY matches DESC
    LIMIT 15;

EDIT: I just realized that the question is about MS SQL Server, while my query is MySQL, but maybe something similar can be done there.
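To make the idea concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The sample rows and the helper name `similar_to` are made up for illustration, the tag description column is called `Descr` as in the query above, and SQLite uses LIMIT where SQL Server would use TOP, so treat it as a demonstration of the counting logic rather than a drop-in solution.

```python
import sqlite3

# Hypothetical toy data; table and column names follow the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Tags (Id INTEGER PRIMARY KEY, Descr TEXT);
    CREATE TABLE TagMovies (
        Movie_Id INTEGER REFERENCES Movie(Id),
        Tag_Id INTEGER REFERENCES Tags(Id),
        PRIMARY KEY (Movie_Id, Tag_Id)
    );
    INSERT INTO Movie VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma'), (4, 'Delta');
    INSERT INTO Tags VALUES (1, 'action'), (2, 'comedy'), (3, 'space'), (4, 'drama');
    INSERT INTO TagMovies VALUES
        (1, 1), (1, 2), (1, 3),   -- Alpha: action, comedy, space
        (2, 1), (2, 3),           -- Beta: action, space
        (3, 2), (3, 4),           -- Gamma: comedy, drama
        (4, 4);                   -- Delta: drama
""")

def similar_to(movie_id, limit=15):
    """Return (id, title, shared_tag_count) rows, best matches first."""
    return conn.execute(
        """
        SELECT m.Id, m.Title, COUNT(*) AS matches
        FROM Movie m
        JOIN TagMovies tm ON tm.Movie_Id = m.Id
        WHERE m.Id <> ?
          AND tm.Tag_Id IN (SELECT Tag_Id FROM TagMovies WHERE Movie_Id = ?)
        GROUP BY m.Id, m.Title
        ORDER BY matches DESC, m.Id
        LIMIT ?
        """,
        (movie_id, movie_id, limit),
    ).fetchall()

result = similar_to(1)
print(result)  # [(2, 'Beta', 2), (3, 'Gamma', 1)]
```

Movies that share no tag with the chosen movie simply do not appear in the result, which is usually what you want for a "similar movies" list.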

+4

You should probably decide on a naming convention and stick to it. Are table names singular or plural nouns? I don't want to get into that debate, but pick one or the other.

Without access to your database, I don't know how well this will perform. This is just off the top of my head. You can also filter on M.id to find the best matches for a single movie, which I think will improve performance quite a bit.

Also, TOP x should give you the x closest matches.

    SELECT
        M.id,
        M.title,
        SM.id AS similar_movie_id,
        SM.title AS similar_movie_title,
        COUNT(*) AS matched_tags
    FROM Movie M
    INNER JOIN TagMovies TM1 ON TM1.movie_id = M.id
    INNER JOIN TagMovies TM2 ON TM2.tag_id = TM1.tag_id AND TM2.movie_id <> TM1.movie_id
    INNER JOIN Movie SM ON SM.id = TM2.movie_id
    GROUP BY M.id, M.title, SM.id, SM.title
    ORDER BY COUNT(*) DESC
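As a rough illustration of why the self-join helps, the sketch below computes the shared-tag count for every pair of movies in one set-based query, then keeps the best few matches per movie. It uses Python with an in-memory SQLite database and invented sample rows; the helper `top_n` is hypothetical. On SQL Server the per-movie cut could presumably be done in SQL itself, for example with ROW_NUMBER() OVER (PARTITION BY ...), but that variant is not tested here.

```python
import sqlite3
from itertools import groupby

# Hypothetical toy data; only the junction table is needed for the counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TagMovies (
        Movie_Id INTEGER,
        Tag_Id INTEGER,
        PRIMARY KEY (Movie_Id, Tag_Id)
    );
    INSERT INTO TagMovies VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 3),
        (3, 2), (3, 4),
        (4, 4);
""")

# One self-join yields the shared-tag count for every pair of movies,
# instead of running a correlated INTERSECT once per pair.
pairs = conn.execute("""
    SELECT a.Movie_Id, b.Movie_Id, COUNT(*) AS shared
    FROM TagMovies a
    JOIN TagMovies b
      ON b.Tag_Id = a.Tag_Id AND b.Movie_Id <> a.Movie_Id
    GROUP BY a.Movie_Id, b.Movie_Id
    ORDER BY a.Movie_Id, shared DESC, b.Movie_Id
""").fetchall()

def top_n(rows, n):
    """Keep the first n rows per movie; rows must already be sorted by movie."""
    return {
        movie: [(other, shared) for _, other, shared in list(group)[:n]]
        for movie, group in groupby(rows, key=lambda r: r[0])
    }

best = top_n(pairs, 2)
print(best)
# {1: [(2, 2), (3, 1)], 2: [(1, 2)], 3: [(1, 1), (4, 1)], 4: [(3, 1)]}
```

Adding a filter such as `WHERE a.Movie_Id = ?` to the query restricts the work to a single movie, which is the cheaper case mentioned above.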
+1