MySQL how to evaluate objects by the similarity of several lines of properties

Hello everyone and happy new year

SITUATION:

I have several tables in MySQL db:

results: (Unique identifier, unique (objectID, metricID))

| ID | ObjectID | MetricID | Score | |--------+----------+----------+----------| |0 | 1 | 7 | 0 | |1 | 5 | 3 | 13 | |2 | 7 | 2 | 78 | |3 | 7 | 3 | 22 | |..... |--------+----------+----------+----------| 

Objects: (unique identifier, unique ObjectName)

 | ID | ObjectName | |--------+------------| |0 | Ook | |1 | Oop | |2 | Oww | |3 | Oat | |..... |--------+------------| 

Metrics: (unique identifier, unique name MetricName)

 | ID | MetricName | |--------+------------| |0 | Moo | |1 | Mar | |2 | Mee | |3 | Meep | |..... |--------+------------| 

For this object identifier:

  • There will be a number of points between '0' and 'one per metric'

DEMAND:

For this ObjectID, I want to return a sorted list based on the following criteria:

  • Returned rows ranked in similarity to the provided object
  • The returned rows do not include the provided object
  • (this is a hard bit, I think). The similarity order is determined by the "distance of the account" object from the provided object based on the numerical offset / difference of its rating from the provided object estimate for any indicator for which there is a record for both provided and considered objects
  • Contains objectID, object name, point difference (or something similar)

STATEMENT OF THE PROBLEM:

I do not know the correct SQL syntax for this, and my experiments have so far failed. I would like to do most of this work in the database as much as possible, and did little or no work at all in nasty loops in code or the like.

ADDITIONAL NON-FUNCTIONAL

  • There are currently 200 rows in the Grades table. My calculations show that in the end there can be up to 2,000,000 lines, but probably no more.
  • The Objects table will only have up to 5000 rows
  • Metrics table will only have up to 400 rows
+4
source share
1 answer

Here's an approach to order objects based on their similarities to object 1:

 select other.ObjectID , avg(abs(target.Score - other.Score)) as Delta from Scores target join Scores other on other.MetricID = target.MetricID and other.ObjectID <> target.ObjectID where target.ObjectID = 1 group by other.ObjectID order by Delta 

Similarity is defined as the average difference in common metrics. Objects that do not share at least one metric with object 1 are not indicated. If this answer makes the wrong assumptions, feel free to clarify your question :)

Live example in SQL Fiddle.

+1
source

All Articles