SQL: find rows that match partially, ranked by the number of matching columns

I have a table in a PostgreSQL database with columns c1, c2, ..., cn. I want to run a query that compares each row with a tuple of values v1, v2, ..., vn. The query should not return only exact matches. Instead, it should return rows sorted in descending order of how closely they match the vector of v values.

Example:

The table contains sports records:

    1,USA,basketball,1956
    2,Sweden,basketball,1998
    3,Sweden,skating,1998
    4,Switzerland,golf,2001

Now, when I run a query on this table using v = (Sweden, basketball, 1998), I want to get all records that share at least one value with this vector, sorted by the number of matching columns in descending order:

    2,Sweden,basketball,1998 --> 3 columns match
    3,Sweden,skating,1998 --> 2 columns match
    1,USA,basketball,1956 --> 1 column matches

Record 4 is not returned because none of its columns match.

Edit: all columns are equally important. That said, thinking about it more, it would be nice if I could also give each column a different weight.

Is there an SQL query that returns these rows in a reasonable amount of time, even on a table with a million rows?

What would such a query look like?

3 answers
    SELECT *
    FROM countries
    WHERE country = 'Sweden' OR sport = 'basketball' OR year = 1998
    ORDER BY cast(country = 'Sweden' AS integer)
           + cast(sport = 'basketball' AS integer)
           + cast(year = 1998 AS integer) DESC;

It isn't pretty, but it works: PostgreSQL lets you cast boolean expressions to integers and add them up.
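A minimal sketch of this query run against the sample records from the question. For a self-contained, runnable example it assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in for PostgreSQL; in SQLite a comparison already yields 0 or 1, so the `CAST(... AS INTEGER)` is harmless there. The helper `ranked_matches` is hypothetical, and the target values are bound as parameters instead of being inlined.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (id INTEGER, country TEXT, sport TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?)",
    [
        (1, "USA", "basketball", 1956),
        (2, "Sweden", "basketball", 1998),
        (3, "Sweden", "skating", 1998),
        (4, "Switzerland", "golf", 2001),
    ],
)

# Each comparison becomes 0 or 1; their sum is the number of matching columns.
QUERY = """
SELECT id, country, sport, year,
       CAST(country = ? AS INTEGER)
     + CAST(sport = ? AS INTEGER)
     + CAST(year = ? AS INTEGER) AS matches
FROM countries
WHERE country = ? OR sport = ? OR year = ?
ORDER BY matches DESC
"""


def ranked_matches(country, sport, year):
    """Return (id, matches) pairs, best match first; rows with no match are excluded."""
    # The same three values are bound twice: once for the score, once for the WHERE.
    params = (country, sport, year) * 2
    return [(row[0], row[4]) for row in conn.execute(QUERY, params)]


print(ranked_matches("Sweden", "basketball", 1998))  # → [(2, 3), (3, 2), (1, 1)]
```

Record 4 never reaches the ORDER BY because the WHERE clause filters it out. With PostgreSQL, the `?` placeholders would become whatever placeholder style your client library uses.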

You can easily give a column more weight by multiplying its term:

 cast(sport = 'basketball' as integer) * 5 + 
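The same idea with a multiplier per column, again sketched with SQLite (via Python's `sqlite3`) standing in for PostgreSQL. The weights are bound as parameters, so they can be changed without editing the SQL; `weighted_matches` and the weight values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (id INTEGER, country TEXT, sport TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?)",
    [
        (1, "USA", "basketball", 1956),
        (2, "Sweden", "basketball", 1998),
        (3, "Sweden", "skating", 1998),
        (4, "Switzerland", "golf", 2001),
    ],
)

# Each column's 0/1 match indicator is multiplied by its weight before summing.
WEIGHTED_QUERY = """
SELECT id,
       CAST(country = ? AS INTEGER) * ?
     + CAST(sport = ? AS INTEGER) * ?
     + CAST(year = ? AS INTEGER) * ? AS score
FROM countries
WHERE country = ? OR sport = ? OR year = ?
ORDER BY score DESC
"""


def weighted_matches(target, weights):
    """target and weights are (country, sport, year) triples; returns (id, score) pairs."""
    country, sport, year = target
    w_country, w_sport, w_year = weights
    params = (country, w_country, sport, w_sport, year, w_year, country, sport, year)
    return list(conn.execute(WEIGHTED_QUERY, params))


# Hypothetical weights: a sport match counts five times as much as the other columns.
print(weighted_matches(("Sweden", "basketball", 1998), (1, 5, 1)))  # → [(2, 7), (1, 5), (3, 2)]
```

With a weight of 5 on the sport column, record 1 (basketball only) now ranks above record 3 (country and year); with weights (1, 1, 1) the query reduces to the plain match count.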

Here's how I would do it. The multipliers applied to each CASE expression set the importance (weight) of a match in that column, and they ensure that records matching the highest-weighted columns rank first, even when their other columns don't match.

    /* -- Initial setup --
    DROP TABLE IF EXISTS sport;
    CREATE TABLE sport (id int, country varchar(20), sport varchar(20), yr int);
    INSERT INTO sport VALUES
      (1, 'USA', 'basketball', 1956),
      (2, 'Sweden', 'basketball', 1998),
      (3, 'Sweden', 'skating', 1998),
      (4, 'Switzerland', 'golf', 2001);
    SELECT * FROM sport;
    */
    SELECT *,
           CASE WHEN country = 'Sweden' THEN 1 ELSE 0 END * 100
         + CASE WHEN sport = 'basketball' THEN 1 ELSE 0 END * 10
         + CASE WHEN yr = 1998 THEN 1 ELSE 0 END * 1 AS Match
    FROM sport
    WHERE country = 'Sweden' OR sport = 'basketball' OR yr = 1998
    ORDER BY Match DESC;
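To see the effect of the 100 / 10 / 1 multipliers, here is a sketch that runs the query above in SQLite (via Python's `sqlite3`, standing in for the original database). Records 5 and 6 are hypothetical additions that are not in the question, and the alias is renamed to `score` to avoid clashing with SQLite's `MATCH` keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sport (id INTEGER, country TEXT, sport TEXT, yr INTEGER)")
conn.executemany(
    "INSERT INTO sport VALUES (?, ?, ?, ?)",
    [
        (1, "USA", "basketball", 1956),
        (2, "Sweden", "basketball", 1998),
        (3, "Sweden", "skating", 1998),
        (4, "Switzerland", "golf", 2001),
        # Hypothetical extra rows (not in the question) to show the weighting effect:
        (5, "Norway", "basketball", 1998),  # matches sport and year
        (6, "Sweden", "golf", 2001),        # matches country only
    ],
)

# With 0/1 indicators, a country match (100) outweighs any combination of
# the lower-weight columns (at most 10 + 1 = 11).
rows = conn.execute(
    """
    SELECT id,
           CASE WHEN country = 'Sweden' THEN 1 ELSE 0 END * 100
         + CASE WHEN sport = 'basketball' THEN 1 ELSE 0 END * 10
         + CASE WHEN yr = 1998 THEN 1 ELSE 0 END * 1 AS score
    FROM sport
    WHERE country = 'Sweden' OR sport = 'basketball' OR yr = 1998
    ORDER BY score DESC
    """
).fetchall()
print(rows)  # → [(2, 111), (3, 101), (6, 100), (5, 11), (1, 10)]
```

Record 6 matches only the country yet ranks above record 5, which matches both sport and year. Because each multiplier exceeds the sum of all smaller ones, this weighting acts as a strict column priority rather than a count of matches.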

It may help to write a stored function that computes a "similarity metric" between two rows. Your query can then refer to that function's return value directly, instead of spelling out the matching conditions in both the WHERE and the ORDER BY expressions.
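A sketch of the similarity-function idea, assuming SQLite via Python's `sqlite3` in place of a PostgreSQL stored function: `Connection.create_function` registers a Python callable that SQL can then call by name. The function name `similarity` and its equal weights are hypothetical; in PostgreSQL the equivalent would be defined with `CREATE FUNCTION`.

```python
import sqlite3


def similarity(country, sport, year, t_country, t_sport, t_year):
    """Hypothetical similarity metric: weighted count of columns equal to the target."""
    weights = (1, 1, 1)  # assumption: equal weights, i.e. a plain count of matches
    pairs = ((country, t_country), (sport, t_sport), (year, t_year))
    return sum(w for w, (value, target) in zip(weights, pairs) if value == target)


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as similarity(...) with 6 arguments.
conn.create_function("similarity", 6, similarity)
conn.execute("CREATE TABLE countries (id INTEGER, country TEXT, sport TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?)",
    [
        (1, "USA", "basketball", 1956),
        (2, "Sweden", "basketball", 1998),
        (3, "Sweden", "skating", 1998),
        (4, "Switzerland", "golf", 2001),
    ],
)

# Compute the score once in a subquery, then filter and sort on it.
rows = conn.execute(
    """
    SELECT id, score
    FROM (SELECT id, similarity(country, sport, year, ?, ?, ?) AS score
          FROM countries) AS scored
    WHERE score > 0
    ORDER BY score DESC
    """,
    ("Sweden", "basketball", 1998),
).fetchall()
print(rows)  # → [(2, 3), (3, 2), (1, 1)]
```

One trade-off worth weighing for the million-row case: a filter such as `similarity(...) > 0` generally cannot use ordinary indexes on the individual columns, so the function is evaluated for every row, whereas the plain `OR` conditions in the other answers leave the planner the option of using per-column indexes.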

