SQL: find rows that match exactly, but not exactly

Question

I have a table in a PostgreSQL database with columns c1, c2 ... cn. I want to run a query that compares each row against a tuple of values v1, v2 ... vn. The query should not require an exact match; instead, it should return the rows sorted in descending order of similarity to the vector of v values.

Example:

The table contains sports records:

    1,USA,basketball,1956
    2,Sweden,basketball,1998
    3,Sweden,skating,1998
    4,Switzerland,golf,2001

Now, when I run a query on this table using v = (Sweden, basketball, 1998), I want to get all records that have similarities with this vector, sorted by the number of matching columns in descending order:

    2,Sweden,basketball,1998 --> 3 columns match
    3,Sweden,skating,1998    --> 2 columns match
    1,USA,basketball,1956    --> 1 column matches

Row 4 is not returned because it does not match at all.

Edit: all columns are equally important. Although, now that I think about it, it would be nice if I could give each column a different weight coefficient.

Is there any possible SQL query that returns rows in a reasonable amount of time, even when I run it against a million rows?

What would such a query look like?

sql postgresql

Matthias bohlen Jul 30 '16 at 21:42

3 answers

Philipp · Answer 1 · 2016-07-30T21:57:35+0000

    SELECT *
    FROM countries
    WHERE country = 'Sweden'
       OR sport = 'basketball'
       OR year = 1998
    ORDER BY cast(country = 'Sweden' AS integer)
           + cast(sport = 'basketball' AS integer)
           + cast(year = 1998 AS integer) DESC

It is not pretty, but it works. You can cast boolean expressions to integers and sum them up. Note that string comparison in PostgreSQL is case-sensitive, so the literal has to be spelled 'Sweden', exactly as it is stored.

You can easily change the weight by adding a multiplier.

 cast(sport = 'basketball' as integer) * 5 +
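
Putting the pieces together, a weighted version of the whole query might look like the sketch below. The inline VALUES list only stands in for the real table so that the statement can be run on its own, and the weights 2, 5 and 1 as well as the alias score are arbitrary choices made for illustration, not something from the question.

```sql
-- Sketch only: the VALUES list stands in for the real table, and the
-- weights 2, 5 and 1 are arbitrary illustrative choices.
WITH countries (id, country, sport, year) AS (
    VALUES (1, 'USA',         'basketball', 1956),
           (2, 'Sweden',      'basketball', 1998),
           (3, 'Sweden',      'skating',    1998),
           (4, 'Switzerland', 'golf',       2001)
)
SELECT *,
       (country = 'Sweden')::integer    * 2   -- weight of the country column
     + (sport = 'basketball')::integer  * 5   -- weight of the sport column
     + (year = 1998)::integer           * 1   -- weight of the year column
       AS score
FROM countries
WHERE country = 'Sweden'
   OR sport = 'basketball'
   OR year = 1998
ORDER BY score DESC;
```

With these weights, row 2 scores 8, row 1 scores 5 and row 3 scores 3, so the USA basketball row now ranks above the Swedish skating row even though it matches fewer columns. Row 4 is still filtered out by the WHERE clause.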

objectNotFound · Answer 2 · 2016-07-31T02:38:57+0000

Here's how I would do it. The multiplication factors in the CASE expressions set the importance (weight) of each match, and they ensure that rows matching the highest-weighted columns come out on top, even if the other columns of those rows do not match.

    /*
    -- Initial setup
    drop table if exists sport;
    create table sport (id int, country varchar(20), sport varchar(20), yr int);
    insert into sport values
        (1, 'USA', 'basketball', 1956),
        (2, 'Sweden', 'basketball', 1998),
        (3, 'Sweden', 'skating', 1998),
        (4, 'Switzerland', 'golf', 2001);
    select * from sport;
    */
    select *,
           -- weights 100 / 10 / 1: country outranks sport, sport outranks year
           CASE WHEN country = 'Sweden' THEN 1 ELSE 0 END * 100
         + CASE WHEN sport = 'basketball' THEN 1 ELSE 0 END * 10
         + CASE WHEN yr = 1998 THEN 1 ELSE 0 END * 1 AS match_score
    from sport
    WHERE country = 'Sweden'
       OR sport = 'basketball'
       OR yr = 1998
    ORDER BY match_score DESC;

Bruce david wilner · Answer 3 · 2016-07-30T23:48:01+0000

It might help to write a stored procedure that computes a "similarity metric" between two rows. Your query could then refer to the return value of that procedure directly, instead of spelling out a long list of conditions in both the WHERE clause and the ORDER BY clause.
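
A minimal sketch of that idea, assuming the sport table from the previous answer (columns country, sport, yr). The function name match_score and its parameter names are hypothetical, and the metric is simply the unweighted count of equal columns.

```sql
-- Hypothetical helper: counts how many of the three columns equal the
-- corresponding search value. A NULL in any column makes the result NULL,
-- so such rows are dropped by the "> 0" filter below.
CREATE FUNCTION match_score(
    row_country text, row_sport text, row_yr integer,
    v_country   text, v_sport   text, v_yr   integer
) RETURNS integer
LANGUAGE sql IMMUTABLE
AS $$
    SELECT (row_country = v_country)::integer
         + (row_sport   = v_sport)::integer
         + (row_yr      = v_yr)::integer;
$$;

-- The search values now appear in one call per clause instead of once
-- per column per clause.
SELECT s.*,
       match_score(s.country, s.sport, s.yr, 'Sweden', 'basketball', 1998) AS score
FROM sport AS s
WHERE match_score(s.country, s.sport, s.yr, 'Sweden', 'basketball', 1998) > 0
ORDER BY score DESC;
```

One trade-off to keep in mind: a filter of the form match_score(...) > 0 is unlikely to use ordinary single-column indexes, so on a large table this form probably scans every row, whereas the explicit OR conditions leave the planner free to use an index on each column.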
