This is a case of relational division. I added a tag for it.
Indices
Assuming a PK or UNIQUE constraint on USER_PROPERTY_MAP(property_value_id, user_id) - columns in that order, so that lookups by property_value_id in my queries below are fast.
You should also have an index on PROPERTY_VALUE(value, property_name_id, id). Again, columns in that order. Add the last column id only if you get index-only scans out of it.
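As a sketch of the DDL (table and column names are taken from the question; exact names, types, and existing constraints may differ in your schema):

```sql
-- Sketch only: assumes the schema implied by the question.
ALTER TABLE user_property_map
   ADD CONSTRAINT user_property_map_pkey
   PRIMARY KEY (property_value_id, user_id);

-- Multicolumn index to resolve (value, property_name_id) quickly;
-- the trailing "id" column only pays off with index-only scans.
CREATE INDEX property_value_lookup_idx
   ON property_value (value, property_name_id, id);
```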
For a given number of properties
There are many ways to solve it. This should be one of the easiest and fastest for exactly two properties:
```sql
SELECT u.*
FROM   users u
JOIN   user_property_map up1 ON up1.user_id = u.id
JOIN   user_property_map up2 USING (user_id)
WHERE  up1.property_value_id =
       (SELECT id FROM property_value
        WHERE  property_name_id = 1 AND value = '101')
AND    up2.property_value_id =
       (SELECT id FROM property_value
        WHERE  property_name_id = 2 AND value = '102');
```
There is no need to visit the PROPERTY_NAME table, since you seem to have property names resolved to IDs already, according to your example query. Else, you can add a join to PROPERTY_NAME in each subquery.
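If you only have property names (not their IDs) at hand, each subquery could resolve the name with one extra join - a sketch, assuming a PROPERTY_NAME table with columns (id, name) as implied by the question:

```sql
-- Hypothetical variant of one subquery, resolving the property name:
SELECT pv.id
FROM   property_value pv
JOIN   property_name  pn ON pn.id = pv.property_name_id
WHERE  pn.name  = 'color'   -- assumed property name
AND    pv.value = '101';
```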
We assembled an arsenal of techniques for this related question:
- How to filter SQL results in a has-many-through relation
For an unknown number of properties
@Mike and @Valera provided very useful queries in their respective answers. To make it even more dynamic:
```sql
WITH input(property_name_id, value) AS (
   VALUES        -- provide n rows with input parameters here
     (1, '101')
   , (2, '102')
   -- more?
   )
SELECT *
FROM   users u
JOIN  (
   SELECT up.user_id AS id
   FROM   input
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
   GROUP  BY 1
   HAVING count(*) = (SELECT count(*) FROM input)
   ) sub USING (id);
```
Just add / remove rows in the VALUES expression to change the property filters. Or remove the WITH clause and the join to sub to apply no property filters at all.
The problem with this class of queries (which considers all partial matches) is performance. My first query is less dynamic, but typically much faster. (Just test with EXPLAIN ANALYZE.) This matters especially for big tables and a growing number of properties.
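For instance, to compare the candidates, wrap each one like this (a sketch; the BUFFERS option is not required, but often informative):

```sql
-- Execute the query and report actual times and buffer hits:
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.*
FROM   users u
-- ... rest of the candidate query here
```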
The best of both worlds?
This recursive CTE solution should be a good compromise: fast and dynamic:
```sql
WITH RECURSIVE input AS (
   SELECT count(*) OVER () AS ct
        , row_number() OVER () AS rn
        , *
   FROM  (
      VALUES
```
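A complete version of such a recursive CTE could look as follows - my sketch under the schema above, not necessarily the original query. Each recursion step consumes the next row from input, so users that fail any filter drop out before later steps:

```sql
WITH RECURSIVE input AS (
   SELECT count(*) OVER () AS ct
        , row_number() OVER () AS rn
        , *
   FROM  (
      VALUES        -- provide n rows with input parameters here
        (1, '101')
      , (2, '102')
      ) i(property_name_id, value)
   )
, rcte AS (
   -- anchor: users matching the 1st property filter
   SELECT up.user_id AS id, i.rn
   FROM   input i
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
   WHERE  i.rn = 1

   UNION ALL
   -- each step keeps only users that also match the next filter
   SELECT r.id, i.rn
   FROM   rcte  r
   JOIN   input i ON i.rn = r.rn + 1
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
                              AND up.user_id = r.id
   )
SELECT u.*
FROM   rcte  r
JOIN   input i ON i.rn = r.rn
              AND i.rn = i.ct   -- survived all filters
JOIN   users u ON u.id = r.id;
```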
Recursive CTE Guide.
The added complexity does not pay off for small tables, where the extra overhead outweighs any benefit or the difference is insignificant to begin with. But it scales much better and increasingly outperforms the "counting" methods with growing tables and a growing number of property filters.
Counting methods have to visit all rows in user_property_map for all given property filters, while this query (like the 1st query) can eliminate irrelevant users early.
Performance optimization
With current table statistics (sane settings, autovacuum running), Postgres knows about the "most common values" in each column and will reorder the joins in the 1st query to evaluate the most selective property filters first (or at least not the least selective ones) - up to a certain limit: join_collapse_limit. Related:
- Postgresql join_collapse_limit and time for query planning
- Why does a slight change in the search term slow down the query so much?
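The limit can be inspected and raised per session (a sketch; the built-in default is 8):

```sql
SHOW join_collapse_limit;       -- default: 8
SET  join_collapse_limit = 12;  -- let the planner reorder more joins
```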
This "deus ex machina" intervention is not possible for the 3rd query (the recursive CTE). To improve performance (possibly by a lot), put more selective property filters first. But even with a worst-case ordering it will still outperform the counting queries.
Related:
- Check statistics targets in PostgreSQL
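The per-column statistics target can be raised to give the planner more "most common values" to work with (a sketch; 1000 is an arbitrary example value, the default is governed by default_statistics_target):

```sql
ALTER TABLE user_property_map
   ALTER COLUMN property_value_id SET STATISTICS 1000;
ANALYZE user_property_map;   -- rebuild statistics with the new target
```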
Many more gory details:
More explanations in the manual: