This is a case of relational division. I added a tag for it.
Indices
Assuming a PK or UNIQUE constraint on USER_PROPERTY_MAP(property_value_id, user_id) - columns in that order, so that lookups by property_value_id in my queries below are fast.
You should also have an index on PROPERTY_VALUE(value, property_name_id, id). Again, columns in that order. Add the last column id only if you get index-only scans out of it.
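As a sketch of the DDL (table and column names are taken from the question; exact names, types, and existing constraints may differ in your schema):

```sql
-- Sketch only: assumes the schema implied by the question.
ALTER TABLE user_property_map
   ADD CONSTRAINT user_property_map_pkey
   PRIMARY KEY (property_value_id, user_id);

-- Multicolumn index to resolve (value, property_name_id) quickly;
-- the trailing "id" column only pays off with index-only scans.
CREATE INDEX property_value_lookup_idx
   ON property_value (value, property_name_id, id);
```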
For a given number of properties
There are many ways to solve it. This should be one of the easiest and fastest for exactly two properties:
```sql
SELECT u.*
FROM   users u
JOIN   user_property_map up1 ON up1.user_id = u.id
JOIN   user_property_map up2 USING (user_id)
WHERE  up1.property_value_id =
       (SELECT id FROM property_value
        WHERE  property_name_id = 1 AND value = '101')
AND    up2.property_value_id =
       (SELECT id FROM property_value
        WHERE  property_name_id = 2 AND value = '102');
```
There is no need to visit the PROPERTY_NAME table, since you seem to have property names resolved to IDs already, according to your example query. Else, you can add a join to PROPERTY_NAME in each subquery.
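If you only have property names (not their IDs) at hand, each subquery could resolve the name with one extra join - a sketch, assuming a PROPERTY_NAME table with columns (id, name) as implied by the question:

```sql
-- Hypothetical variant of one subquery, resolving the property name:
SELECT pv.id
FROM   property_value pv
JOIN   property_name  pn ON pn.id = pv.property_name_id
WHERE  pn.name  = 'color'   -- assumed property name
AND    pv.value = '101';
```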
We assembled an arsenal of techniques for this related question:
- How to filter SQL results in a has-many-through relation
For an unknown number of properties
@Mike and @Valera provided very useful queries in their respective answers. To make it even more dynamic:
```sql
WITH input(property_name_id, value) AS (
   VALUES        -- provide n rows with input parameters here
     (1, '101')
   , (2, '102')
   -- more?
   )
SELECT *
FROM   users u
JOIN  (
   SELECT up.user_id AS id
   FROM   input
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
   GROUP  BY 1
   HAVING count(*) = (SELECT count(*) FROM input)
   ) sub USING (id);
```
Just add / remove rows in the VALUES expression to change the property filters. Or remove the WITH clause and the join to sub to apply no property filters at all.
The problem with this class of queries (which considers all partial matches) is performance. My first query is less dynamic, but typically much faster. (Just test with EXPLAIN ANALYZE.) This matters especially for big tables and a growing number of properties.
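For instance, to compare the candidates, wrap each one like this (a sketch; the BUFFERS option is not required, but often informative):

```sql
-- Execute the query and report actual times and buffer hits:
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.*
FROM   users u
-- ... rest of the candidate query here
```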
The best of both worlds?
This recursive CTE solution should be a good compromise: fast and dynamic:
```sql
WITH RECURSIVE input AS (
   SELECT count(*) OVER () AS ct
        , row_number() OVER () AS rn
        , *
   FROM  (
      VALUES
```
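A complete version of such a recursive CTE could look as follows - my sketch under the schema above, not necessarily the original query. Each recursion step consumes the next row from input, so users that fail any filter drop out before later steps:

```sql
WITH RECURSIVE input AS (
   SELECT count(*) OVER () AS ct
        , row_number() OVER () AS rn
        , *
   FROM  (
      VALUES        -- provide n rows with input parameters here
        (1, '101')
      , (2, '102')
      ) i(property_name_id, value)
   )
, rcte AS (
   -- anchor: users matching the 1st property filter
   SELECT up.user_id AS id, i.rn
   FROM   input i
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
   WHERE  i.rn = 1

   UNION ALL
   -- each step keeps only users that also match the next filter
   SELECT r.id, i.rn
   FROM   rcte  r
   JOIN   input i ON i.rn = r.rn + 1
   JOIN   property_value    pv USING (property_name_id, value)
   JOIN   user_property_map up ON up.property_value_id = pv.id
                              AND up.user_id = r.id
   )
SELECT u.*
FROM   rcte  r
JOIN   input i ON i.rn = r.rn
              AND i.rn = i.ct   -- survived all filters
JOIN   users u ON u.id = r.id;
```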
Recursive CTE Guide.
The added complexity does not pay off for small tables, where the extra overhead outweighs any benefit or the difference is insignificant to begin with. But it scales much better and increasingly outperforms the "counting" methods with growing tables and a growing number of property filters.
Counting methods have to visit all rows in user_property_map for all given property filters, while this query (like the 1st query) can eliminate irrelevant users early.
Performance optimization
With current table statistics (sane settings, autovacuum running), Postgres knows about the "most common values" in each column and will reorder the joins in the 1st query to evaluate the most selective property filters first (or at least not the least selective ones) - up to a certain limit: join_collapse_limit. Related:
- Postgresql join_collapse_limit and time for query planning
- Why does a slight change in the search term slow down the query so much?
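The limit can be inspected and raised per session (a sketch; the built-in default is 8):

```sql
SHOW join_collapse_limit;       -- default: 8
SET  join_collapse_limit = 12;  -- let the planner reorder more joins
```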
This "deus ex machina" intervention is not possible for the 3rd query (the recursive CTE). To improve performance (possibly by a lot), put more selective property filters first. But even with a worst-case ordering it will still outperform the counting queries.
Related:
- Check statistics targets in PostgreSQL
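The per-column statistics target can be raised to give the planner more "most common values" to work with (a sketch; 1000 is an arbitrary example value, the default is governed by default_statistics_target):

```sql
ALTER TABLE user_property_map
   ALTER COLUMN property_value_id SET STATISTICS 1000;
ANALYZE user_property_map;   -- rebuild statistics with the new target
```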
Many more gory details:
More explanations in the manual: