Optimizing the SQL Where clause for subquery queries

Question

Let's say I have the following hypothetical data structure:

    create table "country" (
        country_id integer,
        country_name varchar(50),
        continent varchar(50),
        constraint country_pkey primary key (country_id)
    );

    create table "person" (
        person_id integer,
        person_name varchar(100),
        country_id integer,
        constraint person_pkey primary key (person_id)
    );

    create table "event" (
        event_id integer,
        event_desc varchar(100),
        country_id integer,
        constraint event_pkey primary key (event_id)
    );

I want to query the number of person rows and event rows for each country. I decided to use subqueries.

    select c.country_name,
           sum(sub1.person_count) as person_count,
           sum(sub2.event_count) as event_count
    from "country" c
    left join (select country_id, count(*) as person_count
               from "person"
               group by country_id) sub1
           on (c.country_id=sub1.country_id)
    left join (select country_id, count(*) as event_count
               from "event"
               group by country_id) sub2
           on (c.country_id=sub2.country_id)
    group by c.country_name

I know this can be done with select statements in the field list, but the advantage of joining to subqueries is that the SQL is easier to change so that it summarizes by a different field. Say, if I changed the query to report by continent, it would be as simple as replacing the c.country_name field with c.continent.
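For comparison, the select-list form mentioned above might look roughly like the sketch below. It assumes the tables defined earlier. Two differences from the join version: count(*) here yields 0 rather than null for a country with no matching rows, and the result has one row per country_id rather than one per name.

```sql
-- Sketch of the select-list alternative: one correlated subquery per count.
select c.country_name,
       (select count(*)
          from "person" p
         where p.country_id = c.country_id) as person_count,
       (select count(*)
          from "event" e
         where e.country_id = c.country_id) as event_count
from "country" c;
```

Summarizing this form by continent would mean wrapping it in another query and summing the counts, which is the loss of flexibility described above.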

My problem is with filtering. If we add a where clause like this:

    select c.country_name,
           sum(sub1.person_count) as person_count,
           sum(sub2.event_count) as event_count
    from "country" c
    left join (select country_id, count(*) as person_count
               from "person"
               group by country_id) sub1
           on (c.country_id=sub1.country_id)
    left join (select country_id, count(*) as event_count
               from "event"
               group by country_id) sub2
           on (c.country_id=sub2.country_id)
    where c.country_name='UNITED STATES'
    group by c.country_name

The subqueries still seem to do the counting for all countries. Suppose the person and event tables are huge, and I already have indexes on country_id in all tables. The query is very slow. Shouldn't the database execute the subqueries only for the country that was filtered? Do I have to repeat the country filter in each subquery (this is very tedious, and the code is not easily modified)? I use both PostgreSQL 8.3 and 9.0, but I think this happens in other databases too.

sql postgresql subquery where

clj Oct 28 '11 at 0:54

2 answers

Can you filter and group the rows by country_id rather than country_name? I suppose you do not have an index on the name.
The subqueries do not use any index, which is normal because they read the entire table. If you want to reduce the scan, you have to filter the data.
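A minimal sketch of both suggestions, assuming the tables from the question. The index name and the id value 1 are hypothetical placeholders, not taken from the question.

```sql
-- Hypothetical index name; supports lookups by country_name.
create index country_country_name_idx on "country" (country_name);

-- Filter on the key inside each subquery so the existing country_id
-- indexes can be considered. The value 1 is a placeholder id.
select c.country_name,
       sub1.person_count,
       sub2.event_count
from "country" c
left join (select country_id, count(*) as person_count
           from "person"
           where country_id = 1
           group by country_id) sub1
       on (c.country_id = sub1.country_id)
left join (select country_id, count(*) as event_count
           from "event"
           where country_id = 1
           group by country_id) sub2
       on (c.country_id = sub2.country_id)
where c.country_id = 1;
```

Because the outer query is restricted to a single country_id, each subquery returns at most one row and no outer group by is needed in this variant.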

ravnur Oct 28 '11 at 7:41

Mike Sherrill 'Cat Recall' · Accepted Answer · 2011-10-28T02:53:56+0000

"Shouldn't the database execute the subqueries only for the country that was filtered?"

No. The first step in a query like yours is supposed to be building a working table from all the table constructors in the FROM clause. The WHERE clause is evaluated after that.
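One way to check what the planner actually does with the filtered query is to look at its plan. The sketch below uses PostgreSQL's EXPLAIN ANALYZE, which runs the query and reports the plan with timings. The output depends on the server version, statistics, and data, so none is shown here.

```sql
-- Runs the filtered query from the question and prints its execution plan.
explain analyze
select c.country_name,
       sum(sub1.person_count) as person_count,
       sum(sub2.event_count) as event_count
from "country" c
left join (select country_id, count(*) as person_count
           from "person"
           group by country_id) sub1
       on (c.country_id = sub1.country_id)
left join (select country_id, count(*) as event_count
           from "event"
           group by country_id) sub2
       on (c.country_id = sub2.country_id)
where c.country_name = 'UNITED STATES'
group by c.country_name;
```

If the plan shows a scan and an aggregate over all of "person" and "event", the filter was not applied inside the subqueries.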

Imagine how you would do it if sub1 and sub2 were both base tables rather than subqueries. Each would have two columns and one row per country_id. If you wanted to join all the rows, you would write it like this.

 from "country" c left join sub1 on (c.country_id=sub1.country_id) left join sub2 on (c.country_id=sub2.country_id)

But if you wanted to join just one row, you would have to write something equivalent to this.

    from "country" c
    left join (select * from sub1 where country_id = ?) sub1
           on (c.country_id=sub1.country_id)
    left join (select * from sub2 where country_id = ?) sub2
           on (c.country_id=sub2.country_id)
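Applied to the original query, one way to do this without hard-coding an id is to look the id up by name inside each derived table. This is only a sketch under the question's schema. It repeats the filter in three places, which is the tedium the question hoped to avoid, and whether it is faster depends on the plan the server chooses.

```sql
-- The country filter is repeated inside each derived table so that only
-- the matching country_id values are counted.
select c.country_name,
       sum(sub1.person_count) as person_count,
       sum(sub2.event_count) as event_count
from "country" c
left join (select p.country_id, count(*) as person_count
           from "person" p
           where p.country_id in (select country_id
                                  from "country"
                                  where country_name = 'UNITED STATES')
           group by p.country_id) sub1
       on (c.country_id = sub1.country_id)
left join (select e.country_id, count(*) as event_count
           from "event" e
           where e.country_id in (select country_id
                                  from "country"
                                  where country_name = 'UNITED STATES')
           group by e.country_id) sub2
       on (c.country_id = sub2.country_id)
where c.country_name = 'UNITED STATES'
group by c.country_name;
```

Keeping the outer sum and group by means the grouping column can still be swapped, for example to c.continent, as long as the three filters are changed together.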

Joe Celko, who helped develop the early SQL standards, has often written on Usenet about how this SQL evaluation order works.
