Sorting is not the problem. In fact, the CPU and memory cost of sorting is close to zero, since Postgres has a Top-N sort: it scans the result set while keeping up to date a small sort buffer that holds only the Top-N rows.
select count(*) from (1 million row table);                  -- 0.17 s
select * from (1 million row table) order by x limit 10;     -- 0.18 s
select * from (1 million row table) order by x;              -- 1.80 s
So you see that the Top-10 sort only adds 10 ms to a plain, fast count(*), while a real sort takes a lot longer. This is a very neat feature, I use it a lot.
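If you want to check this on your own server, here is a minimal sketch. The table topn_demo and its columns are made up for illustration. On reasonably recent Postgres versions, the Sort node in the EXPLAIN ANALYZE output of the LIMIT query should report a "top-N heapsort" sort method, while the query without LIMIT has to sort everything.

```sql
-- Hypothetical test table: one million rows with a random sort key.
CREATE TEMP TABLE topn_demo AS
SELECT n AS id, random() AS x
FROM generate_series(1, 1000000) AS n;

-- With LIMIT, only the 10 best rows seen so far are kept in the sort buffer.
EXPLAIN ANALYZE
SELECT * FROM topn_demo ORDER BY x LIMIT 10;

-- Without LIMIT, the whole result set has to be sorted.
EXPLAIN ANALYZE
SELECT * FROM topn_demo ORDER BY x;
```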
OK, now without EXPLAIN ANALYZE it's impossible to be sure, but my feeling is that the cross join is the real problem. Basically you filter the rows in both tables using:
where (A.power_peak between 1.0 AND 100.0)
and A.area_acre >= 500
and A.solar_avg >= 5.0
AND A.pc_num <= 1000
and (A.fips_level1 = '06' AND A.fips_country = 'US' AND A.fips_level2 = '025')
and B.volt_mn_kv >= 69
and B.fips_code like '%US06%'
and B.status = 'active'
OK, I don't know how many rows are selected in each table (only EXPLAIN ANALYZE would tell), but it is probably significant. Knowing those numbers would help.
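One quick way to get those numbers is to count each side separately with its own filters. This is only a sketch: table_a and table_b are placeholder names for whatever tables are aliased A and B in the query. The product of the two counts is roughly how many times the join condition will have to be evaluated.

```sql
-- table_a and table_b are hypothetical names for the tables aliased A and B.
SELECT count(*)
FROM table_a A
WHERE (A.power_peak BETWEEN 1.0 AND 100.0)
  AND A.area_acre >= 500
  AND A.solar_avg >= 5.0
  AND A.pc_num <= 1000
  AND (A.fips_level1 = '06' AND A.fips_country = 'US' AND A.fips_level2 = '025');

SELECT count(*)
FROM table_b B
WHERE B.volt_mn_kv >= 69
  AND B.fips_code LIKE '%US06%'
  AND B.status = 'active';
```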
Then we get to the worst part, the CROSS JOIN condition:
and ST_within(ST_Centroid(A.wkb_geometry), ST_Buffer((B.wkb_geometry), 1000))
This means that every row of A is matched against every row of B (so this expression is going to be evaluated a large number of times), using a bunch of pretty complex, slow and CPU-intensive functions.
Of course it's awfully slow!
When you remove the ORDER BY, postgres just comes up (by chance?) with a bunch of matching rows right at the start, outputs those, and stops as soon as the LIMIT is reached.
Here is a small example:
Tables a and b are identical and contain 1000 rows and a column of type BOX.
select * from a cross join b where (a.b && b.b)
Here, 1,000,000 box overlap tests (operator &&) are done in 0.28 s. The test data set is generated so that the result set contains only 1000 rows.
create index a_b on a using gist(b);
create index b_b on b using gist(b);
select * from a cross join b where (a.b && b.b)
Here the index is used to optimize the cross join, and the speed is ridiculous.
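A minimal sketch to reproduce this kind of test. The data generation is an assumption, not the original test set: each box only overlaps its twin in the other table, so the join returns 1000 rows. Timings will differ on your machine.

```sql
-- Assumed test data: 1000 small boxes along the diagonal, none overlapping another.
CREATE TABLE a AS
SELECT box(point(n::float8, n::float8),
           point(n::float8 + 0.5, n::float8 + 0.5)) AS b
FROM generate_series(1, 1000) AS n;

-- b is an identical copy of a.
CREATE TABLE b AS SELECT * FROM a;

-- Without indexes: every pair of rows gets an overlap test.
EXPLAIN ANALYZE
SELECT * FROM a CROSS JOIN b WHERE (a.b && b.b);

CREATE INDEX a_b ON a USING gist (b);
CREATE INDEX b_b ON b USING gist (b);
ANALYZE a;
ANALYZE b;

-- With indexes: the planner can probe a GiST index for each outer row instead.
EXPLAIN ANALYZE
SELECT * FROM a CROSS JOIN b WHERE (a.b && b.b);
```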
You need to optimize that geometry matching.
- add columns which will cache:
  - ST_Centroid(A.wkb_geometry)
  - ST_Buffer((B.wkb_geometry), 1000)
There is NO POINT in recomputing those slow functions a million times during your CROSS JOIN, so store the results in a column. Use a trigger to keep them up to date.
- add columns of type BOX which will cache:
  - the bounding box of ST_Centroid(A.wkb_geometry)
  - the bounding box of ST_Buffer((B.wkb_geometry), 1000)
- add gist indexes on the BOXes
- add a box overlap test (using the && operator) which will use the index
- keep your ST_Within, which will act as a final filter on the rows that pass
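The steps above could look roughly like this. Treat it as a sketch under assumptions: table_a and table_b are placeholder table names, and the new column names, the geom_bbox helper and the trigger functions are all invented for illustration. It is untested against your schema.

```sql
-- table_a / table_b: hypothetical names for the tables aliased A and B.

-- Cached geometry results and their bounding boxes.
ALTER TABLE table_a ADD COLUMN centroid_geom geometry;
ALTER TABLE table_a ADD COLUMN centroid_box  box;
ALTER TABLE table_b ADD COLUMN buffer_geom   geometry;
ALTER TABLE table_b ADD COLUMN buffer_box    box;

-- Hypothetical helper: bounding box of a geometry as a native BOX value.
CREATE FUNCTION geom_bbox(g geometry) RETURNS box AS $$
    SELECT box(point(ST_XMin(g), ST_YMin(g)),
               point(ST_XMax(g), ST_YMax(g)));
$$ LANGUAGE sql IMMUTABLE;

-- Triggers that keep the cached columns up to date.
CREATE FUNCTION table_a_cache_geom() RETURNS trigger AS $$
BEGIN
    NEW.centroid_geom := ST_Centroid(NEW.wkb_geometry);
    NEW.centroid_box  := geom_bbox(NEW.centroid_geom);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION table_b_cache_geom() RETURNS trigger AS $$
BEGIN
    NEW.buffer_geom := ST_Buffer(NEW.wkb_geometry, 1000);
    NEW.buffer_box  := geom_bbox(NEW.buffer_geom);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_a_cache_geom
BEFORE INSERT OR UPDATE ON table_a
FOR EACH ROW EXECUTE PROCEDURE table_a_cache_geom();

CREATE TRIGGER table_b_cache_geom
BEFORE INSERT OR UPDATE ON table_b
FOR EACH ROW EXECUTE PROCEDURE table_b_cache_geom();

-- Backfill existing rows by firing the triggers once.
UPDATE table_a SET wkb_geometry = wkb_geometry;
UPDATE table_b SET wkb_geometry = wkb_geometry;

-- GiST indexes on the cached boxes.
CREATE INDEX table_a_centroid_box ON table_a USING gist (centroid_box);
CREATE INDEX table_b_buffer_box   ON table_b USING gist (buffer_box);

-- Cheap indexed box test first, exact ST_Within as the final filter.
SELECT count(*)
FROM table_a A CROSS JOIN table_b B
WHERE A.centroid_box && B.buffer_box
  AND ST_Within(A.centroid_geom, B.buffer_geom);
```

The other filters on A and B from the original query would go into the same WHERE clause. The box test only throws away pairs that cannot possibly match, so the result is the same as with ST_Within alone.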
Maybe you can just index the ST_Centroid and ST_Buffer columns ... and use an (indexed) "contains" operator, see here:
http://www.postgresql.org/docs/8.2/static/functions-geometry.html