Postgres: Optimizing a Large Join

I have two tables, say

CREATE TABLE a (
  a_a BIGINT,
  a_b BIGINT,
  a_c BIGINT,
  a_someval NUMERIC
);

CREATE TABLE b (
  b_a BIGINT,
  b_b BIGINT,
  b_c BIGINT,
  b_someval NUMERIC
);

I join them as follows:

SELECT *
FROM a
  JOIN b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c)
;

EXPLAIN shows that the planner has to sort both tables on the columns used in the JOIN.
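
For reference, the plan can be checked with a plain EXPLAIN on the join. This is only a sketch: the actual output depends on the installation, its settings, and the table statistics, so it is omitted here.

```sql
-- Show the chosen plan without executing the query.
EXPLAIN
SELECT *
FROM a
  JOIN b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c);
```

The sorting described above would typically appear as a Sort node under each input of a Merge Join.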

Is there a way to pre-sort these tables so that they are not sorted again every time they are joined?

Some things that may be important:

  • The query uses the whole contents of both tables (rather than a small subset of rows).
  • There are hundreds of millions of rows in each table.
  • The contents of the tables will not change: both tables are generated (CREATE TABLE x AS SELECT ...) in a snapshot of the production database used for analytical needs.
2 answers

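Since the tables never change, one option is to compute the join once and store the joined result, for example in a materialized view: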

CREATE MATERIALIZED VIEW ab_mat AS
SELECT *
FROM a
JOIN b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c);

, , ( ​​ - , , , ). .

If the underlying data does change, a cron job can run REFRESH MATERIALIZED VIEW on a schedule.
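
A minimal sketch of the refresh statement mentioned above, assuming the view is named ab_mat as in this answer. How it is scheduled (the cron entry, the connection details) is left out because it depends on the environment.

```sql
-- Recompute the stored join result from the current contents of a and b.
-- ab_mat is the materialized view created earlier in this answer.
REFRESH MATERIALIZED VIEW ab_mat;
```

The plain form takes an exclusive lock on the view, so queries against ab_mat wait until the refresh has finished.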

, , , , . , , , , .

Note that materialized views require PostgreSQL 9.3 or later.

OP:

, , , , .

The query then reads directly from the materialized view:

SELECT *
FROM ab_mat
-- optional ordering
ORDER BY a_a, a_b, a_c;

, join, .


, , , , . , .

, , . , PostgreSQL , Oracle, , , :

SELECT *
FROM a_part01 a
JOIN b_part01 b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c)
UNION ALL
SELECT *
FROM a_part02 a
JOIN b_part02 b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c)
UNION ALL
...
UNION ALL
SELECT *
FROM a_part0n a
JOIN b_part0n b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c);

... or, storing the partial results in a table one partition at a time:

CREATE TABLE result
AS
SELECT *
FROM a_part01 a
JOIN b_part01 b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c);

...

INSERT INTO result
SELECT *
FROM a_part0n a
JOIN b_part0n b ON (a.a_a = b.b_a AND a.a_b = b.b_b AND a.a_c = b.b_c);

.

, , PostgreSQL ORDER BY , , . , , , . , , , , . , .

- .

, . , , .
