Based on Convert the recursive function to view , I would like to speed up the extraction of the path from any arbitrary node on the graph to its root by creating a snapshot of the node's parents. The idea is that the recursive tree-like movement is limited by the presence of intermediate snapshots that exclude further recursion and therefore speed up execution time. I did not perform the load test, so I donโt know how it works outside of this simple example, but early tests already point to some bottlenecks. I would be happy for comments on how to speed up the process / simplify queries. I am using Postgres 9.2.2.0 (20).
DROP TABLE IF EXISTS revision CASCADE; CREATE TABLE revision ( id serial primary key, parent_revision_id int references revision(id), username varchar(128), ts timestamp without time zone ); DROP TABLE IF EXISTS revision_snapshot CASCADE; CREATE TABLE revision_snapshot ( id serial primary key, revision_id int, parent_revision_id int, depth int ); CREATE OR REPLACE FUNCTION create_revision_snapshot(_rev int) RETURNS void AS $func$ DELETE FROM revision_snapshot WHERE revision_id=$1; INSERT INTO revision_snapshot (revision_id, parent_revision_id, depth) (SELECT $1, id, depth FROM revision_tree($1)); $func$ LANGUAGE sql; -- Recursively return path from '_rev' to root CREATE OR REPLACE FUNCTION revision_tree(_rev int) RETURNS TABLE(id int, parent_revision_id int, depth int) AS $func$ WITH RECURSIVE rev_list(id, parent_revision_id, depth) AS ( SELECT t.id, t.parent_revision_id, 1 FROM revision t WHERE t.id = $1 UNION ALL SELECT t.id, t.parent_revision_id, r.depth + 1 FROM rev_list r JOIN revision t ON t.id = r.parent_revision_id ) SELECT t.id, t.parent_revision_id, t.depth FROM rev_list t ORDER BY t.id; $func$ LANGUAGE sql; -- Fast version of 'revision_tree' (to be). This version will return the -- revision tree making use of snapshots (recursively returning the path from -- specified revision id to last snapshot of the path to the root + the snapshot) CREATE OR REPLACE FUNCTION revision_tree_perf(_rev int) RETURNS TABLE(parent_revision_id int) AS $func$ BEGIN CREATE TEMP TABLE graph_result ON COMMIT DROP AS WITH RECURSIVE rev_list(id, parent_revision_id, depth) AS ( SELECT t.id, t.parent_revision_id, 1 FROM revision t WHERE t.id = $1 UNION ALL SELECT t.id, t.parent_revision_id, r.depth + 1 FROM rev_list r JOIN revision t ON t.id = r.parent_revision_id WHERE not(t.id in (select revision_id from revision_snapshot)) ) SELECT t.id, t.parent_revision_id, t.depth FROM rev_list t ORDER BY t.id; RETURN QUERY SELECT g.parent_revision_id FROM graph_result AS g WHERE g.parent_revision_id IS NOT NULL UNION SELECT s.parent_revision_id FROM revision_snapshot AS s WHERE s.revision_id = (SELECT min(q.parent_revision_id) FROM graph_result as q) ORDER BY parent_revision_id; END; $func$ LANGUAGE 'plpgsql'; -- Example tree -- -- +-- <10> -- / -- +-- <4> -- <8> --- <9> -+- <11> --- <15> --- <16> --- <17> -- / -- <1> --- <2> --- <3> -+ -- \ -- +-- <5> --- <6> --- <7> --- <12> -+- <14> --- <18> -- \ -- \ -- \ -- \ -- +-- <13> --- <19> --- <20> --- <21> -- INSERT INTO revision (username, ts, parent_revision_id) VALUES ('someone', now(), null) -- 1 ,('someone', now(), 1) -- 2 ,('someone', now(), 2) -- 3 ,('someone', now(), 3) -- 4 ,('someone', now(), 3) -- 5 ,('someone', now(), 5) -- 6 ,('someone', now(), 6) -- 7 ,('someone', now(), 4) -- 8 ,('someone', now(), 8) -- 9 ,('someone', now(), 9) -- 10 ,('someone', now(), 9) -- 11 ,('someone', now(), 7) -- 12 ,('someone', now(), 12) -- 13 ,('someone', now(), 12) -- 14 ,('someone', now(), 11) -- 15 ,('someone', now(), 15) -- 16 ,('someone', now(), 16) -- 17 ,('someone', now(), 14) -- 18 ,('someone', now(), 13) -- 19 ,('someone', now(), 19) -- 20 ,('someone', now(), 20); -- 21 -- Create a revision snapsnot select create_revision_snapshot(13); -- This query is meant to be faster ... select * from revision_tree_perf(21); -- ... than this one select * from revision_tree(21);
Above example
select create_revision_snapshot(13); select * from revision_tree_perf(21);
It is intended to create a record set that denotes a path from 21 to the root, i.e. (21, 20, 19, 13, 12, 7, 6, 5, 3, 2, 1) . Part of the solution was taken by walking through the tree (from 21 to 13, since there is a 13-shot, so there is no need to go further through the tree) and using the already โcachedโ path from 13 to the root (taken from revision_snapshot). Hope this is a little easier to understand ...
Update:
I came up with a potential improvement. Itโs just a blow in the dark, but I can imagine that the offers exists quite expensive. Now I have noted the presence of a snapshot in the revision table:
CREATE TABLE revision ( id serial primary key, parent_revision_id int references revision(id), username varchar(128), has_snapshot boolean default false, ts timestamp without time zone ); CREATE OR REPLACE FUNCTION create_revision_snapshot(_rev int) RETURNS void AS $func$ DELETE FROM revision_snapshot WHERE revision_id=$1; INSERT INTO revision_snapshot (revision_id, parent_revision_id, depth) (SELECT $1, id, depth FROM revision_tree($1));
This changes the CTE revision_tree_perf SP part to
WITH RECURSIVE rev_list(id, parent_revision_id, depth) AS ( SELECT t.id, t.parent_revision_id, 1
This should be pretty fast. Another part of the puzzle will be to return the contents of the snapshot revision from the revision identifier, where has_snapshot=true , and combine the two results. The problem is how to get this revision ID from the CTE. I could save the result of the CTE query in the temp table and request the revision identifier, or perhaps it would be advisable not to write the CTE and write instead. Thus, it would be possible to track the revision identifier, according to which the loop will exit (when has_snapshot = true ). But I'm not sure how this will compare to CTE.
What do people think of this approach?