PostgreSQL - tree organization

I am working on a project that needs a category tree organized as id, parent, title table. What are the best ways to retrieve a category and its subcategories (and a complete tree if the root categories have parent = 0) in Postgres? I am looking for a clean solution for the database, but if there is a way for Ruby and PHP it will be great too.

The main goal is the speed of select statements, because the data in this table is not critical for the update / insert / delete speed.

UPD: there will also be a search for paths, I mean the path from the current vertex (category) to the root category.

+4
source share
6 answers

get a category and its subcategories

If you have only a limited depth of sub-elements, you can do this using self-join, for example. two levels:

SELECT * FROM categories AS child LEFT JOIN categories AS parent ON parent.id=child.parent LEFT JOIN categories AS grandparent ON grandparent.id=parent.parent WHERE child.id=(id) OR parent.id=(id) OR grandparent.id=(id); 

You cannot do this for a hierarchy with arbitrary depth using standard SQL according to a scheme such as a parent identifier-foreign key.

Some DBMSs provide non-standard hierarchical tools that allow something similar in different ways, but if you want to adhere to DBMS compatible code, you need to rewrite the schema into one of the best hierarchy representation models. Two popular:

  • Nested set . Preserves a linear ordering representing a depth-of-tree search in two columns of the target table (one of which you will already have if your target has an explicit order).

  • Attitude Attachment . Stores each pair of ancestors / descendants in a separate join table.

There are advantages and disadvantages for each approach, as well as numerous options (for example, sparse numbering of numbered sets, β€œdistance in AR”), which can affect how expensive the various operations of adding / removing / moving-position are. Personally, I am leaning toward the default simplified nested set model, since it contains less redundancy than AR.

+2
source

Take a look at the "ltree" contrib module.

+3
source

I played with ltree , which is the PostgreSQL Contributor module, to find out if this is good for thread comments. You create a column in your table that saves the path and creates the ltree index on it. Then you can execute the following queries:

  ltreetest=# select path from test where path ~ '*.Astronomy.*'; path ----------------------------------------------- Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Collections.Pictures.Astronomy Top.Collections.Pictures.Astronomy.Stars Top.Collections.Pictures.Astronomy.Galaxies Top.Collections.Pictures.Astronomy.Astronauts 

I have not played with him enough to determine how well it works with things like inserts, updates, or deletes. I assume the deletion will look like this:

 DELETE FROM test WHERE path ~ '*.Astronomy.*'; 

I think a table with comments might look like this:

 CREATE SEQUENCE comment_id_seq INCREMENT 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 78616 CACHE 1; CREATE TABLE comments ( comment_id int PRIMARY KEY, path ltree, comment text ); CREATE INDEX comments_path_idx ON comments USING gist (path); 

The rough (and untested) insert looks like this:

 CREATE FUNCTION busted_add_comment(text the_comment, int parent_comment_id) RETURNS void AS $BODY$ DECLARE INT _new_comment_id; -- our new comment_id TEXT _parent_path; -- the parent path BEGIN _new_comment_id := nextval('comment_id_seq'::regclass); SELECT path INTO _parent_path FROM comments WHERE comment_id = parent_comment_id; -- this is probably busted SQL, but you get the idea... this comment path looks like -- the.parent.path.US -- -- eg (if parent_comment_id was 5 and our new comment_id is 43): -- 3.5.43 INSERT INTO comments (comment_id, comment, path) VALUES (_new_comment_id, the_comment, CONCAT(_parent_path, '.', _new_comment_id)); END; $BODY$ LANGUAGE 'plpgsql' VOLATILE; 

Or something. Basically, a path is simply a hierarchy consisting of all primary keys.

+2
source

I loved the nested dialing model for this kind of situation. Updates and inserts can be a bit complicated, but the selections are usually very compressed and fast. Performance might be even better if you add the actual reference to the parent element of the node (in some cases it will exclude the union, as well as the natural sorting of the child nodes.

A typical query for the current node and all children will look like this:

 select name from nestedSet c inner join nestedSet p ON c.lft BETWEEN p.lft AND p.rgt where p.id = 1 order by lft 

A few well-placed group by clauses will also provide some quick statistics about your tree.

+1
source

There is an acts_as_tree plugin for Rails that worked well for me in the past. I had a rather small tree, although about 15,000 nodes.

0
source

Just add the article Hierarchical data management in MySQL has a good explanation of the address list model and nested set models, including an example of SQL for tree processing, etc.

Hierarchies in a DBMS are a complex topic. I have Joe Celkos Trees and Hierarchies in SQL for Smarties on my wish list to buy and read some day.

0
source

All Articles