Which hierarchical model should I use? Neighborhood Nested or Enumerated?

I have a table that contains the location of all the geographic locations in the world and their relationships.

Here is an example showing a hierarchy. You will see that the data is actually stored as all three

  • Enumerated path
  • Contact List
  • Nested set

Data, obviously, never changes. The following is an example of the direct ancestors of Brighton's location in England, which has a density of 13911.

Table: geoplanet_places (has 5.6 million rows) Ancestors Large Image: http://tinyurl.com/68q4ndx

Then I have another table called entities . This table stores my objects, which I would like to compare with the geographical location. I store some basic information, but most importantly I store woeid , which is the foreign key from geoplanet_places . enter image description here

Ultimately, the entities table will contain several thousand objects. And I would like to be able to return a complete tree of all nodes that contain objects.

I plan to create something that will make it easier to filter and search for objects based on their geographic location and be able to find out how many objects can be found on this particular node.

So, if I have only one entity in my entities table, I might have something like this

`Earth (1)

United Kingdom (1)

England (1)

East Sussex (1)

Brighton and Hove City (1)

Brighton (1) `

Lets then say that I have another entity that is in Devon, then it will show something like:

Earth (2)

United Kingom (2)

England (2)

Devon (1)

East Sussex (1) ... etc.

(Graphs) that will say how many objects β€œinside” each geographic location do not need to live. I can live by creating my object every hour and caching it.

The goal is to create an interface that can start by showing only countries that have entities.

Like it so much

Argentina (1021) , Chile (291) , ... , United States (32,103) , United Kingdom (12,338)

The user then clicks on a location such as United Kindom, and then all immediate child nodes that are descendants of the United Kingdom and have an entity in them will be provided.

If there are 32 environments in United Kindgdom, but only 23 of them, after all, when you expand, have entities stored in them, then I do not want to display the other 9. These are only locations.

This site accurately demonstrates the functionality I want to achieve: http://www.homeaway.com/vacation-rentals/europe/r5 enter image description here

How do you recommend me to manage such a data structure?

Things I use.

  • Php
  • MySQL
  • Solr

I plan for the cuts to be as quick as possible. I want to create an AJAX interface that will seem useless to search.

I would also be interested to know in which columns you would recommend indexing.

+6
mysql enumeration adjacency-list nested-sets
source share
2 answers

There are usually three types of queries in hierarchies that cause problems:

  • The return of all ancestors
  • Return all descendants
  • Returns all children (immediate descendants).

Here is a small table showing the performance of various methods in MySQL :

  Ancestors Descendants Children Maintainability InnoDB Adjacency list Good Decent Excellent Easy Yes Nested sets (classic) Poor Excellent Poor/Excellent Very hard Yes Nested sets (spatial) Excellent Very good Poor/Excellent Very hard No Materialized path Excellent Very good Poor/Excellent Hard Yes 

In children , poor/excellent means that the answer depends on whether you mix the method with the adjacency list, i. e. keeping parentID in each record.

For your task, you will need all three queries:

  • All ancestors to show Earth / UK / Devon thing
  • All children to show "Destinations in Europe" (items)
  • All descendants to show "Destinations in Europe" (calculations)

I would go on materialized paths, since this hierarchy rarely changes (only in case of war, rebellion, etc.).

Create a varchar column named path , index it and fill it with the following value:

 1:234:6345:45454: 

where the numbers are the primary keys of the respective parents, in the correct order ( 1 for Europe, 234 for the UK, etc.)

You will also need a table called levels to save numbers from 1 to 20 (or any level of maximum nesting level).

To select all ancestors:

 SELECT pa.* FROM places p JOIN levels l ON SUBSTRING_INDEX(p.path, ':', l.level) <> p.path JOIN places pa ON pa.path = CONCAT(SUBSTRING_INDEX(p.path, ':', l.level), ':') WHERE p.id = @id_of_place_in_devon 

To select all the children and the number of places inside them:

 SELECT pc.*, COUNT(pp.id) FROM places p JOIN places pc ON pc.parentId = p.id JOIN places pp ON pp.path BETWEEN pc.path AND CONCAT(pc.path, ':') AND pp.id NOT IN ( SELECT parentId FROM places ) WHERE p.id = @id_of_europe GROUP BY pc.id 
+8
source share

This is the query I came up with. This is an adaptation of what you offer Quassnoi.

 SELECT pa.*, level, SUBSTRING_INDEX(p.ancestry, '/', l.level), p.* FROM geoplanet_places p JOIN levels l ON SUBSTRING_INDEX(p.ancestry, '/', l.level) <> p.ancestry JOIN geoplanet_places pa ON pa.woeid = SUBSTRING_INDEX( SUBSTRING_INDEX(p.ancestry, '/', l.level),'/',-1) WHERE p.woeid = "13911" 

This returns all Brighton's parents.

The problem with your request was that it did not return the path to the parents, but instead there was a node that shared the same path.

 SELECT pa.*, GROUP_CONCAT(pa.name ORDER BY pa.lft asc),group_concat( pa.lft ), pa.ancestry FROM geo_places p JOIN levels l ON SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid,'/'), '/', l.level) <> p.ancestry JOIN geo_places pa ON pa.woeid = SUBSTRING_INDEX( SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid,'/'), '/', l.level),'/',-1) WHERE p.woeid IN ("12767488","12832668","12844837","131390","131391","12846428","24534461") GROUP BY p.woeid 
0
source share

All Articles