A hierarchical-style query performs terribly in Cypher. Should I use the Traverser API?

My Cypher query is as follows (I want to find the users who bought in each sector):

    START n=node:sectors('SECTOR_ID:65, SECTOR_ID:66 ...') // 20 sectors
    MATCH (n)-[:HAS_DOMAIN]->(dom)-[:HAS_CAT]->(cat)<-[:BELONGS_TO]-(prod)-[:BOUGHT_BY]->(user)
    RETURN n.sector_name, COUNT(user), COLLECT(DISTINCT prod.name), ... etc.

Because the number of paths grows exponentially with each hop, the final query takes 25 seconds. For example, a sector has 50 domains, each domain has 1,000 categories, and each category has 250K+ products.
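To see why the query blows up, multiply the fan-out at each hop. A back-of-the-envelope sketch in Python, assuming the figures above apply uniformly to every sector (they are illustrative, not measured) and counting only sector-to-product paths, before the `BOUGHT_BY` hop multiplies them again by the number of buyers per product:

```python
# Fan-out figures from the question, assumed uniform across sectors (an assumption;
# real graphs are rarely this regular).
DOMAINS_PER_SECTOR = 50
CATEGORIES_PER_DOMAIN = 1_000
PRODUCTS_PER_CATEGORY = 250_000  # "250K+": treated as a lower bound
SECTORS_QUERIED = 20


def product_paths_per_sector(domains: int, categories: int, products: int) -> int:
    """Number of (sector)->(dom)->(cat)<-(prod) paths under uniform fan-out."""
    return domains * categories * products


per_sector = product_paths_per_sector(
    DOMAINS_PER_SECTOR, CATEGORIES_PER_DOMAIN, PRODUCTS_PER_CATEGORY
)
total = per_sector * SECTORS_QUERIED

print(f"{per_sector:,} paths per sector")   # 12,500,000,000 paths per sector
print(f"{total:,} paths for all sectors")   # 250,000,000,000 paths for all sectors
```

Every one of those paths becomes a row that Cypher has to hold or aggregate, which is why the cost is dominated by path count rather than by the number of sectors.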

This looks to me like a “supernode” problem ... or simply too many paths!

Should I use the Traverser API? Should I try to model my data differently?

Any ideas are welcome!

Neo4j 1.8.3, Linux

Thanks!

+7
neo4j cypher
source share
1 answer

Your main problem is actually memory: you are trying to load every path at once. If these are the statistics you want, you will almost certainly be better off with the Traverser API, because it lets you control how nodes are loaded and aggregated.
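The key idea is that a traversal can aggregate as it walks, keeping only running totals instead of holding every path in memory. A minimal sketch of that idea in plain Python, assuming the graph is available as hypothetical adjacency lists (`Adjacency`, `adj`, `names` and `sector_stats` are illustrative names, not part of Neo4j's Traverser API). The two results are meant to mirror `COUNT(user)` and `COLLECT(DISTINCT prod.name)` from the question:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical adjacency lists keyed by (node, relationship type), oriented in the
# direction the query walks; BELONGS_TO is stored here as category -> products.
Adjacency = Dict[Tuple[str, str], List[str]]


def sector_stats(sector: str, adj: Adjacency, names: Dict[str, str]) -> Tuple[int, Set[str]]:
    """Walk sector -> domains -> categories -> products -> buyers depth-first,
    keeping running aggregates instead of materialising every path."""
    user_rows = 0                    # same quantity as COUNT(user): one per full path
    product_names: Set[str] = set()  # same as COLLECT(DISTINCT prod.name)
    for dom in adj.get((sector, "HAS_DOMAIN"), []):
        for cat in adj.get((dom, "HAS_CAT"), []):
            for prod in adj.get((cat, "BELONGS_TO"), []):
                buyers = adj.get((prod, "BOUGHT_BY"), [])
                if buyers:           # a product nobody bought matches no path
                    user_rows += len(buyers)
                    product_names.add(names[prod])
    return user_rows, product_names


# Tiny illustrative graph: one sector, one domain, two categories, three products.
adj: Adjacency = {
    ("s65", "HAS_DOMAIN"): ["d1"],
    ("d1", "HAS_CAT"): ["c1", "c2"],
    ("c1", "BELONGS_TO"): ["p1", "p2"],
    ("c2", "BELONGS_TO"): ["p3"],
    ("p1", "BOUGHT_BY"): ["u1", "u2"],
    ("p3", "BOUGHT_BY"): ["u1"],
}
names = {"p1": "Pen", "p2": "Ink", "p3": "Pad"}
print(sector_stats("s65", adj, names))
```

Memory use here is bounded by the number of distinct product names plus one counter, not by the number of paths; the Traverser API gives you the same kind of control over what is kept while walking.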

If you want to stick with Cypher, it may help to split each sector into its own query so that the queries can run in parallel. If you have full control over reads and writes to the database, another option is to create statistics nodes that you update as data is written. That way you never have to walk all the paths when reading; you only need to know how each change affects the statistics. You can also create direct relationships from the sector to the nodes you are interested in, and collect the information you need into a single value, for example:

    START sector=node:sectors('SECTOR_ID:65')
    MATCH (sector)-[:HAS_CAT]->(cat)
    WITH sector, collect(cat.name) AS Categories

That way, after each MATCH you collapse the results back to one row per sector instead of letting the rows multiply.
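The write-time statistics option mentioned above can be sketched in plain Python. This is an assumption-laden illustration, not Neo4j code: `SectorStats`, `StatsStore` and `record_purchase` are hypothetical names, the in-memory dictionary stands in for one stats node per sector, and the caller is assumed to resolve a product's sector once when the purchase is written:

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SectorStats:
    """Hypothetical per-sector summary, playing the role of a stats node."""
    purchases: int = 0  # running equivalent of COUNT(user)
    product_names: Set[str] = field(default_factory=set)


class StatsStore:
    def __init__(self) -> None:
        self._by_sector: Dict[str, SectorStats] = {}

    def record_purchase(self, sector: str, product_name: str) -> None:
        """Call in the same write that creates (prod)-[:BOUGHT_BY]->(user).
        The product's sector is looked up once here (prod -> cat -> dom -> sector)
        rather than on every read."""
        stats = self._by_sector.setdefault(sector, SectorStats())
        stats.purchases += 1
        stats.product_names.add(product_name)

    def read(self, sector: str) -> SectorStats:
        """A read is a single lookup, independent of how many paths exist."""
        return self._by_sector.get(sector, SectorStats())


store = StatsStore()
store.record_purchase("65", "Pen")
store.record_purchase("65", "Pen")
store.record_purchase("66", "Pad")
print(store.read("65"))  # SectorStats(purchases=2, product_names={'Pen'})
```

The trade-off is that every write does a little extra work and must keep the summary consistent, in exchange for reads that no longer depend on the 50 × 1,000 × 250K fan-out.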

0
source share

All Articles