ArangoDB: insert as example request function

Part of my schedule is built using a giant connection between two large collections, and I run it every time I add documents to any collection. The request is based on an older post .

FOR fromItem IN fromCollection FOR toItem IN toCollection FILTER fromItem.fromAttributeValue == toItem.toAttributeValue INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {}} INTO edgeCollection 

It will take about 55,000 seconds for my dataset. I would greatly welcome suggestions for making this faster.

But I have two related problems:

  • I need to update. Usually an upsert would be nice, but in this case, since I have no way to find out the key in front, that would not help me. To get the key up, I will need to request an example to find the key of an identical, existing edge. This seems reasonable if it does not kill my performance, but I don’t know how to conditionally build my query in AQL so that it inserts an edge if the equivalent edge does not exist yet, but does nothing if the equivalent edge exists. How can i do this?
  • I need to run it every time the data is added to any collection. I need a way to run this only on the latest data so that it doesn't try to join the entire collection. How can I write AQL that allows me to join only recently inserted records? They are added with Arangoimp, and I have no guarantee in which order they will be updated, so I can not create edges at the same time as creating the nodes. How can I join only new data? I do not want to spend 55 thousand seconds when adding a record.
+6
source share
1 answer

If you run your query as written without any indexes, then it will have to run two nested full collections scan, which you can see by looking at the output

db._explain(<your query here>);

which shows something like:

  1 SingletonNode 1 * ROOT 2 EnumerateCollectionNode 3 - FOR fromItem IN fromCollection /* full collection scan */ 3 EnumerateCollectionNode 9 - FOR toItem IN toCollection /* full collection scan */ 4 CalculationNode 9 - LET #3 = (fromItem.`fromAttributeValue` == toItem.`toAttributeValue`) /* simple expression */ /* collections used: fromItem : fromCollection, toItem : toCollection */ 5 FilterNode 9 - FILTER #3 ... 

If you do

 db.toCollection.ensureIndex({"type":"hash", fields ["toAttributeValue"], unique:false})` 

Then fromCollection one full scan of the table collections will be carried out, and for each element found, a hash search will be found in toCollection , which will be much faster. Everything will happen in the parties, so this should already improve the situation. db._explain() will show this:

  1 SingletonNode 1 * ROOT 2 EnumerateCollectionNode 3 - FOR fromItem IN fromCollection /* full collection scan */ 8 IndexNode 3 - FOR toItem IN toCollection /* hash index scan */ 

Just working with recently inserted elements in fromCollection relatively simple: just add the import timestamp to all the vertices and use:

 FOR fromItem IN fromCollection FILTER fromItem.timeStamp > @lastRun FOR toItem IN toCollection FILTER fromItem.fromAttributeValue == toItem.toAttributeValue INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {}} INTO edgeCollection 

and of course put the skiplist index in the timeStamp attribute in fromCollection .

This should work just fine to discover new vertices in fromCollection . It will “ignore” new vertices in toCollection , which are associated with old vertices in fromCollection .

You can detect them by changing the roles fromCollection and toCollection in your request (do not forget the index on fromAttributeValue in fromCollection ) and remember that you only need to place the edges if the vertex is old, as in:

 FOR toItem IN toCollection FILTER toItem.timeStamp > @lastRun FOR fromItem IN fromCollection FILTER fromItem.fromAttributeValue == toItem.toAttributeValue FILTER fromItem.timeStamp <= @lastRun INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {}} INTO edgeCollection 

The two together should do what you want. Please find a fully processed example here .

+8
source

All Articles