The time complexity of $ addToset vs $ push when an element does not exist in the array

The aforesaid: Connection is Safe = True, therefore, the Update return will contain information about the update.

Let's say I have documents that look like this:

[{'a': [1]}, {'a': [2]}, {'a': [1,2]}] 

And I give out:

 coll.update({}, {'$addToSet': {'a':1}}, multi=True) 

Result:

 {u'connectionId': 28, u'err': None, u'n': 3, u'ok': 1.0, u'updatedExisting': True } 

Even with documents, this value already exists. To avoid this, I can execute the command.

 coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True) 

What is a time complexity comparison for $ addToSet vs $ push with $ ne validation?

+7
source share
3 answers

It looks like $ addToSet does the exact same thing as your command: $ push with $ ne validation. Both would be O (N)

https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_internal.cpp

If speed is really important, then why not use a hash:

instead:

 {'$addToSet': {'a':1}} {'$addToSet': {'a':10}} 

using:

 {$set: {'a.1': 1} {$set: {'a.10': 1} 
+14
source

Edit

Well, since I read your question all the time, it still turns out that in fact you look at two different queries and judge the time complexity between them.

First request:

 coll.update({}, {'$addToSet': {'a':1}}, multi=True) 

And the second:

 coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True) 

The first problem arises here, without indexes. $addToSet , being an update modifier, I do not believe that it uses an index, so you do a full table scan to accomplish what you need.

In fact, you are looking for all documents that do not have 1 in a already and look at the $push value of 1 on this array of a .

So, 2 points to the second request before we get the time complexity, because the first request:

  • Does not use indexes
  • A full table scan will be deployed.
  • Then it will perform a full scan of the array (without index) to $addToSet

So, I pretty much thought that the second request is what you are looking for before any Big O note.

There is a problem of using the O note to explain the time complexity of each request here:

  • I'm not sure which perspective you want, whether for each document or for the entire collection.
  • I am not sure about the indices as such. Using indexes will actually create a log algorithm on a , but not using indexes.

However, the first query will look something like this: O (n) for each document, because:

  • $ addToSet will need to iterate over each element
  • $ addToSet will then need to perform O (1) operation to insert the set if it does not exist. I should note that I'm not sure whether O (1) is canceled or not (reading into the light indicates my version), I canceled it here.

In a collection, without an index, this will be: O (2n2), since the complexity of iterating a will increase exponentially with each new document.

The second query without indexes will look something like this: O (2n2) (O (n) per document) I believe that $ne will have the same problems as $addToSet without indexes. However, with the indexes, I believe that in fact it will be O (log n log n) (O (log n) for each document), since first it will find all documents with a , and then all documents without 1 in their set on based b-tree.

So, based on time complexity and notes at the beginning, I would say query 2 is better.

To be honest, I'm not used to explaining “Big O” Notation, so it's experimental.

Hope this helps,

+2
source

Adding my observation to the difference between addToSet and push from mass update of 100k documents.

when you do a bulk update. addToSet will be executed separately.

eg,

 bulkInsert.find({x:y}).upsert().update({"$set":{..},"$push":{ "a":"b" } , "$setOnInsert": {} }) 

first inserts and sets the document. And then it executes the addToSet request.

I saw a clear 10k difference between

 db.collection_name.count() #gives around 40k db.collection_name.count({"a":{$in:["b"]}}) # it gives only around 30k 

But when replacing $ addToSet with $ push. both count requests return the same value.

note: when you are not worried about a duplicate entry in an array. you can go with $ push.

+2
source

All Articles