Edit
Well, since I read your question all the time, it still turns out that in fact you look at two different queries and judge the time complexity between them.
First request:
coll.update({}, {'$addToSet': {'a':1}}, multi=True)
And the second:
coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)
The first problem arises here, without indexes. $addToSet , being an update modifier, I do not believe that it uses an index, so you do a full table scan to accomplish what you need.
In fact, you are looking for all documents that do not have 1 in a already and look at the $push value of 1 on this array of a .
So, 2 points to the second request before we get the time complexity, because the first request:
- Does not use indexes
- A full table scan will be deployed.
- Then it will perform a full scan of the array (without index) to
$addToSet
So, I pretty much thought that the second request is what you are looking for before any Big O note.
There is a problem of using the O note to explain the time complexity of each request here:
- I'm not sure which perspective you want, whether for each document or for the entire collection.
- I am not sure about the indices as such. Using indexes will actually create a log algorithm on
a , but not using indexes.
However, the first query will look something like this: O (n) for each document, because:
- $ addToSet will need to iterate over each element
- $ addToSet will then need to perform O (1) operation to insert the set if it does not exist. I should note that I'm not sure whether O (1) is canceled or not (reading into the light indicates my version), I canceled it here.
In a collection, without an index, this will be: O (2n2), since the complexity of iterating a will increase exponentially with each new document.
The second query without indexes will look something like this: O (2n2) (O (n) per document) I believe that $ne will have the same problems as $addToSet without indexes. However, with the indexes, I believe that in fact it will be O (log n log n) (O (log n) for each document), since first it will find all documents with a , and then all documents without 1 in their set on based b-tree.
So, based on time complexity and notes at the beginning, I would say query 2 is better.
To be honest, I'm not used to explaining “Big O” Notation, so it's experimental.
Hope this helps,