I have a large collection (~2.7 million documents) in MongoDB, and it contains many duplicates. I tried running ensureIndex({id:1}, {unique:true, dropDups:true}) on the collection. Mongo churns on it for a while before failing with too many dups on index build with dropDups=true.
How can I add the index and get rid of the duplicates? Or, the other way around: what is the best way to remove the duplicates so that Mongo can successfully build the index?
For bonus points: why is there a limit on the number of duplicates that can be discarded?
MongoDB caps how many documents it will throw away during an index build, presumably as a safety measure: dropDups silently deletes data, so it is very much a "use at your own risk" option, and the cap keeps a single index build from wiping out a large chunk of a collection.
First, a clarification: do you mean id or _id?

MongoDB automatically indexes _id and enforces that it is unique. By default _id is an ObjectId, which is generated so that collisions are effectively impossible, so you should not see duplicates there. If you are supplying your own IDs on insert, however, duplicates are entirely possible.

If you really do have a field called id that is separate from _id, then you have to index it yourself, as you are doing. (If the two are redundant, consider storing your value directly in _id and dropping the extra field.)
, " " ( ). , , , , c2, ( ), upsert:
c2
db.c1.find().forEach(function(x){db.c2.update({field1:x.field1, field2:x.field2}, x, {upsert:true})})
field1 field2 . c1 . , , .
field1
field2
c1
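To make the "last one wins" behavior concrete, here is a minimal plain-JavaScript sketch of what that upsert loop does, using an in-memory Map as a stand-in for c2. The function name and sample documents are illustrative, not from the original; field1/field2 are the assumed duplicate-defining fields.

```javascript
// In-memory sketch of the upsert-based dedup loop.
// Documents are keyed on (field1, field2); a later document with the
// same key overwrites an earlier one, mirroring {upsert:true} into c2.
function dedup(c1) {
  const c2 = new Map();
  for (const doc of c1) {
    const key = JSON.stringify([doc.field1, doc.field2]);
    c2.set(key, doc); // upsert: insert if new, replace if duplicate
  }
  return [...c2.values()];
}

const c1 = [
  { _id: 1, field1: "a", field2: 1, v: "first" },
  { _id: 2, field1: "a", field2: 1, v: "second" }, // duplicate key
  { _id: 3, field1: "b", field2: 2, v: "only" },
];
console.log(dedup(c1).length); // 2 — the duplicate pair collapsed
```

Running this shows the pair with field1:"a", field2:1 collapses to whichever copy came last in the scan, which is exactly why the real upsert loop gives you no say in which duplicate survives.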