How to remove duplicates in MongoDB?

I have a large collection (~2.7 million documents) in MongoDB, and there are many duplicates. I tried running ensureIndex({id:1}, {unique:true, dropDups:true}) on the collection. Mongo churns for a while before reporting "too many dups on index build with dropDups=true".

How can I add the index and get rid of the duplicates? Or, the other way around, what's the best way to remove the duplicates so that Mongo can successfully build the index?

For bonus points, why is there a limit on the number of duplicates that can be discarded?

+5
2 answers

For bonus points, why is there a limit on the number of duplicates that can be discarded?

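MongoDB is probably doing this to protect itself. If you run dropDups on the wrong field, you could wipe out much of the data set and tie up the database with delete operations (which are "as expensive" as writes).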
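
Before dropping anything, it can help to see how many documents actually share a value for the candidate field. The following is a minimal sketch, assuming a MongoDB version that has the aggregation framework; the helper name duplicateGroupsPipeline is made up for illustration, and c1 stands in for your own collection.

```javascript
// Hypothetical helper: builds an aggregation pipeline that lists every value
// of `field` that occurs in more than one document, most frequent first.
function duplicateGroupsPipeline(field) {
  return [
    // Group documents by the candidate field and count each group.
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    // Keep only the values that appear more than once.
    { $match: { count: { $gt: 1 } } },
    // Show the worst offenders first.
    { $sort: { count: -1 } }
  ];
}

// Usage in the shell (c1 is an assumed collection name):
//   db.c1.aggregate(duplicateGroupsPipeline("id"), { allowDiskUse: true })
```

On a collection of a few million documents the grouping stage may need allowDiskUse, which is why it appears in the usage line.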

How can I add the index and get rid of the duplicates?

So the first question is: why are you creating a unique index on the id field?

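MongoDB creates a default _id field that is automatically unique and indexed. By default MongoDB populates _id with an ObjectId, but you can override this with any value you like. So if you already have a set of ID values, you can use those.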

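If you cannot re-import the data, copy it to a new collection, changing id into _id. You can then drop the old collection and rename the new one. (Note that you will get a number of "duplicate key" errors, so make sure your code catches and ignores them.)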
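
A minimal sketch of that copy step, assuming a shell where insertOne throws an error carrying code 11000 on a duplicate key (as mongosh does). The function name copyWithIdAsPrimaryKey and the collection names c1 and c2 are placeholders, not anything from the question.

```javascript
// Hypothetical helper: copies every document from `source` into `target`,
// moving the application-level `id` into `_id`. The first document seen for
// each id is kept; later duplicates are skipped.
function copyWithIdAsPrimaryKey(source, target) {
  const result = { inserted: 0, skipped: 0 };
  source.find().forEach(function (doc) {
    const copy = Object.assign({}, doc); // shallow copy, source doc untouched
    copy._id = copy.id;                  // reuse id as the primary key
    delete copy.id;
    try {
      target.insertOne(copy);
      result.inserted += 1;
    } catch (err) {
      // 11000 is the duplicate-key error code; rethrow anything else.
      if (err.code !== 11000) {
        throw err;
      }
      result.skipped += 1;
    }
  });
  return result;
}

// Usage in the shell (assumed collection names):
//   copyWithIdAsPrimaryKey(db.c1, db.c2);
//   db.c1.drop();
//   db.c2.renameCollection("c1");
```

Check the returned counts before dropping the old collection: inserted plus skipped should equal the number of documents in the source.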

+5

One way around the "too many dups" error: copy the documents from c1 into a new, empty collection, c2, using an upsert:

db.c1.find().forEach(function(x){db.c2.update({field1:x.field1, field2:x.field2}, x, {upsert:true})})

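field1 and field2 are the fields that should be unique together. c1 is the original collection. After the copy, c2 should contain one document per (field1, field2) pair, so the unique index can be built there.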
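
A variant of the same idea, as a rough sketch: the one-liner passes the whole document, including its _id, as the replacement, and current servers treat _id as immutable, so the second document of a duplicate pair may be rejected instead of replacing the first. The version below strips _id before the upsert, which means documents in the target get fresh _id values. The name upsertByKey is made up for illustration, and it assumes a shell that provides replaceOne and that every document has all of the key fields.

```javascript
// Hypothetical helper: copies documents from `source` into `target`, keeping
// one document per distinct combination of `keyFields` (the last one seen wins).
function upsertByKey(source, target, keyFields) {
  let processed = 0;
  source.find().forEach(function (doc) {
    // Build the match filter from the key fields of this document.
    const filter = {};
    keyFields.forEach(function (name) {
      filter[name] = doc[name];
    });
    // Drop _id so a replacement never tries to change an existing _id.
    const replacement = Object.assign({}, doc);
    delete replacement._id;
    target.replaceOne(filter, replacement, { upsert: true });
    processed += 1;
  });
  return processed;
}

// Usage in the shell (assumed collection names):
//   db.c2.createIndex({ field1: 1, field2: 1 }, { unique: true });
//   upsertByKey(db.c1, db.c2, ["field1", "field2"]);
```

Creating the unique index on the empty target first keeps each upsert lookup fast, since otherwise every replaceOne has to scan c2.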

+3
