C# Driver SafeMode-off upserts - not all records are updated / inserted

In our application we perform large batches of inserts / updates (from 1,000 to 100,000 at a time), and I noticed that not all records are saved. With safe mode disabled, only 90% to 95% of the records make it in.

Running the upserts with safe mode on successfully updates all records, but it is too slow. I remember reading somewhere that even with safe mode off there should be no reason for an update / insert to fail unless the server is unavailable.

I wrote a small application to test this and included the code below. It tries to upsert 100,000 ints into Mongo, and when I check the collection afterwards I see only about 90,000 records.

(Note: I use a parallel loop because I am updating by _id, and Mongo 2.0 supports concurrent operations when using _id. Even without Parallel.ForEach I still see record loss, although not as severe.)

```csharp
MongoServer server = MongoServer.Create(host);
MongoDatabase test = server.GetDatabase("testDB");
var list = Enumerable.Range(0, 100000).ToList();

using (server.RequestStart(test))
{
    MongoCollection coll = test.GetCollection("testCollection");
    Parallel.ForEach(list, i =>
    {
        var query = new QueryDocument("_id", i);
        coll.Update(query, Update.Set("value", 100), UpdateFlags.Upsert, SafeMode.False);
    });
}
```

So I think my question is: what is the best way to perform a large number of updates quickly, with 100% success?

I cannot use plain inserts because I have several processes writing to Mongo and cannot be sure whether any given document already exists, which is why I use upserts.

4 answers

When you use SafeMode.False, the C# driver simply writes the Insert / Update messages to the socket and does not wait for a response. When you write a lot of data to a socket quickly, it gets buffered on the client side, and the network stack pushes the bytes onto the network as fast as it can. If you saturate the network, things can back up a bit.

I assume you are exiting your process before the network stack has had a chance to write all the remaining bytes to the network. That would explain the lost documents.

The best workaround is to call Count at the end - not just once, but repeatedly, until the count reaches the value you expect. At that point you know there is no data left in transit.
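A minimal sketch of that polling idea, assuming the legacy 1.x driver and a `coll` set up as in the question; the 30-second deadline and 100 ms sleep are arbitrary choices, not part of the answer:

```csharp
// Poll Count() until it reaches the expected value or a deadline passes.
// While the count is still climbing, the client-side buffer is still draining.
long expected = 100000;
DateTime deadline = DateTime.Now.AddSeconds(30);
while (coll.Count() < expected && DateTime.Now < deadline)
{
    Thread.Sleep(100); // give the network stack time to flush its buffer
}
```

If the loop exits on the deadline rather than the count, some writes were genuinely lost or failed, which leads to the caveat below.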

However, if any of the inserts failed for some reason (for example, by violating a unique index), the count will never reach your expected value. There is no way to know with 100% certainty whether an insert / update worked without using SafeMode.True.
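For comparison, a hedged sketch of the SafeMode.True path (legacy 1.x driver, `coll` as in the question): the driver waits for getLastError after each write, returning a SafeModeResult on success and throwing MongoSafeModeException when the server reports an error.

```csharp
var query = new QueryDocument("_id", 42);
try
{
    SafeModeResult result = coll.Update(
        query, Update.Set("value", 100), UpdateFlags.Upsert, SafeMode.True);
    Console.WriteLine("Upsert ok, updated existing: " + result.UpdatedExisting);
}
catch (MongoSafeModeException ex)
{
    // e.g. a unique index violation, which SafeMode.False swallows silently
    Console.WriteLine("Upsert failed: " + ex.Message);
}
```

This is the slow-but-certain option: one extra server round trip per write, but every failure is surfaced.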

Note that most long-running server processes never have this problem because they never quit.

---

I found your question very interesting, so I ran some tests myself.

It seems that calling coll.Count () periodically did the trick in my tests.

You will need to test the performance further, but I think it is still better than using SafeMode.True.

Here is the test code that demonstrates the fix:

```csharp
[TestMethod]
public void TestMethod1()
{
    MongoServer server = MongoServer.Create(
        ConfigurationManager.ConnectionStrings["MongoUnitTestConnStr"].ConnectionString);
    MongoDatabase test = server.GetDatabase("unit_test_db");
    int totalDocuments = 100000;
    var list = Enumerable.Range(0, totalDocuments).ToList();
    long count = 0;
    DateTime start, end;

    using (server.RequestStart(test))
    {
        MongoCollection coll = test.GetCollection("testCollection");
        start = DateTime.Now;
        Parallel.ForEach(list, i =>
        {
            var query = new QueryDocument("_id", i);
            coll.Update(query, Update.Set("value", 100), UpdateFlags.Upsert, SafeMode.False);
            // Calling a count periodically (but sparsely) seems to do the trick.
            if (i % 10000 == 0)
                count = coll.Count();
        });
        // Call count one last time to report in the test results.
        count = coll.Count();
        end = DateTime.Now;
    }

    Console.WriteLine(String.Format(
        "Execution Time: {0}. Expected No of docs: {1}, Actual No of docs: {2}",
        (end - start).TotalSeconds, totalDocuments, count));
}
```

Test results:

Execution Time: 105.812. Expected No of docs: 100000, Actual No of docs: 100000

---

Extrapolating from Robert's point: how about using safe mode only for the last write? Would it sit at the back of the queue, so that it cannot complete until the socket has been flushed? That would also avoid the non-deterministic counting.
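A sketch of that "safe last write" idea, assuming the legacy 1.x driver and sequential writes on a single pinned connection (RequestStart); the hard-coded document count is just for illustration:

```csharp
using (server.RequestStart(test))
{
    MongoCollection coll = test.GetCollection("testCollection");
    int total = 100000;
    for (int i = 0; i < total; i++)
    {
        var query = new QueryDocument("_id", i);
        // Fire-and-forget for every write except the final one, which waits
        // for getLastError and thereby forces everything before it to drain.
        SafeMode mode = (i == total - 1) ? SafeMode.True : SafeMode.False;
        coll.Update(query, Update.Set("value", 100), UpdateFlags.Upsert, mode);
    }
}
```

Note this only works because getLastError runs on the same connection as the preceding writes; with Parallel.ForEach over a connection pool, the "last" write is not guaranteed to be queued behind the others.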

Out of interest, why is RequestStart used? Surely that hurts performance rather than spreading the load across the connection pool? (Assuming the network is not maxed out.)

---

I had the same problem. I do many inserts in batch operations against a MongoDB server through a command-line application that runs periodically. Each run, a different number of documents ended up in the database, from the same input.

It seems this is exactly what Robert described above, and I solved it by calling

```csharp
MongoServer.Disconnect()
```

before exiting the process. It seems to flush all outgoing data to the server before shutting down the connection, so all my documents get inserted every time.
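A minimal sketch of that pattern for a short-lived process, assuming the legacy 1.x driver; the connection string and collection names are placeholders:

```csharp
MongoServer server = MongoServer.Create("mongodb://localhost");
try
{
    MongoCollection coll = server.GetDatabase("testDB").GetCollection("testCollection");
    coll.Update(new QueryDocument("_id", 1),
                Update.Set("value", 100), UpdateFlags.Upsert, SafeMode.False);
}
finally
{
    // Disconnect before the process exits, so buffered fire-and-forget
    // writes are drained to the server rather than dropped.
    server.Disconnect();
}
```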

As Robert said, this will almost never be a problem in long-running processes such as servers.

