What is a cursor in MongoDB?

We are troubled by occasional "cursor not found" exceptions for some of our asList queries, and I found a hint on SO that these queries could be quite memory-intensive.

Now I would like to understand the background of the issue a little better: can anyone explain (in plain English) what a cursor really is (in MongoDB)? Why can it be left open, or not found?


The documentation defines the cursor as:

A pointer to the result set of a query. Clients can iterate through a cursor to retrieve results. By default, cursors time out after 10 minutes of inactivity.

But this is not very revealing. It might be useful to define what a batch of query results is, because the documentation also says:

The MongoDB server returns the query results in batches. The amount of data in a batch will not exceed the maximum BSON document size. For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batches are 4 megabytes in size. [...] For queries that include a sort operation without an index, the server must load all the documents into memory to perform the sort before returning any results.

Note: our queries do not use sort operators at all, nor limit or offset.
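To make the batching rule quoted above concrete, here is a toy simulation in plain Node.js. This is a sketch for illustration only, not driver code: the batchResults function and the fixed per-document size are assumptions, and it approximates the rule as "first batch: 101 documents or ~1 MB, later batches: ~4 MB".

```javascript
// Toy illustration (not the real driver): split a result set into batches
// following the quoted rule -- the first batch holds up to 101 documents
// or ~1 MB, subsequent batches are capped at ~4 MB.
function batchResults(docs, docSizeBytes) {
  const batches = [];
  let i = 0;
  while (i < docs.length) {
    const first = batches.length === 0;
    const maxDocs = first ? 101 : Infinity;
    const maxBytes = (first ? 1 : 4) * 1024 * 1024;
    const batch = [];
    let bytes = 0;
    while (
      i < docs.length &&
      batch.length < maxDocs &&
      (batch.length === 0 || bytes + docSizeBytes <= maxBytes)
    ) {
      batch.push(docs[i++]);
      bytes += docSizeBytes;
    }
    batches.push(batch);
  }
  return batches;
}

// 500 documents of ~8 KB each: the first batch stops at the 101-document
// limit, and the remaining 399 documents fit under the 4 MB size cap.
const batches = batchResults(Array.from({ length: 500 }, (_, n) => n), 8 * 1024);
console.log(batches.map(b => b.length)); // [ 101, 399 ]
```

With larger documents the size caps kick in before the document count does, which is why a query over big documents needs many more round trips.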

+18
3 answers

I don't claim to be a MongoDB expert, but I just want to add a few observations from working with a medium-sized Mongo system for the past year. Also thanks to @xameeramir for the excellent walkthrough of how cursors work in general.

There can be several reasons for cursor not found exceptions. This answer explains the one that I have observed.

The cursor lives on the server side. It is not replicated across the replica set, but exists on the instance that is primary at the time of creation. This means that if another instance takes over as primary, the cursor is lost to the client. If the old primary is still up and reachable, the cursor may still exist there, but it is no longer used and is probably removed after a while. So if your MongoDB replica set is unstable, or you have a shaky network in front of it, you are out of luck with long-running queries.
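One common mitigation is to make long reads resumable: when the cursor disappears, re-issue the query starting after the last document seen. The sketch below is a toy simulation in plain Node.js; fetchPage and the document ids are invented stand-ins for a real driver call, and the single simulated failure represents a failover.

```javascript
// Hypothetical sketch: resume a long query after a "cursor not found"
// error by re-issuing it from the last id seen. fetchPage stands in for
// a real driver call and fails once to simulate a primary failover.
let failedOnce = false;
function fetchPage(afterId) {
  if (afterId === 4 && !failedOnce) {
    failedOnce = true;
    throw new Error('cursor not found'); // primary changed mid-query
  }
  const all = [1, 2, 3, 4, 5, 6];
  return all.filter(id => id > afterId).slice(0, 2); // two docs per "batch"
}

function readAll() {
  const seen = [];
  let lastId = 0;
  for (;;) {
    let page;
    try {
      page = fetchPage(lastId);
    } catch (e) {
      if (/cursor not found/.test(e.message)) continue; // re-query, resume
      throw e;
    }
    if (page.length === 0) return seen;
    seen.push(...page);
    lastId = page[page.length - 1];
  }
}

console.log(readAll()); // all six documents despite the simulated failover
```

In a real application you would sort by an indexed, unique field (such as _id) and resume with a range filter on that field, so the re-issued query is cheap.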

If the full set of results that the cursor is supposed to return does not fit in memory on the server, the query can become very slow. Your servers should have more RAM than your largest queries require.

All of this can be partially avoided with better design. For use cases with large, long-running queries, you may be better off with several smaller database collections instead of one big one.

+6

Here is a comparison between toArray() and cursors after find() in the MongoDB Node.js driver. Common code:

    var MongoClient = require('mongodb').MongoClient,
        assert = require('assert');

    MongoClient.connect('mongodb://localhost:27017/crunchbase', function (err, db) {
        assert.equal(err, null);
        console.log('Successfully connected to MongoDB.');

        const query = { category_code: "biotech" };

        // toArray() vs. cursor code goes here
    });

Here is the toArray() code that goes in the section above.

    db.collection('companies').find(query).toArray(function (err, docs) {
        assert.equal(err, null);
        assert.notEqual(docs.length, 0);
        docs.forEach(doc => {
            console.log(`${doc.name} is a ${doc.category_code} company.`);
        });
        db.close();
    });

According to the documentation,

The caller is responsible for ensuring that there is enough memory to store the results.

Here's a cursor-based approach using the cursor.forEach() method:

    const cursor = db.collection('companies').find(query);
    cursor.forEach(
        function (doc) {
            console.log(`${doc.name} is a ${doc.category_code} company.`);
        },
        function (err) {
            assert.equal(err, null);
            return db.close();
        }
    );

When we use the forEach() method, instead of fetching all the data into memory, we stream the data to our application. find() creates a cursor immediately because it does not actually make a request to the database until we try to use some of the documents it will provide. The point of a cursor is to describe our query. The second parameter to cursor.forEach shows what to do when an error occurs.

In the initial version of the code above, it was toArray() that forced the database call. It meant we needed ALL the documents and wanted them to be in an array.

Note that MongoDB returns data in batches. The image below shows requests from cursors (from the application) to MongoDB:

MongoDB cursor graphic

forEach scales better than toArray because we can process documents as they come in until we reach the end. Contrast this with toArray, where we wait for ALL the documents to be retrieved and the entire array is built. This means we don't get any advantage from the fact that the driver and the database system are working together to batch results to your application. Batching is meant to provide efficiency in terms of memory overhead and execution time. Take advantage of it in your application, if you can.
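To see why streaming keeps memory flat, here is a toy contrast in plain Node.js. The generator fakeCursor is an invented stand-in for a server-side cursor, and the two functions mirror the toArray() and forEach() patterns above by counting how many documents each one holds at once.

```javascript
// Toy contrast between buffering (toArray-style) and streaming
// (forEach-style). The generator stands in for a server-side cursor.
function* fakeCursor(n) {
  for (let i = 0; i < n; i++) yield { id: i };
}

// toArray-style: materialise everything, so the peak equals the result size.
function peakBuffered(n) {
  const docs = [...fakeCursor(n)]; // everything in memory at once
  return docs.length;
}

// forEach-style: handle one document at a time, so the peak stays at 1.
function peakStreamed(n) {
  let peak = 0;
  for (const doc of fakeCursor(n)) {
    peak = Math.max(peak, 1); // only `doc` is held at this point
  }
  return peak;
}

console.log(peakBuffered(1000)); // 1000
console.log(peakStreamed(1000)); // 1
```

With a real driver the streamed case holds at most one batch at a time rather than exactly one document, but the asymptotic difference is the same.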

+23

This error also occurs when you have a large data set and are processing that data in batches, and each batch takes longer than the default cursor lifetime of 10 minutes.

Then you need to change that default time, to tell mongo not to expire this cursor until processing is done.

Check out the noCursorTimeout documentation.
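As a toy model of the rule involved (a cursor is removed after 10 minutes of inactivity unless the no-timeout option is set), consider this sketch. cursorAlive is a hypothetical helper written for illustration, not a driver API.

```javascript
// Toy model of the server-side timeout rule: a cursor is removed after
// 10 minutes of inactivity unless the no-timeout option is set.
const TIMEOUT_MS = 10 * 60 * 1000;

// Hypothetical helper: would the server still know this cursor?
function cursorAlive(idleMs, noCursorTimeout) {
  return noCursorTimeout || idleMs < TIMEOUT_MS;
}

console.log(cursorAlive(5 * 60 * 1000, false));  // true: touched recently
console.log(cursorAlive(11 * 60 * 1000, false)); // false: "cursor not found"
console.log(cursorAlive(11 * 60 * 1000, true));  // true: timeout disabled
```

Note that disabling the timeout shifts the responsibility to your application: if it crashes without closing the cursor, the server keeps it until you clean it up.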

0
