Is there a way to reduce memory usage with Mongoose while executing a query?

I am working on a Node backend, trying to optimize a very heavy query to MongoDB through Mongoose. The expected result size is significant, but for some reason, when I run the query, Node starts consuming a huge amount of memory: 200 MB+ for a single large query.

Given that the result size is less than 10 MB in most cases, this seems wrong. Node also refuses to release the memory after the query completes. I know this is probably just V8's GC performing its default behavior, but I am concerned about the huge amount of memory consumed by a single find() call.

Through testing, I isolated the problem to the find() call. After making the call, it does some post-processing, then sends the data to a callback, all in an anonymous function. I tried using QueryStream instead of model.find(), but it showed no real improvement.

Looking around, I didn't find any answers, so I'll ask: is there a known way to reduce, control, or optimize Mongoose's memory usage? Does anyone know why so much extra memory is used for a single call?

EDIT

As suggested by Johnny and Blakes, combining lean() with streaming, and using pause() and resume(), significantly improved runtime and memory usage. Thanks!

2 answers

You can use the lean option on your Mongoose query, since you only need plain JavaScript objects rather than full Mongoose document instances. This results in faster queries and lower memory usage.

model.find().lean().exec(function(err, docs) {...}); 

You can also combine lean() with streaming results, which will further reduce memory usage.

var stream = model.find().lean().stream();
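As a hedged sketch of what consuming that stream looks like (the event names are those of Mongoose's QueryStream; the handler bodies are placeholders):

// Assumes a Mongoose model named `model`, as above; handler bodies are illustrative.
var stream = model.find().lean().stream();

stream.on("data", function(doc) {
    // Because of lean(), `doc` is a plain JavaScript object,
    // and only one document is materialized at a time.
    console.log(doc._id);
});

stream.on("error", function(err) {
    console.error(err);
});

stream.on("close", function() {
    console.log("all documents processed");
});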

By default, Mongoose's .find() returns all results as an array, so it will always use a lot of memory with large result sets. That leaves the stream interface.

The main problem with using the stream interface (since it inherits from the basic Node stream) is that "data" events fire, and the associated event handler executes, continuously.

This means that even with the stream, the subsequent actions in your event handler "stack up", at the very least consuming a lot of memory and possibly eating up the call stack if there are further asynchronous processes.

So the best thing you can do is "throttle" the actions in your stream processing. This is as simple as calling the .pause() method:

var stream = model.find().stream();   // however you get the stream

stream.on("data", function() {
    stream.pause();      // call pause on entry

    // do processing

    stream.resume();     // then resume when done
});

So .pause() stops the stream from emitting events, which allows the actions in your event handler to complete before continuing, so they don't all arrive at once.

When your processing code is complete, you call .resume(), either directly within the block as shown here, or within the callback block of any asynchronous action performed inside the block. Note that the same rules apply for asynchronous actions: "all" of them must signal completion before you call resume.
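To illustrate that rule, here is a hedged sketch where the per-document work is asynchronous; saveElsewhere() is a hypothetical helper standing in for any async call, and .resume() runs only in its completion callback:

var stream = model.find().stream();

stream.on("data", function(doc) {
    stream.pause();   // stop further "data" events

    // saveElsewhere() is a hypothetical async operation that takes a
    // standard Node-style completion callback.
    saveElsewhere(doc, function(err) {
        if (err) {
            console.error(err);   // handle the error before continuing
        }
        stream.resume();   // resume only once the async work has signalled completion
    });
});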

There are other optimizations that can be applied as well, and you might find that available "queue processing" or "asynchronous flow control" modules help you achieve better performance by processing some things in parallel.
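For example, here is a hedged sketch assuming the async module's queue; processDocument(), the concurrency of 5, and the backlog threshold are all placeholders:

var async = require("async");

var stream = model.find().lean().stream();

// The worker handles one document at a time; a concurrency of 5 bounds how
// many documents are in flight at once. processDocument() is hypothetical.
var queue = async.queue(function(doc, done) {
    processDocument(doc, done);
}, 5);

stream.on("data", function(doc) {
    queue.push(doc);
    // If the backlog grows too large, pause the stream until the queue drains.
    if (queue.length() > 100) {
        stream.pause();
    }
});

// async v1/v2 style: assign a function to `drain`.
queue.drain = function() {
    stream.resume();
};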

But basically think: .pause(), then process, then .resume() to continue. This avoids piling up a lot of memory during your processing.

Also, be aware of your "outputs", and similarly try to use a stream if you are constructing something as a response. All of this effort will be in vain if the work you do just builds another big variable in memory anyway, so it helps to be aware of that.
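As a hedged sketch of that point, assuming an Express-style handler (the /items route and the use of lean() are illustrative):

// Write each document straight to the HTTP response as it arrives,
// instead of accumulating an array in memory first.
app.get("/items", function(req, res) {
    var stream = model.find().lean().stream();
    var first = true;

    res.type("json");
    res.write("[");

    stream.on("data", function(doc) {
        res.write((first ? "" : ",") + JSON.stringify(doc));
        first = false;
    });

    stream.on("close", function() {
        res.write("]");
        res.end();
    });

    stream.on("error", function(err) {
        // Headers may already be sent, so we can only log and terminate.
        console.error(err);
        res.end();
    });
});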

