Import a very large set of records into MongoDB using nodejs

Before diving into my question, I would like to indicate that I am doing this in part to familiarize myself with node and mongo. I understand that there are probably better ways to achieve my ultimate goal, but what I want to learn from this is a general methodology that can be applied to other situations.

Purpose:

I have a CSV file containing 6 million geo-IP entries. Each record contains only four fields, and the file is approximately 180 MB.

I want to process this file and insert each entry into a MongoDB collection called "Blocks". Each "Block" will hold the fields from one CSV record.

My current approach

I use Mongoose to create the "Block" model and a ReadStream to process the file line by line. The code that processes the file and extracts the records works; I can print each entry to the console if I want.
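For reference, a minimal sketch of what the "Block" model might look like in Mongoose. This schema is an assumption; the field names come from the snippet further down, which only assigns three of the four fields:

    var mongoose = require('mongoose');

    // Hypothetical schema inferred from the save code below; adjust types as needed.
    var blockSchema = new mongoose.Schema({
        ipFrom:   Number,
        ipTo:     Number,
        location: Number
    });

    var Block = mongoose.model('Block', blockSchema);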

For each entry in the file, it calls a function that creates a new Block object (using Mongoose), fills in the fields, and saves it.

This is the code inside the function that is called every time a line is read and parsed. The variable "rec" contains an object representing one record from the file.

    block = new Block();
    block.ipFrom = rec.startipnum;
    block.ipTo = rec.endipnum;
    block.location = rec.locid;
    connections++;
    block.save(function(err){
        if(err) throw err;
        //console.log('.');
        records_inserted++;
        if( --connections == 0 ){
            mongoose.disconnect();
            console.log( records_inserted + ' records inserted' );
        }
    });

Problem

Since the file is read asynchronously, many lines are processed at once, and the file is read much faster than MongoDB can write. The whole process stalls at around 282,000 records, having opened upwards of 5,000 simultaneous Mongo connections. It doesn't crash; it just sits there doing nothing, never seems to recover, and the number of items in the Mongo collection stops increasing.

What I'm looking for here is a general approach to solving this problem. How would I limit the number of concurrent Mongo connections? I would like to take advantage of being able to insert multiple records at once, but I'm missing a way to regulate the flow.
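For illustration, one general way to regulate the flow is back-pressure: count the saves that are in flight and pause the read stream once the count passes a threshold, resuming it as saves complete. A minimal sketch of that idea, assuming the Block model and Mongoose connection from above, Node's readline module, and a hypothetical parseLine() standing in for the existing CSV parsing (the file name and threshold are made up):

    var fs = require('fs');
    var readline = require('readline');
    var mongoose = require('mongoose');

    // Assumes the Block model and mongoose.connect() from earlier.
    var MAX_PENDING = 100;   // hypothetical threshold; tune to what your setup can absorb
    var pending = 0;         // saves currently in flight
    var records_inserted = 0;
    var done_reading = false;

    var rl = readline.createInterface({
        input: fs.createReadStream('GeoIPBlocks.csv')   // hypothetical file name
    });

    function finishIfDone() {
        if (done_reading && pending === 0) {
            mongoose.disconnect();
            console.log(records_inserted + ' records inserted');
        }
    }

    rl.on('line', function (line) {
        var rec = parseLine(line);   // hypothetical: your existing CSV parsing

        var block = new Block();
        block.ipFrom = rec.startipnum;
        block.ipTo = rec.endipnum;
        block.location = rec.locid;

        pending++;
        if (pending >= MAX_PENDING) rl.pause();   // stop reading until writes catch up

        block.save(function (err) {
            if (err) throw err;
            records_inserted++;
            pending--;
            if (pending < MAX_PENDING) rl.resume();
            finishIfDone();
        });
    });

    rl.on('close', function () {
        done_reading = true;
        finishIfDone();
    });

The key point is that the reader can never get more than MAX_PENDING records ahead of the writer, and the disconnect happens only after the file is fully read and the last save has called back.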

Thanks in advance.

mongodb mongoose
2 answers

I would try MongoDB's built-in CSV import option; it should do what you need without your having to write any code.
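If that means the mongoimport command-line utility that ships with MongoDB, the invocation would look something like this (the database, collection, field, and file names here are assumptions based on the question):

    mongoimport --db geoip --collection Blocks --type csv \
        --fields startipnum,endipnum,locid --file GeoIPBlocks.csv

If the first row of the CSV already contains the column names, replace --fields with --headerline.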


This doesn't address your exact situation of importing from a .csv file, but it is about doing bulk inserts in general:

-> First of all, there is no special "bulk" insert operation; in the end it all comes down to a forEach over individual inserts.

-> If you read a big file asynchronously, the reading will be much faster than the writing, so you should consider changing your approach: first figure out how much your setup can handle (or just find out by trial and error).

---> After that, change the way you read from the file. You don't need to read every line asynchronously; learn to wait. Use forEach or forEachSeries from Async.js to bring your read rate down to MongoDB's write level, and you are good to go (a sketch follows below).
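A minimal sketch of that idea with Async.js, assuming the Block model from the question and a hypothetical parseLine() for the CSV parsing (forEachSeries is the older name for what newer Async.js versions call eachSeries; the file name is made up):

    var fs = require('fs');
    var async = require('async');

    // Read the whole file up front, then insert strictly one record at a time,
    // so the reads can never outrun MongoDB's writes.
    var lines = fs.readFileSync('GeoIPBlocks.csv', 'utf8').split('\n');

    async.forEachSeries(lines, function (line, next) {
        if (!line) return next();      // skip blank lines
        var rec = parseLine(line);     // hypothetical: your existing CSV parsing
        var block = new Block();
        block.ipFrom = rec.startipnum;
        block.ipTo = rec.endipnum;
        block.location = rec.locid;
        block.save(next);              // wait for this save before taking the next line
    }, function (err) {
        if (err) throw err;
        console.log('all records inserted');
    });

If strictly serial inserts turn out to be too slow, forEachLimit does the same thing with a bounded number of saves in flight.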

