Is there a memory-efficient way to import large files and data into MongoDB?

After some recent experiments with MongoDB, I have tried several different methods for importing/inserting large amounts of data into collections. So far, the most efficient method I have found is mongoimport. It works, but the memory overhead is a problem: even after the import completes, the memory is not released unless I restart my machine.

Example:

mongoimport -d flightdata -c trajectory_data --type csv --file trjdata.csv --headerline 

where my header and data look like this:

 'FID','ACID','FLIGHT_INDEX','ORIG_INDEX','ORIG_TIME','CUR_LAT', ...
 '20..','J5','79977','79977','20110116:15:53:11','1967', ...

With 5.3 million rows by 20 columns, roughly 900 MB of CSV, I see the following:

[Screenshot: memory overhead after the import]

This will not work for me in the long run; I may not always be able to reboot, and eventually I will simply run out of memory. What would be a more efficient way to import into MongoDB? I have read about periodically flushing RAM; how could I implement something like that for the example above?
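For example, would something like the following chunked import be what is meant by periodic flushing on the client side? This is only a sketch using Python with pymongo and the standard csv module; the connection string and the batch size of 10,000 are assumptions, and it only bounds the importer's own memory, not whatever mongod does with the data afterwards.

import csv
from pymongo import MongoClient

BATCH_SIZE = 10000  # arbitrary; tune to taste

client = MongoClient("mongodb://localhost:27017")  # assumed local mongod
coll = client["flightdata"]["trajectory_data"]

with open("trjdata.csv", newline="") as f:
    # the file uses single quotes around fields, hence quotechar
    reader = csv.DictReader(f, quotechar="'")
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            coll.insert_many(batch)  # one round trip per batch
            batch.clear()
    if batch:
        coll.insert_many(batch)  # flush the final partial batch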

Update: I don't think my use case would benefit much from the fsync, syncdelay, or journaling settings. I am just wondering when tuning them would be a good idea and what the best practice is, even when working on servers with plenty of RAM.

1 answer

I assume the memory is being used by mongod itself, not by mongoimport. By design, MongoDB tries to keep all of its data in memory: it memory-maps its data files and relies on the OS to page data in and out when there is not enough RAM. So I would give you two tips (plus, after the list, a quick way to check where the memory is actually going):

  • Don't worry about what your OS tells you about how much memory is "free"; a modern, well-functioning OS will use every bit of available RAM for something.

  • If you can't live with #1, don't run MongoDB on your laptop.
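If you want to verify that the memory really is held by mongod's memory-mapped files rather than leaked by the import tool, you can ask the server itself. A minimal sketch with pymongo (the connection string is an assumption; the reported numbers are in megabytes, and "mapped" only appears for memory-mapped storage engines):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local mongod
mem = client.admin.command("serverStatus")["mem"]

print("resident MB:", mem.get("resident"))  # RAM actually held by the mongod process
print("virtual MB:", mem.get("virtual"))    # total virtual address space
print("mapped MB:", mem.get("mapped"))      # size of the memory-mapped data files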

