MongoDB: Sharding on the same machine. Does this make sense?

I created a collection in MongoDB consisting of 11,446,615 documents.

Each document has the following form:

 {
   "_id" : ObjectId("4e03dec7c3c365f574820835"),
   "httpReferer" : "http://www.somewebsite.pl/art.php?id=13321&b=1",
   "words" : ["SEX", "DRUGS", "ROCKNROLL", "WHATEVER"],
   "howMany" : 3
 }

httpReferer : the source URL

words : words parsed from the page at that URL; the list holds between 15 and 90 entries.

I plan to use this database to get a list of web pages that have similar content.

I will query this collection on the words field, so I created (or rather, started creating) an index on that field:

 db.my_coll.ensureIndex({words: 1}) 
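
For reference, ensureIndex has since been deprecated; on MongoDB 3.0+ the equivalent call is createIndex. Because words is an array field, the result is a multikey index with one entry per array element, which a per-word query can then use (the word value below is only an illustration):

 db.my_coll.createIndex({words: 1})
 db.my_coll.find({words: "ROCKNROLL"})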

Building this collection takes a lot of time. I tried two approaches (the timings below were measured on my laptop):

  • Insert first, then index. Insertion takes 5.5 hours, mainly due to intensive data preprocessing; building the index took 30 hours.
  • Create the index before inserting. Inserting all the data into the collection takes several days.

My main focus is on reducing the collection build time. I don't need replication (at least not yet). Queries don't have to be fast either.

Now, time for the question:

I have only one machine with one disk on which I can run my application. Does it make sense to run more than one database instance and shard my data between them?

5 answers

Yes, it makes sense to shard on a single server.

  • Currently, MongoDB still uses a global lock per mongod process. Running multiple instances frees you from that single lock.

  • If you run a multi-core machine with separate NUMA nodes, it can also improve performance.

  • If your load later grows beyond one server, having sharded early makes horizontal scaling easier. You may as well do it now.

Machines vary. I suggest writing your own test program that performs the inserts and benchmarking it against varying numbers of MongoDB shards. I have a 16-core RAIDed machine, and I find that 3-4 shards seem optimal for my write-heavy database. I found that my two NUMA nodes were my bottleneck.
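
A minimal single-machine setup might look like the following sketch (ports, paths, and the shard count are illustrative): each shard is a separate mongod with its own data directory, plus a config server and a mongos router in front.

 mongod --shardsvr --port 27018 --dbpath /data/shard1 --fork --logpath /data/shard1.log
 mongod --shardsvr --port 27019 --dbpath /data/shard2 --fork --logpath /data/shard2.log
 mongod --configsvr --port 27020 --dbpath /data/config --fork --logpath /data/config.log
 mongos --configdb localhost:27020 --port 27017 --fork --logpath /data/mongos.log

Then, from a mongo shell connected to the mongos, register each shard with sh.addShard("localhost:27018") and sh.addShard("localhost:27019"), and enable sharding on the database and collection with sh.enableSharding and sh.shardCollection.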


Nowadays (2015), with MongoDB v3.0.x, the MMAP engine takes collection-level locks, which slightly increases write throughput (provided you write to several collections), while the WiredTiger engine uses document-level locking, which gives much higher write throughput. This eliminates the need for sharding on a single machine. Although you can technically still improve mapReduce performance by sharding on a single machine, in that case you would be better off using the aggregation framework, which can use multiple cores. If you rely heavily on map-reduce algorithms, it might make sense to use something like Hadoop.
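
As a sketch of the aggregation-framework alternative (reusing the collection and field names from the question; the pipeline itself is illustrative), a word count that would classically be written as mapReduce becomes:

 db.my_coll.aggregate([
   {$unwind: "$words"},
   {$group: {_id: "$words", pages: {$sum: 1}}},
   {$sort: {pages: -1}},
   {$limit: 10}
 ])

$unwind emits one document per array element, so the $group stage counts how many pages contain each word, without the single-threaded JavaScript execution that mapReduce involves.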

The only reason to shard MongoDB is horizontal scaling. So shards become useful when a single machine cannot provide enough disk space, memory, or (rarely) CPU power. I think it is really rare for someone to have enough data that they actually need to shard, even at a large business, especially since WiredTiger added compression support, which can cut disk usage by up to 80%. It is also rare for someone to use MongoDB for really CPU-heavy queries at scale, because there are much better technologies for that. In most cases IO is the dominant performance factor, and few queries are CPU-intensive unless you run many complex aggregations; even geospatial data is indexed at insertion time.

The thing you most likely need to watch out for is having many indexes that consume a lot of RAM; WiredTiger reduces this, but it is still the most common reason for sharding. Sharding on a single machine, by contrast, is likely to introduce undesirable overhead with very little or no benefit.


This is not really a Mongo question; it is a general operating-system question. There are three possible bottlenecks for your database:

  • network (e.g. you are on a gigabit line and use most of it at peak times, but your database is not actually loaded)
  • CPU (your CPU is near 100% but the disk and network are barely ticking over)
  • disk (disk I/O is saturated while the CPU and network are mostly idle)

In the network case, rewrite your network protocol if possible; otherwise shard out to other machines. In the CPU case, if you are at 100% on a few cores while the others are free, sharding on a single machine will improve throughput. If the disk is fully utilized, add more disks and shard across them: that is far cheaper than adding more machines.
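
To identify which of the three you are hitting, the standard tools are enough (a sketch; exact flags vary by platform):

 top          # CPU: is one core pinned at 100% while the others idle?
 iostat -x 5  # disk: is %util close to 100 on the database volume?
 mongostat    # MongoDB's own view of operations and lock percentage

If top shows a single saturated core while iostat is quiet, that is the case where sharding on one machine helps.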


No, it makes no sense to shard on a single server.

There are a few exceptional cases, but they mostly boil down to concurrency issues related to things like running map/reduce or server-side JavaScript.


This is answered in the first paragraph of the replica set tutorial:

http://www.mongodb.org/display/DOCS/Replica+Set+Tutorial

