TL;DR
I am hitting a concurrency bottleneck with MongoDB. If I make one request, it takes 1 unit of time to return; if I make 2 requests in parallel, both take 2 units of time to return; in general, if I make n simultaneous requests, all of them take n units of time to return. My question is: what can be done to improve MongoDB's response time under concurrent requests?
Setup
I have an m3.medium instance on AWS running MongoDB server 2.6.7. An m3.medium has 1 vCPU (1 core of a Xeon E5-2670 v2), 3.75 GB of RAM and a 4 GB SSD.
I have a database with a single collection named user_products . Documents in this collection have the following structure:
{ user: <int>, product: <int> }
There are 1000 users and 1000 products, with one document for every (user, product) pair, for a total of one million documents.
The collection has an index { user: 1, product: 1 } , and all of the queries in my results below are covered by it (indexOnly).
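For reference, the collection described above can be seeded with a script along these lines. This is a sketch, not the script I actually used: the database name test_user_products matches the namespace in the benchmark below, userDocs is an illustrative helper, and ensureIndex is the 2.6-era name for index creation.

```javascript
// Build the documents for a single user: { user: u, product: 0..productCount-1 }.
function userDocs(u, productCount) {
    var docs = [];
    for (var p = 0; p < productCount; p++) {
        docs.push({ user: u, product: p });
    }
    return docs;
}

// The db object only exists inside the mongo shell, so guard the server calls.
if (typeof db !== "undefined") {
    var coll = db.getSiblingDB("test_user_products").user_products;
    coll.drop();
    for (var u = 0; u < 1000; u++) {
        coll.insert(userDocs(u, 1000)); // one bulk insert per user (1000 docs)
    }
    coll.ensureIndex({ user: 1, product: 1 }); // the covered-query index
}
```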
Test
The test was run on the same machine that MongoDB runs on, using the benchRun function that ships with Mongo. No other access to MongoDB took place during the tests, and only read operations were performed.
Each test simulates a number of simultaneous clients, each of which issues requests back-to-back, as fast as possible, until the test ends. Each test lasts 10 seconds. Concurrency is tested in powers of 2, from 1 to 128 concurrent clients.
Command to run tests:
mongo bench.js
Here's the full script (bench.js):
var seconds = 10, limit = 1000, USER_COUNT = 1000,
    concurrency, savedTime, res, timediff, ops, results,
    docsPerSecond, latencyRatio, currentLatency, previousLatency;

// Zero-pad a number to two digits. (The original .toFixed(2) produced
// strings like "3.00" instead of "03" in the date/time columns.)
function pad(n) {
    return (n < 10 ? "0" : "") + n;
}

ops = [{
    op: "find",
    ns: "test_user_products.user_products",
    query: { user: { "#RAND_INT": [0, USER_COUNT - 1] } },
    limit: limit,
    fields: { _id: 0, user: 1, product: 1 }
}];

for (concurrency = 1; concurrency <= 128; concurrency *= 2) {
    savedTime = new Date();
    res = benchRun({
        parallel: concurrency,
        host: "localhost",
        seconds: seconds,
        ops: ops
    });
    timediff = new Date() - savedTime;
    docsPerSecond = res.query * limit;
    currentLatency = res.queryLatencyAverageMicros / 1000;
    if (previousLatency) {
        latencyRatio = currentLatency / previousLatency;
    }
    results = [
        savedTime.getFullYear() + '-' + pad(savedTime.getMonth() + 1) + '-' + pad(savedTime.getDate()),
        pad(savedTime.getHours()) + ':' + pad(savedTime.getMinutes()),
        concurrency,
        res.query,
        currentLatency,
        timediff / 1000,
        seconds,
        docsPerSecond,
        latencyRatio
    ];
    previousLatency = currentLatency;
    print(results.join('\t'));
}
Results
The results always look like this (some output columns were omitted for easier understanding):
concurrency   queries/sec   avg latency (ms)   latency ratio
          1         459.6        2.153609008   -
          2         460.4        4.319577324   2.005738882
          4         457.7        8.670418178   2.007237636
          8         455.3        17.4266174    2.00989353
         16         450.6        35.55693474   2.040380754
         32         429          74.50149883   2.09527338
         64         419.2        153.7325095   2.063482104
        128         403.1        325.2151235   2.115460969
If only one client is active, it is able to execute about 460 requests per second during a 10 second test. The average response time for a request is about 2 ms.
When 2 clients send requests at the same time, throughput stays at about 460 requests per second, meaning Mongo did not increase its overall throughput. Average latency, on the other hand, doubled.
With 4 clients the pattern continues: the same throughput, with average latency double that of 2 clients. The "latency ratio" column is the ratio between the average latency of the current test and that of the previous one; it consistently shows latency doubling.
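These numbers are consistent with a server that fully serializes requests: if it can process C requests per second one at a time, then with n concurrent clients throughput stays pinned at C while each client's average latency grows as n / C. A quick sketch of that model, taking C = 460 qps from the single-client run above (serializedModel is just an illustrative helper, not anything from Mongo):

```javascript
// Model a server that fully serializes requests: total capacity C req/s.
// Throughput never exceeds C, and with n concurrent clients each request
// effectively waits behind n others, so latency grows linearly with n.
function serializedModel(capacityQps, concurrency) {
    return {
        throughputQps: capacityQps,
        avgLatencyMs: (concurrency / capacityQps) * 1000
    };
}

// print exists in the mongo shell, console.log in Node.
var log = (typeof print !== "undefined") ? print : console.log;

var C = 460; // single-client throughput measured above
for (var n = 1; n <= 128; n *= 2) {
    var m = serializedModel(C, n);
    log(n + " clients -> " + m.throughputQps + " q/s, " +
        m.avgLatencyMs.toFixed(2) + " ms avg latency");
}
```

The predicted latencies (2.17 ms at 1 client, 4.35 ms at 2, and so on, doubling each step) track the measured table closely.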
Update: more processor power
I decided to test different instance types, varying the number of vCPUs and the amount of RAM available, to see what happens when more processor power is added. Instance types tested:
Type         vCPUs   RAM (GB)
m3.medium    1       3.75
m3.large     2       7.5
m3.xlarge    4       15
m3.2xlarge   8       30
Here are the results:


m3.medium
concurrency   queries/sec   avg latency (ms)   latency ratio
          1         459.6        2.153609008   -
          2         460.4        4.319577324   2.005738882
          4         457.7        8.670418178   2.007237636
          8         455.3        17.4266174    2.00989353
         16         450.6        35.55693474   2.040380754
         32         429          74.50149883   2.09527338
         64         419.2        153.7325095   2.063482104
        128         403.1        325.2151235   2.115460969
m3.large
concurrency   queries/sec   avg latency (ms)   latency ratio
          1         855.5        1.15582069    -
          2         947          2.093453854   1.811227185
          4         961          4.13864589    1.976946318
          8         958.5        8.306435055   2.007041742
         16         954.8        16.72530889   2.013536347
         32         936.3        34.17121062   2.043083977
         64         927.9        69.09198599   2.021935563
        128         896.2        143.3052382   2.074122435
m3.xlarge
concurrency   queries/sec   avg latency (ms)   latency ratio
          1         807.5        1.226082735   -
          2        1529.9        1.294211452   1.055566166
          4        1810.5        2.191730848   1.693487447
          8        1816.5        4.368602642   1.993220402
         16        1805.3        8.791969257   2.01253581
         32        1770          17.97939718   2.044979532
         64        1759.2        36.2891598    2.018374668
        128        1720.7        74.56586511   2.054769676
m3.2xlarge
concurrency   queries/sec   avg latency (ms)   latency ratio
          1         836.6        1.185045183   -
          2        1585.3        1.250742872   1.055438974
          4        2786.4        1.422254414   1.13712774
          8        3524.3        2.250554777   1.58238551
         16        3536.1        4.489283844   1.994745425
         32        3490.7        9.121144097   2.031759277
         64        3527          18.14225682   1.989033023
        128        3492.9        36.9044113    2.034168718
Starting with the xlarge type, we finally see Mongo process 2 simultaneous requests while keeping latency almost unchanged (1.29 ms). It does not last long, though: with 4 clients the average latency doubles again.
With the 2xlarge type, Mongo handles up to 4 concurrent clients without increasing average latency; beyond that, latency starts doubling again.
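A simple extension of the serialized-server model roughly matches the multi-core results: with k cores, latency stays near the single-client value up to about k concurrent clients and then grows as n / k. This is a back-of-the-envelope sketch, not a real queueing model, and multiCoreLatencyMs is a hypothetical helper:

```javascript
// With k cores, up to k clients can be served in parallel at the
// single-client latency; beyond that, average latency grows as n / k.
function multiCoreLatencyMs(singleClientMs, cores, concurrency) {
    return singleClientMs * Math.max(1, concurrency / cores);
}

// m3.2xlarge (8 vCPUs), ~1.2 ms single-client latency: the model predicts
// flat latency up to 8 clients, then doubling with each doubling of
// concurrency - close to the measured table, where the knee appears
// slightly earlier (around 4-8 clients).
```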
Question: what can be done to improve Mongo's response time under concurrent requests? I expected to see query throughput increase; I did not expect to see average latency doubling. This suggests that Mongo is not parallelizing the incoming requests.
There is some kind of bottleneck somewhere limiting Mongo, but it does not help to just keep adding more processor power, since the cost becomes prohibitive. I do not think memory is the problem here, since my entire test database fits easily in RAM. Is there anything else I could try?
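One thing worth checking before concluding it is a CPU limit is whether operations are queueing on MongoDB's internal locks. In 2.6 the lock is database-level and reads take the shared side, so a read-only workload should not normally block on it, but serverStatus() reports queued readers and writers, which makes this cheap to verify while the benchmark is running. A sketch (summarizeQueue is an illustrative helper; globalLock.currentQueue is part of the serverStatus output):

```javascript
// Pull the queued-reader/writer counts out of a serverStatus() document.
function summarizeQueue(status) {
    var q = status.globalLock.currentQueue;
    return { readersQueued: q.readers, writersQueued: q.writers, total: q.total };
}

// db only exists inside the mongo shell; run this while the benchmark is active.
if (typeof db !== "undefined") {
    var s = summarizeQueue(db.serverStatus());
    print("readers queued: " + s.readersQueued +
          ", writers queued: " + s.writersQueued);
}
```

If the queue counters stay at zero under load, lock contention can be ruled out and the bottleneck is elsewhere (CPU, network round-trips, or the single benchmark machine itself).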