MongoDB Concurrency Bottleneck

TL;DR

I am experiencing a concurrency bottleneck with MongoDB. If I make one request, it takes 1 unit of time to return; if I make 2 parallel requests, both take 2 units of time to return; in general, if I make n simultaneous requests, they all take n units of time to return. My question: what can be done to improve Mongo's response time under concurrent requests?

Setup

I have an m3.medium instance on AWS running a MongoDB 2.6.7 server. An m3.medium has 1 vCPU (1 core of a Xeon E5-2670 v2), 3.75 GB of RAM, and a 4 GB SSD.

I have a database with one collection named user_products. Documents in this collection have the following structure:

 { user: <int>, product: <int> } 

There are 1000 users and 1000 products, and there is a document for every (user, product) pair, for a total of one million documents.

The collection has the index { user: 1, product: 1 }, and all of the results below are indexOnly.
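For reference, the data set is easy to generate. This is a sketch: the insert and index calls in the comments assume the mongo shell, and ensureIndex is the 2.6-era helper.

```javascript
// Build the one million (user, product) documents in memory. In the mongo
// shell, these would then be loaded with db.user_products.insert(docs) and
// the index created with db.user_products.ensureIndex({ user: 1, product: 1 }).
var USER_COUNT = 1000, PRODUCT_COUNT = 1000;
var docs = [];
for (var u = 0; u < USER_COUNT; u++) {
    for (var p = 0; p < PRODUCT_COUNT; p++) {
        docs.push({ user: u, product: p });
    }
}
console.log(docs.length); // 1000000
```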

Test

The tests were run on the same machine that MongoDB runs on, using the benchRun function that ships with Mongo. No other access to MongoDB took place during the tests, which consisted of read operations only.

For each test, a number of simultaneous clients is simulated, each issuing requests as fast as possible until the test completes. Each test lasts 10 seconds. Concurrency is tested in powers of 2, from 1 to 128 concurrent clients.

Command to run tests:

 mongo bench.js 

Here's the full script (bench.js):

 var seconds = 10,
     limit = 1000,
     USER_COUNT = 1000,
     concurrency, savedTime, res, timediff, ops, results,
     docsPerSecond, latencyRatio, currentLatency, previousLatency;

 // Zero-pad a number to two digits. (The original used toFixed(2),
 // which appends a decimal fraction instead of zero-padding.)
 function pad(n) {
     return (n < 10 ? '0' : '') + n;
 }

 ops = [{
     op: "find",
     ns: "test_user_products.user_products",
     query: { user: { "#RAND_INT": [0, USER_COUNT - 1] } },
     limit: limit,
     fields: { _id: 0, user: 1, product: 1 }
 }];

 for (concurrency = 1; concurrency <= 128; concurrency *= 2) {
     savedTime = new Date();
     res = benchRun({
         parallel: concurrency,
         host: "localhost",
         seconds: seconds,
         ops: ops
     });
     timediff = new Date() - savedTime;
     docsPerSecond = res.query * limit;
     currentLatency = res.queryLatencyAverageMicros / 1000;
     if (previousLatency) {
         latencyRatio = currentLatency / previousLatency;
     }
     results = [
         savedTime.getFullYear() + '-' + pad(savedTime.getMonth() + 1) + '-' + pad(savedTime.getDate()),
         pad(savedTime.getHours()) + ':' + pad(savedTime.getMinutes()),
         concurrency,
         res.query,
         currentLatency,
         timediff / 1000,
         seconds,
         docsPerSecond,
         latencyRatio
     ];
     previousLatency = currentLatency;
     print(results.join('\t'));
 }

Results

The results always look like this (some output columns are omitted for readability):

 concurrency  queries/sec  avg latency (ms)  latency ratio
 1            459.6        2.153609008       -
 2            460.4        4.319577324       2.005738882
 4            457.7        8.670418178       2.007237636
 8            455.3        17.4266174        2.00989353
 16           450.6        35.55693474       2.040380754
 32           429          74.50149883       2.09527338
 64           419.2        153.7325095       2.063482104
 128          403.1        325.2151235       2.115460969

With only one client active, it executes about 460 requests per second during the 10-second test. The average response time for a request is about 2 ms.

When 2 clients send requests at the same time, throughput is held at about 460 requests per second; Mongo did not increase its total throughput. Average latency, on the other hand, doubled.

For 4 clients, the pattern continues: the same request throughput, with average latency doubling relative to 2 clients. The latency ratio column is the ratio between the current test's average latency and the previous test's; it consistently shows the latency doubling.
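These numbers are consistent with a server that processes one request at a time: by Little's law, throughput ≈ concurrency / average latency, and that quantity stays pinned near 460 q/s at every concurrency level. A quick sanity check in plain JavaScript, using values from the table above:

```javascript
// Little's law check: predicted throughput = concurrency / average latency.
// Latency values are taken from the m3.medium results table.
var rows = [
    { concurrency: 1,   latencyMs: 2.153609008 },  // measured 459.6 q/s
    { concurrency: 2,   latencyMs: 4.319577324 },  // measured 460.4 q/s
    { concurrency: 128, latencyMs: 325.2151235 }   // measured 403.1 q/s
];
rows.forEach(function (r) {
    var predictedQps = r.concurrency / (r.latencyMs / 1000);
    console.log(r.concurrency + ' clients -> ~' + predictedQps.toFixed(1) + ' q/s predicted');
});
```

This predicts roughly 464, 463, and 394 queries per second, close to the measured 459.6, 460.4, and 403.1: throughput is capped, so added clients show up entirely as added latency.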

Update: more processor power

I decided to test different instance types, varying the number of vCPUs and the amount of RAM available, to see what happens when more processor power is added. Instance types tested:

 Type        vCPUs  RAM (GB)
 m3.medium   1      3.75
 m3.large    2      7.5
 m3.xlarge   4      15
 m3.2xlarge  8      30

Here are the results:

[Chart: queries per second vs. concurrency, per instance type]

[Chart: average query latency vs. concurrency, per instance type]

m3.medium

 concurrency  queries/sec  avg latency (ms)  latency ratio
 1            459.6        2.153609008       -
 2            460.4        4.319577324       2.005738882
 4            457.7        8.670418178       2.007237636
 8            455.3        17.4266174        2.00989353
 16           450.6        35.55693474       2.040380754
 32           429          74.50149883       2.09527338
 64           419.2        153.7325095       2.063482104
 128          403.1        325.2151235       2.115460969

m3.large

 concurrency  queries/sec  avg latency (ms)  latency ratio
 1            855.5        1.15582069        -
 2            947          2.093453854       1.811227185
 4            961          4.13864589        1.976946318
 8            958.5        8.306435055       2.007041742
 16           954.8        16.72530889       2.013536347
 32           936.3        34.17121062       2.043083977
 64           927.9        69.09198599       2.021935563
 128          896.2        143.3052382       2.074122435

m3.xlarge

 concurrency  queries/sec  avg latency (ms)  latency ratio
 1            807.5        1.226082735       -
 2            1529.9       1.294211452       1.055566166
 4            1810.5       2.191730848       1.693487447
 8            1816.5       4.368602642       1.993220402
 16           1805.3       8.791969257       2.01253581
 32           1770         17.97939718       2.044979532
 64           1759.2       36.2891598        2.018374668
 128          1720.7       74.56586511       2.054769676

m3.2xlarge

 concurrency  queries/sec  avg latency (ms)  latency ratio
 1            836.6        1.185045183       -
 2            1585.3       1.250742872       1.055438974
 4            2786.4       1.422254414       1.13712774
 8            3524.3       2.250554777       1.58238551
 16           3536.1       4.489283844       1.994745425
 32           3490.7       9.121144097       2.031759277
 64           3527         18.14225682       1.989033023
 128          3492.9       36.9044113        2.034168718

Starting with the xlarge type, we finally see Mongo process 2 simultaneous requests while keeping the average latency almost unchanged (1.29 ms). But this does not last long: with 4 clients the average latency doubles again.

With the 2xlarge type, Mongo handles up to 4 concurrent clients without increasing the average latency; beyond that, the latency starts doubling again.
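All four tables fit a simple toy model (my own simplification, not from the question): with c vCPUs, average latency stays near the single-client baseline up to roughly c concurrent clients, and then grows linearly with concurrency.

```javascript
// Toy model of the observed scaling: latency ≈ base * max(1, concurrency / cores).
// `baseMs` is the single-client latency; `cores` is the vCPU count.
function modelLatency(baseMs, cores, concurrency) {
    return baseMs * Math.max(1, concurrency / cores);
}
console.log(modelLatency(2.15, 1, 8));  // m3.medium, 8 clients: 17.2 (measured 17.4)
console.log(modelLatency(1.19, 8, 4));  // m3.2xlarge, 4 clients: 1.19 (measured 1.42)
```

The fit is not perfect near the knee (e.g. the 2xlarge latency already creeps up at 4 to 8 clients), but it captures the headline behavior: latency doubles with concurrency once the core count is exceeded.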

Question: what can be done to improve Mongo's response time under concurrent requests? I expected to see query throughput increase, and I did not expect to see average latency double. This clearly shows that Mongo is not parallelizing the incoming requests.

Some bottleneck somewhere is limiting Mongo, and simply throwing ever more processor power at it is not a way out, since the cost would be prohibitive. I do not think memory is the problem here, since my entire test database fits easily in RAM. Is there anything else I could try?

1 answer

You are using a server with 1 core, and you are using benchRun. From the benchRun documentation:

This benchRun command is designed as a baseline performance measurement tool; it is not designed to be a "benchmark".

The way latency scales with concurrency is suspiciously exact. Are you sure the calculation is correct? I could believe that ops/sec per runner stayed the same, and that latency per op also stayed the same, as the number of runners grew; if you then summed all the latencies, you would see results similar to yours.
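To illustrate that hypothesis (a sketch of the reasoning, not a claim about how benchRun actually computes its numbers): if each of n runners sees a constant per-op latency but the harness effectively sums the runners' latencies, the reported average doubles with each doubling of concurrency even though nothing got slower.

```javascript
// Hypothetical aggregation artifact: the true per-op latency is constant,
// but the reported figure scales with the number of runners.
var perOpMs = 2.15; // constant true latency per operation
[1, 2, 4, 8].forEach(function (runners) {
    var reported = perOpMs * runners; // latencies summed across runners
    console.log(runners + ' runners -> reported ' + reported.toFixed(2) + ' ms');
});
```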

