IOPS or bandwidth? Detecting a write bottleneck on an Amazon RDS instance

We have overnight download jobs that write several hundred thousand records to a MySQL reporting database running on Amazon RDS.

The download jobs take several hours, but it's hard for me to determine where the bottleneck is.

The instance currently uses General Purpose (SSD) storage. Looking at the CloudWatch metrics, I seem to be averaging fewer than 50 IOPS over the past week, and network bandwidth is under 0.2 MB/s.

Can I tell from these numbers whether the bottleneck is network latency (we currently download data from a remote server ... this will eventually change) or Write IOPS?

If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if the network is the problem, I will have to redesign our download jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.

Any advice is appreciated.

UPDATE: More about my instance. I am using an m3.xlarge instance provisioned with 500 GB of storage. The download jobs run through the Pentaho ETL tool: they fetch from multiple (remote) source databases and insert into the RDS instance over multiple parallel streams.

RDS CloudWatch metrics (screenshot)

5 answers

You are not using much CPU, but your freeable memory is very low. An instance with more memory should be a good win.

You are only doing 50-150 IOPS. At such a low level, you should be able to burst to 3,000 IOPS on a standard General Purpose SSD volume. However, if your database is small, the baseline may hurt you (you only get 3 IOPS per GB), so if your database is 50 GB or less, consider paying for Provisioned IOPS.
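The baseline/burst arithmetic behind this advice can be sketched as follows (a minimal illustration, assuming the published gp2 figures: 3 IOPS per GB with a 100 IOPS floor, bursting to 3,000 IOPS for volumes below that baseline):

```python
def gp2_baseline_iops(size_gb):
    """Baseline IOPS for a gp2 volume: 3 IOPS per GB, with a 100 IOPS floor."""
    return max(100, 3 * size_gb)

def gp2_can_burst(size_gb):
    """Volumes whose baseline is below 3,000 IOPS can burst up to 3,000."""
    return gp2_baseline_iops(size_gb) < 3000

# The asker's 500 GB volume: 1,500 IOPS baseline, able to burst to 3,000.
print(gp2_baseline_iops(500))   # 1500
print(gp2_can_burst(500))       # True
# A small 20 GB volume would sit at the 100 IOPS floor.
print(gp2_baseline_iops(20))    # 100
```

At 50-150 IOPS observed, the asker is nowhere near either the 1,500 IOPS baseline or the burst ceiling, which is why storage looks like an unlikely bottleneck here.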

You could also try Aurora; it speaks the MySQL protocol and can have excellent performance.

If you can spread your writes out, the spikes will be smaller.

---

Your most likely culprit when accessing the database remotely is round-trip latency. Its effect is easy to overlook or underestimate.

If the remote database has, say, a 75 millisecond round-trip time, you cannot execute more than 1000 (ms/s) / 75 (ms/round trip) = 13.3 queries per second over a single connection. There is no getting around the laws of physics.
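That ceiling is pure arithmetic, so it is easy to sketch (a hypothetical helper, not part of any library):

```python
def max_queries_per_second(rtt_ms, connections=1):
    """Upper bound on synchronous queries/second imposed by round-trip time.

    Each query on a connection must wait one full round trip before the
    next can be issued, so one connection caps out at 1000/rtt_ms.
    """
    return connections * 1000.0 / rtt_ms

# One connection over a 75 ms link: ~13.3 queries/second, no matter
# how fast either endpoint is.
print(round(max_queries_per_second(75), 1))                  # 13.3
# Ten parallel connections raise the ceiling tenfold.
print(round(max_queries_per_second(75, connections=10), 1))  # 133.3
```

This is why multi-stream loading (which the asker already does) and batching multiple rows per statement both help: they amortize each round trip over more work.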

The spikes suggest an inefficient load process that fetches for a while, then loads for a while, then fetches again, and so on.

On a separate but related note: if MySQL client/server protocol compression is not enabled on the client side, find out how to enable it. (The server always supports compression, but the client must request it during the initial connection handshake.) This will not fix the underlying problem, but it should improve things a bit, since less data to physically transfer can mean less time wasted.
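With MySQL Connector/Python, for example, the client requests compression via a connect flag. A minimal sketch that only builds the connection arguments (the hostname and credentials are placeholders, and it assumes the Connector/Python `compress` option):

```python
def rds_connection_args(host, user, password, database):
    """Connection arguments for mysql-connector-python.

    compress=True asks the server to enable client/server protocol
    compression for this session; the server supports it but will not
    use it unless the client requests it at connect time.
    """
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "compress": True,
    }

# Placeholder values; in real use these come from your configuration.
args = rds_connection_args("example.rds.amazonaws.com", "etl", "secret", "report")
print(args["compress"])  # True
# Usage (not run here): mysql.connector.connect(**args)
```

Other drivers expose the same capability under different names (e.g. `--compress` for the `mysql` CLI), so check your ETL tool's JDBC/connection settings.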

---

I am not an RDS expert, and I do not know whether my own particular case sheds any light on yours. In any case, I hope it gives you some insight.

I have a db.t1.micro provisioned with 200 GB (which gives a 600 IOPS baseline) on a General Purpose SSD volume.

My heaviest workload aggregates thousands of records from a pool of about 2.5 million rows in a 10-million-row table, joined against another table of 8 million rows. I run it every day. This is what I average (and it is steady performance, unlike yours, where I see a pattern of spikes):

  • Write/Read IOPS: ~600 IOPS
  • Network Receive/Transmit Throughput: <3,000 bytes/s (my queries are relatively short)
  • Database Connections: 15 (workers running in parallel)
  • Queue Depth: ~7.5 counts
  • Read/Write Throughput: 10 MB/s

The whole aggregation task takes about 3 hours.

Also check out the "Top 10 tips to improve the performance of your application" slide deck from the AWS Summit 2014.

I do not know what else to add, since I am not an expert! Good luck.

---

A very quick test is to buy Provisioned IOPS, but be careful: you may end up with fewer IOPS than you currently get during bursts.

Another quick way to identify your bottleneck is to profile the job with a profiler that understands your database driver. If you use Java, JProfiler will show the characteristics of your job, including its database usage.

Third, configure the database driver to print statistics about the database workload. It may turn out that you are issuing far more queries than you expected.
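If your driver does not expose statistics directly, a thin counting wrapper gives the same visibility. A hypothetical sketch, using sqlite3 in place of a MySQL driver to keep it self-contained:

```python
import sqlite3

class CountingConnection:
    """Hypothetical wrapper that counts statements sent to the database,
    mimicking what driver-level statistics would reveal about query volume."""

    def __init__(self, conn):
        self._conn = conn
        self.statements = 0

    def execute(self, sql, params=()):
        self.statements += 1
        return self._conn.execute(sql, params)

raw = sqlite3.connect(":memory:")
db = CountingConnection(raw)
db.execute("CREATE TABLE t (x INTEGER)")
for i in range(5):
    db.execute("INSERT INTO t (x) VALUES (?)", (i,))
print(db.statements)  # 6 — one DDL statement plus five single-row inserts
```

A surprisingly high statement count (e.g. one INSERT per row instead of batched multi-row statements) is exactly the kind of finding that points at round trips rather than storage.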

---

In my case, it was the number of writes. I was only writing 30 records per minute, and Write IOPS hovered around the same 20-30, but it was eating into the CPU, which drained my CPU credits significantly. So I moved all the data in that table into a separate "history" table and cleared out the original.

The CPU went back to normal while Write IOPS stayed about the same, which was fine. The problem, I think, was indexing: with so many rows already in the table, maintaining the index on every insert took noticeably more CPU, even though the only index I had was the primary key.

The moral of my story: the problem is not always where you think it is. Although I saw elevated Write IOPS, that was not the root cause; it was the CPU spent maintaining the index during inserts that caused the CPU credit drop.

Even X-Ray on Lambda could not catch the increased query time. That is when I started looking directly at the database.

---

Source: https://habr.com/ru/post/1213575/
