Cassandra Amazon EC2 Reading Experiments

I need help improving Cassandra read performance. I am worried about read performance degrading as the column family grows. We have the following statistics for a single-node cluster.

Operating System: Linux - CentOS Version 5.4 (Final)
Cassandra Version: apache-cassandra-1.1.0
Java version: "1.6.0_14", Java(TM) SE Runtime Environment (build 1.6.0_14-b08), Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

Cassandra Configuration: (cassandra.yaml)

  • rpc_server_type: hsha
  • disk_access_mode: mmap
  • concurrent_reads: 64
  • concurrent_writes: 32

Platform: Amazon EC2 / RightScale, m1.xlarge instance with 4 ephemeral disks in RAID 0 (15 GB memory, 4 virtual cores, 2 ECUs each, 8 ECUs total).


Experiment configuration: I tried some experiments with GC tuning.

Cassandra config:
10 GB of RAM is allocated to the Cassandra heap; 3500 MB is the size of the new generation.
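For reference, in cassandra-env.sh that heap layout would be expressed roughly as follows (a sketch; the variable names are the standard Cassandra 1.1 ones, the values are taken from the description above):

MAX_HEAP_SIZE="10G"
HEAP_NEWSIZE="3500M"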

JVM configuration:
JVM_OPTS = "$ JVM_OPTS -XX: + UseParNewGC"
JVM_OPTS = "$ JVM_OPTS -XX: + UseConcMarkSweepGC"
JVM_OPTS = "$ JVM_OPTS -XX: + CMSParallelRemarkEnabled"
JVM_OPTS = "$ JVM_OPTS -XX: SurvivorRatio = 1000"
JVM_OPTS = "$ JVM_OPTS -XX: MaxTenuringThreshold = 0"
JVM_OPTS = "$ JVM_OPTS -XX: CMSInitiatingOccupancyFraction = 40"
JVM_OPTS = "$ JVM_OPTS -XX: + UseCMSInitiatingOccupancyOnly -XX: + UseCompressedOops"



OpsCenter 2.0 Community statistics:

  • Read requests: 208 to 240 per second
  • Write requests: 18 to 28 per second
  • OS load: 24.5 to 25.85
  • Write request latency: 127 to 160 microseconds
  • Read request latency: 82,202 to 94,612 microseconds
  • OS network traffic sent: 44,646 KB/s average
  • OS network traffic received: 4,338 KB/s average
  • OS disk queue size: 13 to 15 requests
  • Pending read requests: 25 to 32
  • OS disk latency: 48 to 56 ms
  • OS disk throughput: 6.6 MB/s average
  • Disk read IOPS: ~420 per second
  • CPU iowait: 80% average
  • CPU idle: 13% average

Row cache is disabled.


Column family

The column family I am only reading from was created through the CLI:

create column family XColFam with column_type = 'Standard' and comparator = 'CompositeType(BytesType,IntegerType)';

SSTable Column Family Size = 7.10 GB, SSTable Count = 2
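These numbers can be double-checked on the node with nodetool (a standard command in 1.1; localhost is assumed here):

nodetool -h localhost cfstats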

XColFam has an estimated 59,499,904 row keys (most of them variable-length UTF-8 literals, estimated via mx4jtools), with columns that are thin in nature, holding 0-byte values for now.

Most rows should have very few columns, maybe 1 to 10, with the first component of the composite column name around 20-30 bytes and the second component 8 bytes. The second component of the composite column can repeat, but the probability is low. The first component repeats across rows, but the number of columns per row can differ.

I tried SnappyCompression to compress the column family, but there was no change in size.
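If compression was enabled after the data had already been written, the likely reason for the unchanged size is that existing SSTables are only rewritten on compaction or when forced with nodetool. A minimal sketch of enabling Snappy compression from cassandra-cli and rewriting the existing SSTables (the chunk_length_kb value and the <keyspace> placeholder are illustrative, not from the original post):

update column family XColFam with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

nodetool -h localhost upgradesstables <keyspace> XColFam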

I have a scheduled service that runs with 20 threads for hours and makes random read requests for multiple keys (currently 2 keys per request) against this column family, reading full rows rather than a single column slice.

I think it is not performing well now, because it processes too few requests per minute. It used to perform better when the column family was smaller, around 3 to 4 GB.

I am afraid that read performance degrades too quickly with the size of the column family.

I also tried tuning some GC and memory settings, because before that I saw heavy GC and CPU usage; when the data size was smaller, iowait was very low.


How can I improve Cassandra's read performance? Your suggestions will be appreciated.
2 answers

Look, Cassandra is relatively I/O bound, and EC2 instances are I/O constrained by design (Xen virtualization). My first recommendation is to run Cassandra on real hardware, where you have control; for example, you can use an SSD drive for the commit log. See the Cassandra hardware recommendations.
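For what it is worth, pointing the commit log and data directories at separate devices is a cassandra.yaml change; the paths below are hypothetical examples, not from the answer:

commitlog_directory: /mnt/ssd/cassandra/commitlog
data_file_directories:
    - /raid0/cassandra/data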

However, switching to your own hardware is a bit radical. To stay with Amazon, try EBS:

Amazon Elastic Block Store (EBS) provides block-level storage volumes for use with Amazon EC2 instances. Amazon EBS volumes are network-attached and persist independently of the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited to applications that require a database, a file system, or access to raw block-level storage.

Amazon EBS allows you to create storage volumes from 1 GB to 1 TB that can be mounted as devices by Amazon EC2 instances, and you can mount multiple volumes on the same instance. Amazon EBS also lets you provision a specific level of I/O performance, if needed, by choosing the amount of provisioned IOPS. This allows you to scale predictably to thousands of IOPS per Amazon EC2 instance.

Also check out Cassandra Performance Testing on EC2


Short answer: row cache and key cache.

If your data contains subsets that are read frequently, as in most systems, try using the row cache and key cache.

The row cache is an in-memory cache that stores frequently read rows entirely in memory. Keep in mind that this may not have the desired effect if your reads are spread across the whole data set.

The key cache is usually more suitable, since it only stores partition keys and their offsets on disk. It usually lets Cassandra skip lookup steps (no need to consult the partition index and partition summary).

Try turning on the key cache for the relevant keyspace and table and test your performance.
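A minimal sketch of doing that from cassandra-cli on Cassandra 1.1, where caching is a per-column-family attribute with the values all, keys_only, rows_only, or none; the overall key cache size is capped globally by key_cache_size_in_mb in cassandra.yaml:

update column family XColFam with caching = 'keys_only';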


Source: https://habr.com/ru/post/922454/

