I need help tuning Cassandra read performance. I am concerned that read performance degrades as the size of the column family increases. We have the following statistics for a single-node Cassandra setup.
Operating System: Linux - CentOS Version 5.4 (Final)
Cassandra Version: apache-cassandra-1.1.0
Java version: "1.6.0_14" Java (TM) SE Runtime Environment (build 1.6.0_14-b08) Java HotSpot (TM) 64-bit server VM (build 14.0-b16, mixed mode)
Cassandra Configuration: (cassandra.yaml)
- rpc_server_type: hsha
- disk_access_mode: mmap
- concurrent_reads: 64
- concurrent_writes: 32
Platform: Amazon EC2 / RightScale m1.xlarge instance with 4 ephemeral disks in RAID 0 (15 GB total memory, 4 virtual cores with 2 ECUs each, 8 ECUs total).
Experiment configuration: I ran some experiments with GC tuning.
Cassandra config:
10 GB of RAM is allocated to the Cassandra heap; 3500 MB is the heap NEW size.
JVM configuration:
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=1000"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=0"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=40"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedOops"
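As a rough sketch of what these settings imply for the heap layout, the sizes can be worked out as below. It assumes "10 GB" means 10240 MB and uses HotSpot's definition of SurvivorRatio as the ratio of eden to one survivor space; the function and variable names are only for illustration.

```python
# Back-of-the-envelope heap layout for the GC settings above.
# Assumption: the 10 GB heap is 10240 MB. Names are illustrative only.

HEAP_MB = 10 * 1024            # total heap
NEW_MB = 3500                  # heap NEW size
SURVIVOR_RATIO = 1000          # -XX:SurvivorRatio
CMS_INITIATING_OCCUPANCY = 40  # -XX:CMSInitiatingOccupancyFraction (percent)


def heap_layout(heap_mb, new_mb, survivor_ratio, cms_percent):
    """Split the heap into eden, survivor, and old generation sizes."""
    # Young generation = eden + 2 survivor spaces, with eden = ratio * survivor.
    survivor_mb = new_mb / (survivor_ratio + 2)
    eden_mb = new_mb - 2 * survivor_mb
    old_mb = heap_mb - new_mb
    # Old-generation occupancy at which a CMS cycle is started.
    cms_trigger_mb = old_mb * cms_percent / 100
    return {
        "eden_mb": eden_mb,
        "survivor_mb": survivor_mb,
        "old_mb": old_mb,
        "cms_trigger_mb": cms_trigger_mb,
    }


layout = heap_layout(HEAP_MB, NEW_MB, SURVIVOR_RATIO, CMS_INITIATING_OCCUPANCY)
for name, size_mb in layout.items():
    print(f"{name}: {size_mb:.1f} MB")
```

With these numbers each survivor space is only about 3.5 MB, and with MaxTenuringThreshold=0 anything that survives a young collection is promoted straight to the old generation, so the old generation (6740 MB here) would start a CMS cycle at roughly 2696 MB of occupancy.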
Results (statistics from OpsCenter 2.0 Community):
Read requests: 208 to 240 per second
Write requests: 18 to 28 per second
OS load: 24.5 to 25.85
Write request latency: 127 to 160 microseconds
Read request latency: 82202 to 94612 microseconds
OS network traffic sent: 44646 KB avg per second
OS network traffic received: 4338 KB avg per second
OS disk queue size: 13 to 15 requests
Read requests pending: 25 to 32
OS disk latency: 48 to 56 ms
Disk throughput: 6.6 Mbps
Disk read IOPS: 420 per second
IOWait: 80% CPU avg
Idle: 13% CPU avg
Row cache is disabled.
Column family: One of the column families, which I only read from, was created through the CLI:
create column family XColFam with column_type='Standard' and comparator = 'CompositeType(BytesType,IntegerType)';
Column family SSTable size = 7.10 GB, SSTable count = 2
XColFam has an estimated 59499904 row keys (most of them are UTF-8 literals of variable length; the estimate was obtained via mx4j tools), with thin columns whose values are 0 bytes for now.
Most rows should have very few columns, perhaps 1 to 10, with about 20-30 bytes for the 1st component of the column name and 8 bytes for the 2nd. The second component of the composite column can repeat, but the probability is low. The 1st component is repeated across rows, but the number of columns per row can vary.
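To put the row count and SSTable size side by side, here is a quick calculation of the average on-disk footprint per row. It assumes the 7.10 GB figure is in binary gigabytes (1024^3 bytes); the variable names are mine.

```python
# Average on-disk bytes per row for XColFam.
# Assumption: "7.10 GB" means 7.10 * 1024**3 bytes.

SSTABLE_BYTES = 7.10 * 1024**3   # reported SSTable size
ESTIMATED_ROWS = 59499904        # estimated row keys

bytes_per_row = SSTABLE_BYTES / ESTIMATED_ROWS
print(f"average bytes per row: {bytes_per_row:.1f}")
```

That is roughly 128 bytes per row on average, covering everything counted in the 7.10 GB figure (row key, column names, and storage overhead), which seems consistent with rows holding only a few thin columns.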
I tried SnappyCompression to compress the column family, but there was no change in size.
I have a scheduled service that runs 20 threads for hours and makes random read requests for multiple keys (currently 2 keys per request) against this column family, reading full rows rather than a single column or a slice.
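As a sanity check on the OpsCenter figures above, Little's law (requests in flight = throughput x latency) can be applied to the reported read rate and read latency. This is only a rough bracket: it assumes the throughput and latency ranges were measured over the same interval, and it pairs low with low and high with high. The helper name is hypothetical.

```python
# Little's law: average requests in flight = throughput * latency.
# Inputs are the read request rate and read latency reported by OpsCenter.

def requests_in_flight(reads_per_second, latency_microseconds):
    """Average number of concurrent read requests implied by rate and latency."""
    return reads_per_second * latency_microseconds / 1000000


low = requests_in_flight(208, 82202)
high = requests_in_flight(240, 94612)
print(f"implied concurrent reads: {low:.1f} to {high:.1f}")
```

The result, about 17 to 23 reads in flight, is in the same range as the 20 client threads, which suggests the read rate is bounded by per-request latency rather than by the number of requests the clients try to send.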
I think it is not performing well now, because it processes too few requests per minute. It used to perform better when the column family was smaller, at 3 to 4 GB.
I am afraid that read performance degrades too quickly as the column family grows.
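One possible reading of the numbers, offered only as a hypothesis: with 15 GB of total memory and a 10 GB heap, at most about 5 GB remains for the OS page cache (less in practice, since the OS and other processes also need memory). The sketch below compares that upper bound with the earlier and current column family sizes; the function name and the simple "fits or not" model are my own simplification.

```python
# Does the column family fit in the memory left over after the JVM heap?
# Simplification: everything outside the heap is treated as page cache.

TOTAL_RAM_GB = 15.0
HEAP_GB = 10.0


def fits_outside_heap(data_gb, total_gb=TOTAL_RAM_GB, heap_gb=HEAP_GB):
    """True if the data could be fully cached in memory not used by the heap."""
    return data_gb <= total_gb - heap_gb


for size_gb in (3.0, 4.0, 7.10):
    verdict = "fits" if fits_outside_heap(size_gb) else "does not fit"
    print(f"{size_gb:.2f} GB column family: {verdict} in {TOTAL_RAM_GB - HEAP_GB:.0f} GB")
```

Under this simplification the 3 to 4 GB column family could be cached entirely, while the 7.10 GB one cannot, which would be consistent with random reads now going to disk and with the high IOWait reported above.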
I also tried tuning some GC and memory settings, because before that I had a lot of GC activity and CPU usage. At that time the data size was smaller and there was very little iowait.
How can I improve Cassandra's read performance? Your suggestions will be appreciated.