Context
We have 6 Cassandra nodes hosted on AWS, spread across 3 regions with 2 nodes per region (2 in eu-west, 2 in us-west, 2 in ap-southeast).
Two days ago, we moved 2 of our EC2 Cassandra instances from us-west-1 to us-east-1. By "moved," I mean that we decommissioned them and added 2 new instances to our cluster.
We ran nodetool repair, which did nothing, and nodetool rebuild, which synchronized our data from the eu-west data center. After this change, we noticed that several instances in our Cassandra cluster are using more than 70% CPU and are receiving incoming traffic.
At first we thought it was replication, but given that we only have 500 MB of data and that the load is still ongoing, we are puzzled about what is happening.
Instances:
All our instances run on m3.medium, which means that we are on:
- 1 CPU, 2.5 GHz
- 3.75 GB RAM
- 4 GB SSD
We also mounted an EBS volume at /var/lib/cassandra, which is actually a RAID0 of 6 SSDs on EBS.
Link: Amazon instance types
Software Version:
Cassandra Version: 2.0.12
Thoughts:
After analyzing our data, we think this is caused by Cassandra compaction.
There is another question on Stack Overflow about the same issue: Cassandra compaction tasks are stuck.
SSD (Azure Premium Storage - ) RAID0 Cassandra, , , ( RAID0 , ?).
, AWS , . , , .
, , , , , EBS / 3 .
, 300-400 EBS, RAID0, 6 , = 1,8-2,4 /. ~ 450 , PER 3 . READ.
, , , CI- , , , Gossip .
nodetool status:
Datacenter: cassandra-eu-west-1-A
=================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                                       Rack
UN  xxx.xxx.xxx.xxx  539.5 MB   256     17.3%  12341234-1234-1234-1234-12341234123412340cd7  eu-west-1c
UN  xxx.xxx.xxx.xxx  539.8 MB   256     14.4%  30ff8d00-1ab6-4538-9c67-a49e9ad34672          eu-west-1b
Datacenter: cassandra-ap-southeast-1-A
======================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  585.13 MB  256     16.9%  a0c45f3f-8479-4046-b3c0-b2dd19f07b87  ap-southeast-1a
UN  xxx.xxx.xxx.xxx  588.66 MB  256     17.8%  b91c5863-e1e1-4cb6-b9c1-0f24a33b4baf  ap-southeast-1b
Datacenter: cassandra-us-east-1-A
=================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  545.56 MB  256     15.2%  ab049390-f5a1-49a9-bb58-b8402b0d99af  us-east-1d
UN  xxx.xxx.xxx.xxx  545.53 MB  256     18.3%  39c698ea-2793-4aa0-a28d-c286969febc4  us-east-1e
nodetool compactionstats:
pending tasks: 64
compaction type  keyspace  table         completed  total       unit   progress
Compaction       staging   stats_hourly  418858165  1295820033  bytes  32.32%
Active compaction remaining time : 0h00m52s
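As a sanity check on the compactionstats figures above, here is a minimal sketch that recomputes the progress and the remaining-time estimate. It assumes compaction_throughput_mb_per_sec is at its default of 16 (the actual setting is not shown in this post) and that nodetool derives the remaining time by dividing the remaining bytes by that throttle. Both are assumptions, and the function names are only illustrative.

```python
# Figures copied from the compactionstats output above.
completed_bytes = 418_858_165
total_bytes = 1_295_820_033

# Assumption: the compaction throttle is at its default value of 16 MB/s.
assumed_throughput_mb_per_sec = 16


def compaction_progress(completed, total):
    """Return the completed share of a compaction as a percentage."""
    return 100.0 * completed / total


def estimated_remaining_seconds(completed, total, throughput_mb_per_sec):
    """Estimate remaining seconds as remaining bytes divided by the throttle.

    Returns None when the throttle is disabled (0 or negative), since no
    estimate can be derived from it in that case.
    """
    if throughput_mb_per_sec <= 0:
        return None
    bytes_per_second = throughput_mb_per_sec * 1024 * 1024
    return (total - completed) // bytes_per_second


progress = compaction_progress(completed_bytes, total_bytes)
remaining = estimated_remaining_seconds(
    completed_bytes, total_bytes, assumed_throughput_mb_per_sec
)

print(f"progress: {progress:.2f}%")
print(f"estimated remaining: {remaining}s")
```

If those assumptions hold, the 52-second estimate follows from the throttle setting rather than from observed speed, and it covers only the single active compaction, not the 64 pending tasks.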
dstat: [screenshot]
( 300 16-):

EBS: [screenshot]
df -h:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/xvda1  33G   11G   21G    34%   /
none        4.0K  0     4.0K   0%    /sys/fs/cgroup
udev        1.9G  12K   1.9G   1%    /dev
tmpfs       377M  424K  377M   1%    /run
none        5.0M  0     5.0M   0%    /run/lock
none        1.9G  4.0K  1.9G   1%    /run/shm
none        100M  0     100M   0%    /run/user
/dev/xvdb   3.9G  8.1M  3.7G   1%    /mnt
/dev/md0    300G  2.5G  298G   1%    /var/lib/cassandra
nodetool tpstats:
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage           0       0        3191689    0        0
ReadStage               0       0        574633     0        0
RequestResponseStage    0       0        2698972    0        0
ReadRepairStage         0       0        2721       0        0
ReplicateOnWriteStage   0       0        0          0        0
MiscStage               0       0        62601      0        0
HintedHandoff           0       1        443        0        0
FlushWriter             0       0        88811      0        0
MemoryMeter             0       0        1472       0        0
GossipStage             0       0        979483     0        0
CacheCleanupExecutor    0       0        0          0        0
InternalResponseStage   0       0        25         0        0
CompactionExecutor      1       39       99881      0        0
ValidationExecutor      0       0        62599      0        0
MigrationStage          0       0        40         0        0
commitlog_archiver      0       0        0          0        0
AntiEntropyStage        0       0        149095     0        0
PendingRangeCalculator  0       0        23         0        0
MemtablePostFlusher     0       0        173847     0        0

Message type      Dropped
READ              0
RANGE_SLICE       0
_TRACE            0
MUTATION          0
COUNTER_MUTATION  0
BINARY            0
REQUEST_RESPONSE  0
PAGED_RANGE       0
READ_REPAIR       0
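To make the tpstats output easier to scan, here is a small sketch that lists only the thread pools with active, pending, or blocked tasks. The helper name busy_pools is hypothetical, and the sample text is a subset of the rows above; it is a reading aid, not part of any Cassandra tooling.

```python
# A subset of the tpstats rows above, pasted as plain text.
TPSTATS = """\
Pool Name Active Pending Completed Blocked All time blocked
MutationStage 0 0 3191689 0 0
ReadStage 0 0 574633 0 0
HintedHandoff 0 1 443 0 0
CompactionExecutor 1 39 99881 0 0
ValidationExecutor 0 0 62599 0 0
AntiEntropyStage 0 0 149095 0 0
"""


def busy_pools(text):
    """Return (name, active, pending, blocked) for pools that are not idle."""
    busy = []
    for line in text.splitlines():
        fields = line.split()
        # A pool row is a name followed by exactly five integer columns;
        # the header and any other lines are skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        name = fields[0]
        active, pending, _completed, blocked, _all_time_blocked = map(int, fields[1:])
        if active or pending or blocked:
            busy.append((name, active, pending, blocked))
    return busy


for name, active, pending, blocked in busy_pools(TPSTATS):
    print(f"{name}: active={active} pending={pending} blocked={blocked}")
```

On the rows shown, only HintedHandoff (1 pending) and CompactionExecutor (1 active, 39 pending) are flagged, and no messages are dropped, which fits the idea that compaction is where the work is piling up.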
iptraf: [screenshot]