Cassandra Works indefinitely - High CPU Usage

Context

We have 6 Cassandra instances hosted on AWS, spread across 3 regions with 2 per region (2 in eu-west, 2 in us-west, 2 in ap-southeast).

Two days ago, we moved 2 of our EC2 Cassandra instances from us-west-1 to us-east-1. By "moved," I mean that we decommissioned them and added 2 new instances to our cluster.

We ran nodetool repair, which did nothing, and nodetool rebuild, which synchronized our data from the eu-west data center. After this change, we noticed that several instances in our Cassandra cluster were using more than 70% CPU and receiving incoming traffic.

At first we thought it was replication, but given that we only have 500 MB of data and that it is still going on, we are puzzled by what is happening.


Instances:

All our instances run on m3.medium, which means that we are on:

  • 1 CPU, 2.5 GHz
  • 3.75 GB RAM
  • 4 GB SSD

We also mounted an EBS volume for /var/lib/cassandra, which is actually a RAID0 of 6 SSDs on EBS:

  • EBS 300 GB SSD, RAID0

Link: Amazon instance types


Software Version:

Cassandra Version: 2.0.12


Thoughts:

After analyzing our data, we thought it was due to Cassandra's data compaction.

There is another Stack Overflow question on the same issue: Cassandra compaction tasks stuck.

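However, that one was resolved by moving to a single SSD (Azure Premium Storage) with no RAID0 configured for Cassandra, and it is unclear why that would fix the underlying issue (why would removing RAID0 from the equation fix it?).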

, AWS , . , , .

We also see that the EBS volumes have been reading/writing a lot of data over the last 3 days.

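We see about 300-400 KB/s written to each EBS volume, and since we have a RAID0, that is 6 times this amount per instance = 1.8-2.4 MB/s. That is ~450 GB of data written PER instance over the last 3 days. The values for READ are about the same.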
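A minimal sketch of that arithmetic, assuming the per-volume write rate is 300-400 KB/s (the units are an assumption on my part) and decimal units throughout. The function name `written_gb` is just for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def written_gb(per_volume_kb_s, volumes=6, days=3):
    """Total data written per instance, in GB (decimal units).

    per_volume_kb_s: assumed write rate of one EBS volume, in KB/s.
    volumes: number of EBS volumes in the RAID0 array.
    """
    instance_kb_s = per_volume_kb_s * volumes        # RAID0 stripes across all volumes
    total_kb = instance_kb_s * SECONDS_PER_DAY * days
    return total_kb / 1_000_000                      # KB -> GB


low, high = written_gb(300), written_gb(400)
print(f"{low:.0f}-{high:.0f} GB written per instance over 3 days")
```

Under these assumptions the result is several hundred GB per instance, which is in the same range as the figure quoted above and roughly a thousand times the ~500 MB each node actually holds.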

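The only traffic we expect on these instances comes from our CI server and from the Gossip communication between the instances.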


nodetool status:

Datacenter: cassandra-eu-west-1-A
=================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx 539.5 MB   256     17.3%  12341234-1234-1234-1234-12341234123412340cd7  eu-west-1c
UN  xxx.xxx.xxx.xxx 539.8 MB   256     14.4%  30ff8d00-1ab6-4538-9c67-a49e9ad34672  eu-west-1b
Datacenter: cassandra-ap-southeast-1-A
======================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx 585.13 MB  256     16.9%  a0c45f3f-8479-4046-b3c0-b2dd19f07b87  ap-southeast-1a
UN  xxx.xxx.xxx.xxx 588.66 MB  256     17.8%  b91c5863-e1e1-4cb6-b9c1-0f24a33b4baf  ap-southeast-1b
Datacenter: cassandra-us-east-1-A
=================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx 545.56 MB  256     15.2%  ab049390-f5a1-49a9-bb58-b8402b0d99af  us-east-1d
UN  xxx.xxx.xxx.xxx 545.53 MB  256     18.3%  39c698ea-2793-4aa0-a28d-c286969febc4  us-east-1e

nodetool compactionstats:

pending tasks: 64
          compaction type        keyspace           table       completed           total      unit  progress
               Compaction         staging    stats_hourly       418858165      1295820033     bytes    32.32%
Active compaction remaining time :   0h00m52s
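As a rough sanity check on these numbers, the sketch below recomputes the progress and the remaining time from the byte counts shown. The 16 MB/s figure is an assumption: as far as I know it is the default for compaction_throughput_mb_per_sec in cassandra.yaml, and we have not confirmed what this cluster uses.

```python
completed = 418_858_165   # bytes, "completed" column above
total = 1_295_820_033     # bytes, "total" column above


def progress_pct(completed, total):
    """Percentage of the active compaction that is done."""
    return 100.0 * completed / total


def remaining_seconds(completed, total, throughput_mb_s=16):
    """Whole seconds left if compaction runs exactly at the throttle.

    throughput_mb_s=16 is an assumed throttle, not a value read from
    this cluster's configuration.
    """
    return (total - completed) // (throughput_mb_s * 1024 * 1024)


print(f"{progress_pct(completed, total):.2f}%")
print(f"{remaining_seconds(completed, total)}s")
```

If the assumption holds, the "remaining time" is derived from the throttle rather than from measured progress, and it covers only the one active compaction, not the 64 pending tasks.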

dstat:

dstat on unhealthy instance

( 300 16-):

Compaction history graph

EBS:

EBS Volume 1

EBS Volume 2

df -h:

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       33G   11G   21G  34% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            1.9G   12K  1.9G   1% /dev
tmpfs           377M  424K  377M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            1.9G  4.0K  1.9G   1% /run/shm
none            100M     0  100M   0% /run/user
/dev/xvdb       3.9G  8.1M  3.7G   1% /mnt
/dev/md0        300G  2.5G  298G   1% /var/lib/cassandra

nodetool tpstats:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        3191689         0                 0
ReadStage                         0         0         574633         0                 0
RequestResponseStage              0         0        2698972         0                 0
ReadRepairStage                   0         0           2721         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
MiscStage                         0         0          62601         0                 0
HintedHandoff                     0         1            443         0                 0
FlushWriter                       0         0          88811         0                 0
MemoryMeter                       0         0           1472         0                 0
GossipStage                       0         0         979483         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0             25         0                 0
CompactionExecutor                1        39          99881         0                 0
ValidationExecutor                0         0          62599         0                 0
MigrationStage                    0         0             40         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropyStage                  0         0         149095         0                 0
PendingRangeCalculator            0         0             23         0                 0
MemtablePostFlusher               0         0         173847         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
MUTATION                     0
COUNTER_MUTATION             0
BINARY                       0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
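To pick out the rows that matter in output like this, here is a small sketch that lists only the thread pools with active or pending tasks. The helper name `pools_with_backlog` is hypothetical, and the parsing assumes the column layout shown above (pool name followed by five integer columns).

```python
def pools_with_backlog(tpstats_text):
    """Return {pool_name: (active, pending)} for pools that have work queued."""
    backlog = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Pool rows are a name plus five integer columns; this skips the
        # header, blank lines, and the two-column "Dropped" table.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        name, active, pending = fields[0], int(fields[1]), int(fields[2])
        if active or pending:
            backlog[name] = (active, pending)
    return backlog


sample = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        3191689         0                 0
HintedHandoff                     0         1            443         0                 0
CompactionExecutor                1        39          99881         0                 0

Message type           Dropped
READ                         0
"""
print(pools_with_backlog(sample))
```

On the output above, only HintedHandoff and CompactionExecutor have a backlog, and nothing is blocked or dropped, which fits the suspicion that compaction is what keeps the CPU busy.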

iptraf, sorted by bytes:

iptraf sorted by bytes



, nodetool rebuild nodetool repair, , . , , .

eu-west us-east:

CPU Usage
