I ran a nodetool repair on one node. This node went down, and the following messages appeared in the log files:
INFO [STREAM-IN-/192.168.2.100] 2015-02-13 21:36:23,077 StreamResultFuture.java:180 - [Stream
INFO [STREAM-IN-/192.168.2.100] 2015-02-13 21:36:23,078 StreamResultFuture.java:212 - [Stream
INFO [STREAM-IN-/192.168.2.100] 2015-02-13 21:36:23,078 StreamingRepairTask.java:96 - [repair
INFO [AntiEntropyStage:1] 2015-02-13 21:38:52,795 RepairSession.java:237 - [repair
INFO [AntiEntropySessions:27] 2015-02-13 21:38:52,795 RepairSession.java:299 - [repair
INFO [AntiEntropySessions:27] 2015-02-13 21:38:52,795 RepairSession.java:260 - [repair
INFO [AntiEntropySessions:27] 2015-02-13 21:38:52,795 RepairJob.java:145 - [repair
WARN [StreamReceiveTask:74] 2015-02-13 21:41:58,544 CLibrary.java:231 - open(/user/jlor/apache-cassandra/data/data/data/repcode-398f26f0b11511e49faf195596ed1fd9, O_RDONLY) failed, errno (23).
WARN [STREAM-IN-/192.168.2.101] 2015-02-13 21:41:58,672 CLibrary.java:231 - open(/user/jlor/apache-cassandra/data/data/data/repcode-398f26f0b11511e49faf195596ed1fd9, O_RDONLY) failed, errno (23).
WARN [STREAM-IN-/192.168.2.101] 2015-02-13 21:41:58,871 CLibrary.java:231 - open(/user/jlor/apache-cassandra/data/data/data/repcode-398f26f0b11511e49faf195596ed1fd9, O_RDONLY) failed, errno (23).
ERROR [StreamReceiveTask:74] 2015-02-13 21:41:58,986 CassandraDaemon.java:153 - Exception in thread Thread[StreamReceiveTask:74,5,main]
org.apache.cassandra.io.FSWriteError: java.io.FileNotFoundException: /user/jlor/apache-cassandra/data/data/data/repcode-398f26f0b11511e49faf195596ed1fd9/data-repcode-tmp-ka-245139-TOC.txt (Too many open files in system)
at org.apache.cassandra.io.sstable.SSTable.appendTOC(SSTable.java:282) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.io.sstable.SSTableWriter.close(SSTableWriter.java:483) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:434) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:429) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:424) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:120) ~[apache-cassandra-2.1.2.jar:2.1.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_31]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_31]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_31]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_31]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]
Caused by: java.io.FileNotFoundException: /user/jlor/apache-cassandra/data/data/data/repcode-398f26f0b11511e49faf195596ed1fd9/data-repcode-tmp-ka-245139-TOC.txt (Too many open files in system)
at java.io.FileOutputStream.open(Native Method) ~[na:1.8.0_31]
at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[na:1.8.0_31]
at java.io.FileWriter.<init>(FileWriter.java:107) ~[na:1.8.0_31]
at org.apache.cassandra.io.sstable.SSTable.appendTOC(SSTable.java:276) ~[apache-cassandra-2.1.2.jar:2.1.2]
... 10 common frames omitted
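Note the exact wording, "Too many open files in system": errno 23 is ENFILE, i.e. the kernel-wide file table was exhausted, not just a per-process ulimit.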
We have a small cluster with 5 nodes, node0 through node4. I have one table with 3.4 billion rows, with a replication factor of 3. Here is the table definition:
CREATE TABLE data.repcode (
rep int,
type text,
code text,
yyyymm int,
trd int,
eq map<text, bigint>,
iq map<text, bigint>,
PRIMARY KEY ((rep, type, code), yyyymm, trd))
WITH CLUSTERING ORDER BY (yyyymm ASC, trd ASC)
AND bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
I am using Cassandra 2.1.2. I set the maximum open files limit for all my nodes to 200,000.
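Here is a small sketch of how I check both limits on a node (assuming Linux; /proc/sys/fs/file-nr is the standard procfs counter for the kernel-wide table):

import resource

# Per-process limit: what `ulimit -n` / RLIMIT_NOFILE controls.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("per-process open-file limit: soft=%d, hard=%d" % (soft, hard))

# Kernel-wide table: what ENFILE ("Too many open files in system") refers to.
# /proc/sys/fs/file-nr holds three numbers: allocated, free, and fs.file-max.
with open("/proc/sys/fs/file-nr") as f:
    allocated, free, maximum = (int(x) for x in f.read().split())
print("system-wide: %d of %d file handles in use" % (allocated, maximum))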
Before I issued the nodetool repair command, I counted the files in my data directories. Here is the count on each of my nodes before the crash:
node0: 27,099
node1: 27,187
node2: 36,131
node3: 26,635
node4: 26,371
Now after the crash:
node0: 946,555
node1: 973,531
node2: 844,211
node3: 1,024,147
node4: 1,971,772
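To put those numbers in perspective: each compressed ka-format SSTable in 2.1 consists of 8 component files (Data, Index, Filter, Summary, Statistics, CompressionInfo, TOC, Digest), so dividing the file counts by 8 gives a rough SSTable count per node. A back-of-the-envelope sketch, assuming the data directories contain only live SSTable components:

# Rough estimate: files per node divided by 8 components per ka-format SSTable
# (Data, Index, Filter, Summary, Statistics, CompressionInfo, TOC, Digest).
COMPONENTS_PER_SSTABLE = 8

before = {"node0": 27099, "node1": 27187, "node2": 36131,
          "node3": 26635, "node4": 26371}
after = {"node0": 946555, "node1": 973531, "node2": 844211,
         "node3": 1024147, "node4": 1971772}

for node in sorted(before):
    print("%s: ~%d -> ~%d sstables" % (
        node, before[node] // COMPONENTS_PER_SSTABLE,
        after[node] // COMPONENTS_PER_SSTABLE))

By that estimate, node4 alone went from about 3,300 SSTables to nearly 250,000.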
Why does a repair create so many files? Is it normal for the file count to explode like this on a unix system, or does it indicate a problem with my setup? Should I raise the open-files limit even further? And how can I recover my cluster from this state?