Cassandra bulk insert solution

I have a Java program running as a service; it needs to insert 50,000 rows per second (each row has 25 columns) into a Cassandra cluster.

My cluster contains 3 nodes; each node has 4 processor cores (a 2.4 GHz i5) and 4 GB of RAM.

I used the Hector API with multi-threaded batch inserts, but the throughput is lower than I expected (about 25,000 rows/s).

Does anyone have another solution for this? Does Cassandra support some form of native bulk insert (without going through Thrift)?
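For reference, a minimal sketch of the kind of batched Hector write each of my worker threads performs (the cluster, keyspace, and column family names are placeholders, and batchOfRowKeys / valueFor stand in for my own data):

```java
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// Connection setup (names and host are placeholders)
Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "node1:9160");
Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);
StringSerializer ser = StringSerializer.get();

// One Mutator per batch; each row carries its 25 columns
Mutator<String> mutator = HFactory.createMutator(keyspace, ser);
for (String rowKey : batchOfRowKeys) {              // batchOfRowKeys: the chunk this thread owns
    for (int c = 0; c < 25; c++) {
        mutator.addInsertion(rowKey, "MyColumnFamily",
                HFactory.createStringColumn("col" + c, valueFor(rowKey, c)));
    }
}
mutator.execute();                                  // sent as a single batch_mutate call
```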

+4
3 answers

Astyanax is a high-level Java client for Apache Cassandra. Apache Cassandra is a highly available, column-oriented database. Astyanax is currently in use at Netflix. Issues are generally fixed quickly and releases are made frequently.

https://github.com/Netflix/astyanax
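A minimal sketch of what a batched write with Astyanax could look like (cluster, keyspace, column family, and seed names below are placeholders; the exact builder calls may differ slightly between Astyanax versions):

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

// Context against the 3-node cluster (names and seeds are placeholders)
AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
        .forCluster("MyCluster")
        .forKeyspace("MyKeyspace")
        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
        .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyPool")
                .setPort(9160)
                .setMaxConnsPerHost(4)
                .setSeeds("node1:9160,node2:9160,node3:9160"))
        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
        .buildKeyspace(ThriftFamilyFactory.getInstance());
context.start();
Keyspace keyspace = context.getClient();

ColumnFamily<String, String> CF = new ColumnFamily<String, String>(
        "MyColumnFamily", StringSerializer.get(), StringSerializer.get());

// One MutationBatch can carry many rows and is sent as a single batch mutation
MutationBatch m = keyspace.prepareMutationBatch();
m.withRow(CF, "rowKey1")
 .putColumn("col1", "value1", null)   // null = no TTL
 .putColumn("col2", "value2", null);
m.execute();
```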

+1

I have had good luck creating SSTables and loading them directly. There is an sstableloader tool included in the distribution, as well as a JMX interface. You can create the SSTables using the SSTableSimpleUnsortedWriter class.

Details here.
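A rough sketch of writing SSTables with SSTableSimpleUnsortedWriter (keyspace, column family, and output directory are placeholders; the exact constructor varies between Cassandra versions):

```java
import java.io.File;

import org.apache.cassandra.db.marshal.AsciiType;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

// Directory layout must be <keyspace>/<column family> so sstableloader can pick it up
File directory = new File("/tmp/MyKeyspace/MyColumnFamily");
SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
        directory, new RandomPartitioner(),
        "MyKeyspace", "MyColumnFamily",
        AsciiType.instance, null,
        64);                                   // buffer size in MB before flushing to disk

long timestamp = System.currentTimeMillis() * 1000;  // microseconds, as Cassandra expects

writer.newRow(bytes("rowKey1"));
for (int c = 0; c < 25; c++) {
    writer.addColumn(bytes("col" + c), bytes("value" + c), timestamp);
}
writer.close();

// Then stream the generated files into the cluster, e.g.:
//   bin/sstableloader /tmp/MyKeyspace/MyColumnFamily
```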

+1

The fastest way to insert data into Cassandra is the sstableloader utility, shipped with Cassandra 0.8 and later. To use it you first need to create the SSTables, which is possible with SSTableSimpleUnsortedWriter. This is described here.

Another quick way is Cassandra's BulkOutputFormat for Hadoop. With it you can write a Hadoop job that loads data into Cassandra (a rough configuration sketch is below). See also: bulk load to Cassandra with Hadoop.
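A hedged sketch of the job-side configuration for BulkOutputFormat, assuming the reducer emits a ByteBuffer row key and a List<Mutation> of Thrift mutations; keyspace, column family, and host names are placeholders, and the helper classes are from the org.apache.cassandra.hadoop package:

```java
import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.hadoop.BulkOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = new Job(conf, "bulk-load-to-cassandra");

// Reducer output: row key + list of Thrift mutations for that row
job.setOutputKeyClass(ByteBuffer.class);
job.setOutputValueClass(List.class);
job.setOutputFormatClass(BulkOutputFormat.class);

// Where to stream the generated sstables (placeholders for this cluster)
ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "MyKeyspace", "MyColumnFamily");
ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "node1");
ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setOutputPartitioner(job.getConfiguration(),
        "org.apache.cassandra.dht.RandomPartitioner");
```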

+1

Source: https://habr.com/ru/post/1416313/

