Using Cassandra and CQL3, how do you insert an entire wide row with a single query?

I want to insert a single row with 50,000 columns into Cassandra 1.2.8. Before inserting, I have all of the data for the entire row ready in memory:

    +---------+------+------+------+-----+-------+
    |         |  0   |  1   |  2   | ... | 49999 |
    | row_id  +------+------+------+-----+-------+
    |         | text | text | text | ... | text  |
    +---------+------+------+------+-----+-------+

The column names are integers, which allows paging through the row. The column values are the text at that particular index.

CQL3 table definition:

    create table results (
        row_id text,
        index int,
        value text,
        primary key (row_id, index)
    ) with compact storage;

Since I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert this single row into Cassandra in one request/operation so that it is as fast as possible.

The only way I can find to do this is to execute the following 50,000 times:

 INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?); 

where the first ? is the index counter (i) and the second ? is the text value to store at index i.

This takes far too long. Even when we wrap the above INSERTs in a batch, it still takes too long.
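For illustration, "wrapping the INSERTs in a batch" means assembling one BEGIN ... APPLY BATCH statement covering every column. A minimal sketch that just builds that statement text in plain Java (the class and helper names are illustrative, not from the original post; real client code would bind the values as prepared-statement parameters instead of inlining them):

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class BatchBuilder {
    // Build one UNLOGGED BATCH statement containing an INSERT per
    // (index, value) pair of the row. Values are inlined with
    // single-quotes escaped, purely for illustration.
    static String buildBatch(String rowId, SortedMap<Integer, String> columns) {
        StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (Map.Entry<Integer, String> e : columns.entrySet()) {
            cql.append("  INSERT INTO results (row_id, index, value) VALUES ('")
               .append(rowId.replace("'", "''")).append("', ")
               .append(e.getKey()).append(", '")
               .append(e.getValue().replace("'", "''")).append("');\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> cols = new TreeMap<Integer, String>();
        for (int i = 0; i < 3; i++) cols.put(i, "text" + i);
        System.out.println(buildBatch("my_row_id", cols));
    }
}
```

Even expressed this way, the server still processes 50,000 individual INSERTs, which is why the batch attempt above does not help much.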

Since we have all of the data we need (the complete row) on hand, I would expect it to be straightforward to say "here, Cassandra, store this data as a single row in one request", for example:

    //EXAMPLE-BUT-INVALID CQL3 SYNTAX:
    insert into results (row_id, (index, value))
    values ((0, text0), (1, text1), (2, text2), ..., (N, textN));

This example is not possible with the current CQL3 syntax, but I hope it illustrates the desired effect: everything inserted as a single query.

Is it possible to do this in CQL3 and the DataStax Java driver? If not, am I instead forced to use Hector or the Astyanax driver and the Thrift batch_insert operation?

+6
cassandra cql3 datastax-java-driver
4 answers

Edit: Only 4 days after I posted this question regarding Cassandra 1.2.9, Cassandra 2.0 final was released. 2.0 supports batched prepared statements, which should be much faster than the non-batched CQL3 that had to be used for pre-2.0 C*. We have not yet tested this to be sure.

When this question was posted 4 days ago, on August 30, 2013, it was not possible in CQL3 for pre-2.0 versions of C*. It was only possible through a Thrift client, for example Astyanax's MutationBatch.

As Alex suggested, I filed CASSANDRA-5959 as a feature request, but it was marked as a duplicate of CASSANDRA-4693, which supposedly solves the problem for C* 2.0.
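One practical caveat with the batched prepared statements mentioned in the edit above: sending all 50,000 inserts as one giant batch can overwhelm the coordinator node, so a common approach is to split the row into fixed-size chunks and send one batch per chunk. A sketch of the chunking logic (the chunk size of 500 is an arbitrary assumption, not from this answer):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Split the index range [0, total) into consecutive [start, end)
    // chunks of at most chunkSize indices; each chunk would then be
    // sent as one batch of INSERTs.
    static List<int[]> chunks(int total, int chunkSize) {
        List<int[]> out = new ArrayList<int[]>();
        for (int start = 0; start < total; start += chunkSize) {
            out.add(new int[] { start, Math.min(start + chunkSize, total) });
        }
        return out;
    }

    public static void main(String[] args) {
        // 50,000 columns in chunks of 500 -> 100 batches
        System.out.println(chunks(50000, 500).size());
    }
}
```

The right chunk size depends on value sizes and cluster configuration, so it is worth benchmarking rather than taking 500 as given.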

+3

Multiple INSERTs/UPDATEs can be performed with the batch_mutate method in the Thrift API by building a mutation map.

    // Assumes a connected org.apache.cassandra.thrift.Cassandra.Client,
    // plus a row key and column family already chosen. The Thrift API
    // encodes keys, column names, and values as ByteBuffers.
    Column column = new Column();
    column.setName(ByteBufferUtil.bytes("0"));        // column name (the index)
    column.setValue(ByteBufferUtil.bytes("text0"));   // column value
    column.setTimestamp(System.currentTimeMillis() * 1000);

    Mutation mutation = new Mutation();
    mutation.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(column));

    List<Mutation> mutationList = new ArrayList<Mutation>();
    mutationList.add(mutation);

    Map<String, List<Mutation>> m = new HashMap<String, List<Mutation>>();
    m.put(columnFamily, mutationList);

    Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
            new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
    mutationMap.put(key, m);

    client.batch_mutate(mutationMap, ConsistencyLevel.ALL);
+3
  • The CQL3 INSERT statement does not support inserting multiple tuples of values at once. I do think this could be an interesting addition to CQL, though, so please file a feature request.

  • The DataStax Java driver is CQL-based, so there is nothing it can do if the statement is not supported by CQL.

  • For now, if you need this, the best option would be to use a Thrift-based library (nb: I am not familiar enough with the Thrift-based API to confirm that this insert is possible, but I believe it should be).

+2

Use the BATCH statement in CQL3 if you want to perform multiple inserts.

With C* 2.0 this will be even easier and faster, since it will support batched prepared statements.
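A minimal sketch of that BATCH form, using the table from the question (values shown inline for brevity; with a driver you would bind them as parameters):

```cql
BEGIN BATCH
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 0, 'text0');
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 1, 'text1');
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 2, 'text2');
APPLY BATCH;
```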

0
