Cassandra - unique string constraint

I would like to know whether it is possible in Cassandra to declare a unique constraint on the row key, something like SQL Server's ADD CONSTRAINT myConstrain UNIQUE (ROW_PK).

On an insert with an existing row key, the existing data should not be overwritten; instead I want to receive an exception, or some kind of response, saying that the update could not be performed because it violates the constraint.

Maybe there is a workaround for this problem - for example counters, whose updates do seem to be atomic.

+11
cassandra
6 answers

Cassandra - a unique constraint can be achieved with the primary key. Put all the columns that you want to be unique into the primary key; Cassandra does the rest on its own.

 CREATE TABLE users (
     firstname text,
     lastname text,
     age int,
     email text,
     city text,
     PRIMARY KEY (firstname, lastname)
 );

This means that Cassandra will not insert two different rows into this users table when the firstname and lastname match.
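
For illustration, with the table above, both of the following inserts target the same primary key, so they end up as a single row - the second write simply overwrites the first instead of being rejected (the values here are made up):

 INSERT INTO users (firstname, lastname, age, email, city)
 VALUES ('John', 'Doe', 30, 'john@example.com', 'Paris');

 INSERT INTO users (firstname, lastname, age, email, city)
 VALUES ('John', 'Doe', 31, 'john@other.com', 'Lyon');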

+1

Lightweight transactions?

http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html

INSERT INTO customer_account (customerID, customer_email) VALUES ('LauraS', 'lauras@gmail.com') IF NOT EXISTS;
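
If the row already exists, the conditional insert is simply not applied, and the result tells you so. Roughly what cqlsh reports (reusing the example above):

 -- first attempt: the row does not exist yet, so it is applied
 INSERT INTO customer_account (customerID, customer_email) VALUES ('LauraS', 'lauras@gmail.com') IF NOT EXISTS;
 --  [applied]
 -- -----------
 --       True

 -- second attempt with the same key: not applied, the existing row is returned
 INSERT INTO customer_account (customerID, customer_email) VALUES ('LauraS', 'lauras@gmail.com') IF NOT EXISTS;
 --  [applied] | customerid | customer_email
 -- -----------+------------+------------------
 --      False |     LauraS | lauras@gmail.com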

+12

Unfortunately, no, because Cassandra does not perform any checks on writes. To implement something like this, Cassandra would have to do a read before every write to check whether the write is allowed. That would significantly slow down writes. (The whole point is that writes are streamed out sequentially, without any disk reads - a read interrupts this pattern and forces seeks.)

I can't see how counters would help, either. Counters are not implemented with an atomic test-and-set. Instead, they essentially store many deltas that are added up when you read the counter value.
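
For reference, a counter column can only be changed by adding a delta to it; there is no conditional "set only if" operation, so it cannot serve as a test-and-set. A minimal sketch (the table and column names are made up):

 CREATE TABLE page_hits (
     page text PRIMARY KEY,
     hits counter
 );

 -- counter updates are always expressed as deltas:
 UPDATE page_hits SET hits = hits + 1 WHERE page = '/index';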

+9

Today I am feeling good, so I will not bash all the other posters for saying that it is not even remotely possible to build a lock with only a Cassandra cluster. I just implemented the Lamport bakery algorithm and it works just fine. No other strange things are needed, such as ZooKeeper, Cages, memtables, etc.

Instead, you can implement a poor man's multi-process / multi-computer locking mechanism, as long as you can do reads and writes with at least QUORUM consistency. That is all you really need to implement this algorithm correctly. (The QUORUM level may vary depending on the type of lock you need: local, rack, full network.)

My implementation will appear in version 0.4.7 of libQtCassandra (in C++). I have already tested it and it locks perfectly. There are a few more things I want to test, and I want to let you define a set of parameters that are currently hard-coded. But the mechanism works fine.

When I found this thread, I thought something was wrong. I searched a bit more and found the Apache wiki page that I mention below. The page is not very advanced, but their MoinMoin does not offer a discussion page... Anyway, I think it was worth mentioning. Hopefully people will start implementing this locking mechanism in all kinds of languages such as PHP, Ruby, Java, etc., so that it gets widely used and people know that it works.

Source: http://wiki.apache.org/cassandra/Locking

Wikipedia: http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm

The following is more or less the way I implemented my version. This is just a simplified synopsis. I may have to update it a bit more, because I made some improvements while testing the resulting code (the real code also uses RAII and includes the option to time out the lock via a TTL). The final version will be found in the libQtCassandra library.
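
To make the row/column layout used in the synopsis below a bit more concrete, here is a rough CQL-style sketch of it (the names are assumptions for illustration, not the actual libQtCassandra schema; the real code uses the older column-family interface):

 -- one partition per 'entering::<object>' or 'tickets::<object>' row,
 -- one cell per '<host>/<pid>' (or '<ticket>/<host>/<pid>') column,
 -- always read and written at QUORUM
 CREATE TABLE lock_table (
     lock_key text,     -- e.g. 'tickets::invoice_counter'
     cell_name text,    -- e.g. '3/2/12345' = ticket/host/pid
     cell_value int,
     PRIMARY KEY (lock_key, cell_name)
 );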

 // lock "object_name"
 void lock(QString object_name)
 {
     QString locks = context->lockTableName();
     QString hosts_key = context->lockHostsKey();
     QString host_name = context->lockHostName();
     int host = table[locks][hosts_key][host_name];
     pid_t pid = getpid();

     // get the next available ticket
     table[locks]["entering::" + object_name][host + "/" + pid] = true;
     int my_ticket(0);
     QCassandraCells tickets(table[locks]["tickets::" + object_name]);
     foreach(tickets as t)
     {
         // we assume that t.name is the column name
         // and t.value is its value
         if(t.value > my_ticket)
         {
             my_ticket = t.value;
         }
     }
     ++my_ticket; // add 1, since we want the next ticket
     table[locks]["tickets::" + object_name][my_ticket + "/" + host + "/" + pid] = 1;

     // not entering anymore, by deleting the cell we also release the row
     // once all the processes are done with that object_name
     table[locks]["entering::" + object_name].dropCell(host + "/" + pid);

     // here we wait on all the other processes still entering at this
     // point; if entering more or less at the same time we cannot
     // guarantee that their ticket number will be larger, it may instead
     // be equal; however, anyone entering later will always have a larger
     // ticket number so we won't have to wait for them they will have to wait
     // on us instead; note that we load the list of "entering" once;
     // then we just check whether the column still exists; it is enough
     QCassandraCells entering(table[locks]["entering::" + object_name]);
     foreach(entering as e)
     {
         while(table[locks]["entering::" + object_name].exists(e))
         {
             sleep();
         }
     }

     // now check whether any other process was there before us, if
     // so sleep a bit and try again; in our case we only need to check
     // for the processes registered for that one lock and not all the
     // processes (which could be 1 million on a large system!);
     // like with the entering vector we really only need to read the
     // list of tickets once and then check when they get deleted
     // (unfortunately we can only do a poll on this one too...);
     // we exit the foreach() loop once our ticket is proved to be the
     // smallest or no more tickets needs to be checked; when ticket
     // numbers are equal, then we use our host numbers, the smaller
     // is picked; when host numbers are equal (two processes on the
     // same host fighting for the lock), then we use the processes
     // pid since these are unique on a system, again the smallest wins.
     tickets = table[locks]["tickets::" + object_name];
     foreach(tickets as t)
     {
         // do we have a smaller ticket?
         // note: the t.host and t.pid come from the column key
         if(t.value > my_ticket
         || (t.value == my_ticket && t.host > host)
         || (t.value == my_ticket && t.host == host && t.pid >= pid))
         {
             // do not wait on larger tickets, just ignore them
             continue;
         }
         // not smaller, wait for the ticket to go away
         while(table[locks]["tickets::" + object_name].exists(t.name))
         {
             sleep();
         }
         // that ticket was released, we may have priority now
         // check the next ticket
     }
 }

 // unlock "object_name"
 void unlock(QString object_name)
 {
     // release our ticket
     QString locks = context->lockTableName();
     QString hosts_key = context->lockHostsKey();
     QString host_name = context->lockHostName();
     int host = table[locks][hosts_key][host_name];
     pid_t pid = getpid();
     table[locks]["tickets::" + object_name].dropCell(host + "/" + pid);
 }

 // sample process using the lock / unlock
 void SomeProcess(QString object_name)
 {
     while(true)
     {
         [...] // non-critical section...
         lock(object_name);
         // The critical section code goes here...
         unlock(object_name);
         // non-critical section...
         [...]
     }
 }

IMPORTANT NOTE (2019/05/05): Although implementing the Lamport bakery algorithm on top of Cassandra was a great exercise, it is an anti-pattern for a Cassandra database. This means it may not work well under heavy load. Since then, I have created a new locking system, still using the Lamport algorithm, but keeping all the data in memory (it is very small) and still allowing several computers to participate in the locking, so if one of them crashes, the locking system keeps working as expected (many other locking systems do not have that capability: when the master goes down, you lose the ability to lock until another computer decides to become the new master itself...)

+5

Obviously, you cannot. In Cassandra, every write is reflected in

  • Commit log
  • Memtable

in order to scale to millions of writes and to provide durability.

If we consider your case, before every write you would need to:

  • Check whether the key exists in the Memtable
  • Check whether the key exists in all SSTables [if the key has been flushed from the Memtable]

For case 2, even though Cassandra implements bloom filters, it would still add overhead: every write would also require a read.

On the other hand, your requirement could reduce compaction (merge) overhead in Cassandra, because at any point in time a key would exist in only one SSTable. But that would require changing Cassandra's architecture.

Just check out this video http://blip.tv/datastax/counters-in-cassandra-5497678 or download this presentation http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf to see how counters came into existence in Cassandra.

+3

Source: https://habr.com/ru/post/651184/

