Cassandra load balancing with an ordered partitioner?

Question

So, I see here that Cassandra does not have automatic load balancing, which becomes a problem when using the ordered partitioner (a certain common range of row key values will be stored on a relatively small number of machines, which will then serve most of the requests).
What is the best practice in developing a Cassandra data model?

I am still new to Cassandra and how it works. How could this problem be avoided so that range queries are still possible? I did not really understand the answers to the question linked above about adding a hash to the keys.

+6

cassandra

deepblue Nov 20 '09 at 1:35

3 answers

jbellis · Answer 1 · 2009-12-17 15:02

As mentioned in the other answer, Cassandra 0.5 supports semi-automatic load balancing: all you have to do is tell a node to load-balance, and it will automatically move to a busier position on the token ring.

This is described at http://wiki.apache.org/cassandra/Operations

MarkR · Answer 2 · 2009-11-20 12:18

I think this question is best asked on the cassandra-user mailing list; that is where the people are.

Cassandra does not have automatic load balancing yet, but it may arrive in the near future, possibly in the 0.5 branch.

Essentially, when you bootstrap a node into an already running system, it should find the place in the ring that will best balance the load and put itself there. If you add nodes one at a time (i.e., wait until one node finishes bootstrapping before adding another), this should work very well, provided that the distribution of keys does not change too much over time.

However, your key distribution may change over time (especially if the keys are time-based), so you may need a workaround.
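
As a rough illustration of "find the place in the ring that best balances the load", here is a small Python sketch. It models the ring as a sorted list of (token, load) pairs and places the new node at the midpoint of the most heavily loaded node's range. The integer token space, the name `pick_bootstrap_token`, and the midpoint rule are all assumptions made for illustration; this is not Cassandra's actual code, and splitting at the midpoint only halves the load if keys are spread evenly inside that range.

```python
def pick_bootstrap_token(ring, ring_size):
    """Pick a token for a new node (hypothetical helper, simplified model).

    ring: list of (token, load) pairs sorted by token. Each node is assumed
          to own the range (previous_token, token], wrapping around the ring.
    ring_size: number of distinct tokens in this toy integer token space.
    """
    # Find the node that currently carries the most load.
    busiest = max(range(len(ring)), key=lambda i: ring[i][1])
    token = ring[busiest][0]
    # Index -1 wraps to the last node, which precedes the first on the ring.
    previous = ring[busiest - 1][0]

    # Width of the busiest range, allowing for wraparound.
    width = (token - previous) % ring_size
    if width == 0:
        # A single node owns the whole ring.
        width = ring_size

    # Take over the first half of the busiest range.
    return (previous + width // 2) % ring_size


if __name__ == "__main__":
    ring = [(10, 5), (50, 80), (90, 5)]
    print(pick_bootstrap_token(ring, ring_size=100))
```

Adding nodes one at a time matters in this model too: the load figures have to be refreshed after each node joins, otherwise two new nodes would pick the same spot.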

It depends on what you want to range-scan. If you only need to range-scan PART of the key, you can hash the part that you do not want to range-scan and use that hash as the first part of the key.

I will use the term "partition" here to refer to the part of the key that you do not want to range-scan.

function makeWholeKey(partition, key) { return concat(make_hash(partition), partition, key); }

Now, if you want to range-scan the keys within a given partition, you can scan between makeWholeKey(p, start) and makeWholeKey(p, end).

But if you want to scan across partitions, you're out of luck.

However, you can give your nodes tokens that are evenly distributed over the whole range of make_hash() output, and you will get evenly distributed data (provided that you have ENOUGH partitions that they do not all bunch up on one or two hash values).
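
To make the idea concrete, here is a minimal Python sketch of the hash-prefixed key scheme. It is only an illustration under assumptions: `make_hash` uses the first 8 hex digits of MD5 as a stand-in hash, the ":" separator and the `range_scan` and `evenly_spaced_tokens` helpers are hypothetical, and a sorted Python list stands in for the key order kept by an order-preserving partitioner.

```python
import bisect
import hashlib


def make_hash(partition: str) -> str:
    # Stand-in hash (assumption): first 8 hex digits of MD5, fixed width
    # so that whole keys sort by hash first.
    return hashlib.md5(partition.encode("utf-8")).hexdigest()[:8]


def make_whole_key(partition: str, key: str) -> str:
    # The ":" separator is an assumption; it must not occur in the partition.
    return f"{make_hash(partition)}:{partition}:{key}"


def range_scan(sorted_keys, partition, start, end):
    """Return the stored keys of one partition between start and end, inclusive."""
    lo = bisect.bisect_left(sorted_keys, make_whole_key(partition, start))
    hi = bisect.bisect_right(sorted_keys, make_whole_key(partition, end))
    return sorted_keys[lo:hi]


def evenly_spaced_tokens(node_count: int, hex_digits: int = 8):
    """Tokens that split the make_hash() output range into equal slices."""
    space = 16 ** hex_digits
    return [format(i * space // node_count, f"0{hex_digits}x")
            for i in range(node_count)]


if __name__ == "__main__":
    dates = ["2009-11-01", "2009-11-15", "2009-12-01"]
    store = sorted(make_whole_key(p, d) for p in ["alice", "bob"] for d in dates)
    print(range_scan(store, "alice", "2009-11-01", "2009-11-30"))
    print(evenly_spaced_tokens(4))
```

All keys of one partition share the same prefix, so they stay contiguous and in order, which is what keeps the range scan within a partition working. Different partitions land at unrelated places in the key space, which is why a scan across partitions no longer makes sense.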

akshat thakar · Answer 3 · 2014-02-20 08:24

How data is distributed across the cluster is controlled by the partitioner setting in cassandra.yaml:

 partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Murmur3Partitioner computes a hash of each row key, so rows are spread evenly around the ring and the load is balanced.

With Cassandra 2.0, a single server can own multiple tokens (256), which also helps with load balancing. Using OrderPreservingPartitioner is not recommended.
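
The effect of hashing row keys and giving each server many tokens can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: MD5 from the standard library stands in for Murmur3, and the names `token_for`, `build_ring`, and `owner` are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def token_for(value: str) -> int:
    # Stand-in for the partitioner's hash (assumption: MD5, not Murmur3).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(servers, tokens_per_server=256):
    """Give every server many pseudo-random tokens and sort them into a ring."""
    return sorted(
        (token_for(f"{server}#{i}"), server)
        for server in servers
        for i in range(tokens_per_server)
    )


def owner(ring, tokens, row_key):
    """The first token at or after the key's hash owns the row (wrapping around)."""
    idx = bisect.bisect_left(tokens, token_for(row_key))
    return ring[idx % len(ring)][1]


if __name__ == "__main__":
    ring = build_ring(["node-a", "node-b", "node-c"])
    tokens = [t for t, _ in ring]
    # Sequential keys would pile up on one node under an ordered partitioner;
    # after hashing they scatter across all three.
    counts = Counter(owner(ring, tokens, f"row-{n:05d}") for n in range(10000))
    print(counts)
```

The price is the one already mentioned in this thread: once keys are placed by hash, neighbouring row keys no longer sit next to each other, so range scans over row keys are lost.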
