I think this question is best asked on the cassandra-user mailing list; that is where the people are.
Cassandra does not have automatic load balancing yet, but it may in the near future. It may already be in the 0.5 branch.
Essentially, when you bootstrap a node into an already running system, it should find the place in the ring that will best balance the load and put itself there. If you add nodes one at a time (i.e., wait for one node to finish bootstrapping before adding another), this should work very well, provided that the distribution of your keys does not change too much over time.
However, your keys may change over time (especially if they are time-based), so you may need a workaround.
It depends on what you want to range scan over. If you only need to range scan over PART of the key, you can hash the bit that you do not want to range scan and use that hash as the first part of the key.
I will use the term "partition" here to refer to the part of the key that you do not want to range scan.
function makeWholeKey(partition, key) {
    return concat(make_hash(partition), partition, key);
}
Now, if you want to range scan the keys within a given partition, you can range scan between makeWholeKey(p, start) and makeWholeKey(p, end).
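To make the key scheme concrete, here is a minimal runnable sketch in Node.js. It is only an illustration: the choice of MD5 truncated to 8 hex characters for make_hash(), the ":" separator, and the scanBounds() helper are all assumptions of mine, and the sketch assumes partition names never contain ":". It only builds the key strings; it does not talk to Cassandra.

```javascript
const crypto = require("crypto");

// Hypothetical make_hash(): the first 8 hex characters of the MD5 digest
// of the partition name. Any stable hash with fixed-width output would do.
function makeHash(partition) {
  return crypto.createHash("md5").update(partition, "utf8").digest("hex").slice(0, 8);
}

// The hash prefix comes first so that partitions are scattered around the ring.
// The partition name follows so that two partitions sharing a hash prefix
// still get distinct key ranges. The part you want to scan goes last.
function makeWholeKey(partition, key) {
  return makeHash(partition) + ":" + partition + ":" + key;
}

// Start and end keys for a range scan inside a single partition.
function scanBounds(partition, start, end) {
  return [makeWholeKey(partition, start), makeWholeKey(partition, end)];
}

const [lo, hi] = scanBounds("user42", "2009-11-01", "2009-11-30");
const inside = makeWholeKey("user42", "2009-11-20");
const other = makeWholeKey("user43", "2009-11-20");

console.log(lo <= inside && inside <= hi); // true: same partition, key within bounds
console.log(lo <= other && other <= hi);   // false: a different partition sorts elsewhere
```

Because both bounds share the same hash-plus-partition prefix, every key that sorts between them belongs to that one partition.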
But if you want to scan partitions, you're out of luck.
But you can give your nodes tokens that are evenly distributed over the entire range of make_hash() output, and you will get evenly distributed data (provided that you have ENOUGH partitions that they do not all clump together on one or two hash values).
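As a rough sketch of what evenly distributed tokens could look like, suppose make_hash() emits 32-bit values written as 8 lowercase hex characters (an assumption for illustration, not something Cassandra requires). The hypothetical evenTokens() helper below splits that range into equal slices; how the resulting strings are assigned to nodes depends on your Cassandra configuration and is not shown.

```javascript
// Evenly spaced tokens over an assumed 32-bit hash space whose values
// are written as 8 lowercase hex characters, zero-padded on the left.
function evenTokens(nodeCount) {
  const space = 2 ** 32;
  const tokens = [];
  for (let i = 0; i < nodeCount; i++) {
    // Node i starts at the i-th equal slice of the hash space.
    const value = Math.floor((i * space) / nodeCount);
    tokens.push(value.toString(16).padStart(8, "0"));
  }
  return tokens;
}

console.log(evenTokens(4).join(", "));
// 00000000, 40000000, 80000000, c0000000
```

With only a handful of partitions the hash prefixes can still land in one or two slices, which is why the number of partitions matters more than the exact token spacing.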
MarkR Nov 20 '09 at 12:18