A word of warning about HBase and other projects of this nature (I don't know anything about CouchDB; I think it is not a database at all, just a key-value store):
- HBase is not tuned for speed; it is tuned for scalability. If response speed is at all a concern, run some proofs of concept before committing to this path.
- HBase does not support joins. If you use ActiveRecord and have more than one association ... well, you can see where this is going.
The Hive project, also built on top of Hadoop, does support joins; so does Pig (but it's not really SQL). Point 1 applies to both. They are meant for heavy data-processing tasks, not for the type of processing you are likely to do with Rails.
If you want scalability for a web application, basically the only strategy that works is to partition your data and do as much as possible to keep the partitions isolated (so they don't need to talk to each other). This is a bit tricky with Rails, as it assumes by default that there is one central database. There may have been improvements on this front since I looked at the problem about a year and a half ago. If you can partition your data, you can scale horizontally fairly wide. A single MySQL machine can handle a few million rows (PostgreSQL can probably scale to more rows, but it might run a little slower).
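To make the partitioning idea concrete, here is a minimal Python sketch that sends each record to one partition based on a stable hash of its key. The shard names and the `shard_for` helper are hypothetical, and this is only one way to do it; plain modulo hashing like this moves most keys around when you add a shard, so treat it as an illustration rather than a recommendation.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database host.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(user_id, shards=SHARDS):
    """Pick a shard from a stable hash of the partition key.

    hashlib is used instead of the built-in hash() so that the mapping
    is the same in every process and on every machine.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for uid in (1, 2, 3, 42):
        print(uid, "->", shard_for(uid))
```

Everything that belongs to one user then lives on one shard, which is what keeps the partitions from having to talk to each other.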
Another strategy that works is a master-slave setup, where all writes are performed by the master and reads are shared among the slaves (and possibly the master). Obviously, this must be done quite carefully! Assuming a high read/write ratio, it can scale very well.
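The read/write split can be sketched roughly like this in Python. The `ReplicatedRouter` class and the host names are made up for illustration, and deciding by the first SQL keyword is a deliberate simplification; it is not how any particular library does it.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the master and spread reads over the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; fall back to the master if there are none.
        self._readers = itertools.cycle(list(replicas) or [master])

    def route(self, sql):
        """Return the host name that should run this statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._readers)
        # INSERT, UPDATE, DELETE, DDL and anything unrecognised go to the master.
        return self.master


if __name__ == "__main__":
    router = ReplicatedRouter("master", ["replica1", "replica2"])
    print(router.route("INSERT INTO users (name) VALUES ('a')"))
    print(router.route("SELECT * FROM users"))
    print(router.route("SELECT * FROM users"))
```

Part of the "carefully" is that replicas usually lag a little behind the master, so a read issued right after a write may not see it yet; reads that must be fresh are safer going to the master.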
If your organization has deep pockets, check out what Vertica, AsterData and Greenplum have to offer.