George, here is the presentation I gave at HBaseCon 2012 about understanding HBase schemas:
http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbasecon-2012.html
In short, each row in HBase is actually a key/value map, where you can have any number of columns (keys), each of which has a value. (And technically, each of those can have multiple values with different timestamps.)
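To make the "row is a key/value map" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the HBase client API: the `table` structure, the `put` and `get` helpers, and the sample data are all made up for illustration.

```python
from collections import defaultdict

# Toy model only (not HBase code): row key -> column name -> {timestamp: value}.
table = defaultdict(lambda: defaultdict(dict))

def put(row, column, value, ts):
    """Store one version of a value under the given row and column."""
    table[row][column][ts] = value

def get(row, column):
    """Return the value with the highest timestamp for an existing column."""
    versions = table[row][column]
    return versions[max(versions)]

# Rows do not have to share the same set of columns.
put("user1", "name", "George", ts=1)
put("user1", "email", "old@example.com", ts=1)
put("user1", "email", "new@example.com", ts=2)  # second version, same column
put("user2", "phone", "555-0100", ts=1)

latest_email = get("user1", "email")
```

Here `user1` and `user2` hold different columns, and `email` keeps both versions, with the newest one returned by `get`.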
In addition, "column families" let you keep multiple key/value maps in the same row, stored in different physical (on-disk) files. This helps when you have sets of values that are usually accessed separately from other sets, because there is less to read from disk. The trade-off, of course, is that reading all the values in a row takes more work if you split the columns across two column families, because it needs twice as many disk accesses.
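The column family trade-off can be sketched the same way. This is again a toy Python model under simplifying assumptions: one "file" per column family, and the family names, column names, values, and the `read` helper are hypothetical.

```python
# Toy model only: each column family is stored apart from the others,
# so a read touches one "file" per family it asks for.
row = {
    "profile": {"name": "George", "email": "g@example.com"},
    "activity": {"last_login": "2012-05-22", "clicks": "42"},
}

def read(row, families):
    """Return (columns, files_touched) for the requested families."""
    columns = {}
    for family in families:
        for qualifier, value in row[family].items():
            columns[f"{family}:{qualifier}"] = value
    return columns, len(families)

# Reading one set of values touches only that family's file...
profile_only, cost_one = read(row, ["profile"])
# ...while reading the whole row touches every family's file.
whole_row, cost_all = read(row, list(row))
```

In this model the profile-only read costs one access and the full-row read costs two, which is the split-versus-merge trade-off in miniature.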
Unlike more standard "column-oriented" databases, I have never heard of anyone creating an HBase table with a column family for every logical column. Column families carry overhead, and the general advice is to have no more than 3 or 4 of them. Column families are also design-time information: you must specify them when you create (or alter) the table.
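The difference between design-time families and ad hoc columns can be modeled like this. It is a hedged sketch, not HBase itself: `ToyTable` and its `put` method are hypothetical names, and raising `KeyError` is just this model's stand-in for a write being rejected.

```python
class ToyTable:
    """Toy model: families are fixed at creation, columns are not."""

    def __init__(self, families):
        self.families = frozenset(families)  # declared up front
        self.rows = {}

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

t = ToyTable(["profile"])
# A brand-new column inside an existing family needs no schema change.
t.put("user1", "profile", "nickname", "geo")

# A write to an undeclared family is rejected in this model.
try:
    t.put("user1", "stats", "clicks", "42")
    rejected = False
except KeyError:
    rejected = True
```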
As a rule, I consider column families an advanced design option, one to use only once you have a deep understanding of the HBase architecture and can show that it would be a net win.
So overall, while it is true that HBase can act in a "column-oriented" way, that is neither the default nor the most common design pattern in HBase. It is better to think of it as a row store with key/value maps.
Ian Varley