In general, I tend to use a separate column for each key.
1) Obviously, you are imposing a dependency on Avro/Thrift on every client, which is one more dependency to manage. That dependency also means you can lose the ability to use certain tools, such as BI tools, that expect to find plain values in the data without deserialization.
2) With the Avro/Thrift scheme, you are pretty much forced to pull the entire value over the wire. Depending on how much data is in a row, this may not matter. But if you are only interested in the "city" column qualifier, you still have to receive "payments", "credit-card info", etc. That can also be a security issue. (See the first sketch after this list.)
3) Updates, if required, will be more complex with Avro/Thrift. Example: you decide to add a 'hasIphone6' key. With Avro/Thrift, you are forced to rewrite the whole record with the new field added. With the column layout, you just write a new cell containing only the new column (see the second sketch after this list). For a single row that is no big deal, but if you have to do it for a billion rows, it becomes a large batch rewrite operation.
4) If configured, you can use compression in HBase, and it can beat Avro/Thrift serialization for space, since it compresses across a whole column family rather than just a single record. (See the last sketch after this list.)
5) BigTable implementations such as HBase handle very wide, sparse tables very well, so packing everything into one serialized blob will not buy you as much performance as you might expect.
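To make points 1) and 2) concrete, here is a minimal sketch using the plain HBase Java client. The table name 'users', column family 'attrs', the qualifier names, and the row key are all hypothetical. With one column per key, a Get can ask the server for just the "city" cell; with a single serialized blob, the whole value always comes back and the client needs the Avro/Thrift schema just to read one field:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table users = conn.getTable(TableName.valueOf("users"))) {

            byte[] cf = Bytes.toBytes("attrs");
            byte[] row = Bytes.toBytes("user123");

            // Column-per-key layout: fetch only the one cell we care about.
            Get narrow = new Get(row);
            narrow.addColumn(cf, Bytes.toBytes("city"));
            Result r = users.get(narrow);
            String city = Bytes.toString(r.getValue(cf, Bytes.toBytes("city")));
            System.out.println("city = " + city);

            // Blob layout: the entire serialized record crosses the wire,
            // including "payments", "credit-card info", etc., and the client
            // needs the Avro/Thrift schema just to pick out "city".
            Get blob = new Get(row);
            blob.addColumn(cf, Bytes.toBytes("blob"));
            byte[] wholeRecord = users.get(blob).getValue(cf, Bytes.toBytes("blob"));
            // ... deserialize wholeRecord with Avro/Thrift here ...
        }
    }
}
```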
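For point 3), here is a sketch of the update path under the same hypothetical schema: the column layout adds the new key with a single Put of one cell, while the blob layout forces a full read-modify-write of the record:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AddFieldSketch {
    // Column layout: the new key is one new cell; existing data is untouched.
    static void addHasIphone6(Table users, byte[] rowKey) throws IOException {
        Put put = new Put(rowKey);
        put.addColumn(Bytes.toBytes("attrs"), Bytes.toBytes("hasIphone6"),
                Bytes.toBytes("true"));
        users.put(put);
    }

    // Blob layout would instead require: Get the blob, deserialize it with the
    // old schema, add the field, reserialize, and Put the whole value back;
    // repeated over a billion rows, that becomes a large batch job.
}
```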
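And for point 4), compression in HBase is configured per column family, so it operates across all cells in the family. A sketch, again with the hypothetical 'users'/'attrs' names; SNAPPY is just an example and depends on the native libraries available on your cluster. This builds a fresh family descriptor for brevity; in real use you would start from the existing descriptor (ColumnFamilyDescriptorBuilder.newBuilder(existing)) so its other settings are preserved:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableCompressionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Compression is applied to the store files of the whole column
            // family, so it can exploit redundancy across many rows and
            // columns, not just within one serialized record.
            admin.modifyColumnFamily(TableName.valueOf("users"),
                    ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("attrs"))
                            .setCompressionType(Compression.Algorithm.SNAPPY)
                            .build());
        }
    }
}
```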