If a row has data in a column family, are all the columns for that row key stored in the same HFile? Or can data for the same row key and column family be spread across different HFiles? I assumed they would all be together because the data is sorted, but I read in a book:
"Data from the same column family for a single row does not have to be stored in the same HFile. The only requirement is that, inside an HFile, the data for a row's column family is stored together."
Why? Can a row be so big that it does not fit in a single HFile? This seems contradictory to me.
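A minimal sketch may help show why the book's statement is not a contradiction. It models each HFile as an immutable list of cells sorted by (row, family, qualifier, newest timestamp first); the cell layout and the helpers `make_hfile` and `get_row` are hypothetical and are not the HBase API. Each file is sorted on its own, so one row's cells in family `cf` are contiguous inside each file, yet the same row can still appear in several files. A read therefore merges the sorted runs (here with `heapq.merge`) and keeps the newest version of each qualifier.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical cell model (not HBase classes): (row, family, qualifier, timestamp, value)
Cell = Tuple[str, str, str, int, str]


def sort_key(cell: Cell):
    row, family, qualifier, ts, _ = cell
    # Order by row, family, qualifier, then newest timestamp first
    return (row, family, qualifier, -ts)


def make_hfile(cells: List[Cell]) -> List[Cell]:
    """A toy 'HFile': an immutable run of cells sorted by key."""
    return sorted(cells, key=sort_key)


def get_row(hfiles: List[List[Cell]], row: str, family: str) -> Dict[str, str]:
    """Merge the sorted runs of every file and keep the newest value per qualifier."""
    result: Dict[str, str] = {}
    for r, f, q, ts, v in heapq.merge(*hfiles, key=sort_key):
        if r == row and f == family and q not in result:
            result[q] = v  # first cell seen for q is the newest, thanks to -ts in the key
    return result


# First flush: row1 is written and flushed to one file.
hfile1 = make_hfile([("row1", "cf", "a", 1, "x"), ("row1", "cf", "b", 1, "y")])
# Later writes to the same row and family are flushed to a second file.
hfile2 = make_hfile([("row1", "cf", "b", 2, "y2"), ("row1", "cf", "c", 2, "z")])

print(get_row([hfile1, hfile2], "row1", "cf"))
```

In this model `b` resolves to `y2` because the second file holds the newer timestamp. Sortedness is a per-file guarantee, so a row does not have to be large for its data in one column family to end up in more than one HFile; it only has to be written across more than one flush.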
Note: I have read a little more about the topic. HBase uses an LSM tree. Suppose I have a row key with all its data in one HFile. Later I add new data for that row; it is first stored in memory (the MemStore), and when the MemStore is full, HBase flushes this data to a new HFile. That way, the qualifiers for one row can end up in two HFiles, and if I want to get or scan this row, HBase has to search both files. Over time, HBase performs a major compaction, which writes a single HFile combining the two old HFiles and removes the old ones afterwards. So, after that, finding this row needs only one search. Am I right?
I also do not understand why there are both minor and major compactions, since they seem to do the same thing.
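The difference between the two compaction types can be sketched with a toy model, assuming (roughly as the HBase Reference Guide describes it) that a minor compaction rewrites a few of the smaller HFiles into one and keeps delete markers and old versions, while a major compaction rewrites all HFiles of a store into one and can drop deleted cells, their delete markers, and superseded versions. Everything here is hypothetical: cells are (row, qualifier, timestamp, value) tuples, `None` stands for a delete marker, the family is assumed to keep a single version, and the "pick the smallest files" selection policy is made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical cell: (row, qualifier, timestamp, value); value None = delete marker
Cell = Tuple[str, str, int, Optional[str]]
HFile = List[Cell]


def compact(files: List[HFile], major: bool) -> HFile:
    """Merge several sorted files into one sorted file (toy model, not HBase code)."""
    cells = sorted((c for f in files for c in f), key=lambda c: (c[0], c[1], -c[2]))
    if not major:
        # Minor: just rewrite the chosen files as one; delete markers and old versions stay.
        return cells
    # Major: keep only the newest cell per (row, qualifier), assuming 1 version is kept,
    # and drop it entirely if that newest cell is a delete marker.
    out: HFile = []
    seen = set()
    for row, qual, ts, value in cells:
        if (row, qual) in seen:
            continue  # older version, superseded or masked by a newer cell
        seen.add((row, qual))
        if value is not None:
            out.append((row, qual, ts, value))
    return out


def minor_compaction(store: List[HFile], n: int = 2) -> List[HFile]:
    # Hypothetical policy: merge the n smallest files, leave the rest untouched.
    order = sorted(range(len(store)), key=lambda i: len(store[i]))
    chosen = set(order[:n])
    rest = [f for i, f in enumerate(store) if i not in chosen]
    return rest + [compact([store[i] for i in sorted(chosen)], major=False)]


def major_compaction(store: List[HFile]) -> List[HFile]:
    # Every file in the store is rewritten into exactly one file.
    return [compact(store, major=True)]


store: List[HFile] = [
    [("row1", "a", 1, "x"), ("row1", "b", 1, "y")],  # first flush
    [("row1", "b", 2, "y2")],                         # later flush: new version of b
    [("row1", "a", 3, None)],                         # later flush: delete marker for a
]

print(len(minor_compaction(store)))  # 3 files become 2; the delete marker survives
print(major_compaction(store))       # one file holding only the live cell for b
```

In this model both operations merge sorted files, which is why they look alike; the difference is scope (a few files versus all files of the store) and whether deleted or superseded cells are physically removed. After the major compaction a read of `row1` touches one file, although in real HBase it would also check the MemStore and any files flushed since.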
hbase
Guille