HBase MemStore and Garbage Collection

Question

I am new to HBase, but I have a working setup and some knowledge of HBase and Hadoop.

I have been studying the HBase MemStore, and all I understand about it is that "MemStore is the place in memory where HBase puts data that has to be written or read." So why is it that whenever and wherever we read about the MemStore, we also see a discussion of garbage collection?

My questions: is the purpose of the MemStore to hold data to be read and written in memory? Can we adjust the size of this memory to get a quicker response from HBase? And does the garbage collection configuration (collector configuration) affect the MemStore? In my opinion, the answer should be yes. :)

+4

memory-management hbase hadoop

khan May 15, '12 at 8:17

2 answers

The problem is that Java as a technology has trouble with servers that create and delete a lot of objects and, at the same time, must respond to every request in a timely manner. The root cause is the garbage collector, which sometimes has to do a so-called “stop the world” pause to clean up memory. With large heaps, this can cause delays of several seconds.
Now let's look at why this happens in HBase and why HBase has to respond in a timely manner:
The MemStore is a cache of a region's data. If the data is very volatile, many objects are created and deleted. As a result, there is a lot of pressure on the GC (garbage collector).
HBase, like any real-time system working with large data sets, tries to cache as much as possible, so its MemStores are large.
HBase region servers must communicate with ZooKeeper in a timely manner to report that they are alive and to avoid having their regions migrated. A long GC pause can prevent this.
What Cloudera did was implement a dedicated memory management mechanism specifically for the MemStore to avoid GC pauses. Lars George, in his book, describes how to tune the GC so that it works better with the region server.
http://books.google.co.il/books?id=Ytbs4fLHDakC&pg=PA419&lpg=PA419&dq=MemStore+garbage+collector+HBASE&source=bl&ots=b-Sk-HV22E&sig=tFddqrJtlE_nIUI3VDMEyHdgx6o&hl=iw&sa=X&ei=79CyT82BIM_48QO_26ykCQ&ved=0CHUQ6AEwCQ#v=onepage&q=MemStore%20garbage%20collector%20HBASE&f=false

0

David Gruzman May 16 '12 at 5:35

Avkashchauhan · Accepted Answer · 2012-05-16T05:35:24+0000

You are right about the HBase MemStore. In general, when something is written to HBase, it is first written to an in-memory store (the memstore); once this memstore reaches a certain size *, it is flushed to disk into a store file (everything is also written immediately to a log file for durability).
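To make that write path concrete, here is a minimal toy model in Java. It is only a sketch of the idea, not HBase code: the class ToyStore, its methods, and the byte-counting rule are all made up for illustration, and the "log" and "store files" are plain in-memory lists standing in for files on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the write path; none of these names are real HBase classes.
final class ToyStore {
    private final long flushThresholdBytes;                      // hypothetical flush size
    private final List<String> log = new ArrayList<>();          // stands in for the log file
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // stands in for files on disk
    private TreeMap<String, String> memstore = new TreeMap<>();  // sorted in-memory buffer
    private long memstoreBytes = 0;

    ToyStore(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    void put(String key, String value) {
        log.add(key + "=" + value);                     // 1. record the edit in the log for durability
        memstore.put(key, value);                       // 2. apply the edit to the in-memory store
        memstoreBytes += key.length() + value.length(); // rough size estimate, assumed for the sketch
        if (memstoreBytes >= flushThresholdBytes) {
            flush();                                    // 3. flush once the size threshold is reached
        }
    }

    String get(String key) {
        String value = memstore.get(key);               // newest data lives in the memstore
        for (int i = storeFiles.size() - 1; value == null && i >= 0; i--) {
            value = storeFiles.get(i).get(key);         // then check flushed files, newest first
        }
        return value;
    }

    private void flush() {
        storeFiles.add(memstore);                       // the full buffer becomes an immutable "file"
        memstore = new TreeMap<>();                     // writes continue into a fresh, empty buffer
        memstoreBytes = 0;
    }

    int storeFileCount() { return storeFiles.size(); }
    int memstoreEntries() { return memstore.size(); }
    int logEntries() { return log.size(); }

    public static void main(String[] args) {
        ToyStore store = new ToyStore(10);
        store.put("row1", "aaaa");
        store.put("row2", "bbbbb");
        System.out.println("store files: " + store.storeFileCount());
        System.out.println("row1 = " + store.get("row1"));
    }
}
```

Note how a flush throws away the whole in-memory buffer at once. Every flush turns a large number of long-lived objects into garbage, which is one reason the memstore and the garbage collector are discussed together.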

* From a global point of view, HBase uses 40% of the heap by default (see the hbase.regionserver.global.memstore.upperLimit property) for all memstores of all regions of all column families of all tables. If this limit is reached, it starts flushing some memstores until the memory used by memstores drops below 35% of the heap (lowerLimit property). Both values are configurable, but you need to do the calculation carefully before changing them.
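As a rough illustration of what the 40% and 35% defaults quoted above mean in bytes, here is a small Java calculation. The class and method names are hypothetical (this is not an HBase API), and the 8 GiB heap in main is just an assumed example value.

```java
// Illustrative arithmetic only; the names below are not part of HBase.
final class MemStoreWatermarks {
    static final int UPPER_PERCENT = 40; // default upperLimit quoted above
    static final int LOWER_PERCENT = 35; // default lowerLimit quoted above

    // Heap usage by all memstores at which forced flushing starts.
    static long upperBytes(long heapBytes) {
        return heapBytes * UPPER_PERCENT / 100;
    }

    // Heap usage that forced flushing tries to get back down to.
    static long lowerBytes(long heapBytes) {
        return heapBytes * LOWER_PERCENT / 100;
    }

    // How much must be flushed once the upper limit is hit; 0 if it is not hit.
    static long bytesToFlush(long heapBytes, long memstoreBytes) {
        if (memstoreBytes < upperBytes(heapBytes)) {
            return 0;
        }
        return memstoreBytes - lowerBytes(heapBytes);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // assumed 8 GiB region server heap
        long mib = 1024 * 1024;
        System.out.println("forced flushing starts at " + upperBytes(heap) / mib + " MiB");
        System.out.println("and continues down to " + lowerBytes(heap) / mib + " MiB");
    }
}
```

The gap between the two limits is the amount of data that has to be written out in one burst, so it is worth working through this arithmetic for your own heap size before touching either property.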

Yes, the GC does have an effect on the memstore, and you can actually change this behavior by using MemStore-Local Allocation Buffers. I would advise you to read the three-part article series "Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers", starting here: http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
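The general idea behind a local allocation buffer can be sketched in a few lines of Java: instead of keeping every small value as its own heap object, the bytes are copied into large fixed-size chunks, so that dropping the buffer frees a few big blocks rather than many scattered small ones. This is a simplified illustration of the technique only; the class ToyLocalAllocationBuffer, its Slice type, and the tiny chunk size are assumptions for the example and do not reflect the actual HBase implementation.

```java
// Simplified sketch of chunk-based allocation; not the real HBase code.
final class ToyLocalAllocationBuffer {
    // Location of a copied value: which chunk it is in, and where.
    static final class Slice {
        final byte[] chunk;
        final int offset;
        final int length;

        Slice(byte[] chunk, int offset, int length) {
            this.chunk = chunk;
            this.offset = offset;
            this.length = length;
        }
    }

    private final int chunkSize;
    private byte[] currentChunk;
    private int nextFreeOffset;
    private int chunksAllocated;

    ToyLocalAllocationBuffer(int chunkSize) {
        this.chunkSize = chunkSize;
        startNewChunk();
    }

    // Copies data into the current chunk, starting a new chunk if it does not fit.
    Slice copyIn(byte[] data) {
        if (data.length > chunkSize) {
            throw new IllegalArgumentException("value larger than a chunk");
        }
        if (nextFreeOffset + data.length > chunkSize) {
            startNewChunk();
        }
        System.arraycopy(data, 0, currentChunk, nextFreeOffset, data.length);
        Slice slice = new Slice(currentChunk, nextFreeOffset, data.length);
        nextFreeOffset += data.length; // bump the pointer past the copied bytes
        return slice;
    }

    private void startNewChunk() {
        currentChunk = new byte[chunkSize];
        nextFreeOffset = 0;
        chunksAllocated++;
    }

    int chunksAllocated() { return chunksAllocated; }

    public static void main(String[] args) {
        ToyLocalAllocationBuffer buffer = new ToyLocalAllocationBuffer(8); // tiny chunk for the demo
        buffer.copyIn(new byte[] {1, 2, 3});
        buffer.copyIn(new byte[] {4, 5, 6});
        buffer.copyIn(new byte[] {7, 8, 9});
        System.out.println("chunks allocated: " + buffer.chunksAllocated());
    }
}
```

The original byte arrays passed to copyIn become short-lived garbage right away, which is cheap for the collector, while the long-lived data sits in a small number of large, uniformly sized arrays.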
