My situation is this: I have a 20-node Hadoop/HBase cluster with 3 ZooKeeper nodes. I process a lot of data from HBase tables into other HBase tables through MapReduce.
Now, if I create a new table and tell a job to use this table as its output sink, all of its data goes to a single region server. That would not surprise me if there were only a few regions. But one particular table has about 450 regions, and this is where the problem shows: most of these regions (about 80%) are on the same region server!
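To put a number on the skew, a tally like the following could be used. This is only a minimal sketch: it assumes the region-to-server assignments have already been exported as plain `regionName=serverName` strings (for example, copied by hand from the master web UI). The class `RegionSkew` and its methods are hypothetical names, the sample data is made up, and no HBase API is used.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies how many regions each region server holds.
// Input is plain "regionName=serverName" strings; no HBase API is used here.
class RegionSkew {

    // Counts regions per server from entries of the form "regionName=serverName".
    static Map<String, Integer> countPerServer(List<String> assignments) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String entry : assignments) {
            int sep = entry.lastIndexOf('=');
            if (sep < 0) {
                continue; // skip malformed entries
            }
            String server = entry.substring(sep + 1).trim();
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the fraction of all regions held by the busiest server (0.0 if empty).
    static double maxShare(Map<String, Integer> counts) {
        int total = 0;
        int max = 0;
        for (int c : counts.values()) {
            total += c;
            max = Math.max(max, c);
        }
        return total == 0 ? 0.0 : (double) max / total;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the real cluster.
        List<String> sample = List.of(
            "region-a=rs1", "region-b=rs1", "region-c=rs1",
            "region-d=rs1", "region-e=rs2");
        Map<String, Integer> counts = countPerServer(sample);
        System.out.println(counts + " max share: " + maxShare(counts));
    }
}
```

A maximum share far above 1/20 on a 20-node cluster would indicate the kind of imbalance described here.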
I am now wondering how HBase distributes newly created regions across the cluster, and whether this behavior is normal and intended or a bug. Unfortunately, I don't know where to start looking for an error in my code.
I ask because this makes the jobs incredibly slow. Only once the jobs have fully completed does the table get balanced across the cluster, but that does not explain this behavior. Shouldn't HBase distribute new regions to different servers at the time they are created?
Thanks for your input!