Increase the number of Hive mappers in Hadoop 2

I created an HBase table from Hive, and I'm trying to run a simple aggregation on it. This is my Hive query:

from my_hbase_table 
select col1, count(1) 
group by col1;

The map reduce job spawns only 2 mappers, and I would like to increase that. For a plain map reduce job I would adjust the YARN and mapper memory settings to increase the number of mappers. I tried the following in Hive, but it did not work:

set yarn.nodemanager.resource.cpu-vcores=16;
set yarn.nodemanager.resource.memory-mb=32768;
set mapreduce.map.cpu.vcores=1;
set mapreduce.map.memory.mb=2048;

NOTE:

  • My test cluster has only 2 nodes
  • HBase table has more than 5 million records
  • Job logs show HiveInputFormat and number of splits = 2
+4
3 answers

The memory settings you tried do not control the mapper count; the number of mappers is driven by how the input is split. Splitting the input into more pieces is what increases it.

Try the following settings:

set hive.merge.mapfiles=false;

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

set mapred.map.tasks = XX;

If you want to change the number of reducers as well, set:

set mapred.reduce.tasks = XX;

Note that in Hadoop 2 (YARN), mapred.map.tasks and mapred.reduce.tasks are deprecated and replaced by:

mapred.map.tasks     -->    mapreduce.job.maps
mapred.reduce.tasks  -->    mapreduce.job.reduces
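
If your cluster rejects the old names, the same hints can be passed with the Hadoop 2 properties. This is only a sketch; the values are placeholders, and these properties are hints that the framework may override once the actual splits are computed:

-- Hadoop 2 (YARN) equivalents of the deprecated names above;
-- 10 and 4 are illustrative placeholder values, not recommendations
set mapreduce.job.maps=10;
set mapreduce.job.reduces=4;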

The following link may also be useful:

http://answers.mapr.com/questions/5336/limit-mappers-and-reducers-for-specific-job.html

How does Hive decide the number of mappers?

The number of mappers is determined by the number of input splits, which are computed by the InputFormat used by the MapReduce job. The InputFormat decides how the input data is divided.

For a plain InputFormat reading HDFS files, the split count depends on the number of files and their sizes. If the HDFS block size is 64 MB (the default) and a file is 100 MB, it occupies 2 blocks (⌈100/64⌉ = 2), so 2 mappers get assigned.

But if you have 2 files of 30 MB each, every file occupies its own block, and a mapper gets assigned per file.

By default Hive uses CombineHiveInputFormat. In MapReduce terms it ultimately translates to CombineFileInputFormat, which creates virtual splits over multiple files, grouped by common node or rack where possible. The size of the combined split is determined by

mapred.max.split.size
or
mapreduce.input.fileinputformat.split.maxsize (in YARN/MR2).

If you want more splits (and therefore more mappers), set this value lower; for fewer splits, set it higher.
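
As a rough sketch (the 32 MB figure is an arbitrary example, not a recommendation), lowering the split size for the current Hive session could look like this:

-- cap each input split at 32 MB (33554432 bytes) so more splits, hence more mappers, are created;
-- the value is an assumption for illustration only, tune it for your data
set mapred.max.split.size=33554432;
set mapreduce.input.fileinputformat.split.maxsize=33554432;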

Hope this helps you understand how the number of mappers is decided in Hadoop.

+14

Decrease the input split size from its default value and the number of mappers will increase.

SET mapreduce.input.fileinputformat.split.maxsize;
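
Note that in the Hive CLI a SET with no value, as above, only prints the current setting; to change it you assign a value. A small sketch, where the 16 MB figure is just an example:

-- print the current maximum split size
SET mapreduce.input.fileinputformat.split.maxsize;
-- lower it (illustrative value: 16 MB) so more splits, and therefore more mappers, are created
SET mapreduce.input.fileinputformat.split.maxsize=16777216;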

+3

Splitting the HBase table into more regions should make your job use more mappers automatically.

Since you have 2 splits, each split is read by one mapper. Try to increase the number of splits.

+1