Failed to increase Hive map tasks?

I have a Hive managed table containing only one 150 MB file. When I run "select count(*) from tbl" on it, it uses 2 mappers. I want to set that to a larger number.

At first I tried 'set mapred.max.split.size = 8388608;', hoping it would use 19 maps (150 MB in 8 MB splits). But it uses only 3; somehow it still splits the input into 64 MB chunks. I also tried 'set dfs.block.size = 8388608;', which did not work either.
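As a sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes the file is exactly 150 MiB and that the number of maps is simply the file size divided by the split size, rounded up; the helper name expected_splits is hypothetical, not a Hadoop or Hive API.

```python
import math

MB = 1024 * 1024

def expected_splits(file_bytes: int, split_bytes: int) -> int:
    """Rough number of map tasks: file size / split size, rounded up."""
    return math.ceil(file_bytes / split_bytes)

file_bytes = 150 * MB  # assumption: the table's single file is exactly 150 MiB

print(expected_splits(file_bytes, 8388608))   # 8 MB splits  -> 19
print(expected_splits(file_bytes, 64 * MB))   # 64 MB splits -> 3
```

So 19 is what an 8 MB split size should give, and 3 is what a 64 MB split size gives.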

Then I tried a vanilla MapReduce job to do the same thing. At first it uses 3 maps, and when I set mapred.max.split.size, it uses 19. So the problem is in Hive, I suppose.

I read some of the Hive source code, such as CombineHiveInputFormat and ExecDriver, but couldn't find any clues.

What other settings can I use?

2 answers

I combined @javadba's answer with what I got from the Hive mailing list, and here is the solution:

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapred.map.tasks = 20;
select count(*) from dw_stage.st_dw_marketing_touch_pi_metrics_basic;

From the mailing list:

HIVE seems to use the old Hadoop MapReduce API, so mapred.max.split.size will not work.

I will delve into the source code later.
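For what it's worth, here is a rough sketch of why mapred.map.tasks can matter under the old API. To my knowledge, the old org.apache.hadoop.mapred.FileInputFormat treats mapred.map.tasks as a hint: it derives a goal size (total bytes / hint) and uses max(minSize, min(goalSize, blockSize)) as the split size, with a 1.1 slop factor on the last split. The Python below only imitates that logic for a single file; the function name, the 64 MB block size, and the exact 150 MiB file size are assumptions for illustration, not Hive code.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split size

def old_api_split_count(total_bytes: int,
                        num_splits_hint: int,
                        block_bytes: int = 64 * MB,
                        min_split_bytes: int = 1) -> int:
    """Imitate old-API split computation for one file (illustrative only)."""
    # goalSize = totalSize / numSplits, where numSplits comes from mapred.map.tasks
    goal = total_bytes // max(num_splits_hint, 1)
    # splitSize = max(minSize, min(goalSize, blockSize))
    split = max(min_split_bytes, min(goal, block_bytes))

    count = 0
    remaining = total_bytes
    while remaining / split > SPLIT_SLOP:
        count += 1
        remaining -= split
    if remaining > 0:
        count += 1  # whatever is left becomes the final split
    return count

total = 150 * MB  # assumption: exactly 150 MiB
print(old_api_split_count(total, 20))  # hint of 20 -> 20
print(old_api_split_count(total, 2))   # hint of 2  -> 3 (capped by block size)
```

Under these assumptions, a hint of 20 shrinks the goal size to 7.5 MB, below the block size, which would explain the larger number of maps. A small hint leaves the 64 MB block size in charge.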


Try adding the following:

set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
