Failed to increase Hive mapper tasks?

Question

I have a Hive managed table that contains only one 150 MB file. When I run "select count(*) from tbl" against it, it uses 2 mappers. I want to set this to a larger number.

First I tried 'set mapred.max.split.size=8388608;' (8 MB), hoping it would use 19 mappers. But it uses only 3. Somehow it still splits the input into 64 MB chunks. I also tried 'set dfs.block.size=8388608;', which did not work either.

Then I tried a vanilla MapReduce job that does the same thing. Initially it uses 3 mappers, and when I set mapred.max.split.size, it uses 19. So the problem lies in Hive, I suppose.
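For reference, the 3 and 19 figures match the split arithmetic that Hadoop's new-API FileInputFormat is generally understood to use: the split size is max(minSize, min(maxSize, blockSize)), and a final remainder of up to 10% over the split size is folded into the last split. The sketch below is plain Python, not Hadoop code; the function names are made up for illustration, and it assumes the file is exactly 150 MB with a 64 MB block size.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # a tail up to 10% over the split size stays in the last split


def new_api_split_size(min_size, max_size, block_size):
    """Split size as computed by the new (mapreduce) API, to my understanding."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Count the splits produced for a single file of file_size bytes."""
    remaining = file_size
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1  # the final, possibly shorter, split
    return splits


file_size = 150 * MB   # assumption: the file is exactly 150 MB
block_size = 64 * MB   # the 64 MB chunk size observed in the question

# No max split size set: the block size wins.
default_size = new_api_split_size(1, 2**63 - 1, block_size)
# mapred.max.split.size=8388608 caps the split size at 8 MB.
capped_size = new_api_split_size(1, 8388608, block_size)

print(count_splits(file_size, default_size))  # 3
print(count_splits(file_size, capped_size))   # 19
```

So 19 mappers is the expected result for any job that actually honours mapred.max.split.size.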

I read some of the Hive source code, such as CombineHiveInputFormat and ExecDriver, but could not find a clue.

What other settings can I use?

hadoop hive

Ji zhang Dec 28 '13 at 16:26

2 answers

Try adding the following:

set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

javadba Jan 2 '14 at 15:00

Ji zhang · Accepted Answer · 2014-01-03T09:06:57+0000

I combined @javadba's answer with what I learned from the Hive mailing list. Here is the solution:

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapred.map.tasks = 20;
select count(*) from dw_stage.st_dw_marketing_touch_pi_metrics_basic;

From the mailing list:

HIVE seems to use the old Hadoop MapReduce API, so mapred.max.split.size will not work.
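If the mailing-list explanation is right, the effect of mapred.map.tasks can be sketched as follows. In the old (mapred) API, FileInputFormat, as far as I understand it, derives a goal size from the requested number of map tasks and computes the split size as max(minSize, min(goalSize, blockSize)); a maximum split size never enters the formula. The code is plain Python for illustration, not Hadoop code; the function names are hypothetical, and the exact 150 MB file size and 64 MB block size are assumptions taken from the question.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # a tail up to 10% over the split size stays in the last split


def old_api_split_size(total_size, num_maps_hint, min_size, block_size):
    """Split size in the old (mapred) API: driven by the requested map count."""
    goal_size = total_size // max(num_maps_hint, 1)
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, split_size):
    """Count the splits produced for a single file of file_size bytes."""
    remaining = file_size
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1  # the final, possibly shorter, split
    return splits


file_size = 150 * MB   # assumption: the file is exactly 150 MB
block_size = 64 * MB   # the 64 MB chunk size observed in the question

# A small hint (2 here, purely illustrative): the goal size exceeds the
# block size, so the block size wins.
few = old_api_split_size(file_size, 2, 1, block_size)
# mapred.map.tasks=20: the goal size is 7.5 MB, below the block size.
many = old_api_split_size(file_size, 20, 1, block_size)

print(count_splits(file_size, few))   # 3
print(count_splits(file_size, many))  # 20
```

Note that mapred.map.tasks is only a hint: the actual mapper count still depends on file sizes, the block size, and the minimum split size.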

I will delve into the source code later.
