I have a Hive managed table containing only one 150 MB file. When I run "select count(*) from tbl" on it, it uses 2 mappers. I want to set that to a larger number.
First I tried 'set mapred.max.split.size=8388608;', hoping it would use 19 mappers (150 MB in 8 MB splits). But it uses only 3. Somehow it still splits the input into 64 MB chunks. I also tried 'set dfs.block.size=8388608;', which did not work either.
Then I tried a vanilla MapReduce job to do the same thing. Initially it uses 3 mappers, and when I set mapred.max.split.size, it uses 19. So the problem lies in Hive, I suppose.
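For reference, the arithmetic behind the 19 and 3 above can be sketched as follows. This is only a rough sketch: it assumes the number of splits is the file size divided by the split size, rounded up, and that the file is exactly 150 * 1024 * 1024 bytes. The helper name `expected_splits` is made up for illustration and is not a Hadoop or Hive API.

```python
# Rough sketch of the expected mapper counts.
# Assumption: splits = ceil(file_size / split_size); the real input format may differ.

MB = 1024 * 1024
FILE_SIZE = 150 * MB  # assumed exact size of the single file in the table


def expected_splits(file_size: int, split_size: int) -> int:
    """Hypothetical helper: number of splits, rounding up with integer math."""
    return (file_size + split_size - 1) // split_size


if __name__ == "__main__":
    # With mapred.max.split.size=8388608 (8 MB)
    print(expected_splits(FILE_SIZE, 8388608))   # 19
    # With 64 MB splits
    print(expected_splits(FILE_SIZE, 64 * MB))   # 3
```

So 19 is what an 8 MB split size should give, and 3 is what 64 MB splits give, which matches what the vanilla MapReduce job does before and after the setting.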
I read some of the Hive source code, such as CombineHiveInputFormat and ExecDriver, but couldn't find any clues.
What other settings can I use?