How to schedule Hadoop Map tasks in an 8-node multi-core cluster?

I have a map-only job (no reduce phase). The input file is large enough to create 7 map tasks, and I confirmed this by looking at the output (part-000 through part-006). My cluster has 8 nodes, each with 8 cores and 8 GB of memory, and a shared file system hosted on the head node.

My question is: can I choose between running all 7 map tasks on a single node, or running the 7 map tasks on 7 different slave nodes (1 task per node)? If I can, what changes to my code and configuration file are needed?

I tried setting the parameter "mapred.tasktracker.map.tasks.maximum" to 1 and then to 7 in my code, but I did not see a noticeable difference in running time. In my configuration file it is set to 1.

+5
3 answers

"mapred.tasktracker.map.tasks.maximum" refers to the number of map tasks that may run concurrently on each node, not the number of nodes used for each map task. In the Hadoop architecture there is 1 tasktracker per slave node and 1 jobtracker on the master node. So setting the property mapred.tasktracker.map.tasks.maximum only changes how many map tasks run in parallel on each node. The usual range for "mapred.tasktracker.map.tasks.maximum" is from 1/2 * cores/node to 2 * cores/node.
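As a sketch, this per-node slot limit goes in mapred-site.xml on each tasktracker node (the value 7 here is just an example, and the tasktrackers must be restarted for it to take effect):

```xml
<!-- mapred-site.xml on each slave node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- maximum number of map tasks this tasktracker runs concurrently -->
  <value>7</value>
</property>
```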

The total number of map tasks you want for the whole job is set with setNumMapTasks(int). Note that this is only a hint to the framework; the actual number of map tasks is determined by the number of input splits.
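A minimal driver sketch using the old JobConf API, assuming a hypothetical mapper class MyMapper and input/output paths passed as arguments:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyDriver.class);
        conf.setJobName("map-only");

        conf.setMapperClass(MyMapper.class); // hypothetical mapper class
        conf.setNumReduceTasks(0);           // map-only job: no reduce phase
        conf.setNumMapTasks(7);              // a hint only; splits decide the real count

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

With setNumReduceTasks(0), map output is written directly to the output directory as the part-* files you observed.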

+4

You have 8 nodes, each with 8 cores and 8 GB of memory, and a shared file system on the head node.

When you say the shared file system is on the head node, is it HDFS or NFS? If it is NFS rather than HDFS, you give up the main benefit of HDFS: data locality, i.e. scheduling each map task on a node that holds a replica of its input block.

Also, what are your map tasks actually doing (CPU-bound, IO-bound, etc.), and how long does the job take?

It is hard to say more without those details. An 8x8 cluster should handle this workload easily.

+1

Running your 7 map tasks on 7 different nodes is what you want, and it is what Hadoop tries to do by default. MapReduce is designed to spread work across the cluster. If all 7 tasks ran on one node, they would compete for that node's resources (CPU, memory, disk IO) while the other nodes sat idle.

mapred.tasktracker.map.tasks.maximum is a per-node limit; with 8 cores per node, a value of 8 is reasonable.

You cannot pin individual tasks to specific nodes. The jobtracker's scheduler assigns tasks to tasktrackers with free "slots", preferring nodes that hold a local DFS replica of the task's input split. Placement is automatic.

With 7 map tasks and 8 nodes, each task should normally land on its own node anyway, so you should not need to change anything.

+1
