How to schedule Hadoop Map tasks in an 8-node multi-core cluster?

I have a map-only job (no reduce phase). The input file is large enough to create 7 map tasks, and I confirmed this by looking at the output (part-000 through part-006). My cluster has 8 nodes, each with 8 cores and 8 GB of memory, and a shared file system hosted on the head node.

My question is: can I choose between running all 7 map tasks on a single node, or running the 7 map tasks on 7 different slave nodes (1 task per node)? If I can, what changes to my code and configuration file are needed?

I tried setting the parameter "mapred.tasktracker.map.tasks.maximum" to 1 and then to 7 in my code, but I did not see a noticeable difference in running time. In my configuration file it is set to 1.

+5
3 answers

"mapred.tasktracker.map.tasks.maximum" refers to the number of map tasks that may run concurrently on each node, not the number of nodes used for each map task. In the Hadoop architecture there is 1 tasktracker per slave node and 1 jobtracker on the master node. So setting the property mapred.tasktracker.map.tasks.maximum only changes how many map tasks run in parallel on each node. The usual range for "mapred.tasktracker.map.tasks.maximum" is from 1/2 * cores/node to 2 * cores/node.
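As a sketch, this per-node slot limit goes in mapred-site.xml on each tasktracker node (the value 7 here is just an example, and the tasktrackers must be restarted for it to take effect):

```xml
<!-- mapred-site.xml on each slave node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- maximum number of map tasks this tasktracker runs concurrently -->
  <value>7</value>
</property>
```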

The total number of map tasks you want for the whole job is set with setNumMapTasks(int). Note that this is only a hint to the framework; the actual number of map tasks is determined by the number of input splits.
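A minimal driver sketch using the old JobConf API, assuming a hypothetical mapper class MyMapper and input/output paths passed as arguments:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyDriver.class);
        conf.setJobName("map-only");

        conf.setMapperClass(MyMapper.class); // hypothetical mapper class
        conf.setNumReduceTasks(0);           // map-only job: no reduce phase
        conf.setNumMapTasks(7);              // a hint only; splits decide the real count

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

With setNumReduceTasks(0), map output is written directly to the output directory as the part-* files you observed.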

+4

You have 8 nodes, each with 8 cores and 8 GB of memory, and a shared file system on the head node.

When you say the shared file system is on the head node, is it HDFS or NFS? If it is NFS rather than HDFS, you give up the main benefit of HDFS: data locality, i.e. scheduling each map task on a node that holds a replica of its input block.

Also, what are your map tasks actually doing (CPU-bound, IO-bound, etc.), and how long does the job take?

It is hard to say more without those details. An 8x8 cluster should handle this workload easily.

+1

Running your 7 map tasks on 7 different nodes is what you want, and it is what Hadoop tries to do by default. MapReduce is designed to spread work across the cluster. If all 7 tasks ran on one node, they would compete for that node's resources (CPU, memory, disk IO) while the other nodes sat idle.

mapred.tasktracker.map.tasks.maximum is a per-node limit; with 8 cores per node, a value of 8 is reasonable.

You cannot pin individual tasks to specific nodes. The jobtracker's scheduler assigns tasks to tasktrackers with free "slots", preferring nodes that hold a local DFS replica of the task's input split. Placement is automatic.

With 7 map tasks and 8 nodes, each task should normally land on its own node anyway, so you should not need to change anything.

+1
