AppMaster yarn request for broken containers

Question

AppMaster yarn request for broken containers

I am running a local yarn cluster with 8 vCores and 8Gb shared memory.

The workflow is as follows:

YarnClient sends an application request that launches AppMaster in the container.
AppMaster start, creates amRMClient and nmClient, registers with RM, and then creates 4 container requests for workflows through amRMClient.addContainerRequest

Although there are enough resources, containers are not allocated (the onContainersAllocated callback function is never called). I tried to check the nodemanager and resourcemanager logs and I do not see any line related to container requests. I kept a close eye on apache docs and couldn't figure out what I was doing wrong.

For reference: AppMaster code:

@Override public void run() { Map<String, String> envs = System.getenv(); String containerIdString = envs.get(ApplicationConstants.Environment.CONTAINER_ID.toString()); if (containerIdString == null) { // container id should always be set in the env by the framework throw new IllegalArgumentException("ContainerId not set in the environment"); } ContainerId containerId = ConverterUtils.toContainerId(containerIdString); ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId(); LOG.info("Starting AppMaster Client..."); YarnAMRMCallbackHandler amHandler = new YarnAMRMCallbackHandler(allocatedYarnContainers); // TODO: get heart-beet interval from config instead of 100 default value amClient = AMRMClientAsync.createAMRMClientAsync(1000, this); amClient.init(config); amClient.start(); LOG.info("Starting AppMaster Client OK"); //YarnNMCallbackHandler nmHandler = new YarnNMCallbackHandler(); containerManager = NMClient.createNMClient(); containerManager.init(config); containerManager.start(); // Get port, ulr information. TODO: get tracking url String appMasterHostname = NetUtils.getHostname(); String appMasterTrackingUrl = "/progress"; // Register self with ResourceManager. This will start heart-beating to the RM RegisterApplicationMasterResponse response = null; LOG.info("Register AppMaster on: " + appMasterHostname + "..."); try { response = amClient.registerApplicationMaster(appMasterHostname, 0, appMasterTrackingUrl); } catch (YarnException | IOException e) { // TODO Auto-generated catch block e.printStackTrace(); return; } LOG.info("Register AppMaster OK"); // Dump out information about cluster capability as seen by the resource manager int maxMem = response.getMaximumResourceCapability().getMemory(); LOG.info("Max mem capabililty of resources in this cluster " + maxMem); int maxVCores = response.getMaximumResourceCapability().getVirtualCores(); LOG.info("Max vcores capabililty of resources in this cluster " + maxVCores); containerMemory = Integer.parseInt(config.get(YarnConfig.YARN_CONTAINER_MEMORY_MB)); containerCores = Integer.parseInt(config.get(YarnConfig.YARN_CONTAINER_CPU_CORES)); // A resource ask cannot exceed the max. if (containerMemory > maxMem) { LOG.info("Container memory specified above max threshold of cluster." + " Using max value." + ", specified=" + containerMemory + ", max=" + maxMem); containerMemory = maxMem; } if (containerCores > maxVCores) { LOG.info("Container virtual cores specified above max threshold of cluster." + " Using max value." + ", specified=" + containerCores + ", max=" + maxVCores); containerCores = maxVCores; } List<Container> previousAMRunningContainers = response.getContainersFromPreviousAttempts(); LOG.info("Received " + previousAMRunningContainers.size() + " previous AM running containers on AM registration."); for (int i = 0; i < 4; ++i) { ContainerRequest containerAsk = setupContainerAskForRM(); amClient.addContainerRequest(containerAsk); // NOTHING HAPPENS HERE... LOG.info("Available resources: " + amClient.getAvailableResources().toString()); } while(completedYarnContainers != 4) { try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } } LOG.info("Done with allocation!"); } @Override public void onContainersAllocated(List<Container> containers) { LOG.info("Got response from RM for container ask, allocatedCnt=" + containers.size()); for (Container container : containers) { LOG.info("Allocated yarn container with id: {}" + container.getId()); allocatedYarnContainers.push(container); // TODO: Launch the container in a thread } } @Override public void onError(Throwable error) { LOG.error(error.getMessage()); } @Override public float getProgress() { return (float) completedYarnContainers / allocatedYarnContainers.size(); }

Here's the output from jps:

 14594 NameNode 15269 DataNode 17975 Jps 14666 ResourceManager 14702 NodeManager

And here is the AppMaster log for initialization and 4 container requests:

 23:47:09 YarnAppMaster - Starting AppMaster Client OK 23:47:09 YarnAppMaster - Register AppMaster on: andrei-mbp.local/192.168.1.4... 23:47:09 YarnAppMaster - Register AppMaster OK 23:47:09 YarnAppMaster - Max mem capabililty of resources in this cluster 2048 23:47:09 YarnAppMaster - Max vcores capabililty of resources in this cluster 2 23:47:09 YarnAppMaster - Received 0 previous AM running containers on AM registration. 23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0] 23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0> 23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0] 23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0> 23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0] 23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0> 23:47:11 YarnAppMaster - Requested container ask: Capability[<memory:512, vCores:1>]Priority[0] 23:47:11 YarnAppMaster - Available resources: <memory:7680, vCores:0> 23:47:11 YarnAppMaster - Progress indicator should not be negative

Thanks in advance.

+5

java hadoop yarn distributed

Andrei Antonescu Apr 16 '15 at 7:30

source share

2 answers

Thanks to Alexandre Fonseca for getProgress () returning NaN to divide by zero when it called before the first allocation, which makes the ResourceManager immediately exit with an exception.

AppMaster yarn request for broken containers

More articles: