hadoop - Why does YARN job not transition to RUNNING state?
I've got a number of Samza jobs that I want to run. I can get the first one to run OK. However, the second job seems to sit in the ACCEPTED state and never transitions to the RUNNING state until I kill the first job.
Here is a view from the YARN UI:
Here are the details for the second job, where you can see that no node has been allocated:
I have 2 datanodes, so I should be able to run multiple jobs. Here is the relevant section of yarn-site.xml
(the other config I have in the file is HA config, ZooKeeper, etc.):
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>128</value>
  <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
  <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
  <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>2</value>
  <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
  <description>Physical memory, in MB, to be made available to running containers.</description>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
  <description>Number of CPU cores that can be allocated for containers.</description>
</property>
Edit:
I can see this in the Resource Manager logs:
2015-11-01 17:47:37,151 info org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.leafqueue: assignedcontainer application attempt=appattempt_1446300861747_0018_000001 container=container: [containerid: container_1446300861747_0018_01_000002, nodeid: yarndata-01:41274, nodehttpaddress: yarndata-01:8042, resource: <memory:1024, vcores:1>, priority: 0, token: null, ] queue=default: capacity=1.0, absolutecapacity=1.0, usedresources=<memory:1024, vcores:1>, usedcapacity=0.125, absoluteusedcapacity=0.125, numapps=1, numcontainers=1 clusterresource=<memory:8192, vcores:8>
2015-11-01 17:47:37,151 info org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.parentqueue: re-sorting assigned queue: root.default stats: default: capacity=1.0, absolutecapacity=1.0, usedresources=<memory:2048, vcores:2>, usedcapacity=0.25, absoluteusedcapacity=0.25, numapps=1, numcontainers=2
2015-11-01 17:47:37,151 info org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.parentqueue: assignedcontainer queue=root usedcapacity=0.25 absoluteusedcapacity=0.25 used=<memory:2048, vcores:2> cluster=<memory:8192, vcores:8>
2015-11-01 17:47:37,658 info org.apache.hadoop.yarn.server.resourcemanager.security.nmtokensecretmanagerinrm: sending nmtoken nodeid : yarndata-01:41274 container : container_1446300861747_0018_01_000002
2015-11-01 17:47:37,659 info org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.rmcontainerimpl: container_1446300861747_0018_01_000002 container transitioned allocated acquired
2015-11-01 17:47:39,154 info org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.rmcontainerimpl: container_1446300861747_0018_01_000002 container transitioned acquired running
2015-11-01 17:48:03,821 info org.apache.hadoop.yarn.server.resourcemanager.clientrmservice: allocated new applicationid: 19
2015-11-01 17:48:04,339 warn org.apache.hadoop.yarn.server.resourcemanager.rmapp.rmappimpl: specific max attempts: 0 application: 19 invalid, because out of range [1, 2]. use global max attempts instead.
2015-11-01 17:48:04,339 info org.apache.hadoop.yarn.server.resourcemanager.clientrmservice: application id 19 submitted user www-data
2015-11-01 17:48:04,339 info org.apache.hadoop.yarn.server.resourcemanager.rmauditlogger: user=www-data ip=192.168.2.81 operation=submit application request target=clientrmservice result=success appid=application_1446300861747_0019
2015-11-01 17:48:04,340 info org.apache.hadoop.yarn.server.resourcemanager.rmapp.rmappimpl: storing application id application_1446300861747_0019
2015-11-01 17:48:04,340 info org.apache.hadoop.yarn.server.resourcemanager.rmapp.rmappimpl: application_1446300861747_0019 state change new new_saving
2015-11-01 17:48:04,340 info org.apache.hadoop.yarn.server.resourcemanager.recovery.rmstatestore: storing info app: application_1446300861747_0019
2015-11-01 17:48:04,342 info org.apache.hadoop.yarn.server.resourcemanager.rmapp.rmappimpl: application_1446300861747_0019 state change new_saving submitted
2015-11-01 17:48:04,342 info org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.parentqueue: application added - appid: application_1446300861747_0019 user: www-data leaf-queue of parent: root #applications: 2
2015-11-01 17:48:04,342 info org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.capacityscheduler: accepted application application_1446300861747_0019 user: www-data, in queue: default
2015-11-01 17:48:04,343 info org.apache.hadoop.yarn.server.resourcemanager.rmapp.rmappimpl: application_1446300861747_0019 state change submitted accepted
2015-11-01 17:48:04,343 info org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice: registering app attempt : appattempt_1446300861747_0019_000001
2015-11-01 17:48:04,343 info org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.rmappattemptimpl: appattempt_1446300861747_0019_000001 state change new submitted
2015-11-01 17:48:04,343 info org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.leafqueue: not starting application amifstarted exceeds amlimit
2015-11-01 17:48:04,343 info org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.leafqueue: application added - appid: application_1446300861747_0019 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.leafqueue$user@202c5cd5, leaf-queue: default #user-pending-applications: 1 #user-active-applications: 1 #queue-pending-applications: 1 #queue-active-applications: 1
What am I not doing correctly, please?
The answer lay in the fact that the Resource Manager was saying there was not enough resource to create a new Samza container plus application master.
I changed the value of yarn.scheduler.capacity.maximum-am-resource-percent
within capacity-scheduler.xml
to be more than the default of 0.1.
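As a rough sketch of what that change might look like in capacity-scheduler.xml: the value 0.5 below is only an illustrative assumption, since the post does not say which value was actually used, and the description text is mine rather than copied from the original file.

```xml
<!-- capacity-scheduler.xml: raise the share of cluster resources that
     application masters may occupy. 0.5 is a hypothetical value chosen
     for illustration; any value above the default of 0.1 follows the
     same pattern. -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
  <description>Maximum percent of cluster resources that can be used to run application masters.</description>
</property>
```

Depending on the setup, the Resource Manager may need to be restarted or have its queues refreshed before the new value takes effect.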
The documentation for this parameter states:
Maximum percent of resources in the cluster which can be used to run application masters i.e. controls number of concurrent running applications.
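A back-of-the-envelope check shows why the default was too low here. This is a simplified sketch and not the scheduler's exact calculation: it assumes the limit is simply the percentage times the cluster memory reported in the logs (8192 MB), and that each application master takes 1024 MB, which is inferred from the first job's used resources in the logs rather than stated anywhere in the config.

```latex
\begin{align*}
\text{AM limit at the default} &= 0.1 \times 8192\ \text{MB} = 819.2\ \text{MB} \\
\text{memory for two AMs} &= 2 \times 1024\ \text{MB} = 2048\ \text{MB} > 819.2\ \text{MB} \\
\text{smallest percent that fits two AMs} &= \frac{2048}{8192} = 0.25
\end{align*}
```

Under those assumptions, a second application master cannot start at 0.1, which matches the "amifstarted exceeds amlimit" message in the logs, and a value of at least 0.25 would be needed for two jobs to run side by side.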