Simple YARN TestDFSIO test does not work

I have set up Hadoop on a two-node cluster. The first node, "namenode", runs the following daemons:

hadoop@namenode:~$ jps 
2916 SecondaryNameNode 
2692 NameNode 
3159 NodeManager 
5834 Jps 
2771 DataNode 
3076 ResourceManager 

The second node, "datanode", runs the following daemons:

hadoop@datanode:~$ jps 
2559 Jps 
2087 DataNode 
2198 NodeManager 

In the /etc/hosts file on both machines I added:

10.240.40.246 namenode 
10.240.172.201 datanode 

which are the corresponding IPs, and I verified that I can ssh from each machine to the other. To test my Hadoop installation I then ran a sample MapReduce job:

hadoop@namenode:~$ hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10 

However, the job fails:

14/02/17 22:22:53 INFO fs.TestDFSIO: TestDFSIO.1.7 
14/02/17 22:22:53 INFO fs.TestDFSIO: nrFiles = 20 
14/02/17 22:22:53 INFO fs.TestDFSIO: nrBytes (MB) = 10.0 
14/02/17 22:22:53 INFO fs.TestDFSIO: bufferSize = 1000000 
14/02/17 22:22:53 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO 
14/02/17 22:22:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
14/02/17 22:22:55 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 20 files 
14/02/17 22:22:56 INFO fs.TestDFSIO: created control files for: 20 files 
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
14/02/17 22:22:57 INFO mapred.FileInputFormat: Total input paths to process : 20 
14/02/17 22:22:57 INFO mapreduce.JobSubmitter: number of splits:20 
14/02/17 22:22:57 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class 
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir 
14/02/17 22:22:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392675199090_0001 
14/02/17 22:22:59 INFO impl.YarnClientImpl: Submitted application application_1392675199090_0001 to ResourceManager at /0.0.0.0:8032 
14/02/17 22:22:59 INFO mapreduce.Job: The url to track the job: http://namenode.c.forward-camera-473.internal:8088/proxy/application_1392675199090_0001/ 
14/02/17 22:22:59 INFO mapreduce.Job: Running job: job_1392675199090_0001 
14/02/17 22:23:10 INFO mapreduce.Job: Job job_1392675199090_0001 running in uber mode : false 
14/02/17 22:23:10 INFO mapreduce.Job: map 0% reduce 0% 
14/02/17 22:23:42 INFO mapreduce.Job: map 20% reduce 0% 
14/02/17 22:23:43 INFO mapreduce.Job: map 30% reduce 0% 
14/02/17 22:24:14 INFO mapreduce.Job: map 60% reduce 0% 
14/02/17 22:24:41 INFO mapreduce.Job: map 60% reduce 20% 
14/02/17 22:24:45 INFO mapreduce.Job: map 85% reduce 20% 
14/02/17 22:24:48 INFO mapreduce.Job: map 85% reduce 28% 
14/02/17 22:24:59 INFO mapreduce.Job: map 90% reduce 28% 
14/02/17 22:25:00 INFO mapreduce.Job: map 90% reduce 30% 
14/02/17 22:25:02 INFO mapreduce.Job: map 100% reduce 30% 
14/02/17 22:25:03 INFO mapreduce.Job: map 100% reduce 100% 
14/02/17 22:25:16 INFO mapreduce.Job: map 0% reduce 0% 
14/02/17 22:25:16 INFO mapreduce.Job: Job job_1392675199090_0001 failed with state FAILED due to: Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) 
    at org.apache.hadoop.util.Shell.run(Shell.java:379) 
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) 
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:744) 


.Failing this attempt.. Failing the application. 
14/02/17 22:25:16 INFO mapreduce.Job: Counters: 0 
java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836) 
    at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:443) 
    at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:425) 
    at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:755) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
    at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) 
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) 
    at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:115) 
    at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:123) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 

Looking at the log files, I find this on the datanode machine:

hadoop@datanode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-nodemanager-datanode.log 
... 
2014-02-17 22:29:33,432 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 

On my namenode I ran:

hadoop@namenode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-*log 
2014-02-17 22:13:20,833 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG: 
... 
2014-02-17 22:13:25,240 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. 
... 
2014-02-17 22:13:25,505 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (3.6 G). Thrashing might happen. 
... 
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000023 
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000024 
... 
2014-02-17 22:25:15,733 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1392675199090_0001_02_000001 is : 1 
2014-02-17 22:25:15,734 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1392675199090_0001_02_000001 and exit code: 1 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) 
    at org.apache.hadoop.util.Shell.run(Shell.java:379) 
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) 
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:744) 
... 
2014-02-17 22:25:15,736 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1 
... 
2014-02-17 22:25:15,751 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1392675199090_0001 CONTAINERID=container_1392675199090_0001_02_000001 
... 
2014-02-17 22:13:19,150 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: STARTUP_MSG: 
... 
2014-02-17 22:25:15,837 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) 
    at org.apache.hadoop.util.Shell.run(Shell.java:379) 
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) 
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:744) 


.Failing this attempt.. Failing the application. APPID=application_1392675199090_0001 

However, I checked on the namenode machine that port 8031 is being listened on. This is what I get:

hadoop@namenode:~$ netstat 
Active Internet connections (w/o servers) 
Proto Recv-Q Send-Q Local Address   Foreign Address   State  
tcp  0  0 namenode.c.forwar:36975 metadata.google.in:http TIME_WAIT 
tcp  0  0 namenode.c.forwar:36969 metadata.google.in:http TIME_WAIT 
tcp  0  0 namenode.c.forwar:40616 namenode.c.forwar:10001 TIME_WAIT 
tcp  0  0 namenode.c.forwar:36974 metadata.google.in:http ESTABLISHED 
tcp  0  0 namenode.c.forward:8031 namenode.c.forwar:41229 ESTABLISHED 
tcp  0 352 namenode.c.forward-:ssh e178064245.adsl.a:64305 ESTABLISHED 
tcp  0  0 namenode.c.forwar:41229 namenode.c.forward:8031 ESTABLISHED 
tcp  0  0 namenode.c.forwar:40365 namenode.c.forwar:10001 ESTABLISHED 
tcp  0  0 namenode.c.forwar:10001 namenode.c.forwar:40365 ESTABLISHED 
tcp  0  0 namenode.c.forwar:10001 datanode:48786   ESTABLISHED 
Active UNIX domain sockets (w/o servers) 
Proto RefCnt Flags  Type  State   I-Node Path 
unix 10  [ ]   DGRAM     4604  /dev/log 
unix 2  [ ]   STREAM  CONNECTED  10490  
unix 2  [ ]   STREAM  CONNECTED  10488  
unix 2  [ ]   STREAM  CONNECTED  10452  
unix 2  [ ]   STREAM  CONNECTED  8452  
unix 2  [ ]   STREAM  CONNECTED  7800  
unix 2  [ ]   STREAM  CONNECTED  7797  
unix 2  [ ]   STREAM  CONNECTED  6762  
unix 2  [ ]   STREAM  CONNECTED  6702  
unix 2  [ ]   STREAM  CONNECTED  6698  
unix 2  [ ]   STREAM  CONNECTED  6208  
unix 2  [ ]   DGRAM     5750  
unix 2  [ ]   DGRAM     5737  
unix 2  [ ]   DGRAM     5734  
unix 3  [ ]   STREAM  CONNECTED  5643  
unix 3  [ ]   STREAM  CONNECTED  5642  
unix 2  [ ]   DGRAM     5640  
unix 2  [ ]   DGRAM     5192  
unix 2  [ ]   DGRAM     5171  
unix 2  [ ]   DGRAM     4889  
unix 2  [ ]   DGRAM     4723  
unix 2  [ ]   DGRAM     4663  
unix 3  [ ]   DGRAM     3132  
unix 3  [ ]   DGRAM     3131  

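(Side note: plain netstat lists only active connections, "w/o servers", so the output above does not actually show listening sockets. To see which address the ResourceManager is bound to on port 8031, a check along these lines might be more telling; the flags below are the usual way to list listening TCP sockets together with the owning process.)

hadoop@namenode:~$ sudo netstat -tlnp | grep 8031 
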
So what could the problem be here? As far as I can see, everything is set up correctly. Why does my job still fail?

Comment: Have you checked your firewall settings? Just because the server is listening does not mean the port is reachable. For example, if you are using Ubuntu, try running 'sudo ufw disable' on both machines and see whether the error goes away.
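
(A quick, non-destructive way to test this, assuming netcat and ufw are installed, which the question does not state, is to probe the port from the datanode and to inspect the firewall state:)

hadoop@datanode:~$ nc -zv namenode 8031 
hadoop@datanode:~$ sudo ufw status 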

answer

6

The log on the datanode says

Retrying connect to server: 0.0.0.0/0.0.0.0:8031 

So it is trying to connect to that port on the local machine, which is the datanode, while the service actually runs on the namenode. You therefore need to add the following lines to the yarn-site.xml configuration:

<property> 
  <name>yarn.resourcemanager.resource-tracker.address</name> 
  <value>namenode:8031</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.address</name> 
  <value>namenode:8032</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.scheduler.address</name> 
  <value>namenode:8030</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.admin.address</name> 
  <value>namenode:8033</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.webapp.address</name> 
  <value>namenode:8088</value> 
</property> 

where namenode is the alias in /etc/hosts for the machine that runs the ResourceManager daemon.
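
(For the change to take effect, the YARN daemons have to be restarted. A minimal sketch, assuming the tarball install under /opt/hadoop-2.2.0 seen in the question and the stock start/stop scripts:)

hadoop@namenode:~$ /opt/hadoop-2.2.0/sbin/stop-yarn.sh 
hadoop@namenode:~$ /opt/hadoop-2.2.0/sbin/start-yarn.sh 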

0

Also add the same properties to the yarn-site.xml file on the namenode, so that these services use the same ports.
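
(After restarting, one way to verify that the NodeManager on the datanode has registered with the ResourceManager is to list the cluster nodes with the standard YARN CLI; shown here as a sketch:)

hadoop@namenode:~$ yarn node -list 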