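The following example shows a Hadoop job that failed because a COBOL runtime error occurred while a COBOL program was running as the Reduce application.
The Reduce task progresses to 100%, but the job ends with an error.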
[hadoop@hadoop1 hadoop01]$ cobhadoop.sh -conf conf/configuration.xml -files reduce.exe
14/04/24 10:44:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/24 10:44:44 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/24 10:44:44 INFO mapred.FileInputFormat: Total input paths to process : 1
14/04/24 10:44:44 INFO mapred.FileInputFormat: Total input paths to process : 1
14/04/24 10:44:44 INFO mapred.JobClient: Running job: job_201403281518_0349
14/04/24 10:44:45 INFO mapred.JobClient: map 0% reduce 0%
14/04/24 10:45:01 INFO mapred.JobClient: map 25% reduce 0%
14/04/24 10:45:02 INFO mapred.JobClient: map 50% reduce 0%
14/04/24 10:45:07 INFO mapred.JobClient: map 75% reduce 0%
14/04/24 10:45:08 INFO mapred.JobClient: map 100% reduce 0%
14/04/24 10:45:11 INFO mapred.JobClient: map 100% reduce 33%
14/04/24 10:45:14 INFO mapred.JobClient: map 100% reduce 100%
14/04/24 10:45:17 INFO mapred.JobClient: Job complete: job_201403281518_0349
14/04/24 10:45:17 INFO mapred.JobClient: Counters: 31
14/04/24 10:45:17 INFO mapred.JobClient: Job Counters
14/04/24 10:45:17 INFO mapred.JobClient: Launched reduce tasks=1
14/04/24 10:45:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29916
14/04/24 10:45:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/24 10:45:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/04/24 10:45:17 INFO mapred.JobClient: Launched map tasks=4
14/04/24 10:45:17 INFO mapred.JobClient: Data-local map tasks=4
14/04/24 10:45:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=12460
14/04/24 10:45:17 INFO mapred.JobClient: File Input Format Counters
14/04/24 10:45:17 INFO mapred.JobClient: Bytes Read=0
14/04/24 10:45:17 INFO mapred.JobClient: File Output Format Counters
14/04/24 10:45:17 INFO mapred.JobClient: Bytes Written=0
14/04/24 10:45:17 INFO mapred.JobClient: FileSystemCounters
14/04/24 10:45:17 INFO mapred.JobClient: FILE_BYTES_READ=320873
14/04/24 10:45:17 INFO mapred.JobClient: HDFS_BYTES_READ=194818
14/04/24 10:45:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=809644
14/04/24 10:45:17 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=287
14/04/24 10:45:17 INFO mapred.JobClient: Extjoiner.ExitStatus
14/04/24 10:45:17 INFO mapred.JobClient: reduce.00=134
14/04/24 10:45:17 INFO mapred.JobClient: Map-Reduce Framework
14/04/24 10:45:17 INFO mapred.JobClient: Map output materialized bytes=320604
14/04/24 10:45:17 INFO mapred.JobClient: Map input records=10020
14/04/24 10:45:17 INFO mapred.JobClient: Reduce shuffle bytes=320604
14/04/24 10:45:17 INFO mapred.JobClient: Spilled Records=20040
14/04/24 10:45:17 INFO mapred.JobClient: Map output bytes=300540
14/04/24 10:45:17 INFO mapred.JobClient: Total committed heap usage (bytes)=826212352
14/04/24 10:45:17 INFO mapred.JobClient: CPU time spent (ms)=6180
14/04/24 10:45:17 INFO mapred.JobClient: Map input bytes=190320
14/04/24 10:45:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=1032
14/04/24 10:45:17 INFO mapred.JobClient: Combine input records=0
14/04/24 10:45:17 INFO mapred.JobClient: Reduce input records=10020
14/04/24 10:45:17 INFO mapred.JobClient: Reduce input groups=40
14/04/24 10:45:17 INFO mapred.JobClient: Combine output records=0
14/04/24 10:45:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=876621824
14/04/24 10:45:17 INFO mapred.JobClient: Reduce output records=0
14/04/24 10:45:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5048295424
14/04/24 10:45:17 INFO mapred.JobClient: Map output records=10020
EX0003: Job execution failed.
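The Extjoiner.ExitStatus counter in the log (reduce.00=134) shows that Reduce task 00 returned the value 134.
Because 134 is the return value for a COBOL runtime error (U error), a runtime error has very likely occurred.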
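When the job output is long, the exit status counter is easy to miss. The following is a minimal sketch, not part of the product, that scans saved job output for `reduce.NN=value` counters and prints only those with a non-zero value. The function name `nonzero_exit_status` and the file name `job.log` are hypothetical, and the counter format is assumed to match the `reduce.00=134` entry in the log above.

```shell
# Hypothetical helper: list Reduce tasks whose exit status counter is non-zero.
# Reads the saved job output from the named files, or from standard input.
# Assumes counters appear as whitespace-separated tokens such as "reduce.00=134".
nonzero_exit_status() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^reduce\.[0-9]+=[0-9]+$/) {
                split($i, kv, "=")          # kv[1] = task name, kv[2] = return value
                if (kv[2] + 0 != 0) print kv[1], kv[2]
            }
    }' "$@"
}
```

For example, if the console output of the job was saved to `job.log`, running `nonzero_exit_status job.log` would print `reduce.00 134` for the log above.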
Check the syslog on the slave server.
# tail /var/log/messages
(Log excerpt)
Apr 24 10:45:17 hadoop2 : COBOL:rts64: HALT: JMP0015I-U [PID:000053FA TID:6BEC76E0] CANNOT CALL PROGRAM 'ABC'. "dlopen-so=libABC.so: cannot open shared object file: No such file or directory dlsym-out=./reduce.exe: undefined symbol: ABC" PGM=REDUCE. LINE=54
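This confirms that a runtime error occurred at the corresponding time (10:45:17). The message JMP0015I-U reports that program 'ABC', called at line 54 of program REDUCE, could not be called because the shared object libABC.so was not found.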
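On a busy server, `tail` may not show the relevant entry. The following is a minimal sketch that filters a syslog file for COBOL runtime HALT messages and reduces each one to its timestamp, message ID, program name, and line number. The function name `cobol_halts` is hypothetical, and the pattern assumes that entries have the same layout as the excerpt above (a 15-character timestamp, the `COBOL:rts64: HALT:` tag, and trailing `PGM=` and `LINE=` fields); other runtime versions may format messages differently.

```shell
# Hypothetical helper: summarize COBOL runtime HALT entries in a syslog file.
# Output columns: timestamp, message ID, program name, line number.
# Assumes the entry layout shown in the excerpt above.
cobol_halts() {
    grep -E 'COBOL:rts64: HALT' "$@" |
        sed -E 's/^(.{15}) .*HALT: ([A-Z0-9-]+).*PGM=([^ .]+)\. *LINE=([0-9]+).*$/\1 \2 \3 \4/'
}
```

Run as a user who can read the syslog, for example `cobol_halts /var/log/messages`, and compare the timestamps with the time at which the job reported the non-zero exit status.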