NetCOBOL V12.0 User's Guide (Hadoop Integration Function)
FUJITSU Software

4.3.3 Example of a Job Failure (a Runtime Error Occurring in a COBOL Program)

The following example shows a Hadoop job that failed because a COBOL runtime error occurred while a COBOL program was running as the Reduce application.

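The Reduce task progresses to 100%, but the job ends with an error (message EX0003 on the last line of the output, which reports that job execution failed).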

[hadoop@hadoop1 hadoop01]$ cobhadoop.sh -conf  conf/configuration.xml -files reduce.exe
14/04/24 10:44:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/24 10:44:44 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/24 10:44:44 INFO mapred.FileInputFormat: Total input paths to process : 1
14/04/24 10:44:44 INFO mapred.FileInputFormat: Total input paths to process : 1
14/04/24 10:44:44 INFO mapred.JobClient: Running job: job_201403281518_0349
14/04/24 10:44:45 INFO mapred.JobClient:  map 0% reduce 0%
14/04/24 10:45:01 INFO mapred.JobClient:  map 25% reduce 0%
14/04/24 10:45:02 INFO mapred.JobClient:  map 50% reduce 0%
14/04/24 10:45:07 INFO mapred.JobClient:  map 75% reduce 0%
14/04/24 10:45:08 INFO mapred.JobClient:  map 100% reduce 0%
14/04/24 10:45:11 INFO mapred.JobClient:  map 100% reduce 33%
14/04/24 10:45:14 INFO mapred.JobClient:  map 100% reduce 100%
14/04/24 10:45:17 INFO mapred.JobClient: Job complete: job_201403281518_0349
14/04/24 10:45:17 INFO mapred.JobClient: Counters: 31
14/04/24 10:45:17 INFO mapred.JobClient:   Job Counters
14/04/24 10:45:17 INFO mapred.JobClient:     Launched reduce tasks=1
14/04/24 10:45:17 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=29916
14/04/24 10:45:17 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/24 10:45:17 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/04/24 10:45:17 INFO mapred.JobClient:     Launched map tasks=4
14/04/24 10:45:17 INFO mapred.JobClient:     Data-local map tasks=4
14/04/24 10:45:17 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12460
14/04/24 10:45:17 INFO mapred.JobClient:   File Input Format Counters
14/04/24 10:45:17 INFO mapred.JobClient:     Bytes Read=0
14/04/24 10:45:17 INFO mapred.JobClient:   File Output Format Counters
14/04/24 10:45:17 INFO mapred.JobClient:     Bytes Written=0
14/04/24 10:45:17 INFO mapred.JobClient:   FileSystemCounters
14/04/24 10:45:17 INFO mapred.JobClient:     FILE_BYTES_READ=320873
14/04/24 10:45:17 INFO mapred.JobClient:     HDFS_BYTES_READ=194818
14/04/24 10:45:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=809644
14/04/24 10:45:17 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=287
14/04/24 10:45:17 INFO mapred.JobClient:   Extjoiner.ExitStatus
14/04/24 10:45:17 INFO mapred.JobClient:     reduce.00=134
14/04/24 10:45:17 INFO mapred.JobClient:   Map-Reduce Framework
14/04/24 10:45:17 INFO mapred.JobClient:     Map output materialized bytes=320604
14/04/24 10:45:17 INFO mapred.JobClient:     Map input records=10020
14/04/24 10:45:17 INFO mapred.JobClient:     Reduce shuffle bytes=320604
14/04/24 10:45:17 INFO mapred.JobClient:     Spilled Records=20040
14/04/24 10:45:17 INFO mapred.JobClient:     Map output bytes=300540
14/04/24 10:45:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=826212352
14/04/24 10:45:17 INFO mapred.JobClient:     CPU time spent (ms)=6180
14/04/24 10:45:17 INFO mapred.JobClient:     Map input bytes=190320
14/04/24 10:45:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1032
14/04/24 10:45:17 INFO mapred.JobClient:     Combine input records=0
14/04/24 10:45:17 INFO mapred.JobClient:     Reduce input records=10020
14/04/24 10:45:17 INFO mapred.JobClient:     Reduce input groups=40
14/04/24 10:45:17 INFO mapred.JobClient:     Combine output records=0
14/04/24 10:45:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=876621824
14/04/24 10:45:17 INFO mapred.JobClient:     Reduce output records=0
14/04/24 10:45:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=5048295424
14/04/24 10:45:17 INFO mapred.JobClient:     Map output records=10020
EX0003:ジョブの実行に失敗しました。

The Extjoiner.ExitStatus counter in the log (reduce.00=134) shows that Reduce task 00 returned the value 134.

134 is the return value for a COBOL runtime error (U error), so it is highly likely that a runtime error occurred.
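When the job output is long, the exit status counters can be picked out mechanically. The following is a minimal sketch, not part of NetCOBOL or Hadoop: it assumes the output has been saved as text, that counter lines have the form name=value, and that group headers contain no equals sign, as in the log above. The function names exit_statuses and failed_tasks are hypothetical.

```python
import re

# Captures everything after the "mapred.JobClient:" prefix of a log line.
_JOBCLIENT = re.compile(r"mapred\.JobClient:\s+(.*)$")


def exit_statuses(log_text, group="Extjoiner.ExitStatus"):
    """Return {task_name: exit_code} for the counters listed under `group`.

    Assumption: a line with "=" is a counter, and a line without "=" is a
    group header (or another message) that ends the previous group.
    """
    statuses = {}
    in_group = False
    for line in log_text.splitlines():
        match = _JOBCLIENT.search(line)
        if match is None:
            continue  # not a JobClient line (for example, the EX0003 message)
        body = match.group(1).strip()
        if "=" in body:
            if in_group:
                name, _, value = body.rpartition("=")
                statuses[name] = int(value)
        else:
            in_group = body == group
    return statuses


def failed_tasks(log_text):
    """Return only the tasks whose exit status is non-zero."""
    return {task: code for task, code in exit_statuses(log_text).items() if code != 0}
```

Applied to the log above, failed_tasks would report reduce.00 with the value 134, which is the entry to follow up in the syslog of the slave server.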

Check the syslog on the slave server.

# tail /var/log/messages

(Log excerpt)

Apr 24 10:45:17 hadoop2 : COBOL:rts64: HALT: JMP0015I-U [PID:000053FA TID:6BEC76E0] CANNOT CALL PROGRAM 'ABC'. "dlopen-so=libABC.so: cannot open shared object file: No such file or directory dlsym-out=./reduce.exe: undefined symbol: ABC" PGM=REDUCE. LINE=54

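The syslog confirms that a runtime error occurred at the corresponding time (10:45:17): message JMP0015I-U reports that program REDUCE could not call program 'ABC' at line 54.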
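To match syslog entries against a failed job, the fields of the runtime message can be extracted with a regular expression. The sketch below is an illustration only: the pattern is inferred from the single log line shown above, not from a documented message format, and the names parse_rts_message and SAMPLE are hypothetical.

```python
import re

# Pattern inferred from the one sample line above (an assumption, not a spec).
_RTS = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s.*?"
    r"(?P<msgid>JMP\d+[A-Z]-[A-Z])\s"
    r"\[PID:(?P<pid>[0-9A-Fa-f]+)\sTID:(?P<tid>[0-9A-Fa-f]+)\]\s"
    r"(?P<text>.*?)\sPGM=(?P<pgm>\S+?)\.\sLINE=(?P<line>\d+)"
)


def parse_rts_message(syslog_line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = _RTS.search(syslog_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    # The last character of the message ID is taken as the error level ("U" here).
    fields["level"] = fields["msgid"][-1]
    return fields


SAMPLE = (
    "Apr 24 10:45:17 hadoop2 : COBOL:rts64: HALT: JMP0015I-U "
    "[PID:000053FA TID:6BEC76E0] CANNOT CALL PROGRAM 'ABC'. "
    '"dlopen-so=libABC.so: cannot open shared object file: No such file or '
    'directory dlsym-out=./reduce.exe: undefined symbol: ABC" '
    "PGM=REDUCE. LINE=54"
)

info = parse_rts_message(SAMPLE)
print(info["stamp"], info["host"], info["msgid"], info["pgm"], info["line"])
```

Comparing the extracted time and host with the time at which the job completed and the node that ran the Reduce task helps confirm that the message belongs to the failed job.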