If logging is to be used to accumulate event logs in a Hadoop system, design an application to analyze the content of the accumulated event logs. The application will use the Hadoop API and operate on the Hadoop system.
Refer to the Interstage Big Data Parallel Processing Server (hereafter referred to as "BDPP") manuals for information on designing and developing applications that operate on a Hadoop system.
The data formats of the event logs to be analyzed by this application are shown below.
Event logs are output to a log storage area specified in the event type definition or in the logging listener in a complex event processing statement. The log storage area that will be the output destination is generated automatically.
If the output destination is a Hadoop system, the details are as follows:
The output destination can be changed using the value specified in the directory element of the engine configuration file.
If a directory name is specified in the directory element, the output destination will be a path made by joining the following values:
Value set in "pdfs.fs.local.basedir" (*1)
Directory name specified in the engine configuration file
Log storage area specified in the event type definition or logging listener
Automatically generated log file name
*1: "pdfs.fs.local.basedir" is the Hadoop mount directory. Refer to the BDPP manuals for details.
If a slash (/) only is specified in the directory element, the output destination will be a path made by joining the following values:
Value set in "pdfs.fs.local.basedir"
Log storage area specified in the event type definition or logging listener
Automatically generated log file name
Example
Example of output destination
The output destination will be "/mnt/pdfs/hadoop/tmp/logFileName" under the following conditions:
The value set in "pdfs.fs.local.basedir" is "/mnt/pdfs"; and
"hadoop" is specified as the directory name in the engine configuration file; and
"/tmp" is specified as the log storage area in the event type definition or logging listener of the complex event processing statement
The output destination will be "/mnt/pdfs/tmp/logFileName" under the following conditions:
The value set in "pdfs.fs.local.basedir" is "/mnt/pdfs"; and
A slash (/) is specified as the directory name in the engine configuration file; and
"/tmp" is specified as the log storage area in the event type definition or logging listener of the complex event processing statement
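The joining rule above can be sketched in a few lines of Python. This is only an illustration and not part of the product: the function name event_log_path and its parameter names are hypothetical, and the sketch assumes the directory element is always given as either a directory name or a single slash.

```python
import posixpath


def event_log_path(basedir, directory, log_storage_area, log_file_name):
    """Join the values that make up the event log output destination.

    Hypothetical helper: 'directory' is the value of the directory element
    in the engine configuration file; a lone "/" means "no extra directory".
    """
    parts = []
    if directory != "/":
        parts.append(directory)
    parts.append(log_storage_area)
    parts.append(log_file_name)

    # Strip slashes from the later parts so that a value such as "/tmp"
    # is appended to the base directory instead of replacing it.
    head = basedir.rstrip("/") or "/"
    tail = [p.strip("/") for p in parts]
    return posixpath.join(head, *[p for p in tail if p])


print(event_log_path("/mnt/pdfs", "hadoop", "/tmp", "logFileName"))
print(event_log_path("/mnt/pdfs", "/", "/tmp", "logFileName"))
```

The two calls correspond to the two sets of conditions in the example above.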
Note
If more than one event type or logging listener uses the same output destination and the format of the event data is the same, event data of different event types will be output to the same file. To analyze the logs by event type or by logging listener, specify a separate output destination for each.
The format will be Hadoop SequenceFile (binary file) format.
A log file will be automatically generated in the log storage area using the file name shown below.
By default, this file is renamed with the ".done" extension after 300 seconds.
dateTime_VMname_branchNumber
dateTime: yyyyMMddHHmmssSSS
VMname: processID@CEPserverHostName
branchNumber: 0000000001 to 0000000122
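An analysis application may want to split a log file name back into its components. The following is a minimal sketch under stated assumptions: the helper name parse_log_file_name and the LogFileName tuple are hypothetical, the branch number is assumed to always be 10 digits (as in the range shown above), and the file name used at the end is invented for illustration.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple

# dateTime (17 digits) _ processID @ hostName _ branchNumber (10 digits),
# optionally followed by the ".done" extension.
_LOG_NAME = re.compile(r"^(\d{17})_([^@_]+)@(.+)_(\d{10})(\.done)?$")


class LogFileName(NamedTuple):
    date_time: datetime
    process_id: str
    host_name: str
    branch_number: int
    done: bool


def parse_log_file_name(name):
    """Split 'dateTime_VMname_branchNumber[.done]' into its fields."""
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError("not an event log file name: %r" % name)
    stamp, pid, host, branch, done = match.groups()
    # yyyyMMddHHmmss is parsed by strptime; the trailing SSS is milliseconds.
    date_time = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    date_time += timedelta(milliseconds=int(stamp[14:]))
    return LogFileName(date_time, pid, host, int(branch), done is not None)


# Hypothetical file name, used only to show the call.
print(parse_log_file_name("20240131123456789_1234@cephost01_0000000001.done"))
```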
Point
Files with the ".done" extension are the files to be analyzed by the event log analysis application. Move such a file to a directory of your choice before analyzing it.
Note
A file without the ".done" extension is still being output, so do not perform any operations on it.
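The handling of completed files described above can be sketched as follows. This is an assumption-laden illustration: the function name move_done_files is hypothetical, and it assumes the log storage area is reachable as an ordinary directory through the Hadoop mount directory.

```python
import shutil
from pathlib import Path


def move_done_files(log_dir, analysis_dir):
    """Move completed (".done") log files and return their new paths."""
    target = Path(analysis_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(log_dir).iterdir()):
        # Files without the ".done" extension are still being output,
        # so they are left untouched.
        if path.is_file() and path.name.endswith(".done"):
            new_path = shutil.move(str(path), str(target / path.name))
            moved.append(Path(new_path))
    return moved
```

Moving the files first keeps the analysis job from reading a directory that the CEP engine is still writing to.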
The upper limit of the file size is LONG MAX (2^63 - 1).
None
The date and time information (yyyyMMddHHmmss) will be the key. The corresponding Hadoop type (API) is "org.apache.hadoop.io.Text".
The date and time above will be the date and time at which the event data was written. (This may differ from the date and time at which the CEP engine received the events.)
Input events are output as they are. The corresponding Hadoop type (API) is "org.apache.hadoop.io.BytesWritable".
Record compression
6
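Because each record key is a yyyyMMddHHmmss string, an analysis application can bucket records by time once the keys have been read from the SequenceFile and converted to strings. The sketch below shows only that step, independent of the Hadoop API; the function names parse_key and records_per_hour and the sample keys are hypothetical.

```python
from collections import Counter
from datetime import datetime


def parse_key(key):
    """Convert a record key (yyyyMMddHHmmss) into a datetime."""
    return datetime.strptime(key, "%Y%m%d%H%M%S")


def records_per_hour(keys):
    """Count records per hour of the time at which they were written."""
    counts = Counter()
    for key in keys:
        hour = parse_key(key).replace(minute=0, second=0)
        counts[hour] += 1
    return counts


# Hypothetical keys, as they might appear after decoding the Text values.
sample = ["20240131123456", "20240131125901", "20240131130000"]
for hour, count in sorted(records_per_hour(sample).items()):
    print(hour.isoformat(), count)
```

Note that the counts reflect when the event data was written, which may differ from when the CEP engine received the events.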
Information
If outputting to the engine log
Input events are output to the engine log unchanged.