Interstage Big Data Parallel Processing Server V1.0.0 User's Guide

5.2.2 Developing Applications

Conventionally, parallel distributed processing of Big Data required complicated programs to handle synchronization and similar concerns. Under Hadoop, there is no need to consider parallel distributed processing when creating programs. The user simply creates two applications in accordance with the MapReduce algorithm: one that performs Map processing and one that performs Reduce processing. The distributed storage and extraction of data and the parallel execution of the created processing are all left to Hadoop.
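
The division into Map processing and Reduce processing can be illustrated with a minimal word-count sketch in plain Java. It does not use the Hadoop API. The class name MapReduceSketch and the methods map, reduce, and run are hypothetical names chosen for this illustration, and run is only a sequential stand-in for the grouping that Hadoop performs between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: plain Java, no Hadoop classes involved.
public class MapReduceSketch {

    // Map processing: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce processing: combine all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Hypothetical stand-in for the framework: group the map output by key,
    // then call reduce once per key. Hadoop does this across many machines.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=1}
        System.out.println(run(Arrays.asList("big data", "big cluster")));
    }
}
```

Only map and reduce contain job-specific logic. In an actual MapReduce application, the equivalent of run is supplied by Hadoop rather than written by the developer.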

5.2.2.1 Application Overview

The applications for performing processing under Hadoop include the following types:

The development of MapReduce applications is described below. For information on developing other applications, refer to the Apache Hadoop project website and similar resources.


5.2.2.2 Designing Applications

Design the application processing logic. Processing such as input file splitting and merging, which must be designed for conventional parallel distributed processing, does not need to be designed because the Hadoop framework executes it. Developers can therefore concentrate on designing the logic required for jobs.
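
To make this separation of responsibilities concrete, the following plain Java sketch places splitting, parallel execution, and merging in one method, while the job logic is a separate method that contains no threading or synchronization code. It is not the Hadoop API. The names SplitMergeSketch, runJob, and countErrors, the thread pool size, and the sample log records are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongBinaryOperator;
import java.util.function.ToLongFunction;

// Illustration only: plain Java, no Hadoop classes involved.
public class SplitMergeSketch {

    // Hypothetical stand-in for the framework: splits the input, runs the
    // job logic on each split in parallel, and merges the partial results.
    static long runJob(List<String> records, int splitSize,
                       ToLongFunction<List<String>> perSplitLogic,
                       LongBinaryOperator mergeLogic, long initial) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < records.size(); start += splitSize) {
                int end = Math.min(start + splitSize, records.size());
                List<String> split = records.subList(start, end);
                Callable<Long> task = () -> perSplitLogic.applyAsLong(split);
                partials.add(pool.submit(task));
            }
            long result = initial;
            for (Future<Long> partial : partials) {
                result = mergeLogic.applyAsLong(result, partial.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Job logic: the only part the developer designs in this sketch.
    static long countErrors(List<String> records) {
        long count = 0;
        for (String record : records) {
            if (record.contains("ERROR")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "INFO start", "ERROR disk", "INFO ok", "ERROR net", "ERROR cpu");
        // Splits of two records each; the partial counts are summed.
        System.out.println(runJob(log, 2, SplitMergeSketch::countErrors, Long::sum, 0L)); // prints 3
    }
}
```

Changing the split size or the number of threads affects only runJob. The job logic in countErrors stays the same, which is the property that lets developers concentrate on the logic required for jobs.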

For MapReduce applications, the application developer must understand the Hadoop API and design applications in accordance with the MapReduce framework. The main design tasks required are:


5.2.2.3 Creating Applications

Create applications based on the results of the application design.


MapReduce applications

As with the creation of ordinary Java applications, create a Java project and perform coding.

The Hadoop API can be used by adding the Hadoop jar file to the Eclipse build path.

Note that specifying hadoop-core-xxx.jar is mandatory (replace "xxx" with the Hadoop version). Specify other Hadoop libraries as required by the Hadoop API being used.


5.2.2.4 References for Developing MapReduce Applications

For reference material on MapReduce applications, refer to the following information provided by Apache Hadoop:

If the Hadoop API is used