Prerequisites (copied from hadoop-common repository)
* Unix System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
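A quick way to check which of these you already have (a sketch; each command simply fails if the tool is missing, and CMake is only needed if you compile native code):
$ mvn -version
$ protoc --version
$ cmake --version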
Environment
Linux: I am using a rather old 32-bit Debian 6.0.6.
debian@debian:~$ uname -a
Linux debian 2.6.32-5-686 #1 SMP Sun Sep 23 09:49:36 UTC 2012 i686 GNU/Linux
Java: I have the newest (at the time this article is written) Java 1.7 installed
debian@debian:~$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) Server VM (build 24.45-b08, mixed mode)
Build and install the ProtocolBuffer compiler 2.5.0
Hadoop requires version 2.5.0, which is also the newest. At the time of writing it is only available in the Debian experimental repository, and I could not get it installed via apt-get. If your Linux distribution provides 2.5.0 from its software repository, use that one.
First you are going to need g++. My virtual machine was really bare in terms of installed software, so I had to install it first:
$ aptitude install g++
Download source from here: https://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz
$ tar -xvzf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./configure --disable-shared
$ make
$ make install
The above commands compiled, built, and hopefully installed protoc to /usr/local/bin/protoc.
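You can verify the install (this assumes /usr/local/bin is on your PATH); it should print something like:
$ protoc --version
libprotoc 2.5.0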
Install Maven 3.0+
Choose a 3.0+ version from the link below. I used 3.1.1, the newest one available at the time this article was written. http://maven.apache.org/download.cgi
You need the binary tar.gz. Put Maven in its place:
$ tar -xvzf apache-maven-3.1.1-bin.tar.gz
$ mkdir -p /usr/local/maven/
$ mv apache-maven-3.1.1 /usr/local/maven
$ ln -s /usr/local/maven/apache-maven-3.1.1 /usr/local/maven/current
Put a symlink into /usr/sbin
$ ln -s /usr/local/maven/current/bin/mvn /usr/sbin/mvn
In fact, this is the same way you install the Oracle JDK/JRE. The other way is to put the application's .../bin folder on the $PATH variable at the end of /etc/bash.bashrc, for example:
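For Maven that would look something like this (a sketch, reusing the symlink created above):
$ echo 'export PATH="$PATH:/usr/local/maven/current/bin"' >> /etc/bash.bashrc
$ source /etc/bash.bashrc
$ mvn -version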
Install Git
This is available from repository:
$ aptitude install git
Clone hadoop-common
Go to your Eclipse workspace, or create one if you don't have one yet. I put mine in my home directory:
$ mkdir -p ~/Development/workspace_eclipse_java
$ cd ~/Development/workspace_eclipse_java
Clone the git repository:
$ git clone https://github.com/apache/hadoop-common.git hadoop-common
Install hadoop Maven plugin
Hadoop has its own Maven plugin that is used during the build:
$ cd hadoop-common/hadoop-maven-plugins
$ mvn install
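If the install succeeded, the plugin should now be sitting in your local repository (assuming the default ~/.m2 location):
$ ls ~/.m2/repository/org/apache/hadoop/hadoop-maven-plugins/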
First build everything
I found the project setup and build well documented. Everything is written down in BUILDING.txt [2].
First you need to build the whole of hadoop-common so that Maven caches the dependency jars in your local repository. That way, Eclipse will be able to resolve all the inter-project dependencies.
$ cd .. #back to the hadoop-common root
$ mvn install -DskipTests -nsu #-nsu (--no-snapshot-updates): don't re-fetch SNAPSHOT dependencies on every build
Generate Eclipse projects
I am only interested in the YARN and MapReduce components, so I only generate Eclipse projects for those:
$ cd hadoop-yarn-project
$ mvn eclipse:eclipse -DskipTests
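This generates the Eclipse metadata (.project and .classpath files) for every module under hadoop-yarn-project; a quick way to check:
$ find . -name .project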
Set M2_REPO variable in Eclipse
If not set yet, you have to create a variable in Eclipse pointing to your local Maven repository, as every dependency in the generated .classpath files starts with M2_REPO/...
[Window] => [Preferences]
Java -> Build Path -> Classpath Variables
Add a new one named M2_REPO pointing to your Maven local repository, which by default is at /home/username/.m2/repository
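Alternatively, the maven-eclipse-plugin can set the variable for you; assuming your workspace is the one created earlier, something like this should work:
$ mvn eclipse:configure-workspace -Declipse.workspace=$HOME/Development/workspace_eclipse_java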
Import projects into Eclipse
[File] => [Import]
General -> Existing Projects into workspace
Set your root directory to the hadoop component you want to import. In my case it's
hadoop-common/hadoop-yarn-project/hadoop-yarn
I highly recommend creating a working set for every Hadoop component, since each of them consists of several Eclipse projects.
Enjoy!
[2] https://github.com/apache/hadoop-common/blob/trunk/BUILDING.txt