Prerequisites (copied from hadoop-common repository)
* Unix System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
Environment
Linux: I am using a rather old 32-bit Debian 6.0.6.
debian@debian:~$ uname -a
Linux debian 2.6.32-5-686 #1 SMP Sun Sep 23 09:49:36 UTC 2012 i686 GNU/Linux
Java: I have the newest (at the time this article was written) Java 1.7 installed:
debian@debian:~$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) Server VM (build 24.45-b08, mixed mode)
Build and install the protocolbuffer-compiler 2.5.0
The version required by Hadoop is 2.5.0, which is also the newest. At the time of writing it is only available in the Debian experimental repository, and I could not get it installed via apt-get. If your Linux distribution provides 2.5.0 from its software repository, use that one.
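Before building from source, it is worth double-checking what your distribution actually ships; on Debian/Ubuntu something like this shows the candidate version:
$ apt-cache policy protobuf-compiler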
First you are going to need g++ installed. My virtual machine was really bare in terms of installed software, so I had to install g++ first:
$ aptitude install g++
Download source from here: https://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz
$ tar -xvzf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./configure --disable-shared #[1]
$ make
$ make install
The above commands compiled and hopefully installed protoc into /usr/local/bin/protoc.
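To verify that the right compiler ended up on your PATH, check the version; it should print 2.5.0:
$ protoc --version
libprotoc 2.5.0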
Install Maven 3.0+
Choose a 3.0+ version from the link below. I used 3.1.1, the newest one available at the time this article was written. http://maven.apache.org/download.cgi
You need the binary tar.gz distribution.
Put Maven in its place:
$ tar -xvzf apache-maven-3.1.1-bin.tar.gz
$ mkdir -p /usr/local/maven/
$ mv apache-maven-3.1.1 /usr/local/maven
$ ln -s /usr/local/maven/apache-maven-3.1.1 /usr/local/maven/current
Put a symlink into /usr/sbin:
$ ln -s /usr/local/maven/current/bin/mvn /usr/sbin/mvn
In fact, this is the same way you would install the Oracle JDK/JRE. The alternative is to put the application's .../bin folder on the $PATH variable at the end of /etc/bash.bashrc, as sketched below.
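A minimal sketch of that alternative, assuming the Maven layout from above; append this line to the end of /etc/bash.bashrc and open a new shell:
export PATH=$PATH:/usr/local/maven/current/bin
Either way, verify that Maven is found:
$ mvn -version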
Install Git
This is available from the repository:
$ aptitude install git
Clone hadoop-common
Go to your Eclipse workspace, or create one if you don't have one yet. I put mine into my home directory:
$ mkdir -p ~/Development/workspace_eclipse_java
Clone the git repository:
$ git clone https://github.com/apache/hadoop-common.git hadoop-common
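By default this leaves you on trunk. If you would rather build a released version, you can check out its tag; the tag name below is only an example, list the real ones with git tag first:
$ cd hadoop-common
$ git tag
$ git checkout release-2.2.0
$ cd ..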
Install the Hadoop Maven plugin
Hadoop has its own Maven plugin that is used during the build:
$ cd hadoop-common/hadoop-maven-plugins
$ mvn install
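If the build succeeded, the plugin should now be in your local repository; you can confirm with something like:
$ ls ~/.m2/repository/org/apache/hadoop/hadoop-maven-plugins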
First build everything
I found the project setup and build well documented. Everything is written down in BUILDING.txt [2].
First you need to build the whole hadoop-common project, so that Maven caches all the dependency jars in your local repository. That way, Eclipse will be able to resolve all your inter-project dependencies.
$ cd ..
$ mvn install -DskipTests -nsu # -nsu (--no-snapshot-updates) tells Maven not to re-fetch snapshot dependencies on every build
Generate Eclipse projects
I am only interested in the YARN and MapReduce components, so I only generate Eclipse projects for those:
$ cd hadoop-yarn-project
$ mvn eclipse:eclipse -DskipTests
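If you also want dependency sources and javadocs attached in Eclipse, the maven-eclipse-plugin can download them too (this takes noticeably longer):
$ mvn eclipse:eclipse -DskipTests -DdownloadSources=true -DdownloadJavadocs=true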
Set M2_REPO variable in Eclipse
If not yet set, you have to create a variable in Eclipse pointing to your local Maven repository, as every dependency entry in the generated .classpath files starts with M2_REPO/..
[Window] => [Preferences]
Java -> Build Path -> Classpath Variables
Add a new one named M2_REPO pointing to your local Maven repository, which by default is at /home/username/.m2/repository
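Alternatively, the maven-eclipse-plugin can write the M2_REPO variable into the workspace for you; a sketch, assuming the workspace path used above:
$ mvn eclipse:configure-workspace -Declipse.workspace=$HOME/Development/workspace_eclipse_java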
Import projects into Eclipse
[File] => [Import]
General -> Existing Projects into workspace
Set your root directory to the hadoop component you want to import. In my case it's
hadoop-common/hadoop-yarn-project/hadoop-yarn
I highly recommend creating a working set for every Hadoop component, since each one consists of several Eclipse projects.
Enjoy!
[2] https://github.com/apache/hadoop-common/blob/trunk/BUILDING.txt