Building Hadoop on the Raspberry Pi 2

Introduction

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. This post outlines the steps needed to build Hadoop on the Raspberry Pi 2.

Install Dependencies

Hadoop has some dependencies that we will need to install. Running the following command will install most of what we need:

sudo apt-get install build-essential g++ autoconf automake \
  libtool cmake zlib1g-dev pkg-config libssl-dev maven \
  libsnappy-dev libbz2-dev openjdk-7-jdk -y

We need to make sure OpenJDK 7 is the preferred Java implementation. You can do this by running the following command. It steps through the preferred implementation for every alternative on the system; we only care about the Java-related ones. When you see an option for Java, select the one where java-7-openjdk-armhf is the provider and the option number is greater than 0.

sudo update-alternatives --all
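If you would rather not step through every alternative on the system, update-alternatives can target just the Java tools. This is a standard Debian command shown here as a shortcut, not a step from the original procedure:

```shell
# Configure only the java and javac alternatives, then confirm the result.
sudo update-alternatives --config java
sudo update-alternatives --config javac
java -version   # should mention OpenJDK and version 1.7
```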

We will also need to install Protocol Buffers. Hadoop 2.7.0's build requires protoc 2.5.0 exactly, and the version packaged for Raspbian will not match, so it needs to be built from source.
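A minimal sketch of the protobuf build follows. The download URL is the GitHub release archive for 2.5.0 (an assumption on my part; any mirror of the 2.5.0 source tarball will do):

```shell
# Build and install protobuf 2.5.0 from source (Hadoop 2.7.0 needs
# protoc 2.5.0 exactly). This takes a while on a Pi 2.
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar xf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr
make
sudo make install
protoc --version   # should report "libprotoc 2.5.0"
```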

Obtaining and Building the Source Code

The source code can live in whatever directory you prefer, so before you perform any of the following steps, navigate to your working directory first.

wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.7.0/hadoop-2.7.0-src.tar.gz
tar xf hadoop-2.7.0-src.tar.gz
rm hadoop-2.7.0-src.tar.gz
cd hadoop-2.7.0-src

We will need to patch a couple of files before building Hadoop.

The first file is pom.xml in the hadoop-2.7.0-src directory.

wget --no-check-certificate https://grizzlykoalabear.com/wp-content/uploads/2015/07/apache-hadoop-2.7.0-pom.xml.patch
patch < apache-hadoop-2.7.0-pom.xml.patch

The next file is JNIFlags.cmake in the hadoop-2.7.0-src/hadoop-common-project/hadoop-common/src/ directory.

cd hadoop-common-project/hadoop-common/src/
wget https://issues.apache.org/jira/secure/attachment/12570212/HADOOP-9320.patch
patch < HADOOP-9320.patch
cd ../../../

Now that we have the files patched, we can start building Hadoop.

mvn package -Pdist,native -DskipTests -Dtar -DcompileSource=1.7 -Dignore.symbol.file
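The build can take well over an hour on a Pi 2. Once it finishes, the distribution is assembled under hadoop-dist/target. One quick sanity check (assuming the build succeeded, and that the layout matches a stock 2.7.0 build) is to confirm the native library was compiled for ARM:

```shell
# Inspect the freshly built native library; on a successful ARM build,
# file(1) should report a 32-bit ARM ELF shared object.
file hadoop-dist/target/hadoop-2.7.0/lib/native/libhadoop.so.1.0.0
```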

Installing Hadoop

First, we will need to create Hadoop's new home, /opt/hadoop-2.7.0.

sudo mkdir /opt/hadoop-2.7.0

Copy the files to their new home.

sudo cp -R hadoop-dist/target/hadoop-2.7.0/* /opt/hadoop-2.7.0/

Create a symbolic link to make things friendlier.

sudo ln -s /opt/hadoop-2.7.0/ /opt/hadoop

Update /etc/profile to include Hadoop paths.

sudo nano /etc/profile

We want to export some values so Hadoop can be easily found. At the top of the file, add the following lines:

HADOOP_PREFIX=/opt/hadoop
export HADOOP_PREFIX
HADOOP_INSTALL=$HADOOP_PREFIX
export HADOOP_INSTALL

Find where PATH is set and add the following to the end:

:$HADOOP_INSTALL/bin
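Put together, the finished additions to /etc/profile look like this (a sketch; the PATH line below is a placeholder for your existing assignment):

```shell
# Values exported near the top of /etc/profile.
HADOOP_PREFIX=/opt/hadoop
export HADOOP_PREFIX
HADOOP_INSTALL=$HADOOP_PREFIX
export HADOOP_INSTALL

# Wherever PATH is already assigned, append the Hadoop bin directory.
PATH="$PATH:$HADOOP_INSTALL/bin"
export PATH
```

After logging back in, running hadoop version should report 2.7.0 if everything is wired up.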

You’ll need to log out, then log back in, for the /etc/profile changes to take effect.

Complete Script (minus the profile changes)

#!/bin/bash
sudo apt-get install build-essential g++ autoconf automake \
  libtool cmake zlib1g-dev pkg-config libssl-dev maven \
  libsnappy-dev libbz2-dev openjdk-7-jdk -y
sudo update-alternatives --all
mkdir -p $HOME/src
cd $HOME/src
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.7.0/hadoop-2.7.0-src.tar.gz
tar xf hadoop-2.7.0-src.tar.gz
rm hadoop-2.7.0-src.tar.gz
cd hadoop-2.7.0-src
wget --no-check-certificate https://grizzlykoalabear.com/wp-content/uploads/2015/07/apache-hadoop-2.7.0-pom.xml.patch
patch < apache-hadoop-2.7.0-pom.xml.patch
cd hadoop-common-project/hadoop-common/src/
wget https://issues.apache.org/jira/secure/attachment/12570212/HADOOP-9320.patch
patch < HADOOP-9320.patch
cd ../../../
mvn package -Pdist,native -DskipTests -Dtar -DcompileSource=1.7 -Dignore.symbol.file
sudo mkdir /opt/hadoop-2.7.0
sudo cp -R hadoop-dist/target/hadoop-2.7.0/* /opt/hadoop-2.7.0/
sudo ln -s /opt/hadoop-2.7.0/ /opt/hadoop

What’s next?

Your best bet is to check out the official Hadoop documentation to get a better idea of what you can do with it.

Enjoy!
