
Running the Hadoop Example

Bryan Robbins edited this page Aug 22, 2014 · 2 revisions

A simple Hadoop example with DataGenerator is now available in the master branch (to be released with DG v2).

These are the steps I use to run this example quickly on my machine. I find them easier than installing a Hadoop cluster or the Cloudera VM. We can also simplify them in the future by publishing a Docker image for testing purposes.

TODO: Collapse steps 2-6 (or the equivalent, e.g. without the checkout and build steps but jars instead) into a public Docker image.

  1. Install Docker: https://www.docker.com/

  2. (If not on Linux) Run Boot2Docker to start a VM that can run Docker. Linux machines support Docker natively.

  3. Run this command to start up a terminal on the Hadoop machine:

docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash

  4. Install some dependencies for building DataGenerator on this machine:

yum install -y git-core wget

wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo

yum install -y apache-maven
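Before moving on, it can help to confirm that the tools just installed are actually on the PATH. The following is a minimal sketch and not part of DataGenerator: `require_tools` is a hypothetical helper name, and `mvn` is assumed to be the command provided by the apache-maven package.

```shell
# Hypothetical helper: report any required command that is not on the PATH.
# Returns 0 when every tool is found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools used in the following steps (mvn is assumed to come from apache-maven).
if require_tools git wget mvn; then
  echo "all build tools found"
fi
```

If anything is reported as missing, re-running the corresponding yum command is the first thing to try.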

  5. Clone and build DataGenerator jars, including the example:

mkdir work

cd work

git clone https://github.com/bryantrobbins/DataGenerator

cd DataGenerator

mvn clean install
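Maven writes build output to each module's target/ directory by default, so one way to see which jars the build produced is to search for them. This is only a sketch: `list_built_jars` is a hypothetical helper, and the jar names it prints depend on the project and are not assumed here.

```shell
# Hypothetical helper: list jar files under any Maven target/ directory
# below the given root (defaults to the current directory).
list_built_jars() {
  find "${1:-.}" -type f -path '*/target/*.jar' | sort
}

# Run from the DataGenerator checkout after the build finishes.
list_built_jars .
```

The path printed for the example jar is what a driver script would need to reference.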

  6. Make using Hadoop easier on yourself:

export PATH=$PATH:$HADOOP_PREFIX/bin
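The export above lasts only for the current shell session, and running it twice adds the directory twice. A slightly more defensive variant might look like the sketch below. `path_append` is a hypothetical helper, and `HADOOP_PREFIX` is assumed to be set by the container image, as the command above implies.

```shell
# Hypothetical helper: append a directory to PATH unless it is already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, do nothing
    *) PATH="$PATH:$1" ;;        # otherwise add it at the end
  esac
}

# Only touch PATH when HADOOP_PREFIX is actually set (assumed to come from the image).
if [ -n "${HADOOP_PREFIX:-}" ]; then
  path_append "$HADOOP_PREFIX/bin"
  export PATH
fi
```

Appending rather than prepending keeps the system's own tools first in lookup order, which matches the original command.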

  7. Make calling this example easier on yourself. Create a file driver.sh with these contents:

TODO: Insert driver script here

  8. Finally, make the script executable (chmod +x driver.sh) and run the DG Hadoop example from its jar:

./driver.sh
