Running the Hadoop Example
A simple Hadoop example with DataGenerator is now available in the master branch (to be released with DG v2).
Here is how I quickly run this example on my machine. I find these steps easier than installing a Hadoop cluster or the Cloudera VM. We can also simplify them in the future by publishing a Docker image for testing purposes.
TODO: Collapse steps 2-6 (or the equivalent, e.g. using prebuilt jars instead of the checkout and build steps) into a public Docker image.
- Install Docker: https://www.docker.com/
- (If not on Linux) Run Boot2Docker to start a VM that runs Docker. Linux machines support Docker natively.
- Run this command to start a terminal on the Hadoop machine:
docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
- Install some dependencies for building DataGenerator on this machine:
yum install -y git-core wget
wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
yum install -y apache-maven
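Before moving on to the build, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch, not part of the DataGenerator project; the helper name `missing_cmds` is made up for illustration.

```shell
# Hypothetical helper: report any of the named commands that are not on PATH.
# Prints each missing command on its own line; returns 1 if any are missing.
missing_cmds() {
    mc_rc=0
    for mc_cmd in "$@"; do
        if ! command -v "$mc_cmd" >/dev/null 2>&1; then
            printf '%s\n' "$mc_cmd"
            mc_rc=1
        fi
    done
    return "$mc_rc"
}

# Tools the build step relies on: git and wget from yum, mvn from apache-maven.
if missing_cmds git wget mvn; then
    echo "all build tools found"
else
    echo "install the tools listed above before building" >&2
fi
```

If anything is listed, rerunning the corresponding `yum install` line is the first thing to try.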
- Clone and build DataGenerator jars, including the example:
mkdir work
cd work
git clone https://github.com/bryantrobbins/DataGenerator
cd DataGenerator
mvn clean install
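Once the build finishes, Maven's standard layout places each module's jar under that module's `target/` directory. The sketch below lists whatever jars were produced, which may help when writing the driver script later. The function name `list_built_jars` is hypothetical, and no particular module or jar name is assumed.

```shell
# Hypothetical helper: list jar files under any target/ directory
# below the given root (defaults to the current directory).
list_built_jars() {
    find "${1:-.}" -type f -path '*/target/*' -name '*.jar' | sort
}

# Run from the DataGenerator checkout after the build completes.
list_built_jars .
```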
- Make using Hadoop easier on yourself by adding its binaries to your PATH:
export PATH=$PATH:$HADOOP_PREFIX/bin
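Note that the `export` above only lasts for the current shell session, and running it repeatedly appends the same directory again. A slightly more defensive variant is sketched below. The function name `path_append` is made up for illustration, and the fallback directory is an assumption used only when `HADOOP_PREFIX` is unset.

```shell
# Hypothetical helper: append a directory to PATH only if not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# HADOOP_PREFIX is expected to be set inside the container; the fallback
# path here is an assumption so the snippet also runs where it is unset.
path_append "${HADOOP_PREFIX:-/usr/local/hadoop}/bin"
export PATH
```

Placing these lines in `~/.bashrc` would keep the setting across new shells in the same container.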
- Make calling this example easier on yourself. Create a file driver.sh with these contents:
TODO: Insert driver script here
- Finally, run the DG Hadoop example from its jar:
./driver.sh