Running the Hadoop Example

A simple Hadoop example for DataGenerator (DG) is now available in the master branch and will be released with DG v2.

Here is how I quickly run this example on my machine. I find these steps easier than installing a Hadoop cluster or the Cloudera VM. In the future, we can simplify them further by publishing a Docker image for testing purposes.

TODO: Collapse steps 2-6 (or the equivalent, e.g. without the checkout and build steps but jars instead) into a public Docker image.

Install Docker: https://www.docker.com/
If you are not on Linux, run Boot2Docker to start a VM that can run Docker. Linux machines support Docker natively.
Run this command to open an interactive shell in the Hadoop container:

docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
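If you start this container often, a small wrapper can fail early with a readable message when the docker CLI is not installed or not on your PATH. This is a minimal sketch; start_hadoop_shell is a hypothetical helper name, and it runs exactly the docker run command shown above:

```shell
# Hypothetical helper (not part of DataGenerator): open an interactive shell
# in the sequenceiq/hadoop-docker container used on this page.
start_hadoop_shell() {
  # Fail early if the docker CLI cannot be found on PATH.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; install Docker (or start Boot2Docker) first" >&2
    return 1
  fi
  # -i -t attach an interactive terminal; the remaining arguments are
  # unchanged from the command above.
  docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
}
```

Paste the function into your host shell, then run start_hadoop_shell.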

Inside the container, install the dependencies needed to build DataGenerator (Git, wget, and Maven):

yum install -y git-core wget

wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo

yum install -y apache-maven
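Before building, you may want to confirm that the three tools installed above are actually on the PATH. A minimal sketch; check_build_deps is a hypothetical helper, not something DataGenerator provides:

```shell
# Hypothetical helper: print each required build tool that cannot be found
# on PATH (one per line) and return non-zero if any are missing.
check_build_deps() {
  local tool missing=0
  for tool in git wget mvn; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

If check_build_deps prints nothing and returns status 0, git, wget, and mvn were all found.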

Clone DataGenerator and build its jars, including the example:

mkdir work

cd work

git clone https://github.com/bryantrobbins/DataGenerator

cd DataGenerator

mvn clean install
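By default, Maven writes each module's jars to a target/ directory. This page does not name the example's jar, so the sketch below (list_built_jars is a hypothetical helper) simply lists every jar found under a target/ directory; pick the example jar from its output:

```shell
# Hypothetical helper: list jars produced by the Maven build, assuming the
# default target/ output directory. Searches the current directory by default.
list_built_jars() {
  local root="${1:-.}"
  find "$root" -type f -path '*/target/*.jar' | sort
}
```

Run list_built_jars from the DataGenerator directory, or pass the checkout path as the first argument.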

Add Hadoop's bin directory to your PATH so you can run the hadoop command directly:

export PATH=$PATH:$HADOOP_PREFIX/bin
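The export above appends another copy of Hadoop's bin directory each time it runs, and if HADOOP_PREFIX is unset it silently appends /bin instead. A slightly more defensive sketch, assuming (as the export does) that HADOOP_PREFIX points at the Hadoop installation; add_hadoop_to_path is a hypothetical helper name:

```shell
# Hypothetical helper: append $HADOOP_PREFIX/bin to PATH only once, and
# refuse to run if HADOOP_PREFIX is unset or empty.
add_hadoop_to_path() {
  if [ -z "${HADOOP_PREFIX:-}" ]; then
    echo "HADOOP_PREFIX is not set" >&2
    return 1
  fi
  case ":$PATH:" in
    *":$HADOOP_PREFIX/bin:"*) ;;  # already on PATH; nothing to do
    *) PATH="$PATH:$HADOOP_PREFIX/bin"; export PATH ;;
  esac
}
```

Because it is a shell function, it changes PATH in the current shell, just like the export. Afterwards, command -v hadoop should print the location of the hadoop command.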

To make the example easier to call, create a file named driver.sh with these contents:

TODO: Insert driver script here
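Until the actual script is filled in, here is a hypothetical sketch of what driver.sh could look like. It is not the project's driver: it assumes a variable EXAMPLE_JAR that you set to the example jar built by mvn clean install, and it passes any arguments (for example a main class, if the jar's manifest does not name one) straight to hadoop jar:

```shell
#!/usr/bin/env bash
# driver.sh: hypothetical sketch, not the project's actual driver script.
# EXAMPLE_JAR (an assumed variable) must point at the example jar built by Maven.

run_example() {
  local jar="$1"
  shift
  if [ ! -f "$jar" ]; then
    echo "example jar not found: '$jar' (set EXAMPLE_JAR)" >&2
    return 1
  fi
  # hadoop jar <jar> [mainClass] [args...] runs the jar with Hadoop's
  # classpath and configuration.
  hadoop jar "$jar" "$@"
}

run_example "${EXAMPLE_JAR:-}" "$@"
```

For example: EXAMPLE_JAR=/path/to/example.jar ./driver.sh, where the path is a placeholder for the example jar's actual location.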

Finally, make driver.sh executable and run the DG Hadoop example from its jar:

chmod +x driver.sh

./driver.sh
