Red Quickstart

This guide provides the basic information you need to get jobs up and running on Red, a cluster running Hadoop and Accumulo.

Login Procedure

The Red nodes should always be accessed remotely, from the lab machines or from the pool of nodes in the lab queue. HDFS, MapReduce, and Accumulo applications are all run from the interactive lab nodes. Please connect through login.msi.umn.edu or nx.msi.umn.edu (see MSI's interactive connections FAQ), and use isub to start an interactive session on a lab machine.

Available Software

Accumulo, as well as Hadoop HDFS and MapReduce, is installed on the cluster. Any additional software you need should be installed in your home directory.

Compilers and libraries

Java and Maven are available on the lab nodes to compile applications for Hadoop and Accumulo. The D4M libraries are available for users of the D4M Matlab Interface to Accumulo.

Moving Files Into HDFS

HDFS is a distributed filesystem: each node in the Hadoop cluster stores a portion of the data held in HDFS, while the Namenode (red50) holds the metadata for all files in HDFS.

To move files to HDFS, begin by loading the default environment settings for Hadoop.

module load hadoop

To list existing files:

hadoop fs -ls /user/$USER

To copy a local directory to HDFS on Red:

hadoop fs -put ~/mylocaldirectory /user/$USER

Executing a MapReduce Job

When you load the hadoop module, it targets the Red cluster by default. This example application writes random data to files in the target directory (/user/blynch/result_dir; substitute your own username). You can copy the random.xml input file from /nfs/soft-el6/hadoop/0.20.205.0/random.xml

export apps=/nfs/soft-el6/hadoop/0.20.205.0
hadoop jar $apps/hadoop-examples-0.20.205.0.jar randomwriter \
/user/blynch/result_dir ./random.xml

Starting the Accumulo Shell

Load the environment:

module load hadoop
module load accumulo

Start the shell using your Accumulo credentials (not your MSI password):

accumulo shell -u $USER
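
Once inside the shell, you can create a table, insert an entry, and scan it back. A sketch of a session (the table name testtable and the instance name in the prompt are just examples; your prompt will reflect your username and Red's instance name):

```
root@accumulo> createtable testtable
root@accumulo testtable> insert row1 colfam colqual myvalue
root@accumulo testtable> scan
row1 colfam:colqual []    myvalue
root@accumulo testtable> exit
```

Note that createtable switches the shell's context to the new table, as shown in the prompt.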

Compiling Code

A default CLASSPATH is set when you load the accumulo module. You can compile your code with javac at the command line, with Maven, or within your own IDE.

module load accumulo
javac ClassName.java

Alternatively, Maven can be used to compile your Accumulo application. Starting from a basic pom.xml file that manages the dependencies, you can compile an application with:

export JAVA_HOME=/usr/java/latest
mvn package
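
A minimal pom.xml for an Accumulo application might look like the following sketch. The groupId, artifactId, and dependency versions below are placeholders; match the versions of the hadoop and accumulo modules loaded on Red:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- Placeholder coordinates; use your own project's names -->
  <groupId>edu.umn.msi.example</groupId>
  <artifactId>accumulo-example</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <!-- Versions are examples; match what is installed on Red -->
    <dependency>
      <groupId>org.apache.accumulo</groupId>
      <artifactId>accumulo-core</artifactId>
      <version>1.4.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.205.0</version>
    </dependency>
  </dependencies>
</project>
```

With this file in place, mvn package compiles your sources and produces a JAR under target/.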

Running Accumulo Jobs Interactively

The accumulo module sets the CLASSPATH to include the Hadoop and Accumulo JAR files. You can extend the CLASSPATH variable if you need to include additional JARs for your application.

module load accumulo
java <ClassName> <Arguments>
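
For example, you can append your own JARs to the CLASSPATH the module sets before launching java. The JAR path below is hypothetical; substitute the location of your own application's libraries:

```shell
# Append an application JAR to the CLASSPATH set by the accumulo module.
# $HOME/myapp/lib/myapp.jar is a hypothetical path; use your own.
export CLASSPATH="$CLASSPATH:$HOME/myapp/lib/myapp.jar"
echo "$CLASSPATH"
```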

The Hadoop JobTracker Queue

The queue on Red is First-In-First-Out (FIFO). Please limit the size of your tests or benchmarks, and be respectful of the other MSI users on the system.

Useful Commands

View jobs in the queue:

hadoop job -list

Kill one of your own jobs:

hadoop job -kill <job-id>