- Login Procedure
- Available Software
- Moving Files Into HDFS
- Executing a MapReduce Job
- Starting the Accumulo Shell
- Compiling Code
- Running Accumulo Jobs Interactively
- The Hadoop JobTracker
This guide will provide you with the basic information needed to get jobs up and running on Red, a cluster running Hadoop and Accumulo.
The Red nodes should always be accessed remotely from the lab machines or the pool of nodes in the lab queue. HDFS, MapReduce, and Accumulo applications are all run from the interactive lab nodes. Please connect through login.msi.umn.edu or nx.msi.umn.edu. See MSI's interactive connections FAQ, and use isub to start an interactive session on a lab machine.
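A connection might look like the following sketch. The username and the isub resource flags are illustrative assumptions; consult MSI's isub documentation for the flags currently supported.

```shell
# Connect to an MSI login node, then request an interactive lab session.
ssh username@login.msi.umn.edu              # replace "username" with your MSI username
isub -n nodes=1:ppn=4 -m 8GB -w 02:00:00    # example resource request; flags are assumptions
```

Once the interactive session starts, run the Hadoop and Accumulo commands below from that lab node, not from the login node itself.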
Accumulo, Hadoop HDFS, and MapReduce are installed on the cluster. Any additional software you need should be installed in your home directory.
Compilers and libraries
Java and Maven are available on the lab nodes to compile applications for Hadoop and Accumulo. The D4M libraries are available for users of the D4M Matlab Interface to Accumulo.
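To confirm the toolchain is available before compiling, you can load the relevant modules and print versions. The exact module names here are assumptions; run `module avail` on a lab node to see what is actually installed.

```shell
# Load the compiler toolchain and confirm versions.
# Module names below are assumptions; check "module avail" for the real ones.
module load java
module load maven
java -version
mvn -version
```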
HDFS is a distributed filesystem. Each node in the Hadoop cluster stores a portion of the data held in HDFS. The Namenode (red50) holds the metadata for all files in HDFS.
To move files to HDFS, begin by loading the default environment settings for Hadoop.
module load hadoop
To list existing files:
hadoop fs -ls /user/$USER
To copy a local directory to HDFS on Red:
hadoop fs -put ~/mylocaldirectory /user/$USER
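Putting these commands together, a complete round trip through HDFS might look like the sketch below. The directory names continue the examples above; the `-rmr` syntax matches the Hadoop 0.20 line installed on Red.

```shell
# Round-trip a directory through HDFS: copy in, verify, copy out, clean up.
module load hadoop
hadoop fs -put ~/mylocaldirectory /user/$USER          # copy a local directory into HDFS
hadoop fs -ls /user/$USER/mylocaldirectory             # confirm the files arrived
hadoop fs -get /user/$USER/mylocaldirectory ~/hdfscopy # copy back to local disk
hadoop fs -rmr /user/$USER/mylocaldirectory            # remove from HDFS when finished
```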
When you load the hadoop module, it targets the Red cluster by default. The example application below writes random data to files in a target directory (/user/blynch/result_dir in this example). You can copy the random.xml input file from /nfs/soft-el6/hadoop/0.20.205.0/random.xml
hadoop jar $apps/hadoop-examples-0.20.205.0.jar randomwriter \
    /user/blynch/result_dir ./random.xml
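After the job completes, you can inspect the output in HDFS. Note that MapReduce refuses to run a job whose output directory already exists, so remove it before re-running. The directory name continues the example above.

```shell
# Inspect the randomwriter output, then clean up before a re-run.
hadoop fs -ls /user/blynch/result_dir    # list the generated output files
hadoop fs -rmr /user/blynch/result_dir   # MapReduce fails if this directory already exists
```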
Load the environment
module load hadoop
module load accumulo
Start the shell using your Accumulo credentials (not your MSI credentials).
accumulo shell -u $USER
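Once connected, a short session might look like the transcript below. The prompt, table name, and cell values are made up for illustration; createtable, insert, scan, and deletetable are standard Accumulo shell commands.

```
user@instance> createtable mytable
user@instance mytable> insert row1 colfam colqual value1
user@instance mytable> scan
row1 colfam:colqual []    value1
user@instance mytable> deletetable -f mytable
user@instance> exit
```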
Loading the accumulo module sets a default CLASSPATH. You can compile your code with javac at the command line, with Maven, or from your own IDE.
module load accumulo
javac ClassName.java
Maven can also be used to compile your Accumulo application. Starting with a basic pom.xml file to manage the dependencies, you can compile an application with:
export JAVA_HOME=/usr/java/latest
mvn package
The accumulo module sets the CLASSPATH to include the Hadoop and Accumulo jar files. You can extend the CLASSPATH variable if you need to include additional jars for your application.
module load accumulo
java <ClassName> <Arguments>
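For example, extending the CLASSPATH and running a compiled class might look like this. The jar path, class name, and arguments are hypothetical placeholders, not values from this guide.

```shell
# Append your own jars to the CLASSPATH set by the accumulo module,
# then run your compiled class. All names below are hypothetical examples.
module load accumulo
export CLASSPATH=$CLASSPATH:$HOME/lib/extra-dependency.jar
java MyAccumuloClient arg1 arg2
```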
The queue on Red is First-In-First-Out (FIFO). Please limit the size of your tests or benchmarks, and be respectful of the other MSI users on the system.
To view jobs in the queue:
hadoop job -list
To kill one of your own jobs:
hadoop job -kill <job-id>