D4M attempts to combine the advantages of five distinct processing technologies (sparse linear algebra, associative arrays, fuzzy algebra, distributed arrays, and triple-store/NoSQL databases such as Hadoop HBase and Apache Accumulo) to provide a database and computation system that addresses the problems associated with Big Data.
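The associative-array model at the heart of D4M treats data as a sparse matrix whose rows and columns are keyed by strings, so ordinary linear-algebra operations become database-style queries. The toy sketch below illustrates that idea with plain Python dictionaries; it is a conceptual illustration only, not D4M's actual API, and the keys and helper names are invented for the example.

```python
# A toy associative array: entries are keyed by (row, column) string pairs,
# like a sparse matrix with string-valued indices. This sketches the
# associative-array concept behind D4M; it is NOT D4M's real interface.
A = {("alice", "likes:hiking"): 1,
     ("bob",   "likes:hiking"): 1,
     ("bob",   "likes:chess"):  1}

def transpose(assoc):
    # Swap row and column keys, as in a sparse matrix transpose.
    return {(c, r): v for (r, c), v in assoc.items()}

def matmul(A, B):
    # Sparse matrix multiply over string-keyed arrays:
    # C[i, j] = sum over k of A[i, k] * B[k, j]
    C = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k == k2:
                C[(i, j)] = C.get((i, j), 0) + a * b
    return C

# A * A' correlates rows with rows: here, people who share interests.
corr = matmul(A, transpose(A))
```

In this sketch `corr[("bob", "bob")]` is 2 (bob has two interests) and `corr[("alice", "bob")]` is 1 (one shared interest), showing how a matrix product doubles as a join-and-count query.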
The h5py package is a Pythonic interface to the HDF5 binary data format.
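A minimal round trip with h5py looks like the following: datasets are created inside an HDF5 file, metadata is attached as attributes, and slicing a dataset reads it back as a NumPy array. The file name and dataset name here are arbitrary examples.

```python
import h5py
import numpy as np

# Write: create an HDF5 file with one dataset and a metadata attribute.
with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("measurements", data=np.arange(10.0))
    dset.attrs["units"] = "volts"  # HDF5 attributes hold small metadata

# Read: slicing a dataset returns a NumPy array.
with h5py.File("example.h5", "r") as f:
    data = f["measurements"][:]
    units = f["measurements"].attrs["units"]
```

Because slicing controls exactly what is read, h5py can pull small pieces out of files far larger than memory.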
The Hadoop Map/Reduce framework harnesses a cluster of machines and executes user-defined Map/Reduce jobs across the nodes in the cluster. On itasca, a script exists to create an ephemeral Hadoop cluster on the set of nodes assigned by the scheduler. The setup_cluster script formats an HDFS filesystem on the nodes' local scratch disks.
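The Map/Reduce model itself can be sketched without a cluster: a map phase emits key-value pairs, the framework groups them by key, and a reduce phase aggregates each group. The word-count example below mimics that flow in plain Python; the function names are illustrative, and a real job would run these phases across HDFS via Hadoop (for example as streaming mapper and reducer scripts), not in one process.

```python
from collections import defaultdict

def map_words(lines):
    # Map phase: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_counts(pairs):
    # Reduce phase: sum the counts for each key. In Hadoop the framework
    # shuffles and groups pairs by key between the two phases.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

counts = reduce_counts(map_words(["to be or not to be"]))
```

Here `counts` comes out as `{"to": 2, "be": 2, "or": 1, "not": 1}`; on a cluster the same logic runs in parallel, with each mapper reading a split of the input from HDFS.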