The Hadoop Map/Reduce framework harnesses a cluster of machines to execute user-defined Map/Reduce jobs across the nodes in the cluster. On itasca, a script exists to create an ephemeral Hadoop cluster on the set of nodes assigned by the scheduler. The setup_cluster script formats an HDFS filesystem on the local scratch disks. This resource is best suited for application benchmarking and algorithm testing. Because the cluster is ephemeral, all data must be moved into HDFS after the cluster is brought up at the start of the job, and any data you wish to save must be moved back to your home directory before the job completes. Many...
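The stage-in/run/stage-out workflow above might look like the following hypothetical job script. The setup_cluster script comes from the description; the resource request, module name, jar name, and paths are illustrative assumptions, not itasca specifics.

```shell
#!/bin/bash
# Hypothetical job-script sketch for an ephemeral Hadoop cluster on itasca.
# Only setup_cluster and the overall workflow come from the description above;
# queue settings, module name, and file paths are placeholders.
#PBS -l nodes=4:ppn=8,walltime=02:00:00

module load hadoop      # assumed module name
setup_cluster           # format HDFS on local scratch and start the cluster

# Stage input into HDFS after the cluster is up
hadoop fs -put "$HOME/input" /input

# Run a user-defined Map/Reduce job (jar and class names are placeholders)
hadoop jar myjob.jar MyJob /input /output

# Copy results back to the home directory before the job (and cluster) ends
hadoop fs -get /output "$HOME/output"
```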
The h5py package is a Pythonic interface to the HDF5 binary data format.
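A minimal sketch of the h5py interface: datasets live under group paths (like files in a filesystem), carry metadata as attributes, and slice like NumPy arrays. The file and dataset names here are illustrative.

```python
# Write and read an HDF5 file with h5py; names are illustrative.
import h5py
import numpy as np

data = np.arange(12.0).reshape(3, 4)

# Write: datasets are addressed by group paths, like a filesystem
with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("measurements/temperature", data=data)
    dset.attrs["units"] = "kelvin"   # metadata travels with the dataset

# Read: datasets slice like NumPy arrays without loading the whole file
with h5py.File("example.h5", "r") as f:
    dset = f["measurements/temperature"]
    print(dset.shape)          # (3, 4)
    print(dset[0, :])          # [0. 1. 2. 3.]
    print(dset.attrs["units"]) # kelvin
```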
Bio-SamTools provides read access to SAM/BAM sequence alignment files from within BioPerl, via the Bio::DB::Sam module. http://search.cpan.org/~lds/Bio-SamTools/lib/Bio/DB/Sam.pm
D4M attempts to combine the advantages of five distinct processing technologies (sparse linear algebra, associative arrays, fuzzy algebra, distributed arrays, and triple-store/NoSQL databases such as Hadoop HBase and Apache Accumulo) to provide a database and computation system that addresses the problems associated with Big Data.
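To illustrate the associative-array idea at the core of D4M (not the D4M API itself, which is MATLAB-based), here is a hedged pure-Python sketch: an associative array maps pairs of string keys to values, behaving like a sparse matrix indexed by strings, and element-wise addition merges two such structures.

```python
# Hedged sketch of an associative array (D4M-style concept, not its API):
# a sparse matrix whose rows and columns are indexed by string keys.
class Assoc:
    def __init__(self, triples):
        # triples: iterable of (row_key, col_key, value)
        self.data = {(r, c): v for r, c, v in triples}

    def __add__(self, other):
        # element-wise addition merges the two sparse structures
        merged = dict(self.data)
        for key, v in other.data.items():
            merged[key] = merged.get(key, 0) + v
        return Assoc((r, c, v) for (r, c), v in merged.items())

    def row(self, r):
        # select one row, analogous to A(r, :) in D4M
        return {c: v for (rr, c), v in self.data.items() if rr == r}

A = Assoc([("doc1", "hadoop", 2), ("doc1", "hdfs", 1)])
B = Assoc([("doc1", "hadoop", 3), ("doc2", "d4m", 1)])
C = A + B
print(C.row("doc1"))   # {'hadoop': 5, 'hdfs': 1}
```

The string-keyed sparse representation is what lets the same object be treated as a matrix (for linear algebra) and as a set of triples (for a triple-store such as Accumulo or HBase).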