New Data Storage Option for MSI Researchers

In the fall of 2014, MSI added a Ceph object storage system as an option for second-tier storage for MSI users. Many MSI researchers are in disciplines that use huge amounts of data, such as informatics, genomics, and astrophysics. Often, much of this data does not need to be stored on a machine with high-speed access. This is where a good second-tier storage option is valuable.

The Ceph system currently has 1.4 PB of space for data storage. Its access features allow researchers to exchange data with colleagues outside the University of Minnesota, port cloud-based workflows to MSI systems, and store inactive data separately from the high-performance systems. Several MSI researchers, described below, have already begun using the Ceph system.

Professor Shaul Hanany (Physics and Astronomy/Minnesota Institute for Astrophysics; College of Science and Engineering) and his research group are analyzing data from the E and B EXperiment (EBEX), a NASA-funded balloon-borne polarimeter designed to measure the polarization of the cosmic microwave background and to develop methods for subtraction of foreground sources from these data. The first science flight of EBEX collected 1 TB of raw data. The Hanany group is performing simulations to understand the uncertainties and systematic effects associated with the data analysis.

Assistant Professor Suzanne McGaugh (Ecology, Evolution, and Behavior; College of Biological Sciences) uses MSI to support genomics and transcriptomics studies in cavefish, reptiles, and other animals. A recent Research Spotlight featured a paper by Professor McGaugh and her colleagues that reported the first de novo genome assembly for the cavefish Astyanax mexicanus, also known as the Mexican tetra.

Assistant Professor Peter Morrell (Agronomy and Plant Genetics; College of Food, Agricultural, and Natural Resource Sciences) is studying the effect of domestication and strong selection during crop improvement on the level of deleterious mutations in self-fertilizing crops. This work involves DNA and RNA sequence assembly, identification of single nucleotide polymorphisms, and the use of probabilistic approaches to determine the proportion of mutations that are likely to be deleterious.

All of these groups have placed terabytes of their data onto the Ceph system. This has freed up space in each group’s allotted quota on the high-performance storage systems, allowing those systems to be used more efficiently.

More information about MSI’s data storage resources can be found on the MSI website. Specific information about second-tier storage is also available. Any questions can be sent to

posted on March 4, 2015
