Ceph in HPC Environments at SC15

Overview: Individuals from MSI, UAB, RedHat Inc., Intel Corp., CADRE, and MIMOS came together at SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis on Wednesday, November 18, 2015 in Austin, TX to share their experiences with Ceph in HPC...


Stata is an integrated statistical package that provides extensive functionality for data analysis, data management, and graphics. To use Stata you must be on the Stata user list. Contact MSI Help to be added to the Stata user list. We have two Stata-MP8 version 13 licenses. We ask that all users use the calendaring program to sign up for time on Stata. If someone is already signed up for the slot that you want, use a different calendar. Up to two users can sign up for the same time slot, one user per calendar. Please specify your name on the calendar. You can use the following...


NBO is a program for generating natural bond orbitals. NBO is a module that can be built into many electronic structure programs. NBO is currently built into the following versions of GAUSSIAN: None

Optimization of an RNA-Seq Hybrid Pipeline


RNA-Seq is an important technique that relies on next-generation sequencing to study tumor cells, in particular to identify gene and transcript expression. This type of study requires pipelines, which start with fragments of cDNA obtained from the sequencing machine and run them through a series of applications or genomics tools until results such as alignment, visualization, expression, differential expression, isoform identification, and alternative expression are obtained. Pipeline optimization is important, not only in terms of accuracy but also in terms of time to solution. It has been shown that a single splice aligner is often not sufficient to study certain regions of the transcriptome.

This project involves the development of an ensemble approach to better capture splicing. In addition, this pipeline can be described as a “hybrid” pipeline because it complements a traditional pipeline based on splice aligners with a de novo assembler. The use of a de novo assembler helps identify novel transcripts. De novo assemblers have the advantage that they do not require a reference genome. The researchers use Trinity, an RNA-Seq de novo assembler developed at the Broad Institute of Harvard and MIT. In addition to optimizing accuracy, they also consider time to solution. To this end, the researchers have started parallelizing Trinity to efficiently use multiple nodes on any standard Linux cluster.
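The project summary does not say how the ensemble combines the output of several splice aligners. One possible scheme, shown here purely as an illustration, is a vote over splice junctions: a junction is kept when enough aligners report it. The aligner names, the `Junction` tuple layout, and the `min_support` parameter are all hypothetical and do not come from the project.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical junction representation: (chromosome, intron start, intron end).
Junction = Tuple[str, int, int]


def ensemble_junctions(
    calls: Dict[str, Set[Junction]], min_support: int = 2
) -> Set[Junction]:
    """Keep junctions reported by at least `min_support` aligners."""
    # Count how many aligners reported each junction.
    support = Counter(j for junctions in calls.values() for j in junctions)
    return {j for j, n in support.items() if n >= min_support}


# Made-up calls from three placeholder aligners.
calls = {
    "aligner_a": {("chr1", 100, 200), ("chr1", 300, 400)},
    "aligner_b": {("chr1", 100, 200), ("chr2", 50, 90)},
    "aligner_c": {("chr1", 100, 200), ("chr1", 300, 400)},
}

consensus = ensemble_junctions(calls)
```

Setting `min_support` to 1 gives the union of all calls (most sensitive), while setting it to the number of aligners gives the intersection (most conservative).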


PacBio SMRT Analysis Portal

The PacBio Single Molecule Real Time (SMRT) analysis portal is an easy-to-use web-based platform for analyzing third-generation sequencing data generated on the PacBio SMRT platform. Currently, workflows for microbial whole genome assembly, resequencing analysis, transcriptome analysis, and various data processing steps are available through the portal. For more information on the analysis portal itself, see the tutorial materials. The software must be run from a browser in the MSI network. This can be achieved via connection through the NICE interface, or by...

How do I set up SSH keys?

Contents: Why Use SSH Keys; SSH Keys in Linux / Mac; SSH Keys in Windows; Optional: Multi-hop Connections (Connect to HPC Systems With One Command)

Why Use SSH Keys: When connecting through the server it may be preferable to use SSH keys. SSH keys provide a more secure form of remote...


ALLPATHS-LG is a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100bp) such as those produced by the new generation of sequencers. The significant difference between ALLPATHS and traditional assemblers such as Arachne is that ALLPATHS assemblies are not necessarily linear, but instead are presented in the form of a graph. This graph representation retains ambiguities, such as those arising from polymorphism, uncorrected read errors, and unresolved repeats, thereby providing information that has been absent from previous genome assemblies.
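To illustrate how a graph can retain ambiguity that a linear assembly would discard, here is a toy sketch. It is not ALLPATHS-LG's actual data structure or output format; the node names, edge sequences, and the `enumerate_paths` helper are invented for illustration. A single-base polymorphism appears as two parallel edges, so both alleles survive as alternative paths.

```python
from typing import Dict, List, Tuple

# Toy assembly graph: node -> list of (next node, sequence carried by the edge).
Graph = Dict[str, List[Tuple[str, str]]]


def enumerate_paths(graph: Graph, node: str, sink: str) -> List[str]:
    """Return the sequence spelled by every path from `node` to `sink`."""
    if node == sink:
        return [""]
    paths = []
    for nxt, seq in graph.get(node, []):
        for tail in enumerate_paths(graph, nxt, sink):
            paths.append(seq + tail)
    return paths


# Two parallel edges between "a" and "b" model a polymorphic site (A or G).
graph: Graph = {
    "s": [("a", "ACGT")],
    "a": [("b", "A"), ("b", "G")],
    "b": [("t", "TTCA")],
}

sequences = sorted(enumerate_paths(graph, "s", "t"))
```

A linear assembler would have to pick one of the two sequences, whereas the graph keeps both until further evidence resolves the site.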