
Pollution and Race

Posted on June 4, 2014. A study headed by MSI PI Professor Julian Marshall (Civil Engineering; Institute on the Environment) has been receiving a great deal of recent media attention. The study analyzes pollution distribution across the U.S. and shows that people of color are exposed to higher...

Choosing a Job Queue

Summary: Most MSI systems use job queues to efficiently and fairly manage when computations are executed. A job queue is an automated waiting list for use of a particular set of computational hardware. When computational jobs are submitted to a job queue, they wait in line until the...

SGA

Software Description: 

From the SGA GitHub repository:

Overview
 
SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient.
 
An SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. Example real-data assemblies can be found here.
 
Error Correction
 
The first stage of the assembly. An FM-index of the sequence reads is constructed, then base calling errors are identified by finding low-frequency k-mers in the reads. The output from the error corrector is a set of FASTQ files containing the corrected read sequences.
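
As a toy illustration of the low-frequency k-mer idea (not SGA's FM-index-based implementation), the sketch below counts every k-mer in a handful of reads with awk and lists those seen fewer than a chosen number of times. The function name low_freq_kmers, the value of k, the threshold, and the three reads are all assumptions made for this example.

```shell
# Toy sketch: flag k-mers that occur fewer than MIN times across all reads.
# low_freq_kmers is a hypothetical helper, not part of SGA.
low_freq_kmers() {
    # $1 = k-mer length, $2 = minimum count for a k-mer to be trusted
    # Reads arrive on stdin, one sequence per line.
    awk -v k="$1" -v min="$2" '
        {
            # Slide a window of length k across the read and count each k-mer.
            for (i = 1; i + k - 1 <= length($0); i++)
                count[substr($0, i, k)]++
        }
        END {
            # Report k-mers below the threshold with their counts.
            for (kmer in count)
                if (count[kmer] < min)
                    print kmer, count[kmer]
        }
    ' | sort
}

# Two identical reads plus one read whose last base differs.
printf '%s\n' ACGTACGT ACGTACGT ACGTACGA | low_freq_kmers 4 2
# prints: ACGA 1
```

The only k-mer below the threshold covers the mismatched final base, which is the kind of signal an error corrector would use to locate a likely base-calling error.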
 
Contig Assembly
 
An FM-index of the corrected sequence reads is constructed. Duplicate reads, and low-quality reads after correction, are found and discarded with the sga filter subprogram. For large genomes, the sga fm-merge program can be used to merge together reads that can be unambiguously assembled. sga overlap computes the structure of the string graph and contigs are built using sga assemble.
 
Scaffolding
 
The scaffolding module of sga begins by re-aligning reads to the contigs built in the previous step. The copy number of each contig, and distances between contigs, are estimated from the resulting BAM files and used as input to sga scaffold. The output of sga scaffold is passed to sga scaffold2fasta which produces a FASTA file of the resulting scaffold sequences.
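
The first two phases described above can be sketched as a dry-run script that prints the sga commands in order rather than executing them. The input file name, thread count, and intermediate file names are assumptions modeled on the naming used in the SGA example scripts; real runs typically need additional steps and tuned options, so check `sga <subcommand> --help` and the example scripts before relying on any of this. Scaffolding is left out because it depends on realigning reads and on the resulting BAM files.

```shell
# Dry-run sketch of the error-correction and contig-assembly phases.
# All file names and option values below are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = execute them
READS="${READS:-reads.fastq}"    # hypothetical input reads
THREADS="${THREADS:-4}"          # hypothetical thread count

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

sga_pipeline() {
    # Phase 1: index the raw reads, then correct base-calling errors.
    run sga index -t "$THREADS" "$READS"
    run sga correct -t "$THREADS" -o reads.ec.fastq "$READS"

    # Phase 2: index the corrected reads, discard duplicate and
    # low-quality reads, compute the string graph, and build contigs.
    # (For large genomes, sga fm-merge can be run before sga overlap.)
    run sga index -t "$THREADS" reads.ec.fastq
    run sga filter -t "$THREADS" reads.ec.fastq
    run sga overlap -t "$THREADS" reads.ec.filter.pass.fa
    run sga assemble -o assembly reads.ec.filter.pass.asqg.gz
}

sga_pipeline
```

Setting DRY_RUN=0 would execute the commands, which only makes sense after the sga module is loaded and the file names match your data.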
Software Support Level: 
Secondary Support
Software Access Level: 
Open Access
Software Categories: 
Software Interactive/GUI: 
No
General Linux Documentation: 

SGA is available via the modules system

module load sga

The source directory contains examples of real assemblies using SGA. You should read these scripts or (better) download the data for one of the smaller genomes (I recommend the C. elegans data set) and run the example yourself. This will help you understand the SGA pipeline so you can run the assembler effectively on your own data.

To access the example files, run:

cd $SGA_EXAMPLES
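
If the cd command fails, the variable may simply be unset because the module has not been loaded in the current shell. The sketch below assumes that `module load sga` is what defines SGA_EXAMPLES, as the command above implies; the helper name sga_examples_dir is hypothetical.

```shell
# Print the SGA examples directory, or explain why it is unavailable.
# sga_examples_dir is a hypothetical helper, not something MSI provides.
sga_examples_dir() {
    if [ -z "${SGA_EXAMPLES:-}" ]; then
        echo "SGA_EXAMPLES is not set; run 'module load sga' first" >&2
        return 1
    fi
    if [ ! -d "$SGA_EXAMPLES" ]; then
        echo "SGA_EXAMPLES does not point to a directory: $SGA_EXAMPLES" >&2
        return 1
    fi
    printf '%s\n' "$SGA_EXAMPLES"
}

# List the example scripts only when the directory is available.
if dir="$(sga_examples_dir)"; then
    ls "$dir"
fi
```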

ddt

Software Support Level: 
Secondary Support
Software Interactive/GUI: 
No
General Linux Documentation: 

To load this software in a Linux environment run the command(s):

module load ddt

fslview

Software Support Level: 
Secondary Support
Software Interactive/GUI: 
No
General Linux Documentation: 

To load this software in a Linux environment run the command(s):

module load fslview

Development of High-Performance Methods for Spanning Multiple Length and Time Scales

Abstract: 
<h3 class="red">Development of High-Performance Methods for Spanning Multiple Length and Time Scales</h3><p>This project focuses on the development and application of high-performance methods for spanning multiple length and time scales in atomistic simulations. Efforts will proceed in several directions:</p><ul><li>Development of a high-performance 3D implementation of the spatial multiscale Quasi-Continuum (QC) method, which greatly reduces the computational cost of atomistic simulations by retaining atomistic resolution only where necessary and using a continuum approximation elsewhere. MSI resources are used to test different parallelization strategies and to perform QC production runs in a project related to the fracture of silicon MEMS devices.</li><li>Study of the fracture of single-crystal and polycrystalline silicon samples. This includes both practical aspects of the fracture of fabricated silicon devices, such as MEMS devices, and elucidation of the fundamental physics of dynamic fracture. Studies will include both molecular dynamics (MD) simulations and the QC3D simulations noted above.</li><li>Development of a method within the&nbsp;<a href="https://openkim.org">Knowledgebase of Interatomic Models (KIM) project</a>&nbsp;for assessing the transferability of interatomic potentials used in atomistic and multiscale simulations by comparing their predictions to density functional theory (DFT) calculations. MSI resources are used to perform DFT calculations to obtain high-quality reference data.</li><li>Development of MD simulations of interpenetration at polymer interfaces to better understand the role of interface structure in polymer adhesion. 
Both all-atom and coarse-grained (multiscale) simulations will be performed.</li></ul><p>A <a href="https://www.msi.umn.edu/content/simulating-novel-properties-nanomaterials">Research Spotlight</a> featuring the group&#39;s work appeared on the MSI website in July 2014.</p><p>Return to this PI&rsquo;s <a href="https://www.msi.umn.edu/pi/a2d0136dedbd43ca6059dcf0b8161f22/18300">main page</a>.</p>
Group name: 
luskin

MN Congenital Heart Network

Abstract: 
<h4>MN Congenital Heart Network</h4><p>The Minnesota Congenital Heart Network (MCHN) is dedicated to improving outcomes for children with congenital heart disease (CHD). To facilitate the sharing of clinical research data, the MCHN developed a federated system that allows each site to capture and control data locally while enabling partner sites to query the data through a common interface. This system will facilitate multicenter cohort identification and evaluation of treatment strategies, with the goal of improving outcomes and quality of life for children with CHD. Access to MSI systems is required to coordinate data exchange and to use shared code on MSI SVN servers.</p>
Group name: 
kocherjp

Statistical Models for Dependent High-Dimensional Data

Abstract: 
<h3 class="red">Statistical Models for Dependent High-Dimensional Data</h3><p>These researchers are involved in several projects using MSI.</p><ul><li><strong>Statistical Methods for Spatial Data:</strong> This research on spatial models has focused on regression inference for areal, i.e., spatially aggregated, data. Areal data are common in many fields, including forestry, marketing, epidemiology, image analysis, and ecology. Since investigators in these fields are often interested in scientific explanations rather than, or in addition to, predictions, spatial regression is important.</li><li><strong>Spatiotemporal Inference for fMRI Data:</strong> Typical fMRI experiments generate large datasets that exhibit complex spatial and temporal dependence. Fitting a full statistical model to such data can be so computationally burdensome that many practitioners resort to fitting oversimplified models, which can lead to lower-quality inference. These researchers have developed a full statistical model that permits efficient computation.</li><li><strong>Joint Models for Longitudinal Data:</strong> The researchers have developed semiparametric and nonparametric joint models for multidimensional longitudinal outcomes. Although they focus on revealing time-varying dependence relationships, the frameworks accommodate all manner of time-varying parameters for the coordinate processes: regression coefficients, variances, etc. These methods will allow researchers to reveal complex dynamic patterns of dependence and response&ndash;predictor relationships.</li><li><strong>Piecewise Growth Mixture Models:</strong> This project focused on piecewise growth mixture models (PGMM), a special case of the finite mixture of multinormals. 
The researchers investigated Bayesian inference for PGMMs and maximum likelihood inference by expectation maximization.</li><li><strong>Bayesian Inference for Gaussian Copula Regression Models:</strong> Gaussian copula regression models (GCRM) provide a flexible, intuitive framework for modeling high-dimensional dependent outcomes. When such outcomes are discrete, the likelihood is computationally intractable because the running time grows exponentially in the sample size. These researchers developed three computationally feasible approaches to Bayesian inference for GCRMs with discrete outcomes.</li></ul><p>Return to this PI&#39;s <a href="https://www.msi.umn.edu/pi/9f0581be2107764b9707ab4e125b2f81/10556">main page</a>.</p>
Group name: 
hughesj
