Lecture Slides

Interacting with dbGaP data on Stratus

MSI has deployed a cloud service for research computing called Stratus. In its initial iteration, Stratus is designed expressly to satisfy the requirements set forth by the NIH Genomic Data Sharing (GDS) Policy for data from the Database of Genotypes and Phenotypes (i.e., dbGaP data). This tutorial introduces Stratus to users who wish to process dbGaP data at MSI, and gives them an interactive lesson on how to access the service, deploy their first virtual machines, and move data through multiple tiers of storage.

Analyzing ChIP-Seq Data using Galaxy

This practical, hands-on tutorial is designed to give participants experience with ChIP-Seq data analysis using the Galaxy platform. The analysis in this tutorial is typical of experiments that use ChIP-Seq data to identify transcription factor binding sites in eukaryotic species with high-quality genomes.

Digital Gene Expression (DGE) Analysis Using Galaxy, Human Data

This is a practical, hands-on tutorial designed to give participants experience with RNA-Seq data analysis using TopHat, Cufflinks, and CummeRbund in Galaxy. The analysis in this tutorial is typical of experiments in eukaryotic species with high-quality genomes and genome annotation available. Participants are expected to be familiar with next-generation sequence data, the basic theory of RNA-Seq, and Galaxy. Participants do not need previous experience with TopHat, Cufflinks, or CummeRbund.

Basics of RNA-Seq Data Analysis - Lecture

This lecture will cover the basics of RNA-Seq experimental design and data quality assessment, followed by an overview of data analysis for the detection of differentially expressed genes. Specific subtopics include:

Analysis of PacBio Sequencing Data Using SMRT Portal

This hands-on tutorial will cover installation and use of SMRT Portal at MSI to analyze PacBio sequencing data. The basics of full genome assembly and transcript assembly will be covered. At the end of this tutorial, participants should be able to:

PacBio Sequencing - Lecture

This lecture will cover the special capabilities and use cases of PacBio sequencing as well as the basics of data analysis. Specific subtopics include:

  • Technology overview (physical basis of sequencing, pros and cons compared with other sequencing technologies)
  • De novo assembly applications (N50 and other assembly concepts, HGAP algorithm, diploid assembly)
  • IsoSeq transcriptome assessment (motivation, experimental procedure, biological applications, analysis approaches)
  • Visualization of PacBio data with new IGV features
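
As a small illustration of the N50 concept mentioned above, the following is a minimal sketch, not material from the lecture itself. It assumes the common definition: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name n50 and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (assumed definition above)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to half the total.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths: total is 290, and 80 + 70 = 150 covers half.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

A larger N50 generally indicates a more contiguous assembly, although the number says nothing about whether the contigs are correct.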

Python for Scientific Computing

Python is a modern general-purpose programming language that is popular in scientific computing for its readable syntax and extremely rich ecosystem of scientific and mathematical modules. The morning section will provide an introduction to some widely used packages, including common idioms for manipulating and visualizing data. The afternoon section will cover advanced modules and techniques relevant to high-performance computing.

Analyze ChIP-Seq Data at the Command Line

This tutorial is paired with "Analyzing ChIP-Seq Data using Galaxy" and will take the user through the same steps, but using the command-line versions of the tools used in the Galaxy environment. This tutorial will:

1. Provide a brief introduction to MSI systems.

2. Provide a very brief introduction to UNIX.

3. Take users step-by-step through the process needed to analyze ChIP-Seq data.

4. Provide users with a basic PBS script to automate the mapping and peak calling.

5. Teach users how to edit and run the script so that they can reuse it in future analyses.

Data Storage Systems and Data Analysis Workflows for Research

In this tutorial you will learn about the data storage systems available for academic research at the University of Minnesota. An overview of the kinds of storage systems that are available, policies for getting access to them, a comparison of their characteristics, and examples of how they can be accessed will be presented. You will also be given an overview of how the characteristics of UMN storage will impact the stability and throughput of various applications and workflows.


This tutorial provides an introduction to writing parallel programs using OpenMP, and will help researchers write better and more portable parallel codes for shared memory Linux nodes. The course will cover the Compiler Directives (44), Runtime Library Routines (35), and Environment Variables (13) relevant to OpenMP. OpenMP is available for C/C++ and Fortran. Examples of how to enable OpenMP with the Intel, GNU, and PGI compilers will be given. The fork-join model of thread parallel execution will be described.
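
The fork-join pattern can be sketched in a few lines. The example below is an assumption-laden illustration, not OpenMP and not course material: it uses Python's standard concurrent.futures module to show only the control flow, in which a main thread forks a team of workers, each worker handles one chunk, and the main thread joins them and combines the partial results. The function names and the chunking scheme are hypothetical. In standard CPython builds, threads do not speed up CPU-bound work like this, so the sketch conveys structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one worker: sum of squares over its own chunk."""
    return sum(x * x for x in chunk)


def fork_join_sum_of_squares(values, workers=4):
    """Split values into chunks, process them in a thread team, then combine."""
    # Ceiling division so that at most `workers` chunks are produced.
    size = max(1, -(-len(values) // workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Fork: hand each chunk to the pool. Join: collecting the mapped results
    # and leaving the `with` block both wait for every worker to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Back on the main thread only: reduce the partial results.
    return sum(partials)


print(fork_join_sum_of_squares(list(range(10))))  # 285
```

In OpenMP the same shape is expressed with compiler directives rather than an explicit pool, and the runtime manages the thread team.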