All Tutorials

  • Thu Nov 5, 1:00 pm - 3:00 pm
    Walter Library

    Because SNP data can now be generated fairly easily from either next-generation sequencing or resequencing arrays, comprehensive post-processing analysis has become more important.

    Helixtree is a unique and leading software application for population-based candidate gene, genome-wide association, and family-based analyses of SNP and CNV (copy number variation) associations with diseases, drug resistance, or other phenotypic properties, using SNP data.

    It also performs single-marker and haplotype analyses, calculates LD (linkage disequilibrium) and HWE (Hardy-Weinberg equilibrium), identifies tagging SNPs, and more.
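
    As an illustration of the HWE calculation mentioned above, here is a minimal Python sketch of a chi-square test for Hardy-Weinberg equilibrium at a single biallelic SNP. It is not Helixtree code: the function names and the genotype counts are hypothetical, and production software may use an exact test instead.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for HWE from observed genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic markers).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi_square_p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))
```

    For example, `hwe_chi_square(25, 50, 25)` returns `0.0`, because those counts match Hardy-Weinberg proportions exactly for an allele frequency of 0.5.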

    In this webinar, the demonstration will be shown on a shared projector screen. Please bring your own laptop so that you can follow along using your own data. This is an interactive tutorial and a good opportunity for questions and discussion.


    Data Import & Preparation

    Importing Affymetrix/Illumina WG SNP & CN Data

    Importing/preparing phenotype data & merging with genotype data

    Merging data from different genotyping platforms or arrays

    Working with annotation files

    Quality Assurance

    Filtering SNP Data

    Principal Component Analysis

    CN Batch Effect Removal

    CN Outlier Removal


    SNP Association Tests

    LD & Haplotype Analysis

    Runs of Homozygosity Analysis

    CNV Detection with Segmentation

    CNV Association Test

    Visualization of Data

    Creating QQ Plots

    Creating Manhattan Plots

    Visualizing CN Data

    Using Genome Browser

    Creating Heat Map from CNV data

    Exporting Images for Publication


    How To Share Your Analysis Project with Colleagues

  • Tue Nov 3, 1:00 pm - 3:00 pm
    Walter Library

    The SAS system started out in the 1970s as a software package for statistical analysis, but is now a diverse family of integrated products that can be used as building blocks for constructing a complete data analysis system for data sets ranging from small to very large. In this tutorial, we provide an introduction to SAS. Among other skills, we will learn to load, create, and combine data sets in different ways, generate random samples, post results on the Web, and make simple plots. We will also cover writing basic code, using common statements and functions in SAS, and using statistical procedures. This tutorial is appropriate for people who have data that need to be processed and analyzed statistically.

    This tutorial will be webcast. See for more information.

  • Thu Oct 29, 1:00 pm - 3:00 pm
    Walter Library

    The Parallel Computing Toolbox software extends the MATLAB language with high-level parallel processing constructs such as parallel for-loops, distributed arrays, parallel numerical algorithms, and message-passing functions that let you exploit data and task parallelism in your applications.

    This tutorial aims to teach MSI users how to convert serial MATLAB programs to parallel MATLAB programs and how to use the built-in functions.

    This tutorial consists of two parts:

    1) A 1-hour lecture describing the MATLAB functions

    2) An optional 1-hour, hands-on session focusing on the use of the Parallel Computing Toolbox.

  • Wed Oct 28, 1:00 pm - 3:00 pm
    Walter Library

    MaxQuant is a suite of algorithms for analysis of high-resolution mass spectrometry (Orbitrap and FT) data. It can be used for protein identification for non-labeled samples and identification and quantification for SILAC-labeled samples. MaxQuant includes all steps needed in a computational proteomic platform except that it uses the Mascot search algorithm for peptide identification.

    In brief, raw data acquired on the Orbitrap are processed using MaxQuant’s “Quant” module. The processed data are then searched with Mascot. Finally, the Mascot search output files are subjected to statistical analysis and protein grouping using the “Identify” module. In addition, information regarding Gene Ontology, Pfam domain, and TRANSFAC overrepresentation is provided.

    In the tutorial we will discuss the algorithm, the recommended experimental setup, and hardware requirements. We will also include a walk-through of how to use MaxQuant.

  • Wed Oct 28, 10:30 am - 12:00 pm
    Walter Library

    Computational modeling of proteins is a complex task, and many computational tools are available to answer many types of questions. The goal of this lecture is to give those not familiar with the tools of protein molecular mechanics an idea of what tools are available in the field and what kinds of questions they can help answer.

    This tutorial will touch on a wide array of computational techniques used to model protein structure, including energy minimization, homology modeling, Monte Carlo simulations, and various types of molecular dynamics simulations, and will discuss what kind of information can be gathered from such techniques.
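
    To give a flavor of the Monte Carlo technique named above, here is a minimal Python sketch of Metropolis sampling on a one-dimensional toy energy function. The double-well potential, the parameter values, and the function names are all hypothetical; real protein simulations use full force fields and many coordinates.

```python
import math
import random


def energy(x):
    """Toy double-well potential; a hypothetical stand-in for a force field."""
    return (x * x - 1.0) ** 2


def metropolis(n_steps, temperature=0.2, step_size=0.5, x0=0.0, seed=0):
    """Sample positions with the Metropolis acceptance rule.

    Returns the list of visited positions and the fraction of accepted moves.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Propose a small random displacement.
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps
```

    Calling `metropolis(5000)` returns the sampled positions, which spend most of their time near the two minima at x = -1 and x = 1, together with the acceptance ratio.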

  • Tue Oct 27, 1:00 pm - 3:00 pm
    Walter Library

    EnSight is a post-processing package for scientific and engineering data. EnSight provides a set of tools to help with many types of analysis, visualization, and communication. With EnSight you can create contours, isosurfaces, particle traces, vector arrows, elevated surfaces, profile plots, and much more. EnSight also supports animation and VR.

    Dave Baumgartner, Senior Software Developer from CEI, makers of EnSight, will be presenting the EnSight tutorial. There will be an opportunity for you to speak directly with Dave about your research projects. If you would like to see a particular topic included in the tutorial or would like to schedule time to speak with Dave, contact Nancy Rowe at

  • Thu Oct 22, 1:00 pm - 3:00 pm
    Walter Library

    Understanding the regulatory mechanisms of gene expression is a major goal of genomics. Trans- and cis-regulatory elements are regions of DNA or RNA that regulate the expression of genes. In addition to experimental discovery, bioinformaticians have developed various algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and interaction data. MSI has several such tools and databases for trans/cis-regulatory element analysis.

    This tutorial will introduce tools for trans/cis-regulatory element analysis, including TRANSFAC, MEME, HMMER, and BioProspector, and will demonstrate their use with sample sequence data.

  • Wed Oct 21, 1:00 pm - 3:00 pm
    Walter Library

    Statistics plays a key role in many scientific research projects. Increasingly, the R statistical platform is being used to perform such analysis. R, an open source version of the S statistical language developed by Bell Laboratories in the 1980s, together with its commercial offshoot, S-PLUS, are powerful computing environments and languages for statistical computing and graphics.

    More specifically, these software packages provide capabilities for reading a wide variety of data formats, performing simple and sophisticated data manipulations, applying statistical tests and computations, and graphing the data and the results of the statistical analysis.

    This tutorial will demonstrate how to log in to MSI’s computers to get started with R and S-PLUS and how to read and manipulate data in the R and S-PLUS environments. In addition, attendees will learn to perform basic statistical analysis in R and to produce graphics. Both R and S-PLUS run on a wide variety of UNIX platforms, Windows, and Mac OS. They are available on the computers at MSI, including the supercomputers, providing an ideal platform for R or S-PLUS applications with long run times and/or large memory requirements.

  • Tue Oct 20, 1:00 pm - 3:00 pm
    Walter Library

    Python is a general-purpose programming language with a rich syntax and structure. It has many modules specializing in topics of interest to the scientific community, such as mathematical and textual processing. Further, it serves as an interface for a huge number of scientific applications that have modules written in Python, are developed entirely in Python, or are extended and controlled with Python. This tutorial is divided into two parts. The first part, an introduction to programming with Python, covers control structures, data types, functions, mathematical and logical operators, and program input and output. The second part addresses several special topics, such as issuing commands to the UNIX shell, reading and writing spreadsheet files, creating XML documents, and performing mathematical operations like a Pythonista.
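
    The special topics listed above can be sketched with the Python standard library alone. The example below is a minimal illustration, not the tutorial's own material: the helper names and the sample field names are hypothetical, and CSV is assumed as the spreadsheet format.

```python
import csv
import io
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_command(args):
    """Run an external command and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def rows_to_csv(rows, fieldnames):
    """Write a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Read CSV text back into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_to_xml(rows, root_tag="samples", row_tag="sample"):
    """Build a small XML document with one element per row."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

    On a UNIX system, `run_command(["ls", "-l"])` would return the directory listing as a string. To work with files on disk instead of in-memory text, pass an open file object to `csv.DictWriter` or `csv.DictReader` in place of the `io.StringIO` buffer.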

    This tutorial will be webcast. See for more information.

  • Tue Oct 13, 10:00 am - 4:00 pm
    Walter Library

    This one-day, hands-on workshop will introduce how to write a parallel program using MPI and will help researchers write better, more portable parallel code for distributed-memory Linux clusters. The tutorial will focus on basic point-to-point communication and collective communication, which are the most commonly used MPI routines in high-performance scientific computation. In addition, the advantages of MPI non-blocking communication will be introduced. Each session of the workshop will combine a lecture with hands-on practice. The lecture will introduce basic principles, and the hands-on portion will apply those principles through examples.

    Session One: Introduction to basic concepts of MPI, centering on point-to-point communication.

    Session Two: MPI collective communications including broadcast, gather, scatter, and All-to-All.

    Programming will be done in Fortran and C, so any background in these two languages will be helpful.
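
    The relationship between the two sessions can be sketched in a few lines of Python. The workshop itself uses Fortran and C with a real MPI library; the example below is only an in-process model, assuming threads in place of MPI ranks and queues in place of the network. `ToyComm` and the helper functions are hypothetical names, not part of any MPI implementation. It shows scatter and gather built from blocking point-to-point send and receive operations.

```python
import queue
import threading


class ToyComm:
    """Hypothetical in-process stand-in for an MPI communicator (not MPI)."""

    def __init__(self, size):
        self.size = size
        # One mailbox per (source, dest) pair so senders never mix messages.
        self._boxes = {(s, d): queue.Queue()
                       for s in range(size) for d in range(size)}

    def send(self, obj, source, dest):
        self._boxes[(source, dest)].put(obj)

    def recv(self, source, dest):
        # Blocks until a message from `source` arrives.
        return self._boxes[(source, dest)].get()


def scatter(comm, rank, chunks, root=0):
    """Root sends one chunk to every other rank; each rank returns its chunk."""
    if rank == root:
        for dest in range(comm.size):
            if dest != root:
                comm.send(chunks[dest], root, dest)
        return chunks[root]
    return comm.recv(root, rank)


def gather(comm, rank, value, root=0):
    """Every rank sends its value to root; root returns them in rank order."""
    if rank == root:
        return [value if src == root else comm.recv(src, root)
                for src in range(comm.size)]
    comm.send(value, rank, root)
    return None


def run_ranks(size, data):
    """Scatter `data` over `size` toy ranks, sum locally, gather the sums."""
    comm = ToyComm(size)
    results = [None] * size

    def worker(rank):
        # Only the root rank holds the full data set before the scatter.
        chunks = [data[r::size] for r in range(size)] if rank == 0 else None
        local = scatter(comm, rank, chunks)
        results[rank] = gather(comm, rank, sum(local))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    In a real MPI program each rank is a separate process, and the corresponding C routines are `MPI_Send`, `MPI_Recv`, `MPI_Scatter`, and `MPI_Gather`. As in MPI, only the root rank receives the gathered list here; the other ranks get `None`.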