
What are some user-friendly ways to use Second Tier Storage via S3?

Before getting started, you will need to fetch your S3 credentials. The S3 credentials act like a username and password for graphical interfaces to s3.msi.umn.edu. There are several graphical clients that support S3 and make transferring files as easy as dragging and dropping. These clients...

How can my programs interface with Second Tier Storage?

Application Programming Interface (API) support for S3 access: For advanced tasks, or when writing your own software, you may want to interact with Second Tier Storage directly through the S3 programming API. Libraries exist to do this from many programming languages. boto is a useful Python library...
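As an illustration only (the boto library mentioned above is the Python route; the AWS command-line client shown here is a different, generic S3 client not described on this page), a minimal sketch of talking to an S3-compatible endpoint from the command line. The s3.msi.umn.edu endpoint comes from the FAQ above; the HTTPS scheme and the bucket name "mybucket" are assumptions.

aws configure                                                                # enter your S3 access key and secret key when prompted
aws s3 ls --endpoint-url https://s3.msi.umn.edu                              # list your buckets on the MSI endpoint
aws s3 cp myfile.dat s3://mybucket/ --endpoint-url https://s3.msi.umn.edu    # upload a file (hypothetical bucket name)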

Data-Storage Case Study

The Active Archive Alliance has released a case study about MSI’s Spectra Logic file archive system. The case study can be found on the linked page under the "University of Minnesota" header. The Active Archive Alliance is a collaborative industry association dedicated to educating users about...

Ceph in HPC Environments at SC15

Overview: Individuals from MSI, UAB, Red Hat Inc., Intel Corp., CADRE, and MIMOS came together at SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis, on Wednesday, November 18, 2015, in Austin, TX, to share their experiences with Ceph in HPC...

Brain Imaging With Serial Optical Coherence Scanning

Abstract: 

Brain Imaging With Serial Optical Coherence Scanning

These researchers are developing an optical imaging technique called serial optical coherence scanning (SOCS) that will be used for studying brain anatomy. The research will enable a comprehensive three-dimensional reconstruction of the brain and cerebellum, and support quantitative assessments of white matter content and organization.


Group name: 
akkint

PacBio SMRT Analysis Portal

Software Support Level: 
Primary Support
Software Description: 

The PacBio Single Molecule Real Time (SMRT) analysis portal is an easy-to-use, web-based platform for analyzing third-generation sequencing data generated by the PacBio SMRT platform.  Currently, workflows for microbial whole-genome assembly, resequencing analysis, transcriptome analysis, and various data-processing steps are available through the portal.  For more information on the analysis portal itself, see http://www.pacb.com/devnet/ and the tutorial materials.  The software must be run from a browser within the MSI network.  This can be achieved by connecting through the NICE interface, or by working directly in one of the MSI laboratories.  Due to RAM limits, the portal does not run reliably on the lab queue, so execution is supported on Mesabi only.  Genomes up to 100 Mbp in size can be run successfully on Mesabi.

Software Access Level: 
Open Access
Software Categories: 
Software Interactive/GUI: 
No
General Linux Documentation: 

Instructions for SMRT Link version 3

Initial setup (only needed once unless re-installing)

  1. Get an MSI account (https://www.msi.umn.edu/content/eligibility-getting-access).
  2. Set up your NICE client (https://www.msi.umn.edu/support/faq/how-do-i-obtain-graphical-connection-using-nice-system). You will need to download the DCV client first.  You must use NICE to access the PacBio analysis portal at MSI remotely, or you can use the computer lab in Walter 575 or Cargill 138.
  3. SSH to MSI, then in the terminal type "/home/support/public/smrtlink311/install.sh" and hit return.  This will set up the PacBio portal files in your home directory under the folder name "smrtlink".
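For convenience, the one-time setup above condenses to the following terminal sketch. The login hostname is an assumption here (use whatever MSI login host you normally SSH to); the install script path is taken from step 3.

ssh USERNAME@login.msi.umn.edu                  # assumed MSI login host; replace USERNAME with your MSI username
/home/support/public/smrtlink311/install.sh     # installs the portal files into ~/smrtlink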

Running the PacBio portal (Mesabi queue for genomes < 100 Mbp)

Note: you must request a service unit (SU) allocation on Mesabi before proceeding with these instructions.

  1. Open a NICE session (a non-GPU session, with more RAM and time for larger genomes).
  2. Within the NICE session, open a terminal.
  3. In the Terminal:
    Type: "ssh -Y login" then enter your MSI account password and hit return to enter the gateway login node.
    Type: "ssh -Y mesabi" then enter your MSI account password and hit return to enter the HPC system.
    Type: "qsub -I -l nodes=1:ppn=8,walltime=24:00:00 -X" then hit return. [NOTE: the first option (-I) is a capital I and the second (-l) is a lowercase L.]
    [NOTE 2: if you have a larger genome, e.g., > ~30 Mbp, see the advanced user hints below.]
    When prompted enter your MSI account password then hit return
    Wait for job to start
    Type: "/home/support/public/smrtlink311/start.sh" then hit return.  This will start the portal server.  Copy down the URL it prints for use in the next step.
  4. In the same terminal window and interactive session:
    Type: "firefox &" then hit return
    When Firefox opens, enter the URL you copied down in the previous step into the browser address bar.  It will look something like this "http://cn0575:9090/".
  5. Note: If you receive an error message about needing to use Chrome, please follow these steps:
    • In the address bar type "about:config" and hit return.
    • In the list that appears, right-click, and select New -> String from the pop-up menu.
    • For "Enter the preference name", enter "general.useragent.override" (without quotes) and click "OK".
    • For the string value, paste "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2895.0 Safari/537.36" (without quotes) and click "OK".
  6. Do not exit the browser until the job is complete.  You can always close the NICE session and save it, reconnecting later as you please; the job will continue running in the background.
  7. When you are completely finished, be sure to clean up the session: type "/home/support/public/smrtlink311/stop.sh" then hit return, and then type "exit" to leave your interactive session.
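For quick reference, the whole v3 session above condenses to the sketch below, run from a terminal inside your NICE session (the compute node name in the printed URL will differ each time):

ssh -Y login                                       # gateway login node
ssh -Y mesabi                                      # Mesabi HPC system
qsub -I -l nodes=1:ppn=8,walltime=24:00:00 -X      # interactive job (capital I, lowercase L)
/home/support/public/smrtlink311/start.sh          # starts the portal server and prints its URL
firefox &                                          # paste the printed URL into the address bar
/home/support/public/smrtlink311/stop.sh           # when finished, shut the portal down
exit                                               # leave the interactive job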

 

Changing from local jobs to PBS

Edit `userdata/config/preset.xml` and change "False" to "True" for pbsmrtpipe.options.distributed_mode:
<!-- Enable Distributed Mode -->
<option id="pbsmrtpipe.options.distributed_mode">
    <value>True</value>
</option>
 
Edit `userdata/config/smrtlink.config` and change "NONE" to "PBS" for jmsselect__jmstype:
jmsselect__jmstype='PBS';
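If you prefer, the two edits above can be scripted. A minimal sketch, assuming the config files live under the default ~/smrtlink install location created by install.sh (back up both files before editing):

cd ~/smrtlink/userdata/config    # assumed default install location
sed -i '/pbsmrtpipe.options.distributed_mode/,/<\/option>/ s/False/True/' preset.xml
sed -i "s/jmsselect__jmstype='NONE'/jmsselect__jmstype='PBS'/" smrtlink.config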

Instructions for SMRT Link version 2

Initial setup (only needed once unless re-installing)

  1. Get an MSI account (https://www.msi.umn.edu/content/eligibility-getting-access).
  2. Set up your NICE client (https://www.msi.umn.edu/support/faq/how-do-i-obtain-graphical-connection-using-nice-system). You will need to download the DCV client first.  You must use NICE to access the PacBio analysis portal at MSI remotely, or you can use the computer lab in Walter 575 or Cargill 138.
  3. Open a NICE session. Choose one of the non-GPU options that meets your needs for time and RAM usage.
  4. Within the NICE session, open a terminal.
  5. In the Terminal:
    Type: "isub -m 8gb -w 24:00:00" then hit return
    When prompted enter your MSI account password then hit return
    Wait for job to start
    Type: "/home/support/public/smrtanalysis230v2/pacbio_user_setup_230.sh" then hit return.  This will set up the PacBio portal files in your home directory under the folder name "smrtanalysis".
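Condensed, the one-time v2 setup above is just two commands run from a terminal inside your NICE session:

isub -m 8gb -w 24:00:00                                           # interactive session: 8 GB RAM, 24 h walltime
/home/support/public/smrtanalysis230v2/pacbio_user_setup_230.sh   # installs the portal files into ~/smrtanalysis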

Running the PacBio portal (Mesabi queue for genomes < 100 Mbp)

Note: you must request a service unit (SU) allocation on Mesabi before proceeding with these instructions.

  1. Open a NICE session (a non-GPU session, with more RAM and time for larger genomes).
  2. Within the NICE session, open a terminal.
  3. In the Terminal:
    Type: "ssh -Y login" then enter your MSI account password and hit return to enter the gateway login node.
    Type: "ssh -Y mesabi" then enter your MSI account password and hit return to enter the HPC system.
    Type: "qsub -I -l nodes=1:ppn=8,walltime=24:00:00 -X" then hit return. [NOTE: the first option (-I) is a capital I and the second (-l) is a lowercase L.]
    [NOTE 2: if you have a larger genome, e.g., > ~30 Mbp, see the advanced user hints below.]
    When prompted enter your MSI account password then hit return
    Wait for job to start
    Type: "/panfs/roc/pacbio/start_user_portal.sh" then hit return.  This will start the portal server.  Copy down the administrator username/password and the URL for use in the next step.
  4. In the same terminal window and interactive session:
    Type: "firefox -no-remote &" then hit return
    When Firefox opens, enter the URL you copied down in the previous step into the browser address bar.  It will look something like this "http://cn0575:8080/smrtportal/".
    When prompted for your username and password, enter the administrator username/password you copied in the previous step.
  5. Do not exit the browser until the job is complete.  You can always close the NICE session and save it, reconnecting later as you please; the job will continue running in the background.
  6. When you are completely finished, be sure to clean up the session: type "/panfs/roc/pacbio/stop_user_portal.sh" then hit return, and then type "exit" to leave your interactive session.
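The v2 session parallels the v3 sketch above; only the script paths, the Firefox flag, and the portal URL/port differ:

ssh -Y login                                    # gateway login node
ssh -Y mesabi                                   # Mesabi HPC system
qsub -I -l nodes=1:ppn=8,walltime=24:00:00 -X   # interactive job
/panfs/roc/pacbio/start_user_portal.sh          # prints the administrator username/password and the portal URL
firefox -no-remote &                            # paste the printed URL, then log in with the administrator credentials
/panfs/roc/pacbio/stop_user_portal.sh           # when finished, shut the portal down
exit                                            # leave the interactive job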

Advanced user hints

Queue speedups

If you installed your PacBio portal prior to September 25, 2015, your portal is probably set up to use the PBS system, which tends to experience serious delays when running on Mesabi.  The portal works much better in multi-threaded mode than in cluster mode, so you will need to change a few settings in two config files.  Edit the following two files:

$HOME/smrtanalysis/install/smrtanalysis_2.3.0.140936/analysis/etc/user.smrtpipe.rc

$HOME/smrtanalysis/install/smrtanalysis_2.3.0.140936/analysis/etc/smrtpipe.rc

Change CLUSTER_MANAGER = PBS to CLUSTER_MANAGER = BASH in both of those files.
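A minimal sketch of the same edit with sed, assuming the default v2 install path shown above (back up both files first):

cd $HOME/smrtanalysis/install/smrtanalysis_2.3.0.140936/analysis/etc
sed -i 's/CLUSTER_MANAGER = PBS/CLUSTER_MANAGER = BASH/' user.smrtpipe.rc smrtpipe.rc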

You can then continue to follow the instructions above (Mesabi section).  When you run 'top', you should now see many processes running on your local node, and 'qstat -u USERNAME' should show only your single interactive batch job.

Adding more processor cores and memory for genomes > 30 Mbp

By default, we've set up the configuration files to use only 8 processor cores and 24 hours of walltime.  If you have a large genome, however, you will benefit greatly from increasing these limits, and you may need more memory.  On Mesabi, you may request up to 96 hours of walltime and 32 processor cores on the ram1t nodes (see the queue table specs here).  To take advantage of these increases, edit the two files:

$HOME/smrtanalysis/install/smrtanalysis_2.3.0.140936/analysis/etc/user.smrtpipe.rc

$HOME/smrtanalysis/install/smrtanalysis_2.3.0.140936/analysis/etc/smrtpipe.rc

Change MAX_THREADS = 8 to MAX_THREADS = 32 in both of those files (assuming 32 is the number of cores you want to use).

Change TMP = /tmp to TMP = /scratch.global/<your-user-name> to avoid overflowing /tmp for large genomes.
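As with the queue speedups above, these edits can be scripted. A sketch, again assuming the default v2 install path ($USER expands to your username; adjust 32 to the core count you request):

cd $HOME/smrtanalysis/install/smrtanalysis_2.3.0.140936/analysis/etc
sed -i 's/MAX_THREADS = 8/MAX_THREADS = 32/' user.smrtpipe.rc smrtpipe.rc
sed -i "s|TMP = /tmp|TMP = /scratch.global/$USER|" user.smrtpipe.rc smrtpipe.rc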

Then log in to Mesabi or Itasca by SSH-ing to one of those machines and request an interactive queue submission, for example: "qsub -I -l nodes=1:ppn=32,walltime=96:00:00 -q ram1t -X".  Then follow the normal instructions.

Troubleshooting errors

If you get an error in the setup or running of the PacBio server, try the steps once more.  If it still fails, try the following:

  1. Open a NICE session.
  2. Within the NICE session, open a terminal.
  3. In the Terminal:
    Type: "isub -m 8gb -w 24:00:00" then hit return
    When prompted enter your MSI account password then hit return
    Wait for job to start
    If you wish to save data from previous runs, move or make a copy of your current ~/smrtanalysis directory before proceeding to the next step (see the sketch after this list).
    Type: "/panfs/roc/pacbio/delete_user_portal.sh" then hit return.  This will delete your existing portal data and pending jobs.  Exit the current isub session by typing "exit", then retry the steps for running the portal above.
  4. If you continue to have problems, send an email to "help@msi.umn.edu", being careful to include "trouble running PacBio portal" in the subject line.
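A minimal sketch of the backup step referenced above, run before deleting the portal (the backup directory name is arbitrary):

cp -r ~/smrtanalysis ~/smrtanalysis_backup_$(date +%Y%m%d)   # timestamped copy of existing portal data
/panfs/roc/pacbio/delete_user_portal.sh                      # then delete the portal data and pending jobs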
 

Facilities Overview (Full)

Established in 1983, the Minnesota Supercomputing Institute (MSI) is the University of Minnesota's principal center for computational research. MSI provides services to over 560 active groups that sponsor more than 3,300 unique users from 19 different university colleges, maintaining an array of...

Gamma Ray Astrophysics; Zooniverse Crowdsourcing Science

Abstract: 

Gamma Ray Astrophysics; Zooniverse Crowdsourcing Science

The Fortson research group is focused on two main research areas, each of which can require MSI resources.

  • Gamma Ray Astrophysics: VERITAS is an array of four imaging atmospheric Cherenkov telescopes (IACTs), located at the F. L. Whipple Observatory in southern Arizona. The array has been detecting extraterrestrial gamma rays since 2007. In order to properly calibrate the results, large amounts of simulation and data processing are required. In addition to VERITAS, the next-generation gamma-ray experiment CTA, with a factor of 10 improvement in sensitivity over existing arrays, is finalizing development of its low-level systems. One key system is the triggering and event building stage, which collects and associates information from telescopes spread over several square kilometers.  

    The Fortson group at UMN has responsibilities for both VERITAS and CTA development. For VERITAS, they produce a large fraction of the simulations necessary for calibrating the instrument and performing analysis on the data. More processing capability allows them to explore a larger parameter space of observational conditions. Differences in atmospheric humidity and aerosol content between summer and winter require them to repeat these simulations. Simulations are also important for tracking the array's performance as the hardware is upgraded.

    For CTA, the group is developing a novel use of self-assembly algorithms to generate a self-annealing event building architecture. These algorithms are meant to better cope with the high data rate and correspondingly high failure rates. These failures include network errors, timing errors, and other hardware errors. The ability of the CTA event builder to correctly identify the information associated with a particular gamma-ray atmospheric shower is vital to the success of this large-scale project.

    Supercomputing resources are also required for running NASA Fermi LAT gamma-ray analysis. Typically this is run in several stages depending on the data products required, such as counts maps, test statistic maps, spectra, and light curves. For example, a standard binned analysis of a single gamma-ray source (using all the photons collected by the Fermi satellite to date) typically requires about 2 GB of disk storage, memory usage between 2 and 4 GB, and approximately 15 CPU hours. This example is for a Log Likelihood analysis of an object situated away from the Galactic plane, where the relative number of nearby Fermi sources is smaller and the diffuse background emission is low. For an object on or close to the Galactic plane, the same analysis could easily take 30 CPU hours, depending on the number of sources to be included in the Log Likelihood fit. Data products such as a test statistic map, which can only be generated once the standard analysis is complete, require significantly more CPU time (e.g., ~168 CPU hours) because a maximum likelihood computation is performed on each and every pixel in the requested map. Typically, computing jobs using the Fermi LAT analysis tools are submitted serially to a batch management system. The group expects to analyze several dozen Fermi LAT sources this year.

  • The Zooniverse is the world’s largest online citizen science platform, and several members of the Fortson group are involved in the development and analysis of Zooniverse project data. It is likely that the Fortson group will need to use MSI resources about two to three times during 2016 to batch process hundreds of thousands of images in preparation for their upload to the Zooniverse site.

This PI's work in translational informatics and the Zooniverse project was featured in an MSI Research Spotlight in November 2014.


Group name: 
fortson
