Checkpointing

Abstract

Checkpointing HPC applications, that is, saving the state of a long-running application so it can be restarted later, has been a challenging but highly desired capability. It hedges against unexpected events that would otherwise cause the application to fail prematurely and lose its work.
 

Checkpointing is currently available on Itasca.

        Log in to Itasca:

              ssh username@itasca.msi.umn.edu

Checkpoint serial jobs

Compile your job

       module load intel

       icc -o my_test my_app.c -lcr     # -lcr links against the BLCR checkpoint/restart library

or

       ifort -o my_test  my_app.f -lcr  
 

Run your job

      qsub -I -l nodes=1:ppn=8,mem=10gb,walltime=2:00:00

      cd $wrk      # cd to the directory containing your job files

      cr_run ./my_test &

Find the job PID

   PS=`ps -u $USER | grep my_test`            # szhang in the original example; $USER works for any account
   PID=`echo "${PS:0:6}" | sed 's/ //g' `     # strip spaces to get the numeric PID
   cr_checkpoint --signal=2 --term $PID       # checkpoint the process and signal it to stop
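
Alternatively, the PID can be captured when the job is started (in the "Run your job" step above), which avoids parsing the ps output. This is a small sketch of that approach, not part of the original instructions, using only standard shell features:

      cr_run ./my_test &                       # start the application under BLCR control
      PID=$!                                   # $! is the PID of the most recent background command
      cr_checkpoint --signal=2 --term $PID     # checkpoint and stop the job, as above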

To verify that the checkpoint succeeded:

    tail  context.$PID

To checkpoint again and terminate the job 

   cr_checkpoint  --term $PID

To restart the job from the state saved by the last checkpoint:

  cr_restart context.$PID
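
For reference, the interactive steps above can also be collected into a single batch script. The following is a minimal sketch, not taken from the original instructions; it assumes the my_test binary built above and that the BLCR tools are available in the batch environment:

      #!/bin/bash
      #PBS -l nodes=1:ppn=8,mem=10gb,walltime=2:00:00

      cd $PBS_O_WORKDIR                 # directory from which the job was submitted
      module load intel

      cr_run ./my_test &                # run the application under BLCR control
      PID=$!

      sleep 3600                        # let the application run for an hour
      cr_checkpoint --term $PID         # save its state to context.$PID and stop it

A later job can then resume the run with cr_restart context.$PID.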

 

Checkpoint OpenMP  jobs

Compile your job

       module load intel

       icc  -o my_test -openmp  my_app.c  -lcr

or

       ifort -o my_test   -openmp my_app.f -lcr  
 

Run your job

      qsub -I -l nodes=1:ppn=8,mem=10gb,walltime=2:00:00

      cd $wrk      # cd to the directory containing your job files

      export OMP_NUM_THREADS=4

      cr_run ./my_test &

Find the job PID and checkpoint the job

   PS=`ps -u $USER | grep my_test`            # szhang in the original example; $USER works for any account
   PID=`echo "${PS:0:6}" | sed 's/ //g' `     # strip spaces to get the numeric PID
   cr_checkpoint --signal=2 --term $PID       # checkpoint the process and signal it to stop

To verify that the checkpoint succeeded:

    tail  context.$PID

To checkpoint again and terminate the job 

   cr_checkpoint  --term $PID

To restart the job from the state saved by the last checkpoint:

  export OMP_NUM_THREADS=4

 cr_restart context.$PID

 

Checkpoint MPI  jobs

Compile your job

       module load intel ompi/1.6.3-blcr/intel

       mpicc  -o my_test   my_app.c 

or

       mpif77 -o my_test    my_app.f
 

Where to store the checkpoint files

Please create a directory named .openmpi in your home directory and, under it, a file named mca-params.conf. The mca-params.conf file should contain the paths to the directories where you want the checkpoint files to be stored. Here is an example:

     cat /home/support/szhang/.openmpi/mca-params.conf
     snapc_base_global_snapshot_dir=/lustre/cr_files
     crs_base_snapshot_dir=/lustre/cr_files/local
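
One way to create this file from the command line is sketched below. The /lustre/cr_files path is only the example used above; replace it with a directory that you own and can write to:

      mkdir -p ~/.openmpi /lustre/cr_files/local
      echo "snapc_base_global_snapshot_dir=/lustre/cr_files"  >  ~/.openmpi/mca-params.conf
      echo "crs_base_snapshot_dir=/lustre/cr_files/local"     >> ~/.openmpi/mca-params.conf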

Run your job

      qsub -I -l nodes=4:ppn=8,mem=10gb,walltime=2:00:00

      cd $wrk      # cd to the directory containing your job files

      mpirun  -am ft-enable-cr -np 32 ./my_test &

Find the job PID and checkpoint it:

      pid=`ps -u $USER | grep mpirun`            # szhang in the original example; $USER works for any account
      jid=`echo "${pid:0:6}" | sed 's/ //g' `    # strip spaces to get the numeric PID of mpirun
      export jid
      ompi-checkpoint $jid                       # take a checkpoint of the running MPI job

To verify that the checkpoint succeeded:

    ls -al /lustre/cr_files | grep $jid

To checkpoint again and terminate the job 

   ompi-checkpoint --term $jid

To restart the job from the state saved by the last checkpoint:

cd  /lustre/cr_files/

 ompi-restart  /lustre/cr_files/ompi_global_snapshot_$jid.ckpt/
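
As with the serial case, these steps can be scripted. The following is a minimal sketch, not taken from the original instructions; it assumes the mca-params.conf settings above and the my_test binary built with the BLCR-enabled Open MPI:

      #!/bin/bash
      #PBS -l nodes=4:ppn=8,mem=10gb,walltime=2:00:00

      cd $PBS_O_WORKDIR
      module load intel ompi/1.6.3-blcr/intel

      mpirun -am ft-enable-cr -np 32 ./my_test &   # run with checkpoint/restart support enabled
      jid=$!                                       # PID of mpirun

      sleep 3600                                   # let the job run for an hour
      ompi-checkpoint --term $jid                  # write a global snapshot and stop the job

A later job can resume with ompi-restart, pointing at the ompi_global_snapshot_<PID>.ckpt directory created in the snapshot location.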

 

 

Use of Vtune for performance optimization

Outline

Introduction

Use of Vtune on different MSI systems

Interactive profiling

Command-line options

Profiling MPI applications

Introduction

Intel® VTune™ Amplifier XE 2013 is the premier performance profiler for C, C++, C#, Fortran, Assembly, and Java*. It is available on all MSI Linux systems for users to evaluate the performance of their applications (identify and remove the hotspots). The objective is to enable all applications to run efficiently on any MSI system. Experienced users can, of course, explore each of the performance metrics embedded in VTune in depth. The performance evaluation process itself can be very beneficial for users who want to learn about and understand the cutting-edge technologies available in the HPC world.

Use of Vtune on different systems

The vtune module is available on all systems. Applications can be profiled not only through the graphical interface, amplxe-gui, but also through the command-line interface, amplxe-cl. The former fits the need for short interactive profiling sessions, while the latter is useful for collecting information at run time. Users who need interactive profiling should see the Find_hotspots section below for details.

Table 1: Profiling metrics associated with the micro-architecture of the different systems

Itasca - Nehalem processor:
    General Exploration; Read Bandwidth; Write Bandwidth; Memory Access; Cycles and Ops; Front End Investigation.

Itasca - Sandy Bridge processor:
    General Exploration; Memory Bandwidth; Access Contention; Branch Analysis; Client Analysis; Core Port Saturation; Cycles and Ops.

Cascade - Knights Corner (Phi) processor:
    Lightweight Hotspots; Memory Bandwidth; General Exploration.

Cascade - Core i7 980x processor:
    Lightweight Hotspots; Hotspots; Concurrency; Locks and Waits.

Lab Linux workstations:
    Lightweight Hotspots; Hotspots; Concurrency; Locks and Waits.

Command-line options

The command-line interface amplxe-cl provides a convenient way to profile a real application. Users need to load the vtune module and specify the analysis type of interest. The basic format is:

       module load vtune

       amplxe-cl -collect $analysis_type -result-dir $yourprof_dir -- myApplication

        where $analysis_type is the option chosen for analyzing performance on a given sub-system processor (see Table 2 for the analysis types supported on the different platforms); $yourprof_dir is the directory in which the profiling information will be saved; and myApplication is the program that you want to profile. After the job finishes, you can view the profiling results with either the graphical interface:

       amplxe-gui $yourprof_dir

or the command-line interface:

      amplxe-cl -report  $report_type  -result-dir  $yourprof_dir

        where the $report_type should match the selected $analysis_type 
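
For example, a hotspots collection and report for a serial binary (here a hypothetical ./my_test; the result directory name is likewise illustrative) might look like:

       module load vtune
       amplxe-cl -collect hotspots -result-dir ./prof_hotspots -- ./my_test
       amplxe-cl -report hotspots -result-dir ./prof_hotspots

The same result directory can also be opened afterwards with amplxe-gui ./prof_hotspots.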

 

Table 2: Available analysis types for the different micro-architectures

General:
    concurrency
    frequency
    hotspots
    locksandwaits
    sleep

Sandy Bridge processor:
    snb-access-contention
    snb-bandwidth
    snb-branch-analysis
    snb-client
    snb-core-port-saturation
    snb-cycles-uops
    snb-general-exploration
    snb-memory-access
    snb-port-saturation

Phi processor:
    knc-bandwidth
    knc-general-exploration
    knc-lightweight-hotspots

Nehalem/Westmere processor:
    nehalem-cycles-uops
    nehalem-frontend-investigation
    nehalem-general-exploration
    nehalem-memory-access

Please note that the general analysis types in Table 2 apply to every platform on which you want to use VTune. Details about an analysis type of particular interest can be found with:

         amplxe-cl --help $analysis_type

For example,

         amplxe-cl --help concurrency

 

Find_hotspots

Fortran Applications

C and C++ applications
 

Profiling MPI applications

MPI jobs can be analyzed by using VTune on top of the Intel MPI implementation. Here are the simplified commands for profiling MPI jobs:

module load intel impi vtune

mpirun -r ssh -f $PBS_NODEFILE -np 256 amplxe-cl -collect $analysis_type -result-dir $yourprof_dir ./test > run.out 

After the job runs successfully, one can view the profiling results with either the graphical or the command-line interface.

Comprehensive information can be found from the software document - Analyzing MPI applications.
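
A minimal batch-script sketch of the above (the node count, analysis type, and result directory are illustrative, not prescribed by this page):

      #!/bin/bash
      #PBS -l nodes=32:ppn=8,walltime=2:00:00

      cd $PBS_O_WORKDIR
      module load intel impi vtune

      # run amplxe-cl under mpirun so that the MPI ranks are profiled
      mpirun -r ssh -f $PBS_NODEFILE -np 256 \
          amplxe-cl -collect hotspots -result-dir ./prof_mpi ./test > run.out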

 
