Use of VTune for performance optimization

Outline

Introduction

Use of VTune on different MSI systems

Interactive profiling

Command-line options

Profiling MPI applications

Introduction

Intel® VTune™ Amplifier XE 2013 is the premier performance profiler for C, C++, C#, Fortran, Assembly and Java. It is available on all MSI Linux systems so that users can evaluate the performance of their applications and identify and remove hotspots. The objective is to enable all applications to run efficiently on any MSI system. Experienced users can explore each of the performance metrics provided by VTune in depth. The performance evaluation process itself can also help users learn and understand the cutting-edge technologies available in the HPC world.

Use of VTune on different systems

The vtune module is available on all systems. Applications can be profiled not only through the graphical interface, amplxe-gui, but also through the command-line interface, amplxe-cl. The former suits short interactive profiling sessions, while the latter is useful for collecting information at run time. Users who need interactive profiling should go to the Find Hotspots section for details.

Table 1: Profiling metrics associated with the micro-architecture on different systems

System Name                                  Sub-system specific features

Itasca - Nehalem processor                   General Exploration; Read Bandwidth; Write Bandwidth; Memory Access; Cycles and uOps; Front End Investigation
Itasca - Sandy Bridge processor              General Exploration; Memory Bandwidth; Access Contention; Branch Analysis; Client Analysis; Core Port Saturation; Cycles and uOps
Cascade - Knights Corner (Phi) processor     Lightweight Hotspots; Memory Bandwidth; General Exploration
Cascade - Core i7 980x processor             Lightweight Hotspots; Hotspots; Concurrency; Locks and Waits
Calhoun - Dual-Core 5100 series processor    Lightweight Hotspots; Hotspots; Concurrency; Locks and Waits
Lab Linux workstations                       Lightweight Hotspots; Hotspots; Concurrency; Locks and Waits

Command-line options

The command-line interface amplxe-cl makes it convenient to profile a real application. Users need to load the vtune module and specify the analysis type of interest. The basic format is:

       module load vtune

       amplxe-cl -collect $analysis_type -result-dir $yourprof_dir -- myApplication

where $analysis_type is the analysis option chosen for the processor of the sub-system in use (see Table 2 for the analysis types supported on different platforms); $yourprof_dir is the directory in which the profiling information is saved; and myApplication is the program that you want to profile. After the job finishes, you can view the profiling results with either the graphical interface:

       amplxe-gui $yourprof_dir

or the command-line interface:

      amplxe-cl -report  $report_type  -result-dir  $yourprof_dir

where $report_type should match the selected $analysis_type.
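
The collect-then-report sequence can be wrapped in a small shell helper. The following is only a minimal sketch, not a script provided by MSI or Intel: the function names vtune_collect_cmd and vtune_report_cmd and the result directory r_hotspots are hypothetical, and only the amplxe-cl options already shown in this section are used. The functions print the command lines instead of running them, so they can be inspected before use.

```shell
#!/bin/sh
# Sketch: build the amplxe-cl command lines described in this section.
# The function names are illustrative and are not part of VTune.

# Print the collection command for an analysis type, a result directory,
# and a program with its arguments.
vtune_collect_cmd() {
    analysis_type=$1
    prof_dir=$2
    shift 2
    # "$*" is the application and its arguments, placed after the "--" separator.
    printf '%s\n' "amplxe-cl -collect $analysis_type -result-dir $prof_dir -- $*"
}

# Print the report command for a finished collection.
vtune_report_cmd() {
    report_type=$1
    prof_dir=$2
    printf '%s\n' "amplxe-cl -report $report_type -result-dir $prof_dir"
}

# Example: hotspots analysis of ./myApplication, saved in r_hotspots.
vtune_collect_cmd hotspots r_hotspots ./myApplication
vtune_report_cmd hotspots r_hotspots
```

After loading the vtune module, a printed command can be pasted into the shell or a job script once it looks right.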

 

Table 2: Analysis types available for different micro-architectures

System Name                    Options available on different sub-systems

General                        concurrency
                               frequency
                               hotspots
                               locksandwaits
                               sleep

Sandy Bridge processor         snb-access-contention
                               snb-bandwidth
                               snb-branch-analysis
                               snb-client
                               snb-core-port-saturation
                               snb-cycles-uops
                               snb-general-exploration
                               snb-memory-access
                               snb-port-saturation

Phi processor                  knc-bandwidth
                               knc-general-exploration
                               knc-lightweight-hotspots

Nehalem/Westmere processor     nehalem-cycles-uops
                               nehalem-frontend-investigation
                               nehalem-general-exploration
                               nehalem-memory-access


Please note that the general analysis types in Table 2 apply to every platform on which you want to use VTune. Details about a particular analysis type can be found with:

         amplxe-cl --help $analysis_type

For example,

         amplxe-cl --help concurrency
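
To look up several analysis types at once, the contents of Table 2 can be encoded in a small lookup function. This is a sketch under stated assumptions: the function name analysis_types_for and the family keywords snb, knc and nehalem are hypothetical (the keywords are borrowed from the analysis-type prefixes), and the loop only prints the help commands rather than running them.

```shell
#!/bin/sh
# Sketch: list the Table 2 analysis types that apply to a processor family.
# The function name and the family keywords are illustrative only.
analysis_types_for() {
    # The general types apply to every platform.
    types="concurrency frequency hotspots locksandwaits sleep"
    case $1 in
        snb)
            types="$types snb-access-contention snb-bandwidth snb-branch-analysis"
            types="$types snb-client snb-core-port-saturation snb-cycles-uops"
            types="$types snb-general-exploration snb-memory-access snb-port-saturation"
            ;;
        knc)
            types="$types knc-bandwidth knc-general-exploration knc-lightweight-hotspots"
            ;;
        nehalem)
            types="$types nehalem-cycles-uops nehalem-frontend-investigation"
            types="$types nehalem-general-exploration nehalem-memory-access"
            ;;
    esac
    printf '%s\n' "$types"
}

# Print the help command for each analysis type listed for the Phi processor.
for t in $(analysis_types_for knc); do
    printf '%s\n' "amplxe-cl --help $t"
done
```

Any keyword other than the three above returns only the general analysis types.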

 

Find Hotspots

Fortran Applications

C and C++ applications
 

Profile MPI applications

MPI jobs can be analyzed with VTune on top of the Intel MPI implementation. Here are the simplified commands for profiling MPI jobs:

module load intel impi vtune

mpirun -r ssh -f $PBS_NODEFILE -np 256 amplxe-cl -collect $analysis_type -result-dir $yourprof_dir ./test > run.out 

After the job runs successfully, the profiling results can be viewed through either the graphical or the command-line interface.
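
When the process count or the analysis type changes between runs, the mpirun line can be generated rather than edited by hand. The following is a minimal sketch that assumes the same mpirun options as the command above; the function name mpi_profile_cmd and the result directory mpi_prof are hypothetical. $PBS_NODEFILE is printed literally so that it is expanded later, inside the batch job.

```shell
#!/bin/sh
# Sketch: build the mpirun + amplxe-cl command line for an MPI profiling run.
# The function name is illustrative and is not part of Intel MPI or VTune.
mpi_profile_cmd() {
    nprocs=$1
    analysis_type=$2
    prof_dir=$3
    shift 3
    # \$PBS_NODEFILE stays unexpanded here; the batch job expands it at run time.
    printf '%s\n' "mpirun -r ssh -f \$PBS_NODEFILE -np $nprocs amplxe-cl -collect $analysis_type -result-dir $prof_dir $*"
}

# Example: 256 ranks, hotspots analysis of ./test, results in mpi_prof.
mpi_profile_cmd 256 hotspots mpi_prof ./test
```

The printed line can be placed in a job script after the module load line, with the output redirected as in the command above.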

Comprehensive information can be found in the software documentation, under Analyzing MPI applications.