Use of VTune for performance optimization



Use of VTune on different MSI systems

Interactive profiling

Command-line options

Profiling MPI applications


Intel® VTune™ Amplifier XE 2013 is a performance profiler for C, C++, C#, Fortran, Assembly and Java*. It is available on all MSI Linux systems so that users can evaluate the performance of their applications and identify and remove hotspots. The objective is to enable applications to run efficiently on any MSI system. Experienced users can also explore each of the performance metrics in VTune in depth. The performance evaluation process itself can help users learn about and understand the cutting-edge technologies available in the HPC world.

Use of VTune on different systems

The vtune module is available on all systems. Applications can be profiled either through the graphical interface, amplxe-gui, or through the command-line interface, amplxe-cl. The former suits short interactive profiling sessions, while the latter is useful for collecting information at run time. For interactive profiling, see the Find Hotspot section for details.
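Before profiling, it can help to confirm that loading the module actually placed both front ends on the PATH. The following is a minimal sketch; the helper name check_tools is hypothetical and is not part of VTune or of the module system.

```shell
# Hypothetical helper: report whether each named tool is on the PATH.
# Returns 0 if all tools are found and 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example, after "module load vtune":
#   check_tools amplxe-gui amplxe-cl
```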

Table 1: Profiling metrics associated with the micro-architecture on different systems

System name and sub-system: specific features

Itasca, Nehalem processor: General Exploration; Read Bandwidth; Write Bandwidth; Memory Access; Cycles and uOps; Front End Investigation.

Itasca, Sandy Bridge processor: General Exploration; Memory Bandwidth; Access Contention; Branch Analysis; Client Analysis; Core Port Saturation; Cycles and uOps.

Cascade, Knights Corner (Phi) processor: Lightweight Hotspots; Memory Bandwidth; General Exploration.

Cascade, Core i7 980x processor: Lightweight Hotspots; Hotspots; Concurrency; Locks and Waits.

Lab Linux workstations: Lightweight Hotspots; Hotspots; Concurrency; Locks and Waits.

Command-line options

The command-line interface amplxe-cl makes it convenient to profile a real application. Load the vtune module and specify the analysis type of interest. The basic format is:

       module load vtune

       amplxe-cl -collect $analysis_type -result-dir $yourprof_dir -- myApplication

where $analysis_type is the analysis type chosen for the processor sub-system being profiled (see Table 2 for the analysis types supported on each platform); $yourprof_dir is the directory in which the profiling information is saved; and myApplication is the program that you want to profile. After the job finishes, you can view the profiling results with either the graphical interface:

       amplxe-gui $yourprof_dir

or the command-line interface:

      amplxe-cl -report  $report_type  -result-dir  $yourprof_dir

where $report_type should match the selected $analysis_type.
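The collect and report steps above can be combined in a small script. The sketch below only builds the two command lines as strings, so that they can be checked before being run. The function names, the hotspots analysis type and the result directory r001hs are illustrative assumptions, not values prescribed by this page.

```shell
# Hypothetical helpers that print the amplxe-cl command lines described above.

# $1 = analysis type, $2 = result directory, remaining args = application
vtune_collect_cmd() {
    analysis_type="$1"
    result_dir="$2"
    shift 2
    echo "amplxe-cl -collect ${analysis_type} -result-dir ${result_dir} -- $*"
}

# $1 = report type (should match the analysis type), $2 = result directory
vtune_report_cmd() {
    echo "amplxe-cl -report $1 -result-dir $2"
}

# Example (assumed values):
#   vtune_collect_cmd hotspots r001hs ./myApplication
#   vtune_report_cmd  hotspots r001hs
```

Printing the commands first makes it easy to confirm that the report type and the result directory match those used for collection.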


Table 2: Available analysis types for different micro-architectures

System name: options available on different sub-systems

General: concurrency

Sandy Bridge processor:

Phi processor:

Nehalem/Westmere processor: nehalem-cycles-uops

Please note that the general analysis types in Table 2 apply to every platform on which you want to use VTune. Details about a particular analysis type can be found with

         amplxe-cl --help $analysis_type

For example,

         amplxe-cl --help concurrency



Fortran Applications

C and C++ applications

Profile MPI applications

MPI jobs can be analyzed with VTune on top of the Intel MPI implementation. Here are the simplified commands for profiling MPI jobs:

module load intel impi vtune

mpirun -r ssh -f $PBS_NODEFILE -np 256 amplxe-cl -collect $analysis_type -result-dir $yourprof_dir ./test > run.out 

After the job runs successfully, you can view the profiling results with either the graphical or the command-line interface.
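Rather than hard-coding the rank count, a job script can derive it from the PBS node file. The sketch below assumes the usual convention that the node file lists one line per allocated core slot. The helper names are hypothetical, the hotspots analysis type is only an example, and the "--" separator follows the basic format shown in the command-line section.

```shell
# Hypothetical helper: count MPI ranks as the number of non-empty lines
# in a PBS node file (assumed to hold one line per allocated core slot).
count_ranks() {
    grep -c . "$1"
}

# Hypothetical helper: print the mpirun command line for a profiled run.
# $1 = node file, $2 = analysis type, $3 = result directory, $4 = executable
mpi_profile_cmd() {
    np=$(count_ranks "$1")
    echo "mpirun -r ssh -f $1 -np ${np} amplxe-cl -collect $2 -result-dir $3 -- $4"
}

# Example inside a PBS job (assumed values):
#   mpi_profile_cmd "$PBS_NODEFILE" hotspots "$yourprof_dir" ./test
```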

Comprehensive information can be found in the software documentation, Analyzing MPI applications.