
Ceph in HPC Environments at SC15

Overview: Individuals from MSI, UAB, RedHat Inc., Intel Corp., CADRE, and MIMOS came together at SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis on Wednesday, November 18, 2015 in Austin, TX to share their experiences with Ceph in HPC...

Two PIs Honored by Minneapolis/St. Paul Business Journal

Posted on August 27, 2014. Two MSI Principal Investigators in the College of Science and Engineering (CSE) have been named Titans of Technology by the Minneapolis/St. Paul Business Journal. Professor Art Erdman (Director, University of Minnesota Medical Devices Center; Mechanical Engineering) and...

How do I get a Person of Interest (POI) designation?

All MSI users must have a UMN Internet ID. If a user does not have a UMN Internet ID, the PI can get the user a Person of Interest (POI) account by working with their department’s HR representative. If the PI is not a UMN employee, the MSI Tech Support staff can assist them with creating POIs...

What are my data storage options?

MSI's storage infrastructure page provides an overview of your data storage options at MSI. MSI provides each group a shared home directory space on the high performance primary storage that is accessible from all MSI systems. Your use of this space is limited by a storage quota that applies to...

Checkpointing

Abstract

Checkpointing saves the state of a long-running HPC application so that the application can be restarted later from that point. It has long been a challenging but highly desired capability, because it guards against unexpected events that cause an application to fail prematurely.
 

Checkpointing is currently available on Itasca.

        Log in to Itasca:

              ssh username@itasca.msi.umn.edu

Checkpoint serial jobs

Compile your job

       module load intel

       icc  -o my_test my_app.c  -lcr

or

       ifort -o my_test  my_app.f -lcr  
 

Run your job

      qsub -I -l nodes=1:ppn=8,mem=10gb,walltime=2:00:00

      cd $wrk # the directory where your job is located

      cr_run ./my_test &

Find the job PID and checkpoint the job

   PS=`ps -u $USER | grep my_test`

   PID=`echo "${PS:0:6}" | sed 's/ //g'`

   cr_checkpoint --signal=2 --term $PID
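
As an alternative to parsing ps output, the shell's $! variable holds the PID of the most recent background job. The sketch below uses sleep as a stand-in for "cr_run ./my_test &". It assumes that cr_run keeps the same PID when it starts the program, which should be confirmed on the system before relying on it.

```shell
#!/bin/bash
# Stand-in for "cr_run ./my_test &": any background command works for this demo.
sleep 30 &

# $! holds the PID of the most recently started background job.
PID=$!
echo "background job PID: $PID"

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$PID" 2>/dev/null; then
    STATUS=running
else
    STATUS=missing
fi
echo "status: $STATUS"

# Clean up the stand-in process.
kill "$PID"
wait "$PID" 2>/dev/null || true
```

Capturing the PID immediately after starting the job avoids matching the wrong process when several programs with similar names are running.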

To verify that the checkpoint succeeded

    tail  context.$PID

To checkpoint again and terminate the job 

   cr_checkpoint  --term $PID

To restart the job from the last checkpoint

  cr_restart context.$PID

 

Checkpoint OpenMP  jobs

Compile your job

       module load intel

       icc  -o my_test -openmp  my_app.c  -lcr

or

       ifort -o my_test   -openmp my_app.f -lcr  
 

Run your job

      qsub -I -l nodes=1:ppn=8,mem=10gb,walltime=2:00:00

      cd $wrk # the directory where your job is located

      export OMP_NUM_THREADS=4

      cr_run ./my_test &

Find the job PID and checkpoint the job

   PS=`ps -u $USER | grep my_test`

   PID=`echo "${PS:0:6}" | sed 's/ //g'`

   cr_checkpoint --signal=2 --term $PID

To verify that the checkpoint succeeded

    tail  context.$PID

To checkpoint again and terminate the job 

   cr_checkpoint  --term $PID

To restart the job from the last checkpoint

  export OMP_NUM_THREADS=4

 cr_restart context.$PID

 

Checkpoint MPI  jobs

Compile your job

       module load intel ompi/1.6.3-blcr/intel

       mpicc  -o my_test   my_app.c 

or

       mpif77 -o my_test    my_app.f
 

Where to store the checkpointing file

Create a directory named .openmpi in your home directory, and in it create a file named mca-params.conf. This file should contain the paths to the directories where you want the checkpoint files to be stored. Here is an example:

     cat /home/support/szhang/.openmpi/mca-params.conf
     snapc_base_global_snapshot_dir=/lustre/cr_files
     crs_base_snapshot_dir=/lustre/cr_files/local
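
The configuration file shown above could be generated with a few shell commands. The following is a minimal sketch: write_mca_params is a hypothetical helper (not part of Open MPI), and the demo writes into a scratch directory rather than a real home directory. In practice the first argument would be "$HOME/.openmpi".

```shell
#!/bin/bash
# write_mca_params is a hypothetical helper, not part of Open MPI.
# $1: directory that will hold mca-params.conf (normally "$HOME/.openmpi")
# $2: directory where the checkpoint files should be stored
write_mca_params() {
    local conf_dir=$1 snapshot_dir=$2
    mkdir -p "$conf_dir"
    cat > "$conf_dir/mca-params.conf" <<EOF
snapc_base_global_snapshot_dir=$snapshot_dir
crs_base_snapshot_dir=$snapshot_dir/local
EOF
}

# Demo in a scratch directory so that nothing in the real home directory is touched.
demo_home=$(mktemp -d)
write_mca_params "$demo_home/.openmpi" /lustre/cr_files
cat "$demo_home/.openmpi/mca-params.conf"
```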

Run your job

      qsub -I -l nodes=4:ppn=8,mem=10gb,walltime=2:00:00

      cd $wrk # the directory where your job is located

      mpirun  -am ft-enable-cr -np 32 ./my_test &

Find the job PID and checkpoint it:

      pid=`ps -u $USER | grep mpirun`

      jid=`echo "${pid:0:6}" | sed 's/ //g' `

     export jid
     ompi-checkpoint $jid

To verify that the checkpoint succeeded

    ls -al /lustre/cr_files | grep $jid

To checkpoint again and terminate the job 

   ompi-checkpoint --term $jid

To restart the job from the last checkpoint

cd  /lustre/cr_files/

 ompi-restart  /lustre/cr_files/ompi_global_snapshot_$jid.ckpt/
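
Before restarting, it can help to confirm that the snapshot directory for the job actually exists. The sketch below builds the directory name in the same form as the restart command above. The helper snapshot_path is hypothetical, and the demo uses a scratch directory and a made-up job ID in place of /lustre/cr_files and a real $jid.

```shell
#!/bin/bash
# snapshot_path is a hypothetical helper that builds the snapshot directory name
# in the form <dir>/ompi_global_snapshot_<jid>.ckpt
snapshot_path() {
    printf '%s/ompi_global_snapshot_%s.ckpt\n' "$1" "$2"
}

# Demo with a scratch directory and a made-up job ID.
cr_dir=$(mktemp -d)
jid=12345
mkdir "$(snapshot_path "$cr_dir" "$jid")"   # pretend that a checkpoint was written

snap=$(snapshot_path "$cr_dir" "$jid")
if [ -d "$snap" ]; then
    echo "snapshot found: $snap"   # at this point one would run: ompi-restart "$snap"
else
    echo "no snapshot for job $jid in $cr_dir" >&2
fi
```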

 

 

  I/O Performance

 

Python

Software Description: 

Python is a high level programming language that aims to combine remarkable power with very clear syntax. Its success in these areas has led to strong adoption by the scientific community, resulting in numerous math, physics, chemistry, and biology libraries being contributed from the community of users.

Software Updates and Version Consistency

MSI must make periodic updates to our existing Python installations in order to support the latest features available in the language, and for compatibility with other Python-supported software. In order to maintain a consistent Python environment, users are encouraged to use the conda tool available in the Anaconda Python distribution to clone the desired environment into their home directory. Environments cloned in this way will not be updated by MSI and are therefore suitable for use in applications that require a static Python environment. Instructions for cloning a Python environment can be found on MSI’s Anaconda Python software page.

 

Software Support Level: 
Primary Support
Software Access Level: 
Open Access
PBS Example: 

Programs can be submitted to a queue using a PBS script such as the one below:

#PBS -l nodes=1:ppn=1,mem=1gb,walltime=4:00:00
#PBS -m abe
cd /location/of/the/script
module load python-epd
python myscript.py
Software Categories: 
Software Interactive/GUI: 
No
General Linux Documentation: 

MSI maintains installations of Python optimized for scientific and development use on all MSI computational resources. These include additional tools and libraries such as IPython, SciPy, NumPy, BioPython, Django, and many others. MSI supports the Enthought Python Distribution/Canopy and Continuum Analytics Anaconda Python as well as the PyPy alternative Python implementation.

To enable our default version of Python 2.7 (currently a version of the Enthought distribution), type

module load python

To enable our recommended version of Python 3 (currently a version of the Anaconda distribution), type

module load python3

Several other versions of Python are available, but the versions may be different on different platforms. To list all versions of Python available on the machine, type

module avail python

Assistance with Python programming is available by contacting help@msi.umn.edu.

Python Usage

The following information applies to all supported versions of Python.

In the examples below we use module load python. They will also work if you select a specific environment version (python3/3.4) or implementation (python-epd, python-pypy). Loading a module for Python will make that Python environment your default for all Python-related commands.

To run Python interactively in a Linux environment, run the commands:

module load python
ipython

If you would like to run a Python script, myscript.py, use the following command:

module load python
python myscript.py

Installing Packages

Our recommended environments for Python 2.7 (module load python) and Python 3.4 (module load python3) include a wide variety of useful scientific, mathematical, and programming packages already installed. You can usually determine if a package is installed by attempting to import its module from Python:

module load python
ipython

> import somepackage

If this succeeds without printing an error, the package is installed and ready to use. Otherwise, in most cases we recommend that you use the provided tools to install a local copy of the package in your home directory.
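
The same import check can be run from the shell without starting an interactive session. This is a minimal sketch: has_module is a hypothetical helper, json stands in for a package that is installed, and no_such_package_xyz is a made-up name that should fail to import. The script falls back to python3 when no python command is on the PATH.

```shell
#!/bin/bash
# Pick whichever interpreter is available; after "module load python" this is "python".
PYTHON=$(command -v python || command -v python3)

# has_module is a hypothetical helper: it exits 0 when the named module can be imported.
has_module() {
    "$PYTHON" -c "import $1" >/dev/null 2>&1
}

for name in json no_such_package_xyz; do
    if has_module "$name"; then
        echo "$name: installed"
    else
        echo "$name: not found"
    fi
done
```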

The simplest way to accomplish this is to use pip to install from the Python Package Index. Once you have determined the PyPI name of the package you want, you can install or upgrade the package with the commands:

module load python
pip install --user --upgrade packagename

For more complex installations you may want to create a Python virtual environment in your home directory. The commands to do so are not the same across all Python environments. Please see the software documentation page for the environment you are using for instructions.

Minnesota Moose Population

Posted on June 9, 2014. Ecologists and biologists are concerned about recent data showing that Minnesota's iconic moose seem to be vanishing. The decline in the moose population is being studied by MSI PI Assistant Professor James Forester (Fisheries, Wildlife, and Conservation Biology). The...

Request for Proposals: Text-based Crowdsourcing at the U of M

Posted on October 30, 2014. In conjunction with the Zooniverse@UMN initiative, MSI is pleased to announce a Request for Proposals soliciting project proposals from U of Minnesota-affiliated research groups that have text-based projects that would benefit from hundreds of thousands of online...
