HPC

What is HPC?

MSI's High Performance Computing (HPC) systems are designed with high-speed networks, high-performance storage, GPUs, and large amounts of memory in order to support some of the most compute- and memory-intensive programs developed today.

MSI currently has two main HPC systems, Itasca and Mesabi. Itasca is our lower-capacity HPC cluster and is designed for software that runs across multiple nodes. Mesabi is our premier HPC cluster, with the highest capacity and processor speeds available at MSI. Mesabi is where the majority of MSI users get work done, and it has several features that make it the right choice for most computational needs.

What Can I do with HPC?

MSI's HPC systems have direct access to high-performance storage and many of MSI's software resources, including popular programming languages such as Python, R, and MATLAB, as well as C compilers. This integration creates a computational environment that is flexible and powerful enough to accommodate a wide range of research needs. Researchers from departments across the University use MSI's HPC resources daily to accelerate their research.

How Do I Access the HPC systems?

The first step to accessing MSI's HPC systems is to become an MSI user. From there, MSI's HPC systems are primarily accessed via a terminal interface, and many of our users write custom programs to run complex analyses. MSI also provides interactive access to the HPC systems through NICE, IPython Notebook, and interactive MATLAB options.

 

HPC Fairshare Scheduling

MSI uses a fairshare job scheduler to help ensure that a mix of jobs from all users can utilize any given HPC resource efficiently and fairly. The goal of fairshare is to increase the scheduling priority of jobs from groups that are below their fairshare targets, and to decrease the priority of jobs from groups whose usage exceeds their fairshare targets. In general, this means that when a group has recently used a large amount of resources, the priorities of their waiting jobs will be negatively affected until their usage decreases to their fairshare target once again.

 

Each group's fairshare target is based on the percentage of total Service Units (SUs) that the group has received for the current allocation period. For example, if a group's fairshare target is 5, then the group can use 5% of the resources. If, as often happens over the course of several days, the group uses more than 5% of a given resource, then the group has exceeded their fairshare target on that resource, and the priorities of their waiting jobs on that resource will be decreased.
 
Furthermore, if a group uses their SUs faster than their uniform rate (the group's total allocation divided by the number of days in the allocation period, which is a calendar year), the group's fairshare target will decrease. The updated fairshare target is based on the number of SUs remaining for the group and the sum of SUs allocated to all groups. Fairshare targets are updated in this way on a daily basis.
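As a concrete illustration, consider how a uniform rate and an updated target might be computed. The allocation numbers below are hypothetical, and the exact formula MSI uses is not spelled out here; this is only a sketch of the idea described above.

    # Hypothetical numbers; a sketch of the idea above, not MSI's exact formula.
    def uniform_rate(total_allocation_su, days_in_period=365):
        # SUs per day if the group spent its allocation evenly over the calendar year.
        return total_allocation_su / days_in_period

    def updated_fairshare_target(group_su_remaining, total_su_allocated_all_groups):
        # Target expressed as a percentage of the SUs allocated to all groups (assumed form).
        return 100.0 * group_su_remaining / total_su_allocated_all_groups

    # Example: a group allocated 100,000 SUs out of 2,000,000 SUs across all groups.
    print(uniform_rate(100_000))                          # ~274 SUs per day
    print(updated_fairshare_target(60_000, 2_000_000))    # 3.0 (%), after 40,000 SUs are used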
 
To help groups determine their usage rate, the command 'acctinfo' states the percentage of their allocation remaining and the amount of time, as a percentage, remaining in the allocation period. The 'acctinfo' command also shows the group's current fairshare target as well as the group's current usage of the resource on which 'acctinfo' is being run. The fairshare target may vary from day to day, reflecting changes in the number of groups that have allocations in the current allocation period, and changes to allocations resulting from additional SU requests from various groups.
 
When calculating the fairshare factor that affects the priorities of a group's jobs, the scheduler uses a weighted average of the last 7 days of the group's usage on the specific resource. The weight of the usage of a specific day decays as that day slips into the past. The current weights are:

Weighting Factors for Fairshare Scheduling

Days Ago:  0       1       2       3       4       5       6
Weight:    1.0000  0.8000  0.6400  0.5120  0.4096  0.3277  0.2621

 

 
MSI allows jobs to run for up to 150 hours (6.25 days), so it makes sense to consider usage for the past 7 days. However, usage from 7 days ago affects the fairshare factor relatively little.
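The weights in the table decay geometrically by a factor of 0.8 per day. The sketch below only illustrates how those weights combine a group's recent usage; the exact aggregation the scheduler performs (for example, any normalization) is an assumption.

    # Illustrative only: applies the decay weights from the table above to a
    # hypothetical 7-day usage history; not the scheduler's exact calculation.
    WEIGHTS = [0.8 ** d for d in range(7)]   # 1.0, 0.8, 0.64, ... (the table rounds these)

    def weighted_usage(daily_usage_su):
        # daily_usage_su[0] is today's usage in SUs, daily_usage_su[6] is usage 6 days ago.
        return sum(w * u for w, u in zip(WEIGHTS, daily_usage_su))

    print(weighted_usage([500, 800, 0, 1200, 300, 0, 700]))   # hypothetical usage, most recent first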
 
When scheduling jobs and calculating priorities of waiting jobs, there are many factors to consider, and fairshare is only one such factor. MSI also uses queue time - the time that a job has been waiting to run - to affect the priority of any given job. The longer a job waits, the more the queue time factor will add to the job's priority. The job's requested walltime, relative to the maximum walltime on the resource where the job is waiting, also affects the job's priority. This is called the Expansion Factor (or XFactor). The shorter the job is, the higher its expansion factor.
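A rough sketch of how these factors could combine into a single priority is given below. The weights and the exact XFactor definition are assumptions made for illustration (the sketch simply follows the description above, where a shorter requested walltime relative to the 150-hour maximum yields a larger factor); MSI's actual scheduler configuration may differ.

    # Illustrative only: weights and formulas are assumptions, not MSI's configuration.
    def xfactor(requested_walltime_hr, max_walltime_hr=150.0):
        # One reading of the description above: shorter requests get a larger expansion factor.
        return max_walltime_hr / requested_walltime_hr

    def job_priority(fairshare_factor, queue_time_hr, requested_walltime_hr,
                     w_fairshare=100.0, w_queue=1.0, w_xfactor=10.0):
        # Weighted sum of the factors discussed above, with hypothetical weights.
        return (w_fairshare * fairshare_factor
                + w_queue * queue_time_hr
                + w_xfactor * xfactor(requested_walltime_hr))

    # Two jobs from the same group that have both waited 10 hours: the 2-hour
    # request gets a larger XFactor boost than the 150-hour request.
    print(job_priority(fairshare_factor=0.2, queue_time_hr=10, requested_walltime_hr=2))
    print(job_priority(fairshare_factor=0.2, queue_time_hr=10, requested_walltime_hr=150))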
 
Additionally, the scheduler is configured to first try to schedule jobs requesting a large amount of resources, and then schedule smaller jobs around the larger jobs. Jobs requesting a large amount of resources need to reserve those resources in order to run, and they cannot run until there are sufficient free resources to fit such jobs. It is undesirable to have unused resources, so the scheduler uses smaller jobs to fill in the gaps created by the reservations of the large jobs. This scheduling behavior is called "backfill." It is far more efficient to backfill smaller jobs around larger jobs. Accurate estimates of wall clock time on your jobs, especially small jobs, will help the scheduler schedule your jobs promptly.
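As a toy illustration of backfill (not MSI's scheduler), the sketch below starts waiting small jobs in an idle gap only if they will finish before a large job's reservation begins, which is why accurate walltime estimates matter.

    # Toy backfill illustration: start small jobs in the gap only if they fit in
    # the free cores and will finish before the large job's reservation starts.
    def backfill(free_cores, hours_until_reservation, waiting_small_jobs):
        # waiting_small_jobs: list of (name, cores, requested_walltime_hr) tuples.
        started = []
        for name, cores, walltime_hr in waiting_small_jobs:
            if cores <= free_cores and walltime_hr <= hours_until_reservation:
                started.append(name)
                free_cores -= cores
        return started

    # 64 idle cores for the next 12 hours until a large job's reservation begins:
    print(backfill(64, 12, [("a", 16, 8), ("b", 32, 24), ("c", 8, 4)]))   # ['a', 'c']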
 
MSI understands that no one wants to wait. It is also true that no scheduling policy can guarantee that no one will wait - only impossibly large machines can guarantee that - so we use fairshare to try to ensure a mix of jobs from all users can utilize the resources efficiently and fairly. We monitor queues and often adjust parameters to get better turnaround times on jobs. Your comments are always welcome.

 

The new Mesabi compute cluster provides hardware resources for running a wider variety of jobs

The Itasca compute cluster provides resources for running production batch and interactive jobs