High Performance Computing (HPC) Policies and Procedures

These Policies and Procedures are common to every MSI HPC resource. The pages for each HPC resource also include any policies specific to that machine.

Ethical Guidelines

All users must follow the Supercomputing Institute's usage policies. In particular, passwords must not be shared with anyone. Every individual with access to the facility should have his/her own user account and password.

Privacy

The default file permissions for a new user are read/write for the user only, but this may be changed. A Principal Investigator (PI) has the right to access data and files in his/her research group. A PI who has difficulty doing this may contact MSI Technical Support.
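As a rough illustration of the default described above, the following Python sketch checks whether a file is readable and writable by its owner only (mode 0600) and shows one way to add group read access. The function names are hypothetical, and the sketch assumes a POSIX filesystem; it is not an MSI-provided tool.

```python
import os
import stat


def is_owner_only_rw(path):
    """Return True if the file mode is exactly owner read/write (0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == (stat.S_IRUSR | stat.S_IWUSR)


def allow_group_read(path):
    """Add group read permission, leaving all other permission bits as they are."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRGRP)
```

Calling allow_group_read on a file with mode 0600 leaves it at 0640, so members of the file's group can read it but not modify it.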

Technical support staff do not read a user's email or files or alter them without the user's permission except in cases of emergency and only within the scope of administrative responsibilities.

Passwords

Users will have an MSI account and an associated password that will grant them access to some of the systems, labs, and software that MSI provides. This single ID and password can be used on all systems that the user has been granted access to. The user's password will automatically expire six months after it was last set or changed. See the full MSI password policy for more information.

Job Scheduling

MSI uses a fairshare job scheduler to try to ensure that a mix of jobs from all users is executed on each HPC resource in a fair and efficient way. The goal of fairshare is to increase the priority of jobs belonging to groups that are below their fairshare targets, and to decrease the priority of jobs belonging to groups whose usage exceeds their fairshare targets. In general, this means that when a group has recently used a large amount of resources, the priorities of its waiting jobs will be negatively affected until its usage once again decreases to its fairshare target.
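The direction of the fairshare effect can be sketched with a toy calculation. This is only an illustration of the idea, not the formula MSI's scheduler uses: the linear form, the weight value, and the function name are all assumptions made for this example.

```python
def fairshare_adjustment(recent_usage, target, weight=1000.0):
    """Return a priority adjustment: positive below target, negative above.

    recent_usage and target are fractions of the machine (0.0 to 1.0).
    The linear form and the default weight are illustrative assumptions.
    """
    return weight * (target - recent_usage)


# Hypothetical groups with the same target but different recent usage.
target = 0.10
for name, usage in {"group_a": 0.05, "group_b": 0.30}.items():
    print(name, fairshare_adjustment(usage, target))
```

Here group_a, which is below its target, receives a positive adjustment, while group_b, which is above its target, receives a negative one.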

For detailed information about fairshare scheduling, see the full documentation regarding job scheduling.

Backups

HPC home directories and project spaces are backed up in accordance with MSI's backup procedures.

The scratch filesystems (for example, /scratch1, /lustre) and temporary directories (like /tmp) are not backed up.

Suggestions for backing up your data

  • Keep two separate copies of anything you back up.
  • Keep detailed records of what you've backed up, including dates, file names, and devices used.
  • Verify your backup archive after you have created or updated it.
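One way to follow the last suggestion is to compare checksums of the original files against the copies stored in the archive. The sketch below uses Python's standard tarfile and hashlib modules; the function names are hypothetical, and it assumes the backup is a gzip-compressed tar archive of a single directory containing regular files.

```python
import hashlib
import os
import tarfile


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file object in chunks to avoid loading it all into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def create_archive(source_dir, archive_path):
    """Create a gzip-compressed tar archive of source_dir."""
    source_dir = os.path.normpath(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=os.path.basename(source_dir))


def verify_archive(source_dir, archive_path):
    """Return True if every file under source_dir matches its archived copy."""
    source_dir = os.path.normpath(source_dir)
    base = os.path.basename(source_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, source_dir)
                member_name = "/".join([base] + rel.split(os.sep))
                try:
                    member = archive.extractfile(member_name)
                except KeyError:
                    return False  # file is missing from the archive
                if member is None:
                    return False  # archived entry is not a regular file
                with open(full, "rb") as handle:
                    if sha256_of_stream(handle) != sha256_of_stream(member):
                        return False
    return True
```

Running verify_archive right after create_archive, and again after each update, confirms that the archive can be read and that its contents match the source files at that moment.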

How do I have data restored from systems backups?

Please see the information about snapshots, which may enable you to restore files without staff intervention. If necessary, please email technical support with the following information:

  • Name of system
  • Name of files or directories you need to have restored
  • Approximate date that the files existed on our systems

Service Unit Allocations

Service Units (SUs) are charged for computer time on HPC resources.

One SU will provide a fixed number of hours of CPU time as follows:

CPU Hours per SU for MSI HPC Resources

HPC Resource    CPU hours/SU
Itasca          1.5
Cascade         1.5
Calhoun         3.5

To determine the number of SUs you require, you will need to know how many processors (cores) your program runs on, how long each run takes (in hours), and how many runs you plan to do. The product of these three numbers is the number of CPU hours you will need. Dividing the required CPU hours by the CPU hours per SU listed above for the chosen machine gives the number of SUs needed. A few examples follow:

Example SU Calculations

  • A single-core application that takes 5.5 hours per run; 190 runs will be needed.
      CPU hours: 1 core x 5.5 hours x 190 runs = 1,045 CPU-HRs
      Itasca SUs: Not Allowed
      Calhoun SUs: 1,045 CPU-HRs / 3.5 CPU-HRs per SU = 299 SUs

  • A 128-core application that takes 19 hours per run; 500 runs will be needed.
      CPU hours: 128 cores x 19 hours x 500 runs = 1,216,000 CPU-HRs
      Itasca SUs: 1,216,000 CPU-HRs / 1.5 CPU-HRs per SU = 810,667 SUs
      Calhoun SUs: 1,216,000 CPU-HRs / 3.5 CPU-HRs per SU = 347,429 SUs

  • A 2,048-core application that takes 24 hours per run; 75 runs will be needed.
      CPU hours: 2,048 cores x 24 hours x 75 runs = 3,686,400 CPU-HRs
      Itasca SUs: 3,686,400 CPU-HRs / 1.5 CPU-HRs per SU = 2,457,600 SUs
      Calhoun SUs: 3,686,400 CPU-HRs / 3.5 CPU-HRs per SU = 1,053,257 SUs
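The calculation described above can be sketched in a few lines of Python. The conversion values come from the CPU hours per SU table; the function names are hypothetical, and rounding to the nearest whole SU is an assumption based on the worked examples.

```python
# CPU hours provided by one SU on each resource (from the table above).
CPU_HOURS_PER_SU = {"Itasca": 1.5, "Cascade": 1.5, "Calhoun": 3.5}


def cpu_hours(cores, hours_per_run, runs):
    """Total CPU hours: cores x hours per run x number of runs."""
    return cores * hours_per_run * runs


def service_units(total_cpu_hours, resource):
    """Convert CPU hours to SUs, rounded to the nearest whole SU (assumed)."""
    return round(total_cpu_hours / CPU_HOURS_PER_SU[resource])


# The 128-core example: 19 hours per run, 500 runs.
total = cpu_hours(128, 19, 500)
print(total)                            # 1216000
print(service_units(total, "Itasca"))   # 810667
print(service_units(total, "Calhoun"))  # 347429
```

Note that this sketch does not check per-machine restrictions, such as the single-core case that is marked as not allowed on Itasca.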

Please see the Allocations page for details on Service Unit allocation eligibility, renewal, and the peer review allocation process.

Service Unit Account and Monitoring Usage

The Supercomputing Institute tracks machine usage using Service Units. General Service Units awarded to a group can be used on any MSI HPC hardware. There are two allocation types through which a Principal Investigator may distribute SUs to their group: Group Allocation, which allows all group members access to the SUs, or User Allocation, which assigns specific amounts of SUs to each user. The default setting is for Group Allocation. If you wish to change your allocation type, or have questions about SU accounting and monitoring, send an email to technical support.

You can monitor your usage with the acctinfo command, which provides a summary of SU usage for the user executing it. Upon request, MSI can set up acctinfo so that the PI, or the PI and a named group member, can view the usage of every member of the group. It is also possible to obtain usage information from the previous allocation period.

Further information about SU accounts and monitoring SU usage is available.