Slurm

Slurm Workload Manager is MSI's new Job Scheduler

What is Slurm?

Slurm is a best-in-class, highly scalable scheduler for HPC clusters. It allocates resources, provides a framework for executing tasks, and arbitrates contention for resources by managing queues of pending work.

Why is MSI transitioning to the Slurm scheduler?

Slurm has become an industry standard for scheduling among HPC centers. It’s an open-source scheduler with a plugin framework that allows us to leverage tools developed at other centers. It is capable of stable management of a larger number of jobs than our current scheduler. Finally, its architecture opens opportunities to leverage technologies that will be useful for many areas of scientific computation.

How does the transition to Slurm impact my work on MSI systems?
The most obvious adjustment everyone will need to make is to learn a new set of commands for submitting jobs and checking on job status. If you have written scripts that depend on the job scheduler, they will need to be modified to match the syntax used in Slurm. This is also true of some software that MSI maintains. 
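
As a rough illustration of the syntax changes described above, here is a minimal sketch of a Slurm batch script, with the PBS directive each line replaces shown in a comment. The job name, resource values, and the file name `job.sh` are hypothetical placeholders, not MSI recommendations. Partition names and group limits are site-specific and are omitted here.

```shell
#!/bin/bash
# Hypothetical example job; adjust every value to your own needs.

# PBS equivalent: #PBS -N example
#SBATCH --job-name=example
# PBS equivalent: #PBS -l nodes=1:ppn=4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
# PBS equivalent: #PBS -l walltime=00:10:00
#SBATCH --time=00:10:00
# PBS equivalent: #PBS -l mem=4gb
#SBATCH --mem=4g

# Slurm sets SLURM_SUBMIT_DIR (PBS: PBS_O_WORKDIR) and
# SLURM_JOB_ID (PBS: PBS_JOBID) inside a running job.
# The fallbacks let the script also run outside a job as a quick check.
workdir="${SLURM_SUBMIT_DIR:-$PWD}"
job_id="${SLURM_JOB_ID:-none}"

# Slurm starts batch jobs in the submission directory by default,
# so this cd is only a safeguard carried over from PBS habits.
cd "$workdir" || exit 1

echo "Job ${job_id} running in ${workdir} on $(uname -n)"
```

The everyday commands change in a similar way: `sbatch job.sh` replaces `qsub job.sh`, `squeue -u $USER` replaces `qstat -u $USER`, and `scancel <jobid>` replaces `qdel <jobid>`.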
 
When you run jobs using Slurm, there will be no SUs deducted from your SU allocation. Group job limits will change over the next couple of months as we migrate nodes from the other cluster. ESO customers received an email on October 15th with important information about the transition of paid SUs and SU accounting.

Resources

MSI has put together resources to help groups get started with Slurm, and a recorded tutorial session is now available. See the links below for more information on Slurm topics and on getting started:

Getting Started Using Slurm

Tutorial Materials

Other Slurm Documentation

Timeline for Transition

MSI will be sending regular updates over the next few months to remind users of upcoming important dates and link to the resources available to aid your group in the transition over to the new scheduler. Please see the dates below for a general timeline for the Slurm transition.

 

November 1st, 2020: Minimum of 30% of all MSI systems will be moved over to the Slurm scheduler. Depending on congestion levels in the Slurm portion of MSI systems, this percentage may increase, but will not exceed 80% of the system before December 1st, 2020.

 

December 1st, 2020: Minimum of 80% of all MSI systems will be moved over to the Slurm scheduler. Depending on congestion levels in the Slurm portion of MSI systems, this percentage may increase, but will not exceed 90% of the system before January 1st, 2021.

 

January 1st, 2021: Minimum of 90% of all MSI systems will be moved over to the Slurm scheduler. Depending on congestion levels in the Slurm portion of MSI systems, this percentage may increase, but will not exceed 100% of the system before January 6th, 2021. Additionally, all SU allocations will expire on this day, and from January 1st onward groups* will not be able to start new PBS jobs. Existing PBS jobs will continue to run through to completion.

*With a few rare exceptions

 

January 6th, 2021 (MSI Maintenance Day): 100% of all MSI systems will be moved over to the Slurm scheduler. Users will no longer be able to submit jobs using PBS.