Slurm

Slurm Workload Manager is MSI's new job scheduler

What is Slurm?

Slurm is a highly scalable scheduler for HPC clusters. It allocates resources, provides a framework for executing tasks, and arbitrates contention for resources by managing queues of pending work.

Why is MSI transitioning to the Slurm scheduler?

Slurm has become an industry standard for scheduling among HPC centers. It is an open-source scheduler with a plugin framework that allows us to leverage tools developed at other centers. It can stably manage a larger number of jobs than our current scheduler. Finally, its architecture opens opportunities to leverage technologies that will be useful for many areas of scientific computation.

How does the transition to Slurm impact my work on MSI systems?

The most obvious adjustment everyone will need to make is to learn a new set of commands for submitting jobs and checking on job status. If you have written scripts that depend on the job scheduler, they will need to be modified to match the syntax used in Slurm. This is also true of some software that MSI maintains.
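
To give a feel for the kind of changes involved, here is a minimal sketch in Python that rewrites a few common PBS directives into their usual Slurm equivalents. It is an illustration only, not an MSI-provided tool: the names COMMAND_MAP, translate_directive, and translate_script are hypothetical, only a handful of directives are covered, and the sketch assumes that a PBS queue name maps directly onto a Slurm partition of the same name, which may not hold on MSI systems.

```python
import re
from typing import List

# Common PBS commands and their usual Slurm counterparts.
COMMAND_MAP = {
    "qsub": "sbatch",   # submit a batch script
    "qstat": "squeue",  # list pending and running jobs
    "qdel": "scancel",  # cancel a job
}


def translate_directive(line: str) -> List[str]:
    """Translate one '#PBS' line into one or more '#SBATCH' lines.

    Lines that are not PBS directives are returned unchanged.
    Directives this sketch does not recognize are kept as a TODO
    comment so they can be translated by hand.
    """
    stripped = line.strip()
    if not stripped.startswith("#PBS"):
        return [line]
    args = stripped[len("#PBS"):].strip()

    m = re.fullmatch(r"-N\s+(\S+)", args)
    if m:
        return [f"#SBATCH --job-name={m.group(1)}"]

    # Assumption: the queue name is also a valid partition name.
    m = re.fullmatch(r"-q\s+(\S+)", args)
    if m:
        return [f"#SBATCH --partition={m.group(1)}"]

    m = re.fullmatch(r"-l\s+walltime=(\S+)", args)
    if m:
        return [f"#SBATCH --time={m.group(1)}"]

    m = re.fullmatch(r"-l\s+nodes=(\d+):ppn=(\d+)", args)
    if m:
        return [
            f"#SBATCH --nodes={m.group(1)}",
            f"#SBATCH --ntasks-per-node={m.group(2)}",
        ]

    return [f"# TODO translate by hand: {stripped}"]


def translate_script(text: str) -> str:
    """Apply translate_directive to every line of a job script."""
    out: List[str] = []
    for line in text.splitlines():
        out.extend(translate_directive(line))
    return "\n".join(out)
```

Calling translate_script on the text of an existing PBS job script returns the same script with the recognized directives rewritten. Anything marked TODO, and any scheduler-specific environment variables used in the body of the script, still needs to be reviewed by hand against the Slurm documentation.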
 
When you run jobs using Slurm, no SUs will be deducted from your SU allocation. Group job limits will change over the next couple of months as we migrate nodes from the other cluster. ESO customers will receive an email with important information regarding the transition of paid SUs for external use.
 
MSI has put together resources to help users and groups get started with Slurm. A tutorial on using Slurm will be offered on October 20, 2020. You can register for this tutorial and view other resources at the links below:

Timeline for Transition

MSI will be sending regular updates over the next few months to remind users of upcoming important dates and link to the resources available to aid your group in the transition over to the new scheduler. Please see the dates below for a general timeline for the Slurm transition.

 

November 1st, 2020: A minimum of 30% of all MSI systems will be moved over to the Slurm scheduler. Depending on congestion levels in the Slurm portion of MSI systems, this percentage may increase, but it will not exceed 80% of the system before December 1st, 2020.

 

December 1st, 2020: A minimum of 80% of all MSI systems will be moved over to the Slurm scheduler. Depending on congestion levels in the Slurm portion of MSI systems, this percentage may increase, but it will not exceed 90% of the system before January 1st, 2021.

 

January 1st, 2021: A minimum of 90% of all MSI systems will be moved over to the Slurm scheduler. Depending on congestion levels in the Slurm portion of MSI systems, this percentage may increase, but it will not reach 100% of the system before January 6th, 2021.

 

January 6th, 2021 (MSI Maintenance Day): 100% of MSI systems will be moved over to the Slurm scheduler. Users will no longer be able to submit jobs using PBS.