Note: This page contains guidelines both for choosing a job queue under MSI's current job scheduler, PBS/TORQUE, and for choosing a partition under MSI's new job scheduler, Slurm. The migration of MSI systems to the Slurm scheduler will take place over Quarter 4 of 2020, and PBS/TORQUE will be discontinued on January 6th, 2021. Click on the following links to jump to the section for the scheduler you are using to submit your job.
- If you plan to submit your job to the PBS/TORQUE scheduler, jump to Choosing a Job Queue (PBS/TORQUE)
- If you plan to submit your job to the Slurm scheduler, jump to Choosing a Partition (Slurm)
Most MSI systems use job queues to efficiently and fairly manage when computations are executed. A job queue is an automated waiting list for use of a particular set of computational hardware. Computational jobs submitted to a job queue wait in line until the appropriate resources become available. Different job queues have different resources and limitations. When submitting a job, it is very important to choose a job queue whose resources and limitations suit the particular calculation.
This document outlines factors to consider when choosing a job queue. It applies to all MSI systems and is best used in conjunction with the Queues page, which outlines the resource limitations for each queue.
Please note that Mesabi's "widest" queue requires special permission to use. Please submit your code for review at: firstname.lastname@example.org.
There are several important factors to consider when choosing a job queue for a specific program or custom script. In most cases, jobs are submitted via PBS scripts as described in Job Submission and Scheduling.
Each MSI system contains job queues managing sets of hardware with different resource and policy limitations. MSI currently has two primary systems: the supercomputer Mesabi and Mesabi's expansion Mangi. Mesabi has a wide variety of queues suitable for many different job types. Mangi is a heterogeneous system suitable for even more job types. Mangi should be your first choice when doing any computation at MSI. The Mesabi Interactive Queue is primarily used for testing and for interactive software that is graphical in nature. Which system to choose depends highly on which system has queues appropriate for your software/script. Examine the Queues page to determine the most appropriate system.
Job Walltime (walltime=)
The job walltime is the time from the start to the finish of a job (as you would measure it using a clock on a wall), not including time spent waiting to run. This is in contrast to cputime, which measures the cumulative time all cores spent working on a job. Different job queues have different walltime limits, and it is important to choose a queue whose walltime limit is high enough for your job to complete. Jobs that exceed the requested walltime are killed by the system to make room for other jobs. Walltime limits are maximums only, and you can always request a shorter walltime, which will reduce the amount of time you wait in the queue for your job to start. If you are unsure how much walltime your job will need, start with the queues with shorter walltime limits and move to others only if needed.
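As a minimal sketch of a walltime request, the script below shows the PBS directive together with a small helper that turns a runtime estimate into the HH:MM:SS form used by walltime=. The helper name seconds_to_walltime, the 75-minute estimate, and the 20% safety margin are illustrative assumptions, not MSI recommendations.

```shell
#!/bin/bash
# Illustrative request: 1 hour 30 minutes of walltime.
#PBS -l walltime=01:30:00

# Hypothetical helper: format a number of seconds as HH:MM:SS,
# the form used in the walltime= request above.
seconds_to_walltime() {
    local total=$1
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $(((total % 3600) / 60)) $((total % 60))
}

# Assumed example: a 75-minute estimate padded by a 20% safety margin.
estimate=$((75 * 60))
padded=$((estimate * 120 / 100))
requested=$(seconds_to_walltime "$padded")
echo "walltime=$requested"
```

Padding the estimate slightly guards against the job being killed at the limit, while keeping the request short still helps the job start sooner.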
Job Nodes and Cores (nodes=X:ppn=Y)
Many calculations have the ability to use multiple cores (ppn), or (less often) multiple nodes, to improve calculation speed. Certain job queues have maximum or minimum values for the number of nodes and cores a job may use. If Node Sharing is enabled for a queue, you can request fewer cores (ppn) than exist on an entire node. If Node Sharing is not enabled, then you must request resources equivalent to a multiple of an entire node. Mesabi’s widest and large queues do not allow Node Sharing.
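The nodes=X:ppn=Y request described above might be sketched as follows. The helper names total_cores and uses_whole_nodes are hypothetical, and the figure of 24 cores per node is an assumption for illustration; the real value for each queue should come from the Queues page.

```shell
#!/bin/bash
# Illustrative request: 2 nodes with 24 cores on each node.
#PBS -l nodes=2:ppn=24

# Hypothetical helper: total cores implied by nodes=X:ppn=Y.
total_cores() {
    local nodes=$1 ppn=$2
    echo $((nodes * ppn))
}

# Hypothetical helper: succeeds only when ppn equals the core count of
# a node, which is what a queue without Node Sharing would require.
uses_whole_nodes() {
    local ppn=$1 cores_per_node=$2
    [ "$ppn" -eq "$cores_per_node" ]
}

cores_per_node=24   # assumption; check the Queues page for real values
echo "total cores: $(total_cores 2 24)"
if uses_whole_nodes 24 "$cores_per_node"; then
    echo "request fills whole nodes"
fi
```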
Job Memory (mem=)
The memory that a job requires is an important factor when choosing a queue. The largest amount of memory (RAM) that can be requested for a job is limited by the memory on the hardware associated with that queue. Mesabi has two queues (ram256g and ram1t) with high-memory hardware; the largest memory hardware is available through the ram1t queue.
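A memory request and the matching queue choice could be sketched like this. The helper pick_queue_by_memory is hypothetical, and the limits of 250 GB and 1000 GB are placeholders rather than the actual limits of ram256g and ram1t; take the real numbers from the Queues page.

```shell
#!/bin/bash
# Illustrative request: 200 GB of memory on the ram256g queue.
#PBS -q ram256g
#PBS -l mem=200gb

# Hypothetical helper: print the first queue whose memory limit (GB)
# is at least the requested amount. Arguments after the first are
# name:limit pairs, ordered from smallest to largest limit.
pick_queue_by_memory() {
    local need=$1
    shift
    local pair
    for pair in "$@"; do
        if [ "${pair#*:}" -ge "$need" ]; then
            echo "${pair%%:*}"
            return 0
        fi
    done
    return 1   # no listed queue can satisfy the request
}

# Placeholder limits; take the real values from the Queues page.
pick_queue_by_memory 200 ram256g:250 ram1t:1000
```

Choosing the smallest queue that fits leaves the largest-memory hardware free for jobs that genuinely need it.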
User and Group Limitations
To efficiently share resources, many queues have limits on the number of jobs or cores a particular user or group may simultaneously use. If a workflow requires many jobs to complete, it can be helpful to choose queues which will allow many jobs to run simultaneously.
Some queues contain nodes with special hardware, GPU accelerators and solid-state scratch drives being the most common. If a calculation needs to use special hardware, then it is important to choose a queue with the correct hardware available. Furthermore, those queues may require additional resources to be specified (e.g., GPU nodes require ":gpus=X").
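The ":gpus=X" addition mentioned above extends the usual nodes=X:ppn=Y request. A minimal sketch follows; the helper build_node_spec is hypothetical, and the counts of 24 cores and 2 GPUs are assumed for illustration only. A GPU job would also need a queue with GPU hardware, which is not named here.

```shell
#!/bin/bash
# Illustrative request: 1 node, 24 cores, and 2 GPUs on that node.
#PBS -l nodes=1:ppn=24:gpus=2

# Hypothetical helper: build the node specification string, adding
# the :gpus=X suffix only when GPUs are requested.
build_node_spec() {
    local nodes=$1 ppn=$2 gpus=${3:-0}
    local spec="nodes=${nodes}:ppn=${ppn}"
    if [ "$gpus" -gt 0 ]; then
        spec="${spec}:gpus=${gpus}"
    fi
    echo "$spec"
}

build_node_spec 1 24 2
build_node_spec 2 24
```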
At certain times particular queues may become overloaded with submitted jobs. In such a case, it can be helpful to send jobs to queues with lower utilization (node status). Sending jobs to lower utilization queues can decrease wait time and improve throughput. Care must be taken to make sure calculations will fit within queue limitations.
Choosing a Partition (Slurm)
Job Walltime (--time=)
Job Nodes and Cores (--nodes= and --ntasks= )
Job Memory (--mem=)
User and Group Limitations