Job Submission and Scheduling (PBS Scripts)

Many MSI systems use job queues to manage efficiently and fairly when computations run. A computational job submitted to a queue waits in line until the resources it requested become available.

MSI uses a queuing system called PBS (Portable Batch System). To submit a job to a PBS queue, users write PBS job scripts. A PBS job script lists the resources requested for the calculation and the commands that execute it.

PBS Script Format

A PBS job script is a small text file that states the resources a job requires, including time, number of nodes, and memory. The script also contains the commands needed to start the computation. A sample PBS job script is shown below.

#!/bin/bash -l
#PBS -l walltime=8:00:00,nodes=3:ppn=8,pmem=1000mb
#PBS -m abe
#PBS -M sample_email@umn.edu
cd ~/program_directory
module load intel
module load ompi/intel
mpirun -np 24 program_name < inputfile > outputfile

The first line of the PBS script specifies the shell that interprets the script (how the system will read the file). It is recommended to make the first line #!/bin/bash -l. The -l option starts bash as a login shell, so the usual login environment, which typically includes the module command, is set up. Commands for the PBS queuing system begin with #PBS. The second line of the sample script contains the PBS resource request. The sample job requests 8 hours of walltime and 3 nodes, each with 8 processor cores per node (ppn), for 24 cores in total, plus 1000 megabytes of RAM per processor core (pmem). The resource request must contain appropriate values: if the requested time, processors, or memory do not fit the hardware, the job will not be able to run.
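The arithmetic behind a resource request can be checked with a few lines of bash. This sketch parses the request string from the sample script and computes the total core count (nodes × ppn), the memory per node and in total (pmem is per core), and the walltime in seconds. It assumes each field appears exactly once on a single -l line; the variable names are illustrative and are not defined by PBS.

```shell
#!/bin/bash
# Sketch: expand the sample PBS resource request into totals.
# Assumption: each field appears once in one request string.
request="walltime=8:00:00,nodes=3:ppn=8,pmem=1000mb"

# Extract each field with sed.
walltime=$(echo "$request" | sed -n 's/.*walltime=\([0-9:]*\).*/\1/p')
nodes=$(echo "$request" | sed -n 's/.*nodes=\([0-9]*\).*/\1/p')
ppn=$(echo "$request" | sed -n 's/.*ppn=\([0-9]*\).*/\1/p')
pmem_mb=$(echo "$request" | sed -n 's/.*pmem=\([0-9]*\)mb.*/\1/p')

# walltime is hours:minutes:seconds; 10# avoids octal parsing of "08".
IFS=: read -r h m s <<< "$walltime"
walltime_s=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))

total_cores=$(( nodes * ppn ))            # value to pass to mpirun -np
mem_per_node_mb=$(( ppn * pmem_mb ))      # pmem is memory per core
total_mem_mb=$(( total_cores * pmem_mb ))

echo "walltime=${walltime_s}s cores=$total_cores" \
     "mem/node=${mem_per_node_mb}mb total_mem=${total_mem_mb}mb"
```

For the sample request this prints walltime=28800s cores=24 mem/node=8000mb total_mem=24000mb; the 24 cores match the -np 24 given to mpirun in the sample script.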

The two lines containing #PBS -m abe and #PBS -M both control notification emails sent to the user. The first tells PBS to send an email when the job aborts (a), begins (b), or ends (e). The second specifies the email address to use. These emails are recommended because they often contain information that explains why a job failed.

The rest of the sample script contains the commands that start the calculation. Because the script starts in the user's home directory, it should include the cd commands needed to reach the directory where the job should run. It also needs module load commands for any software modules the calculation requires. The final lines run the calculation itself; in the example above, the last line starts a program that uses MPI communication to run on 24 processor cores (3 nodes × 8 cores per node).
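One alternative to hardcoding the directory and core count is to read them from environment variables that PBS sets inside a running job: $PBS_O_WORKDIR holds the directory from which qsub was run, and $PBS_NODEFILE names a file listing the allocated nodes. The sketch below assumes a Torque-style node file with one line per requested core (so nodes=3:ppn=8 gives 24 lines); verify this on your system. count_cores is a hypothetical helper, and program_name, inputfile, and outputfile are the placeholders from the sample. Note that this version runs in the submission directory rather than ~/program_directory.

```shell
#!/bin/bash -l
#PBS -l walltime=8:00:00,nodes=3:ppn=8,pmem=1000mb
#PBS -m abe

# Hypothetical helper: count the lines in a PBS node file.
# Assumes a Torque-style node file with one line per allocated core.
count_cores() {
    wc -l < "$1"
}

# Only launch when running inside a PBS job, where PBS sets these variables.
if [ -n "$PBS_NODEFILE" ] && [ -n "$PBS_O_WORKDIR" ]; then
    cd "$PBS_O_WORKDIR" || exit 1      # directory where qsub was run
    module load intel
    module load ompi/intel
    mpirun -np "$(count_cores "$PBS_NODEFILE")" program_name < inputfile > outputfile
fi
```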

Submitting Job Scripts

Once a job script is written, submit it with the qsub command:

qsub -q queue_name scriptname

Here queue_name is the name of the queue to submit to, and scriptname is the name of the job script. The -q queue_name portion of the command may be omitted, in which case the job is submitted to whichever queue is set as the default. Alternatively, the queue can be specified inside the job script with a #PBS -q line (see below).
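Because the -q option is optional, scripts that submit many jobs often wrap qsub so the queue can be left out. The submit_job function below is a hypothetical sketch, not part of PBS. It relies on qsub printing the new job ID on standard output, which is how the ID can be saved for later qstat or qdel commands.

```shell
#!/bin/bash
# Hypothetical wrapper around qsub (not a PBS command).
# The queue argument is optional, mirroring
# "qsub -q queue_name scriptname" versus "qsub scriptname".
submit_job() {
    local script=$1
    local queue=${2:-}
    if [ -n "$queue" ]; then
        qsub -q "$queue" "$script"
    else
        qsub "$script"          # job goes to the default queue
    fi
}

# Example on a system with PBS: qsub prints the job ID, so it can be captured.
#   jobid=$(submit_job myjob.pbs queue_name)
#   echo "submitted $jobid"
```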

To view all of the jobs submitted by a particular user, use the command:

qstat -u username

This command displays the status and job ID number of each of that user's jobs. Running qstat with no arguments shows all jobs on the system.
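To check a single job rather than a user's whole list, qstat -f followed by the job ID prints a full record that includes a line of the form job_state = R. The hypothetical job_state helper below extracts that state letter (for example Q for queued or R for running). It assumes the job_state = X layout used by Torque and PBS Professional, so compare with the qstat -f output on your system first.

```shell
#!/bin/bash
# Hypothetical helper: print the one-letter state of a single job.
# Assumes qstat -f prints a line such as "    job_state = R".
job_state() {
    qstat -f "$1" | awk -F' = ' '/job_state = / { print $2 }'
}

# Example on a system with PBS:
#   state=$(job_state "$jobid")
#   [ "$state" = "R" ] && echo "job $jobid is running"
```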

To cancel a submitted job, use the command:

qdel jobIDnumber

Here jobIDnumber should be replaced with the job ID number reported by the qstat command.
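To cancel several jobs at once, the qselect command (available in Torque and PBS Professional) can list a user's job IDs, one per line, and each ID can then be passed to qdel. The cancel_all_jobs function is a hypothetical sketch; be aware that it cancels every job the user owns, whether queued or running.

```shell
#!/bin/bash
# Hypothetical helper: cancel every job belonging to one user.
# qselect -u prints matching job IDs one per line; each is passed to qdel.
cancel_all_jobs() {
    local user=${1:-$USER}
    qselect -u "$user" | while read -r id; do
        qdel "$id"
    done
}

# Example: cancel_all_jobs    # cancels all of your own jobs
```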

PBS Script Commands

The table below summarizes some commands that can be used inside PBS job scripts. The first two (the interpreter specification and the resource request) are required; the others are optional. Each command goes on its own line within a PBS script.

PBS command
    Effect

#!/bin/bash -l
    Specifies how the PBS file should be read (by the bash interpreter). A statement like this is required to be the first line of a PBS script.

#PBS -l walltime=2:00:00,nodes=1:ppn=8,pmem=2500mb
    The resource request (required). The resource request must specify the job walltime (hours:minutes:seconds), the number of nodes, and the processor cores per node (ppn). It is recommended to specify either the required memory per processor core (pmem) or the required total memory (mem).

#PBS -m abe
    Makes the PBS system send message emails when the job aborts, begins, or ends.

#PBS -M sample_email@umn.edu
    Specifies the email address that the PBS system should use when it sends message emails.

#PBS -N jobname
    Specifies a name for the job that will appear in the job queue.

#PBS -o output_filename
    Directs the job's standard output to the named file.

#PBS -e error_filename
    Directs the job's error output to the named file.

#PBS -q queue_name
    Specifies that the job should run in the named queue.
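The directives in the table can be combined in one script. As a sketch, the hypothetical write_job_script function below writes a complete job script that uses every directive listed, reusing the job name for the -o and -e file names. The queue name, email address, and program_name are placeholders to replace; the generated file is then submitted with qsub as described earlier.

```shell
#!/bin/bash
# Hypothetical helper: write a PBS job script that uses every directive
# from the table. Arguments: output path, job name, queue name.
# The email address and program_name are placeholders.
write_job_script() {
    local path=$1 name=$2 queue=$3
    cat > "$path" <<EOF
#!/bin/bash -l
#PBS -l walltime=2:00:00,nodes=1:ppn=8,pmem=2500mb
#PBS -m abe
#PBS -M sample_email@umn.edu
#PBS -N $name
#PBS -o $name.out
#PBS -e $name.err
#PBS -q $queue
cd "\$PBS_O_WORKDIR"
module load intel
module load ompi/intel
mpirun -np 8 program_name < inputfile > outputfile
EOF
}

# Example: write_job_script myjob.pbs myjob queue_name && qsub myjob.pbs
```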