MSI systems use job queues to efficiently and fairly manage when computations are executed. When computational jobs are submitted to a job queue they wait in the queue until the appropriate computational resources are available.
The queuing system which MSI uses is called PBS, which stands for Portable Batch System. To submit a job to a PBS queue users create PBS job scripts. PBS job scripts contain information on the resources requested for the calculation, as well as the commands for executing the calculation.
PBS Script Format
A PBS job script is a small text file containing information about what resources a job requires, including time, number of nodes, and memory. The PBS script also contains the commands needed to begin executing the desired computation. A sample PBS job script is shown below.
#!/bin/bash -l
#PBS -l walltime=8:00:00,nodes=1:ppn=8,mem=10gb
#PBS -m abe
#PBS -M email@example.com
cd ~/program_directory
module load intel
module load ompi/intel
mpirun -np 8 program_name < inputfile > outputfile
The first line in the PBS script defines which type of shell the script will be read with (how the system will read the file). It is recommended to make the first line #!/bin/bash -l. Commands for the PBS queuing system begin with #PBS. The second line in the above sample script contains the PBS resource request. The sample job will require 8 hours, 1 node with 8 processor cores per node (ppn), and 10 gigabytes of memory (mem). The resource request must contain appropriate values; if the requested time, processors, or memory are not suitable for the hardware, the job will not be able to run.
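To make the structure of the resource request concrete, the following sketch takes the request string from the sample script apart and reports the total number of cores it asks for. It is an illustration only, not something PBS requires. The variable names are made up for this example, and it assumes the request uses the walltime, nodes=N:ppn=M, and mem fields in the form shown above.

```shell
#!/bin/bash
# Illustrative only: the variable names below are hypothetical.
request="walltime=8:00:00,nodes=1:ppn=8,mem=10gb"

# Split the request on commas by temporarily changing the field separator.
old_ifs=$IFS
IFS=','
for item in $request; do
    case "$item" in
        walltime=*)
            walltime="${item#walltime=}"   # hours:minutes:seconds
            ;;
        nodes=*)
            nodespec="${item#nodes=}"      # e.g. "1:ppn=8"
            nodes="${nodespec%%:*}"        # text before the first colon
            ppn="${nodespec##*ppn=}"       # text after "ppn="
            ;;
        mem=*)
            mem="${item#mem=}"             # total memory for the job
            ;;
    esac
done
IFS=$old_ifs

# Total cores requested is nodes multiplied by cores per node.
total_cores=$(( nodes * ppn ))

echo "walltime=$walltime nodes=$nodes ppn=$ppn cores=$total_cores mem=$mem"
```

For the sample request this reports 8 cores in total, which is the same number passed to mpirun -np in the sample script.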
The two lines containing #PBS -m abe and #PBS -M email@example.com both control the message emails sent to the user. The first of these lines instructs the PBS system to send a message email when the job aborts, begins, or ends. The second line specifies the email address to be used. Using the message emails is recommended because the reason for a job failure can often be determined from the information in the emails.
The rest of the sample PBS script contains the commands which will be executed to begin the calculation. A PBS script should contain the appropriate change directory commands to get to the job execution location (the script will start in the user home directory). A PBS script also needs to contain module load commands for any software modules that the calculation might need. The last lines of a PBS script contain commands used to execute the calculation. In the above example the final line contains an execution command to start a program which uses MPI communication to run on 8 processor cores.
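One way to create the sample script from the command line is sketched below. The file name job.pbs is an arbitrary choice for this example, and the directory, module, and program names are the placeholders from the sample script. Because bash treats the #PBS lines as ordinary comments, a syntax-only check with bash -n looks just at the shell commands; it does not validate the resource request itself.

```shell
#!/bin/bash
# Write the sample job script to a file. The name job.pbs is arbitrary.
script="job.pbs"

# The quoted 'EOF' keeps the shell from expanding anything inside the text.
cat > "$script" <<'EOF'
#!/bin/bash -l
#PBS -l walltime=8:00:00,nodes=1:ppn=8,mem=10gb
#PBS -m abe
#PBS -M email@example.com
cd ~/program_directory
module load intel
module load ompi/intel
mpirun -np 8 program_name < inputfile > outputfile
EOF

# bash -n parses the file without executing any of its commands.
if bash -n "$script"; then
    echo "$script: shell syntax OK"
fi

# Count the lines that the queuing system will read as PBS commands.
directive_count=$(grep -c '^#PBS' "$script")
echo "$script contains $directive_count #PBS lines"
```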
Submitting Job Scripts
Once a job script is made it is submitted using the qsub command:
qsub -q queuename scriptname
Here queuename is the name of the queue being submitted to, and scriptname is the name of the job script. The -q queuename portion of the command may be omitted, in which case the job is submitted to whichever queue is set as the default. Alternatively, the queue specification can be placed inside the job script (see below).
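When submission is scripted, it can be convenient to keep the job ID that qsub prints to standard output. The wrapper below is a minimal sketch of that idea; submit_job is a hypothetical function name, not a PBS or MSI command, and scriptname and queuename are the placeholders from the command above.

```shell
#!/bin/bash
# Hypothetical wrapper: submits a job script and prints the job ID
# that qsub writes to standard output.
submit_job() {
    local script="$1"
    local queue="$2"    # optional; empty means use the default queue

    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on a system with PBS installed" >&2
        return 1
    fi

    if [ -n "$queue" ]; then
        qsub -q "$queue" "$script"
    else
        qsub "$script"
    fi
}

# Example use with the placeholder names from the text.
if jobid=$(submit_job scriptname queuename); then
    echo "Submitted job $jobid"
else
    echo "Submission failed"
fi
```

Leaving out the second argument submits to the default queue, which mirrors omitting -q on the command line.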
To view all of the jobs submitted by a particular user use the command:
qstat -u username
This command will display the status of the specified jobs, and the associated job ID numbers. The command qstat by itself will show all jobs on the system.
To cancel a submitted job use the command:
mjobctl -c jobIDnumber
Here jobIDnumber should be replaced with the appropriate job ID number determined by using the qstat command.
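The listing and cancelling steps can be wrapped in two small shell functions, sketched below. The function names list_my_jobs and cancel_job are hypothetical; they only call the qstat and mjobctl commands described above, and the empty-argument check is an added safeguard rather than anything PBS requires.

```shell
#!/bin/bash
# Hypothetical helpers built on the qstat and mjobctl commands.

# Show the status and job IDs of the current user's jobs.
list_my_jobs() {
    # $USER normally holds the login name; fall back to id -un if unset.
    qstat -u "${USER:-$(id -un)}"
}

# Cancel one job, refusing to run if no job ID was given.
cancel_job() {
    local jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: cancel_job jobIDnumber" >&2
        return 2
    fi
    mjobctl -c "$jobid"
}
```

After sourcing these definitions on a system where PBS is available, list_my_jobs shows the job IDs, and cancel_job followed by one of those IDs cancels that job.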
PBS Script Commands
Below is a table summarizing some commands that can be used inside PBS job scripts. The first two commands (the interpreter specification and the resource request) are required, while the other commands are optional. Each of the PBS commands below is meant to go on a single line within a PBS script.
| PBS command | Description |
|---|---|
| #!/bin/bash -l | Specifies how the PBS file should be read (by the bash interpreter). A statement like this is required to be the first line of a PBS script. |
| #PBS -l walltime=2:00:00,nodes=1:ppn=8,pmem=2500mb | The resource request (required). The resource request must specify the job walltime (hours:minutes:seconds), number of nodes, and processor cores per node (ppn). It is recommended to specify either the required memory per processor core (pmem), or the required total memory (mem). |
| #PBS -m abe | Makes the PBS system send message emails when the job aborts, begins, or ends. |
| #PBS -M email@example.com | Specifies the email address that should be used when the PBS system sends message emails. |
| #PBS -N jobname | Specifies a name for the job that will appear in the job queue. |
| #PBS -o output_filename | Directs the job standard output to be placed in the named file. |
| #PBS -e error_filename | Directs the job error output to be placed in the named file. |
| #PBS -q queue_name | Specifies that the job should be run in the named queue. |
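As an illustration of how these commands fit together, here is a sketch of a job script that uses every command from the table, one per line. All values (jobname, output_filename, error_filename, queue_name, and the email address) are the placeholders from the table and would need to be replaced with real ones. The body is a stand-in that only prints a message; an actual job would change directory, load modules, and start the calculation there.

```shell
#!/bin/bash -l
#PBS -l walltime=2:00:00,nodes=1:ppn=8,pmem=2500mb
#PBS -m abe
#PBS -M email@example.com
#PBS -N jobname
#PBS -o output_filename
#PBS -e error_filename
#PBS -q queue_name

# Placeholder body: a real job would change to its working directory,
# load the software modules it needs, and launch the calculation here.
job_message="Job started on $(uname -n) at $(date)"
echo "$job_message"
```

With the queue named inside the script, the -q option can be left off the qsub command line.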