PBS Information for the BSCL Compute Servers


Introduction

The Basic Sciences Computing Laboratory compute server resources consist of a 48-processor Altix 3700, an 8-processor Sun Fire V880, and a 4-node x86_64 cluster.

The Sun Fire uses 750 MHz UltraSparc III+ processors with 32 GB of main memory. The Altix 3700 uses 1.3 GHz Itanium2 processors with 92 GB of main memory. The x86_64 cluster uses 2.66 GHz quad-core Intel Xeon (Clovertown) processors and has 16 GB of memory per node.

PBS is a queueing system for submitting serial and parallel jobs. Using a command file, it matches each job's requirements to the machines' resources, so that the machines are shared fairly and resources do not sit idle.


Queue Structure

NOTE:

Users must specify both a queue (altix, onyx, sun, or x86_64) and an architecture, using the "arch" keyword (altix, onyx, v9b, or x86_64).

The following table summarizes the enforced limits on wall clock time and number of processors per job.

Queue    Architecture    Max Wall Clock Time per Job    Max Number of CPUs per Job
altix    altix           96 hrs                         32 cpus
onyx     onyx            96 hrs                         12 cpus
x86_64   Linux x86_64    96 hrs                         8 cpus


How to Create a Command File

All PBS jobs must be submitted with a command file. The following example is a PBS script that submits a single-processor job that uses 50 MB of memory and is estimated to run for up to 3 hours on the Sun Fire V880.

#PBS -l mem=50mb,ncpus=1,walltime=3:00:00
#PBS -m abe
#PBS -q sun
#PBS -l arch=v9b
cd /home/runesha/Testpbs
./a.out

With the -m abe option, PBS sends you email if the job is aborted (a), when the job begins running (b), and when the job terminates (e).


How to Submit a Job

You may use PBS to submit jobs from any of the BSCL workstations to the SGI Altix, Sun Fire V880, or x86_64 cluster at the BSCL. You first need to load the pbs module:

module load pbs
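
With the module loaded, a command file is normally handed to PBS with qsub, the standard PBS submission command. The following is a minimal sketch in Bourne shell, not a site-specific recipe: the file name myjob.pbs is a hypothetical placeholder, and the guard around qsub is only there so that the sketch fails gracefully on a machine where PBS is not on the PATH.

```shell
#!/bin/sh
# Write the single-processor Sun Fire example to a command file.
# The name myjob.pbs is a hypothetical placeholder.
cat > myjob.pbs <<'EOF'
#PBS -l mem=50mb,ncpus=1,walltime=3:00:00
#PBS -m abe
#PBS -q sun
#PBS -l arch=v9b
cd /home/runesha/Testpbs
./a.out
EOF

# Submit only where PBS is available (after "module load pbs").
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.pbs
else
    echo "qsub not found; run 'module load pbs' first" >&2
fi
```

On success, qsub prints the identifier assigned to the job. The status and removal commands described later take that identifier as an argument.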


How to Submit OpenMP Jobs

The following is a 4-processor OpenMP job. The wall time is the maximum amount of time the job is allowed to take. So, if you specify 00:30:00, the job needs to finish within 30 minutes.

#PBS -l ncpus=4,mem=1gb,walltime=00:30:00
#PBS -m abe
#PBS -q altix
#PBS -l arch=altix
cd /home/runesha/Testpbs
setenv OMP_NUM_THREADS 4
./a.out
# end of example script
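
The setenv line in the script above is csh/tcsh syntax. If your job runs under a Bourne-type shell (sh, bash, ksh) instead, the thread count would be set as in the sketch below. This is an assumption about your shell, not a statement about how the BSCL machines are configured; the echo line is only there to show the result.

```shell
# Bourne-shell equivalent of: setenv OMP_NUM_THREADS 4
# Keep this value equal to the ncpus value requested on the #PBS -l line.
OMP_NUM_THREADS=4
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```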


How to Submit MPI Jobs

The following is a 4-processor Message Passing Interface job. It requests the same resources as the above job.

#PBS -l ncpus=4,mem=1gb,walltime=00:30:00
#PBS -m abe
#PBS -q altix
#PBS -l arch=altix
cd /home/runesha/Testpbs
mpirun -np 4 ./a.out
# end of example script


How to Check Job Status

You can check job status using the qstat command.

Alternatively, you can use the showq command. showq is provided with the Moab scheduler, the job scheduler that we use in the BSCL.
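
As a rough illustration of the two commands, the Bourne shell sketch below lists only your own jobs, using whichever of qstat or showq is found first on the PATH. Both commands accept -u followed by a user name. The variable name status_cmd is a hypothetical choice made for this sketch.

```shell
#!/bin/sh
# Pick whichever status command is on the PATH, preferring qstat.
status_cmd=""
for cmd in qstat showq; do
    if command -v "$cmd" >/dev/null 2>&1; then
        status_cmd=$cmd
        break
    fi
done

# Both qstat and showq take -u <user> to restrict the listing to one user.
if [ -n "$status_cmd" ]; then
    "$status_cmd" -u "${USER:-$(id -un)}"
else
    echo "neither qstat nor showq found; run 'module load pbs' first" >&2
fi
```

For the full record of a single job, qstat -f followed by the job identifier can be used instead.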


How to Remove and Kill Jobs

Jobs are killed or removed from the queueing system with the qdel command. The qdel man page lists the options you can use with it. To send a signal to a running job, use the qsig command; again, see its man page for details. Here is a quick example of how to kill job 1974.ice.msi.umn.edu.

qdel 1974.ice.msi.umn.edu
or
qdel 1974
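
The two qdel forms above differ only in whether the server name follows the job number. The Bourne shell sketch below derives the short form from the full identifier and then shows one possible qsig call. The choice of SIGUSR1 is purely illustrative, since which signals a program handles depends on the program, and the variable names are hypothetical.

```shell
#!/bin/sh
# Full job identifier from the qdel example; the short form is
# everything before the first dot.
job_id="1974.ice.msi.umn.edu"
short_id=${job_id%%.*}
echo "short form: $short_id"

# Send a signal to the running job if qsig is available.
# SIGUSR1 is an illustrative choice, not a site recommendation.
if command -v qsig >/dev/null 2>&1; then
    qsig -s SIGUSR1 "$job_id"
else
    echo "qsig not found; run 'module load pbs' first" >&2
fi
```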