At MSI, CUDA is installed on Mesabi, our main cluster, which has 40 nodes with 2 Tesla K40 GPUs each. To request the GPU nodes, you need to submit to the k40 queue, and your PBS options should include the number of GPUs needed for the job. Below is an example of an interactive session requesting 1 node with 2 GPUs for 20 minutes.
NOTE: GPU nodes are not shared, which means any job running in the k40 queue will be charged for 24 cores of utilization.
(This assumes you are already on a Mesabi login node.)
[ln0003:~] % qsub -I -l nodes=1:gpus=2,walltime=20:00 -q k40
qsub: waiting for job 469592.mesabim3.msi.umn.edu to start
qsub: job 469592.mesabim3.msi.umn.edu ready
Load the CUDA modules:
[cn3006:~] % module load cuda cuda-sdk
Here, the deviceQuery program shows that there are 2 GPUs available:
[cn3006:~] % deviceQuery | grep NumDevs
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 2, Device0 = Tesla K40m, Device1 = Tesla K40m
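The same steps can also be run non-interactively as a batch job. Below is a minimal sketch of a PBS job script; the script name, job name, and the use of deviceQuery as the executable are illustrative, not required:

```shell
#!/bin/bash
#PBS -q k40                            # GPU queue on Mesabi
#PBS -l nodes=1:gpus=2,walltime=20:00  # 1 node, 2 GPUs, 20 minutes
#PBS -N gpu_example                    # hypothetical job name

# Start in the directory the job was submitted from
cd $PBS_O_WORKDIR

# Load the CUDA modules, as in the interactive session above
module load cuda cuda-sdk

# Run a GPU program (deviceQuery shown here as an example)
deviceQuery | grep NumDevs
```

Save this as, e.g., gpu_example.pbs and submit it with `qsub gpu_example.pbs`; the output appears in the job's stdout file after it completes.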