GPU Cluster

K40 queue on Mesabi for GPU-enabled software applications

What is the K40 GPU queue?

The K40 GPU queue on Mesabi is composed of 40 Haswell Xeon E5-2680 v3 nodes, each with 128 GB of RAM and 2 NVIDIA Tesla K40m GPUs. Each K40m GPU has 11 GB of RAM and 2880 CUDA cores.

Each K40m GPU has a peak performance of 1.43 double-precision TFLOPS (4.29 single-precision TFLOPS), so the 80 GPUs in the subsystem (40 nodes × 2 GPUs) provide a combined peak of roughly 114 double-precision TFLOPS.

What can I do with the K40 GPU nodes?

GPU-enabled software applications can use the K40 nodes by submitting PBS jobs from Mesabi that request the k40 queue.

Examples of software packages (modules) that can utilize GPUs include amber/14, namd/2.9-libverbs-CUDA, nwchem/6.5_cuda_6.0, caffe/0.999_cuda_6.5, and fsl/5.0.6_cuda5.5.
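
As an illustration, a batch job for one of these packages might look like the following sketch. The walltime, job name, and the pmemd.cuda command line are assumptions for illustration only; adjust them for your own workload and consult the package documentation for the correct GPU invocation.

#!/bin/bash
#PBS -l nodes=1:ppn=24:gpus=2,walltime=1:00:00
#PBS -q k40
#PBS -N amber_gpu_test

cd $PBS_O_WORKDIR            # run from the directory the job was submitted from
module load amber/14         # one of the GPU-enabled modules listed above
# Illustrative command line only; see the Amber documentation for GPU runs
pmemd.cuda -O -i mdin -o mdout -p prmtop -c inpcrd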

How do I access the K40 GPU nodes?

K40 GPU nodes are accessed by submitting jobs to the k40 queue on the Mesabi computing cluster. All MSI users with active accounts and service units (SUs) can submit jobs to this queue using the standard commands outlined in the Queue Quick Start Guide.
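
For example, if a batch script like the sketch above is saved as gpu_job.pbs (the file name is just an illustration), it is submitted from Mesabi with:

[~] % qsub gpu_job.pbs

Because the queue and GPU resources are requested in the #PBS directives inside the script, no extra options are needed on the qsub command line.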

 

At MSI, CUDA is installed on Mesabi, our main cluster, which has 40 nodes with 2 K40 GPUs each. To request the GPU nodes, use the k40 queue. In the PBS options, include the number of GPUs needed for the job by adding gpus=2 to the resource list. Below is an example of an interactive session in which 1 node with 2 GPUs was requested for 20 minutes. As of May 2017, cgroups enforce access control and resource management: if you do not request the GPU resource, the cgroups will not provide access to the GPUs, and MSI job filters will reject jobs submitted to the k40 queue that do not include a GPU resource request. Note that the k40 nodes are not shared.

 

qsub -I -l nodes=1:ppn=24:gpus=2,walltime=20:00 -q k40
qsub: waiting for job 469592.mesabim3.msi.umn.edu to start
qsub: job 469592.mesabim3.msi.umn.edu ready
[~] %

Load the CUDA modules:

[~] % module load cuda cuda-sdk

Here, the deviceQuery program shows that there are 2 GPUs available:

[~] % deviceQuery | grep NumDevs
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 2, Device0 = Tesla K40m, Device1 = Tesla K40m
[~] %
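
As a further check, you can compile and run a small CUDA program with nvcc inside the same session. The file name and source below are only a sketch to verify the toolchain, not part of any MSI-provided software:

[~] % cat > hello.cu <<'EOF'
#include <cstdio>
// Minimal kernel: each block prints its index from the GPU
__global__ void hello() { printf("Hello from GPU block %d\n", blockIdx.x); }
int main() {
    hello<<<2, 1>>>();          // launch one thread in each of two blocks
    cudaDeviceSynchronize();    // wait for (and flush) device-side printf output
    return 0;
}
EOF
[~] % nvcc -o hello hello.cu
[~] % ./hello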