Calhoun (SGI Altix XE 1300 Linux Cluster)

Data in old Calhoun home directories will be erased on July 15, 2013. Any data that has not been migrated from old Calhoun home directories to the new unified home will be lost.

Calhoun

Calhoun is an SGI Altix XE 1300 Linux cluster. The cluster consists of 180 SGI Altix XE 300 compute nodes, each containing two quad-core 2.66 GHz Intel Xeon "Clovertown"-class processors sharing 16 GB of main memory. In total, Calhoun provides 1440 compute cores and 2.8 TiB of main memory.


User Guides

For a quick introduction to working on Calhoun, see the Quickstart guide. An in-depth discussion of operations on Calhoun is available on the user resources page.

Hardware & Configuration

Clovertown MCM

Calhoun has 360 Intel Xeon 5355 "Clovertown"-class multi-chip modules (MCMs). Each MCM is composed of two dies: two separate pieces of silicon mounted on a single module and connected to each other. Each die has two processor cores that share a 4 MB L2 cache. Each MCM communicates with main memory via a 1,333 MHz front-side bus (FSB).

  • 180 compute nodes
  • 2 interactive nodes
  • 5 server nodes
  • 1440 total cores
  • 2.8 TiB total main memory
  • Suitable for: MPI jobs

Each node

  • Processors: Two quad-core 2.66 GHz Intel Xeon "Clovertown"-class processors
  • Memory: 16 GB main memory, accessed over a 1,333 MHz front-side bus

Network

All of the systems within Calhoun are interconnected with a 20-gigabit non-blocking InfiniBand fabric used for interprocess communication (IPC). InfiniBand is a high-bandwidth, low-latency network intended to accommodate high-speed communication for large MPI jobs. The nodes are also interconnected with two 1-gigabit Ethernet networks, one for administration and one for file access.

Home Directories and Disks

Calhoun home directories are as described on the MSI Home Directories page. MSI central project spaces are available on the Calhoun interactive nodes by request.

Scratch Spaces

There are two kinds of scratch space on Calhoun: a large scratch partition that is shared across all compute and interactive nodes; and local scratch on each compute node.

Currently, there is 20 TB of file space allocated for scratch files in each of the /scratch2, /scratch3 and /scratch4 filesystems. This is useful for voluminous input and output, including checkpoint/restart files. There is no disk quota on the shared scratch file system. However, a scratch clean-up process runs daily on the system and all files in the scratch file system that have not been modified for 14 days will be deleted. Users may request exceptions to the scratch clean-up by sending email to help@msi.umn.edu.
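To see which of your files are at risk from the daily clean-up, you can look for files not modified in the past 14 days. The helper below is a hypothetical sketch (the function name and the example path under /scratch2 are illustrative, not part of the system):

```shell
# Hypothetical helper: list files under a given directory that have not
# been modified for 14+ days -- i.e., candidates for the daily scratch
# clean-up described above.
list_cleanup_candidates() {
    find "$1" -type f -mtime +14 -print
}

# On Calhoun you might run, for example:
#   list_cleanup_candidates /scratch2/$USER
```

Touching a file (e.g. with `touch`) resets its modification time and removes it from the clean-up window, though routinely doing so to evade the policy is what the exception process at help@msi.umn.edu is for.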

The local scratch on each compute node (/scratch) is not visible on any other compute node, nor is it visible on the interactive nodes. It is only useful as temporary space for an application to write to and read back during a single job, and is not available after the job ends.
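A common pattern is to stage input into node-local /scratch at the start of a job, compute against it, and copy results back to a shared filesystem before the job exits, since anything left in /scratch is unavailable afterward. The sketch below illustrates this with a hypothetical helper; the `sort` step stands in for real computation, and the paths in the usage comment are placeholders:

```shell
# Hypothetical staging pattern for node-local scratch. Arguments:
#   <input file> <output dir> <scratch root>
# The sort command is a placeholder for the actual computation.
stage_through_scratch() {
    input="$1"; outdir="$2"; scratch_root="$3"
    job_scratch="$scratch_root/job.$$"
    mkdir -p "$job_scratch"

    cp "$input" "$job_scratch/in.dat"                    # stage in
    sort "$job_scratch/in.dat" > "$job_scratch/out.dat"  # compute (placeholder)
    cp "$job_scratch/out.dat" "$outdir/results.dat"      # stage out BEFORE the job ends
    rm -rf "$job_scratch"                                # local scratch is not kept anyway
}

# In a Calhoun job script you might call, for example:
#   stage_through_scratch "$HOME/project/input.dat" "$HOME/project" /scratch
```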

Queues

A job can request at most 15 GB of memory per node. Serial jobs may be run on Calhoun.

Calhoun Queue Policies

  Queue                     Limit                     Max run-time
  Primary (-q batch)        180 nodes or 1440 cores   48 hr
  Development (-q devel)    8 nodes or 64 cores       1 hr
  Medium (-q medium)        64 nodes or 512 cores     96 hr
  Long (-q long)            16 nodes or 128 cores     192 hr
  Max (-q max)              2 nodes or 16 cores       600 hr (serial jobs only)
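As an illustration, a PBS submission script targeting the primary queue might look like the following sketch. The `-q batch` queue name comes from the queue policies above; the node counts, walltime, and application name (my_mpi_app) are placeholder assumptions, not system defaults:

```shell
# Hypothetical PBS job script for Calhoun's primary queue. Only the
# queue name is taken from the queue policies; nodes, walltime, and
# the application are illustrative placeholders.
cat > calhoun_batch.pbs <<'EOF'
#!/bin/bash
#PBS -q batch
#PBS -l nodes=4:ppn=8,walltime=24:00:00

cd "$PBS_O_WORKDIR"
mpirun -np 32 ./my_mpi_app
EOF

# Submit from an interactive node with:
#   qsub calhoun_batch.pbs
```

Here ppn=8 matches the eight cores per node, so 4 nodes supply the 32 MPI ranks requested with `mpirun -np 32`.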