- Login Procedure
- Available Software
- Compiling Codes
- Run Jobs Interactively
- Submitting Jobs to the Queue
- CPU sets
Koronis is a constellation of SGI systems; its centerpiece is an Altix UV1000 server with 1140 compute cores (190 six-core Intel Xeon X7542 "Westmere" processors at 2.66 GHz) and 2.96 TiB of globally addressable shared memory in a single system image. OpenMP and other threaded codes should run well on this resource.
This guide will provide you with the basic information necessary to get your jobs up and running on Koronis.
Please connect through login.msi.umn.edu or nx.msi.umn.edu, e.g.,
ssh username@login.msi.umn.edu
The command line
module avail
will list the software packages that have been compiled and installed for Koronis.
module load name_of_software_package
will set the appropriate environment variables and add the software's run scripts and binaries to your path. For example,
module load intel
must be run to make the icc and ifort commands available.
To get more information about a module:
module help name_of_software_package
To see how the module will affect your execution environment:
module show name_of_software_package
Please compile your codes on the login nodes, where compilation finishes quickly. You may experience slow compilation if you have to compile on the compute nodes.
module load intel
icc -o test -O3 -openmp openmp_code.c
ifort -o test -O3 -openmp openmp_code.f
Please add the "-shared-intel -mcmodel=large -i-dynamic" flags to the compile options if the job needs more than 2 GB of memory. You can select different compile options to optimize performance; please see the man pages (e.g., man ifort or man icc) for the available options.
module load intel mpt
icc -o test -lmpi mpi_code.c
icpc -o test -lmpi++abi1002 -lmpi mpi_code.cpp
ifort -o test -lmpi mpi_code.f
Please see the MPI man page (man mpi) for more information on SGI's MPI implementation.
module load intel
export OMP_NUM_THREADS=4
./test
module load intel mpt
mpirun -np 4 ./test
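As a quick sanity check before an interactive run, you can confirm that the thread count is exported as intended. A minimal sketch (./test is the hypothetical binary from the compile examples above):

```shell
#!/bin/sh
# Minimal sanity check of the OpenMP thread-count setting before launching.
export OMP_NUM_THREADS=4
echo "Running with $OMP_NUM_THREADS OpenMP threads"
# ./test    # launch once the environment is confirmed
```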
There are currently two queues on the system. The default queue submits to the UV100 systems, used for development. There is also a queue for the UV1000 system, specified via the -q uv1000 option on job submission. The maximum run-time is currently set to 24 hours, and there are no limits on the number of queued or running jobs.
The minimum size of a job on a UV100 is 6 processes and 32 GiB of memory. The minimum size of a job on the UV1000 is 6 processes and 16 GiB of memory. Jobs should request resources in sets of 6 processor cores (called ncpus by PBS). Jobs do not need to request memory, as it is implicitly allocated based on the number of processor cores requested. See the CPU sets section below for more information.
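Since jobs should request cores in sets of 6, the select value is the desired core count rounded up to whole 6-core sockets. A small sketch of that arithmetic (an illustrative helper, not an MSI tool; the 20-core figure is a made-up example):

```shell
#!/bin/sh
# Round a desired core count up to whole 6-core chunks for the PBS select statement.
cores_wanted=20
chunks=$(( (cores_wanted + 5) / 6 ))   # ceiling division: 20 cores -> 4 chunks
echo "select=${chunks}:ncpus=6"        # -> select=4:ncpus=6
```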
| Queue | Memory (per system) | Cores (per system) | Walltime |
| --- | --- | --- | --- |
Submit a script to PBS
The following is an example of a submission script for a 1-hour, 12-core, OpenMP job submitted to a UV100 node.
#PBS -l select=2:ncpus=6
#PBS -l walltime=01:00:00
#PBS -l place=excl:group=board
cd $PBS_O_WORKDIR
module load intel
export OMP_NUM_THREADS=12
dplace -c 0-11 -x2 ./a.out
Here is a submission script for a 24-hour, 192-core, OpenMP job submitted to the UV1000.
#PBS -l select=32:ncpus=6
#PBS -l place=excl:group=iru
#PBS -l walltime=24:00:00
#PBS -q uv1000
cd $PBS_O_WORKDIR
module load intel
export OMP_NUM_THREADS=192
dplace -c 0-191 -x2 ./a.out
Here is a submission script for a 24-hour, 192-core, MPI job submitted to the UV1000.
#PBS -l select=32:ncpus=6:mpiprocs=6
#PBS -l place=excl:group=iru
#PBS -l walltime=24:00:00
#PBS -q uv1000
cd $PBS_O_WORKDIR
module load intel mpt
mpiexec_mpt -np 192 dplace -c 0-191 -x2 ./a.out
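In the scripts above, the dplace core list always runs from 0 to (6 × select − 1), and the thread or rank count matches it. A sketch deriving both from the select value (an illustrative helper, not part of the system):

```shell
#!/bin/sh
# Derive the dplace core range and thread count from the PBS select value.
select_chunks=32                       # matches "select=32:ncpus=6" above
ncores=$(( select_chunks * 6 ))        # 6 cores per chunk -> 192
echo "OMP_NUM_THREADS=${ncores}"
echo "dplace -c 0-$(( ncores - 1 )) -x2 ./a.out"   # -> dplace -c 0-191 -x2 ./a.out
```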
MSI staff have developed a script which will automatically determine which group to select in a batch job, given the number of cores and nodes required. The script is located in /soft/koronis/msi/bin/ProcToGroup.sh, and an example of how to call it is given in /soft/koronis/msi/bin/GroupExample.sh, for your reference.
Specific components of the UV1000 can be selected to run on according to the table below.
| Resource | Cores | Memory (GiB) | NUMA nodes | Description |
| --- | --- | --- | --- | --- |
| rack | 384 | 1024 | 64 | One rack contains two irus |
| iru | 192 | 512 | 32 | One iru contains two iruhalves |
| iruhalf | 96 | 256 | 16 | One iruhalf contains two iruquadrants |
| iruquadrant | 48 | 128 | 8 | One iruquadrant contains two boardpairs |
| boardpair | 24 | 64 | 4 | One boardpair contains two boards |
| board | 12 | 32 | 2 | One board contains two sockets |
| socket | 6 | 16 | 1 | A single socket on the system |
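The resources in the table double at each level, since each group contains two of the next smaller group. A sketch of a lookup helper built from the table (an illustrative function, not an MSI tool):

```shell
#!/bin/sh
# Sketch: look up cores and memory (GiB) for a UV1000 placement group,
# using the values from the table above.
group_resources() {
    case "$1" in
        rack)        echo "384 1024" ;;
        iru)         echo "192 512"  ;;
        iruhalf)     echo "96 256"   ;;
        iruquadrant) echo "48 128"   ;;
        boardpair)   echo "24 64"    ;;
        board)       echo "12 32"    ;;
        socket)      echo "6 16"     ;;
        *)           echo "unknown group: $1" >&2; return 1 ;;
    esac
}
group_resources iru          # -> 192 512
```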
Checking job status
Job status can be seen using the qstat command.
To get a summary of all of your jobs:
qstat
To get detailed status of one of your jobs:
qstat -f jobID
To see the estimated time when your job will start to run:
qstat -Tw jobID
NOTE: You will not see all jobs in the queue, only your own jobs. Thus it may appear that Koronis is "empty," but it very rarely is. We are working on a way to provide more useful information about queue and node status within Koronis. Once complete, the commands will be documented here.
To get the status of the uv1000 queue:
qstat -Qf uv1000
As you can see in the above examples, resource specifications on Koronis are much different than on other MSI systems. This is primarily the result of using CPU sets. CPU sets are a technology available on SGI's SMP systems that allow multiple jobs to run on the same node without impacting each other's resources. That is, a CPU set is a resource container in which a job runs. CPU sets are required in order to achieve maximum performance on the SGI UV systems in Koronis.
When a job runs on a UV system, a CPU set is dynamically created for that job. The CPU set will be made up of NUMA nodes within the UV system. In this context "node" does not refer to a compute node, but it refers to a processor socket and its associated memory within the UV system. The minimum size of a CPU set is one processor socket. In the UV1000 system, each processor socket contains a processor with 6 cores, and has an associated 16 GiB of memory. Thus, the minimum resources consumed by any job on the UV1000 are 6 cores and 16 GiB of memory. In the UV100 systems, each processor socket contains a processor with 6 cores, and has an associated 32 GiB of memory. Thus, the minimum resources consumed by any job on a UV100 system are 6 cores and 32 GiB of memory.
Because cores and memory are so tightly coupled, you must take care to request the number of cores that will provide the memory your job requires. For example, if your job on the UV1000 runs only 1 process (and thus uses only 1 core) but requires 300 GiB of memory, then you will need to request at least 19 NUMA nodes, which is equivalent to 114 cores and 304 GiB of memory. This is where the select statement comes in, as shown in the examples above. The select statement for this job would be select=19:ncpus=6.
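The sizing above can be sketched as a small shell calculation (illustrative only, using the UV1000's 16 GiB per NUMA node and the 300 GiB example):

```shell
#!/bin/sh
# Sketch: size a UV1000 request from a memory requirement (16 GiB per NUMA node).
mem_needed_gib=300
gib_per_node=16
nodes=$(( (mem_needed_gib + gib_per_node - 1) / gib_per_node ))   # ceiling -> 19
echo "select=${nodes}:ncpus=6"                                    # -> select=19:ncpus=6
echo "cores=$(( nodes * 6 )) mem_gib=$(( nodes * gib_per_node ))" # -> cores=114 mem_gib=304
```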
Here is an example script showing the use of thread pinning within a CPU set for an OpenMP job using 48 cores:
#PBS -l select=8:ncpus=6
#PBS -l place=excl:group=iruquadrant
#PBS -l walltime=24:00:00
#PBS -q uv1000
module load intel
# Show the resources allocated to my CPU set
cpuset -d .
# Turn on some debugging
set -xv
# Set the stacksize
ulimit -s unlimited
export OMP_NUM_THREADS=48
export KMP_AFFINITY=disabled
export KMP_LIBRARY=turnaround
export KMP_BLOCKTIME=infinite
cd working_directory
/usr/bin/time dplace -c 0-47 -x2 ./a.out
Please note that the keyword group specifies the needed resource described in the table above.
MSI staff testing on Koronis has determined that performance can sometimes be improved by 50% to 100%, even without using CPU sets, by setting the KMP_AFFINITY environment variable in your batch job before executing an application. An example bash command, as used in the script above:
export KMP_AFFINITY=disabled
Over time, MSI will refine its use of CPU sets to ensure that Koronis is being properly utilized. We ask you for your patience as we explore the nuances of this technology.
There are three categories of storage within Koronis: Home, Project, and Scratch.
Home directories are located at /home/koronis and are suitable for storing source code, small files, text-based job results, and other similar data. Home directories are available on all nodes via the network filesystem, NFS. Koronis has 16 TB of storage allocated for home directories.
Koronis-only project spaces
Koronis-only project spaces are located at /cxfs/project[1-9] and are suitable for storing large datasets. As with MSI's central project spaces, a research group should email email@example.com to request a project space within Koronis. Koronis-only project spaces are on a clustered filesystem called CXFS. CXFS allows all of the Koronis-only project spaces within Koronis to be shared to all Koronis systems at very high bandwidth. However, care must be taken to properly utilize this bandwidth.
Reads from CXFS are very fast, so we recommend using it to store large input files for jobs. Writes to CXFS require large-block write operations, 8 MB and larger. Most applications will by default use 8 KB to 32 KB write operations, which will be quite slow on CXFS. As such, we recommend that jobs write their output to scratch space and then copy it to Koronis-only project space, if desired. A properly tuned application can achieve 1-3 GB/s of bandwidth to CXFS. Koronis has 500 TB of storage allocated for Koronis-only project spaces.
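To see the difference block size makes, you can compare small-block and large-block writes of the same amount of data with dd. A generic sketch (the path is a placeholder; run it in your own scratch space to measure real throughput):

```shell
#!/bin/sh
# Compare many 8 KiB-block writes with a single 8 MiB-block write of the same data.
out=/tmp/blocktest.$$                                    # placeholder path
dd if=/dev/zero of="$out" bs=8k count=1024 2>/dev/null   # 1024 small writes
dd if=/dev/zero of="$out" bs=8M count=1    2>/dev/null   # one large write
wc -c < "$out"                                           # both write 8388608 bytes
rm -f "$out"
```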
Central MSI project spaces can be made available on the Koronis interactive nodes by request.
Scratch spaces on Koronis are located on each compute system at /scratch and are suitable for heavy writing during the run time of a job. Scratch spaces use a fast filesystem called XFS. Unlike its clustered counterpart, XFS is capable of very high bandwidth at a much wider range of block sizes. It is for this reason that we recommend most jobs use scratch. Applications writing a lot of data to scratch can achieve 3 to 7.5 GB/s of bandwidth. The scratch spaces on the UV1000 and UV100 systems are 96 TB and 20 TB in size, respectively.
As with scratch on other MSI HPC resources, any data in scratch that is more than 14 days old will be purged. To aid users in deleting unneeded data in scratch, we have made each compute system's scratch space accessible via NFS on Koronis' interactive nodes. The scratch spaces for the UV1000 and the two UV100 systems are accessible at /scratch/uv1000, /scratch/uvdev1, and /scratch/uvdev2t, respectively, on the interactive nodes.
Ideally, a job's useful output should be migrated from scratch to CXFS project space with a command at the end of the submission script. The simple UNIX cp command can be used to copy very small output to your home directory, but more capable tools are required for high-bandwidth transfers to CXFS. At this time, we recommend the bbcp command for copying large output from scratch to CXFS project space.
To use bbcp, you must first load the bbcp module:
module load bbcp
and then use bbcp similarly to how you would use cp:
bbcp /scratch/user/job/output.dat /cxfs/project5/group/output.dat
For very large amounts of data (10s of GB to multiple TB in size), some additional bbcp parameters can be used to improve throughput:
bbcp -B 8M -s 8 /scratch/user/job/output.dat /cxfs/project5/group/output.dat
The above options increase the block size and the number of data streams used by bbcp.
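Putting this together, the tail of a submission script might copy results off scratch like this. This is only a sketch: the directories are stand-ins created with mktemp, and plain cp stands in where bbcp would be used for large files:

```shell
#!/bin/sh
# Sketch: end-of-job copy from scratch to project space (placeholder paths).
scratch_dir=$(mktemp -d)               # stands in for /scratch/user/job
project_dir=$(mktemp -d)               # stands in for a CXFS project directory
echo "results" > "$scratch_dir/output.dat"
# For large output, "bbcp -B 8M -s 8" would replace cp on the next line.
cp "$scratch_dir/output.dat" "$project_dir/output.dat"
cat "$project_dir/output.dat"          # -> results
rm -rf "$scratch_dir" "$project_dir"
```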
Files in Koronis home and project spaces are backed up nightly at 8pm Central time. If you delete a file that did not exist the last time a backup was run, that file cannot be restored. Please email firstname.lastname@example.org with restore requests; include the specific location(s) of the file(s) that need to be restored as well as the time frame from which you'd like them restored. Koronis storage does not use the snapshots you may know from central MSI home and project spaces.