COSMA A100
COSMA (The Compute Optimised System for Modelling and Analysis) is a High Performance Computing facility hosted at Durham University, operated by the Institute for Computational Cosmology on behalf of DiRAC. The A100 nodes are GPU testbeds within COSMA.
- Status: test bed, in service
- Discipline-specific system for Astronomy and Cosmology
- Funded by STFC, DiRAC, and ExCALIBUR
- Partitions:
  - 2 nodes, each with 1 NVIDIA A100 40GB accelerator (manufactured by Dell; scheduler: Slurm)
  - 1 node with 1 NVIDIA A100 40GB accelerator (manufactured by Dell; scheduler: direct SSH)
- Interconnects: InfiniBand HDR200, Liqid composable fabric
- Benchmarks:
  - Memory bandwidth (BabelStream): 1352 GB/s (array_size: 134217728, iterations: 100, precision: FP64); a reproduction sketch is given below
The three host nodes are:
| Node | RAM | CPU | Access |
|---|---|---|---|
| mad04 | 4TB | 128 cores (AMD EPYC) | Slurm (cosma8-shm) |
| mad05 | 4TB | 128 cores (AMD EPYC) | Slurm (cosma8-shm) |
| mad06 | 1TB | 128 cores (AMD EPYC Milan-X) | Direct SSH |
The A100 GPUs (3 total) are connected via a Liqid composable fabric, allowing them to be moved between these nodes. The default configuration is 1 GPU per node. To request a different configuration, contact cosma-support@durham.ac.uk.
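Because the fabric lets GPUs be recomposed between hosts, it is worth confirming at runtime what is actually attached; the benchmark figure above can also be reproduced with the listed parameters. A minimal sketch, assuming the CUDA build of BabelStream (https://github.com/UoB-HPC/BabelStream); the binary path is a placeholder:
```bash
# List the GPUs currently composed onto this node via the Liqid fabric.
nvidia-smi -L

# Reproduce the BabelStream memory-bandwidth measurement with the
# parameters listed above (FP64 is BabelStream's default precision).
./cuda-stream --arraysize 134217728 --numtimes 100
```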
Documentation
- https://cosma.readthedocs.io/en/latest/gpu.html#using-the-composable-a100-gpus
- https://cosma.readthedocs.io/en/latest/cosma.html
- https://www.nvidia.com/en-gb/data-center/a100
Gaining access
Access requires a COSMA account, obtained via the DiRAC SAFE portal:
- Create a SAFE account with an institutional email.
- Upload an SSH public key on SAFE. If you do not have one, generate it with `ssh-keygen -t ed25519`.
- Request a login account. This requires selecting a project, either:
  - project `do016`, for NVIDIA GPU testbed access, or
  - a DiRAC project code for a given allocation (provided by a supervisor).
- Wait for the account to be approved by the project manager. Keep an eye on your email!
- Connect to COSMA via SSH: `ssh username@login8.cosma.dur.ac.uk`. (Note: on first login you will be asked to change the password provided in your email.) A convenience SSH configuration is sketched after this list.
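To avoid typing the full hostname each time, the connection details can live in your SSH client configuration. A minimal sketch for `~/.ssh/config`, assuming an OpenSSH client and the ed25519 key generated above (`your_username` is a placeholder):
```
# ~/.ssh/config entry for COSMA (hostname and key path from the steps above)
Host cosma
    HostName login8.cosma.dur.ac.uk
    User your_username
    IdentityFile ~/.ssh/id_ed25519
```
With this in place, `ssh cosma` is equivalent to the full command above.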
Visit https://cosma.readthedocs.io/en/latest/account.html for more details, and contact cosma-support@durham.ac.uk with any questions.
Usage
For mad04 and mad05, jobs are submitted via Slurm to the cosma8-shm partition:
```bash
#!/bin/bash
#SBATCH --partition=cosma8-shm
#SBATCH --account=do016
#SBATCH --time=01:00:00
#SBATCH --constraint=gpu

nvidia-smi # checks existence of GPU
./gpu_program_to_run
```
`--constraint=gpu` ensures you are given a node with a GPU.
`--nodelist` and `--exclude` can be used as Slurm parameters to request or avoid particular nodes, as in the example below.
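For example, assuming the script above is saved as `submit.sh` (a placeholder name):
```bash
# Pin the job to a specific GPU node (mad05 here)...
sbatch --nodelist=mad05 submit.sh

# ...or let Slurm pick any eligible node except mad04.
sbatch --exclude=mad04 submit.sh
```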
For mad06, connect directly via SSH from a login node: `ssh mad06`
Restrictions
- Maximum wall time: 3 days
- Nodes are non-exclusive by default (shared with other users). Use `--exclusive` if you require the entire node.
- CUDA is available via `module load nvhpc/25.11`; a compile-and-run sketch follows this list.
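As a quick end-to-end check of the toolchain, a minimal sketch, assuming the nvhpc module places `nvcc` on the PATH (source and binary names are placeholders):
```bash
# Load the NVIDIA HPC SDK, which bundles the CUDA toolkit.
module load nvhpc/25.11

# Compile a CUDA source file for the A100 (compute capability 8.0).
nvcc -arch=sm_80 -o gpu_program gpu_program.cu

# Run it on a Slurm-managed GPU node (mad04/mad05).
srun --partition=cosma8-shm --account=do016 --constraint=gpu --time=00:10:00 ./gpu_program
```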