• Test bed
  • In service
  • Discipline-specific system for Astronomy and Cosmology
  • Funded by STFC, DiRAC, ExCALIBUR
  • 1 nodes, with 4 AMD MI300A 128GB accelerators per node
  • Benchmarks (1) ▾
    • Memory bandwidth (BabelStream): 3648 GB/s
      • array_size: 134217728
      • iterations: 100
      • precision: FP64
  • Manufactured by AMD
  • Scheduler: Direct SSH
  • Interconnects:

COSMA MI300A

COSMA (The Compute Optimised System for Modelling and Analysis) is a High Performance Computing facility hosted at Durham University, operated by the Institute for Computational Cosmology on behalf of DiRAC.

The MI300A node is a GPU testbed within COSMA.

Node RAM CPU Access
ga008 500GB (unified) 192 cores (AMD EPYC) Direct SSH

The MI300A is an APU (Accelerated Processing Unit). The CPU and GPU are on the same silicon and share physical RAM. Therefore, memory copies between CPU and GPU are not necessary which can improve performance for many applications.

The MI300A node has 4x GPUs.

Documentation

Gaining access

Access requires a COSMA account, obtained via the DiRAC SAFE portal.

  1. Create a SAFE account with an institutional email.
  2. Upload an SSH public key on SAFE. If you do not have one, generate with ssh-keygen -t ed25519.
  3. Request a login account. This requires selecting a project, either:
    • Project do018 for AMD GPU testbed access.
    • A DiRAC project code for a given allocation (provided by a supervisor).
  4. Wait for the account to be approved by the project manager. Keep an eye on your email!
  5. Connect to COSMA via SSH: ssh username@login8.cosma.dur.ac.uk (Note: On first login you will be asked to change the password provided in your email)

Visit https://cosma.readthedocs.io/en/latest/account.html for more details. Contact cosma-support@durham.ac.uk for any questions.

Usage

Connect directly via SSH from a login node:

ssh ga008
rocm-smi
./gpu_program_to_run

Restrictions

  • /cosma5/ is not mounted on ga008. Copy binaries to your home directory first.
  • Nodes are non-exclusive by default (shared with other users). Use --exclusive if you require the entire node
  • The AMD ROCm software stack is installed. ROCm 7.2.0 is available at /opt/rocm-7.2.0/bin/hipcc
  • CUDA code must be converted to HIP using the hipify script provided with ROCm