- Test bed
- In service
- Discipline-specific system for Astronomy and Cosmology
- Funded by STFC, DiRAC, ExCALIBUR
- 1 nodes, with 4 AMD MI300A 128GB accelerators per node
- Benchmarks (1) ▾
-
Memory bandwidth (BabelStream):
3648
GB/s
- array_size: 134217728
- iterations: 100
- precision: FP64
- Manufactured by AMD
- Scheduler: Direct SSH
- Interconnects:
COSMA MI300A
COSMA (The Compute Optimised System for Modelling and Analysis) is a High Performance Computing facility hosted at Durham University, operated by the Institute for Computational Cosmology on behalf of DiRAC.
The MI300A node is a GPU testbed within COSMA.
| Node | RAM | CPU | Access |
|---|---|---|---|
| ga008 | 500GB (unified) | 192 cores (AMD EPYC) | Direct SSH |
The MI300A is an APU (Accelerated Processing Unit). The CPU and GPU are on the same silicon and share physical RAM. Therefore, memory copies between CPU and GPU are not necessary which can improve performance for many applications.
The MI300A node has 4x GPUs.
Documentation
- https://cosma.readthedocs.io/en/latest/gpu.html#mi300a
- https://www.amd.com/en/products/accelerators/instinct/mi300/mi300a.html
Gaining access
Access requires a COSMA account, obtained via the DiRAC SAFE portal.
- Create a SAFE account with an institutional email.
- Upload an SSH public key on SAFE. If you do not have one, generate with
ssh-keygen -t ed25519. - Request a login account. This requires selecting a project, either:
- Project
do018for AMD GPU testbed access. - A DiRAC project code for a given allocation (provided by a supervisor).
- Project
- Wait for the account to be approved by the project manager. Keep an eye on your email!
- Connect to COSMA via SSH:
ssh username@login8.cosma.dur.ac.uk(Note: On first login you will be asked to change the password provided in your email)
Visit https://cosma.readthedocs.io/en/latest/account.html for more details. Contact cosma-support@durham.ac.uk for any questions.
Usage
Connect directly via SSH from a login node:
ssh ga008
rocm-smi
./gpu_program_to_run
Restrictions
/cosma5/is not mounted on ga008. Copy binaries to your home directory first.- Nodes are non-exclusive by default (shared with other users). Use
--exclusiveif you require the entire node - The AMD ROCm software stack is installed. ROCm 7.2.0 is available at
/opt/rocm-7.2.0/bin/hipcc - CUDA code must be converted to HIP using the
hipifyscript provided with ROCm