Benchmark guidelines (kickstart)

Through a combination of discussions with researchers, the SHAREing performance assessment seminar, and a review of the benchmarking literature, this task has collated a proposed set of guidelines for researchers and developers designing their own benchmarks. We list a condensed set of recommendations here. We believe that a benchmark should:

- capture several components of the mathematical or algorithmic features of the software
- be representative of a typical job run, e.g., including realistic I/O calls
- be configurable by the performance analyst, i.e., the example job should be easily scalable and should allow features such as I/O to be switched on or off
- include documentation that explains how to build and run the benchmark, including details of compilers, libraries, etc. and their versions
- not be too small, as a small benchmark will behave more like a synthetic one. It is likely easier for a performance analyst to reduce the size of a benchmark than to increase it in any meaningful way.

For a fuller discussion of these guidelines, please see the document linked below.
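As an illustration of the configurability and documentation points above, the following is a minimal sketch of a benchmark driver rather than a SHAREing tool. The names `BenchmarkConfig`, `kernel` and `run_benchmark` are hypothetical, and the matrix-vector product is only a stand-in for the real algorithmic core of a piece of research software. The sketch assumes that problem size, step count and I/O are the knobs a performance analyst wants to control, and that recording the interpreter and platform version is an acceptable proxy for documenting compilers and libraries.

```python
import json
import os
import platform
import time
from dataclasses import dataclass


@dataclass
class BenchmarkConfig:
    """Hypothetical knobs exposed to the performance analyst."""
    size: int = 200          # problem size, so the job can be scaled up or down
    steps: int = 5           # number of iterations of the kernel
    io_enabled: bool = True  # switch per-step output on or off


def kernel(n):
    """Stand-in compute kernel: a dense matrix-vector product of size n."""
    matrix = [[(i + j) % 7 for j in range(n)] for i in range(n)]
    vector = [1.0] * n
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def run_benchmark(config, out_dir):
    """Run the kernel config.steps times, timing compute and I/O separately."""
    compute_seconds = 0.0
    io_seconds = 0.0
    for step in range(config.steps):
        start = time.perf_counter()
        result = kernel(config.size)
        compute_seconds += time.perf_counter() - start

        if config.io_enabled:
            # Mimic the output a typical job run would write at each step.
            start = time.perf_counter()
            path = os.path.join(out_dir, f"step_{step}.json")
            with open(path, "w") as handle:
                json.dump(result, handle)
            io_seconds += time.perf_counter() - start

    return {
        "size": config.size,
        "steps": config.steps,
        "io_enabled": config.io_enabled,
        "compute_seconds": compute_seconds,
        "io_seconds": io_seconds,
        # Record the environment alongside the timings so the run is documented.
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }
```

An analyst could then compare, for example, `run_benchmark(BenchmarkConfig(size=400, io_enabled=False), out_dir)` against the same configuration with I/O enabled to separate compute cost from output cost.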

Outcomes

This is a first iteration of the SHAREing benchmarking guidelines; suggestions are highly encouraged!

Document outlining the proposed SHAREing benchmark guidelines
- Attachment: SHAREing proposed benchmark guidelines LaTeX

Fit to programme

This task is a proposed solution to Task 005: DBenchmark Guideline, as part of WP 1.2.

Description

SHAREing is working toward a generic and simple performance assessment service. For this, the project has to curate guidelines. The objective of this task is to kickstart the curation of a benchmarking guide that SHAREing can continue to iterate on and improve. These guidelines describe a proposed set of characteristics for a benchmark. In brief, this task posits that a benchmark should accurately describe the features of research software, whilst also being configurable enough to support a rigorous analysis.