by Peter Dzwig
A perennial problem has been how to benchmark clusters and, once they are benchmarked, what the results actually mean. Oh, and can I do it cheaply and easily, without any preconceived notions of what the system should be?
Well, we now have the beginnings of an answer, to which people are invited to contribute. Raul Gomez has announced the beginnings of a Sourceforge project, called ClusterNumbers, to create such a tool, based upon the results of his thesis project. It must be made clear that ClusterNumbers isn't unique in the world, but it is open source and based upon some sound starting points; see, for example, those in the HPC Challenge of Luszczek, Dongarra et al.
The idea is that any user with a modest level of skill ought to be able to benchmark their system and so get a grip on how their distributed application should run on a particular cluster. The tool should therefore address the major issues affecting cluster performance and make them readily accessible via a single user interface. It should also offer the user a basic configuration for carrying out the benchmark and identify the factors impacting performance.
The core of ClusterNumbers is a set of packaged benchmarks for CPU, memory, networking, etc., that can be accessed either individually or as a whole. These are HPL (High Performance Linpack), run across the cluster; DGEMM, providing matrix multiplies on a cluster node; FFTE, which tests CPU execution rates by running discrete FFTs; STREAM, to measure CPU/memory performance; IOTRANS, to measure disk performance; and Netperf and PTRANS, to measure network capability under various loadings. The observant reader will have noted that many of these are scientifically oriented benchmarks and that the majority are those used in the HPC Challenge. Nonetheless, the aims here are somewhat different, and these benchmarks are being supplemented by others.
ClusterNumbers allows the user to select the kind of benchmarks to be run from a PC window that communicates with a daemon running on the cluster's admin node. The selection then determines which subsets of the benchmarks listed above are run.
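To make the idea of selecting subsets concrete, here is a minimal sketch in Python of how a choice of benchmark "kinds" might map onto the benchmarks named above. It is not taken from ClusterNumbers itself: the `Benchmark` class, the `SUITE` list, the `select` function and the category labels (in particular, filing HPL and DGEMM under "cpu") are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    name: str
    category: str  # what the benchmark mainly exercises (assumed label)


# Benchmark names come from the article; the category labels are assumptions.
SUITE = [
    Benchmark("HPL", "cpu"),
    Benchmark("DGEMM", "cpu"),
    Benchmark("FFTE", "cpu"),
    Benchmark("STREAM", "memory"),
    Benchmark("IOTRANS", "disk"),
    Benchmark("Netperf", "network"),
    Benchmark("PTRANS", "network"),
]


def select(categories=None):
    """Return the names of the benchmarks to run.

    With no argument the whole suite is returned; otherwise only the
    benchmarks whose category was requested, in suite order.
    """
    if categories is None:
        return [b.name for b in SUITE]
    wanted = set(categories)
    unknown = wanted - {b.category for b in SUITE}
    if unknown:
        raise ValueError(f"unknown benchmark categories: {sorted(unknown)}")
    return [b.name for b in SUITE if b.category in wanted]


if __name__ == "__main__":
    print(select(["network"]))  # ['Netperf', 'PTRANS']
    print(select())             # the whole suite
```

Keeping the suite as plain data means that adding a further benchmark is a one-line change, which would suit a project that expects its list to grow.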
Getting the members of the FOSS community involved through an open source project on Sourceforge seems a logical step, and according to Gomez there has already been a strong response from the HPC community. And therein lies an issue. While the high performance community without doubt leads in a lot of fields and stresses systems in ways that make its contributions to projects such as these invaluable, it is important that those who run clusters in other environments contribute too. Their presence would give ClusterNumbers a broad following and ensure that Gomez's work is not “just another Sourceforge project”. After all, some of the most intensive users of clusters in the world are very much not conventional HPC users, although their systems are certainly high-throughput. Perhaps the project should include something like DBT2 or an appropriate derivative as a starter.
The first step is to create a roadmap. The wider the input at that stage, the better for the long-term viability of the technology. I urge those outside the HPC community to make their input, in the interests of giving ClusterNumbers a wider user base than might otherwise be the case.