Fujitsu and RIKEN Claim 1st Place for MLPerf HPC Benchmark with Supercomputer Fugaku

Nov 18, 2021 10:34 JST

Source: Fujitsu Ltd

World's fastest performance in the number of deep learning models trained per unit time on CosmoFlow, a key machine learning benchmark.

TOKYO, Nov 18, 2021 - (JCN Newswire) - Fujitsu and RIKEN today announced that the supercomputer Fugaku took first place in the CosmoFlow training application benchmark (1), one of the key MLPerf HPC benchmarks for large-scale machine learning tasks that require the capabilities of a supercomputer. Fujitsu and RIKEN used approximately half of Fugaku's resources (2) to achieve this result, demonstrating the world's fastest performance on this benchmark.

MLPerf HPC measures how many deep learning models can be trained per unit time (throughput performance) (3). Software technology that further improves Fugaku's parallel processing performance enabled throughput approximately 1.77 times that of other systems, demonstrating the world's highest level of performance in large-scale scientific and technological calculations using machine learning.

These results were published as part of MLPerf HPC version 1.0 on November 17 (November 18, Japan time) at the SC21 High-Performance Computing Conference, which is currently being held as a hybrid event.

Fugaku Claims World's Highest Level of Performance in the Field of Large-scale Scientific and Technological Calculations Using Machine Learning

MLPerf HPC is a performance competition made up of three separate benchmark programs: CosmoFlow, which predicts cosmological parameters (indicators used in studying the evolution and structure of the universe); DeepCAM (4), which identifies extreme weather phenomena; and Open Catalyst (5), which estimates how molecules react on a catalyst surface.

For CosmoFlow, Fujitsu and RIKEN used approximately half of Fugaku's total computing resources to train multiple deep learning models to a specified target prediction accuracy. Throughput was evaluated over the interval from the start of training of the first model to the end of training of the last model to finish. To further enhance Fugaku's parallel processing performance, Fujitsu and RIKEN applied technology to the programs running on the system that reduces mutual interference in communication between CPUs, which arises when multiple models are trained in parallel, and that optimizes the volume of data transferred between the CPUs and storage. As a result, the system trained 637 deep learning models in 8 hours and 16 minutes, a rate of roughly 1.3 models per minute.
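The measurement rule described above (count the models trained to the target accuracy, and time the whole batch from the earliest start to the latest finish) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MLPerf reference harness: the `TrainingRun` record, the `throughput_models_per_minute` function and the synthetic schedule are hypothetical; only the totals of 637 models and 8 hours 16 minutes (496 minutes) come from the text.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical record of one model's training window, in minutes."""
    start_min: float
    end_min: float


def throughput_models_per_minute(runs: list[TrainingRun]) -> float:
    """Models trained per minute, timed from the earliest start to the latest end."""
    if not runs:
        raise ValueError("need at least one training run")
    first_start = min(r.start_min for r in runs)
    last_end = max(r.end_min for r in runs)
    span = last_end - first_start
    if span <= 0:
        raise ValueError("measurement window must be positive")
    return len(runs) / span


# Hypothetical schedule: 637 runs inside one 496-minute window.
# Only the earliest start and the latest end affect the result.
runs = [TrainingRun(start_min=0.0, end_min=496.0)]
runs += [TrainingRun(start_min=1.0, end_min=490.0) for _ in range(636)]

rate = throughput_models_per_minute(runs)
print(f"{rate:.3f} models per minute")  # 637 / 496, prints 1.284
```

With these rounded totals the arithmetic gives about 1.28 models per minute. Because the clock runs until the last model finishes, a single slow instance lowers the score for the whole system, which is why reducing interference between concurrently training models matters for this metric.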

Fugaku's measured result took first place among all systems in the CosmoFlow training application benchmark, with throughput approximately 1.77 times that of other systems. This result demonstrates that Fugaku offers the world's highest level of performance in large-scale scientific and technological calculations using machine learning.

Going forward, Fujitsu and RIKEN will publicly release the software stacks developed for this measurement, such as libraries and AI frameworks that accelerate large-scale machine learning processing. Widely sharing the knowledge of large-scale machine learning on supercomputers gained through this effort will allow users to leverage world-leading systems to analyze simulation results, potentially leading to new discoveries in astrophysics and other scientific and technological fields. These resources will also be applied to other large-scale machine learning workloads, such as the natural language processing models used in machine translation services, to accelerate technological innovation and help solve societal and scientific problems.

About MLPerf HPC

MLPerf HPC is a machine learning benchmark created in 2020 by MLCommons, a community that develops machine learning benchmarks. It evaluates how supercomputers perform on large-scale machine learning workloads, which take an enormous amount of time to run, and is used to compile a performance list of systems that execute machine learning applications. It is used by supercomputers around the world and is expected to become a new industry standard.
MLPerf HPC was designed to assess the performance of large-scale machine learning models that require supercomputers. Performance was evaluated for three applications: CosmoFlow, DeepCAM, and Open Catalyst. In addition, a new benchmark that measures the number of deep learning models trained per unit time has been established.

All measurement data are available on the MLCommons website: https://mlcommons.org/

(1) CosmoFlow:
A deep learning model that predicts cosmological parameters from three-dimensional simulations of the distribution of dark matter in the universe.
(2) Approximately half of the whole Fugaku system:
Because this measurement was conducted while Fugaku was in regular operation, the measurement scale was limited to about half of the system to reduce the impact on other research projects using Fugaku.
(3) Number of deep learning models trained per unit time (throughput performance):
A new measurement method in MLPerf. Training multiple models simultaneously draws out the full performance of a supercomputer, and measuring the number of models trained per unit time makes it possible to compare the performance of entire supercomputer systems.
(4) DeepCAM:
A deep learning model that identifies extreme weather phenomena from global climate simulation data.
(5) Open Catalyst:
A deep learning model that estimates the relaxation energy of molecules on a catalyst surface from simulation data of atomic and intermolecular interactions.

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company offering a full range of technology products, solutions and services. Approximately 126,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE:6702) reported consolidated revenues of 3.6 trillion yen (US$34 billion) for the fiscal year ended March 31, 2021. For more information, please see www.fujitsu.com.

About RIKEN Center for Computational Science

RIKEN is Japan's largest comprehensive research institution, renowned for high-quality research in a diverse range of scientific disciplines. Founded in 1917 as a private research foundation in Tokyo, RIKEN has grown rapidly in size and scope and today encompasses a network of world-class research centers and institutes across Japan, including the RIKEN Center for Computational Science (R-CCS), the home of the supercomputer Fugaku. As the leadership center for high-performance computing, R-CCS explores the "Science of computing, by computing, and for computing." The outcomes of this exploration, technologies such as open source software, are its core competence. R-CCS strives to strengthen this core competence and to promote these technologies throughout the world.
