At the SC21 conference, a Chinese team claimed the prestigious Gordon Bell prize (roughly analogous to the Nobel prize for supercomputing). The team’s paper, “Closing the Quantum Supremacy Gap: Achieving Real-Time Simulation of a Random Circuit Using a New Sunway Supercomputer,” describes how they used a new supercomputer to simulate a random quantum circuit.
In fact, this is the same benchmark that led Google to controversially claim “quantum supremacy” for its Sycamore quantum computer in 2019. Sycamore completed the benchmark in 200 seconds, with Google claiming a classical supercomputer like Summit would need 10,000 years (IBM disputed this at the time, putting the real figure closer to 2.5 days). The new Chinese supercomputer, combined with the algorithmic work of the Chinese team, can do it in 304 seconds. Quantum still has the lead over classical supercomputers, it seems, but the gap is closing.
The paper describes the single-precision performance of the Sunway-based supercomputer as 1.2 Exaflops. This doesn’t officially confirm China’s rumoured Exascale capabilities – 1.2 Exaflops is single precision, while the term “Exascale” requires 1 Exaflops of double-precision performance – but it does seem likely that this new supercomputer is one of the world’s most powerful. While the system hasn’t been named, we do know the research used 41.9 million Sunway RISC processor cores for computation.
The Top500 HPC benchmark list had been widely expected to debut the world’s first Exascale system, but the results instead revealed that little has changed in the top 10. China did not enter its Sunway-based system (as above), and the US’s 1.5-Exaflops system Frontier, expected to come online in “late 2021,” appears not to be ready yet.
With China and Frontier as no-shows, then, the Top500 is still topped by reigning champion Fugaku – a position it has held since June 2020. The Japanese supercomputer’s HPL (high performance Linpack) benchmark score is 442 Pflop/s, which exceeds the performance of the world #2, Summit, by a factor of three.
In fact, there was little change in the top 10, the only new entry being Microsoft Azure’s Voyager-EUS2 in tenth place. This system is based on AMD Epyc Rome CPUs and Nvidia A100 GPUs.
Notable new entries this round include four Russian systems, in positions between 19 and 43.
Overall in this round of scores, China dropped from 186 systems in the top 500 to 173 while US systems increased from 123 to 150.
MLPerf HPC Scores
Results for the AI benchmark suite MLPerf HPC were also announced. These benchmarks specifically measure AI performance, which is becoming an increasingly large portion of scientific workloads. Compared to the last submission round, the best benchmark results improved by 4-7X, reflecting substantial gains in hardware, software, and system scale.
All submissions except one were powered by Nvidia GPU accelerators, including the P100, V100 and A100 (the score from Fugaku was achieved by its Arm-based CPUs without accelerators).
The CosmoFlow and DeepCAM benchmarks were won by Nvidia. CosmoFlow is used to estimate physical quantities from cosmological image data. The winning CosmoFlow score was a training time of 8.04 minutes using 1024 Nvidia A100-SXM4-80GB GPUs (512 of Fugaku’s CPUs managed it in 114.35 minutes).
DeepCAM is used to identify hurricanes and atmospheric rivers in climate simulation data. The winning score was Nvidia again, this time with twice the number of the same GPUs, in 1.67 minutes.
The new OpenCatalyst benchmark was won by Lawrence Berkeley National Laboratory, using 512 of the 40GB version of the same Nvidia GPUs, with a training time of 111.86 minutes. OpenCatalyst is used to predict the energies of molecular configurations based on graph connectivity. Submitters said the inclusion of graph networks was important because it reflects the state of the art for materials science and chemistry workloads. Their computational characteristics differ from those of other neural networks: they tend to be sparse, and different datasets produce networks with different structure and connectivity, which can cause load imbalance (where work is difficult to parallelise efficiently).
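The load-imbalance problem can be illustrated with a minimal sketch (a hypothetical example, not taken from the MLPerf submissions): if a graph’s nodes are split evenly across workers but node degrees are skewed, the per-worker edge work (the cost of message passing) ends up very uneven.

```python
# Sketch: why graph-network workloads can suffer load imbalance.
# Hypothetical data; partitioning nodes evenly by count does not
# balance edge work when the degree distribution is skewed.
import random

random.seed(0)
num_nodes, num_workers = 1_000, 4

# Skewed degrees: a few "hub" nodes have far more neighbours.
degrees = [random.paretovariate(1.5) for _ in range(num_nodes)]

# Naive partition: equal node counts per worker.
chunk = num_nodes // num_workers
work = [sum(degrees[i * chunk:(i + 1) * chunk])
        for i in range(num_workers)]

# Imbalance factor: busiest worker's load relative to the mean.
imbalance = max(work) / (sum(work) / num_workers)
print(f"per-worker edge work: {[round(w) for w in work]}")
print(f"load imbalance factor: {imbalance:.2f}")
```

A factor above 1.0 means the busiest worker becomes a straggler while the others idle, which is exactly the parallelisation difficulty the submitters describe.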
A new performance metric, weak scaling, was also introduced. In weak scaling mode, systems train multiple instances of the same model concurrently; the idea is to capture the impact on shared resources such as the storage system and interconnect.
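The intuition behind the metric can be sketched with a toy model (the numbers below are illustrative assumptions, not MLPerf figures): compute time per instance stays fixed, but shared storage bandwidth is divided among concurrent instances, so aggregate throughput scales sub-linearly.

```python
# Toy model of weak scaling: several copies of the same model train
# concurrently, contending for a shared storage system. All numbers
# are illustrative assumptions, not MLPerf HPC results.

def time_to_train(instances: int,
                  compute_minutes: float = 10.0,
                  storage_bw_gbps: float = 100.0,  # shared by all instances
                  io_volume_gb: float = 6000.0) -> float:
    """Minutes for each instance when `instances` run concurrently.

    Compute time is fixed per instance, but each instance only gets
    a 1/instances share of storage bandwidth, so I/O time grows
    with concurrency.
    """
    io_minutes = io_volume_gb / (storage_bw_gbps / instances) / 60.0
    return compute_minutes + io_minutes

for n in (1, 4, 16):
    t = time_to_train(n)
    print(f"{n:2d} instances: {t:6.1f} min each, "
          f"{n / t:.3f} models trained per minute")
```

In this toy model, 16 concurrent instances deliver far less than 16x the throughput of one, which is precisely the shared-resource effect the weak-scaling metric is designed to expose.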
Some of the biggest announcements from semiconductor companies came just before SC21.
In the days preceding SC, AMD unveiled the first ever multi-die GPU. The AMD Instinct MI200 will have two GPU dies connected via a new 2.5D silicon bridge technology – elevated fanout bridge (EFB) – which the company said enables standard substrates and assembly techniques (unlike competing embedded silicon bridge architectures).
MI200 will be the first GPU to be built on AMD’s second-generation CDNA2 architecture, which is optimized for compute-intensive HPC and AI workloads. Compared to last year’s first-generation product, the MI100, the new device is 1.8X bigger, with 220 compute units and 880 matrix cores. MI200 will have up to 8 stacks of HBM2e memory, making it the first GPU with 128 GB of HBM2e. This will give it 4.7X the memory capacity and 2.7X the memory bandwidth of the MI100. Peak performance will be 47.9 TFLOPS for FP64 vector operations and 95.7 TFLOPS for FP64 matrix maths.
It was also revealed that the US’s first Exascale supercomputer, Frontier, will use AMD Instinct MI200 GPUs.
“As we think about the most important challenges facing our generation, energy transitions, climate change, and issues that we are currently facing tackling the pandemic, Frontier is going to allow us to tackle these important challenges using the capability of the machine, driven and powered by AMD processors,” said Thomas Zacharia, laboratory director at Oak Ridge National Laboratory, where Frontier will be located. “This makes MI200 the most powerful processor that’s ever been made available to scientists. A single GPU is more powerful than an entire node of Summit, which is currently the fastest supercomputer in the United States.”
Zacharia said that Frontier will be coming online soon and that it will be available to scientists early next year.
In his keynote speech opening Nvidia’s GTC conference, Nvidia CEO Jensen Huang revealed the company will build a new supercomputer, Earth 2, to create a digital twin of our planet for simulating and predicting climate change. Huang later revealed that Earth 2 will be completely funded by Nvidia, and will be around the same size as Nvidia’s in-house Selene supercomputer and its Cambridge-1 UK installation, which is used for medical research. He added that Earth 2’s architecture will make it “the most energy efficient supercomputer ever created,” and that he is still deciding on its physical location.