"The Frontier system architecture embodies the compute and data intensive capabilities required to unlock the full potential of the exascale era", stated Jeff Nichols, Associate Lab Director at ORNL. "The power and flexibility of the system will enable the creation of new converged HPC, analytics, and AI applications across the full breadth of the exascale computing programme's mission."
Shasta supercomputers are set to be the technology underpinning the exascale era, which is characterized by a deluge of new data and a convergence of modelling, simulation, analytics, and AI workloads. To enable this fusion of workloads to run simultaneously across the system, Slingshot was designed to incorporate intelligent features like adaptive routing, quality-of-service, and congestion management. Frontier will utilize Cray's new Shasta system software for monitoring, orchestration, and application development to provide a single developer interface across the system. The new software stack is a fully containerized architecture that combines the scalability and performance of HPC while enabling the productivity and portability of Cloud.
"Exascale systems demand a complex balance of compute, interconnect, and software capabilities to enable HPC and AI applications to execute simultaneously and with optimal performance", stated Steve Scott, CTO at Cray. "This poses a number of architectural challenges across the entire HPC space ranging from the development of new high density compute infrastructure, to modernizing developer software for the creation of extreme scale, data-intensive applications. Delivering these technologies for Frontier is incredibly exciting, as they will also become standard product offerings from Cray, enabling us to deliver enhanced performance and productivity to businesses large and small."
In addition to the capabilities native to the Shasta system, Cray has also been awarded a separate joint development contract to pursue new foundational technologies for the Frontier system. This includes the development of new high-density compute infrastructure, enhancements to HPC developer tools for GPU scaling and AI, and the creation of a Center of Excellence to establish best practices for exascale application development and tuning.
Current approaches to delivering dense GPU compute form factors have hit limitations in packaging density due to the amount of power that can be delivered to a blade, resulting in more datacenter floorspace being required to deliver comparable performance. To reach sustained exaflop performance, the Frontier system will transcend those limitations with powerful and dense compute and cabinet infrastructure capabilities. For Frontier, Cray is designing a new AMD EPYC CPU and Radeon Instinct GPU powered blade for the Shasta high-density cabinet. Cray will also engineer new high-efficiency power delivery and integrated direct liquid cooling capabilities for key server components to ensure high operational energy efficiency and low total cost of ownership.
"The Frontier design is a marvel of engineering and AMD is proud to be bringing its technical innovation to the project in conjunction with Cray, Oak Ridge National Lab and the Department of Energy", stated Mark Papermaster, executive vice president and chief technology officer, AMD. "AMD has a long history of pushing the boundaries of compute performance and working with DOE on advanced exascale research. I'm very excited to see a combination of custom AMD EPYC CPUs, purpose built Radeon Instinct GPUs, and our open software development tool set selected to power this amazing machine."
To enable developer productivity, users will require a high-level software development environment with tightly-coupled compilers, tools, and libraries which abstract away system complexity. The Cray Programming Environment (Cray PE) has delivered these core capabilities for Cray users for decades and, as part of this programme, will see a number of enhancements for increased functionality and scale.
This will start with Cray working with AMD to enhance these tools for optimized GPU scaling with extensions for Radeon Open Compute Platform (ROCm). These software enhancements will leverage low-level integrations of AMD ROCm RDMA technology with Cray Slingshot, allowing the Slingshot NIC to read and write data directly to GPU memory for higher application performance. Finally, to provide a seamless developer workflow, Cray PE will be integrated with a full machine learning software stack with support for the most popular tools and frameworks. Taken together, the rich HPC development capabilities of Cray PE, in combination with an optimized and scalable data science suite, will enable developers to fully embrace the converged use of analytics, AI, and HPC at extreme scale for the first time.
To further accelerate user adoption of the system, a Center of Excellence will be established by Cray and Oak Ridge National Lab to drive collaboration and innovation, and to assist in the porting and tuning of key DOE applications and libraries for the Frontier system. This will include collaborative modernization of new and legacy code to support directive-based programming models such as OpenMP, and delivering training and workshops for hands-on learning of how to fully leverage the system. This collaboration will ensure that best practices are defined and disseminated quickly to further accelerate development of exascale-class applications.
"This is another major win for Cray and means that in 2021 America's top two supercomputers and most powerful entries in the global exascale race will use the Cray 'Shasta' architecture", stated Steve Conway, Hyperion Research senior vice president of research. "This architecture is designed to support the extreme heterogeneity needed for future HPC and AI workloads."
Cray Shasta systems fuse the performance and scale of supercomputing with the productivity of Cloud computing and full datacenter interoperability. By providing a flexible compute infrastructure, a modular and containerized software architecture, and an intelligent and ethernet-capable system interconnect, Shasta supercomputers seamlessly bridge the worlds of extreme scale advanced research and enterprise datacenters for the first time.
The contract award includes technology development funding, a Center of Excellence, several early-delivery systems, the main Frontier system, and multi-year system support. The Frontier system is expected to be delivered in 2021, with acceptance anticipated in 2022.