And Now for the Hardware: LLNL's BlueGene Is Still the Fastest

In November, BlueGene/L retained its ranking as the world's fastest supercomputer for the sixth straight time, clocking 478.2 trillion floating-point operations per second (teraFLOPS) on the industry-standard Linpack benchmark.
The new Top500 ranking was released Nov. 13 at the annual Supercomputing conference (SC07) held this year in Reno, Nev.

Designed and built by IBM in partnership with National Nuclear Security Administration scientists, BlueGene/L is emblematic of how far high-performance computing has come in the last decade and how it has transformed national security science, basic research and the computing industry itself.
Today, simulation is as integral to scientific discovery as theory and experiment. This transformation has helped to revitalize a once-moribund supercomputing industry.

BlueGene/L is housed at the Lawrence Livermore National Laboratory and serves the Advanced Simulation and Computing (ASC) program, a cornerstone of NNSA's Stockpile Stewardship program, the effort to ensure the safety, security and reliability of the nation's nuclear deterrent without nuclear testing. ASC unites the scientific computing expertise and resources of Los Alamos, Sandia and Lawrence Livermore national labs. ASC's largest computing systems, such as BG/L, are managed as shared resources for use by code developers, designers and analysts at all three labs.

BG/L was recently expanded to meet the ASC program's growing needs, from a 280 teraFLOPS system (360 teraFLOPS peak) to its current configuration, which has a peak speed of 596 teraFLOPS. In partnership with IBM, the machine was scaled up from 65,536 to 106,496 nodes in five rows of racks; the 40,960 new nodes have double the memory of those installed in the original machine.
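The figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not an official calculation: the constant and function names are illustrative, and the "efficiency" shown is simply the Linpack result quoted in this article divided by the quoted peak speed.

```python
# Figures quoted in this article; all names here are illustrative only.
ORIGINAL_NODES = 65_536
ADDED_NODES = 40_960
LINPACK_TFLOPS = 478.2   # sustained Linpack result reported for November 2007
PEAK_TFLOPS = 596.0      # quoted peak speed of the expanded machine


def total_nodes(original: int, added: int) -> int:
    """Node count after the expansion."""
    return original + added


def linpack_efficiency(sustained: float, peak: float) -> float:
    """Fraction of peak speed achieved on the benchmark."""
    return sustained / peak


if __name__ == "__main__":
    print(total_nodes(ORIGINAL_NODES, ADDED_NODES))                      # 106496
    print(f"{linpack_efficiency(LINPACK_TFLOPS, PEAK_TFLOPS):.1%}")      # 80.2%
```

On these numbers the expanded machine sustains roughly four-fifths of its peak speed on Linpack, and the node total matches the 106,496 stated above.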

"The demand for high performance computing resources is growing faster than our ability to meet that demand," said Mark Seager, head of advanced computing technologies for Livermore's Computation Directorate. "Growing demand for HPC resources reflects the increasingly important role simulation plays globally in science and technology research and development."

The ability to simulate physical and chemical phenomena at the atomic scale has become more urgent as the nuclear weapons stockpile has aged following the cessation of testing in 1992 and a ban on new weapons systems. Underground testing was important not only for proving new designs, but also for ensuring the viability of existing weapons in the nuclear arsenal.

Accelerating Simulation

In 1995, weapons scientists and engineers estimated that achieving reliable simulations would require a million-fold increase in computing power over the following 10 years. The Accelerated Strategic Computing Initiative, or ASCI, was created by the Department of Energy to focus the computational efforts of the three weapons laboratories on achieving a 100 teraFLOPS system, regarded then as the threshold of reliable simulations. With the creation of NNSA in 2000 within DOE, ASCI evolved into the Advanced Simulation and Computing program.

"The idea was to leverage existing commercial technology and for each of the three national labs to try different approaches to achieving a 100 teraFLOP/s system and laying the foundation for petaFLOPS (quadrillion FLOPS) computing," Seager said.

The hardware platform strategy adopted was to establish a series of partnerships with U.S. computer companies to leverage their business models to accelerate high-end computing power. ASC systems from IBM, Cray and other high-performance computing (HPC) manufacturers pushed the boundaries of scientific computing with systems such as "Q," Blue Mountain, Blue Pacific, Red Storm and White.
During this period, LLNL followed competitive procurement processes that led to a series of partnerships with IBM, which resulted in the development of a series of increasingly powerful computer systems, culminating in delivery of BG/L and ASC Purple.

BG/L takes a radically different approach from its predecessors, employing a cell-based design. It is a scalable architecture that allows the computational power of the machine to be expanded by adding more building blocks, without introducing bottlenecks as the machine scales up. The original system uses system-on-a-chip technology and low-cost, low-power embedded microprocessors.

The unique design requires less power (1.8 megawatts) and less floor space (2,500 sq. ft.) than conventional supercomputers. These are important considerations in developing a possible approach to building the petaFLOPS systems required to fully develop "predictive simulation" capabilities.
Predictive simulation is the new frontier in high-performance computing and allows researchers to understand how complex physical, chemical and biological systems behave over time, where it was previously only possible to get brief snapshots at a smaller scale. The ability to conduct predictive simulations is of interest to NNSA scientists seeking to understand the effects of aging on nuclear weapons and to the broader scientific community for such complex problems as understanding climate change over hundreds of years, and the development of new materials and nanotechnologies.

Computer scientists from Livermore worked closely with IBM on the design and development of the first BG/L machine to ensure the concerns of the end users were addressed as part of the design process. This also helped accelerate integration of the machine and delivery to the program for production.

BG/L's three-year reign as the world's fastest supercomputer has seen significant progress in code development and the achievement of numerous milestones for NNSA's stockpile stewardship program. For example, simulations on BG/L helped answer critical questions about plutonium aging, a key to understanding the life expectancy of nuclear weapons systems.

"Since BG/L went into production in early 2006 it has performed beyond our expectations and delivered for the ASC program. BG/L's architecture has proven suitable for a much broader range of applications than originally envisioned," said Dona Crawford, associate director for computation, at the time the last Top500 list was released in June of this year.

While NNSA has the largest BlueGene system, other BlueGene systems are making their mark in research institutions around the world. IBM's BlueGene systems dominated the top 10 of recent Top500 lists. IBM's system, first developed through the ASC program, is now being used for scientific research at centers throughout the United States, Europe and other parts of the world.

"The partnership between ASC and IBM, bringing together code and hardware developers, has had an impact on the research community well beyond NNSA's mission of stockpile science. Focused programmatic efforts to develop new technology, such as stockpile stewardship, often lead to spin-off benefits for science and industry. NASA's program to put a man on the moon is another fine example of this," said Seager. "What we're finding is that innovative architecture work for ASC leads to low-cost, but highly useful computers that benefit the nation well beyond national security."

"If you look at HPC today, the cost per teraFLOPS of high-end systems has come way down, making supercomputers more accessible to the broader science and technology community," he said.

Don Johnston is a Lawrence Livermore National Laboratory public information officer.