Dealing With Petabytes of Data

Giant “atom smashers” like the Relativistic Heavy Ion Collider at Brookhaven National Laboratory and Europe’s Large Hadron Collider generate enormous amounts of data as many thousands of particles stream from thousands of subatomic smashups every second. With the LHC shifting into full operations, a team of Brookhaven physicists and computer scientists is expanding the limits of grid computing to keep up with the data.

Having worked daily with the Relativistic Heavy Ion Collider (RHIC) for almost a decade, the 40 staff members at Brookhaven's RHIC and ATLAS Computing Facility (RACF) are no strangers to storing and distributing large amounts of data. Those numbers hold clues to the most fundamental forces in the universe, including how they hold all matter together. Experience working with that data has proved invaluable as the world’s most powerful particle accelerator, the Large Hadron Collider (LHC), begins its own exploration of matter’s mysteries.

“The benefit of an integrated facility like this is the ability to move highly skilled and experienced IT experts from one project to the other, depending on what's needed at the moment,” said RACF Director Michael Ernst. “There are many different flavors of physics computing, but the requirements for these two facilities are very similar to some extent. In both cases, the most important aspect is reliable and efficient storage.”

As the sole Tier 1 computing facility in the United States for ATLAS—one of the four experiments at the LHC—Brookhaven provides a large portion of the overall computing resources for U.S. collaborators and serves as the central hub for storing, processing and distributing ATLAS experimental data among scientists across the country. This mission is possible, Ernst said, because of the lab's ability to build upon and receive support from the Open Science Grid project, a national computational facility that allows researchers to share knowledge, data and computer processing power in fields ranging from physics to biology.

The computing center now has 10 petabytes of accessible data in online storage, a capacity 12 times greater than what existed when ATLAS joined the RACF eight years ago. To prepare for LHC operations at that scale, the center underwent plenty of testing and problem solving.

“You can't just put up a number of storage and computing boxes, turn them on, and have a stable operation at this scale,” Ernst said. “Ramping up so quickly presents a number of issues because what worked yesterday isn't guaranteed to work today.”

To test their limits and prepare for real data, the computing staff participated in numerous simulation exercises, spanning tasks from data extraction to actual analyses physicists might perform on their desktops. In one of the more recent throughput tests with all of the ATLAS Tier 1 centers, Brookhaven was able to receive data from CERN at a rate of 1,100 megabytes per second. At that speed, it would take less than 8 seconds to fill an 8-gigabyte iPod with music.
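The iPod comparison is easy to check with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes decimal units (1 gigabyte = 1,000 megabytes), which the article does not specify, and the function names are made up for illustration. The daily total is hypothetical, showing what the test rate would amount to if it were sustained for 24 hours.

```python
# Back-of-the-envelope check of the throughput figures quoted in the article.
# Assumption: decimal (SI) units, i.e. 1 GB = 1,000 MB and 1 TB = 1,000,000 MB.

RATE_MB_PER_S = 1_100          # CERN-to-Brookhaven rate from the throughput test
IPOD_CAPACITY_GB = 8           # device size used in the article's comparison
MB_PER_GB = 1_000              # assumed unit convention
MB_PER_TB = 1_000_000          # assumed unit convention
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_transfer(size_gb: float, rate_mb_per_s: float) -> float:
    """Seconds needed to move size_gb gigabytes at a steady rate."""
    return size_gb * MB_PER_GB / rate_mb_per_s


def terabytes_per_day(rate_mb_per_s: float) -> float:
    """Terabytes moved in 24 hours if the rate were sustained (hypothetical)."""
    return rate_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


if __name__ == "__main__":
    fill_time = seconds_to_transfer(IPOD_CAPACITY_GB, RATE_MB_PER_S)
    print(f"Time to fill the iPod: {fill_time:.1f} s")
    print(f"Sustained daily volume: {terabytes_per_day(RATE_MB_PER_S):.1f} TB")
```

Under these assumptions the fill time comes out to about 7.3 seconds, in line with the roughly 8 seconds quoted above, and a full day at the test rate would move about 95 terabytes.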

To prepare for a future in which ATLAS and RHIC undergo upgrades to increase collision rates, events become more complex, and data is archived, Brookhaven has built a new facility to house, power and cool the 2,500 machines currently in use, which must be replaced with newer models about every three years. This constant maintenance cycle, combined with unexpected challenges from data-taking and data analyses, is sure to keep Brookhaven’s Tier 1 center busy for years to come, Ernst said.

“This is all new ground,” he said. “You work with people from around the world to find a path that carries you for two years or so. Then, the software and hardware changes and you have to throw everything away and start again. This is extremely difficult, but it's also one of the parts I enjoy most.”

The ATLAS grid computing system is a large and complicated endeavor that will allow researchers and students around the world to analyze ATLAS data.

The beauty of the grid is that a wealth of computing resources is available to a scientist for an analysis, even if those resources are not physically nearby. The data, software and storage may be located hundreds or thousands of miles away, but the grid makes this invisible to the researcher.

The grid computing infrastructure is made up of several key components. The “fabric” consists of the hardware elements—computing centers, disk storage, tape storage and networking. The “applications” are the software programs that users would employ, for example, to analyze data. Applications take the raw data from ATLAS and reconstruct it into meaningful information that scientists can interpret. Another type of software, called middleware, links the fabric elements together so that they form a unified system—the grid. The development of the middleware is a joint effort between physicists and computer scientists. Outside of high-energy physics, grid computing is used on smaller scales to manage data within other scientific areas such as astronomy, biology and geology. But the LHC grid is the largest of its kind.
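The three layers described above can be pictured with a toy model. This is a minimal sketch and not the ATLAS software: the class names, site names, dataset labels and the scheduling rule (send the job to the site that holds the data and has the most free slots) are all hypothetical, chosen only to show how middleware can hide the location of data and processors from the person running an analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One fabric element: a computing center with stored datasets and CPU slots."""
    name: str
    datasets: set = field(default_factory=set)
    free_slots: int = 0


class Middleware:
    """Toy stand-in for grid middleware: ties the sites into one system."""

    def __init__(self, sites):
        self.sites = list(sites)

    def submit(self, dataset, application):
        # Only sites that already store the dataset and have a free slot qualify.
        candidates = [
            s for s in self.sites if dataset in s.datasets and s.free_slots > 0
        ]
        if not candidates:
            raise LookupError(f"no site can run a job on {dataset!r}")
        # Hypothetical scheduling rule: pick the least busy qualifying site.
        site = max(candidates, key=lambda s: s.free_slots)
        site.free_slots -= 1
        try:
            # The caller gets a result back and never learns which site ran it.
            return application(dataset)
        finally:
            site.free_slots += 1


def analyze(dataset):
    """Hypothetical application standing in for a real analysis program."""
    return f"analyzed {dataset}"


if __name__ == "__main__":
    grid = Middleware([
        Site("site-a", {"run-001"}, free_slots=2),
        Site("site-b", {"run-001", "run-002"}, free_slots=5),
    ])
    print(grid.submit("run-002", analyze))
```

The researcher's side of this is the single `submit` call. Which site holds the data and which machines do the work are decided inside the middleware layer, which is the sense in which the grid makes distance invisible.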

Kendra Snyder is a writer at the Brookhaven National Laboratory.