By Arnie Heller
High-Performance Computing Takes Aim at Cancer
Combining extraordinary processing capability with enormous storage capacity and advanced simulation and analytical software, supercomputers have become essential to national security, scientific discovery, engineering, technology, and industry. Some of the world’s most powerful supercomputers are located at Lawrence Livermore National Laboratory, where they support the National Nuclear Security Administration’s Stockpile Stewardship Program and make possible advances in areas such as materials science, chemistry, and energy.
Livermore researchers have recently been calling national attention to applying the power of high-performance computing (HPC) to biology. According to Dave Rakestraw, head of Livermore’s chemical, biological and explosives security program, the laboratory is fostering collaborations across academia, industry, and government that promote HPC as a revolutionary approach to improved understanding of human health. The effort focuses on countering biosecurity threats, overcoming infectious disease challenges, and laying foundations for the future of critical care.
Now, a partnership between the Department of Energy and the National Cancer Institute is applying the formidable computing resources at Livermore and other national laboratories to advance cancer research and treatment. Announced in late 2015, the effort will help researchers and physicians better understand the complexity of cancer, choose the best treatment options for every patient, and reveal possible patterns hidden in vast patient and experimental data sets.
The DOE–NCI agreement features three pilot programs that bring together nearly 100 cancer and biomedical researchers, computer scientists, and engineers. Livermore researchers are playing important roles in all three programs. Participants also include Argonne, Los Alamos, and Oak Ridge national laboratories; NCI’s Frederick National Laboratory for Cancer Research; and the Department of Veterans Affairs.
“One of the goals of this partnership is to bring about a huge shift in how biological and medical research will be performed in the future,” says Fred Streitz, director of Livermore’s High Performance Computing Innovation Center. “We are investing in the computational tools needed to move the medical community toward a predictive approach to cancer,” he says. “Such tools may help explain why one cancer treatment is successful with one patient but fails with the next.” In that respect, the DOE–NCI partnership supports former President Barack Obama’s Precision Medicine Initiative, which promotes developing treatments for various medical conditions that take into account patients’ individual variability in genes, microbiomes (the collection of microbes in or on the body), environment, health history, lifestyle, and diet.
The partnership is also a key element of the National Cancer Moonshot Initiative, which, under the direction of former Vice President Joe Biden, sought to double the rate of progress in the understanding, prevention, diagnosis, and treatment of cancer. On June 28, 2016, a Cancer Moonshot summit at Howard University in Washington, D.C., brought Biden together with more than 350 researchers, oncologists, and care providers.
Jason Paragas, Livermore’s director of innovation, was instrumental in bringing together high-level officials for the DOE–NCI cancer research partnership. He notes that the agreement is aligned with the National Strategic Computing Initiative, which is designed to ensure the United States continues leading the world in HPC over the coming decades. “NCI understands that the complexity of cancer initiation and growth demands the same computational approaches Livermore has spent decades developing for both national security and scientific discovery,” says Paragas. “NCI managers recognize that the newer computational architectures inside the latest machines provide an opportunity to think about biology in a novel way by combining the best of simulation and data science.”
According to Jim Brase, deputy associate director for science and technology in Lawrence Livermore’s Computation Directorate (who also serves as the laboratory’s point of contact for the three pilot programs), this partnership underscores how Livermore can work closely with research partners to advance medical breakthroughs. “Our expertise is in computing, not cancer,” he says. “Medical advances in this area require an effective partnership with NCI.”
Data May Reveal Patterns
Advanced data analytics—an approach that uses machine-learning algorithms to search for connections within vast amounts of data—is a key component of the DOE–NCI research. Recently, Livermore-developed deep-learning networks, based loosely on neural pathways in the human brain, have been used to create advanced models based on patterns buried deep within data sets. Streitz says, “Merging data analytics and simulation could potentially transform how we do scientific research.”
All three DOE–NCI pilot programs will develop advanced data analytics for large sets of patient, drug, experimental, and other cancer-related data to uncover correlations that are too complex for humans to discern. Each pilot also will be applying uncertainty quantification, a statistical process that increases confidence in the conclusions drawn from data analytics. The process, improved over the years by Lawrence Livermore weapons scientists, has been highly effective in stockpile stewardship work for assessing the expected performance of nuclear weapons systems without nuclear testing.
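As a rough illustration of what attaching a confidence statement to a data-derived number can look like, the sketch below computes a percentile-bootstrap confidence interval for an average drug response. The bootstrap is a generic statistical technique chosen here for brevity; the article does not say which uncertainty quantification methods the pilots use, and the function name and the response values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper): a percentile-bootstrap interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    resampled_means = []
    for _ in range(n_resamples):
        # Draw n observations with replacement and record their mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(statistics.fmean(sample))
    resampled_means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical fractional growth inhibition measured across eight cell cultures.
growth_inhibition = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52, 0.39, 0.58]
estimate, lower, upper = bootstrap_mean_ci(growth_inhibition)
print(f"mean={estimate:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

A wide interval signals that a conclusion drawn from the data should be treated with caution, which is the kind of information uncertainty quantification is meant to supply.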
Together, the pilot programs are aimed at improving drug therapy for cancer patients, simulating human RAS genes and proteins (which affect cell signaling) to facilitate cancer drug development, and analyzing extremely large NCI databases to optimize cancer therapies. The pilots will also identify requirements for future supercomputer architectures and data analytics software.
Learning from Cancer Cell Cultures
The first pilot program is led by Rick Stevens at Argonne National Laboratory and Jim Doroshow at NCI, with bioinformatics scientist Jonathan Allen heading Livermore’s participation. This team aims to outperform current methods for selecting cancer treatments through the development of algorithms that produce powerful new predictive models. The work includes both statistical models and mechanistic models, which describe how tumor cells promote unchecked cell growth and how cancer drugs interact with those cells. The models are expected to help researchers speedily and inexpensively predict the effectiveness of potential cancer drugs and more quickly identify and evaluate promising new pharmaceuticals. The pilot program also promises to provide new insights into tumor biology and critical cancer pathways.
For several years, Allen has been working on methods to rapidly detect and characterize pathogenic organisms such as viruses, bacteria, and fungi. Allen’s team previously developed the Livermore Metagenomic Analysis Toolkit (LMAT), a group of software programs that quickly compares metagenomic data (environmental genetic material) to large collections of already sequenced human and microbial genomes. LMAT uses unique search algorithms that exploit large memory computer architectures such as those being implemented for the DOE–NCI research.
The computer models will be based on well-documented data generated by numerous cell lines—populations of cells taken from different human tumors and grown and maintained in a laboratory. Allen says, “We will look for key patterns such as molecular signatures that correlate with certain outcomes to build a model of the drugs’ effectiveness at countering tumor growth.”
The researchers will start with the NCI-60 panel of tumor cell lines to determine the tumors’ response to thousands of available drugs. This group of 60 different human tumor cell lines includes leukemia, melanoma, and cancers of the lung, colon, brain, ovary, breast, prostate, and kidney. Studying tumor cell cultures is critical because “it’s difficult to know what’s happening inside a human tumor,” explains Allen.
Researchers expect to add data from other cell lines and from patient-derived xenograft (PDX) models, wherein cells from human tumors are transplanted to mice, to better capture details of how tumors grow and respond to different treatments. The long-range goal is to have more than 1,000 PDX models available for screening to study the tumors’ heterogeneity. The resulting model repository will be used to characterize tumor viability and provide a computerized platform for testing new drugs.
Modeling Cancer Initiation Events
Livermore’s Streitz and Dwight Nissley at NCI lead the second pilot program, which promises to deliver the computational advances necessary for understanding cancer initiation in RAS proteins located in cell membranes. Found in all human cells and organs, these proteins are involved in transmitting signals within cells and regulating diverse cell behaviors. When a RAS protein is switched on, it activates other proteins, which then trigger other genes involved in cell growth, differentiation, and survival. Under normal function, a RAS protein switches off after other proteins are switched on. However, RAS gene mutations can lead to permanent activation of the proteins. These mutations are responsible for up to 30 percent of all human cancers, including some of the deadliest forms, such as pancreatic cancer.
The fundamental mechanism by which RAS proteins initiate uncontrolled cell growth is still a mystery. NCI has large amounts of data on the physical, chemical, and biological characteristics of RAS genes and proteins, obtained through x-ray crystallography, cryoelectron microscopy, and other imaging techniques. The team will couple experimental data with atomic-resolution molecular dynamics simulations to build a model of RAS protein biology in varying types of cell membranes. The RAS model will permit easy manipulation of particular tissues and simulate the effects of environmental and genetic factors present in human populations or specific to individuals.
According to Streitz, a major advance in this area of research would be a comprehensive approach to explain the mechanisms of the protein and the onset of cancer. “Although RAS is found only in the cell membrane, it starts a cascade of events that involves many processes happening simultaneously,” he says. “These events are not linear, so they cannot be modeled sequentially or simply. However, we can simulate the membrane environment and explore how it operates and interacts with other proteins and with cancer drugs.”
The models will use machine-learning algorithms combined with uncertainty quantification to optimize the simulations of RAS interactions with RAF (a protein activated by RAS). The investigators plan to use the model’s ability to predict the fundamental mechanism of RAS-driven cancer initiation and growth in the various tissue types to identify potential treatments for inhibiting RAS activation in normal cells.
As part of this effort, the team is developing algorithms that will automatically switch between atomistic and coarse-grained molecular dynamics, in essence optimizing the resolution to maximize fidelity while minimizing run time. In addition, they will explore algorithms capable of autonomously generating hypotheses about signaling mechanisms. The hypotheses will then be validated through simulation, possibly identifying potential drug therapy sites among thousands of possible configurations. “This capability will be nothing short of revolutionary,” says Streitz. “It will change the way we use predictive simulations.”
Going Deep into Patient Records
The third pilot program, led by Gina Tourassi at Oak Ridge and Lynn Penberthy at NCI, takes a population-wide approach to cancer research. The research team is analyzing cancer patients’ medical records to better understand treatment outcomes on a large scale. Livermore computational biologist and team member Todd Wasson notes that patient privacy will be strictly observed. The team has begun studying 500,000 medical records from four states: Washington, Louisiana, Georgia, and Kentucky. The records are provided by NCI’s Surveillance, Epidemiology, and End Results (SEER) program, which has been collecting data on cancer patients since 1973.
This pilot aims to develop processing tools for analyzing many different sets of medical records. Powerful machine-learning tools will search the data for patterns of how genetics, environment, lifestyle, and quality of health affect the progression, recurrence, and survival of cancers. The data include patient characteristics, pathology reports, specific treatment, survival, and cause of death. Since clinical text varies in writing style and expression, algorithmic development will focus on advanced machine-learning and deep-learning techniques to extract relevant features from clinical reports. In particular, investigators will be implementing natural-language processing, which enables computers to derive meaning from reports written in human languages. The machine-learning approaches also could be augmented with genomic data, images, and medical claims.
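To make the feature-extraction problem concrete, the sketch below maps differently worded report snippets onto one shared set of features. It is a deliberately simple rule-based stand-in, not the machine-learning and deep-learning techniques the pilot is developing, and the vocabulary, the function name, and the sample report text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical lookup table: surface forms seen in reports -> canonical feature.
CANONICAL_TERMS = {
    "carcinoma": "carcinoma",
    "ca": "carcinoma",
    "metastatic": "metastasis",
    "metastasis": "metastasis",
    "mets": "metastasis",
}


def extract_features(report: str) -> Counter:
    """Count canonical clinical terms found in one free-text report."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", report.lower())
    # Keep only tokens in the vocabulary, mapped to their canonical name.
    return Counter(CANONICAL_TERMS[t] for t in tokens if t in CANONICAL_TERMS)


# Two invented snippets that describe the same finding in different styles.
reports = [
    "Invasive ductal carcinoma, metastatic to liver.",
    "Ductal CA with mets to axillary nodes.",
]
features = [extract_features(r) for r in reports]
print(features)
```

A hand-built table like this cannot cope with negation ("no metastasis identified"), misspellings, or unseen abbreviations, which is one reason learned language models are attractive for clinical text.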
The results will help scientists improve cancer care at various levels—individuals, an entire population, or subgroups where there may be disparities in outcome. Investigators plan to produce an unprecedented predictive simulation capability. “We want to obtain a deeper understanding of cancer drivers and outcomes in the population,” says Wasson. “We’ll be looking at how different cancers respond to the same treatment and how a single type of cancer responds to different treatments.”
He says the long-term goal is to support personalized therapies, as part of the Precision Medicine Initiative. “We want to provide oncologists greater confidence when they recommend a particular treatment based on the type of cancer and the individual. We don’t know what we will discover,” says Wasson. The pilot program also is expected to advance machine-learning algorithms and scalable deep-learning tools for CORAL-class supercomputers and exascale-computing platforms to permit efficient analysis of the millions of records expected annually in the cancer surveillance program.
Partnerships Are Critical to Success
The expected collaborations between biomedical researchers and clinicians and HPC teams will likely change the culture of medical research, according to Brase. “You need a big team to write codes and validate them,” he observes. This approach points to the philosophy of E. O. Lawrence, who more than 60 years ago invented “team science,” the proven concept of assembling a highly focused team of investigators from different disciplines to achieve a common, often difficult, goal.
Streitz predicts that as the value of HPC to cancer research becomes more evident, collaborations aimed at helping overcome medical challenges will become an increasingly important aspect of Livermore’s research portfolio. He observes that connecting the computational resources of DOE national laboratories to life-sciences projects also may help in developing responses to drug-resistant microbes, the ever-changing threat of bioterrorism, the intractability of other complex diseases in addition to cancer, and the rising cost of new pharmaceuticals. He emphasizes, “But we’ll always need partners such as NCI to make the progress needed in these fields.” With the help of HPC and the dedication of hundreds of scientists, doctors, and researchers, the scourge of cancer may, one day, have a cure.
Arnie Heller is a writer at Lawrence Livermore National Laboratory.