
Seek and Ye Shall Find
DOE's latest Strategic Plan sets an ambitious goal of accelerating innovation to create transformational solutions for energy and other national needs. However, delivery of these solutions to society is left up to entrepreneurs and technology companies. Because DOE's research efforts are vast, one of the greatest challenges technologists face is finding the "right stuff" (the big innovations) to develop, market and deliver. Happily, DOE is also fielding a new family of search tools to help meet this challenge by putting technologists in contact with research results.
DOE spends roughly $8 billion a year on R&D, in a complex array of ways. About $4 billion is for basic research under the Office of Science, which funds research related to a huge number of problems and disciplines, from combustion to cosmology. In fact, the Office of Science funds almost half of the federal physical sciences basic research. The other $4 billion in applied R&D is scattered among many specific mission Program Offices. There are energy programs for fossil energy, nuclear power, energy efficiency, etc. In addition, DOE has large programs for nuclear weapons, submarine propulsion, environmental research, nuclear medicine and even a major genomics program.
Where the research is done is another complex array. Much of this research is distributed among thousands of projects, among several dozen federal labs and user facilities, sometimes with little apparent reason to the outsider. Some labs are fairly specialized. The National Energy Technology Lab works mostly on fossil fuel related research, but its results may apply to many other problem areas. The National Renewable Energy Lab of course looks at renewables, but not entirely. Other labs are multipurpose and may work under any DOE program. These include Oak Ridge, Pacific Northwest, Los Alamos and Sandia. Some facilities are single purpose, built around a huge atom smasher or similar instrument. But even here the research may cover a broad range of materials and engineering problems. There are also thousands of university research projects.
Finding the right stuff in this complex matrix of programs and projects, past and present, is a daunting task and a major obstacle to innovation. The amount of information and findings is staggering. DOE has taken steps to meet this challenge, which Under Secretary for Science Ray Orbach calls "the awesome responsibility to feed science and technology in the U.S. economy." Powerful new tools are being developed to search the research, led by DOE's Office of Scientific and Technical Information (OSTI). The web site is www.osti.gov.
The story is told that OSTI began as a filing cabinet in the Manhattan Project in the early 1940s. If so, it was a heavily used cabinet full of transformational breakthroughs. Officially, OSTI has always been the central source for research reports stemming from all DOE funded research, every project, collecting a huge number of reports over the years and more every day.
OSTI now makes its vast collection of research results available for free online, complete with full text search. Moreover, it has gone from providing reports coming from DOE, to now finding and making available useful research results from around the world. OSTI is now a global science and technology communications center.
In addition to DOE reports, OSTI provides electronic access to DOE and federal-wide project summaries, conference proceedings from many scientific associations, preprints of scholarly articles, university publication sites and a variety of other collections. Millions of pages are available. Technologists need to know about these capabilities, and many probably do, as downloads from the web site exceed 4 million a year. To speed things up even more, a few months ago OSTI launched the Science Accelerator (www.scienceaccelerator.gov), a search page that federates OSTI's key collections into one searchable entity.
OSTI has also been part of a broader effort, tying together all U.S. federal research agencies, and ultimately the whole world, in a single search capability. The prototype today is www.science.gov. This site provides federated search of all major U.S. federal research agency repositories, about 50 million pages of technical content. Looking globally, OSTI recently launched a pilot of World Wide Science (www.worldwidescience.org), which aims to federate science and technology document collections from around the world.
OSTI's federated search technology is itself a major innovation. It is provided by Deep Web Technologies. The federation operates non-intrusively on existing document databases, via their local search engines. It does not duplicate these databases, or even crawl or index them. It queries them in real time and uses the results. The value added is relevance ranking of these separate search results.
OSTI's specialized tools exist because Google and other general purpose web search engines do not cover most of the research document repositories and databases, referred to as the deep web. By some estimates over 90 percent of all web-accessible technical content is hidden in the deep web. Web crawlers do not reach the deep web because these document databases are only accessed by specific local searches. So while these databases are on the web, their contents are not accessible to crawlers. OSTI has been a pioneer in federating these document databases. Many of OSTI's tools work by first translating a user's query into a separate local search for each database, then combining and jointly ranking all the results. This is very different from what a crawler does, and it requires a lot of custom tailoring for each database.
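The translate, query and jointly rank sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Deep Web Technologies implementation: the connector functions, the sample titles and the word-overlap score are all hypothetical stand-ins for the custom per-database tailoring and the real relevance ranking.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str        # which collection returned the document
    title: str
    score: float = 0.0


# A connector stands in for one collection's own local search engine.
# A real connector would translate the query into that site's search form.
Connector = Callable[[str], List[str]]


def make_connector(titles: List[str]) -> Connector:
    """Hypothetical local search: return titles sharing a word with the query."""
    def search(query: str) -> List[str]:
        words = set(query.lower().split())
        return [t for t in titles if words & set(t.lower().split())]
    return search


def relevance(query: str, title: str) -> float:
    """Assumed scoring rule: fraction of query words that appear in the title."""
    words = query.lower().split()
    if not words:
        return 0.0
    title_words = set(title.lower().split())
    return sum(w in title_words for w in words) / len(words)


def federated_search(query: str, connectors: Dict[str, Connector]) -> List[Hit]:
    # Send the query to every collection at query time; nothing is crawled or copied.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in connectors.items()}
        hits = [Hit(name, t) for name, f in futures.items() for t in f.result()]
    # Score every result on one common scale, then merge into a single ranked list.
    for hit in hits:
        hit.score = relevance(query, hit.title)
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Two made-up collections, searched together with one query.
connectors = {
    "reports": make_connector(["Coal combustion modeling", "Solar cell materials"]),
    "eprints": make_connector(["Turbulent combustion in coal boilers",
                               "Combustion chemistry basics"]),
}
for hit in federated_search("coal combustion", connectors):
    print(f"{hit.score:.2f}  {hit.source}: {hit.title}")
```

The point of the sketch is the division of labor: each collection keeps its own index and search engine, and the federating layer only forwards queries and re-ranks what comes back.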
By way of analogy, a similar situation exists in industrial and consumer shopping. Many shopping sites are database driven, so one has to go to that site and execute a search to gain entry and shop. In some cases multiple such databases have been federated and even include product ranking by price. OSTI's combined relevance ranking does the same job for science and technology content, and the documents are free.
A few of OSTI's many search tools are described below. Each is a highly specialized technical tool that must be understood to be properly used. Also, each is relatively crude. Google spends over $4 billion a year, including $500 million on R&D. OSTI's budget is just $8 million, so there are no bells and whistles. But there are over ten million pages of research results and technical material, with more added every day.
Information Bridge searches OSTI's foundation collection, an electronic filing cabinet of all DOE research reports for the last decade. Tens of billions of dollars worth of research are documented here. Because it is an internal DOE collection there is also extensive bibliographic information for each entry.
This makes it possible to do complex advanced searches using different metadata fields in the document database. A powerful and independently useful feature in the fielded or advanced search function is the "select subject" button. This brings up a very large semantic structure or word-word link system that is designed to help users find the best technical search terms. The system includes 30,000 words, about 200,000 word-word relations, and 45,000 taxonomic pathways from broader to narrower concepts.
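To make the idea of taxonomic pathways concrete, here is a minimal Python sketch that lists every path from a broader concept down to its narrowest terms. The terms and the dictionary layout are hypothetical and are not taken from OSTI's actual subject structure; the sketch also assumes the links form a tree with no cycles.

```python
from typing import Dict, List

# Hypothetical thesaurus fragment: each broader term maps to its narrower terms.
NARROWER: Dict[str, List[str]] = {
    "energy sources": ["fossil fuels", "renewable energy sources"],
    "fossil fuels": ["coal", "natural gas"],
    "renewable energy sources": ["solar energy", "wind energy"],
}


def pathways(term: str, narrower: Dict[str, List[str]]) -> List[List[str]]:
    """List every path from `term` down to a term that has no narrower terms."""
    children = narrower.get(term, [])
    if not children:
        return [[term]]
    return [[term] + path
            for child in children
            for path in pathways(child, narrower)]


for path in pathways("energy sources", NARROWER):
    print(" > ".join(path))
```

A user who starts from a broad term can walk such a path to reach the narrower technical term that the documents themselves are likely to use.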
E-print Network is a federated collection of about five million technical articles and related materials found in databases and on the web. It includes articles that have not yet appeared in scholarly journals. It also includes the publication web pages of over 26,000 university faculty, many engineering and science departments and many other researchers. This makes it easy to go from a single paper to the whole body of a researcher's related work.
Science Conference Proceedings federates 26 large databases. Conference proceedings often precede publication of research results by a year or more and this collection includes hundreds of thousands of papers and presentations, many from professional societies.
Federal R&D Project Summaries is a federated gateway to individual project summaries from six of the largest research funding agencies, including DOE. In many cases the search results include recent awards, which may precede research reports or publications by several years.
David Wojick is senior consultant on innovation with the DOE Office of Scientific and Technical Information. WojickD@osti.gov

Copyright © 2012 | Innovation America