biodiversity and ecosystem informatics workshop report

EXECUTIVE SUMMARY

In June 2000, a group of computer scientists, biologists, and natural resource managersmet to examine the prospects for advancing computer science and information technol-ogy (CS/IT) research by focusing on the complex and often unique challenges found inthe biodiversity and ecosystem domain. We refer to this emerging, interdisciplinary fieldof study as Biodiversity and Ecosystem Informatics (BDEI). This report synthesizes thediscussions and recommendations made at the workshop. It itemizes current BDEIchallenges, lays out a national BDEI research agenda, and recommends actions to betaken within the national research agenda. It also proposes specific mechanisms tocommunicate and implement those actions. The following points summarize the con-clusions of this forum:

• The CS/IT research community plays a foundational role in creating the techno-logical infrastructure from which advances in the environmental sciencesevolve;

• The next-generation CS/IT applications required by our expanding need tounderstand complex, ecosystem-scale processes will require solutions to signifi-cant, ground-breaking CS/IT research problems;

• Important new research opportunities for the CS/IT community are provided bythe urgency, complexity, scale, and uniqueness of the data, processes, and prob-lems presented by work in the biodiversity and ecosystem domain; and

• There is an increased need for governmental and industrial support of basicCS/IT research in order to respond to these challenges. Both the national CS/ITand environmental research agendas would derive significant, synergisticbenefit from such investment.

In the remainder of this section, we introduce two major themes that weave throughoutthis report. First, the CS/IT requirements of biodiversity and ecosystem research aredrastically changing, thereby requiring new solutions to fit the altered landscape. Sec-ond, the CS/IT research community has a long and successful record of creating newsolutions and enabling the technology transfer needed to put these ideas into practicaluse. It is therefore a wise investment of public monies to ensure that the emerging,interdisciplinary field of biodiversity and ecosystem informatics becomes a healthy andviable discipline.

Dave Maier, Eric Landis, Judy Cushing, Anne Frondorf,Avi Silberschatz, Mike Frame, and John L. Schnase (Editors)

Report of an NSF, USGS, NASA Workshop on Biodiversity andEcosystem Informatics held at NASA Goddard Space FightCenter, June 22-23, 2000.1

RESEARCH DIRECTIONS INBIODIVERSITY AND ECOSYSTEM INFORMATICS

EXECUTIVE SUMMARY

1 This workshop wassupported by theNational ScienceFoundation DigitalGovernment Programunder grant EIA-0084541, the USGeological Survey, andthe National Aeronau-tics and Space Adminis-tration. All opinions,findings, conclusions,and recommendations inany material resultingfrom this workshop arethose of the workshopparticipants, and do notnecessarily reflect theview of the sponsoringagencies. See http://bio.gsfc.nasa.gov foradditional information.

Biodiversity and Ecosystem Sciences

The most striking feature of Earth is the existence of life, and the most striking featureof life is its diversity. This biological diversity — or biodiversity — provides us withclean air, clean water, food, clothing, shelter, medicines, and aesthetic enjoyment.Biodiversity, and the ecosystems that support it, contribute trillions of dollars tonational and global economies, directly through industries such as agriculture, for-estry, fishing, and ecotourism and indirectly through biologically-mediated servicessuch as plant pollination, seed dispersal, grazing land, carbon dioxide removal, nitro-gen fixation, flood control, waste breakdown, and the biocontrol of crop pests. Andbiodiversity — the biological richness of ecosystems per se — is perhaps the singlemost important factor influencing the stability and health of our environment.Clearly, this is one of our most important knowledge domains, vital to a wide range ofscientific, educational, commercial, and government activities.

There is an increasing need to understand and respond to complex environ-mental problems. Just as we are developing a capacity to predict long-term climateevents, we would now like to predict public health and ecological outcomes far intothe future. Unfortunately, we currently lack the technologies to do this. The environ-mental sciences are “resource limited” by fundamental inadequacies in the CS/ITtools that can be applied to problems of this scale. If we are to keep pace with our needfor quality information about the living systems of our planet, we must producesystems that can efficiently manage petabytes of a new generation of high-resolution,Earth-observing satellite data. We must understand how to integrate these newdatasets with traditional biodiversity data, such as specimen data held in naturalhistory collections, and genomic data from cellular- and molecular-level work. Wemust be able to make correlations among data from these and even more disparatesources, such as ecosystem-scale global change and carbon cycle data, compile thosedata in new ways, analyze them, and present the results in an understandable andusable way.

Despite encouraging advances in computation and communication performancein recent years, we are still unable to perform these activities on a large scale. It is onlyrecently, for example, that IBM announced plans to build the world’s fastestsupercomputer — Blue Gene — which will attempt to compute the three-dimensionalfolding of human protein molecules. Given the thousands of proteins that are producedby the unknown millions of species on this planet, and given too that many of thesemolecules may have potentially significant economic value or environmental impor-tance, we are clearly entering a new world of computer-mediated exploration.

Biodiversity and Ecosystem Informatics

Until recently, little attention has been paid to computer and information science andtechnology research in the biodiversity and ecosystem domain. The interdisciplinaryfield of biodiversity and ecosystem informatics (BDEI) is attempting to change that. Weare pushing the boundaries in two directions by identifying research challenges that cansimultaneously advance the environmental sciences and the computer and informationsciences. The potential for such synergies is high because of the nature of work in thebiodiversity and ecosystem domain.

The single most important factor influencing work in this field is the problem ofcomplexity. This complexity arises from several sources. First is the underlying biologi-cal complexity of the organisms themselves. There are millions of species, each ofwhich is highly variable across individual organisms, populations, and time. Specieshave complex chemistries, physiologies, developmental cycles, and behaviors resulting

EXECUTIVE SUMMARY

ii

from more than three billion years of evolution. There are hundreds, if not thousands, ofecosystems, each comprising complex interactions among large numbers of species andbetween those species and multiple abiotic factors.

The second source of complexity is sociologically generated and includes prob-lems of communication and coordination — among agencies, divergent interests, andgroups of people from different regions, from different backgrounds, and with differentpoints of view. Biodiversity and ecosystem data can be politically and commerciallysensitive and entail conflicts of interest. The kinds of data scientists have collectedabout organisms and their relationships vary greatly in precision and accuracy, and themethods used to collect and store these data are almost as diverse as the natural worldthey document. Many important observations are made by non-scientists, such asamateur birders and natural history enthusiasts. And the range of datasets with whichthese datasets must interact is unusually broad, including geographical, meteorological,geological, chemical, physical, and genomic sources. There is thus an unusual need toaccommodate differences in data quality within a democratized community informationinfrastructure that is both formal and informal.

As in most biological and earth sciences, location is central. Muchbiodiversity and ecosystem data is georeferenced — it is tied to some place on theglobe. Sometimes the designation of a location can be ambiguous or imprecise,especially with observations and samples taken in previous centuries. As a result,something as central to the science as a means for spatial referencing becomes acomplex issue. Biodiversity and ecosystem data are also distinctive for being species-referenced. Genetic data is frequently associated with a species or sub-species,invasions and extinctions are tracked at the species level, and much of the character-ization of an ecosystem is described through the number and distribution of itsconstituent species. However, the naming of species is an abstract process, deeplyembedded in long-standing scientific cultural processes — incomplete, subject tolocal variation, and changing with time. In the ongoing process of species discovery,different scientists may assign two or more names to the same species, and a singlespecies name may be applied to what turns out to be distinct species. To makematters worse, most species on the planet have not yet been named and classified,and there is no authoritative listing of all the species we do know. In this field,ontological complexities abound!

Many key biodiversity and ecosystem questions involve flux — changes inrange, numbers, distribution, genetics, and proportions over time. Extinctions, migra-tions, incursions, restorations, predicted environment impacts are all issues of flux.However, seldom does one dataset span enough time, area, or include enough speciesto answer important questions by itself. Scientists often require that biodiversity andecosystem data be assembled from different sources into time sequences of compa-rable datasets, realizing that the component datasets may have been compiled forquite different purposes. Scientists also often deal with data at small scales over a largearea or extended periods of time. Many significant situations will be lost if standardmethods for moving to larger scales are used.

Finally, historical information serves prominently in the work of biodiversityand ecosystem scientists. Examples include plant and animal specimens and their labels,publications (some dating back 250 years), maps, and personal field notebooks. Thestudy of biodiversity and ecosystems requires the analysis of trends, adaptations, andlong-term relationships. These historical sources are thus often as pertinent as contem-porary data. An additional and significant problem is that many of the historical infor-mation sources are not yet in digital form. For example, over 750 million natural historyspecimens and their accompanying metadata remain to be digitized in the US alone.

EXECUTIVE SUMMARY

iii

Because of these complexities, humans still play a crucial role in the process-ing of biodiversity and ecosystem data. This information is simply not as amenable toautomatic correlation, analysis, synthesis, and presentation as many other types ofinformation. People act as sophisticated filters and query processors — locatingresources on the Internet, downloading datasets, reformatting and organizing data forinput to analysis tools, then reformatting again to visualize results. This process ofcreating higher-order understanding from dispersed datasets is a fundamental intellec-tual process in the biodiversity and ecosystem sciences, but it breaks down quickly asthe volume and dimensionality of the data increase. Who could be expected to under-stand millions of cases, each having hundreds of attributes? Yet problems on this scaleare common in biodiversity and ecosystem research.

Biodiversity and Ecosystem Informatics Research Agenda

Given this context, there are clearly areas where computer science and informationtechnology research could be advanced — with great social and scientific benefit —by focusing on challenges in the biodiversity and ecosystem domain. These syner-gistic opportunities fall into three major categories: acquisition and conversion ofdata and metadata, analysis and synthesis of data and metadata, and disseminationof data and metadata. Some specific opportunities include the following:

Acquisition and Conversion of Data and Metadata

• Modernizing the Biological Library – The accumulated volume of biologicalinformation and data collected over the past 250 years is massive. Improvingmethods for organizing, storing and retrieving these records is extremely critical.New techniques and tools must be developed for information extraction, textunderstanding, and cross-lingual information retrieval, making this an importantnon-business application domain for research on data integration, data cleansing,data warehousing, and archiving.

• Digitizing the Biological Legacy – America’s museums and laboratories maintainnearly one billion biological specimens. There is an urgent need to convert them,their documentation, and new specimens into metric-quality digital formats.This provides an excellent opportunity to advance research on lossless imagecompression, 3D image understanding, robotics, and the problem of integratingphysical artifacts into digital libraries.

• Multi-dimensional Observation and Recording – Efforts are needed to enable thecollection of detailed information about the Earth in multiple dimensions and atmultiple scales. This provides rich opportunities for research on scaling sensor-fusion techniques to large fields and developing and testing temporal-spatial dataaccess methods.

• Mobile Computing – New instrumentation is needed to bring knowledge to thefield and to collect, store, and transmit data from the field. Specific opportunitieshere include applications of human-computer interaction research to multi-model interfaces, hands-free systems, wearable computers, remote presence,robotics, and human augmentation.

EXECUTIVE SUMMARY

iv

• Taxonomic Freedom – Changes in biological names and classification schemesover time and discipline present enormous challenges. There is a need to inte-grate various interpretations, views, and versions of taxonomic data and make itavailable in a simple, easy-to-understand formats. This provides an unusuallychallenging context in which to examine the flexibility and robustness of knowl-edge representation systems, particularly their temporal and versioning aspectsand their support for cross-ontology linking and translation.

Analysis and Synthesis of Data and Metadata

• Comparing Across Scales – Biological data from different sources and times arefrequently collected and presented in different scales and resolutions resulting ina loss of detail when multiple datasets are required for data synthesis and analy-sis. Tools and procedures to facilitate analysis across scales are needed, whichprovides an important opportunity for research on adaptive- and multi-resolutiontechniques for computation and modeling.

• Modern Modeling – Researchers, managers, and policy-makers require modelsfor biological decision-making rather than disaggregated collections of data andfacts. Improved spatio-temporal modeling of biological, ecological, and socialprocesses are required, providing a fertile area for multi-modal data assimilationresearch and high-performance computing.

• Taxonomic Retooling – Taxonomists need new and improved tools for namingand defining species, changing and manipulating taxonomic organization, andperforming other tasks regarding taxonomic content and structure. This providesa rich domain for research in knowledge acquisition and hierarchical displaytechniques.

• Making Data Usable – Too frequently, decision-makers underutilize researchresults. To enhance the use of biological data, decision-makers require systemsthat will facilitate the synthesis and analysis of scientific data and researchresults. This is an excellent application area for research on uncertainty analysis,reasoning with incomplete information, and automatic summarization.

• Machine Processable Metadata – Current scientific metadata is largely forhuman consumption. It is used to document and interpret datasets. However,much more value will be gained from it when it is complete, correct, and de-scriptive enough to help automate data manipulation tasks, such as summariza-tion, combination of datasets, and conversion of data to appropriate forms foruse in models and statistical tools. There are important opportunities here fortesting of data-based inferencing technology and metadata-based informationintegration research.

• Need for Speed and Accuracy – Many tasks in data management are iterativeand time consuming. Researchers are frequently challenged with data entryand pattern discovery procedures and are required to estimate the quality ofutilized data. Meanwhile species are disappearing at a rate greater than theycan be recorded. This is a challenging domain for research on data reductionand data mining algorithms, including parallel implementations, modeling andanalytic techniques with tunable accuracy, and data quality metrics.

EXECUTIVE SUMMARY

v

Dissemination of Data and Metadata

• Visualization – Users of biodiversity and ecosystem data and information,including land managers, policymakers, educators, non-governmental organi-zations, industry, and others outside biological research, need visualizationtechniques to better understand data, relationships among data, natural pro-cesses, and management actions over time. This leads to opportunities forresearch on advanced display and visualization techniques, including display ofuncertainty, user-adaptive display, and multi-dimensional data visualization.

• Interdisciplinary Collaboration and Communication – Stakeholders ofbiodiversity and ecosystem data are growing in numbers and breadth. No longerare management decisions made solely by individuals or single agencies, butinvolve communities of individuals. This calls for the development of computer-supported cooperative work and remote collaboration research suited for partici-pants with widely varying roles, specialties, and training. It also provides oppor-tunities to study cross-domain mapping, data integration, data quality manage-ment, ontologies, and other knowledge representations.

• Data Management Guidelines – Biodiversity and ecosystem information isfrequently used in complex and potentially controversial political, economic,and environmental discussions and decision-making. Informatics issues arisingfrom this context include issues of data security, data sharing policies, intellec-tual property rights, quality assurance, and reuse of data. This provides impor-tant opportunities for research on data models for representing annotation andprovenance, explicit modeling of data product generation, and policy develop-ment and dissemination techniques.

Biodiversity and Ecosystem Informatics Research Agenda Implementation Plan

A concerted effort should be made to build a sustainable biological information infra-structure that proactively engages the broader CS/IT community in BDEI research. Thefollowing are among the specific actions that should be taken:

• Interdisciplinary Planning Groups – Interdisciplinary planning groups, compris-ing members of the biodiversity and ecosystem and CS/IT research communi-ties, should be established to articulate and communicate the specialinformatics challenges from one community to research actions in the othercommunity. These planning groups should identify existing CS/IT technologiesthat could be transferred from other domains, long-term basic CS/IT researchquestions, short-term CS/IT research that needs to be pursued, and infrastruc-ture needs including, equipment, facilities, networks, and personnel.

• Matching Research Needs with Available and Appropriate Mechanisms – Effortsto implement any research agenda item need preliminary study to determinehow these research actions could benefit from existing programs. These pro-grams include mechanisms for funding, partnerships, interdisciplinary trainingand teaming, and resource sharing. There also is a need to insure that new CS/ITresearch is effectively applied to real biodiversity and ecosystem test cases.

EXECUTIVE SUMMARY

vi

• Communicating the Research Agenda – Every effort should be made to commu-nicate the BDEI research agenda to an audience that includes researchers incomputer science and the biodiversity and ecosystem sciences, as well as themany agencies and foundations that support their efforts. Recommended actionsinclude developing extended workshops or seminars; building a multi-sector,multi-disciplinary community; developing interdisciplinary “matchmaking”mechanisms; adding a CS/IT component to existing biodiversity and ecosystemprojects; developing venues for multi-disciplinary activities; and promotingbiodiversity and ecosystem informatics through the dissemination of reports,publications, email distribution lists, and websites.

• Short-term Critical Actions that Require Immediate Attention – Several activi-ties should be immediately initiated to “jump start” BDEI research. Paramountamong these is an urgent plea from the scientific community for the formationof an NSF, USGS, NASA interagency strategic partnership to promote BDEIresearch. Within the next fiscal year, every effort should be made to launch ahigh-profile solicitation for cutting-edge research in this area. This activityshould highlight the urgency and importance of biodiversity and ecosysteminformatics problems and opportunities, and provide a forum for organizingproblem-specific, interdisciplinary consultative and investigative teams andpursuing problems relevant to the core missions of the sponsoring agencies.

Conclusion

A more complete consideration of these issues is presented in the report that follows.The workshop and report emphasize that the biodiversity and ecosystem sciences arefundamentally information sciences, and worthy of special attention from the computerscience and information technology community because of their distinctive attributes ofscale and socio-technical complexity. At almost every turn, scale, complexity, andurgency conspire to create a particularly wicked set of problems. Working on theseproblems will undoubtedly advance our understanding and use of information technolo-gies, and, even more important, give us the tools to protect and manage our naturalworld so as to provide a stable and prosperous future.

EXECUTIVE SUMMARY

vii

TABLE OF CONTENTS

Executive Summary ...................................................................................................................... i

Introduction ................................................................................................................................... 1

Background and Current Situation ............................................................................................... 2Workshop on Biodiversity and Ecosystem Informatics ................................................... 2The Nature of Biodiversity and Ecosystem Data – What Makes it Different? .............. 3Biodiversity and Ecosystem Informatics – Current Needs and Unique Challenges ..... 4

Case Studies ................................................................................................................................... 6Case Study 1: Forecasting Exotic Plant Invasions in Colorado and Utah ...................... 6Cast Study 2: Meeting Information Needs of Listed Columbia River Salmon for Biological Decision Analysis .................................................................... 8

Informatics Challenges ................................................................................................................. 10Data Integration ................................................................................................................. 10Information Intensity ........................................................................................................ 10New Instrumentation ........................................................................................................ 11Need to Identify and Record Species ................................................................................ 11Multiple Data Formats ...................................................................................................... 12Complexity of Biological Processes .................................................................................. 12Evolving Nature of Biological Sciences ............................................................................ 12Ecological Prediction ......................................................................................................... 12Political Mandates ............................................................................................................. 13Multiple Scales and Purposes............................................................................................ 13Limits on Resources .......................................................................................................... 13

Biodiversity and Ecosystem Informatics Research Agenda ........................................................ 14Focus Area 1: Acquisition and Conversion of Data and Metadata ................................ 14Focus Area 2: Analysis and Synthesis of Data and Metadata ........................................ 15Focus Area 3: Dissemination of Data and Metadata ...................................................... 17

Implementing the Research Agenda ............................................................................................ 18Interdisciplinary Planning Groups ................................................................................... 18Matching Research Needs with Available and Appropriate Mechanisms ..................... 18Short-Term Critical Actions that Require Immediate Attention................................... 19

Communicating the Research Agenda ......................................................................................... 20

Conclusion ..................................................................................................................................... 21

References and Related Reading ................................................................................................... 22

Appendices ..................................................................................................................................... 23A-I. National Biological Information Infrastructure ................................................... 23A-II. Global Biodiversity Information Facility ............................................................... 24A-III. National Ecological Observatory Network ........................................................... 25A-IV. Workshop Participants and Observers ................................................................... 26

INTRODUCTION

It’s Tuesday afternoon. Karen Culver has just been asked by her boss at the state fishand game agency to address a meeting of the Eagle River Watershed Council. Thecouncil is presenting a restoration plan for Silver Creek, with a particular goal ofimproving bull trout habitat. One aspect of the plan involves the removal of a smalldiversion channel that feeds an irrigation pond, with the expectation that this wouldimprove stream flows and lower water temperatures in the summer. The proposal is toreplace the irrigation water drawn from the creek using a pipeline or culvert fromanother nearby water source to the pond. Local landowners and members of the publichave been at odds about whether closing the existing channel would have much ben-efit, and what the adverse effects and costs might be for installing a culvert or a pipe-line.

Karen thinks she could compile the scientific information relating to theseissues in about six weeks. To do this, she needs topographic maps of the Silver Creekdrainage and surrounding area to identify possible sources for replacement irrigationwater and likely routings of a culvert or pipeline. She also needs to locate any recenthydrology studies and fish counts of Silver Creek. Karen must also check what theownership and land use is of areas that a culvert may cross. If she could retrieve theappropriate meteorological data, along with the topographic and hydrologic data,maybe her co-worker, Tom Hamilton, could run a simple model to project SilverCreek’s stream flow, water temperature, and downstream sedimentation after closingthe channel. With that information, she might be able to look for streams with similarcharacteristics and what bull trout populations they support. Then she would need toget those tables of numbers into a form understandable by everyone at the upcomingmeeting. Maybe she can go through agency archives to see if there are any historicalsurveys of the area before the channel was dug in 1932. She also wonders if there areany sensitive populations of other plant or animal species currently dwelling in thecreek or channel. It would be helpful to go into the field and examine the Silver Creekarea. But it’s 220 miles away in the southwestern corner of the state. Besides, thewatershed council meeting is Friday morning. Karen has three days, not six weeks.

Dave Maier, Eric Landis, Judy Cushing, Anne Frondorf,Avi Silberschatz, Mike Frame, and John L. Schnase (Editors)

Report of an NSF, USGS, NASA Workshop on Biodiversity andEcosystem Informatics held at NASA Goddard Space FightCenter, June 22-23, 2000.

Karen’s situation typifies problems thatarise in trying to answer management andscientific questions about species diversityand ecosystem health using the currentinformation infrastructure in this domain.Relevant information is difficult to locate

RESEARCH DIRECTIONS INBIODIVERSITY AND ECOSYSTEM INFORMATICS

(if it exists), and may be in a variety ofdigital and non-digital forms. Integratingthe information and putting it into a formsuitable for use with a specific analytictool usually involves intensive and timeconsuming human interactions with the

INTRODUCTION

1

data. Visualizations of datasets are noteasy to construct quickly. And the ques-tions are posed in a climate of increasingpublic scrutiny of agency decisions andconcern about species and ecosystempreservation. We will return to Karen laterin this report to see how this situationcould be improved with some of theadvances in biodiversity and ecosysteminformatics proposed here.

BACKGROUND ANDCURRENT SITUATION

Many agencies and organizations areinvolved in research on and the monitor-ing and management of our biodiversityand ecosystems. Together, these partnersand collaborators in federal, state, andlocal government agencies, academia, non-profit organizations, and private industryare concerned with developing a nationaldata and information infrastructure tobetter support the collection, manage-ment, dissemination, and application ofbiodiversity and ecosystem data andinformation. The vision of an enhanced21st-century biodiversity and ecosysteminfrastructure is described in the 1998Teaming with Life report of the President’sCommittee of Advisors on Science andTechnology (PCAST, 1998).

The PCAST report called for develop-ing the “next generation” of the NationalBiological Information Infrastructure.NBII-2 will not only support more effec-tive biodiversity and ecosystem sciencesand resource management, it will alsoprovide a rich set of new challenges for thecomputer science and information tech-nology (CS/IT) research and developmentcommunity. The challenge to both thesecommunities now is to work together toarticulate and pursue an interdisciplinaryresearch agenda that will advance CS/ITresearch into new and exciting directions,while helping to advance biodiversity andecosystem sciences and resource conserva-tion. Towards this end, the NationalScience Foundation, along with the USGeological Survey and the NationalAeronautics and Space Administrationsponsored a workshop entitled “ResearchDirections in Biodiversity and Ecosystem

Informatics” in June 2000. This workshopbrought together scientists representing abroad cross-section of CS/IT disciplinesand scientists and resource managers fromthe biodiversity and ecosystem commu-nity. (See Appendix A-IV for a list ofworkshop participants.)

The workshop’s main goal was toexamine the prospects for advancingcomputer science and information tech-nology (CS/IT) research by focusing on thecomplex and often unique challengesfound in the biodiversity and ecosystemdomain. We refer to this emerging, inter-disciplinary field of study as Biodiversityand Ecosystem Informatics (BDEI). Thisreport synthesizes the discussions andrecommendations made at the workshop.It itemizes current BDEI challenges, laysout a national BDEI research agenda, andrecommends actions to be taken withinthe national research agenda. It alsoproposes specific mechanisms to commu-nicate and implement those actions.

Workshop on Biodiversityand Ecosystem Informatics

The workshop was held June 22 and 23,2000. After initial, overview presentationson pertinent national and global-levelbiodiversity and ecosystem informationnetworks (see Appendices I-III), the groupheard two case study presentations thathelped provide a representative introduc-tion to the requirements, issues, andchallenges inherent in the biodiversity andecosystem sciences and resource manage-ment arena. Following the case studypresentations, the group began discussionsto identify and articulate the informaticschallenges of the biodiversity and ecosys-tem domain. Full group discussions on thedomain’s informatics challenges were thenfollowed by breakout group discussions onthe specific types of CS/IT research ques-tions that flow from the informaticschallenges. On the second day, the fullgroup discussed, further refined, andsynthesized the findings of the previousday’s breakout groups. The afternoonsessions focused on breakout group discus-sions of key aspects of implementing the

BACKGROUND

2

BDEI research agenda, including commu-nication, funding opportunities, andfacilitating interdisciplinary research. Theconcluding session focused on groupdiscussion of the most significant issuesand ideas raised during the workshop andrecommendations for short- and medium-term “next steps.”

The Nature of Biodiversity and EcosystemData – What Makes it Different?

Biodiversity and ecosystem data aredistinctive, and information managementchallenges in this domain are, in manyways, unique when compared to thosefound in other disciplines. Not only isthere a tremendous volume of stored data,it is collected, archived and disseminatedin every conceivable scale, format, andlocation, and represents multiple points orranges of time and space. The require-ments of those accessing biodiversity andecosystem data are extraordinarily diverse.Users demand access to biodiversity andecosystem information through placenames, latitude and longitude, speciesnames, author, time, and content. Notonly is it presented in a variety of formats,scales, etc., there are also non-biologicaldata important to biodiversity and ecosys-tem questions, such as data about climate,geology, topography, land use and owner-ship, political boundaries, roads and othercultural entities, and population densities,to name a few.

As in most biological and earth sciencedomains, location is central. Questions ofdiversity are almost always couched interms of some area or place name. Thusmuch biodiversity and ecosystem data isgeoreferenced—it is tied to some place onthe globe. Sometimes the designation of alocation can be ambiguous or imprecise,especially with observations and samplestaken in previous centuries. A designationof location at a country or even a conti-nent level is not unheard of in naturalhistory collections. Current methods relyon better maps, remote imagery, and GPSsystems and produce datasets having highaccuracy. As a result, something as centralto the science as a means for spatial

referencing is thus a complex issue.

Biodiversity and ecosystem data arealso set apart because much of it is spe-cies-referenced. Genetic data is frequentlyassociated with a species or sub-species,invasions and extinctions are tracked atthe species level, and much of the charac-terization of an ecosystem is describedthrough the number and distribution of itsconstituent species. However, unlikecoordinate systems for geographic loca-tions, which are universal, consistent, andrelatively unchanging, the naming ofspecies is incomplete, subject to localvariation, and will change with time. Mostspecies on the planet have not yet beennamed and classified, and there is noauthoritative listing of all the knownspecies names (though efforts in thisdirection are underway [See Appendix A-II:Global Biodiversity Information Facility]).Further, the mapping of names to organ-isms is an abstract process and evolvesover time. It is possible that two or morenames for the same species have beenassigned by different scientists, or that thesame name has been applied to what turnsout to be two distinct species. (Physicaltype specimens are key to resolving suchnaming ambiguities.) Even if a completeand consistent lexicon of species names isproduced, there are centuries of observa-tions, specimens, and publications usingthe taxonomic schemes of their historicplaces and times.

To appreciate the problem with spe-cies-referenced data, consider the follow-ing analogy. Imagine dealing with observa-tions that are located relative to maps by15th century cartographers. These mapsare incomplete, and they disagree onnames of particular places. Or, where theyagree on place names, they might disagreeon the location of the place. They mighteven contain places that do not actuallyexist. Looking at these maps, it mightseem that the Earth’s features are changingover time, though generally these changesreflect improved knowledge of what isactually there.

The naming of species is not the onlyaspect of biological data where classifica-

BACKGROUND

3

tion schemes and terminology are ambigu-ous. Other characteristics, such as groundcover, vegetation type, and soil type, haveclassifications and terms that may varyover time, place, disciplines, institutions,and individuals. Any effective system formanaging and querying biodiversity andecosystem data must deal directly withchanging classification schemes andvarying terms and definitions rather thanignoring the problem. The ambiguities interms and changes in classification mustbe explicitly modeled, so, for example, it ispossible to access old data using currentterminology, or trace information about agiven category across time and place.

Another important aspect is that manykey biodiversity and ecosystem questionsinvolve flux—changes in range, numbers,distribution, genetics, and proportionsover time. Extinctions, migrations, incur-sions, restorations, predicted environmentimpacts are all issues of flux. However,seldom does one dataset span enough time,area, or include enough species to answer aspecific question by itself. When didexogenous species of shellfish first appearin the Great Lakes? Is the population ofsalmon in this river system becomingmore or less genetically diverse? What arethe long-term ecological consequences ofexotic species invasions? The informationto answer such questions must often bepieced together from many sources.Scientists require that biodiversity andecosystem data be assembled into timesequences of comparable datasets, realiz-ing that the component datasets may havebeen compiled for quite different purposes.

Unlike many disciplines, such asphysics or medicine, that primarily rely oncurrent or recent data, historical informa-tion serves prominently in the work ofbiodiversity and ecosystem scientists.Examples include plant and animal speci-mens and their labels, publications (somedating back 250 years), field notebooks andobservation files (often kept by hand), andmaps. The study of biodiversity andecosystems requires the analysis of trends,adaptations, and long-term relationships.Therefore these historical data are often aspertinent as contemporary data. Unfortu-

nately, many of the historical informationsources are not yet in digital form. Forexample, over 750 million natural historyspecimens and their accompanyingmetadata remain to be digitized in the USalone.

Another strong requirement for storingand manipulating biodiversity and ecosys-tem data not found in other disciplines isthe need to deal at small scales over a largearea or extended period of time. Manysignificant situations will be lost if stan-dard methods for moving to larger scalesare used, such as elision of features belowa certain size. For example, the knowninfestation of kudzu in the western UnitedStates consists of two plots of 1/4 and 1/8acre, out of over a million square miles ofland area. Riparian zones are often path-ways for invasion of species, but areessentially one-dimensional features in atwo-dimensional terrain. Small areas witha differing ecosystem, such as aspen grovesin meadows and grasslands, may sustain adisproportionate number of the distinctspecies in a larger region. Users ofbiodiversity and ecosystem data need to beable to preserve selective portions of adataset or map at higher detail than thatrequired for the dataset or map as a whole,particularly when moving to a larger scale.

It is important to realize that thehistorical and current record forbiodiversity and ecosystem data will neverbe as complete as desired. Some researchquestion or management decision willalways benefit from a dataset over a largerarea, or a sampling regimen at a smallergranularity or at shorter or longer timeintervals, or a complete census of a speciesin an area versus a sample. The challengeto meet these new requirements forbiodiversity and ecosystem informationproviders is grand, but with the right toolsand processes, achievable.

Biodiversity and Ecosystem Informatics –Current Needs and Unique Challenges

We are confronted by both opportunitiesand challenges in the development ofbiodiversity and ecosystem informatics.

CURRENT SITUATION

4

Our greatest opportunity is that enhance-ment of the information infrastructure forbiodiversity and ecosystems will allow usto address fundamentally new types ofquestions that span levels of biologicalorganization. How do ecological factorsaffect diversity at different levels ofphylogenetic relatedness? Which genesare held in common across taxa inhabit-ing streams, and which are unique? Whatare the biotic influences on climatechange or the carbon cycle? To answerthese (and other) questions, informationsystems need to make the connectionsbetween taxa, specimens, phylogenies,geographic locations, and environmentalfactors. Specific challenges include thefollowing:

Varying methodologies, measure-ments, taxonomies, and questions –Traditional information systems (typi-cally designed for business) are based onassumptions that may not apply tobiodiversity and ecosystem data. Tradi-tional applications emphasize dataintegrity and internal consistency, whilebiodiversity and ecosystem data, bynature, often contain inconsistent obser-vations driven by differences in method-ologies, measurement precision, or evendifferent taxonomic information collectedin the past relative to current (and future)taxonomic usages. Traditional applica-tions also emphasize the creation of“standard” reports, while biodiversity andecosystem applications need to be adapt-able to new types of questions. Adaptingor altering traditional software designs tomeet the needs of scientists is a majorchallenge for biodiversity and ecosysteminformatics.

Multiple spatial and temporal scalesin a vast volume – Biodiversity andecosystem applications need to deal witha variety of temporal and spatial scales,from the microsecond to the millenniumand from the gene to the biosphere. Forinstance, managers and researchers oftenrequire historical data to survey relation-ships and trends, and to estimate thecosts and benefits associated withplanned management or policy actions.These systems also need to deal effec-

tively with immense quantities of data,much of it (such as data on museumspecimens or species descriptions) notcurrently in digital form. Where digitalinformation exists, it is often distributedamong systems that may not beinteroperable.

Breadth of participation – Many impor-tant biodiversity and ecosystem observa-tions are made by non-scientists. Amateurbirders, community planners, students,gardeners, hikers, natural history enthusi-asts, and others often create potentiallyuseful datasets. There is a need to developtools for the community construction ofbiodiversity and ecosystem information,and create the mechanisms that allow usto establish relationships across all levelsof biological data and organization withina distributed information infrastructure.The resulting applications will need toaccommodate and evaluate differences indata quality in this formal and informalinfrastructure, and be flexible enough toprovide multiple views of the underlyingdata — including alternative phylogeniesand taxonomies, georeferencing systems,and biodiversity and ecosystem models.

CURRENT SITUATION

5

CASE STUDIES

The following two case studies werepresented at the workshop and illustratetypical informatics problems facingbiodiversity and ecosystem researchers andresource managers.

CASE STUDY 1: Forecasting Exotic PlantInvasions In Colorado and Utah

IntroductionInvasive species cost the United States ofAmerica an estimated $137 billion peryear—with $29 billion per year in croplosses alone (Pimentel, 1999). Invasiveplant species not only compete with crops,they compete with native plant species,attract pollinators, alter nutrient cyclingand fire frequency, and can cause theextirpation or extinction of native plantspecies (Mack, 2000). Examples of invasivespecies include Chestnut blight and Dutchelm disease in northeastern forests andparklands; yellow starthistle, Europeanwild oats, water hyacinth, and white pineblister rust in California; tamarisk, andAfrican lovegrass in the southwest;cheatgrass, smooth cordgrass, hydrilla, andwhite pine blister rust in the northwest;purple loosestrife and Kentucky bluegrassin the Midwest; and kudzu, water hya-cinth, and Brazilian pepper in the south-east. Hundreds of invasive species affectHawaii and Pacific territories. Landmanagement agencies recognize theseproblems and have listed invasive plantspecies as top priorities for research andresource management activities.

Invasion of exotic plant species createsa formidable stress on natural ecosystems(Stohlgren et al., 1999a,b). Degradation ofhabitats resulting from exotic invasion canbe assessed by quantifying native andexotic plant diversity at multiple scales.Experience from research projects onFederal and non-Federal lands has empha-sized multi-scale, statistically robustsampling, both in the field and in thelaboratory using remote sensing, GISanalysis, statistics, and modeling(Stohlgren et al., 1998, 1999a,b). Because of

the widespread use of these methods,comparability of field data among agenciesallows for local, regional, and nationalsyntheses and improved predictive model-ing capabilities. We are equally concernedthat the past and future spread of invasivespecies affects carbon storage, nitrogencycling, and fire dynamics in many ecosys-tems.

The predictive modeling challenge inassessing exotic plant invasions is stillimmense and forecasting is difficult.Various species are intentionally andunintentionally introduced into theUnited States each year. Some are metwith inhospitable sites, strong competi-tors, or other biotic constraints, but otherspecies find a new home, and we have anincomplete understanding of what makes ahabitat vulnerable to invasion. Likewise,the attributes of highly invasive speciesare not well documented or understood.However, we can look to the landscape forclear direction. Several studies have shownthat exotic plant species have invaded hotspots of native plant species richness(Stohlgren et al., 1997, 1998, 1999). Ourgoals should be to: (1) accurately describecurrent locations and physical environ-ment for many invasive plant species; (2)predict potential habitats of invasivespecies; (3) forecast rates of spread forselected exotic plant species; and (4)evaluate the effects of invasive alienspecies on ecosystem structure and func-tion to maintain native biodiversity andnatural ecological processes.

Current CapabilitiesLots of vegetation plot data exist on nativeand exotic plant species richness and foliarcover throughout Colorado and Utah.Colorado datasets include over 400 plotsfrom The Colorado Natural HeritageProgram, over 200 plots from the USGeological Survey, and over 100 plots fromthe USDA Forest Service’s Forest HealthMonitoring Program and other sites. Manyof the datasets contain information onsoils (texture, nitrogen, carbon), and can belinked to specific vegetation types. InUtah, over 150 plots have been estab-lished. Remote sensing data are available(Landsat TM imagery) and are

CASE STUDIES

6

georeferenced for some portions of theareas (e.g., Rocky Mountain National Parkin Colorado and Grand Staircase-EscalanteNational Monument in Utah). Spatialanalysis algorithms have been very suc-cessful in early tests (Kalkhan et al., 1999).

We have successfully used this land-scape analysis approach to address re-source issues in Rocky Mountain NationalPark, Colorado, and to assess weed inva-sions in the Central US (Stohlgren et al.,1998b, 1999a). Specifically, we are quanti-fying the effects of elk grazing on plantdiversity, identifying areas of high orunique plant diversity needing increasedprotection, and evaluating the patterns ofnon-native plant species on the landscape.This same approach is ideally suited forother areas. It relies on new multi-scalesampling methods that have been exten-sively peer-reviewed in the scientificliterature (Stohlgren et al., 1995, 1997a,b,c,Stohlgren et al., 1998a,b, 1999a,b, Kalkhanet al., 1998).

Current Constraints and Hurdles toOvercomeThere are however, several constraints andlimitations to forecasting exotic speciesinvasions. First, existing computer capa-bilities are woefully inadequate. Moderate-resolution spatial model simulations takea week to 10 days on a Sun Workstationfor each dependent variable. Coarse-resolution models (degrading the resolu-tion of the remotely sensed data by in-

creasing the grid size take 3 days, but arefar less informative. Second, high-resolu-tion remotely sensed data (side-lookingradar, high-resolution digital elevationmodels, etc.) are too cumbersome for ourexisting computer capabilities for regional,much less statewide, scales.

Species richness and cover data mustbe linked to productivity, carbon storage(vegetation and soils), and rapid vegetationchange at local, landscape, and regionalscales. This capability will require multi-scale and multi-phase sampling of severalspecies at many sites along environmentalgradients. High-performance computercapabilities are needed to integrate high-resolution remotely sensed data withdetailed field data on vegetation, soils, andtopography to develop “real time” simula-tions and forecasts of invasive plantspecies in Colorado or Utah as new dataarrive (as new vegetation plots are estab-lished).

ReferencesMack, R.N., D. Simberloff, W.M. Lonsdale, H. Evans, M. Clout, F. Bazzaz. 2000. Biotic invasions: causes, epidemiology, global conse-

quences, and control. Ecological Society of America, Issues in Ecology 5.Pimentel, D., L. Lach, R. Zuniga, and D. Morrison. 1999. Environmental and economic costs associated with non-indigenous species in

the United States. (http://www.news.cornell.edu/releases/Jan99/species_costs.html)Stohlgren, T.J., M.B. Falkner, and L.D. Schell. 1995. A modified-Whittaker nested vegetation sampling method. Vegetation 117: 113-121.Stohlgren, T.J., G.W. Chong, M.A. Kalkhan, and L.D. Schell. 1997a. Rapid assessment of plant diversity patterns: A methodology for

landscapes. Environmental Monitoring and Assessment 48: 25-43.Stohlgren, T.J., M.B. Coughenour, G.W. Chong, D. Binkley, M.A. Kalkhan, L.D. Schell, D. Buckley, and J. Berry. 1997b. Landscape

analysis of plant diversity. Landscape Ecology 12: 155-170.Stohlgren, T.J., G.W. Chong, M.A. Kalkhan, and L.D. Schell. 1997c. Multi-scale sampling of plant diversity: Effects of the minimum

mapping unit. Ecological Applications 7: 1064-1074.Stohlgren, T.J., K.A. Bull, and Y. Otsuki Y. 1998a. Comparison of rangeland sampling techniques in the central grasslands. Journal of

Range Management 51:164-172.Stohlgren, T.J., K.A. Bull, Y. Otsuki, C. Villa, and M. Lee. 1998b. Riparian zones as havens for exotic plant species. Plant Ecology 138:

113-125.Stohlgren, T.J., D. Binkley, G.W. Chong, M.A. Kalkhan, L.D. Schell, K.A. Bull, Y. Otsuki, G. Newman, M. Bashkin, and Y. Son. 1999a.

Exotic plant species invade hot spots of native plant diversity. Ecological Monographs 69: 25-46.Stohlgren, T.J., L.D. Schell, and B. Vanden Heuvel. 1999b. Effects of grazing and soil quality on native and exotic plant diversity in

Rocky Mountain grasslands. Ecological Applications 9: 45-64.Westbrooks, R. 1998. Invasive plants, changing the landscape of America: Fact book, Federal Interagency Committee for the Manage-

ment of Noxious and Exotic Weeds (FICNMEW), Washington, D.C. 109 pp.

Key Informatics Issues:

1. Need for faster algorithms and more effective geospatial models.2. New modeling techniques that incorporate spatial analyses.3. Uncertainty maps of invasive species distributions must be constructed.4. Real-time field data updates of existing models needed.5. Better use of remote sensing, ground truthing, and stratification.6. Augmentation of current sampling methods for multi-scale plots at multiple slopes, aspects and elevations.

CASE STUDIES

7

CASE STUDY 2: Meeting InformationNeeds of Listed Columbia River Salmonfor Biological Decision Analysis

IntroductionTwelve species of Columbia River Basinsalmon have been listed under the Endan-gered Species Act. As a result, numerousmanagement decisions, on many temporaland spatial scales, are made with thepurpose of protecting and restoring salmonpopulations in the Columbia River and itstributaries. Managing the data, informa-tion, and analyses associated with thesecomplex and inter-related decisions hasbecome a monumental challenge. A keycomponent of this challenge is to providedecision-makers with a decision supportsystem that is grounded in a probabilisticdecision analysis approach.

The Columbia River Basin includesfour states and two Canadian provinces.The basin has a complex network ofmainstream hydroelectric dams, numeroussmaller diversion dams, and many waterwithdrawal systems for agriculturalirrigation. Management of the ColumbiaRiver relies on many sources of data andforecasting models, as does the manage-ment of salmon migration through theriver. Water management decisions mustbalance the needs for salmon recovery,sturgeon recovery, and bull trout recoverywhile maximizing power production, floodcontrol, and irrigation needs. The life cycleand ocean migration patterns of salmondictate that data need to be collected on acoastwide and basinwide scale. Anthropo-genic factors influencing salmon include:agriculture and urbanization affecting thespawning and freshwater life stages,hydroelectric dams and water storage damsaffecting the juvenile and adult migrationcorridor, and, for chinook and coho,harvesting activities that take place in theocean from California to Alaska. In addi-tion, there are numerous river fisherieswithin the Columbia River system whoseactivities affect all salmon species.

Types of Data CollectedA considerable amount of data has beencollected and archived since the listing ofvarious salmon species under the Endan-gered Species Act. For juvenile salmon, thedata collected include:

• tributary trap counts of smolts,• release numbers of tagged smolts,• release numbers of hatchery smolts

(tagged and untagged) by species,• dam counts, and• recovery of tagged juvenile salmon

at dams.

For adult salmon the types of data col-lected include:

• counts by species at dams andhatcheries,

• tagged salmon recoveries at hatcheries,• catch estimates from river fisheries

(tagged and untagged),• fishing “effort” data for river and

ocean fisheries,• tributary spawning escapement counts

from nests, live fish, and carcasses, and• recovery of tagged salmon from

spawning grounds.

Given the salmon’s migratory pattern (seeFigure 1), these data must be integratedinto a “life cycle recovery analysis” toprovide decision makers with sufficientunderstanding of the possible affects of anymanagement action, be it on a local,regional, or international scale.

The Need – Biological Decision AnalysisDecision makers require a process toevaluate the various combinations ofmanagement actions that will identify thelikelihood of salmon recovery and therisks associated with those actions, andthis complex process should be facilitatedwith a state-of-the-art, information-richdecision support system. Such a biologicaldecision analysis system would assist inevaluating management actions for salmonharvests, hatchery obligations, hydro-

CASE STUDIES

8

electric decisions, including dam breach-ing, and large-scale land managementagreements that affect fresh water habitat.Specifically, the objectives of a biologicaldecision analysis system would includethe ability to:

• evaluate the evidence for differenthypotheses about how environmentalfactors influence survival,

• forecast the likelihood of survival andrecovery resulting from differentmanagement actions, and

• identify management options withthe least risk.

Figure 1. A time series of juvenile and adult data integrated into life-cycle recovery analysisdemonstrates various management actions across habitat, hydro, harvest, and hatcheries.

Key Informatics Issues:

1. Multiple, disparate data sources.2. Requirement for interagency collaboration.3. Massive volume of data representing wide spatio-temporal variation.4. Need for computational modeling.

CASE STUDIES

9

INFORMATICS CHALLENGES

These case studies illustrate some of theinformatics challenges in the biodiversityand ecosystem domain. Here we elaborateon some of these issues, based on discus-sions at the workshop.

Data Integration

Biodiversity and ecosystem datasets arecommonly maintained in monolithic,stand-alone systems that are developed inresponse to specific issues, projects, ormandates. These datasets and systemswere designed independent of one anotherand frequently maintain their own stan-dards for vocabulary, metadata, formats,scale, syntax, and access. As a result,biodiversity and ecosystem datasets areseldom effectively integrated with oneanother for purposes of access, synthesis,or distribution. True and meaningfulintegration of data is of particular signifi-cance in this domain, since the very natureof the ecological sciences is integrative,and most of the systems and processes weseek to understand comprise profoundlycomplex biotic and abiotic interactions.The science community thus faces anenormous challenge in locating, analyzing,and synthesizing existing data into usefulknowledge for research, management, orpolicy-making. To compound the problem,the number of biodiversity- and ecosys-tem-related datasets is increasing at a rapidpace as resource agencies, universities, andconservation organizations respond tosociety’s legal and social needs forbiodiversity and ecosystem research andinformation. Numerous impediments tolinking datasets exist. These impedimentsinclude data structure, semantic, andorganizational challenges.

Data structure challenges – Severaldataset-level metadata standards andnumerous data formats are in use by datacollectors. The use of multiple standardsand formats (e.g. text, GIS, spreadsheet,database, video, audio, etc.) is problematicfor those searching and correlating orintegrating multiple datasets. While someimprovements in the use of data and

metadata standards have occurred (e.g.expanded use of the Federal GeographicData Committee [FGDC] standard), manybiologists continue to use their own (or no)data and metadata formats.

Semantic challenges – Successful dataconnectivity requires more than physicalconnectivity via technological solutions.Biologists are faced with a wide array ofsemantic options for identifying species,actions, land features, etc. Many biologistsdevelop their own controlled vocabulariesfor project or organizational use and othersmay use recognized thesauri. Attempts tocompare data or research results acrossdatasets with differing controlled vocabu-laries can lead to missed or misinterpreteddata.

Organizational challenges – Multiple-agency information and data sharingseldom occur, in part due to differingmandates, locations, funding, and person-nel. Interagency efforts to develop com-mon standards or protocols for collecting,archiving, and maintaining biodiversityand ecosystem data on common issues orquestions are relatively uncommon. Often,data are collected and archived using theresources at hand in a given office ororganization. For example, biologists maystore collected data in Microsoft Excel as itis available to them and they know it,rather than in a more powerful environ-ment, such as a database managementsystem.

Information Intensity

The complexity of the Earth’s biodiversityand its ecosystems is reflected in themassive amount of biological data that hasbeen generated and archived for public use.These data represent the nation’s biologi-cal legacy and include every level of detailfrom genes to ecosystems. New technolo-gies in remote sensing further expand the“ecosystem catalogue” with terabytes ofsatellite imagery. With the advent of theInternet and World Wide Web, scientists,resource managers, teachers, and students,conservationists, lawmakers, and thegeneral public have all begun to access and


10

utilize this information at an increasingrate for an incalculable number of applica-tions, 24 hours a day, 365 days a year.

Unlike with many other disciplines,biodiversity and ecosystem data areseldom outdated — information on speciesand their interactions with environmentalfactors collected in the eighteenth centurycontinue to be relevant in the twenty-firstcentury. The collection and archiving ofnew biological data—to add to existingknowledge—will continue at an increasingrate as more efficient data collection toolsare developed and new demands for infor-mation are expressed.

New Instrumentation

New instrumentation is needed toimprove biodiversity and ecosysteminformatics from the data collectionprocess to analysis and dissemination ofthe resulting information and knowl-edge. Highlighted throughout the work-shop was the need for easy-to-usehandheld field instruments with remotedata transfer capabilities to enhance insitu discovery. Determining the bestmeans of applying new space-basedremote sensing instruments, such aslaser- and radar-based sensors,hyperspectral sensors, etc., will berequired. A need was also noted for toolsto analyze and synthesize collected dataand deliver the results of that data inpresentable and understandable formatsfor decision-makers.

Finally, high priority was placed ondeveloping improved scanning tools formetrically and chromatically accurateimages of physical specimens. It wasnoted that millions of physical speci-mens are currently residing in an uncer-tain state at numerous labs and muse-ums throughout the United States.These specimens, some of endangered orextinct species, are subject to naturaldegradation, and physical access to themby researchers is limited. Without digitalarchiving of these specimens the oppor-tunity for advanced research and biologi-cal understanding will, at the least, be

limited to the few who have access, andin the worst scenario, be lost. A sense ofurgency was expressed for research intothe development of these scanning tools.

Need to Identify and Record Species

Biologists have identified approximately1.8 million living species of organisms,but vast numbers of remain to bediscovered. The grand total for all life iscurrently estimated to be between 10and 100 millions species. However, lessthan one-third of species that occur inthe US have been discovered and thepercentage is much lower for other partsof the world. We are in the midst of thesixth major extinction event of theplanet’s history, this one primarily theresult of human modifications to theenvironment (PCAST, 1998). Given thiscontext, it is disturbing to realize thatspecies discovery is still largely amanual process requiring fieldwork andmonths or years of laboratory analysisand publishing activities. Current workpractices — largely unchanged over thepast two centuries — are unable to keeppace with rate of habitat destruction andspecies loss. If we ever hope to fathomthe Earth’s biodiversity, the biodiversityand ecosystem enterprise must reinventitself — develop wholly new approachesto dealing with global-scale problems ina rapidly changing, information age.

Herein lie some of the mostinteresting informatics researchchallenges. These challenges cut acrosssome of the most important researchthemes in computer science today, suchas collaboration-in-the-large,instrumented (species) discovery, andcomputational exploration. Thedominant mode of discovery for thebiodiversity and ecosystem enterprisewill increasingly become collaborativeand model- and computation-driven. Theresearch opportunities here areunlimited.


11

Multiple Data Formats

Biodiversity and ecosystem informationexists in thousands of independent, indi-vidual and institutional databases, and inlaboratory and personal field journalsscattered throughout the country. Thedetermination of which format any par-ticular dataset is stored in is a function ofseveral variables, including the availabilityand understanding of data collection andarchiving software, the purpose for whichthe data are used, time constraints of thedata collecting organization, and theamount of funding available for purchasingsoftware and conducting staff training. Theavailability and use of multiple formats forstoring biodiversity and ecosystem datawill, in all likelihood, continue to occur.The CS/IT challenge is to ensure maxi-mum compatibility and comparability ofdata across formats, while not constrain-ing data collection, archiving, conversion,and distribution efforts.

Complexity of Biological Process

Knowledge about biodiversity and ecosys-tems is vast and complex. The complexityarises primarily from two sources. Thefirst is the underlying biological complex-ity of the organisms themselves. There aremillions of species, each of which is highlyvariable across individual organisms,populations, and time. Species havecomplex chemistries, physiologies, devel-opmental cycles, and behaviors resultingfrom more than three billion years ofevolution. There are hundreds, if notthousands, of ecosystems, each comprisingcomplex interactions among large num-bers of species and multiple abiotic factors.There is an immediate need to describeand model ecosystem processes andbiological systems in a machine-processable form through computationalmodeling. The second source of complex-ity is sociologically generated. The socio-logical complexity includes problems ofcommunication and coordination—amongagencies, among divergent interests, andamong groups of people from differentregions and different backgrounds, such asacademia, industry, and government—all

having different views and requirements.

Evolving Nature of Biological Sciences

Early biologists were principally con-cerned with species identification,descriptions, and range. Today’s biolo-gists deal with the complexity of bio-lifecycles. For example, the salmon casestudy presented above identified thebroad spatial and temporal scale thatfisheries biologists must deal with indeveloping management plans for theColumbia River Basin. Within any givenspatial-temporal scale, factors influenc-ing salmon life cycles may includenumerous biological and non-biologicalevents and processes for which thebiologist must find information in orderto help support those making manage-ment decisions. These events andprocesses may include climate, geology,fire, hydropower development, land use,vegetation, water quality, exotic species,harvests, and much more. Locating,retrieving, and analyzing such diversedata requires interdisciplinary collabora-tion among all data providers and datausers. A single project might requireparticipation of scientists representingseveral agencies or offices. Adding to thecomplexity of data needs, biologists anddecision-makers often seek these data formultiple years to better understand anddescribe natural and anthropogenictrends.

Ecological Prediction

Biodiversity and ecosystem sciencesrequire extensive knowledge and under-standing of past events, interrelationships,and outcomes in order to gain new scien-tific understanding and predict futureoutcomes. The biologist is constantlychallenged to predict results, risks, andtimelines for proposed actions. As newdiscoveries and information accrue,adjustments in research or managementstrategy are required—in essence, anongoing adaptive process. Computationalmodels that analyze known information,incorporate new discoveries, and predict


12

interactions and change under variousconditions are needed.

These new models must take intoaccount the complex and adaptive quali-ties of ecosystems as well as the spatialand temporal aspects of biological data.The models must include tools for auto-mated data and information updating,building “what if” scenarios, developingvisualization of predicted outcomes overtime, and computing confidence or uncer-tainty characteristics.

Political Mandates

Public natural resource agencies mustobserve numerous mandates and directivesthat, among other things, dictate the typesof data they collect and how they distrib-ute those data. For example, in some cases,the Endangered Species Act influenceswhat data are collected and how they arecollected. The Freedom of Information Actoutlines how data are to be distributed,including who has access to what data.Other directives influence monitoringrequirements as well as quality assuranceactivities. Some directives require docu-ments to be developed for, and open to,public comment. This requirement has abearing on content, format, scale, scope,and other factors affecting the data man-agement process.

Multiple Scales and Purposes

Multiple users access data for multiplepurposes. The users of a particular datasetcan conceivably include everyone fromelementary school pupils to Nobel Laure-ates. Their use of the data can vary fromshowing the range of a particular speciesto investigating the effects an invasivespecies has on the forest canopy or theeconomy in a given region. This widerange of users of, and uses for, data pre-sents a significant challenge forbiodiversity and ecosystem informatics.How can a single dataset serve such a widerange of interests and needs? To avoidrecollecting data for different users, asingle dataset must meet the need of the

most detailed users while not losinginformation or accuracy at smaller scalesfor the more general user.

Limits on Resources

As new demands are placed on publicresource agencies to collect, archive,maintain, and provide biological data,additional burdens are placed on theagencies’ physical, financial, and humanresources. This situation can create abottleneck in the flow of data and threat-ens the nation’s capacity to record, main-tain, and make accessible its biologicalrecords. Biodiversity and ecosysteminformatics research can assist by develop-ing new technologies and standard pro-cesses that meet resource agencies budgetsof time, expertise, and funding.


13

BIODIVERSITY AND ECOSYSTEMINFORMATICS RESEARCH AGENDA

Based on the informatics challenges asidentified through the case studies anddiscussions between biologists and com-puter scientists at the workshop, partici-pants recommended 13 activities thatshould be included in a national BDEIresearch agenda. These activities can begrouped under three topics: acquisitionand conversion of data and metadata,analysis and synthesis of data andmetadata, and dissemination of data.Identification of specific CS/IT researchopportunities follows each recommenda-tion.

FOCUS AREA 1: Acquisition and Conver-sion of Data and Metadata

Modernizing the Biological Library – Theaccumulated volume of biological infor-mation and data collected over the past250 years is massive. Unlike some disci-plines, biological information is never outof date and will be required into theforeseeable future. Improving methods fororganizing, storing and retrieving theserecords is extremely critical. New tech-niques and tools must be researched anddeveloped for:

• digitizing the existing corpus ofscholarly work related to biodiversityand ecosystems on a large scale,

• simultaneous acquisition of biologicaldata from multiple sources includingagencies, research stations, andindividual scientists;

• correlating data from different formatsincluding GIS, tabular, and text;

• storing massive amounts of complexbiological data and information,including petabytes of high-resolutionsatellite imagery;

• handling variation in spatial andtemporal scale across datasets;

• providing centralized and high-speedaccess to data and metadata;

• automating the capture of metadatafrom archived data and information;

• managing volume, quality, andversioning of data; and

• integrating data through essentialattributes such as georeferences, URLs,and taxonomy.

CS/IT opportunities: Testing techniquesfor information extraction, text under-standing, and cross-lingual informationretrieval. Non-business application do-main for data integration, data cleansing,data warehousing, and archiving research.Evaluation of temporal data models,cross-media linkage methods, and multi-scale representations.

Digitizing the Biological Legacy –America’s museums and laboratoriesmaintain at least 750 million biologicalspecimens. There is an urgent need toconvert them, their documentation, andnew specimens into metric-quality digitalformats. Many scientists would benefitfrom gaining network access to accuratedigital representations of these specimens.Research needs to be conducted to:

• develop new holographic scannersof high-level accuracy,

• refine virtual reality andthree-dimensional modeling, and

• improve the processes and techniquesfor labeling specimens withannotations, collection methods,taxonomic data, and other relevantmetadata.

CS/IT opportunities: Test domain forimage processing, video sensing, losslessimage compression, and 3-D image under-standing technology. A possible applica-tion for advanced robotic manipulationcapabilities.

Multi-dimensional Observation andRecording – Efforts are needed to enablethe collection of detailed informationabout the Earth’s surface in multipledimensions and at multiple scales. Toaccomplish this, workshop participantsrecommended research to:

• develop new remote sensing methods,particularly for species identification,habitat characterization, andbiodiversity “hot spot” identification;

RESEARCH AGENDA

14

• enable the integration of data fromdisparate sensors; and

• store map information in fourdimensions for complete spatial andtemporal analysis.

CS/IT opportunities: Scaling sensor-fusion techniques to large fields. Testingtemporal-spatial data access methods.

Mobile Computing – New instrumenta-tion is needed to bring knowledge to thefield and to collect, store, and transmitdata from the field. Specific researchactivities include:

• developing techniques and tools forin situ access of remote data storage ofspecies and ecosystem characteristics,

• methodologies for extracting subsets ofdata for regionalization of species andenvironmental attributes, and

• designing portable multi-media datarecorders.

CS/IT opportunities: Application ofhuman-computer interaction research inaugmented reality, multi-model inter-faces, hands-free systems, wearablecomputers, and remote presence. Possiblearea for using adaptive communicationtechniques for delivery of multi-mediainformation and robotics and humanaugmentation.

Taxonomic Freedom – Changes in, andadaptations of, taxonomic names andclassification schemes over time anddiscipline present challenges to biologistsin recording and locating relevant dataand information. The creation of new,and sometimes independent and concur-rent, taxonomic classifications willcontinue. There is a need to integratevarious interpretations, views, andversions of taxonomic data and make itavailable in a simple, easy-to-understandformat. Tools are needed to:

• perform automated associations acrossdifferent taxonomic schemes,

• display and browse multiple taxonomicclassification systems simultaneously,

and• support the option of reorganizing

taxonomic classifications to meet newfindings and systematic knowledge.

CS/IT opportunities: Examining theflexibility and robustness for knowledgerepresentation systems, particularly theirtemporal and versioning aspects and theirsupport for cross-ontology linking andtranslation.

FOCUS AREA 2: Analysis and Synthesis ofData and Metadata

Comparing Across Scales – Biological datafrom different sources and times arefrequently collected and presented indifferent scales and resolutions resulting ina loss of detail when multiple datasets arerequired for data synthesis and analysis.Tools and procedures to facilitate compari-son and analysis across scales whileretaining important details are needed.Recommended research activities includedeveloping:

• methods of specifying what data topresent at various levels of scale,

• techniques and policies for data“averaging” at various levels of scale,and

• processes for extrapolating betweenspecific ecosystem attributes and wholeecosystems without compromisingconclusions.

CS/IT opportunities: Testing adaptive- andmulti-resolution techniques for computa-tion and data modeling.

Modern Modeling – Researchers, managers,and policy-makers require models forbiological decision-making rather thansimply studying disaggregated collections ofdata or facts. Further, because of the com-plexity of biological processes and interac-tions, questions are frequently “fuzzy” innature. Improved spatio-temporal modelingof biological resources, and biological andsocial processes, is required. Identifiedcomputer science research activities formodeling include investigating:

RESEARCH AGENDA

15

• high-performance modeling andsimulation development,

• methods and techniques forincorporating data into models,

• how to define model parameters anddevelop automated parameter inductionwithin models, and

• standards and methods for validatingbioscience and social models.

CS/IT opportunities: A fertile area fortesting developments in data assimilationresearch, including 4-D assimilation,continuous assimilation, variationalassimilation, and multi-model assimila-tion.

Taxonomic Retooling – Taxonomists neednew and improved tools for naming anddefining new species, changing and ma-nipulating taxonomic organization, andperforming other tasks regarding taxo-nomic content and structure. Useful toolsfor taxonomists would facilitate:

• simultaneous display and browsing oftaxonomic content of different schemesdating back several hundred years,

• editing taxonomic content andstructure, and

• validating content and structure ofexisting classification schemes.

CS/IT opportunities: Test domain forresearch in knowledge acquisition andhierarchical display techniques.

Making Data Usable – Too frequently,decision-makers underutilize researchresults. To enhance the use of biologicaldata, decision-makers require systemsthat will facilitate the synthesis andanalysis of scientific data and researchresults. Such decision-support systemsmust include new techniques to:

• detect and represent biological andsocial changes over time and space, and

• identify gaps in, and duplication of,relevant data and information.

CS/IT opportunities: Application area foruncertainty analysis techniques, reason-

ing with incomplete information, andautomatic summarization.

Machine Processable Metadata – Currentmetadata is largely for human consump-tion. It is used to document and helpinterpret datasets. It is sometimes col-lected and made available for searching inorder to locate particular data. However,much more value will be gained from itwhen it is complete, correct, and descrip-tive enough to help automate data ma-nipulation tasks, such as summarization,combination of datasets, and conversion ofdata to appropriate forms for use in modelsand statistical tools. Reaching this level ofsophistication requires:

• understanding the requirements onmetadata to allow machineinterpretation of it,

• producing robust systems for thecollection and checking of metadata,and

• developing metadata-driven tools fordata manipulation.

CS/IT opportunity: Testing of data-basedinferencing technology and metadata-based information integration research.

Need for Speed and Accuracy – Many tasksin data management are iterative andrequire considerable time. Researchers arefrequently challenged with data entry andpattern discovery procedures and arerequired to estimate the quality of utilizeddata. Meanwhile species are disappearingat a rate greater than they can be recorded.Automating many of these tasks can assistbio-scientists in time and accuracy. Work-shop participants recommended that CS/IT research be conducted to:

• facilitate the extrapolation of datasampled from an ecosystem to thetotality of that ecosystem,

• develop techniques that indicate or rankquality and indicate the reliability ofaccessed biological data,

• make improvements in data miningprocesses, and

• develop processes for automating data

RESEARCH AGENDA

16

and metadata entry.

CS/IT opportunity: Challenging domainfor data reduction and data miningalgorithms, including parallel implemen-tations; modeling and analytic techniqueswith tunable accuracy; and data qualitymetrics.

FOCUS AREA 3: Dissemination of Dataand Metadata

Visualization – Users of biodiversity andecosystem data and information includeindividuals that require visualization ofnatural processes and management actionsover time. Use of such visualizationtechniques can be for public discourse,adaptive management process (e.g. evaluat-ing management options) or educationalpurposes, as well as scientific inquiry. Inparticular, land managers, policymakers,educators, non-governmental organiza-tions, industry, and others outside biologi-cal research need visualization techniquesto better understand the data and relation-ships among data. Research is required to:

• better characterize the needs andrequirements of data users,

• gain insight into how to adapt systemsto specific user needs and behaviors,and,

• find techniques to indicate gaps,inconsistencies, and uncertainties whenviewing data.

CS/IT opportunity: Application of ad-vanced display and visualization tech-niques, including display of uncertainty,user-adaptive display, and multi-dimen-sional data visualization.

Interdisciplinary Collaboration and Com-munication – Stakeholders of biodiversityand ecosystem data are growing in num-bers and breadth. No longer are manage-ment decisions made solely by individualsor single agencies, but involve communi-ties of individuals. Additional researchneeds to be conducted in methods tofacilitate community involvement inbiodiversity and ecosystem research and

decision-making. These collaborativeefforts must occur throughout a project’slife cycle from its conception and, in fact,need to be second nature to the field ofbiodiversity and ecosystem informatics.Specific areas for research include:

• improving avenues of interdisciplinarycommunication between scientists,resource managers, decision-makers,and the public;

• investigating ways of sharing resources,including facilities, personnel, andfunding, among all stakeholders;

• developing and testing innovativemodels for enhancing collaborativecomputer and biological science research; and

• developing techniques for bridging theuse of different standards for temporal,semantic, and spatial references bydifferent disciplines and communities.

CS/IT opportunity: Calls for the develop-ment of computer-supported cooperativework and remote collaboration researchsuited for participants with widely vary-ing roles, specialties and training. Alsotest case for cross-domain mapping andintegration of data, ontologies, and otherknowledge representations.

Data Management Guidelines –Biodiversity and ecosystem information isfrequently used in complex and potentiallycontroversial political, economic, andenvironmental discussions and decision-making. Informatics issues arising fromthis context include issues of data security,data sharing policies, intellectual propertyrights, quality assurance, and reuse of data.Further development of technological toolsto ease the burden of meeting the demandsthat these issues bring is needed. CS/ITresearch is needed in:

• the development of controlled read/write access programs;

• developing policy templates for IPR,copyright, access, and distribution,acknowledgements, and qualityassurance and control;

• tracking use and property rights tosupport conformance with policies; and

RESEARCH AGENDA

17

• keeping the pedigree or provenanceassociated with data to better understand its sources and history. (Forexample, Is it primary or secondarydata? Who collected it? Whatmethodology was used? and Whatprograms were used in subsequentprocessing?).

CS/IT opportunities: Application of datamodels for representing annotation andprovenance, explicit modeling of dataproduct generation. Possible area forinvestigation of data disseminationtechniques and policies.

IMPLEMENTING THE RESEARCHAGENDA

Implementation of the biodiversity andecosystem informatics research agendadescribed here will require a number ofactions. Workshop participants recom-mended three areas where concerted effortshould be made to proactively engage thebroader CS/IT community in BDEI re-search: form implementation planninggroups; utilize appropriate and availablemechanisms for funding, partnering,collaborating, and resource sharing; andidentify short-term “critical actions” thatrequire immediate attention.

Interdisciplinary Planning Groups

Interdisciplinary planning groups, compris-ing members of the biodiversity andecosystem and CS/IT research communi-ties, can serve as a mechanism to articu-late and communicate the specialinformatics challenges from one commu-nity to research actions in the othercommunity. Such planning groups will bemore effective if they are formed to ad-dress specific biodiversity and ecosystem“problem areas” (e.g. salmon conservation,invasive species, carbon cycle, etc.) ratherthan general research issues. The planninggroups could identify the following:

• existing CS/IT technologies that couldbe transferred from other domains,

• the long-term basic CS/IT research

questions,• the short-term CS/IT research that

needs to be applied, and• the infrastructure needs including,

equipment, facilities, networks, andpersonnel.

Matching Research Needs with Availableand Appropriate Mechanisms

Efforts to implement any research agendaitem need preliminary study to determinehow these research actions could benefitfrom existing programs. These programscould include mechanisms for funding,partnerships, interdisciplinary research,and sharing resources with compatibleprograms.

Funding options – Funding sources forBDEI research activities are numerous.NSF’s Information Technology Research(ITR), Biocomplexity, and Digital Govern-ment programs provide potential venues asdoes NASA’s High Performance Comput-ing and Communications (HPCC), GlobalChange, Carbon, and Terrestrial Ecologyprograms. In addition, the Department ofDefense provides funding opportunitiesthat may be appropriate, and new activi-ties, such as the proposed National Eco-logical Observation Network (NEON) (seeAppendix A-III) and the GlobalBiodiversity Information Facility (GBIF)(see Appendix A-II), may create additionalopportunities in the future.

Partnerships – The tasks required toaccomplish many research agenda items ismonumental. The development of partner-ships and the use of existing partnershipssuch as the National Partnership forAdvanced Computational Infrastructure(NPACI) are highly encouraged.

Interdisciplinary training and teaming– Biological scientists need tools andcomputer scientists seek challengingproblems that will result in applied tech-nologies. Too frequently, meaningfulinterchange between the biological andcomputer science disciplines does notoccur. Efforts must be made to link thesetwo disciplines from the project planning

IMPLEMENTING THE RESEARCH AGENDA

18

process through testing and evaluation ofresulting technologies. Cross-trainingprograms through academic institutions,workshops, and other avenues need to bedesigned, funded, and implemented. Cross-teaming of biological and computer scien-tists can be encouraged through innovativefunding mechanisms that support aninitial “spin-up” phase to develop interdis-ciplinary teams, basic infrastructure, anddetailed research plans before launchinginto the main research activities. NSF’sDigital Government program is respondingto this need and can serve as example forfacilitating interdisciplinary processes inresearch and application of informationtechnologies.

Sharing resources – Many institutionshave developed technologies and infra-structure that could be leveraged tostrengthen the implementation of selectedactivities within the proposed researchagenda. Examples of such institutionsinclude the National Biological Informa-tion Infrastructure (NBII), GlobalBiodiversity Information Facility (GBIF),Long-Term Ecological Research (LTER),and National Ecological ObservationNetwork (NEON).

Short-term Critical Actions that RequireImmediate Attention

Workshop participants noted a sense ofurgency to begin implementation of thenew BDEI research agenda. Several activi-ties were noted that could be acted uponimmediately to “jump start” BDEI re-search: developing an interagency strategicpartnership to begin preliminary andrequisite work on the research agenda,recognizing the time critical nature ofspecific biodiversity and ecosystem prob-lems, and organizing problem-specificconsultant teams to develop appropriateimplementation plans.

Interagency strategic partnership –Workshop participants endorsed a strategicpartnership between NSF, NASA, andUSGS to advance cutting-edge computerand information science research andresearch in biodiversity and ecosystem

sciences. By pooling existing resources,this interagency BDEI R&D effort wouldimmediately launch interdisciplinary“seed” projects and prepare for the start ofa major joint program in FY2003 thatwould address long-term research agendarecommendations. These partner agencieswould also jointly support low-cost “com-munity building” activities for the com-munity. Such activities could includeusing Web-based information services toannounce the availability of BDEI “chal-lenge problems” and data testbeds for CS/IT research activities.

Recognizing the urgency and impor-tance of biodiversity and ecosystemproblems – The time-critical nature ofmany biodiversity and ecosystem scienceand conservation issues could be moreeffectively communicated to prospectiveCS/IT researchers. The urgency of theseproblems is an important factor in theoverall challenge and excitement ofworking in this domain, and can provideresearchers with a sense of accomplish-ment, knowing that they are makingimportant contributions to problems ofglobal concern. It was recommended thatevery effort in this direction be made.

Organizing problem-specific consultantteams – The use of specialized, issue-specific, teams of specialists was endorsedto develop appropriate implementationplans. It was envisioned that these planswould be written for a specific biodiversityand ecosystem problem area, e.g. invasivespecies, and included:

• long-term vision of biodiversity andecosystem applications and informationmanagement,

• basic research (e.g., for predictivemodels) needed from the computerscience community,

• applied research (e.g., for dataacquisition) required, and

• opportunities for meaningfulinterdisciplinary CS/IT and biologycollaboration.

IMPLEMENTING THE RESEARCH AGENDA

19

COMMUNICATINGTHE RESEARCH AGENDA

Workshop participants were asked todevelop recommendations for how best tocommunicate the BDEI research agenda toan audience that includes researchers incomputer science and the biodiversity andecosystem sciences, as well as the manyagencies and foundations that supporttheir efforts.

Develop extended workshops orseminars – The duration of these eventswould be, at a minimum, one month andwould focus on specific biodiversity andecosystem informatics problems usingcase studies as testbeds. Biologists and CS/IT scientists would work together to learnand share, with a goal of developing aproduct that is applied and tested in thefield. In the very near term, a request forproposals to conduct the workshops needsto be developed and advertised.

Build a multi-sector, multi-disciplinarycommunity – Considerable discussionduring the workshop centered on develop-ing mechanisms for linking CS/IT re-searchers with biodiversity and ecosystemresearchers. Some progress, such as NSF’sDigital Government Program, has beenmade in this direction and may be used asmodels. Collaboration between the twocommunities would achieve a greaterdegree of success if the projects wereconducted on specific testbeds. (Seeexamples of Bio-CS/IT collaborations inthe Appendix).

Develop “matchmaking” mechanisms– CS/IT researchers attending the work-shop noted their lack of awareness ofopportunities to collaborate withbiodiversity and ecosystem scientists onchallenging informatics questions andproblems. Biodiversity and ecosystemscientists were likewise unaware of theextent to which the CS/IT communitymight benefit from collaborations on themany unique and challenging researchquestions in this field. Professional societ-ies could be used to launch“matchmaking” activities such as postingcontact lists and publishing white papers.

A specific suggestion along these lines wasto establish a corpus of BEDI “challengeproblems” contributed by biodiversity andecosystem scientists. Such a challengeproblem would describe the desired capa-bility, the limitations of current solutions,and the characteristics of a good solution(e.g. efficiency, scale, precision). Eachproblem would be accompanied by one ormore datasets on which to test newapproaches. The ready availability of testdata, along with the presumed importanceof the problems, could entice computerresearchers to try out their new ideas andlatest developments in the biodiversityand ecosystem domain.

Add CS/IT component to existingbiodiversity and ecosystem projects –Biodiversity and ecosystem projects arestaffed by scientists trained (and inter-ested) in the biological sciences. The newemphasis on informatics within thebiological sciences presents special chal-lenges that are best addressed by membersof the CS/IT community. Incorporating oradding a CS/IT research emphasis intoexisting biodiversity and ecosysteminitiatives will provide cost- and time-effective opportunities for CS/IT research-ers and bolster the informatics aspects ofexisting biodiversity and ecosystemresearch projects.

Develop venues for multi-disciplinaryactivities – Currently, computer scienceand the ecological sciences are “verticallyintegrated” and seldom present opportuni-ties for members across disciplines to meetor work together for a common objective.Workshop participants recommended thatan effort to establish university depart-ments or research centers and inter-disciplinary curricula in biodiversity andecosystem informatics be sponsored anddeveloped. Participants also felt thatpublishing a new journal (or special issuesof existing journals) on biodiversity andecosystem informatics would provide ahigh-profile forum for the exchange ofchallenges and ideas across disciplines.

Promoting biodiversity and ecosysteminformatics through dissemination ofreports – Much still needs to be accom-

COMMUNICATING THE RESEARCH AGENDA

20

plished with regards to articulating andpromoting the biodiversity and ecosysteminformatics mission. Continuing work-shops and disseminating the findings andrecommendations in this report are criticalto broadcasting the biodiversity andecosystem informatics research agenda.These reports should be made available toboth biodiversity and ecosystem and CS/ITresearchers and should provide backgroundand the current status of both communi-ties. Reports should also include visionarycase studies that motivate interdiscipli-nary research initiatives in biodiversityand ecosystem informatics. It is alsoimportant to provide “experience papers”

describing successful (and maybe unsuc-cessful) collaborations, past and present.

Promoting the biodiversity and ecosys-tem informatics agenda through publica-tions, email distribution lists, andwebsites – A wealth of publicationswithin both communities is available toengage scientists in the biodiversity andecosystem informatics research agenda.Examples of publications include SIGModand the Bulletin of the Ecological Societyof America. Professional societies canassist in this effort by offering links ontheir websites and emails to their mem-bers.

CONCLUSION

It’s 2010, and Karen Culver is once again evaluating the proposed closure of the SilverCreek diversion channel. Back in 2001, the channel wasn’t closed. While she feltclosing the diversion channel would help the bull trout population, in the limited timeavailable, she wasn’t able to get all the information she needed to make an effectivepresentation to the local council. The channel is now in need of repair, and closure is being considered again. Karen isat the site where the channel leaves the stream, to get a feel for the situation. She donsa pair of visualization goggles that interface with her portable computer. Using voicecommands, Karen can overlay her view of the terrain with different maps and datasets.She quickly superimposes land ownership, topographic lines, and locations of previousbiological studies. She is also able to view the creek in false color, to see seasonaltemperature variations, and flow rates. She focuses her gaze on the channel and bringsup counts of species that have been surveyed there. She notes there has been an obser-vation of a species of tiger salamander that is listed as threatened. She switches to the screen of her portable to look at the area in plan view. Sheexamines some aerial imagery of the drainage and adds a map showing the location ofthe farms that draw water from the irrigation pond that the channel supplies, plusanother map showing land ownership and use in the area. Using this information shestarts sketching a route to nearby Crabb Lake that could supply replacement water. Karen now turns to the effects of channel closure. She gets her co-worker, TomHamilton, online, and he helps her select a model to use for predictions. He shows herhow to work a wizard that can help select and convert appropriate datasets for use withthe model. Within about fifteen minutes Tom and Karen have located suitable topo-graphic and meteorological data, and the wizard has suggested two possible hydrologicdatasets. Tom recommends using the second one, as it has more complete historicalcoverage. The model is then dispatched to run remotely on a compute server, to workthrough the range of expected stream temperatures and flows if the channel wereclosed. Although Karen isn’t explicitly aware of it, the computation is actually splitinto three parts that take place on three different high-performance cluster computers. Karen is also wondering about sedimentation of downstream gravel beds where bulltrout currently lay eggs. She does a similarity search for documents about other streammodifications in areas with comparable soil types and hydrology. She finds six andexamines them to find which most closely match the current situation.

CONCLUSION

21

The model calculations on predicted temperatures and flows after closing thechannel are done and have been transferred to Karen’s portable, as well as sent to Tomback at the main office. She gets him back online to help interpret them in terms ofeffect on fish. He helps her construct a plot comparing the periods each year whenwater temperature or oxygen levels are likely to adversely affect the fish. They comparethat plot to one based on records from a recent year. They see that the closure wouldlikely yield a great improvement, with periods of adverse conditions being both lessfrequent and of shorter duration. With a little help from Tom, she launches a task torender an animation of water conditions before and after closure, with a color spectrumrepresenting favorable to adverse conditions. That task is routed to a remote server; allKaren cares about is that an MPEG-9 file for the animation is downloaded onto herportable when she gives her presentation to the watershed council in the afternoon. The one nagging issue in Karen’s mind still is the tiger salamanders in the creek.She’d really like to know if that species of salamander was present before the channelwas dug (and thus can be expected to survive if the creek returns to a similar state).Unfortunately, amphibian survey data on Silver Creek only go back about 15 years.Karen has an idea, however. She dispatches a query through the National BiologicalInformation Infrastructure to search holdings of natural history collections throughoutthe country. In about four minutes, she gets back two records of tiger salamanderscollected at Silver Creek, in 1914 and 1933. She is quite impressed by the results, as thequery system knew that Silver Creek was called Sinners Creek before 1920, and thatthe scientific name of that particular species had been modified in the 1950s. She isable to view the digitized label information for the 1933 specimen, which contains anannotation that tiger salamanders were abundant at several places in the stream,including one site near the channel junction. She is reassured that there likely will besuitable habitat for the salamanders if the channel is closed, though there will stillneed to be some further study. Karen sets off to her afternoon meeting with the council feeling much more confi-dent about the presentation she’s going to make than she did ten years earlier. She didin three hours what she was unable to do in three days in 2001.

REFERENCES

Invasive species case study powerpoint presentation (6.2 mb). http://www.nrel.colostate.edu/projects/stohlgren/images/Inventory-monitoring.ppt

President’s Committee of Advisers on Science and Technology (PCAST). March 1998. Teaming with Life:Investing in Science to Understand and Use America’s Living Capital. http://www.whitehouse.gov/WH/EOP/OSTP/Environment/html/teamingcover.html

RELATED READING

Bowker, Geoffrey. Work and Information Practices in the Sciences of Biodiversity. Presented at “Marking theMillennium” – 26th International Conference on Very Large Databases Cairo, Egypt, 10-14 September 2000.http://www2.aucegypt.edu/vldb2000/bowker.pdf

Nielsen, Ebbe, James Edwards and Meredith Lane. Biodiversity Informatics: The Challenge of RapidDevelopment, Large Databases, and complex Data. Keynote presentation at “Marking the Millennium” –26th International Conference on Very Large Databases Cairo, Egypt, 10-14 September 2000. http://www2.aucegypt.edu/vldb2000/NeilsenE.pdf

Schnase, John. Research Directions in Biodiversity Informatics. Presented at “Marking the Millennium” –26th International Conference on Very Large Databases Cairo, Egypt, 10-14 September 2000.http://www2.aucegypt.edu/vldb2000/schnase.pdf

Website for “Marking the Millennium” – 26th International Conference on Very Large Databases Cairo,Egypt, 10-14 September 2000.http://www2.aucegypt.edu/vldb2000/

CONCLUSION

22

APPENDICES

A-I. National Biological InformationInfrastructure (NBII)http://www.nbii.gov/

The National Biological InformationInfrastructure (NBII) was established in1993 to help create a national partnershipfor sharing information on the nation’sbiodiversity and ecosystems. The philoso-phy of the NBII is to build a distributedelectronic federation of biological data andinformation sources and, as part of thiseffort, provide an operational infrastruc-ture that includes policies, protocols, andstandards for participation. These “guide-lines” support discovery, retrieval, integra-tion, and application of biological dataacross NBII’s distributed network. TheNBII is a broad collaborative activityinvolving many agencies and organiza-tions. The US Geological Survey providesoverall coordination and leadership. Therange of NBII’s content and the develop-ment of operational guidelines are twoareas of focus.

Natural history collections and muse-ums represent one example of a veryimportant and rich biological data“source” for the NBII. These institutionsproduce a tremendous amount ofbiodiversity and ecosystem information(much of which is not even digital). NBIIstaff is working directly with museums, aswell as strategically on a national basis, tofacilitate future access to these importantbiological collections.

The goal for NBII is to have diversebiodiversity and ecosystem content frommany types of sources linked together andinteroperable so users can locate allrelevant information for a specific ques-tion in a single search. As an example, a

single query may result in retrievingmuseum specimen data, satellite imagerydata, an ecological model, and a technicalreport. NBII partners and collaborators areworking together to develop the “under-pinnings” or infrastructure of the NBIIfederation, including:

• providing a standardized way forscientists to describe and documenttheir biological data and information;

• providing an online, consistentreference for biological nomenclature;and

• providing a comprehensive biologicalsciences thesaurus or controlledvocabulary.

The first element is being accomplishedthrough the adoption of accepted dataset-level metadata standards (i.e., the biologi-cal data profile of the FGDC metadatacontent standard). All metadata producedfollowing this standard is made accessiblefor online searching through NBII’smetadata clearinghouse.

While spatial coordinates locateobjects in the spatial data world, speciesnames locate objects in the biological dataworld. Providing access to a scientificallycredible, consistent biological speciesnomenclature reference system is criticalto success of the NBII infrastructure.

The third important infrastructurecomponent is the development of a bio-logical controlled vocabulary or thesaurusthat would be available as a consistentreference for use by content providers indocumenting contributions and by NBII’scustomers in locating relevant content.This will be accomplished by building onexisting vocabularies, “knitting” thesevocabularies together, and making theresulting product available for use online.

APPENDICES

23

A-II. Global Biodiversity InformationFacility (GBIF)http://www.gbif.org/

Working in close cooperation with estab-lished programs and organizations thatcompile, maintain, and use biologicalinformation resources, the GlobalBiodiversity Information Facility (GBIF)will be an interoperable network ofbiodiversity and ecosystem databases andinformation technology tools. GBIF willenable users to navigate and put to use theworld’s vast quantities of biodiversity andecosystem information to produce na-tional economic, environmental, andsocial benefits.

The central purpose of establishingGBIF is to design, implement, co-ordinate,and promote the compilation, linking,standardization, digitization, and globaldissemination of the world’s biodiversityand ecosystem data, within an appropriateframework for property rights and dueattribution. It will have the characteristicsof a large, distributed public domaindatabases with a number of interlinkedand interoperable modules (databases,software and networking tools, searchengines, analytical algorithms, etc.). GBIFwill:

• Be a distributed facility, whileencouraging co-operation and

coherence;• Be global in scale, though implemented

nationally and regionally;• Be open to participation by individuals

from all countries, and offeringpotential benefits to all countries,while being funded primarily by thosecountries that have the greatestfinancial capabilities;

• Help bridge human language barriers bypromoting standards and software toolsdesigned to facilitate their adaptationinto multiple languages, character setsand computer encodings;

• Serve to disseminate technologicalcapacity by drawing on and makingwidely available scientific andtechnical information; and

• While aiming to make biodiversity andecosystem information universallyavailable, facilitate respect for thecontribution made by those gatheringand furnishing this information.

Operationally, GBIF will be established asa freestanding international organization.It will be supported by a small secretariatthat will work internationally to co-ordinate national and regional efforts andbring focus to the organization and itsactivities. In addition, it will manage asmall amount of seed money to be used tosupport activities being conducted byother agencies.

APPENDICES

24

A-III. National Ecological ObservatoryNetwork (NEON)http://www.archbold-station.org/abs/neon/index.html

The National Science Foundation’s pro-posed National Ecological ObservatoryNetwork (NEON) will establish 10 obser-vatories located around the country thatwill serve as national research platformsfor integrated studies in field biology. Eachobservatory will provide state-of-the-artinfrastructure to support interdisciplinary,integrated research. Collectively, thenetwork of 10 observatories will allowscientists to conduct comprehensive,continental-scale experiments on ecologi-cal systems. Each NEON observatory willinclude site-based experimental infrastruc-ture; natural history archive facilities; andfacilities for biological, physical and dataanalyses. In addition, the 10 NEONobservatories will be linked via a cutting-edge communications network.

The objectives of NEON are:

• To provide a state-of-the-art nationalfacility for field biologists to conductcutting edge research spanning all levelsof biological organization frommolecular genetics to whole ecosystemstudies and across scales ranging fromseconds to geological time and frommicrons to kilometers;

• To interconnect the geographicallydistributed parts of the facility into onevirtual installation via communicationnetworks so that members of the fieldbiology research community can accessthe facility remotely; and

• To facilitate predictive modeling ofbiological systems via data sharing andsynthesis efforts by users of the facility.

The diversity of ecosystems that compriseour national landscape, from forests tograsslands and deserts to tundra, precludesusing only a single observatory forbiocomplexity research. NEON willconstitute a distributed and virtual na-tional laboratory for field biology and will

foster integrated, interdisciplinary researchthrough long-term collaborations and datasharing. NEON is needed to understandhow our nation’s ecosystems function andto predict their response to natural andanthropogenic events. NEON will alsosystematically and directly support appli-cation of new technologies (e.g., functionalgenomics, molecule-specific stable iso-topes, etc.) to advance ecological research.

Each NEON observatory will contain asuite of common instruments for conti-nental-scale measurement and analysis. Inaddition, each observatory will haveunique infrastructure to address site-specific research questions. Intensivestudies at each observatory will be facili-tated by standardized equipment forintegrated field research (e.g., high resolu-tion global positioning grid arrays,mesonet scale meteorological equipment,eddy flux correlation towers, hydrologicalfacilities, etc.) and laboratory analyses (e.g.confocal microscopes, DNA sequencers,stable isotope mass spectrophotometers,CHN Analyzers, ultracold tissue archives,and digital museum technology). Theobservatories will have scalable computa-tion capabilities and will be networked viasatellite and landlines to the vBNS, to eachother, and to specialized facilities, such assupercomputer centers.

NEON will provide a superb platformfor educational uses and outreach. K-12students, undergraduate students, and thegeneral public heavily use biological fieldstations, potential affiliates of NEONobservatories. NEON communications andresearch facilities will introduce studentsat all levels to cutting-edge ecologicalresearch. Many experimental research sitesmanaged by potential NEON members arelocated close to community colleges, land-grant colleges, and HBCUs. In addition toits value for scientific and educationpurposes, NEON activities will develop awide range of data that will be of value to abroad array of users. The general publicwill be able to access NEON databases, aswill decision-makers.

APPENDICES

25

A-IV. List of Workshop Participantsand Observers

PEGGY AGOURIS

University of Maine5711 Boardmann Hall, Room 342Orono, ME 04469-5711

Phone: 207-581-2180Fax: 207-581-2206Email: [email protected]

GEOF BOWKER

Department of CommunicationUniversity of California, San Diego9500 Gilman DriveLa Jolla, CA 92093-0503


LARRY BRANDT

National Science Foundation4201 Wilson Blvd, Room 1160Arlington, VA 22230


MICHAEL L. BRODIE

GTE Laboratories40 Sylvan RoadWaltham, MA 02451-1128


ROZ COHEN

NOAA National Oceanographic Data CenterSSMC3, #48221315 East-West HighwaySilver Spring, MD 20910-3282

Phone: 301-713-3267 x146Fax: 301-713-3300Email: [email protected]

GLADYS COTTER

US Geological SurveyMS 30012201 Sunrise Valley DriveReston, VA 20192


JUDY CUSHING

The Evergreen State College2700 Evergreen Parkway NWOlympia, WA 98505

Phone: 360-866-6000Email: [email protected]

JIM EDWARDS



MIKE FRAME



MICHAEL FREESTON

Department of Computer ScienceUniversity of California, Santa BarbaraSanta Barbara, CA 93106


ANNE FRONDORF

US Geological SurveyMS 30012201 Sunrise Valley DriveReston, Virginia 20192


VALERIE GREGG

National Science Foundation4201 Wilson Blvd.,Arlington, VA 22230


APPENDICES

26

MILT HALEM

Chief Information OfficerNASA Goddard Space Flight CenterGreenbelt, MD 20771


CHRISTINA HARGIS

US Forest ServiceForestry Sciences Lab2500 S. Pine KnollFlagstaff, AZ 86001


JOHN HELLY

San Diego Supercomputer Center9500 Gilman DriveLa Jolla, CA 92024-0527


EDUARD HOVY

Information Science InstituteUniversity of Southern California4676 Admiralty WayMarina del Rey, CA 90292-6695


ERIC LANDIS

Oregon Graduate Institute13875 Tangen RoadNewberg, OR 97132


MEREDITH LANE

The Academy of Natural Sciences1900 Benjamin Franklin ParkwayPhiladelphia, PA␣ 19103


MIRON LIVNY

University of Wisconsin – Madison1210 West Dayton StreetMadison, WI 53706-1685


DAVID MAIER

Oregon Graduate Institute20000 NW Walker RoadBeaverton, OR 97006


REAGAN MOORE

San Diego Supercomputer CenterUniversity of California, San Diego9500 Gilman DriveLa Jolla, CA 92093-0505


JOHN PORTER

VCR Long-Term Environmental ResearchDepartment of Environmental SciencesUniversity of VirginiaPO Box 400123291 McCormick RoadCharlottesville, VA 22904-4123


JIM QUINN

Dept. of Environmental Science and PolicyUniversity of California, DavisDavis, CA 95616-8576

Phone: 530-752-8027FAX: 530-752-3350Email: [email protected]

JOHN L. SCHNASE

Earth and Space Data Computing DivisionNASA Goddard Space Flight CenterGreenbelt, MD 20771


APPENDICES

27

BERNARD SHANKS



AVI SILBERSCHATZ

Lucent Technologies Bell Labs600 Mountain AvenueMurray Hill, NJ 07974-0636


JIM SMITH

Laboratory for Terrestrial PhysicsNASA Goddard Space Flight CenterBldg. 33, Rm. G125D, Code 920Greenbelt, MD 20771


ANTHONY STEFANIDIS

University of Maine5711 Boardman Hall, Room 348Orono, ME 04469-5711


BRUCE STEIN

Association for Biodiversity Information4245 North Fairfax DriveSuite 100Arlington, VA␣ 22203-1606


TOM STOHLGREN

Midcontinent Ecological Science CenterUS Geological SurveyNatural Resource Ecology LaboratoryColorado State UniversityFort Collins, CO 80523


WOODY TURNER

Mail Code YSNASA HeadquartersWashington, DC 20546-0001


MATT WARD

Worcester Polytechnic Institute100 Institute RoadWorcester, MA 01609


MARIA ZEMANKOVA



APPENDICES

28

biodiversity and ecosystem informatics workshop report

Documents