e-science, information infrastructure, and digital libraries
DESCRIPTION
e-Science, Information Infrastructure, and Digital Libraries. Libraries in the Digital Age Dubrovnik 30 May 2005 Christine L. Borgman Professor & Presidential Chair in Information Studies University of California, Los Angeles & Oxford Internet Institute (2004-05). Outline of talk. - PowerPoint PPT PresentationTRANSCRIPT
e-Science, Information Infrastructure, and Digital Libraries
Libraries in the Digital Age
Dubrovnik
30 May 2005
Christine L. BorgmanProfessor & Presidential Chair in Information Studies
University of California, Los Angeles
&
Oxford Internet Institute (2004-05)
Outline of talk
• What are we building, for whom, and why?
– e-Science / e-Social Science / Cyberinfrastructure
– Digital libraries
• Case studies
– Alexandria Digital Earth Prototype (ADEPT)
– Center for Embedded Networked Sensing (CENS)
• Summary of issues
Goals of Cyberinfrastructure
• Cyberinfrastructure (U.S. initiative)*– “technology … now make(s) possible a comprehensive
“cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.”
– “NSF must ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad, long-term access by scientists everywhere.”
*Atkins, D., Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon panel on Cyberinfrastructure. 2003, January. Executive summary. http://www.communitytechnology.org/nsf_ci_report/
Goals of e-Science
• e-Science (U.K. initiative)– “e-Science: large scale science … carried out through distributed
global collaborations enabled by the Internet.” *– “collaborative scientific enterprises … require access to very large
data collections, very large scale computing resources and high performance visualisation” *
– “e-Science data generated from sensors, satellites, high-performance computer simulations, high-throughput devices, scientific images … will soon dwarf all of the scientific data collected in the whole history of scientific exploration.””**
*About the UK e-Science Programme. http://www.rcuk.ac.uk/escience/**Hey & Trefethen, 2003, The data deluge: an E-science perspective.
Digital Libraries
• Systems that support searching, use, creation of content
• Institutions with people, digital collections, and services
• Repositories of digital data and documents, as a component of cyberinfrastructure, e-research, e-science, e-social science, e-learning…– Institutional repositories
– Open archives
– Data collections
Digital libraries and e-Science: Architecture
• JISC/NSF working group on digital libraries and e-science / cyberinfrastructure– Layered model of cyberinfrastructure
– Venn diagram model
Images courtesy of Norman Wiseman, JISC
Information & knowledge layer
Middleware
services layer
ITC Infrastructure
Processors, memory, network
ContentApplications
Space
Cyberinfrastructure: An Evolving View
Digital Libraries
Scientific DBs
UserInterfaces &Tools
The Focus for DL Infrastructure:Version 2
E-ScienceOr
E-ResearchE-Learning
Hybrid Libraries
E-resources: common to all, need demonstrators of how it will work
Digital libraries and e-Science: Content
• Scholarly communication: Secondary resources– Conference proceedings– Journal articles– Working papers, manuscripts, pre-prints– Books
• Research data: Primary sources– Raw data from research
• Instrumented data collection (labs, sensor networks)• Field notes
– Archival sources• Unique documents• Records of individuals and organizations
• Value chain: linking data, documents, and related sources
E-bank image courtesy of Jeremy Frey and Liz Lyon; Crystallography image courtesy of Tony Hey
Grid
E-Scientists
Entire E-Science CycleEncompassing experimentation, analysis, publication, research, learning
5
Institutional Archive
LocalWebPublisher
Holdings
Digital Library
E-Scientists Graduate Students
Undergraduate Students
Virtual Learning Environment
E-Experimentation
E-Scientists
Technical Reports
Reprints
Peer-Reviewed Journal &
Conference Papers
Preprints & Metadata
Certified Experimental
Results & Analyses
Data, Metadata & Ontologies
eBank Project
Crystallographic e-Prints Direct Access to Raw Data from scientific papers
Raw data sets can be very large and these are stored at National Datastore using SRB server
Digital libraries and e-Science: Curation
• Curation: maintenance of data over its entire life-cycle and over time for current and future generations of users. Involves active management and appraisal… adding value… to a trusted body of information.*
• Scholarly communication: Secondary resources– Libraries– Disciplinary repositories (arXiv)– Institutional repositories (ePrints, Dspace)– Publishers
• Research data: Primary sources– Archives (arts and humanities)– Disciplinary repositories (e.g., IVOA, BADC, IRIS, BIRN, NEON, GEON)– Research groups
*What is digital curation? (2004). Edinburgh, Digital Curation Center. 2004.
Image of British Atmospheric Data Centre courtesy of Bryan Lawrence
British Atmospheric Data Centre
British Oceanographic Data Centre
Simulations
Assimilation
Complexity + Volume + Remote Access = Grid Challenge
Digital libraries: Uses of secondary resources
• Community orientation of researchers
– “open science” depends upon sharing, transparency, rapid dissemination
– documents may be self-archived, deposited, shared with peers
– incentive and reward system based on publication
– researchers contribute to digital collections via publication
• Individual orientation of students
– searchers of digital collections, not contributors
– reliant upon search mechanisms and bibliographic control
• Scholarly publications: inputs and outputs of research
• Digital libraries: “boundary objects” between experts and novices
Digital libraries: Uses of primary sources
• Community orientation of researchers– e-Science goal to facilitate creation and use of research data by
collaborating groups
– Sharing practices vary widely by research topic and discipline
– Agreements made for access to data, credit for publications
• Data: input and output of research
• Providing context to interpret data– Scholarly publications provide context
– Digital libraries remove context
Digital libraries: Sharing primary sources
• Incentives to share data– Tradition of “open science” to promote access, transparency– Establish trust and reciprocity within a research group– Ability to mine large data sets, compare results– Ability to replicate experiments, studies– Requirement of some funding agencies
• Incentives not to share data– Rewards for publication, not for data management– Benefits of contributing data may accrue to other parties– Risks of misinterpretation of your data– Risks of losing control over data – Risks of loss of intellectual property
Digital libraries, e-Science and e-Learning
• Case studies in the design of scientific digital libraries that will support research and teaching applications – Alexandria Digital Earth Prototype (ADEPT)
• http://is.gseis.ucla.edu/adept/
– Center for Embedded Networked Sensing (CENS)• http://www.cens.ucla.edu
Alexandria Digital Earth ProtoType<#>
What information do you have about her?
If it has a latitude and longitude thenit can be in ADL
Ocean Science Data
Zoological Habitat Study
Botanical Study
Alexandria DL of Distributed Spatial Information Objects
Archeological Dig
Other Digital Archives
Museum Artifacts
Earth Art
Contaminant Transport Group
“backbone”network
adapted fromCA DWR website
• Multimedia, Multiscale problems (time and space) • Multidisciplinary (current and as yet unknown)
problems• Management, visualization, exploration of
massive, heterogeneous data streams
Data models for habitat monitoring and sensor networks
Center for Embedded Networked Sensing:Education and Data Management Projects
• Goals • Make sensor data useful for scientists• Make sensor data useful for teaching high school science• Facilitate inquiry learning with scientific data
• User communities• Research scientists (habitat ecology, seismology) • High school science teachers (biology and physics)• High school students
• Activities to be supported• Scientific data management by scientists• Enable students to formulate and test hypotheses • Enable students to design experiments “tasking” sensors
Some CENS Results-1
• CENS committed in principle to sharing data• Data management practices vary widely by discipline
• Seismic: High experience, use of community repository (IRIS) and metadata format (SEED)
• Habitat ecology: Low experience, exploring new community repository (Morpho) and metadata (Environmental Metadata Language)
• Avian biology (localization of birdsongs): High experience across multiple disciplines
• Education: Standards exist for educational modules but not for data
Some CENS Results-2
• No single metadata model or crosswalk will serve all CENS scientific applications• Metadata for individual disciplines, e.g.,
• Environmental Metadata Language: biocomplexity data• SEED: seismic data
• Metadata for technical devices, e.g., Sensor Markup Language
• Geospatial coordinates required for most applications• Geospatial data standards exists for 2D points• Context descriptors also needed (distance from sea
level, local distance from ground, above/below leaf, north/south side of tree)
METADATA FOR SENSOR DATA FOR HABITAT MONITORING
METADATA FOR EDUCATION MODULES FOR HABITAT MONITORING
CENS Schema SensorML EML 2.0 LOM GEM ADN
CENS_Node.Node_NameName of Node
Sml:IdentifiedAs(2.2.2)
CENS_Node.Node_DescDescription of Node
AssetDescription:sml:description(2.2.12)
CENS_Location.Location_IDUnique location ID
CrsID (2.2.5) Eml-Coverage (2.4.4)
CENS_Location.X_Pos(Position on X axis)
HasCRS (2.2.5)ObjectState (3.3.6)
Eml-Coverage-GeographicCoverage(2.4.4)
CENS_Location.Time_RecordedTime location was captured
Eml-Coverage-TemporalCoverage(2.4.4)
CENS_Location.Time_Type_IDRefers to type of time of Time_Type ID table
Eml-Coverage (2.4.4)
Educational-Typical Age Range(5.7)
Audience-Age
Audience
Life Cycle-Contribute(2.3)
Creator
Resource Creator
General-Coverage(1.6)
Coverage-Spatial, Temporal
Coverage (spatial and temporal)
Life Cycle-Date (2.3.3)DateTime (8)
Date
Creation date Accession date
General-Description(1.4)
Description
Description
Educational (5)
Pedagogy
Educational
CENS Research Directions
• Infrastructure goals for CENS• Support scientists in collecting, managing, preserving,
sharing data • Develop modular, extensible metadata architecture (XML-
based)• Develop filtering tools to extract and visualize scientific
data for educational use• Conduct behavioral studies of scientists, teachers, and
students • How do they determine their data requirements?• What are their criteria for selecting, preserving data?• How do they use scientific data?• How do their uses evolve over time?• What are their incentives and disincentives to contribute
data to repositories?
Summary
• Technical infrastructure of e-Science under construction
• Social aspects of e-Science infrastructure poorly understood
• Value chain of e-Science requires linkage between data and documents– Scholarly communication infrastructure evolving, but exists
– Data infrastructure fragmented across disciplines, communities
• Digital libraries: intersection of e-science, e-learning, and libraries– Data practices vary more widely than do publication practices
– Communities may represent, organize, and use the same data differently
– Individuals may use the same content differently for different purposes
– Research on digital libraries and e-learning can identify some of the
challenges for building an e-science infrastructure
Acknowledgements
• ADEPT is funded by National Science Foundation grant no. IIS-9817432, Terence R. Smith, University of California, Santa Barbara, Principal Investigator. http://is.gseis.ucla.edu/adept/– ADEPT collaborators: Gregory Leazer, Anne Gilliland, Richard
Mayer• CENS is funded by National Science Foundation
Cooperative Agreement #CCR-0120778, Deborah L. Estrin, UCLA, Principal Investigator. http://www.cens.ucla.edu– CENS collaborators: Jonathan Furner, William Sandoval, Noel
Enyedy, Michael Hamilton
Alexandria Digital Earth ProtoType<#>
ADEPT Project: Geospatial digital libraries Goals
Add services to Alexandria Digital Library for teaching undergraduate courses in geography
Facilitate inquiry learning by providing access to primary sources
User communities Faculty, as researchers Faculty, as teachers of undergraduate courses Undergraduate students
Activities to be supported Information searching and retrieval Composing lectures that incorporate text, concepts, and objects Constructing learning modules in which students can formulate
and test hypotheses
Alexandria Digital Earth ProtoType<#>
Some ADEPT Results (1999-2004)-1 Technology use by geographers
Research: sophisticated users of IT; mapping software, atmospheric simulations, sensor networks
Teaching: minimal users of IT; rely on overhead projectors, chalk, paper maps, boxes of rocks, lecturing from hand-written notes
Information seeking by geographers Research: typical library use, online searching Teaching: irregular, non-directed, often a by-product of research
activities Information resources used by geographers
Research: varies by specialty; all want maps and images Teaching: varies by course; all want maps and images
Alexandria Digital Earth ProtoType<#>
Some ADEPT Results (1999-2004)-2 Search queries of geographers
Research: concept, place (place name, latitude/longitude) Teaching: concept, place, process (examples of erosion, population
movements, etc.)
Selection of digital resources for teaching Clarity of presentation more important than familiarity of location Desire to annotate, highlight, resize, for presentation
Use of primary data in instruction Preference for use of own research data Tools to manage own research data would make DL teaching
services more attractive Most prefer to hold their data privately; want control over personal
digital libraries of their teaching and research resources