e-science, information infrastructure, and digital libraries

39
e-Science, Information Infrastructure, and Digital Libraries Libraries in the Digital Age Dubrovnik 30 May 2005 Christine L. Borgman Professor & Presidential Chair in Information Studies University of California, Los Angeles & Oxford Internet Institute (2004-05)

Upload: holli

Post on 19-Jan-2016

51 views

Category:

Documents


0 download

DESCRIPTION

e-Science, Information Infrastructure, and Digital Libraries. Libraries in the Digital Age Dubrovnik 30 May 2005 Christine L. Borgman Professor & Presidential Chair in Information Studies University of California, Los Angeles & Oxford Internet Institute (2004-05). Outline of talk. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: e-Science, Information Infrastructure, and Digital Libraries

e-Science, Information Infrastructure, and Digital Libraries

Libraries in the Digital Age

Dubrovnik

30 May 2005

Christine L. BorgmanProfessor & Presidential Chair in Information Studies

University of California, Los Angeles

&

Oxford Internet Institute (2004-05)

Page 2: e-Science, Information Infrastructure, and Digital Libraries

Outline of talk

• What are we building, for whom, and why?

– e-Science / e-Social Science / Cyberinfrastructure

– Digital libraries

• Case studies

– Alexandria Digital Earth Prototype (ADEPT)

– Center for Embedded Networked Sensing (CENS)

• Summary of issues

Page 3: e-Science, Information Infrastructure, and Digital Libraries
Page 4: e-Science, Information Infrastructure, and Digital Libraries
Page 5: e-Science, Information Infrastructure, and Digital Libraries
Page 6: e-Science, Information Infrastructure, and Digital Libraries

Goals of Cyberinfrastructure

• Cyberinfrastructure (U.S. initiative)*– “technology … now make(s) possible a comprehensive

“cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.”

– “NSF must ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad, long-term access by scientists everywhere.”

*Atkins, D., Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon panel on Cyberinfrastructure. 2003, January. Executive summary. http://www.communitytechnology.org/nsf_ci_report/

Page 7: e-Science, Information Infrastructure, and Digital Libraries
Page 8: e-Science, Information Infrastructure, and Digital Libraries
Page 9: e-Science, Information Infrastructure, and Digital Libraries

Goals of e-Science

• e-Science (U.K. initiative)– “e-Science: large scale science … carried out through distributed

global collaborations enabled by the Internet.” *– “collaborative scientific enterprises … require access to very large

data collections, very large scale computing resources and high performance visualisation” *

– “e-Science data generated from sensors, satellites, high-performance computer simulations, high-throughput devices, scientific images … will soon dwarf all of the scientific data collected in the whole history of scientific exploration.””**

*About the UK e-Science Programme. http://www.rcuk.ac.uk/escience/**Hey & Trefethen, 2003, The data deluge: an E-science perspective.

Page 10: e-Science, Information Infrastructure, and Digital Libraries

Digital Libraries

• Systems that support searching, use, creation of content

• Institutions with people, digital collections, and services

• Repositories of digital data and documents, as a component of cyberinfrastructure, e-research, e-science, e-social science, e-learning…– Institutional repositories

– Open archives

– Data collections

Page 11: e-Science, Information Infrastructure, and Digital Libraries

Digital libraries and e-Science: Architecture

• JISC/NSF working group on digital libraries and e-science / cyberinfrastructure– Layered model of cyberinfrastructure

– Venn diagram model

Images courtesy of Norman Wiseman, JISC

Page 12: e-Science, Information Infrastructure, and Digital Libraries

Information & knowledge layer

Middleware

services layer

ITC Infrastructure

Processors, memory, network

ContentApplications

Space

Cyberinfrastructure: An Evolving View

Digital Libraries

Scientific DBs

UserInterfaces &Tools

Page 13: e-Science, Information Infrastructure, and Digital Libraries

The Focus for DL Infrastructure:Version 2

E-ScienceOr

E-ResearchE-Learning

Hybrid Libraries

E-resources: common to all, need demonstrators of how it will work

Page 14: e-Science, Information Infrastructure, and Digital Libraries

Digital libraries and e-Science: Content

• Scholarly communication: Secondary resources– Conference proceedings– Journal articles– Working papers, manuscripts, pre-prints– Books

• Research data: Primary sources– Raw data from research

• Instrumented data collection (labs, sensor networks)• Field notes

– Archival sources• Unique documents• Records of individuals and organizations

• Value chain: linking data, documents, and related sources

E-bank image courtesy of Jeremy Frey and Liz Lyon; Crystallography image courtesy of Tony Hey

Page 15: e-Science, Information Infrastructure, and Digital Libraries

Grid

E-Scientists

Entire E-Science CycleEncompassing experimentation, analysis, publication, research, learning

5

Institutional Archive

LocalWebPublisher

Holdings

Digital Library

E-Scientists Graduate Students

Undergraduate Students

Virtual Learning Environment

E-Experimentation

E-Scientists

Technical Reports

Reprints

Peer-Reviewed Journal &

Conference Papers

Preprints & Metadata

Certified Experimental

Results & Analyses

Data, Metadata & Ontologies

eBank Project

Page 16: e-Science, Information Infrastructure, and Digital Libraries

Crystallographic e-Prints Direct Access to Raw Data from scientific papers

Raw data sets can be very large and these are stored at National Datastore using SRB server

Page 17: e-Science, Information Infrastructure, and Digital Libraries

Digital libraries and e-Science: Curation

• Curation: maintenance of data over its entire life-cycle and over time for current and future generations of users. Involves active management and appraisal… adding value… to a trusted body of information.*

• Scholarly communication: Secondary resources– Libraries– Disciplinary repositories (arXiv)– Institutional repositories (ePrints, Dspace)– Publishers

• Research data: Primary sources– Archives (arts and humanities)– Disciplinary repositories (e.g., IVOA, BADC, IRIS, BIRN, NEON, GEON)– Research groups

*What is digital curation? (2004). Edinburgh, Digital Curation Center. 2004.

Image of British Atmospheric Data Centre courtesy of Bryan Lawrence

Page 18: e-Science, Information Infrastructure, and Digital Libraries
Page 19: e-Science, Information Infrastructure, and Digital Libraries

British Atmospheric Data Centre

British Oceanographic Data Centre

Simulations

Assimilation

Complexity + Volume + Remote Access = Grid Challenge

Page 20: e-Science, Information Infrastructure, and Digital Libraries
Page 21: e-Science, Information Infrastructure, and Digital Libraries
Page 22: e-Science, Information Infrastructure, and Digital Libraries
Page 23: e-Science, Information Infrastructure, and Digital Libraries

Digital libraries: Uses of secondary resources

• Community orientation of researchers

– “open science” depends upon sharing, transparency, rapid dissemination

– documents may be self-archived, deposited, shared with peers

– incentive and reward system based on publication

– researchers contribute to digital collections via publication

• Individual orientation of students

– searchers of digital collections, not contributors

– reliant upon search mechanisms and bibliographic control

• Scholarly publications: inputs and outputs of research

• Digital libraries: “boundary objects” between experts and novices

Page 24: e-Science, Information Infrastructure, and Digital Libraries

Digital libraries: Uses of primary sources

• Community orientation of researchers– e-Science goal to facilitate creation and use of research data by

collaborating groups

– Sharing practices vary widely by research topic and discipline

– Agreements made for access to data, credit for publications

• Data: input and output of research

• Providing context to interpret data– Scholarly publications provide context

– Digital libraries remove context

Page 25: e-Science, Information Infrastructure, and Digital Libraries

Digital libraries: Sharing primary sources

• Incentives to share data– Tradition of “open science” to promote access, transparency– Establish trust and reciprocity within a research group– Ability to mine large data sets, compare results– Ability to replicate experiments, studies– Requirement of some funding agencies

• Incentives not to share data– Rewards for publication, not for data management– Benefits of contributing data may accrue to other parties– Risks of misinterpretation of your data– Risks of losing control over data – Risks of loss of intellectual property

Page 26: e-Science, Information Infrastructure, and Digital Libraries

Digital libraries, e-Science and e-Learning

• Case studies in the design of scientific digital libraries that will support research and teaching applications – Alexandria Digital Earth Prototype (ADEPT)

• http://is.gseis.ucla.edu/adept/

– Center for Embedded Networked Sensing (CENS)• http://www.cens.ucla.edu

Page 27: e-Science, Information Infrastructure, and Digital Libraries

Alexandria Digital Earth ProtoType<#>

What information do you have about her?

If it has a latitude and longitude thenit can be in ADL

Ocean Science Data

Zoological Habitat Study

Botanical Study

Alexandria DL of Distributed Spatial Information Objects

Archeological Dig

Other Digital Archives

Museum Artifacts

Earth Art

Page 28: e-Science, Information Infrastructure, and Digital Libraries

Contaminant Transport Group

“backbone”network

adapted fromCA DWR website

• Multimedia, Multiscale problems (time and space) • Multidisciplinary (current and as yet unknown)

problems• Management, visualization, exploration of

massive, heterogeneous data streams

Data models for habitat monitoring and sensor networks

Page 29: e-Science, Information Infrastructure, and Digital Libraries

Center for Embedded Networked Sensing:Education and Data Management Projects

• Goals • Make sensor data useful for scientists• Make sensor data useful for teaching high school science• Facilitate inquiry learning with scientific data

• User communities• Research scientists (habitat ecology, seismology) • High school science teachers (biology and physics)• High school students

• Activities to be supported• Scientific data management by scientists• Enable students to formulate and test hypotheses • Enable students to design experiments “tasking” sensors

Page 30: e-Science, Information Infrastructure, and Digital Libraries

Some CENS Results-1

• CENS committed in principle to sharing data• Data management practices vary widely by discipline

• Seismic: High experience, use of community repository (IRIS) and metadata format (SEED)

• Habitat ecology: Low experience, exploring new community repository (Morpho) and metadata (Environmental Metadata Language)

• Avian biology (localization of birdsongs): High experience across multiple disciplines

• Education: Standards exist for educational modules but not for data

Page 31: e-Science, Information Infrastructure, and Digital Libraries

Some CENS Results-2

• No single metadata model or crosswalk will serve all CENS scientific applications• Metadata for individual disciplines, e.g.,

• Environmental Metadata Language: biocomplexity data• SEED: seismic data

• Metadata for technical devices, e.g., Sensor Markup Language

• Geospatial coordinates required for most applications• Geospatial data standards exists for 2D points• Context descriptors also needed (distance from sea

level, local distance from ground, above/below leaf, north/south side of tree)

Page 32: e-Science, Information Infrastructure, and Digital Libraries

METADATA FOR SENSOR DATA FOR HABITAT MONITORING

METADATA FOR EDUCATION MODULES FOR HABITAT MONITORING

CENS Schema SensorML EML 2.0 LOM GEM ADN

CENS_Node.Node_NameName of Node

Sml:IdentifiedAs(2.2.2)

       

CENS_Node.Node_DescDescription of Node

AssetDescription:sml:description(2.2.12)

       

CENS_Location.Location_IDUnique location ID

CrsID (2.2.5) Eml-Coverage (2.4.4)

     

CENS_Location.X_Pos(Position on X axis)

HasCRS (2.2.5)ObjectState (3.3.6)

Eml-Coverage-GeographicCoverage(2.4.4)

     

CENS_Location.Time_RecordedTime location was captured

  Eml-Coverage-TemporalCoverage(2.4.4)

     

CENS_Location.Time_Type_IDRefers to type of time of Time_Type ID table

  Eml-Coverage (2.4.4)

     

      Educational-Typical Age Range(5.7)

Audience-Age 

Audience 

      Life Cycle-Contribute(2.3)

Creator 

Resource Creator 

      General-Coverage(1.6)

Coverage-Spatial, Temporal 

Coverage (spatial and temporal)

      Life Cycle-Date (2.3.3)DateTime (8)

Date 

Creation date Accession date  

      General-Description(1.4)

Description 

Description

      Educational (5)

Pedagogy 

Educational

Page 33: e-Science, Information Infrastructure, and Digital Libraries

CENS Research Directions

• Infrastructure goals for CENS• Support scientists in collecting, managing, preserving,

sharing data • Develop modular, extensible metadata architecture (XML-

based)• Develop filtering tools to extract and visualize scientific

data for educational use• Conduct behavioral studies of scientists, teachers, and

students • How do they determine their data requirements?• What are their criteria for selecting, preserving data?• How do they use scientific data?• How do their uses evolve over time?• What are their incentives and disincentives to contribute

data to repositories?

Page 34: e-Science, Information Infrastructure, and Digital Libraries

Summary

• Technical infrastructure of e-Science under construction

• Social aspects of e-Science infrastructure poorly understood

• Value chain of e-Science requires linkage between data and documents– Scholarly communication infrastructure evolving, but exists

– Data infrastructure fragmented across disciplines, communities

• Digital libraries: intersection of e-science, e-learning, and libraries– Data practices vary more widely than do publication practices

– Communities may represent, organize, and use the same data differently

– Individuals may use the same content differently for different purposes

– Research on digital libraries and e-learning can identify some of the

challenges for building an e-science infrastructure

Page 35: e-Science, Information Infrastructure, and Digital Libraries

Acknowledgements

• ADEPT is funded by National Science Foundation grant no. IIS-9817432, Terence R. Smith, University of California, Santa Barbara, Principal Investigator. http://is.gseis.ucla.edu/adept/– ADEPT collaborators: Gregory Leazer, Anne Gilliland, Richard

Mayer• CENS is funded by National Science Foundation

Cooperative Agreement #CCR-0120778, Deborah L. Estrin, UCLA, Principal Investigator. http://www.cens.ucla.edu– CENS collaborators: Jonathan Furner, William Sandoval, Noel

Enyedy, Michael Hamilton

Page 36: e-Science, Information Infrastructure, and Digital Libraries

Alexandria Digital Earth ProtoType<#>

ADEPT Project: Geospatial digital libraries Goals

Add services to Alexandria Digital Library for teaching undergraduate courses in geography

Facilitate inquiry learning by providing access to primary sources

User communities Faculty, as researchers Faculty, as teachers of undergraduate courses Undergraduate students

Activities to be supported Information searching and retrieval Composing lectures that incorporate text, concepts, and objects Constructing learning modules in which students can formulate

and test hypotheses

Page 37: e-Science, Information Infrastructure, and Digital Libraries

Alexandria Digital Earth ProtoType<#>

Some ADEPT Results (1999-2004)-1 Technology use by geographers

Research: sophisticated users of IT; mapping software, atmospheric simulations, sensor networks

Teaching: minimal users of IT; rely on overhead projectors, chalk, paper maps, boxes of rocks, lecturing from hand-written notes

Information seeking by geographers Research: typical library use, online searching Teaching: irregular, non-directed, often a by-product of research

activities Information resources used by geographers

Research: varies by specialty; all want maps and images Teaching: varies by course; all want maps and images

Page 38: e-Science, Information Infrastructure, and Digital Libraries

Alexandria Digital Earth ProtoType<#>

Some ADEPT Results (1999-2004)-2 Search queries of geographers

Research: concept, place (place name, latitude/longitude) Teaching: concept, place, process (examples of erosion, population

movements, etc.)

Selection of digital resources for teaching Clarity of presentation more important than familiarity of location Desire to annotate, highlight, resize, for presentation

Use of primary data in instruction Preference for use of own research data Tools to manage own research data would make DL teaching

services more attractive Most prefer to hold their data privately; want control over personal

digital libraries of their teaching and research resources

Page 39: e-Science, Information Infrastructure, and Digital Libraries