Evolving Digital Libraries to Support Geographically Distributed Scientific Research

Rick Luce
Research Library Director
Library Without Walls Project Leader
Los Alamos National Laboratory

Symposium on Knowledge Environments for Science, NSF, October 22, 2002



Some Puzzle Pieces for Digital Libraries

Standards & Interoperable Frameworks

Enabling Technologies & Infrastructure

Content
• Access
• Retrieval

Financial Models
• Funding
• Content licensing

User Behavior
• User needs
• Collaboration
• Scholarly communication changes
• Adoption curves

DL Models: Delivery

Delivery of Content & Services
• Libraries replicating one another
• Requires integrated framework
• Lack of interoperability
• Tough work

Trend
• Publisher pricing flip for e-content
• Old model of libraries facing decline or aggregation

DL Models: Capture

Content Capture: Ingest repositories
• Easy entry in network environment
• Digitization of old stuff
• E-collections distributed, but archiving is unknown
• Largely publisher controlled today

Trend
• New players emerging
• Low barrier to entry

Capture Systems

Eprint systems:
• xxx or arXiv e-print archive – Physics: 1991, Ginsparg, LANL
• RePEc – Economics, Surrey U (Krichel)
• NCSTRL – Computer Science, Cornell U (Lagoze)
• NDLTD – Theses, Virginia Tech (Fox)
• CogPrints – Cognitive Sciences, Southampton (Harnad)

Harvesters:
• ARC & ARCHON – Computer Science Dep’t, ODU
• SCIRUS – Elsevier
• even at the individual level … Kepler – ODU

Digital Library Hybrid

• Content Capture: NSF-NSDL, DLESE
• Share usage logs between nodes; share citations & digital archives
• New collaboration opportunities
• Normalization
• Authentication (Shibboleth), DRM
• Delivery of Content & Services
• Standards: OAI protocols

29 Institutional Customers in the U.S.

Stanford Univ, Pacific Northwest Nat’l Lab, Edwards AFB, Univ Nevada, Idaho Nat’l Eng. & Enviro Lab

4 New Mexico Universities, Sandia National Labs, Air Force Research Lab, Nat’l Renewable Energy Lab, Santa Fe Institute

Albany Research Cntr., Brooks AFB, Brookhaven Nat’l Lab, Eglin AFB, Enviro Measurem’t Lab, DOE HQ Energy Library, Fed. Technology Center, Griffiss AFB, Oak Ridge Nat’l Lab, Savannah River Co., Tyndall AFB, Hanscom AFB, Wright-Patterson AFB, Montana State Univ

Sandia National Labs

Who has access to 80%+ of e-content

[Figure: two pie charts splitting scholarly e-content into open access vs. copyright-restricted portions, for ~8M full-text articles and ~60M metadata records; segment labels read 3% / 97% and 6% / 94%.]

Large fraction of scholarly content has significant access restrictions & cost barriers

Challenges

FALLOUT: WITH PUBSCIENCE GONE, SIIA SEEKS OTHER CLOSURES -- With PubSCIENCE now history, the trade association that lobbied for its dismantling is reportedly set to focus its energies on other freely accessible government information resources. According to FEDERAL COMPUTER WEEK, Software and Information Industry Association (SIIA) public policy director David Le Duc said the group was "looking into a couple of other databases and agencies," in particular one "law-related" and one that "has to do with agriculture." After more than a year of intense lobbying by the SIIA, a major trade association for the software and digital content industry, the federal government discontinued PubSCIENCE in early November … They argue that it is unfair for taxpayer dollars to fund databases that compete with commercial products.

Library Journal Academic News Wire: November 19, 2002

Repository Models

• Distributed – MIT: individual faculty upload and manage their own scholarly output.

• Semi-distributed – UC eScholarship: management responsibility is assigned to organizational units (research units, departments), which then assist faculty with uploading their papers.

• Semi-centralized – Caltech: repository sites are set up for any university unit, but the library uploads the papers on the faculty's behalf. Its digital collections range from computer science technical reports to theses and dissertations.

Institutional Repositories: Roy Tennant, 9/15/02

OAI’s Role

So far: harvesting of descriptive metadata ... but coming, harvesting of:
• references
• usage logs
• certification metadata
• metadata rights
• citation mapping
• co-citation
• visualization
• personalization
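The harvesting described on this slide is carried out with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The following minimal sketch shows the basic loop a harvester runs: issue a ListRecords request, read one page of records, and follow the resumptionToken until the repository signals the end. It assumes OAI-PMH 2.0 with unqualified Dublin Core (oai_dc). Network I/O is left to a caller-supplied `fetch` function, and the helper names (`list_records_url`, `parse_list_records`, `harvest`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core (oai_dc).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL (hypothetical helper).

    OAI-PMH requires a resumptionToken to be the only argument besides
    the verb, so metadataPrefix is omitted on follow-up pages.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, [titles]), ...], next_token) for one response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:ListRecords/oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((ident, titles))
    token_el = root.find(".//oai:resumptionToken", NS)
    # An absent or empty resumptionToken means this was the last page.
    next_token = token_el.text if token_el is not None and token_el.text else None
    return records, next_token


def harvest(base_url, fetch):
    """Collect all records; `fetch` is any callable mapping a URL to XML text."""
    token, collected = None, []
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        collected.extend(records)
        if token is None:
            return collected
```

Passing a function that wraps `urllib.request.urlopen` as `fetch` would turn this into a live harvester. Keeping `fetch` injectable means the paging logic can be checked without a network connection.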

OpenURL

Information resources allow open linking by including a hook along with each metadata description, which presents itself as an actionable OpenURL.
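An OpenURL is an HTTP GET request to an institution's link resolver, with citation metadata carried as key/value pairs. The following sketch builds such a hook in the OpenURL 0.1 style, using keys such as `sid`, `genre`, `issn`, `volume`, `spage`, and `date`. The resolver address, the `sid` value, and the record layout in `link_hook` are hypothetical placeholders. The later Z39.88-2004 (OpenURL 1.0) standard uses a different, `rft.`-prefixed key set.

```python
from urllib.parse import urlencode


def make_openurl(resolver_base, sid, **citation):
    """Build an OpenURL 0.1-style link from citation metadata (illustrative only).

    resolver_base: the institution's link resolver (hypothetical here).
    sid: identifies the referring source, conventionally "vendor:database".
    citation: metadata keys such as genre, issn, volume, issue, spage, date.
    """
    # Skip missing fields so the resolver only receives metadata we actually have.
    fields = {key: value for key, value in citation.items() if value not in (None, "")}
    return resolver_base + "?" + urlencode({"sid": sid, **fields})


def link_hook(record, resolver_base, sid="LANL:demo"):
    """Attach an actionable OpenURL to one metadata record (hypothetical record layout)."""
    return make_openurl(
        resolver_base,
        sid,
        genre="article",
        issn=record.get("issn"),
        volume=record.get("volume"),
        spage=record.get("start_page"),
        date=record.get("year"),
    )
```

Because the link carries only metadata, each institution can point the same hook at its own resolver and receive the copy it is licensed for. This is the "open linking" the slide describes.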

LANL Active Recommendation System

Adaptation of Structure and Semantics: Using Collective Behavior of Users

Create Shared User Group in MyLibrary

1. Knowledge contexts categorized
– Keywords & keyword semantic proximity
– Citations and citation proximity
– Semantic proximity
– Traversal proximity
2. Recommendation(s) calculated
3. Traversal proximity analyzed
4. Adaptation in system

Users + Profiles = learning community
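The knowledge contexts listed on this slide include traversal proximity and citation proximity. The following sketch shows one simple way such proximities could be derived from collective behavior. It counts how often two items occur together, either in one user's traversal session or in one paper's reference list (co-citation), and then recommends the items most often seen alongside a given one. This is plain co-occurrence counting chosen for illustration, not the LANL system's actual algorithm. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_proximity(groups):
    """Count how often each pair of items appears in the same group.

    A group can be one user's traversal session (traversal proximity) or
    one paper's reference list (co-citation, a form of citation proximity).
    """
    prox = defaultdict(Counter)
    for group in groups:
        # Deduplicate within a group so repeat visits do not inflate counts.
        for a, b in combinations(sorted(set(group)), 2):
            prox[a][b] += 1
            prox[b][a] += 1
    return prox


def recommend(prox, item, k=3):
    """Return up to k items most often seen alongside `item`, best first."""
    return [other for other, _ in prox[item].most_common(k)]
```

Here is an example using the sessions `[["A", "B", "C"], ["A", "B"], ["B", "D"]]`. In this case `recommend(prox, "A")` returns `["B", "C"]`, because B appeared with A twice and C once. A production system would probably normalize these raw counts so that popular items do not dominate. It would also have to address the privacy concerns that come with sharing usage logs.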

Finding the Balance Point

• Community-specific tools vs. encourage/support transdisciplinary research
• Small teams vs. deployable across Lab or multiple institutions
• New technologies, new tools vs. legacy data & systems

Knowledge is represented by articles, books, etc.

Knowledge characterized by relationships among objects, documents & resources

Hub/spoke model for DLs: balance resources and focused efforts

Known path, existing infrastructure (people, buildings), institutional pride

Higher Order Thinking* …

• is nonalgorithmic (path cannot be fully specified in advance)

• tends to be complex (total path not visible from one vantage point)

• often yields multiple solutions (each with costs/benefits rather than unique solutions)

• involves nuanced judgment and interpretation

• involves the application of multiple criteria (which sometimes conflict with one another)

• often involves uncertainty (not everything bearing on the task is known)

• involves self-regulation of the thinking process (someone else does not ‘call the plays’ at every step)

• involves imposing meaning, finding structure in apparent disorder

• is effortful (considerable mental work is involved in the kinds of elaborations and judgments required)

*Resnick (’87)

Visualization

• Scientific visualization – use of interactive visual representations of scientific data, typically physically based, to amplify cognition

• Information visualization – use of interactive visual representations of abstract, nonphysically based data to amplify cognition

Successes

• Culture of measurement – long-term focus on user-driven requirements and corresponding satisfaction levels
• Open Archives Initiative – small, quick, right players
• Eprint arXiv – communities of common interest, timeliness, passionate people, didn’t take a lot of $$
• OpenURL – small, quick, right players, passionate people (standards efforts too long)
• MyLibrary – personalized, ad hoc collaboration
? Recommendation systems with shared knowledge models – uses available logs, complex, privacy concerns

Challenges

• IP, copyright limitations
• Post-9/11 pressure to close government access
• Integrating formal and informal systems – need new mechanisms for peer review and rewards
• Archiving – not glamorous, but a research problem
• Problem space is larger than NSF domain – requires cross-organizational collaboration (DOE, NIH, etc.) and international connections