The Digital Catch: an integrative role for IAMSLIC in the worlds of metadata, harvesters and repositories
1: Species Diversity (Pauline Simpson, National Oceanography Centre, Southampton, UK)
2: Aquatic Commons (Stephanie Haas, Digital Library Center, University of Florida, Gainesville, USA)
Information for Responsible Fisheries: Libraries as Mediators
10 - 14 October 2005
Rome, Italy
IAMSLIC Rome 2005
The Digital Catch: an integrative role for IAMSLIC in the worlds of metadata, harvesters and repositories
Part 1: Species Diversity
Pauline Simpson, National Oceanography Centre, Southampton, UK
National Oceanography Centre, Southampton
NOC is one of the world’s leading centres for research and education in marine and earth sciences, for the development of marine technology, and for the provision of large-scale infrastructure and support for the marine research community.
University of Southampton
Research-led multidisciplinary university: 20,000 students, 5,000 staff (3,000 researchers)
Open Access to Research – Southampton a key player
• 27 Jun 1994 Stevan Harnad’s ‘Subversive Proposal’ leading to the open access vision for scholarly material. In an ideal world of scholarly communication – all research should be freely available
• School of Electronics and Computer Science developed the first repository software, EPrints.org, to support the vision and implemented its own Departmental Repository
• Work on Citation analysis based on repository content, research reporting etc
• National Oceanography Centre, Southampton – early adopter, and Project Manager established the University of Southampton Research Repository. First in the UK to move from project to university-funded service
… a fertile open access and repository background
Sharing at IAMSLIC Conferences
2002: e-Prints and the Open Archive Initiative - opportunities for libraries, presented by Pauline Simpson in Bridging the Digital Divide, 28th Annual IAMSLIC Conference, 6-11 Oct 2002, Mazatlan, Sinaloa, Mexico
2003: Institutional Repositories: an opportunity for IAMSLIC, presented by Pauline Simpson in Navigating the Shoals: evolving User Services in Aquatic and Marine Science Libraries, 29th Annual IAMSLIC Conference, 5–9 Oct 2003, Mystic, Connecticut, USA
2004: The Culture, Care and Content of Institutional Repositories, presented by Pauline Simpson (contribution to Preserving our Institutional Intellectual Property: Panel Discussion) in Voyages of Discovery, parting the seas of information technology, 30th Annual IAMSLIC Conference, 6-9 Sep 2004, Hobart, Tasmania
Two Calls
1. Encourage IAMSLIC members to implement Institutional Repositories within their organizations to contribute to the provision of global Open Access to aquatic and marine science research
Repository Development
arXiv (from 1991 at Los Alamos, now at Cornell) for the high energy physics community (incl. Atmospheric and Oceanic Physics, Math, Computing Science and Nonlinear Science).
Despite the success of arXiv and others, such as RePEc (Economics), Cogprints (Cognitive Psychology), Mathematics, etc., other subject communities have had varying success (the Chemistry Preprints Server has now closed).
From 2000 onwards, complementary implementation of institutional repositories, fuelled by project funding (e.g. Mellon Foundation, Howard Hughes, JISC UK, Open Society Institute) and powered by the information community.
Why it should be Institutional Repositories
Institutions are logical implementers of repositories
– Centralise a distributed activity
– Framework and Infrastructure
– Permanence that can sustain changes
– Stewardship of Digital assets
– Preservation policy
– Provide central digital showcase for the research, teaching and scholarship of the institution
Subject or project repositories often linked to an individual or a group – can be transitory - collection at risk
Repositories are spreading because …
• Supplementary to traditional publication
• Do not affect current research publication processes
• Give easy and rapid access
• Give long-term access
• Increase readership and use of material – more citations
• They offer advantages to institutions
• They offer advantages to research funders
• They offer new ways for information to be linked and used
Increasing number of Repositories
• 2002 = 112 (TARDis Subject Categorization Survey)
• 2005 = 466 (from Institutional Archives Registry)
http://archives.eprints.org/
Truly global movement
http://www.opendoar.org/
Main Partners: Lund University, University of Nottingham
Increasing numbers – Repository choices
• Subject - arXiv, Cogprints, RePEc
• Institutional - Southampton, Glasgow, Nottingham, MBA UK, WHOI
• National - DARE (all universities in the Netherlands), Scotland
• National / Subject - ODINPubAfrica
• International - Internet Archive ‘Universal’, OAIster
• Regional - White Rose UK
• Consortia - SHERPA-LEAP (London E-prints Access Project)
• Funding Agency - NIH (PubMed), Wellcome Trust (UK PubMed), NERC
• Project - Public Knowledge Project EPrint Archive
• Conference - 11th Joint Symposium on Neural Computation, May 15 2004
• Personal - peer to peer, web pages etc
• Media Type - VCILT Learning Objects Repository, NTDL (Theses)
• Publisher - journal archives
• Data Repositories/Archives - NODC, BODC, DOD, JODC, BADC etc
Dilemma for Researcher
• Mandates from major funding agencies now require grantees to deposit research output either in a ‘designated repository’ or in ‘any’ repository
• Where should the full text of their research be deposited?
• Researcher wants to enter metadata and deposit only once
• Situation at present
– Duplicate keying of metadata into repositories of choice
– Harvesting, but the harvester is not the choice of the depositor
– Cannot target multiple repositories with one exercise
• Does it matter where it is deposited, since Google Scholar, Yahoo and Scopus will pick it up wherever it is deposited?
The Cavalry - Building on Repository Diversity
• Contributing to The Knowledge Cycle
Encompassing experimentation, analysis, publication, research, learning
– Joined up research – a hub linking text and data
– An audit trail from whatever point of access
[Figure: knowledge cycle diagram. Labels: Research & e-Science workflows; Learning & Teaching workflows; Data creation / capture / gathering (laboratory experiments, Grids, fieldwork, surveys, media); Data analysis, transformation, mining, modelling; Learning object creation, re-use; Deposit / self-archiving; Repositories (institutional, e-prints, subject, data, learning objects); Harvesting metadata; Aggregator services; Searching, harvesting, embedding; Resource discovery, linking, embedding; Validation; Publication; Peer-reviewed publications (journals, conference proceedings); Quality assurance bodies; Presentation services (subject, media-specific, data, commercial portals); Institutional presentation services (portals, Learning Management Systems, u/g, p/g courses, modules).]
From: Lyon : CNI - JISC - SURF Conference, May 2005
Knowledge Cycle – linking IRs and Data
[Figure: the entire e-Science cycle, encompassing experimentation, analysis, publication, research, learning. Labels: Grid; E-Experimentation; E-Scientists; Graduate Students; Undergraduate Students; Virtual Learning Environment; Digital Library; Institutional Archive; Local Web; Publisher Holdings; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies.]
CLADDIER Project (Citation, Location And Deposition in Discipline and Institutional Repositories)
• The CLADDIER system will be a step on the road to a situation where (in this case, environmental) scientists will be able to move seamlessly from information discovery (location), through acquisition, to deposition of new material, with all the digital objects correctly identified and cited. The lessons learned will be applicable to the relationships between other discipline-based repositories and institutional repositories.
Persistent identifiers
Data Citations
Automated Linking
A plus for researchers
• One outcome of CLADDIER Project
• Present OAI-PMH Harvesting = ‘pull’
• CLADDIER outcome = ‘push’
• Enable researcher to deposit in one repository and choose to upload (push) the metadata to another repository of choice
• Institutional Repository to Subject Repository/s of choice
– redundancy of records: does it matter?
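To make the ‘pull’ side concrete, here is a minimal sketch of how a harvester might build an OAI-PMH `ListRecords` request. The verb and the `metadataPrefix`, `set`, `from` and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol; the base URL and the function name `list_records_url` are hypothetical and used only for illustration. The CLADDIER ‘push’ mechanism is not described in these slides, so it is not sketched here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (the harvester's 'pull')."""
    if resumption_token:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix, set or from.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only.
BASE = "https://repository.example.org/oai"

# First request: everything added or changed since 1 June 2005.
first = list_records_url(BASE, from_date="2005-06-01")
print(first)

# Follow-up request: continue a partial list with the token the server returned.
print(list_records_url(BASE, resumption_token="abc123"))
```

The harvester decides when and what to request, which is why the depositor has no say in where the metadata ends up under the pull model.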
Two Calls
2. IAMSLIC to host a repository for those who did not have the support to set up their own, and a harvester service for aquatic and marine science providing discovery and location through a one search interface
IAMSLIC Aquatic and Marine Science Repository and Harvester - early concept
[Figure labels: Depositor; Marine Science Institutional e-Print repositories; Regional e-Print Repository; Odin PubAfrica; ArXiv (Atmos & Oceanic Physics); OAI-PMH; Harvester (General); IAMSLIC Marine Science e-Print Service; User Search]
Open repositories: IAMSLIC as service provider
[Figure labels: Author; Content (Aquatic Commons); Reader; Institutional Repositories; Disciplinary Repositories (incl IAMSLIC); Peer-to-peer Repositories; Repositories of every flavour; Interoperability Standards; Repository (value added services); Linking (Z39.50 Library); Multimedia; OAI-PMH]
The Digital Catch: an integrative role for IAMSLIC in the worlds of metadata, harvesters and repositories
Part 2: Aquatic Commons
Stephanie C. Haas, Digital Library Center, University of Florida Libraries, Gainesville
Aquatic Commons is a model for digital resource sharing between stakeholders in the marine/aquatic information world. Its integrative architecture accommodates researchers and research institutions at all technological levels.
The model includes repositories, harvesting functions, searchable database creation, and integration with IAMSLIC’s Z39.50 distributed library and the ASFA database.
Special thanks are extended to the Florida Center for Library Automation (FCLA) for providing technical expertise, computer hardware/software, and the programming to develop a proof-of-concept model.
AQUATIC COMMONS : the purpose
Aquatic Commons is being developed to:
1) Create a central metadata and digital document reservoir related to marine and aquatic science information worldwide.
2) Support IAMSLIC’s long term goal of helping researchers and the public freely access needed information.
3) Integrate the efforts of the total community by harvesting metadata where available and by creating repository and harvesting opportunities where needed.
Identified stakeholders in the development of the Aquatic Commons
1) Researchers and research institutions in the marine and aquatic sciences
2) UN, International, and National ASFA partners
3) CSA
4) FAO ASFA Secretariat
5) Other marine research agencies such as IOC, NOAA, etc.
6) IAMSLIC and its affiliated regional groups
7) Florida Center for Library Automation (FCLA)
Aquatic Commons architecture consists of an integrated Open Archives Initiative (OAI) system that includes:
a harvester, an OAI provider, a search interface, a database, and a Zebra Z39.50 server.
At production level, the system will be based on open source software and will be scalable to accommodate new repositories coming online.
Overview of Components
Aquatic Commons is designed as an OAI integrated system that will functionally:
Harvest and create a searchable database of OAI-compliant metadata from extant repositories or OAI static repositories, including the Aquatic repository developed as part of this model, and in turn
Serve OAI-compliant metadata to other services.
It will also create:
An Aquatic eprint Repository to house digital works and metadata created by researchers or institutions that don’t have stable IT support.
and
A Zebra Z39.50 server that will interface with the IAMSLIC Z39.50 distributed library.
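As a rough illustration of the "harvest, store, search" part of the components listed above, the sketch below loads a few harvested records into an in-memory SQLite table and runs a title search. It is not the FCLA implementation: the table layout, the sample records and the function names `build_index` and `search` are all hypothetical, and a production system would use a proper full-text index.

```python
import sqlite3

# Hypothetical harvested records: (OAI identifier, source repository, title).
# The first identifier appears twice, as it would after a re-harvest.
HARVESTED = [
    ("oai:example.org:1", "Repository A", "Wetland succession on reclaimed mined lands"),
    ("oai:example.org:2", "Repository B", "Hydraulics of coastal inlets"),
    ("oai:example.org:1", "Repository A", "Wetland succession on reclaimed mined lands (revised)"),
]


def build_index(records):
    """Store harvested metadata, keeping one row per OAI identifier."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE record (identifier TEXT PRIMARY KEY, source TEXT, title TEXT)"
    )
    # INSERT OR REPLACE means a re-harvested record overwrites the older copy.
    db.executemany("INSERT OR REPLACE INTO record VALUES (?, ?, ?)", records)
    return db


def search(db, term):
    """Substring search on titles (SQLite LIKE ignores ASCII case by default)."""
    rows = db.execute(
        "SELECT identifier, title FROM record WHERE title LIKE ? ORDER BY identifier",
        (f"%{term}%",),
    )
    return rows.fetchall()


db = build_index(HARVESTED)
print(search(db, "wetland"))
```

Keying on the OAI identifier is one simple way to stop the same record being stored twice when a repository is harvested repeatedly.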
OPTIONAL FUNCTIONALITY:
Digital archiving at the FCLA Digital Archives of publications submitted to the Aquatic e-print Repository server.
Metadata with links to documents can be harvested from the Aquatic Repository by CSA for inclusion in ASFA.
Aquatic Commons architecture responsibilities:
Harvest and create a searchable database of subject relevant OAI compliant metadata including a sample from the Aquatic eprint Repository
Currently in test
FCLA has harvested and made searchable metadata from six collections including the Aquatic eprint Repository developed as part of this model.
FCLA has implemented a functional OAI static repository gateway to harvest metadata from OAI static repositories.
FCLA is harvesting the following sites:
Aquatic eprints Repository
Baltic Marine Environment Bibliography 1970-
W. M. Keck Laboratory of Hydraulics and Water Resources Technical Reports
Oregon Institute of Marine Biology
Woods Hole Oceanographic Institution
ODINPubAFRICA
Database
OAI Static Repository
OAI static repository records are records wrapped in XML and served as a Web page at a persistent URL. They contain header information and metadata information.
Currently in test
FCLA is using a static gateway to broker records from an XML web page created at the University of Florida.
Sample of one record
<oai:record>
  <oai:header>
    <oai:identifier>oai:www.uflib.ufl.edu/digital/temporary/IAMSLIC.xml/00002</oai:identifier>
    <oai:datestamp>2005-06-03</oai:datestamp>
  </oai:header>
  <oai:metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:creator>Brown, Mark T.</dc:creator>
      <dc:title>Successional development of forested wetlands on reclaimed phophate mined lands in Florida: Final report volume I</dc:title>
      <dc:subject>Phosphate mines and mining; Florida; wetlands</dc:subject>
      <dc:description>Prepared for Florida Institute of Phosphate Research, 1855 West Main Street, Bartow, Florida 33830 USA, Contact Manager: Steven G. Richardson, FIPR Project Numbers: 95-03-117R and 98-03-131.</dc:description>
      <dc:description>Howard T. Odum Center for Wetlands.</dc:description>
      <dc:date>2002</dc:date>
      <dc:identifier>http://purl.fcla.edu/fcla/tc/feol/UF00015102.pdf</dc:identifier>
    </oai_dc:dc>
  </oai:metadata>
</oai:record>
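A gateway or harvester reading a record like this one would pull out the header and Dublin Core fields. The sketch below does that with Python's standard `xml.etree.ElementTree`, using a shortened copy of the sample record. Two assumptions are worth flagging: the `xmlns:oai` declaration is added here so the fragment parses on its own (in a full static repository file it would normally sit on an enclosing element), and the helper name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath-style lookups below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened copy of the sample record. The xmlns:oai declaration on the
# root element is an assumption added so this fragment is parseable alone.
RECORD = """<oai:record xmlns:oai="http://www.openarchives.org/OAI/2.0/">
  <oai:header>
    <oai:identifier>oai:www.uflib.ufl.edu/digital/temporary/IAMSLIC.xml/00002</oai:identifier>
    <oai:datestamp>2005-06-03</oai:datestamp>
  </oai:header>
  <oai:metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Brown, Mark T.</dc:creator>
      <dc:date>2002</dc:date>
      <dc:identifier>http://purl.fcla.edu/fcla/tc/feol/UF00015102.pdf</dc:identifier>
    </oai_dc:dc>
  </oai:metadata>
</oai:record>"""


def summarize(record_xml):
    """Return the header fields and a few Dublin Core fields of one record."""
    root = ET.fromstring(record_xml)

    def text(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "identifier": text("oai:header/oai:identifier"),
        "datestamp": text("oai:header/oai:datestamp"),
        "creator": text("oai:metadata/oai_dc:dc/dc:creator"),
        "date": text("oai:metadata/oai_dc:dc/dc:date"),
        "link": text("oai:metadata/oai_dc:dc/dc:identifier"),
    }


info = summarize(RECORD)
for field, value in info.items():
    print(f"{field}: {value}")
```

The `dc:identifier` link is what lets a search result lead straight to the full-text PDF.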
Aquatic eprint Repository
Currently in test
Using open source eprint software from the University of Southampton, FCLA has set up a testbed for creating metadata and submitting documents by researchers and/or institutions without access to IT support.
Zebra Z39.50 server interfaces with IAMSLIC’s Z39.50 Library Gateway
FCLA and IAMSLIC inputs:
Steve Watkins will be working with FCLA to develop this functionality. Searches initiated in the IAMSLIC Library Gateway will be searching the Aquatic Commons database as well.
SET UP COST ESTIMATES (year 1)
Hardware / network
  Server, dual cpu, 4GB memory, 156GB internal disk   $  5,000
  Tape cartridge for backup                           $    200
Software
  Red Hat Linux (OS)                                  $     50
  Tivoli (backup server)                              $     50
  Tripwire (security)                                 $    300
Staff
  Development and setup (320 hours)                   $  4,800
Total one-time costs                                  $ 10,400
ANNUAL ONGOING COST ESTIMATES (starting year 2)
Hardware / network
  Server maintenance                                  $    500
  Network cost                                        $     86
Software
  Red Hat Linux (OS)                                  $     50
  Tivoli (backup server)                              $     50
  Tripwire (security)                                 $    165
Staff
  Ongoing maintenance and support (20 hrs/mo)         $  3,600
Total annual ongoing costs                            $  4,451
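The two estimates can be checked with a few lines of arithmetic. The sketch below re-adds the line items from the slides. The hourly staff rate and the multi-year figure are not stated in the presentation; they are derived here from the slide figures, assuming year 1 carries only the set-up costs and each later year carries the annual costs.

```python
# Line items copied from the two cost slides (US dollars).
SETUP = {
    "Server": 5000,
    "Tape cartridge for backup": 200,
    "Red Hat Linux (OS)": 50,
    "Tivoli (backup server)": 50,
    "Tripwire (security)": 300,
    "Development and setup (320 hours)": 4800,
}
ANNUAL = {
    "Server maintenance": 500,
    "Network cost": 86,
    "Red Hat Linux (OS)": 50,
    "Tivoli (backup server)": 50,
    "Tripwire (security)": 165,
    "Ongoing maintenance and support (20 hrs/mo)": 3600,
}

setup_total = sum(SETUP.values())
annual_total = sum(ANNUAL.values())

# Derived, not stated on the slides: implied staff cost per hour.
setup_rate = SETUP["Development and setup (320 hours)"] / 320
annual_rate = ANNUAL["Ongoing maintenance and support (20 hrs/mo)"] / (20 * 12)


def cumulative_cost(years):
    """Total spend after `years` years: set-up in year 1, annual costs after."""
    return setup_total + annual_total * (years - 1)


print(f"Set-up total:  ${setup_total:,}")
print(f"Annual total:  ${annual_total:,}")
print(f"Implied staff rate: ${setup_rate:.2f}/h (set-up), ${annual_rate:.2f}/h (ongoing)")
print(f"Three-year total: ${cumulative_cost(3):,}")
```

Both totals on the slides agree with their line items, and both staff lines imply the same hourly rate.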
OPTIONS: DIGITAL ARCHIVING at FCLA of publications submitted to the Aquatic eprint Repository server.
FCLA has created one of the first “true” digital archives in the U.S.
The FCLA Digital Archive may be found at http://www.fcla.edu/digitalArchive/index.htm
Third party service
The University of Florida Libraries has the opportunity to develop collaborative agreements that extend digital archiving services to third parties.
Formal agreements would be negotiated should this service be desired.
OPTION: If FAO and CSA become collaborators on the Aquatic Commons, metadata from the Aquatic e-print Repository could be harvested for inclusion in ASFA
CSA inputs:
Enhancing metadata to meet ASFA record standards
If we accept the premise that most research papers are composed in an electronic environment then they can be shared. Even ancient formats such as WordPerfect can be converted and served as PDF files. If Internet access is unstable, files can be submitted on disk, CD, or DVD.
Capture at creation is the most efficient means of sharing knowledge and assuring archival fidelity.
There are many details to be worked out including:
1) Assurances that digital content is available for open access without copyright infringement,
2) Defining relationships between participating organizations,
3) The technical aspects of the Aquatic eprint Repository including the handling of multi-language records and digital documents, and
4) Formal agreements with FCLA for its technical support of this initiative.
In collaboration with others, IAMSLIC has the opportunity to create Aquatic Commons as an essential digital resource for those involved in all aspects of research and resource management.