The DRIVER initiative for networking repositories
Wolfram Horstmann, Universität Bielefeld
DRIVER motivation
Scholarly communication changes towards distributed provision of text, data and services
Repositories are regarded as a saviour in this development, as the building blocks of such a distributed system
An infrastructure supporting distributed repositories and services is needed
(and reactions)
Question today
Is an overarching infrastructure bridging between distributed text-data and primary/secondary data possible?
DRIVER has addressed many problems and found many answers in the domain of distributed text repositories
But we don't know yet whether these are transferable to the data domain
Some observations on data
Data landscape very diverse
Formats differ widely – unlike text publications
Descriptions are often highly subject-specific
Some have special provenance (e.g. vendor software)
Some require special rendering, education, caution …
Data require disciplinary support
Better managed by researchers than service providers
Still, data interoperability is acknowledged
Double effort: many data are lost to re-use/remix
„Good practice“ in research, also with respect to publications
Transparency, „Falsifiability“, testability …
Some observations on repositories
They represent a shift towards …
open internet exposure as opposed to closed databases (‚graveyards‘)
content orientation as opposed to mere technical orientation (‚web-servers‘)
distributed systems: centralized structures are not immediately required nowadays
„Everybody can be a publisher“
Common description standards, e.g. Dublin Core Metadata Initiative
Many subject-specific standards
Common transfer protocols e.g. OAI-PMH, but also FTP, XML-RPC, WS, etc.
Searchability is possible!
Still: many data are lost to re-use/remix
Closed: too sensitive, weakly described, unimportant (???)
Missing service frameworks / infrastructures
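A minimal sketch of what these common standards make possible, assuming a repository that exposes Dublin Core metadata (`oai_dc`) over OAI-PMH. The base URL, the sample response and the helper names `list_records_url` and `parse_records` are made up for illustration and are not part of DRIVER; only the `ListRecords` verb, the `metadataPrefix` parameter and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Invented sample response; a real harvester would fetch it from the URL.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai")
records = parse_records(SAMPLE)
```

Because every compliant repository answers the same request in the same shape, one harvester of this kind can in principle serve many repositories; resumption tokens, error responses and subject-specific metadata formats are left out of the sketch.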
Problems: Data and service interoperability
Solution: „Infrastructure“
Repositories can solve access problem
What infrastructures are: DRIVER terms
Not an infrastructure
Single repository
Single application for search and retrieval (e.g. BASE)
Only local operation
Backwards causation on repositories is missing
Maybe an infrastructure
Distributed repository landscape as a whole
As a capacity for emergent properties, e.g. quality and quantity
incentive for data population
Nurturing development of service providers
Definitely an infrastructure
Many service providers in one organisational and technical context (e.g. run-time environment)
Enabling re-use and remix of data and services
DRIVER Objectives
Organisational structure for repositories
e.g. the „Confederation“
Improving quality and standards in local repositories
e.g. validation procedures
Building a distributed runtime system
e.g. service and data sharing
Target Groups
Repository Managers
Service Providers
Information System Executives
The DRIVER approach is incremental
Start with publication metadata
Existing distributed system, somehow connected
Considerable homogeneity of formats: OAI-PMH
Extend geographical coverage
From 5 countries, to 10, to 27, to ???
Extend towards other contents
From publication metadata to enhanced publications, i.e. representations of „texts + data“
Learn about subject specificity
Data bring in disciplinary requirements
The DRIVER Initiative
DRIVER-I 6/2006 – 11/2007
Organisational Models and Technical Test-Bed
DRIVER-II 12/2007 – 11/2009
Running Organisation and Production Infrastructure
DRIVER-Confederation 2010ff
Operations Office and Technical Deployment
NB: DRIVER is not an authoritative body; it is a liberal bottom-up initiative of stakeholders
DRIVER partners and related projects
Networking, Support, Policy, Studies
Göttingen, Nottingham, SURF, Ghent, Ljubljana, Minho, Copenhagen
Technical development and deployment
Athens, Bielefeld, Pisa, Warsaw
Partners make links to many other things
OA services: SHERPA/RoMEO, OpenDOAR, BASE …
Projects: Europeana, PEER, DELOS, DL.org, D4Science, PARSE-Insight, NESTOR …
Orgs: DINI, JISC, LIBER, SPARC, KE …
Platforms: DSpace/Fedora/OPUS/EPrints
Some Results: Guidelines
Build on knowledge from past & current IR projects (EU)
26 actively involved contributors (experts and repository managers) from 8 countries
Practical answers on how to:
Improve full-text access
Standardize metadata quality
Create a reliable infrastructure for permanent identification, resolution, traceability and storage
Resolve semantic and classification issues
Some Results: Service-Oriented Architecture
9 hosting nodes
25+ functionality typologies (services)
36 service instances
3 applications: DRIVER Main, Belgium, Spain-Recolecta
Some Results: Runtime-System & Hosting
[Figure: DRIVER runtime system. Labels: Enabling Layer; Data Layer; Functionality Layer; EU Open Access Repositories; Administrators; End users; Advanced User Interfaces; National portals; Project Applications]
Current Work: DRIVER-II
Networking
Confederation with who-is-who advisory board
Outreach: LIBER, SPARC, US, Japan etc. …
Consolidation
DRIVER-I Services: packaged and performing in production quality
Enhancement
DRIVER-I Services: improved indexing and data aggregation functionalities
DRIVER-II Services: enhanced publication management and functionality
Lessons learnt
Distributed data infrastructure requires links between organisational and technical concepts
Data specialists, computer scientists, service providers
Guidelines / content policies as a „glue“
In distributed data provision, quality and access measures are the most ‚expensive‘ tasks
Distributed service operation (not data provision) can be solved but raises novel questions (SLAs)
„Infrastructure“ is a very tough concept to get across and eventually forms a complex system
Simplification makes it weaker, e.g. re-use is restricted
Summary
DRIVER tackles the data infrastructure challenge from the text-repository side (mostly OAI-PMH)
DRIVER handshakes with primary & secondary data through „enhanced publications“
DRIVER isn't only a project but a forum for information specialists
‚Products‘ include: studies, infrastructure run-time system in production, software, support …
DRIVER has addressed many problems for data and service interoperability and found solutions
What are the required steps to support data?