We Must All Be Curators Now
from Ingest to Service Delivery, in Data Library & National Data Centre
Peter Burnhill
Director, EDINA
JISC National Data Centre, University of Edinburgh, Scotland UK
10 October 2006
Roles & Responsibilities
Three different voices / roles
1. Director, EDINA National Data Centre
– serving researchers, lecturers and students across the UK
* so something about what EDINA is & what EDINA does
– EDINA is funded by the JISC
* so something about the JISC & the JISC IE
2. A time-served data person & fellow professional, from the University of Edinburgh
– building on the past, planning for the future
3. A substitute for another guy …
– trying to make sense of what is going on
– working towards shared understanding
– proposing a framework of verbs & nouns
Joint Information Systems Committee (JISC) …
… of all the UK funding councils for higher and further education
Mission:
“world-class leadership in the innovative use of ICT for support of education & research”
Information Communication Technology
Income mix of ‘top-slice’ recurrent funding + capital grants
Funding Councils, the JISC and EDINA
UK National Data Centres
Higher Ed funding councils
Further Ed funding bodies (Learning & Skills Council)
Research Councils as ‘Partners’
NDCs are now HEFCE-related bodies
organisational infrastructure for JISC Services
• UKERNA – runs Joint Academic Network (JANET)
• EDINA & MIMAS – national data centres
+
• Arts & Humanities Data Service (AHDS)
• Economic and Social Data Service (ESDS)
+
• UKOLN; Centre for Educational Technology Interoperability Standards (CETIS); Digital Curation Centre (DCC); British Universities Film & Video Council (BUFVC); Technical Advisory Service on Images (TASI); Open Source Advisory Service; Nat. Centre for Text Mining; Plagiarism Advisory Service
• JISC Legal/Monitoring/…TechDis ; Regional Support Centres; UK Access Management / Athens
* most located in universities across UK *
What is EDINA?
• A National Data Centre, designated by the JISC in 1995/96
– based on Edinburgh University Data Library, est. 1983/84
Mission to enhance productivity of research, learning & teaching in UK higher and further education
• part of JISC Information Environment
– Keywords have been Accessibility/Outreach/Inter-working/Inter-operability …
• range of development projects and 24/7 services
– Geo-spatial, about which more later ..
– Scholarly communication & Multimedia
* films & images; spoken word
– Infrastructure for Digital Library
* certificates; rights; middleware
* SDSS -> UK Access Management Federation
• And the name, what’s that stand for?
– Edinburgh Data Information Access
– ‘Edina’ is the poetic name for Edinburgh …
Delivering online services, 24/7 …
http://edina.ac.uk
Biog: as data person these past 25+ years …
• Moved to the University of Edinburgh in 1979
– formerly science staff at Social Science Research Council (ESRC), 1974/77
– then medical statistician at Queen Charlotte’s Maternity Hospital, 1978/79
• first as statistician & researcher (& senior lecturer)
– with Scottish Education Data Archive, from 1979
* making survey data at Govt-funded research centre (CES)
– from design, data creation and documentation, onto analysis
* as survey methodologist in Edinburgh Survey Methodology Group
• then recruited to do R&D for service delivery
– setting up & managing Edinburgh University Data Library, 1984 -
– Co-director, ESRC Regional Research Laboratory, Scotland 1986/90
* early days of Geographical Information Systems (GIS)
* member of Data Task Force, Inter-Agency Global Env. Change
– European Secretary (1993/95); President (1996/2001) of IASSIST
* international assoc. for (social science) data librarians and archivists
• Now EDINA & IS Directorate at Univ. of Edinburgh
– Was Set-up Director for Digital Curation Centre, 2003/4 to 2004/5
• Scottish Education Data Archive, late 1970s – mid ‘80s
– Database of surveys of school leavers & cohorts of young people (16-19)
* derived data, trend datasets over time, changing classifiers (eg Social Class)
– integrating data from different sources, eg census ‘small area’ statistics
– made available online but under ‘privileged’ not ‘open’ access
• Edinburgh University Data Library, mid-‘80s & on
– Wider variety of datasets, obtained from others, often via others
* A ‘local’ library of datasets
* Easing access to data held elsewhere (eg UKDA)
– made available online across ERCC wide area network and beyond
* building databases, sometimes with special software
• ESRC Regional Research Laboratory, Scotland 1986/90
– early days of Geographical Information Systems (GIS)
– Integrating ‘large-scale’ data, much geographic or geo-spatial
• EDINA national data centre, mid-1990s & on
– National online access to wider range of reference and source data
* obtained under licence
– required value-added ‘curation’
* Digimap as but one example
… maybe I’ve been a ‘data curator’ all along
one example of ‘data curation’
OS digital data
Software + application of cartographic skill/rules
Value added component
11000152100913Playing Field 0901103 120001016400000%2100000010001004040097130 0%15000155 0321 0901103 0000000%2100000010001055810075820 0%15000156 0321 0901103 0000000%2100000010001057130076690 0%15000157 0321 0901103 0000000%2100000010001060110075460 0%15000158 0321 0901103 0000000%2100000010001063260074650 0%15000159 0321 8010619 0000000%2100000010001063370071760 0%15000160 0321 0901103 0000000%2100000010001066730076700 0%15000161 0321 0901103 0000000%2100000010001058910068550 0%15000162 0321 0901103 0000000%2100000010001064490069040 0%15000164 0321 0901103 0000000%2100000010001055710052730 0%15000173 0321 0901103 0000000%2100000010001058730050390 0%15000174 0321 0901103 0000000%2100000010001059520050430 0%15000175 0321 0901103 0000000%2100000010001056430049210 0%15000176 0321 0901103 0000000%
Software + default rules
• Scottish Education Data Archive, late 1970s – mid ‘80s
– Database of surveys of school leavers & cohorts of young people (16-19)
* derived data, trend datasets over time, changing classifiers (eg Social Class)
– integrating data from different sources, eg census ‘small area’ statistics
– made available online but under ‘privileged’ not ‘open’ access
• Edinburgh University Data Library, mid-‘80s & on
– Wider variety of datasets, obtained from others, often via others
* A ‘local’ library of datasets
* Easing access to data held elsewhere (eg UKDA)
• ESRC Regional Research Laboratory, Scotland 1986/90
– early days of Geographical Information Systems (GIS)
– Integrating ‘large-scale’ data, much geographic or geo-spatial
• EDINA national data centre, mid-1990s & on
– National online access to wider range of reference and source data
* obtained under licence
– required value-added ‘curation’
* Digimap as but one example
– national repositories of digital content: Jorum, GRADE, The Depot
• Digital Curation Centre, 2004 & 2005
– strategic role: ‘data curation’ & ‘digital preservation’
– even wider range of databases (e-science), held by others
… maybe I’ve been a ‘data curator’ all along
Data Provider
e.g. Ordnance Survey
end user(staff/student)
access
HE & FE funding councils
Institution(Licence)
£
££
£
Licensing Agent (JISC Collections)
Value-added Service Provider
Authorising Institutions for free-at-point of use
Key role for Authentication (is-member of Institution) and Authorisation (is-licensed Institution)
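The two checks named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the institution names, the `LICENSED` table and the function names are hypothetical, and a real deployment would rely on a federated access management scheme rather than an in-memory lookup.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical licence register: service name -> institutions holding a licence.
LICENSED = {"digimap": {"Example University", "Sample College"}}


@dataclass(frozen=True)
class User:
    name: str
    institution: Optional[str]  # None when no institution vouches for the user


def is_authenticated(user: User) -> bool:
    """Authentication: is the user a member of an institution?"""
    return user.institution is not None


def is_authorised(user: User, service: str) -> bool:
    """Authorisation: has the user's institution licensed this service?"""
    return user.institution in LICENSED.get(service, set())


def may_access(user: User, service: str) -> bool:
    # Both checks must pass before access is free at the point of use.
    return is_authenticated(user) and is_authorised(user, service)
```

Keeping the two questions as separate functions mirrors the slide: membership is asserted by the institution, while the licence is a property of the institution, not of the individual.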
EDINA as national data centre
• http://edina.ac.uk
• 50% direct funding from JISC for delivering services
– Good reputation for helpdesk, user interfaces, FAQs etc
– 24/7, 99% uptime
• 50% is additional funding awarded for Development activity
– Developing services; developing JISC IE; working with Researchers
– Acknowledged project competence for R&D
• Strategic role as Geographic Data Centre
– For JISC (Digimap etc), for ESRC (UKBORDERS)
– Building Spatial Data Infrastructure with NERC and internationally (OGC)
Existing Geo-data Services
Where are we with GIS?
• University of Edinburgh & its Data Library have long-running interest & experience
– Geography Department (Coppock/Hotson; Waugh/GIMMS) & PLU; first MSc GIS course, and much else
– ESRC Regional Research Laboratory for Scotland, 1987-
– Launch of UKBORDERS in 1994
• EDINA has continued and extended that for geo-spatial data
– JISC eLib project: access to Ordnance Survey mapping, 1996-
– Launch of Digimap service, 2000-
– Extension of UKBORDERS, 2001-
• ‘Shared Services’ provision
– Go-Geo! (geo-data portal)
– geoXwalk
– GRADE – Geospatial Repository for Academic Deposit and Extraction
• Not all (only a fraction) of geo-referenced data at EDINA
• Strategic importance of interoperability
– GI web services
• Interested in furthering the use of GI data across disciplines
– Geo-parsing & mark-up; geo-finding; geoXwalk (vocabularies)
Disciplinary data-centres
* Something’s special about the spatial *
EDINA role as Geographic Data Centre?
Slide ‘borrowed’ from Liz Lyon, & curated ..
2. Getting back to Problem Statement
‘roles & responsibilities’
Some Thoughts, and Questions…
• What resources, and how should we share?
– What are ‘scholarly resources’?
• What is special about scholarship?
• What is different about digital?
• Who should do what?
– A division of labour that leverages
* ‘responsibility’ and ‘expertise’ for curation
* Means of service delivery
I. Find our place – in old and new geography
• ‘words, numbers, pictures, sounds all to be digital & accessed from afar’
Scholarship: Services and Stewardship
• Services, in support of scholarship
– Libraries have traditionally focussed on the formal part of scholarly communication
– Relevance: searching strategies
– new challenges: how to cope with digital everything?
• Stewardship
– Was ‘Special Collections’, now ‘Collections, inc. the digital’
– Ensuring provenance & continuing access
* Digital curation, preservation & archiving
* Sharing with future scholarship
* Sharing with wider world
• Research
– What do researchers do, and what do they want/need?
– eScience, Data, and ‘scholar workstation’ and the VRE
• Learning and Teaching
– What do students need?
– What do teachers/lecturers need?
– e-learning and the VLE (virtual learning environment)
Infrastructure to support four ‘demand-side’ verbs
discover information object of interest
e.g. article referenced in database, A&I, eToC, etc
locate organisation offering service
e.g. library (union catalogue/OPAC) or document delivery service
request use of service
via payment of money or privilege of membership
access object of interest
via personal visit, document delivery, online access
based on MODELS workshops (UKOLN/JISC eLib)
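The four verbs can be read as a chain, each step feeding the next. The sketch below is an illustration only: the lookup tables and every name in them (`INDEX`, `HOLDINGS`, `MEMBERS`, `CONTENT`, "Library X", "Institution A") are hypothetical stand-ins for a discovery database, a union catalogue, a membership register and a delivery service.

```python
# Hypothetical in-memory stand-ins for the infrastructure behind each verb.
INDEX = {"curation": ["article-1"]}                 # search term -> objects
HOLDINGS = {"article-1": ["Library X"]}             # object -> services offering it
MEMBERS = {"Library X": {"Institution A"}}          # service -> entitled institutions
CONTENT = {("Library X", "article-1"): "full text of article-1"}


def discover(term):
    """Discover: find information objects of interest."""
    return INDEX.get(term, [])


def locate(obj):
    """Locate: find organisations offering a service on the object."""
    return HOLDINGS.get(obj, [])


def request(service, institution):
    """Request: ask to use the service, here by privilege of membership."""
    return institution in MEMBERS.get(service, set())


def access(service, obj):
    """Access: obtain the object itself from the service."""
    return CONTENT[(service, obj)]


def fetch(term, institution):
    # Chain the four verbs; stop at the first service that grants the request.
    for obj in discover(term):
        for service in locate(obj):
            if request(service, institution):
                return access(service, obj)
    return None
```

A reader from "Institution A" searching for "curation" would be taken through all four steps; a reader from anywhere else would be stopped at request.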
Simplified workflow
Discover
Locate
Access
Use
‘Publish*’
Fit for purpose?
Curate
*Issue
Dataset publishing
• Re-examine concept of Dataset Publishing (Callahan, Johnson, and Shelley 1996)
– analogous to publishing papers
– rewards for publishing datasets (e.g. promotion, RAE)
– procedures (e.g. standards to use, peer review) & resources to manage procedures
* Should minimise time and effort required
– need tools to assist in creation, maintenance and dissemination of dataset descriptions
• Means of ‘putting’ into a public/community
– Deposit and Share are too cosy
– to ‘publicate’, to issue
• Terms of access and use
– Open?
– Privilege of membership
– Payment of money
Repositories of digital content
• So what is a digital repository?
– I like (user) verbs, not (supply-side) nouns …
• A repository is a noun that meets a set of (user) verbs/tasks, by supporting delivery of [services] for a given/designated client community:
– Put [ingest service]
– Keep-safe [storage service]
– Get [access service]
Motivation:
• for the record? preservation; prospect of access
• for re-use? curation; current access
• Can we say, “Behind every great service, there is a wonderful managed repository”?
No, not if access service does not have corresponding ingest service.
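The three verbs map naturally onto three methods. What follows is a minimal sketch, not a description of any real repository software: the class, its method names and the use of a SHA-256 checksum as the "keep-safe" check are all assumptions made for illustration.

```python
import hashlib


class Repository:
    """Toy repository exposing the three user verbs as services."""

    def __init__(self):
        # identifier -> (content bytes, SHA-256 digest recorded at ingest)
        self._store = {}

    def put(self, identifier, content):
        """Put [ingest service]: accept an object and record its checksum."""
        digest = hashlib.sha256(content).hexdigest()
        self._store[identifier] = (content, digest)

    def keep_safe(self):
        """Keep-safe [storage service]: list objects that fail a fixity check."""
        return [
            identifier
            for identifier, (content, digest) in self._store.items()
            if hashlib.sha256(content).hexdigest() != digest
        ]

    def get(self, identifier):
        """Get [access service]: return the object as it was put."""
        return self._store[identifier][0]
```

Note that `get` here returns exactly what `put` received, which is the simple case; a service that transforms the content before delivery needs more than this.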
Repositories & OAIS Reference Model
?? In a classic Repository, the DIP is the same as the SIP ??
In a data centre, and many data libraries, it rarely is.
MANAGEMENT
Ingest
Data Management
SIP
AIP
DIP
queries
result sets
Access
PRODUCER
CONSUMER
Descriptive Info
AIP
orders
Descriptive Info
Archival Storage
Administration
Preservation Planning
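The point about SIP and DIP can be made concrete with a toy model of the three OAIS package types. This is a sketch under assumptions: the `Package` class and the `ingest` and `disseminate` functions are hypothetical names, and the "value added" step is reduced to an arbitrary transformation of the content.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    kind: str                      # "SIP", "AIP" or "DIP"
    content: str
    descriptive_info: dict = field(default_factory=dict)


def ingest(sip, descriptive_info):
    """Ingest: turn a submission package into an archival package."""
    return Package("AIP", sip.content, dict(descriptive_info))


def disseminate(aip, transform=lambda content: content):
    """Access: build a dissemination package, optionally adding value."""
    return Package("DIP", transform(aip.content), dict(aip.descriptive_info))


sip = Package("SIP", "raw survey file")
aip = ingest(sip, {"title": "Example survey"})

# Classic repository: what comes out is what went in.
classic_dip = disseminate(aip)

# Data centre: the delivered product is derived from, not identical to, the deposit.
value_added_dip = disseminate(aip, transform=str.upper)
```

With the default identity transform the DIP content equals the SIP content; with any value-adding transform it does not, which is the situation the slide describes for data centres and many data libraries.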
Support for Research & research-led learning
• Data, software and facilities
– Data as ‘evidence’
– Data curation and digital preservation: continuing access
• Data Archives and Data Libraries
– Social surveys, and much more
– IASSIST
* International Association for data professionals (1972 -)
* Members in Philippines and Vietnam
• Census Programme
– Small area statistics [MIMAS]
– UKBORDERS (boundaries for thematic mapping) [EDINA]
• EDINA Digimap Collection
– Topographic mapping data, from national mapping agency
– Marine & Geological mapping data
• then there is the challenge of scientific visualisation, and observational images and documentary films!
Scholarly Communication
1. Access to commercial services & resources
– Consortium licensing
– ‘local’ hosting licensed data at National Data Centres (NDCs)
2. Focus on community-generated resources
– Union catalogues (& links to ILL/docdel) - SUNCAT
– digital library developments
– Open Access repositories
* “Put it in The Depot” (www.depot.ac.uk)
3. Need for Access Control as Middleware development
– Shibboleth framework, developed as part of Internet2
* UK Access Management Federation for Education & Research
* Managed by UKERNA, based on work by EDINA SDSS
– replacing vendor’s UserID & password with community scheme
Scholarly Communication
Author
Reader
writes to be recognised by peer community &
for institutional Research Assessment Exercise (RAE) purposes
… perhaps to be read
Key User (Reader) Verbs:
Discover article of interest
Locate service on those articles
Request permission to use service
Access to service/article
(content of) article is the ‘information object of desire’
Author(article)
Reader(article)
Publisher
article serial
issue
Library(serial)
Licence
Scholarly Communication
(simple model: focus on article-length work published in journals)
Libraries and Publishers provide framework …
the traditional ‘middleware’/‘infrastructure’
... with Licence(s) for electronic (online) and print (on-shelf)
£
P.Burnhill, EDINA/JISC, 2005
Author(article)
Reader(article)
Publisher
article serial
issue
Library(serial)
Licence
Scholarly Communication & Open Access
(Access to article-length work)
peer review
peer exchange
Informal: ‘invisible college’ and the ‘gift economy’
Institutional arrangement
Licensed Online Access
Formal economy
£
ILL/docdel
repositories
‘Open Access’
‘Digital Preservation’
free2web access
E-prints
££
learned society
Research Data
Creator
Researcher
Generates (curates) data for own purpose, or as part of team
… wants/has to ‘put’ it somewhere for use by others
(perhaps to be recognised by a peer community)
Key User (Researcher) Verbs:
Discover data of interest
Locate service on that data with documentation on provenance etc
Request permission to use service
Access to service/data
Evidential value of data in analysis as ‘object of desire’
Creator(dataset)
Researcher(data)
Data Centre(database)
(Data) Library
Licence
Data (simple model)
who provides framework? … the ‘middleware’/‘infrastructure’
... with what kind of Licence(s) for access?
£ ??
P.Burnhill, EDINA/JISC, 2006
Creator(dataset)
Researcher
Institution
Licence
Doing Data
peer review
peer exchange
Informal: ‘invisible college’ and the ‘gift economy’
Institutional arrangement
Authorised Online Access
Formal economy
£
repositories
‘Open Access’
‘Digital Preservation’
free2web access
datasets
££
learned society
Data Centre
All Curators Now …
Thank you
http://edina.ac.uk
http://jisc.ac.uk
JISC Information Environment Architecture
(Idealised) Technical Infrastructure for Services
Andy Powell, 2005
Disciplinary data-centres
* Something’s special about the spatial *
EDINA has role as Geographic Data Centre
Slide ‘borrowed’ from Liz Lyon, & curated ..
Support for Research & research-led learning
• Data, software and facilities
– Data as ‘evidence’
– Data curation and digital preservation: continuing access
* Digital Curation Centre established (Edinburgh-led)
• Data Archives and Data Libraries
– Social surveys, and much more
– IASSIST
* International Association for data professionals (1972 -)
* Members in Philippines and Vietnam
• Census Programme
– Small area statistics [MIMAS]
– UKBORDERS (boundaries for thematic mapping) [EDINA]
• EDINA Digimap Collection
– Topographic mapping data, from national mapping agency
– Marine & Geological mapping data
– I could say very much more about Digimap!!
• And then there are images and documentary films!
Focus on community-generated resources
1. ‘traditional ground for libraries’
– Union catalogues (& links to ILL/docdel) – SUNCAT
– [Ask me about SUNCAT]
2. ‘digital library developments’
* Resource Discovery Network
* Inter-operability – not just http, but m2m interfaces
* Digitisation
– Newspapers, NewsFilm, Manuscripts …
– DIWAN: digitising Islamic Materials in UK university collections
3. New challenge: Open Access repositories
* International development – UK active
* Institutional Repositories
– ‘put it in The Depot’ – www.depot.ac.uk [not yet launched]
need Access Management Federation for Education & Research
– Shibboleth framework, developed as part of Internet2