a centre of expertise in data curation and preservation
IMechE Workshop, London, 26th September 2006
Looking to the longer term: some perspectives on data curation
and preservation
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0
Dr Liz Lyon,
DCC Associate Director (Outreach); Director, UKOLN, University of Bath, UK
About UKOLN
• “a centre of expertise in digital information management”
• Funding: Joint Information Systems Committee (JISC) + Museums, Libraries & Archives Council (MLA)
• Portfolio of R&D projects: Delos, DRIVER, Grand Challenge
• 29+ staff based at the University of Bath
• Inform the library, information, education and cultural heritage communities
• Policy, advocacy at national level, build innovative Web-based systems & services, R&D, e-journal Ariadne, workshops and conferences.
• http://www.ukoln.ac.uk/
Acknowledgement: Alex Ball, Grand Challenge Project
UK Digital Curation Centre
• Digital Curation Centre
• Funded by JISC & EPSRC
• Development activities
• Research agenda
• Delivering services
• Outreach Programme
• http://www.dcc.ac.uk/
Overview
• Data curation and digital preservation issues
• Draw on research and scholarship perspectives
• Data / information flows and the “business process”
• UK Digital Curation Centre activities
“maintaining and adding value to a trusted body of digital information for current and future use”
Data-centric 2020 vision
Reference datasets as infrastructure?
(Very simple) Product Research Cycle & Data Curation
[Diagram: research cycle. Stage labels:]
• Formulate ideas / hypothesis, test, experiment, observe, design: data creation, collection & capture
• Adding value: data linking, annotation, visualisation, simulation
• (New) knowledge extraction: data mining, modelling, analysis, synthesis
• Scholarly communications & business transactions: data disclosure, publication, citation, discovery, re-use
• Data management, storage & validation: description, deposit, self-archiving, preservation, certification
[Label repeated five times in the diagram: Data processing]
[Other labels: e-Infrastructure; Open ?? access; Collaboration]
DAME signal processing workflows using Grid Services
[Workflow diagram. Lane labels: Rolls Royce; DS&S; Airport]
Maintenance Engineer: Aircraft Lands; Visual Inspection; Provide Information; Quote Diagnosis; Brief Diagnosis / Prognosis; Check Diagnoses; Maintenance Procedure; Diagnosis Result; Release Engine; complete; Maintenance Result
Maintenance Analyst (Fleet Manager): Detailed Diagnosis / Prognosis; Provide Further Details; Request Information; Sign-off Diagnosis; Analyst Decision
Domain Expert: Detailed Analysis; Request Further Details; Expert Decision
Decision outcomes: [ information required ]; [ diagnosis ]; [ unknown ]; [ known ]; [ Clear ]; [ fault unresolved ]; [ fault resolved ]
• RepoMMan: Repository Metadata and Management (Hull) using WS-BPEL
• Are your engineering workflows identified and described?
Workflow
e-Scientist desktop?
Slide: Carole Goble
Research outputs in institutional repositories: engineering
“JISC Vision”: a global landscape of federated repositories
[Diagram labels: portal (×5); fusion layer ‘repository federator’; repository (×5)]
heterogeneous - metadata formats, content formats, identifiers, packaging standards
homogeneous - metadata formats, content formats, identifiers, packaging standards
From Andy Powell: http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/presentations/jiie-jcs-2005/
• Multi-disciplinary, cross-sectoral
• National, institutional
• Different platforms
• Many format types: data, eprints, images, geospatial
• e-Framework and Information Environment context
• Define common + domain-specific + repository “services”
• Interoperability based on open standards, software tools
Pilot Engineering Repository Xsearch (PerX) http://www.engineering.ac.uk/
STEP (ISO 10303)
Interoperability???
Repositories and OAIS Reference Model
“an archive consisting of an organisation of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community ... an identified group of potential consumers who should be able to understand a particular set of information”
[Diagram: OAIS functional entities]
Actors: PRODUCER; CONSUMER; MANAGEMENT
Functions: Ingest; Data Management; Archival Storage; Access; Administration; Preservation Planning
Flows: SIP; AIP; DIP; Descriptive Info; queries; result sets; orders
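The flow of packages between these entities can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation of the OAIS standard: the class names `SIP`, `AIP`, `DIP` and `Archive`, the methods `ingest`, `query` and `order`, and the sample record are all hypothetical, and Administration and Preservation Planning are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SIP:
    """Submission Information Package, as handed over by a Producer."""
    content: bytes
    descriptive_info: Dict[str, str]


@dataclass
class AIP:
    """Archival Information Package, as held in Archival Storage."""
    aip_id: str
    content: bytes
    descriptive_info: Dict[str, str]


@dataclass
class DIP:
    """Dissemination Information Package, as delivered to a Consumer."""
    aip_id: str
    content: bytes
    descriptive_info: Dict[str, str]


@dataclass
class Archive:
    # Archival Storage: AIPs keyed by identifier.
    storage: Dict[str, AIP] = field(default_factory=dict)
    # Data Management: Descriptive Info keyed by AIP identifier.
    catalogue: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def ingest(self, sip: SIP) -> str:
        """Ingest: turn a SIP into an AIP and record its Descriptive Info."""
        aip_id = f"aip-{len(self.storage) + 1}"
        self.storage[aip_id] = AIP(aip_id, sip.content, dict(sip.descriptive_info))
        self.catalogue[aip_id] = dict(sip.descriptive_info)
        return aip_id

    def query(self, key: str, value: str) -> List[str]:
        """Access: a Consumer query returns a result set of AIP identifiers."""
        return [i for i, info in self.catalogue.items() if info.get(key) == value]

    def order(self, aip_id: str) -> DIP:
        """Access: a Consumer order is met by deriving a DIP from the stored AIP."""
        aip = self.storage[aip_id]
        return DIP(aip.aip_id, aip.content, dict(aip.descriptive_info))


# Hypothetical sample data, for illustration only.
archive = Archive()
new_id = archive.ingest(SIP(b"strain gauge readings", {"title": "Test rig run 7"}))
hits = archive.query("title", "Test rig run 7")
dip = archive.order(hits[0])
```

Keeping the three package types separate, even though they carry the same fields here, mirrors the point of the model: what a Producer submits, what the archive preserves and what a Consumer receives need not be the same object.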
Assuring permanence: digital preservation
• Trusted DR Audit Checklist for Certification: Draft, Research Libraries Group-NARA Taskforce, 2005
Defined criteria:
– Organisation
– Functions, processes & procedures
– Designated community & usability
– Technologies & technical infrastructure
• Revised Checklist based on feedback and pilot audits (KB, BADC)
• Self-certification: DINI-Zertifikat: requirements & recommendations:
– Server policy / Guidelines
– Author support
– Legal issues
– Authenticity and integrity
– Cataloguing
– Access statistics
– Long-term sustainability
• Has your repository / PLM been audited?
Interdisciplinary discovery
• Validation, publication & discovery of data models & schema
• Harmonisation and normalisation of metadata and semantics
• Packaging standards: METS, MPEG-21 DIDL
• Formal high-level and domain ontologies
• ePrints DC Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
• eBank Application Profile crystallography data http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• What data models and metadata schema are in place?
Persistent identifiers for data citation
• How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?
• Schemes: DOI, Handle, ARK, PURL
• Global identification: express as http URIs
• Data citation (human and machine-actionable)
• Publication & citation of scientific primary data project: National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de
• Is there a data citation policy?
• What persistent identifiers have been assigned to your data?
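As a minimal sketch of "express as http URIs", the Python below maps a scheme and an identifier to a resolvable URI. The function name `to_http_uri` and the `RESOLVERS` table are hypothetical, and the resolver base URLs are assumptions about today's public DOI and Handle resolvers (which use https) rather than anything stated on the slide. The example identifier is the DOI of the eBank paper cited elsewhere in this deck.

```python
from urllib.parse import quote

# Assumed public resolver base URLs; not taken from the slide.
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
}


def to_http_uri(scheme: str, identifier: str) -> str:
    """Express a persistent identifier as a resolvable URI."""
    key = scheme.lower()
    if key not in RESOLVERS:
        raise ValueError(f"no resolver configured for scheme {scheme!r}")
    # Percent-encode characters that are unsafe in a URI path, but keep "/",
    # which separates the prefix from the suffix.
    return RESOLVERS[key] + quote(identifier, safe="/")


uri = to_http_uri("DOI", "10.1039/b502828k")
```

The same string can then serve both purposes named above: a person can read and follow it in a printed citation, and a program can dereference it without knowing which scheme lies behind it.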
Discovering data: eBank Project
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k
• Domain identifier: International Chemical Identifier (INChI) code
• Google molecule using INChI
Slide from Simon Coles
Domain identifiers for engineering?
Format migration challenges? CAD Program Compatibility Chart http://www.okino.com/conv/filefrmt_cad.htm
Registry development
Development: Representation Information Registry Repository
• “DCC Approach to Digital Curation” based on OAIS
• Representation Information Registry Repository
• Prototype demonstrator: based on 2 key concepts to facilitate sharing of the curation effort
– Curation Persistent Identifier (CPID)
– Descriptive “label” (structural, semantic, other metadata)
• Development of (M2M) tools and interfaces for creating, using and re-using representation information
• http://dev.dcc.ac.uk Wiki and email list
• EU CASPAR Integrated Project http://www.casparpreserves.info/pages/1/index.htm
• Task Force on the Permanent Access to the Records of Science http://tfpa.kb.nl/
Registry API
Allows applications to talk to many different registry implementations e.g. GDFR, PRONOM, UDDI
• GUI Access and via Web browser http://registry.dcc.ac.uk
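One way to picture an API that lets an application talk to several registry implementations is a shared interface with one adapter per registry. The Python below is a sketch under that assumption and is not the DCC Registry API: `Registry`, `RepInfoLabel`, `InMemoryRegistry`, `resolve` and the CPID strings are all hypothetical, and the in-memory class merely stands in for adapters that would call real services such as GDFR or PRONOM.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RepInfoLabel:
    """Descriptive "label" attached to a Curation Persistent Identifier (CPID)."""
    cpid: str
    structural: str  # e.g. a file-format description
    semantic: str    # e.g. the meaning of fields or units


class Registry(ABC):
    """Common interface each registry adapter would implement (hypothetical)."""

    @abstractmethod
    def lookup(self, cpid: str) -> Optional[RepInfoLabel]:
        """Return the label for a CPID, or None if this registry lacks it."""


class InMemoryRegistry(Registry):
    """Stand-in adapter; a real one would query a remote registry instead."""

    def __init__(self, labels: Iterable[RepInfoLabel]) -> None:
        self._labels = {label.cpid: label for label in labels}

    def lookup(self, cpid: str) -> Optional[RepInfoLabel]:
        return self._labels.get(cpid)


def resolve(cpid: str, registries: Iterable[Registry]) -> Optional[RepInfoLabel]:
    """Ask each registry in turn and return the first label found."""
    for registry in registries:
        label = registry.lookup(cpid)
        if label is not None:
            return label
    return None


# Made-up CPIDs and labels, for illustration only.
formats = InMemoryRegistry([RepInfoLabel("cpid:example:0001", "STEP Part 21 file", "")])
units = InMemoryRegistry([RepInfoLabel("cpid:example:0002", "", "stress in MPa")])
found = resolve("cpid:example:0002", [formats, units])
missing = resolve("cpid:example:9999", [formats, units])
```

Because the calling code depends only on `lookup`, adding another registry means writing one more adapter rather than changing every application.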
Adding value through annotation
Research at the University of Edinburgh
• Scientific databases: Annotation scoping report
• New annotation model + prototype MONDRIAN
• Intuitive visual interface iMONDRIAN
• Annotate sets of values
• Support for querying annotations
Nature 23 March 2006 OTMI: Open Text Mining Interface
NaCTeM http://www.nactem.ac.uk/
Emerging tools: TerMine, GENIA, Cafetiere
Knowledge extraction:
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological…)
• Analysis (statistical, lexical, gene…)
Supporting the community: Services
• [email protected]
• legal - technical guidance
• Curation Manual 45 chapters planned
– Metadata (umbrella)
– Open Source
– Archival metadata
– Preservation metadata
– Selection & appraisal
– Curating emails
• Briefing Papers
– Curating emails
– Digital repositories
– Geospatial data
– Data protection
– eScience data
• Case studies
DCC Case Study published: Wide Field Astronomy Unit
Supporting the community: Outreach & Services
• Workshops:
– Geospatial data, NeSC, 27 October
– OAIS 5 year Review, October
– Audit & Certification Forum, October
– Records Management, L’pool, 30 Nov
– Curation & Preservation Training, Dec
– 2007 Preservation of journals tbc
– 2007 Legal environment tbc
– 2007 Preparing for audit tbc
• Information Days: British Library, L’pool, UCL
• 2nd International DCC Conference 21-22 November, Glasgow
• Keynotes: Hans F. Hoffmann, CERN, Clifford Lynch, CNI
DCC Phase 2: 2007-2010
• Working more closely with data centres, e-Science Programmes and Research Councils
• SCARP Project: disciplinary approach
• JISC Digital Repository Programme collaboration
• RepInfo Registry service migration
• Define self-assessment procedures and tools
• Collaborate with CASPAR, DPE and PLANETS (EU-funded Digital Preservation Projects)
• Workshop Programme, International Conference 2007
University of Bath, 13 September 2006
Thank you. Questions?
Join the DCC Associates Network at www.dcc.ac.uk