Scoping a Geospatial Repository for Academic Deposit and Extraction
Anne Robertson
EDINA National Data Centre, University of Edinburgh
JISC Geospatial Working Group, UCL, 9th January 2006
JISC Digital Repositories Programme
• June 2005: JISC £4m programme
• Aim of encouraging growth of repositories in UK universities and colleges
• Programme consists of 25 projects exploring the role and operation of repositories
• Focus on how repositories can assist academic researchers both to do and to share their work more easily
• Open access is a key driver, plus growing demand for outputs of publicly funded research to be freely available on the web
Today’s climate
According to the OECD Follow-up Group on Issues of Access to Publicly Funded Research Data¹ …
“More widespread and efficient access to and sharing of research data will have substantial benefits for most areas of scientific research.”
Evidence of re-use of data within UK data centres is low:
– “Level of re-use of data held in the AHDS and ESRC archives has been disappointingly low” (Alison Allden, 2003)
– “NERC spends about £5 million per annum on data management, but unclear what benefit it derives from this. More research is needed to establish benefits and value of data re-use” (Mark Thorley, 2003)
– Qualidata survey of qualitative data re-use (2000): 44% of respondents used a colleague's data rather than acquiring archived data via a dissemination service (33%)
1 Interim Report, 20 October 2002
GRADE project introduction
• GRADE is part of JISC Digital Repositories Programme
• GRADE will investigate and report on the technical and cultural issues around the reuse of geospatial data within the context of discipline-based repositories
• Investigative in nature, not building a geospatial repository
• Particular focus on sharing and reuse of derived geospatial data
• EDINA leading GRADE with consortium partners:
– AHRC Research Centre for Studies in Intellectual Property and Technology Law, School of Law, Edinburgh University
– National Oceanography Centre, Southampton University
• Variety of other associate partners including NCGDAP, BADC, Ordnance Survey, Geog depts, HEASCs
Project Work Programme
• June 05 – Apr 07
• Budget £160k
• 5 discrete work packages
Digital rights issues - when we consider the reuse of derived geospatial data, concerns over data ownership, IPR and copyright are commonplace
Debate over institutional repository – one size fits all? Cultural aspects of allegiance to discipline, not institution
Interoperability issues – how could a geospatial repository interact within the JISC IE, and how could it make its assets available to the GRID and eScience community?
Establish user based evidence for the requirements and functionality of a repository capable of managing licensed geospatial assets
Investigate and make an assessment of informal mechanisms for geospatial data sharing
Example of derived data scenario
Derived Data Example
[Workflow diagram]
Input: OS Landline; Historic OS Maps; 2001 Orthophotos; Ground survey
Processing: Scan; Geo-reference; Accuracy assessment; Planimetric correction; GPS survey; Digitise coastline positions; Calculation of cliff retreat
Output: ESRI Shapefile and tables of retreat
Source: Use case provision of derived geospatial data as part of the GRADE project in scoping digital repositories (draft report)
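The "calculation of cliff retreat" step in this workflow can be illustrated with a minimal Python sketch. It is not the method used in the GRADE use case: it assumes cliff-edge positions have already been digitised in a projected coordinate system (metres) at matching transects for two dates, and the function name, coordinates and time span are hypothetical.

```python
import math

def cliff_retreat(old_pts, new_pts, years):
    """Return (distance_m, rate_m_per_year) for each pair of matched points.

    old_pts / new_pts: lists of (easting, northing) tuples in metres,
    one per transect, given in the same order for both dates.
    years: time elapsed between the two surveys.
    """
    if len(old_pts) != len(new_pts):
        raise ValueError("each transect needs a position at both dates")
    results = []
    for (x0, y0), (x1, y1) in zip(old_pts, new_pts):
        # Straight-line displacement of the cliff edge on this transect.
        distance = math.hypot(x1 - x0, y1 - y0)
        results.append((distance, distance / years))
    return results

# Hypothetical cliff-edge positions (metres) at two transects.
historic = [(1000.0, 500.0), (1050.0, 500.0)]
recent = [(1000.0, 497.0), (1053.0, 496.0)]
retreat = cliff_retreat(historic, recent, years=10)
```

The output of such a step (distances and rates per transect) is itself derived data: it depends on every licensed input used to locate the two coastlines.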
Example of more informal geospatial data sharing
Progress to Date
• Compendium of examples of derived geospatial data highlighting copyright issues
– Basis upon which legal team can understand issues in building a framework for data sharing that respects licensing conditions
– Focus our thoughts on broader concepts of copyright inheritance, degrees of derived data
• Literature review providing global snapshot of geospatial data and repositories
• Developed a first demonstrator, built on existing open source repository software – ready to invite interaction, aim is to elicit feedback on user requirements
• Identified technical issues that we as a geographic community have not yet focussed upon. We presented these to OGC Technical Committee in November …
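The ideas of copyright inheritance and degrees of derived data can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the `Dataset` class, both functions and the licence labels are hypothetical, and the rule that a derived dataset inherits every upstream licence is a simplification, not a legal finding of the project.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Dataset:
    name: str
    licence: Optional[str] = None          # None: no licence of its own yet
    sources: List["Dataset"] = field(default_factory=list)

def inherited_licences(dataset):
    """Collect the licences of every upstream source, recursively."""
    found = set()
    for source in dataset.sources:
        if source.licence is not None:
            found.add(source.licence)
        found |= inherited_licences(source)
    return found

def derivation_depth(dataset):
    """Number of derivation steps separating a dataset from original data."""
    if not dataset.sources:
        return 0
    return 1 + max(derivation_depth(s) for s in dataset.sources)

# Hypothetical licence labels attached to inputs from the derived data example.
landline = Dataset("OS Landline", licence="licence A (hypothetical)")
ortho = Dataset("2001 Orthophotos", licence="licence B (hypothetical)")
coastline = Dataset("Digitised coastline", sources=[landline, ortho])
retreat_tables = Dataset("Tables of retreat", sources=[coastline])

licences = inherited_licences(retreat_tables)
depth = derivation_depth(retreat_tables)
```

Here the tables of retreat sit two derivation steps away from the licensed inputs, yet still carry both upstream licences under this simplified rule.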
Issues – Content Packaging
• Consider a geospatial data asset deposited into a repository; it’s more than one file:
– GML and associated schema!
– proprietary vector format plus cartographic representation detail
– geodatabase
– raster with header file
– data set metadata and IPR info
• What is best method to package data?
• In the eLibrary world, the Metadata Encoding and Transmission Standard (METS), IMS Content Packaging (IMS CP) and MPEG-21 DIDL are used to package repository objects
• What direction is the GI industry taking with content packaging?
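To make the packaging question concrete, here is a minimal Python sketch that bundles the several files of one asset with a manifest. It is deliberately not METS, IMS CP or MPEG-21 DIDL: the JSON manifest layout, the `build_package` function, the file names and the metadata values are all hypothetical, chosen only to show why a multi-file asset needs a wrapper that records its parts and rights information.

```python
import hashlib
import io
import json
import zipfile

def build_package(files, metadata):
    """Bundle files plus a manifest into an in-memory ZIP archive.

    files: dict mapping file name -> bytes content.
    metadata: dict of descriptive and rights (IPR) information.
    Returns (zip_bytes, manifest_dict).
    """
    manifest = {
        "metadata": metadata,
        "files": [
            {
                "name": name,
                "bytes": len(data),
                # Checksum lets a repository verify each part later.
                "sha256": hashlib.sha256(data).hexdigest(),
            }
            for name, data in sorted(files.items())
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue(), manifest

# Hypothetical asset: a shapefile's component files plus a metadata record.
asset = {
    "retreat.shp": b"placeholder geometry",
    "retreat.shx": b"placeholder index",
    "retreat.dbf": b"placeholder attributes",
    "metadata.xml": b"<metadata/>",
}
package, manifest = build_package(
    asset, {"title": "Cliff retreat (example)", "rights": "hypothetical licence"}
)
```

A standards-based package would replace the ad hoc manifest with a METS, IMS CP or DIDL document, but the structure (parts, checksums, descriptive and rights metadata) is the same concern.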
Issues – GML for archiving?
• If content packaging is about asking the best method to package data, the next question is about the content being packaged.
• “Permanent access” requirements:
– profiles and application schemas widely understood and supported, avoid requiring “digital archaeology”
– Role of GML: current focus is as transfer format
• Assessing formats for preservation: sustainability v. quality v. functionality
• How to handle proprietary formats?
– Spatial databases pose special challenge
Issues – Persistent Identifiers
• Once a geospatial data asset is deposited within a repository, there is a need to be able to persistently identify this asset
• Particular repository software packages use particular schemes, e.g. Fedora uses the ‘info’ URI scheme
• Requirement to ensure identifier is actionable
• We are thinking about OpenURL resolvers and perhaps Handle-based schemes such as the Digital Object Identifier (DOI)
• What direction is GI industry taking with persistent identifiers?
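What "actionable" means can be shown with a short Python sketch that turns an identifier into a URL a browser can follow. This is a sketch under assumptions, not a GRADE design: the function name, the `doi:` and `hdl:` prefix conventions, the repository base URL and the sample identifiers are hypothetical. The doi.org and hdl.handle.net proxy resolvers are real public services.

```python
def to_actionable_url(identifier,
                      fedora_base="http://repository.example.org/fedora/get/"):
    """Map a persistent identifier to a URL that can be dereferenced.

    fedora_base is a hypothetical local repository address: an 'info' URI
    names an object but says nothing about where to fetch it.
    """
    if identifier.startswith("doi:"):
        # DOIs resolve through the public DOI proxy.
        return "https://doi.org/" + identifier[len("doi:"):]
    if identifier.startswith("hdl:"):
        # Handles resolve through the public Handle proxy.
        return "https://hdl.handle.net/" + identifier[len("hdl:"):]
    if identifier.startswith("info:fedora/"):
        # Needs repository-specific knowledge to become actionable.
        return fedora_base + identifier[len("info:fedora/"):]
    raise ValueError("unrecognised identifier scheme: " + identifier)

# Hypothetical identifiers for the same deposited asset.
doi_url = to_actionable_url("doi:10.1234/example")
handle_url = to_actionable_url("hdl:1234/5678")
fedora_url = to_actionable_url("info:fedora/demo:1")
```

The contrast is the point: DOI and Handle identifiers come with a global resolver, whereas an 'info' URI only becomes actionable once something maps it to a particular repository.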
What can JISC GWG do for GRADE?
• Take interest
• Share your experiences
– examples of research data that should be shared and made available for reuse …
• Provide input on demonstrators and guidance on user requirements …
Thank you
http://edina.ac.uk/projects/grade