Metadata and Data Management activities at CSIRO Marine Research, Australia Kim Finney & Tony Rees Divisional Data Centre CSIRO Marine Research, Hobart http://www.marine.csiro.au/datacentre/

Post on 28-Dec-2015

Metadata and Data Management activities at CSIRO Marine Research, Australia

Kim Finney & Tony Rees

Divisional Data Centre

CSIRO Marine Research, Hobart

http://www.marine.csiro.au/datacentre/

The Australian MarLIN Connection

• Great Minds Think Alike!

– Almost simultaneous emergence of UK and Australian MarLIN projects.

– Different emphases but many overlapping problems.

• Why Are We Here?

– To exchange ideas, make some new data friends and, hopefully, build on UK developments that can also address Oz marine data issues.

• Who Are We?

– CSIRO Division of Marine Research (CMR), an Australian Commonwealth Government research agency with approximately 300 staff. It is one of a number of such agencies (others include AIMS and GBRMPA).

Orientation information ...

[Figure labels:]

– RV Franklin: oceanographic research vessel
– FRV Southern Surveyor: fisheries research vessel
– CMR
– 16 million km² ocean territory

CMR - Data Centre

• Established in 1997

– 12 staff (multidisciplinary)

– services the Division and two ships

– focal point for promoting a data management culture within CMR

• Data Management Strategy

– developed in 1997

– outlines actions that CMR must take to move its data management practices into the 21st century

– covers policy, technology issues, data handling procedures, and standards development/adoption; available on the Data Centre web site

What Are Some Of The Issues We Face?

• Corporate knowledge of datasets held (internally & externally sourced).

• Purchase & sharing of externally sourced data.

• Access & re-use of data generated by individuals.

• Data archiving for re-use.

• Coordination of external data exchange/data provision.

• Data pricing policies.

• Divisional use of WWW & database technology.

• Conformance with national & international standards (data exchange, data processing, data documentation)

• Contribution to national data management issues & activities.

• Data management tools (availability, development for re-use, divisional software libraries)

• Integration of data, records, publications and financial systems

What Is Our Approach?

[Diagram labels:]

– Divisional Data Policies
– E-Commerce Module
– Data Licensing Module
– Basic WWW Metadata Directory
– Hyperlinked Data Files
– Hyperlinked Publications
– Hyperlinked Databases
– Standards

Development Of CMR’s Research Database

[Diagram labels, approach components:]

– Divisional Data Policies
– E-Commerce Module
– Data Licensing Module
– Basic WWW Metadata Directory
– Hyperlinked Data Files
– Hyperlinked Publications
– Hyperlinked Databases
– Standards
– Yet to be included

[Diagram labels, client/server access:]

– Network Protocol: RMI, HTTP
– Client (Java Applet, or Browser)
– Server: Servlet (Database Access Program)
– JDBC
– ORACLE Database
– Spatial Option

[Diagram labels, database contents:]

– Data Catalogue
– Conceptual/Physical Deployment
– Project Information
– Model Sources
– GIS Sources
– Device Sources
– Time Series Data Types
– Profile Data Types
– Video Data
– Photo Data
– Catch Data
– Model Data
– Image Data
– Meteorological Data
– Sediment Data
– Sample Data

[Diagram labels, database contents:]

– Data Catalogue {long table indexing all features in the database}
– Conceptual/Physical Deployment
– Project Information
– Model Sources
– GIS Sources
– Device Sources
– Time Series Data Types
– Profile Data Types
– Video Data
– Photo Data
– Catch Data
– Model Data
– Image Data
– Meteorological Data
– Sediment Data
– Sample Data

Concluding Remarks

MarLIN - Marine Laboratories Information Network

and CAAB - Codes for Australian Aquatic Biota

Situation at CMR pre-MarLIN

[Diagram labels:]

– Centrally-held data
– Derived products
– CMR-produced reference works & guides
– Scientific publications
– Project/voyage/person details
– Supporting information
– CAAB taxonomic database
– Externally sourced data
– Indexes and catalogues
– Dispersed data (numerous dispersed resources)

MarLIN metadatabase as at July 1999: showing pointers/links to ( ) or information sourced from ( )

[Diagram labels:]

– Centrally-held data
– Derived products
– CMR-produced reference works & guides
– Scientific publications
– Project/voyage/person details
– Supporting information
– CAAB taxonomic database
– Externally sourced data
– Indexes and catalogues
– Dispersed data

MarLIN design questions ...

• How to make data querying, entry and maintenance easily user-accessible (but maintain metadata standards)?

– use www interfaces, but moderate user entries and updates

• What information to store, in what manner?

– use ANZLIC and “Blue Pages” elements, plus additional ones as deemed useful for Divisional needs

• What metadata standards, thesauri, etc. to follow?

– mostly follow ANZLIC & “Blue Pages”, with some extensions & replacements

• How to handle taxon-level information?

– store taxonomic codes in MarLIN, referenced to scientific and common names from Division’s “CAAB” taxonomic database

• What about subject-based searching?

– use “MarLIN subject categories”, developed from ASFA (R) scheme

MarLIN metadatabase implementation

• Oracle database, with www front end and HTML forms/Java interfaces

– www used for searching, for metadata submission and update, and for most administrative functions

• Relational design

– aspects common to numerous records (e.g. project, voyage, person information) stored in separate tables

• Data entry and update is via user logon (restricted to users on CMR computer domain)

– enterer details, time, etc. are automatically logged and added to record on submission

• “Submitted” records reside in separate (parallel) tables until approved by database administrator

• Nightly script runs to generate CMR’s “Blue Pages” entries from MarLIN metadata records
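
The moderated submission flow described above (user-logged entries held in parallel tables until an administrator approves them) can be sketched roughly as follows. This is a minimal illustration only, not the MarLIN implementation: it uses Python's built-in `sqlite3` in place of Oracle, and the table names, column names, and the `submit`, `approve`, and `search` functions are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one table for pending submissions, a parallel
# table for approved records (the only one that searches can see).
SCHEMA = """
CREATE TABLE metadata_submitted (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    abstract TEXT,
    entered_by TEXT NOT NULL,
    entered_at TEXT NOT NULL
);
CREATE TABLE metadata_approved (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    abstract TEXT,
    entered_by TEXT NOT NULL,
    entered_at TEXT NOT NULL,
    approved_by TEXT NOT NULL
);
"""


def open_db():
    """Create an in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def submit(conn, user, title, abstract):
    """Store a new record as pending, logging who entered it and when."""
    entered_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO metadata_submitted (title, abstract, entered_by, entered_at) "
        "VALUES (?, ?, ?, ?)",
        (title, abstract, user, entered_at),
    )
    conn.commit()
    return cur.lastrowid


def approve(conn, submission_id, admin):
    """Move a pending record into the approved table in one transaction."""
    row = conn.execute(
        "SELECT title, abstract, entered_by, entered_at "
        "FROM metadata_submitted WHERE id = ?",
        (submission_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no pending submission with id {submission_id}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO metadata_approved "
            "(title, abstract, entered_by, entered_at, approved_by) "
            "VALUES (?, ?, ?, ?, ?)",
            row + (admin,),
        )
        conn.execute("DELETE FROM metadata_submitted WHERE id = ?", (submission_id,))


def search(conn, text):
    """Return titles of approved records whose title contains the text."""
    rows = conn.execute(
        "SELECT title FROM metadata_approved WHERE title LIKE ? ORDER BY title",
        (f"%{text}%",),
    )
    return [r[0] for r in rows]
```

Keeping pending and approved records in separate tables means the search path never needs a status filter, at the cost of copying rows on approval.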

MarLIN metadata elements

# = “Blue Pages” extension to ANZLIC standard, * = new element added for MarLIN

• Dataset ...
– Title
– * Identifier/Short Title
– # Data Type
– Custodian Organisation
– * Contributors
– * Acknowledgements
– # References
– * Publication Date
– Abstract
– * Author's Comments
– On-Line Links (Data, Graphics, Documentation)
– Location Keywords
– Bounding Coordinates

• Subject Categories and Search Words
– * MarLIN Subject Categories
– # Habitat Keywords
– # Taxonomy Keywords
– * CAAB Species Codes
– # Parameters Measured
– # Equipment Used
– # Blue Pages Themes
– ANZLIC Search Words

• Project, vessel and voyage details
– # Originating Project Name
– * Project Details
– # Platform/Vessel Name
– * Voyage Identifier
– * Voyage Details

• Data Currency and Status
– Date range (Beginning and End Dates)
– Progress
– Maintenance

• Data Access
– Stored Data Format(s)
– * Stored Data Volume
– * Stored Data Location
– * Specific Data Location
– * Specific Software Requirements
– * Stored Data Documentation
– Available Format Type(s)
– Access Constraints

• Data Quality
– Data Source, Processing, and Quality Control
– * GIS Datum and scale used (if relevant)
– Logical Consistency Report
– Positional Accuracy
– Parameter Accuracy
– Completeness

• Contact point
– Contact Person and Details

• Metadata Information
– * Related MarLIN Datasets
– Additional Metadata
– * Metadata Availability
– Metadata Created On/By... (date, person)
– * Metadata Last Updated On/By... (date, person)
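
One way to make the # and * markers machine-readable is to tag each element with its origin. The sketch below is illustrative only: it assumes each marker precedes the element it applies to and that unmarked elements belong to the ANZLIC standard, and the `Origin`, `Element`, `parse_marked`, and `names_by_origin` names are hypothetical. Only the "Subject Categories and Search Words" group is shown.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    ANZLIC = "ANZLIC standard"
    BLUE_PAGES = "Blue Pages extension (#)"
    MARLIN = "new element added for MarLIN (*)"


@dataclass(frozen=True)
class Element:
    group: str
    name: str
    origin: Origin


def parse_marked(group, marked_name):
    """Turn '* Contributors' or '# Data Type' into a tagged Element.

    Assumption: the marker is a prefix; no marker means ANZLIC.
    """
    text = marked_name.strip()
    if text.startswith("*"):
        return Element(group, text[1:].strip(), Origin.MARLIN)
    if text.startswith("#"):
        return Element(group, text[1:].strip(), Origin.BLUE_PAGES)
    return Element(group, text, Origin.ANZLIC)


SUBJECT_GROUP = "Subject Categories and Search Words"
SUBJECT_ELEMENTS = [
    parse_marked(SUBJECT_GROUP, s)
    for s in [
        "* MarLIN Subject Categories",
        "# Habitat Keywords",
        "# Taxonomy Keywords",
        "* CAAB Species Codes",
        "# Parameters Measured",
        "# Equipment Used",
        "# Blue Pages Themes",
        "ANZLIC Search Words",
    ]
]


def names_by_origin(elements, origin):
    """List element names that come from the given origin, in order."""
    return [e.name for e in elements if e.origin is origin]
```

A tagging of this kind would let an export step select only the elements that a given target understands, for example dropping MarLIN-only elements when generating “Blue Pages” entries.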

Aspects of MarLIN “Search” interface ...

Example search results

Lists of titles

Summary information

Links to voyage tracks

External MarLIN linkages (July 1999)

[Diagram labels, components:]

– MarLIN database (CMR’s records)
– MarLIN search facility
– “Blue Pages” HTML documents (many organisations’ records)
– Blue Pages search facility
– Internet search engines

[Diagram labels, linkages:]

– Hyperlinks to documents, data, etc.
– Selected details exported to ...
– Online link back to ...

MarLIN continuing development ...

• Incorporate “live” links to other databases e.g. CAAB, CMR corporate databases, library systems

• Increase data coverage, try to maintain currency and consistency of entries

• Continue to “sell the concept” for users to document their own data

• Make a “view” of MarLIN records visible to ASDD

• Possible future links with metadata systems based on other standards, using “crosswalks”

• MarLIN v.2 to be developed in c. 12 months … closely integrated with new Divisional data storage system (with parallel development of interfaces etc., automated retrieval of data as well as metadata)

MarLIN present ( ) and future ( ) operation

[Diagram labels:]

– Centrally-held data
– Derived products
– CMR-produced reference works & guides
– Scientific publications
– Project/voyage/person details
– Supporting information
– CAAB taxonomic database
– Externally sourced data
– Indexes and catalogues
– Dispersed data

CAAB - Codes for Australian Aquatic Biota

http://www.marine.csiro.au/caab/

Example CAAB codes

(hammerhead sharks)

(dogfishes)

CAAB rationale / historic reasons for existence

• Taxonomists needed a tool for organising specimen collections and supporting information

• Field biologists needed a tool for rapid data entry (to include categories corresponding to “non orthodox groups”)

• Data custodians needed a system for storing taxon-related information in a long-term, stable form (independent of future name changes)

• Use of “intelligent” codes permits rapid human- or computer-based sorting of taxa, and retrieval of supporting information

CAAB implementation

• CAAB has 47 “major categories” (e.g. fish, mammals, Algae - Phaeophyta, angiosperms), each with up to 999,999 available codes for allocation to Australian aquatic taxa

• Coverage of Australian fish species (c. 4,500) is essentially complete, as is coverage of some smaller groups (marine reptiles and mammals)

• Other categories - populated on “as needs” basis (e.g. 300 molluscs, 350 crustaceans, 60 angiosperms - plus ongoing additions)

• 2-digit prefix (category code) and 3-digit family code are machine-sortable, e.g.:

– 37 = fish

– 37 001 = fish family 1

– 37 001001 = fish family 1, species 1

– families are in contiguous blocks, e.g. families 37 005 to 37 024 are all types of sharks

• Numeric code is attached to taxon, independent of changes of scientific or common name (gives relative stability for data storage)

• Master CAAB database stores taxon/voucher specimen details, present and any previous scientific names, common names, comments and other information
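
The code structure described above lends itself to simple parsing and sorting. The sketch below is illustrative only: it assumes an 8-digit code made of a 2-digit category, a 3-digit family, and a 3-digit species part (as in the "37 001001" example), and the `CaabCode`, `parse_caab`, `format_caab`, and `is_shark` names are hypothetical rather than part of CAAB itself. The shark range (fish families 005 to 024) is taken from the slide.

```python
from typing import NamedTuple


class CaabCode(NamedTuple):
    category: int  # 2-digit major category, e.g. 37 = fish
    family: int    # 3-digit family code within the category
    species: int   # 3-digit species code within the family


def parse_caab(code: str) -> CaabCode:
    """Split a code such as '37 001001' into category, family, species."""
    digits = code.replace(" ", "")
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 8 digits, got {code!r}")
    return CaabCode(int(digits[:2]), int(digits[2:5]), int(digits[5:]))


def format_caab(code: CaabCode) -> str:
    """Render a code in the 'CC FFFSSS' layout used on the slide."""
    return f"{code.category:02d} {code.family:03d}{code.species:03d}"


def is_shark(code: CaabCode) -> bool:
    """Fish families 005 to 024 form a contiguous block of sharks."""
    return code.category == 37 and 5 <= code.family <= 24
```

Because the parts are fixed-width and ordered from general to specific, sorting by the parsed tuple (or by the raw zero-padded digit string) keeps members of a family, and blocks of related families, next to each other.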

Present usage of CAAB information

[Diagram labels, components:]

– CAAB taxonomic database
– CAAB-generated species lists
– CMR-produced reference works & guides
– CMR databases (including MarLIN)
– Other organisations’ databases

[Diagram labels, linkages:]

– used in ...
– generates ...
– quoted in ...

Intended future CAAB operation

[Diagram labels:]

– CAAB taxonomic database
– CAAB www interface
– CAAB taxon-level report
– CAAB species lists - on-line generation
– Links to on-line information
– Additional search facilities - e.g. MarLIN, other CMR databases, ITIS, www, etc.
– Users’ databases

CAAB continuing tasks ...

• Taxon-level information from other local databases to be incorporated into CAAB (coverage will gradually be extended to most groups of aquatic organisms)

• Database structure will be improved to suit external www user access to the database

• Species common names to be handled in a structured way, permitting user-definable output formats, more comprehensive searching, etc.

• Hyperlinks to electronic versions of maps, images, etc. will be incorporated as these become available

• On-line links to other databases from CAAB will be enabled (and vice versa)

Selected data and metadata developments elsewhere in Australia

On-line data, data products, and summaries

Collection-based information

On-line references

Other metadata systems