Metadata and Data Management activities at CSIRO Marine Research, Australia
Kim Finney & Tony Rees
Divisional Data Centre
CSIRO Marine Research, Hobart
http://www.marine.csiro.au/datacentre/
The Australian MarLIN Connection
• Great Minds Think Alike !!!
– Almost simultaneous emergence of UK and Australian MarLIN projects.
– Different emphases but many overlapping problems.
• Why Are We Here ?
– To exchange ideas, make some new data friends and, hopefully, leverage UK developments that can also address Oz marine data issues.
• Who Are We ?
– CSIRO Division of Marine Research (CMR), an Australian Commonwealth Government research agency. Approximately 300 staff. One of a number of such agencies (others include AIMS, GBRMPA)
Orientation information ...
RV Franklin: oceanographic research vessel
FRV Southern Surveyor: fisheries research vessel
CMR
16 million km² ocean territory
CMR - Data Centre
• Established in 1997
– 12 staff (multidisciplinary)
– services the Division and two ships
– focal point for promoting data management culture within CMR
• Data Management Strategy
– developed in 1997
– outlines actions that CMR must take to move its data management practices into the 21st Century
– Covers policy, technology issues, data handling procedures, standards development/adoption - available on Data Centre web site.
What Are Some Of The Issues We Face ?
• Corporate knowledge of datasets held (internal & external sourced).
• Purchase & sharing of externally sourced data.
• Access & re-use of data generated by individuals.
• Data archiving for re-use.
• Coordination of external data exchange/data provision.
• Data pricing policies.
• Divisional use of WWW & database technology.
• Conformance with national & international standards (data exchange, data processing, data documentation)
• Contribution to national data management issues & activities.
• Data management tools (availability, development for re-use, divisional software libraries)
• Integration of data, records, publications and financial systems
What Is Our Approach ?
[Diagram labels: Divisional Data Policies; E-Commerce Module; Data Licensing Module; Basic WWW Metadata Directory; Hyperlinked Data Files; Hyperlinked Publications; Hyperlinked Databases; Standards; RMI; HTTP]
Development Of CMR’s Research Database
[Diagram: client/server layout. Labels: Client (Java applet or browser); Network Protocol; Server; Servlet (database access program); JDBC; ORACLE database with Spatial Option. Database contents: Data Catalogue {long table indexing all features in the database}; Conceptual/Physical Deployment; Project Information; Model Sources; GIS Sources; Device Sources; Time Series Data Types; Profile Data Types; Photo Data; Catch Data; Model Data; Image Data; Meteorological Data; Sediment Data; Sample Data; Video Data. Some components are marked “Yet to be included”.]
Concluding Remarks
MarLIN - Marine Laboratories Information Network
and CAAB - Codes for Australian Aquatic Biota
Situation at CMR pre-MarLIN
[Diagram: numerous dispersed resources, with labels: Centrally-held data; Derived products; CMR-produced reference works & guides; Scientific publications; Project/voyage/person details; Supporting information; CAAB taxonomic database; Externally sourced data; Indexes and catalogues; Dispersed data]
MarLIN metadatabase as at July 1999: showing pointers/links to ( ) or information sourced from ( )
[Diagram: the MarLIN metadatabase connected to: Centrally-held data; Derived products; CMR-produced reference works & guides; Scientific publications; Project/voyage/person details; Supporting information; CAAB taxonomic database; Externally sourced data; Indexes and catalogues; Dispersed data. The two legend symbols did not survive in this transcript.]
MarLIN design questions ...
• How to make data querying, entry and maintenance easily user-accessible (but maintain metadata standards)?
– use www interfaces, but moderate user entries and updates
• What information to store, in what manner?
– use ANZLIC and “Blue Pages” elements, plus additional ones as deemed useful for Divisional needs
• What metadata standards, thesauri, etc. to follow?
– mostly follow ANZLIC & “Blue Pages”, with some extensions & replacements
• How to handle taxon-level information?
– store taxonomic codes in MarLIN, referenced to scientific and common names from Division’s “CAAB” taxonomic database
• What about subject-based searching?
– use “MarLIN subject categories”, developed from ASFA (R) scheme
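The taxon-level design choice above (codes stored in MarLIN, names held in CAAB) can be illustrated with a minimal Python sketch. Everything here is a stand-in: the dictionaries replace the real Oracle tables, the field names are hypothetical, and the taxon names are made-up placeholders, not real CAAB entries. Only the code "37 001001" follows the pattern shown elsewhere in these slides.

```python
# Hypothetical stand-in for the CAAB taxonomic database:
# CAAB code -> current names. Names are placeholders, not real taxa.
caab_names = {
    "37 001001": {"scientific": "Exemplum primum", "common": "example fish one"},
    "37 001002": {"scientific": "Exemplum secundum", "common": "example fish two"},
}

# Hypothetical MarLIN metadata record: it stores only the codes.
record = {
    "title": "Example trawl survey dataset",
    "caab_codes": ["37 001001", "37 001002"],
}


def resolve_taxa(record, names):
    """Return (code, scientific name, common name) for each code in a record.

    Codes with no entry in the names table are returned with None names,
    so a missing taxon is visible rather than silently dropped.
    """
    resolved = []
    for code in record["caab_codes"]:
        entry = names.get(code)
        if entry is None:
            resolved.append((code, None, None))
        else:
            resolved.append((code, entry["scientific"], entry["common"]))
    return resolved


# A later name change is made once, in the names table only;
# the metadata record itself is untouched.
caab_names["37 001001"]["scientific"] = "Exemplum renamed"
print(resolve_taxa(record, caab_names))
```

The point of the sketch is that a name revision touches one row in the taxonomic table and never the metadata records that cite the code.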
MarLIN metadatabase implementation
• Oracle database, with www front end and HTML forms/Java interfaces
– www used for searching and metadata submission/ metadata update, also for most administrative functions
• Relational design
– common aspects to numerous records (e.g. project, voyage, person information) stored in separate tables
• Data entry and update is via user logon (restricted to users on CMR computer domain)
– enterer details, time, etc. are automatically logged and added to record on submission
• “Submitted” records reside in separate (parallel) tables until approved by database administrator
• Nightly script runs to generate CMR’s “Blue Pages” entries from MarLIN metadata records
MarLIN metadata elements
# = “Blue Pages” extension to ANZLIC standard, * = new element added for MarLIN
• Dataset
– Title; * Identifier/Short Title; # Data Type; Custodian Organisation; * Contributors; * Acknowledgements; # References; * Publication Date; Abstract; * Author's Comments; On-Line Links (Data, Graphics, Documentation); Location Keywords; Bounding Coordinates
• Subject Categories and Search Words
– * MarLIN Subject Categories; # Habitat Keywords; # Taxonomy Keywords; * CAAB Species Codes; # Parameters Measured; # Equipment Used; # Blue Pages Themes; ANZLIC Search Words
• Project, vessel and voyage details
– # Originating Project Name; * Project Details; # Platform/Vessel Name; * Voyage Identifier; * Voyage Details
• Data Currency and Status
– Date range (Beginning and End Dates); Progress; Maintenance
• Data Access
– Stored Data Format(s); * Stored Data Volume; * Stored Data Location; * Specific Data Location; * Specific Software Requirements; * Stored Data Documentation; Available Format Type(s); Access Constraints
• Data Quality
– Data Source, Processing, and Quality Control; * GIS Datum and scale used (if relevant); Logical Consistency Report; Positional Accuracy; Parameter Accuracy; Completeness
• Contact point
– Contact Person and Details
• Metadata Information
– * Related MarLIN Datasets; Additional Metadata; * Metadata Availability; Metadata Created On/By... (date, person); * Metadata Last Updated On/By... (date, person)
Aspects of MarLIN “Search” interface ...
Example search results
Lists of titles
Summary information
Links to voyage tracks
External MarLIN linkages (July 1999)
Hyperlinks to documents, data, etc.
Selected details exported to ...
Online link back to ...
Internet search engines
“Blue Pages” HTML documents (many organisations’ records)
MarLIN database (CMR’s records)
Blue Pages search facility
MarLIN search facility
MarLIN continuing development ...
• Incorporate “live” links to other databases e.g. CAAB, CMR corporate databases, library systems
• Increase data coverage, try to maintain currency and consistency of entries
• Continue to “sell the concept” for users to document their own data
• Make a “view” of MarLIN records visible to ASDD
• Possible future links with metadata systems based on other standards, using “crosswalks”
• MarLIN v.2 to be developed in c. 12 months … closely integrated with new Divisional data storage system (with parallel development of interfaces etc., automated retrieval of data as well as metadata)
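A “crosswalk” of the kind mentioned above is essentially a mapping table between the element names of two metadata standards. The sketch below is illustrative only: the slides do not say which other standards were in view, so Dublin Core is used here purely as an example target, and the particular element pairings are assumptions rather than an agreed mapping.

```python
# Assumed, illustrative pairing of MarLIN element names with
# Dublin Core element names. Not an official crosswalk.
MARLIN_TO_DC = {
    "Title": "title",
    "Abstract": "description",
    "Custodian Organisation": "publisher",
    "Publication Date": "date",
    "Location Keywords": "coverage",
    "Access Constraints": "rights",
}


def crosswalk(record, mapping):
    """Convert a record to the target element names.

    Returns (converted, unmapped): converted maps each target element
    to a list of values; unmapped lists source elements with no
    equivalent, so information loss is reported rather than hidden.
    """
    converted = {}
    unmapped = []
    for element, value in record.items():
        target = mapping.get(element)
        if target is None:
            unmapped.append(element)
        else:
            converted.setdefault(target, []).append(value)
    return converted, unmapped


example = {
    "Title": "Example voyage CTD data",
    "Abstract": "Placeholder abstract",
    "CAAB Species Codes": "37 001001",
}
print(crosswalk(example, MARLIN_TO_DC))
```

Elements added specifically for MarLIN will often have no counterpart in another standard, which is why the sketch reports unmapped elements explicitly.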
MarLIN present ( ) and future ( ) operation
[Diagram: MarLIN linked to: Centrally-held data; Derived products; CMR-produced reference works & guides; Scientific publications; Project/voyage/person details; Supporting information; CAAB taxonomic database; Externally sourced data; Indexes and catalogues; Dispersed data. The symbols distinguishing present from future links did not survive in this transcript.]
CAAB - Codes for Australian Aquatic Biota
http://www.marine.csiro.au/caab/
Example CAAB codes
(hammerhead sharks)
(dogfishes)
CAAB rationale/ historic reasons for existence
• Taxonomists needed a tool for organising specimen collections and supporting information
• Field biologists needed a tool for rapid data entry (to include categories corresponding to “non orthodox groups”)
• Data custodians needed a system for storing taxon-related information in a long-term, stable form (independent of future name changes)
• Use of “intelligent” codes permits rapid human- or computer-based sorting of taxa, and retrieval of supporting information
CAAB implementation
• CAAB has 47 “major categories” (e.g. fish, mammals, Algae - Phaeophyta, angiosperms), each with up to 999,999 available codes for allocation to Australian aquatic taxa
• Coverage of Australian fish species (c. 4,500) is essentially complete, as is coverage of some smaller groups (marine reptiles and mammals)
• Other categories - populated on “as needs” basis (e.g. 300 molluscs, 350 crustaceans, 60 angiosperms - plus ongoing additions)
• 2-digit prefix (category code) and 3-digit family code are machine-sortable, e.g.:
– 37 = fish
– 37 001 = fish family 1
– 37 001001 = fish family 1, species 1
– families are in contiguous blocks, e.g. families 37 005 to 37 024 are all types of sharks
• Numeric code is attached to taxon, independent of changes of scientific or common name (gives relative stability for data storage)
• Master CAAB database stores taxon/voucher specimen details, present and any previous scientific names, common names, comments and other information
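The “intelligent” code structure above lends itself to simple parsing and sorting. The following Python sketch assumes an 8-digit layout (2-digit category, 3-digit family, 3-digit species), inferred from the example “37 001001”. The function and class names are hypothetical, and the shark family range is the one quoted in the bullets above.

```python
from typing import NamedTuple


class CaabCode(NamedTuple):
    category: int  # 2-digit prefix, e.g. 37 = fish
    family: int    # 3-digit family code within the category
    species: int   # 3-digit species code within the family (assumed width)


def parse_caab(code: str) -> CaabCode:
    """Split a code such as '37 001001' into category, family and species."""
    digits = code.replace(" ", "")
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an 8-digit CAAB-style code: {code!r}")
    return CaabCode(int(digits[:2]), int(digits[2:5]), int(digits[5:]))


def is_shark(code: str) -> bool:
    """True if the code falls in fish families 37 005 to 37 024."""
    parsed = parse_caab(code)
    return parsed.category == 37 and 5 <= parsed.family <= 24


# Sorting by the parsed tuple groups taxa by category, then family,
# then species, regardless of how the codes were spaced on entry.
codes = ["37 024001", "37 001001", "37005002"]
print(sorted(codes, key=parse_caab))
print([c for c in codes if is_shark(c)])
```

Because families sit in contiguous blocks, a group such as “all sharks” reduces to a numeric range test rather than a lookup by name.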
Present usage of CAAB information
[Diagram labels: CAAB taxonomic database; CAAB-generated species lists; CMR-produced reference works & guides; CMR databases (including MarLIN); Other organisations’ databases. Link labels: “used in ...”; “generates ...”; “Quoted in ...”]
Intended future CAAB operation
[Diagram labels: CAAB taxonomic database; CAAB www interface; CAAB taxon-level report; CAAB species lists - on-line generation; Links to on-line information; Additional search facilities - e.g. MarLIN, other CMR databases, ITIS, www, etc.; Users’ databases]
CAAB continuing tasks ...
• Taxon-level information from other local databases to be incorporated into CAAB (coverage will gradually be extended to most groups of aquatic organisms)
• Database structure will be improved to suit external www user access to the database
• Species common names to be handled in a structured way, permitting user-definable output formats, more comprehensive searching, etc.
• Hyperlinks will be incorporated, to electronic versions of available maps, images, etc. as available
• On-line links to other databases from CAAB will be enabled (and vice versa)
Selected data and metadata developments elsewhere in Australia
On-line data, data products, and summaries
Collection-based information
On-line references
Other metadata systems