Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: August 22, 2012
Jane Greenberg, Professor and Director of the SILS Metadata Research Center
Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A
Why should we care?
BIG stuff
▪ Digital data deluge (Hey & Trefethen, 2003)
▪ Big data (New York Times)
▪ The fourth paradigm (Jim Gray, 2007)
Just as important
▪ The long tail (Heidorn, 2008)
▪ CODATA/Data-at-Risk Task Group
▪ Scholarly communications, data citation
Technological affordances for improving and advancing science
Cultural shift toward data sharing
▪ National and international policies
  – US NSF and NIH [1, 2]
  – OECD (Organisation for Economic Co-operation and Development) [3]
  – INSPIRE (Infrastructure for Spatial Information in the European Community), EU Commission [4]
  – UK Medical Research Council [5]
Dryad “enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies.” (http://datadryad.org/)
Data
▪ No single agreed-upon definition
▪ One person’s data is another person’s information
▪ Data often implies the “raw” stuff lacking context
  – Scholarly context, written assessment
▪ “Essence of science” (Greenberg, et al., 2009)
▪ What is science?
  – The Archaeology Data Service (ADS): archaeologydataservice.ac.uk
Data: I know it when I see it
By example: traditional observations, numbers, and measures stored in spreadsheets and databases; fossils; phylogenetic trees; and herbarium samples (White, 2008)
Other disciplines
▪ Bioinformatics: gene expression, DNA transcription to RNA translation
▪ Geology, agriculture, surveillance, and historical manuscript research: hyperspectral remote sensing
The Dryad Repository: files by type (email w/R. Scherle, July 2012)

| Quantity | Type |
|---:|---|
| 3162 | Plain Text |
| 476 | Microsoft Excel |
| 308 | Adobe Portable Document Format |
| 302 | Comma-separated values |
| 252 | Nexus |
| 153 | Microsoft Excel OpenXML |
| 108 | Microsoft Word |
| 80 | Zip file |
| 62 | JPEG image |
| 45 | Microsoft Word OpenXML |
| 40 | Extensible Markup Language |
| 35 | Hypertext Markup Language |
| 21 | Rich Text Format |
| 16 | FASTA sequence file |
| 15 | Tag Image File Format |
| 14 | Postscript Files |
| 2 | Video Quicktime |
| 2 | Mathematica Notebook |
| 1 | Microsoft Powerpoint |
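Tallying the Dryad counts shows how heavily the repository leans toward plain text. This is a small Python sketch. The counts are copied from the table, the shares are computed over file counts rather than bytes, and the function name `shares` is hypothetical.

```python
# File-type counts for the Dryad Repository, copied from the table (July 2012).
DRYAD_FILE_TYPES = {
    "Plain Text": 3162, "Microsoft Excel": 476,
    "Adobe Portable Document Format": 308, "Comma-separated values": 302,
    "Nexus": 252, "Microsoft Excel OpenXML": 153, "Microsoft Word": 108,
    "Zip file": 80, "JPEG image": 62, "Microsoft Word OpenXML": 45,
    "Extensible Markup Language": 40, "Hypertext Markup Language": 35,
    "Rich Text Format": 21, "FASTA sequence file": 16,
    "Tag Image File Format": 15, "Postscript Files": 14,
    "Video Quicktime": 2, "Mathematica Notebook": 2, "Microsoft Powerpoint": 1,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each type's share of all files, as a percentage of the file count."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(DRYAD_FILE_TYPES.values())
    print(f"{total} files across {len(DRYAD_FILE_TYPES)} types")
    top = sorted(shares(DRYAD_FILE_TYPES).items(), key=lambda kv: -kv[1])[:3]
    for name, pct in top:
        print(f"{name}: {pct:.1f}%")
```

By this tally, plain text makes up about 62% of the 5,094 files.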
Metadata defined
…data about data
…information about data
▪ “Metadata or ‘data about data’ describes the content, quality, condition, and other characteristics of data.” (FGDC Metadata WG, 1998)
▪ Structured information about an object (data) that facilitates functions associated with the object. (Greenberg, 2002, 2003, 2009)
Typical functions
▪ Discover
▪ Manage
▪ Control rights
▪ Identify versions
▪ Certify authenticity
▪ Indicate status
▪ Mark content structure
▪ Situate geospatially
▪ Describe processes
Metadata for Scientific Research Data
It gets messy really quickly
Metadata for Scientific Research Data
Descriptive: general to granular
▪ Value (addressing a topic, “aboutness”)
  – Topical (ontologies, subject heading lists/thesauri, taxonomies)
▪ Named entities
  – Name authority files (people, organizations, geographical jurisdictions, structures, and events)
▪ Geo-spatial (coordinates)
▪ Temporal data (ISO 8601/W3CDTF, or …)
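Much of the work with W3CDTF, the ISO 8601 profile, comes down to choosing a granularity: year, year-month, full date, or date plus time with a time zone designator. This minimal Python sketch uses only `datetime`. The function name `w3cdtf` is hypothetical, and so is the time of day in the example. Fractional seconds, which W3CDTF also permits, are left out.

```python
from datetime import datetime, timedelta, timezone

W3CDTF_GRANULARITIES = ("year", "month", "day", "minutes", "seconds")


def w3cdtf(dt: datetime, granularity: str = "seconds") -> str:
    """Format dt at one of the W3CDTF profile's granularities."""
    if granularity not in W3CDTF_GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity!r}")
    if granularity == "year":
        return f"{dt.year:04d}"
    if granularity == "month":
        return f"{dt.year:04d}-{dt.month:02d}"
    if granularity == "day":
        return dt.date().isoformat()
    # Below day granularity, W3CDTF requires a time zone designator.
    if dt.utcoffset() is None:
        raise ValueError("times need a time zone (Z or +hh:mm/-hh:mm)")
    text = dt.isoformat(timespec=granularity)
    # isoformat() writes UTC as "+00:00"; W3CDTF also allows the shorter "Z".
    return text[:-6] + "Z" if text.endswith("+00:00") else text


if __name__ == "__main__":
    # Hypothetical start time for the 22 August 2012 webinar (UTC-4).
    start = datetime(2012, 8, 22, 13, 0, tzinfo=timezone(timedelta(hours=-4)))
    for g in W3CDTF_GRANULARITIES:
        print(g, w3cdtf(start, g))
```

The coarser forms are useful when only a year or month is actually known. Padding a date out to a false day or time would misstate its precision.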
Given the messiness…
“I cannot tell you exactly what metadata standards, vocabularies, etc. to use…”
Examining metadata schemes
Metadata Objectives and principles, Domain, and Architectural Layout (MODAL) framework (Greenberg, 2005; Willis, et al., JASIST 2012)
▪ Objectives and principles
  • Objectives
  • Principles
▪ Domains
  • Discipline
  • Genre
  • Format
▪ Architectural layout
  • Structural design
  • Extent
  • Granularity
Simple schemes [6]
▪ Dublin Core Metadata Element Set (DCMES) ver. 1.1
  – Objectives and principles: interoperability; easy to generate, lower barrier to produce
  – Domains: multidisciplinary; any genre or format
  – Architectural layout: primarily flat; minimal, with means to extend; general (not granular)
▪ US MARC bibliographic format
  – Need training; primarily flat; extensible
▪ DataCite
  – Primarily flat
Dublin Core Application Profile-Dryad [7]
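Part of what makes a simple scheme easy to generate is its flat shape: a record is a list of optional, repeatable elements. This minimal Python sketch uses `xml.etree.ElementTree` and covers only the 15 DCMES 1.1 elements. An application profile such as Dryad's typically mixes in terms from other vocabularies as well. The title and DOI come from footnote [7]. The wrapper element `metadata` and the function name `dc_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DCMES_11 = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a flat record: every element is optional and repeatable."""
    # "metadata" is a hypothetical wrapper element, not part of DCMES.
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name not in DCMES_11:
            raise ValueError(f"not a DCMES 1.1 element: {name!r}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


if __name__ == "__main__":
    record = dc_record({
        "title": ["Data from: Divergence time estimation using fossils as "
                  "terminal taxa and the origins of Lissamphibia"],
        "identifier": ["doi:10.5061/dryad.8120"],
        "type": ["Dataset"],
    })
    print(ET.tostring(record, encoding="unicode"))
```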
DataCite example, ver. 2.2: National Institute for Environmental Studies and Center for Climate System Research, Japan [8]
US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library) [9]
Simple/moderate schemes
▪ Objectives and principles: interoperability balanced with specific needs; generation requires more expertise
▪ Domains: greater domain focus; genre diversity within a domain
▪ Architectural layout: primarily flat; extensibility via connecting to other schemes; slightly more granular
Examples:
▪ Darwin Core
▪ Access to Biological Collections Data (ABCD): not as flat
▪ Ecological Metadata Language
▪ DCMI Terms: graph approach
Wieczorek, et al. (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715.
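Because Darwin Core is primarily flat, a set of records maps directly onto a single table with one column per term. Darwin Core records are commonly exchanged this way as delimited text. This minimal Python sketch uses the `csv` module. The six term names are real Darwin Core terms, but the choice of columns, the function name `to_dwc_csv`, and the sample values are hypothetical.

```python
import csv
import io

# A few real Darwin Core term names, used here as flat column headers.
DWC_TERMS = ["occurrenceID", "basisOfRecord", "scientificName",
             "eventDate", "decimalLatitude", "decimalLongitude"]


def to_dwc_csv(records: list[dict[str, str]]) -> str:
    """Serialize occurrence records as one table, one column per term."""
    buf = io.StringIO()
    # extrasaction="raise" (the default) rejects keys that are not listed terms;
    # missing terms are written as empty cells.
    writer = csv.DictWriter(buf, fieldnames=DWC_TERMS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical occurrence; the identifier and values are illustrative only.
    print(to_dwc_csv([{
        "occurrenceID": "urn:example:occurrence:1",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Quercus robur",
        "eventDate": "2012-07-15",
        "decimalLatitude": "52.52",
        "decimalLongitude": "13.40",
    }]))
```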
```xml
<?xml version='1.0' encoding='UTF-8'?>
<DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
  <DataSet>
    <TechnicalContacts>
      <TechnicalContact>
        <Name>Gerd Müller</Name>
        <Email>[email protected]</Email>
      </TechnicalContact>
    </TechnicalContacts>
    <ContentContacts>
      <ContentContact>
        <Name>A Another</Name>
        <Email>[email protected]</Email>
      </ContentContact>
    </ContentContacts>
    <Metadata>
      <Description>
        <Representation language='en'>
          <Title>PonTaurus collection</Title>
        </Representation>
      </Description>
      <RevisionData>
        <DateModified>2001-03-01T00:00:00</DateModified>
      </RevisionData>
    </Metadata>
    <Units>
      <Unit>
        <SourceInstitutionID>BGBM</SourceInstitutionID>
        <SourceID>PonTaurus</SourceID>
        <UnitID>1136</UnitID>
      </Unit>
    </Units>
  </DataSet>
</DataSets>
```
Access to Biological Collections Data (ABCD) (a minimum record)
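Compared with a flat record, an ABCD record takes more work to read: each value sits at the end of a namespaced path (DataSet, then Units, then Unit). This minimal Python sketch uses `xml.etree.ElementTree` to flatten the minimal record into one row per unit. Institution, source, and unit IDs together identify a unit in ABCD. The sample drops the contacts and revision data, and the function name `list_units` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

# Trimmed copy of the minimal record (contacts and revision data omitted).
SAMPLE = """<DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
  <DataSet>
    <Metadata><Description><Representation language='en'>
      <Title>PonTaurus collection</Title>
    </Representation></Description></Metadata>
    <Units><Unit>
      <SourceInstitutionID>BGBM</SourceInstitutionID>
      <SourceID>PonTaurus</SourceID>
      <UnitID>1136</UnitID>
    </Unit></Units>
  </DataSet>
</DataSets>"""


def list_units(xml_text: str) -> list[dict[str, str]]:
    """Walk DataSet -> Units -> Unit and return one flat dict per unit."""
    root = ET.fromstring(xml_text)
    rows = []
    for dataset in root.findall("abcd:DataSet", NS):
        title = dataset.findtext(
            "abcd:Metadata/abcd:Description/abcd:Representation/abcd:Title",
            default="", namespaces=NS)
        for unit in dataset.findall("abcd:Units/abcd:Unit", NS):
            rows.append({
                "title": title.strip(),
                "institution": unit.findtext("abcd:SourceInstitutionID", "", NS),
                "source": unit.findtext("abcd:SourceID", "", NS),
                "unit": unit.findtext("abcd:UnitID", "", NS),
            })
    return rows


if __name__ == "__main__":
    for row in list_units(SAMPLE):
        print(row)
```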
Properties in the /terms/ namespace
abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid
Complex schemes
▪ Objectives and principles: interoperability level; generation requires greater expertise
▪ Domains: genre focus; format variation
▪ Architectural layout: hierarchical; extensive; granular
Examples:
▪ Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC (M = mandatory)
  1. Identification Information (M)
  2. Data Quality Information
  3. Spatial Data Organization Information
  4. Spatial Reference Information
  5. Entity and Attribute Information
  6. Distribution Information
  7. Metadata Reference Information (M)
▪ Data Documentation Initiative (DDI)
  1. Concept
  2. Collecting
  3. Processing / Archiving
  4. Distribution / Archiving
  5. Discovery
  6. Analysis
  7. Repurposing
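With a hierarchical scheme like CSDGM, a basic quality check is whether a record's top-level sections include the two marked (M). This minimal Python sketch represents a record as nested dicts, which is a hypothetical representation rather than the standard's XML encoding. The function name `check_sections` is also hypothetical. In CSDGM the unmarked sections are “mandatory if applicable”, and the sketch simplifies them to optional.

```python
# CSDGM top-level sections in order; True marks those flagged (M) on the slide.
CSDGM_SECTIONS = [
    ("Identification Information", True),
    ("Data Quality Information", False),
    ("Spatial Data Organization Information", False),
    ("Spatial Reference Information", False),
    ("Entity and Attribute Information", False),
    ("Distribution Information", False),
    ("Metadata Reference Information", True),
]
KNOWN = {name for name, _ in CSDGM_SECTIONS}


def check_sections(record: dict[str, object]) -> list[str]:
    """Report unknown top-level sections and missing mandatory ones."""
    problems = [f"unknown section: {key}" for key in record if key not in KNOWN]
    problems += [f"missing mandatory section: {name}"
                 for name, mandatory in CSDGM_SECTIONS
                 if mandatory and name not in record]
    return problems


if __name__ == "__main__":
    # Hypothetical record; the contents of each section are omitted.
    draft = {"Identification Information": {}, "Distribution Information": {}}
    print(check_sections(draft))
```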
Summary for descriptive schemes
▪ Simple: interoperable; easy to generate/low barrier; generally multidisciplinary; genre/format agnostic; primarily flat; general (not granular); 15-25 properties
▪ Simple/moderate: interoperability balanced with specific needs; generation requires more expertise; greater domain focus; extensible via connecting to other schemes; more granular; more properties
▪ Complex: interoperability level; generation requires expertise; genre focus/format variation; hierarchical, granular, and extensive (100+ properties)
Value schemes
▪ Value (addressing a topic, “aboutness”): topical (ontologies, subject heading lists/thesauri, taxonomies)
  Example: DDI Vocabularies
  • Analysis Unit
  • Character Set
  • Commonality Type Coded
  • Lifecycle Event Type
  • Response Unit
  • Software Package
  • Summary Statistic Type
  • Time Method
▪ Named entities (people, organizations, geographical jurisdictions, structures, and events)
  » LC Authorities
  » Virtual International Authority File (VIAF)
  » Open Researcher and Contributor ID (ORCID)
  » Gazetteers
  » Getty Thesaurus of Geographic Names
▪ Geo-spatial coordinates: ISO 19111
▪ Temporal data
  – Dates: ISO 8601/W3CDTF
  – Periods
▪ Code lists
  – MIME type
  – Language
  – Geographic
  – Etc.
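Identifiers from named-entity services are easier to trust when typos are caught at entry. An ORCID iD, for example, ends in a check character, which ORCID documents as ISO 7064 MOD 11-2. This minimal Python sketch validates a hyphenated iD. The two valid iDs are sample iDs from ORCID's own documentation, and the function names are hypothetical.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if orcid is hyphenated 4-4-4-4 and its last character checks out."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


if __name__ == "__main__":
    for candidate in ["0000-0002-1825-0097", "0000-0002-1694-233X", "0000-0002-1825-0098"]:
        print(candidate, is_valid_orcid(candidate))
```

A check character catches most single-digit typos, but it cannot confirm that the iD belongs to the person being described. That still needs a lookup against the authority.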
Challenges and opportunities
| Challenges | Opportunities |
|---|---|
| Workflow: when to generate the metadata? | Educate scientists early (Qin, 2009). Integrate into the social setting, as with the Center for Embedded Networked Sensing (CENS) (Borgman, Mayernik, et al., 2009-current; Mayernik’s dissertation, 2011) |
| Methods for generating metadata (labor intensive) | Use automatic techniques as much as possible and leverage human expertise (Dryad, DataONE Excel project) |
| Too many standards: which one do I use? | Don’t panic, join communities, look for examples. (If you can’t find them?) |
| Do I need to implement my metadata as linked data? | No. Explore and develop a best practice. Pursue a two-pronged approach (Greenberg, et al., 2009) |
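A common first step in automatic metadata generation is to harvest what the file itself can reveal, and then leave the judgment calls to a person. This minimal Python sketch uses only the standard library to derive a MIME type, a size, and a modification date. The output keys are named after DCMI Terms properties, which is an assumption for illustration. The title is just the file name, a placeholder for a human to refine. The function name `harvest_basic_metadata` is hypothetical.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def harvest_basic_metadata(path: str) -> dict[str, str]:
    """Derive a few descriptive fields from the file itself, with no human input."""
    stat = os.stat(path)
    mime, _encoding = mimetypes.guess_type(path)  # guessed from the extension only
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "format": mime or "application/octet-stream",
        "extent": f"{stat.st_size} bytes",
        "modified": modified.isoformat(timespec="seconds").replace("+00:00", "Z"),
        # Placeholder title from the file name; a person should refine it.
        "title": os.path.splitext(os.path.basename(path))[0],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "readings.txt")  # hypothetical data file
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("abc")
        print(harvest_basic_metadata(path))
```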
Jumping in…
1. DCMI/NISO seminars!
2. DCMI Science and Metadata Community (http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata)
3. Digital Curation Centre (DCC) (http://www.dcc.ac.uk/)
4. The Research Data Management Training (MANTRA) project (http://datalib.edina.ac.uk/mantra/)
5. DataONE workshops and tutorials (www.dataone.org/)
Concluding comments
▪ Standards are guidelines; there are no police
  – Aim for reasonable quality
▪ KISS: Keep it simple, stupid
  – What’s vital? What will aid reuse?
▪ Help to move the practice forward
  – Share what you learn
▪ Nothing new / it’s all new
  – Data documentation since ancient times
  – Silos: let’s break them down (Willis, et al., 2012)
  – Greater connectivity than ever
  – Cross-disciplinary approaches for problem solving
Footnotes
[1] NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp.
[2] NIH Data Sharing Policy: http://grants.nih.gov/grants/policy/data_sharing/.
[3] Organisation for Economic Co-operation and Development (OECD), Data and Metadata Reporting and Presentation Handbook: http://www.oecd.org/std/37671574.pdf.
[4] INSPIRE (Infrastructure for Spatial Information in the European Community): http://inspire.ec.europa.eu/index.cfm/pageid/48. The directive, released 15 May 2007, aims to create a European Union (EU) spatial data infrastructure. It will be implemented in stages, with full implementation required by 2019.
[5] UK Medical Research Council: http://www.mrc.ac.uk/Ourresearch/Ethicsresearchguidance/datasharing/index.html.
[6] The DCMI Glossary (scroll down for “schema” entry): http://dublincore.org/documents/usageguide/glossary.shtml#schema.
[7] Dublin Core Example: Data from: Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia (Dryad repository): http://datadryad.org/resource/doi:10.5061/dryad.8120?show=full.
[8] National Institute for Environmental Studies and Center for Climate System Research Japan—animation data (DataCite): http://schema.datacite.org/meta/kernel-2.2/example/datacite-metadata-sample-v2.2.xml.
[9] US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library): http://mlml.kohalibrary.com/cgi-bin/koha/opac-detail.pl?biblionumber=9282.