2016-12-14 GBIF data publishing. GBIF seminar in Bergen
Data publishing and the Darwin Core data standard
GBIF provides a data publishing infrastructure
GBIF provides a service for data discovery (global registry, data portal) that depends on resolvable, stable identifiers to function efficiently.
MULTIPLE-PURPOSE DATA SERVICES
(Diagram labels: Research institute; GBIF portal; Global information systems; Biodiversity conservation; Biodiversity analysis; Scientific research)
Darwin Core data exchange standard
WHAT IS BIODIVERSITY DATA?
A digital text or multimedia record detailing facts about an occurrence of an organism, i.e. the what, where, when, how and by whom of the occurrence and of its recording.
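To make the definition concrete, the sketch below groups a few Darwin Core terms by the question each one answers and reports which questions a record leaves open. The choice of terms per question and all record values are illustrative assumptions, not an official mapping.

```python
# Illustrative grouping of Darwin Core terms by the question they answer.
# The grouping is an assumption made for this example, not part of the standard.
QUESTIONS = {
    "what": ["scientificName"],
    "where": ["decimalLatitude", "decimalLongitude", "countryCode"],
    "when": ["eventDate"],
    "how": ["basisOfRecord"],
    "by whom": ["recordedBy"],
}

# Hypothetical occurrence record; all values are made up.
record = {
    "scientificName": "Parus major",
    "decimalLatitude": 60.39,
    "decimalLongitude": 5.32,
    "countryCode": "NO",
    "eventDate": "2016-12-14",
    "basisOfRecord": "HumanObservation",
    "recordedBy": "Jane Doe",
}


def unanswered(rec):
    """Return the questions for which the record has no non-empty term."""
    return [
        question
        for question, terms in QUESTIONS.items()
        if not any(rec.get(term) not in (None, "") for term in terms)
    ]


print(unanswered(record))                             # []
print(unanswered({"scientificName": "Parus major"}))  # ['where', 'when', 'how', 'by whom']
```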
BIODIVERSITY DATA TYPES
http://www.gbif.org/publishing-data/summary#datatypes
Checklists (of taxon names)
Occurrences
Metadata (dataset description)
BIODIVERSITY DATA TYPES – SAMPLE DATA
http://www.gbif.org/newsroom/news/sample-based-data
Samples
Introduction of the Event core in March–October 2015
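Each data type is published around a different core file in a Darwin Core Archive (the star schema slides show a Taxon core, an Occurrence core and an Event core). A minimal sketch of that correspondence, assuming the core rowType is the Darwin Core class IRI; metadata-only datasets carry just an EML document and have no core:

```python
DWC = "http://rs.tdwg.org/dwc/terms/"

# Data type -> rowType of the core file in a Darwin Core Archive.
CORE_ROW_TYPE = {
    "checklist": DWC + "Taxon",
    "occurrence": DWC + "Occurrence",
    "sample": DWC + "Event",  # Event core (2015)
}


def core_row_type(data_type):
    """Return the core rowType IRI for a data type.

    Metadata-only datasets have no core data file, only an EML document,
    so they raise ValueError here.
    """
    try:
        return CORE_ROW_TYPE[data_type.lower()]
    except KeyError:
        raise ValueError(f"no core rowType for {data_type!r}") from None


print(core_row_type("Sample"))  # http://rs.tdwg.org/dwc/terms/Event
```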
DATA STANDARDS
Slide source: GB23 Nodes Madagascar October 2015 & iDigBio Florida January 2015 - http://www.tdwg.org/standards/
ABCD: Access to Biological Collection Data (2005)
DwC: Darwin Core (2009)
AC: Audubon Core Multimedia Resources Metadata Schema (2013)
NCD: Natural Collection Descriptions (Draft 2008)
EML: Ecological Metadata Language (Ecological Society of America)
Darwin Core – a vocabulary of terms
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, Vieglais D (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
http://rs.tdwg.org/dwc/terms/
Record-level Terms: dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties
Occurrence: occurrenceID | catalogNumber | recordNumber | recordedBy | individualCount | organismQuantity | organismQuantityType | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | associatedMedia | associatedReferences | associatedSequences | associatedTaxa | otherCatalogNumbers | occurrenceRemarks
Organism: organismID | organismName | organismScope | associatedOccurrences | associatedOrganisms | previousIdentifications | organismRemarks
MaterialSample (LivingSpecimen | PreservedSpecimen | FossilSpecimen): materialSampleID
Event (HumanObservation | MachineObservation): eventID | parentEventID | fieldNumber | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | samplingProtocol | sampleSizeValue | sampleSizeUnit | samplingEffort | fieldNotes | eventRemarks
Location: locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks
GeologicalContext: geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed
Identification: identificationID | identifiedBy | typeStatus | identificationQualifier | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks
Taxon: taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks
ResourceRelationship (Auxiliary Terms): resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks
MeasurementOrFact (Auxiliary Terms): measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks
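In files and descriptors, terms are identified by full IRIs. A minimal sketch, assuming plain term names from the list above live under the Darwin Core namespace given on this slide and `dcterms:` names under the Dublin Core terms namespace; the `KNOWN_TERMS` set is a hypothetical subset used only to flag unrecognized column headers (term names are case-sensitive camelCase):

```python
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
DCTERMS_NS = "http://purl.org/dc/terms/"

# A small subset of the terms listed above, for illustration only.
KNOWN_TERMS = {
    "dcterms:modified", "occurrenceID", "basisOfRecord", "scientificName",
    "eventDate", "decimalLatitude", "decimalLongitude", "recordedBy",
}


def term_iri(name):
    """Expand a term name as written in the list above into its full IRI."""
    if name.startswith("dcterms:"):
        return DCTERMS_NS + name.split(":", 1)[1]
    return DWC_NS + name


def unknown_headers(headers):
    """Return column headers that are not in KNOWN_TERMS (case-sensitive)."""
    return [h for h in headers if h not in KNOWN_TERMS]


print(term_iri("dcterms:modified"))                # http://purl.org/dc/terms/modified
print(term_iri("eventDate"))                       # http://rs.tdwg.org/dwc/terms/eventDate
print(unknown_headers(["eventDate", "latitude"]))  # ['latitude']
```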
DARWIN CORE ARCHIVE (DWC-A)
DwC-A publishes DwC records, including terms from DwC-A extensions.
Simple text-based format.
Zipped single-file archive.
Example data file: occurrence.txt
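A sketch of what "simple text-based format" means in practice: a data file such as occurrence.txt is a delimited table whose header row holds Darwin Core term names. The records and the choice of tab as delimiter are assumptions for this example; values containing tabs or quotes would be quoted by the csv module and would need matching settings in the archive descriptor.

```python
import csv
import io

# Hypothetical records; keys are Darwin Core term names.
rows = [
    {"occurrenceID": "urn:example:occ:1", "basisOfRecord": "HumanObservation",
     "scientificName": "Parus major"},
    {"occurrenceID": "urn:example:occ:2", "basisOfRecord": "PreservedSpecimen",
     "scientificName": "Picea abies"},
]
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName"]


def to_occurrence_txt(records, fields):
    """Serialize records as tab-delimited text with a header row of term names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # terms missing from a record become empty cells
    return buf.getvalue()


print(to_occurrence_txt(rows, FIELDS))
```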
DARWIN CORE ARCHIVE
A Darwin Core Archive (DwC-A) is a text-based representation of data formatted according to Darwin Core. A DwC-A is a compressed file containing at least three files: a core data file, the meta.xml descriptor and an EML metadata file.
http://rs.tdwg.org/dwc/terms/guides/text/index.htm
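The three-file layout can be sketched with Python's standard `zipfile` module. The meta.xml below follows the element names of the Darwin Core text guide linked above (`archive`, `core`, `files`, `location`, `id`, `field`); the dataset content is invented, and the EML file is only a skeleton that a real publisher (for example the IPT) would fill with creators, contacts, coverage and licensing.

```python
import io
import zipfile

# Descriptor mapping the columns of occurrence.txt to Darwin Core terms.
# "\\t" and "\\n" put the literal escape sequences \t and \n into the XML.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""

# Skeleton only: a real EML document needs more elements (creator, contact, ...).
EML_XML = """<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="example-dataset" system="http://example.org">
  <dataset><title>Example occurrence dataset</title></dataset>
</eml:eml>
"""

OCCURRENCE_TXT = (
    "occurrenceID\tbasisOfRecord\tscientificName\n"
    "urn:example:occ:1\tHumanObservation\tParus major\n"
)


def build_dwca():
    """Return the bytes of a zip holding the data file, descriptor and metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", OCCURRENCE_TXT)
        zf.writestr("meta.xml", META_XML)
        zf.writestr("eml.xml", EML_XML)
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(build_dwca())) as zf:
    print(sorted(zf.namelist()))  # ['eml.xml', 'meta.xml', 'occurrence.txt']
```

The descriptor's `metadata` attribute names the EML file, which is why the file name in the zip must match it.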
STAR SCHEMA EXAMPLE - OCCURRENCE
(Diagram: DwC Archive Occurrence = Occurrence core + extensions (Media, Geographical, Determination, Germplasm) + meta.xml + EML.xml)
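In a star schema, every extension row points to exactly one core record, and one core record may have many extension rows (for example several images). The sketch below groups hypothetical Media extension rows under their Occurrence core records; the column name `coreid` is an assumption for illustration (in meta.xml, an extension declares which column holds the core identifier).

```python
from collections import defaultdict

# Hypothetical rows. The extension column "coreid" holds the occurrenceID of
# the core record it belongs to (the column name is an assumption here).
core = [
    {"occurrenceID": "occ-1", "scientificName": "Parus major"},
    {"occurrenceID": "occ-2", "scientificName": "Picea abies"},
]
media = [
    {"coreid": "occ-1", "identifier": "https://example.org/img/1.jpg"},
    {"coreid": "occ-1", "identifier": "https://example.org/img/2.jpg"},
]


def attach(core_rows, ext_rows, id_term="occurrenceID"):
    """Group extension rows under their core record; reject orphan rows."""
    core_ids = {row[id_term] for row in core_rows}
    by_core = defaultdict(list)
    for row in ext_rows:
        if row["coreid"] not in core_ids:
            raise ValueError(f"extension row points at unknown core id {row['coreid']!r}")
        by_core[row["coreid"]].append(row)
    # Keep core order; core records without extension rows get an empty list.
    return {row[id_term]: by_core.get(row[id_term], []) for row in core_rows}


linked = attach(core, media)
print({k: len(v) for k, v in linked.items()})  # {'occ-1': 2, 'occ-2': 0}
```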
STAR SCHEMA EXAMPLE - CHECKLIST
(Diagram: DwC Archive Checklist = Taxon core + extensions (Literature, Description, Occurrences, Vernacular, Distribution, Types) + meta.xml + EML.xml)
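For a checklist, the Taxon core can carry the hierarchy itself: `parentNameUsageID` points each taxon at its parent, while extensions such as Vernacular add one-to-many details. A minimal sketch with hypothetical rows; the `coreid` column name for the extension is an assumption, as is the small three-taxon checklist.

```python
# Hypothetical Taxon core rows keyed by taxonID; parentNameUsageID links each
# taxon to its parent, so the classification can be rebuilt from the core alone.
taxa = {
    "t1": {"scientificName": "Animalia", "taxonRank": "kingdom", "parentNameUsageID": None},
    "t2": {"scientificName": "Chordata", "taxonRank": "phylum", "parentNameUsageID": "t1"},
    "t3": {"scientificName": "Aves", "taxonRank": "class", "parentNameUsageID": "t2"},
}
# Hypothetical Vernacular extension rows pointing at the core via "coreid".
vernacular = [
    {"coreid": "t3", "vernacularName": "birds", "language": "en"},
    {"coreid": "t3", "vernacularName": "fugler", "language": "no"},
]


def classification(taxon_id):
    """Walk parentNameUsageID links up to the root; return names root-first."""
    path, seen = [], set()
    while taxon_id is not None:
        if taxon_id in seen:
            raise ValueError(f"cycle in parent links at {taxon_id!r}")
        seen.add(taxon_id)
        row = taxa[taxon_id]
        path.append(row["scientificName"])
        taxon_id = row["parentNameUsageID"]
    return path[::-1]


def vernacular_names(taxon_id, language):
    """Return the vernacular names of a taxon in one language."""
    return [v["vernacularName"] for v in vernacular
            if v["coreid"] == taxon_id and v["language"] == language]


print(classification("t3"))          # ['Animalia', 'Chordata', 'Aves']
print(vernacular_names("t3", "en"))  # ['birds']
```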
STAR SCHEMA EXAMPLE - EVENT
(Diagram: DwC Archive Samples (relevé) = Event core + extensions (Occurrences, MeasurementOrFact) + meta.xml + EML.xml)
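Because each occurrence hangs off a documented sampling event, an Event-core dataset can also state that a species was looked for and not found (`occurrenceStatus` = absent), which a plain list of occurrences cannot express. A sketch with hypothetical relevé plots and species:

```python
# Hypothetical relevé plots (Event core) and the occurrences recorded in them
# (Occurrence extension, linked by eventID). Values are illustrative only.
events = [
    {"eventID": "plot-1", "eventDate": "2015-07-01", "samplingProtocol": "relevé"},
    {"eventID": "plot-2", "eventDate": "2015-07-02", "samplingProtocol": "relevé"},
]
occurrences = [
    {"eventID": "plot-1", "scientificName": "Calluna vulgaris", "occurrenceStatus": "present"},
    {"eventID": "plot-1", "scientificName": "Betula nana", "occurrenceStatus": "absent"},
    {"eventID": "plot-2", "scientificName": "Betula nana", "occurrenceStatus": "present"},
]


def species_by_event(events, occurrences):
    """Return present and absent species per sampling event."""
    result = {e["eventID"]: {"present": [], "absent": []} for e in events}
    for occ in occurrences:
        result[occ["eventID"]][occ["occurrenceStatus"]].append(occ["scientificName"])
    return result


print(species_by_event(events, occurrences)["plot-1"])
# {'present': ['Calluna vulgaris'], 'absent': ['Betula nana']}
```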
DATA NORMALIZATION
What is data normalization?
Reasons to normalize a database
Normal forms
http://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/, http://databases.about.com/od/specificproducts/a/normalization.htm, http://www.dotnet-tricks.com/Tutorial/sqlserver/756N210512-Database-Normalization-Basics.html
"Data normalization is the process of reducing data to its canonical form. For instance, database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency" (Wikipedia). "Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data" (Wikipedia).
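One way to read these definitions in the context of Darwin Core: a flat occurrence table repeats event details (date, locality) on every row, whereas a normalized layout stores them once and references them by identifier. The sketch below, with hypothetical data, denormalizes such a layout by joining each occurrence with its event.

```python
# Normalized form (hypothetical data): event facts are stored once and
# occurrences reference them by eventID.
events = {"ev-1": {"eventDate": "2015-07-01", "locality": "Bergen"}}
occurrences = [
    {"occurrenceID": "o1", "eventID": "ev-1", "scientificName": "Calluna vulgaris"},
    {"occurrenceID": "o2", "eventID": "ev-1", "scientificName": "Betula nana"},
]


def denormalize(occurrences, events):
    """Join each occurrence with its event into one flat row.

    The event fields are copied into every row: redundant, but a single
    table can be read without any joins.
    """
    return [{**occ, **events[occ["eventID"]]} for occ in occurrences]


for row in denormalize(occurrences, events):
    print(row)
```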
Publish your biodiversity data with GBIF
PUBLISH DATA IN GBIF
Step 1: research institutes holding data seek endorsement as approved data publishers.
Step 2: datasets are identified and converted to the standard Darwin Core format.
Step 3: datasets can be published directly from the data node and/or with assistance from a national GBIF node.
Citizen science data platforms also publish in GBIF.
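Step 2 (conversion to Darwin Core) is often mostly a column mapping. A minimal sketch, assuming a hypothetical source table with Norwegian column names (Art = species, Dato = date, Breddegrad/Lengdegrad = latitude/longitude, Innsamler = collector) and a hypothetical default `basisOfRecord`; unmapped columns are returned so they can be reviewed rather than silently dropped.

```python
# Hypothetical mapping from an institution's own column names to Darwin Core terms.
COLUMN_MAP = {
    "Art": "scientificName",
    "Dato": "eventDate",
    "Breddegrad": "decimalLatitude",
    "Lengdegrad": "decimalLongitude",
    "Innsamler": "recordedBy",
}


def to_darwin_core(row, column_map=COLUMN_MAP, basis="PreservedSpecimen"):
    """Rename mapped columns to Darwin Core terms.

    Returns (dwc_record, unmapped_column_names). The default basisOfRecord
    is an assumption and should match the actual nature of the records.
    """
    record = {column_map[k]: v for k, v in row.items() if k in column_map}
    record.setdefault("basisOfRecord", basis)
    unmapped = sorted(k for k in row if k not in column_map)
    return record, unmapped


dwc, leftover = to_darwin_core({"Art": "Parus major", "Dato": "2016-12-14", "Merknad": "ringed"})
print(dwc)       # {'scientificName': 'Parus major', 'eventDate': '2016-12-14', 'basisOfRecord': 'PreservedSpecimen'}
print(leftover)  # ['Merknad']
```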
Data publishing guidelines
http://www.gbif.org/resources?f[0]=gr_purpose%3A955
WHAT IS DATA PUBLISHING?
“Publishing” refers to making biodiversity datasets publicly accessible and discoverable, in a standardized form, via an access point, typically a web address (a URL).
IPT
The GBIF Integrated Publishing Toolkit (IPT) is a free, open-source software tool written in Java that is used to publish and share biodiversity datasets through the GBIF network.
http://www.gbif.org/ipt
IPT User Manual:
https://github.com/gbif/ipt/wiki/IPT2ManualNotes.wiki
Robertson T, Döring M, Guralnick R, Bloom D, Wieczorek J, Braak K, Otegui J, Russell L, Desmet P (2014). The GBIF Integrated Publishing Toolkit: Facilitating the efficient publishing of biodiversity data on the internet. PLoS ONE 9(8). doi:10.1371/journal.pone.0102623
DATA PUBLISHING METHODS
DATA PUBLISHING LANDSCAPE
(Timeline, 2007–2015)
DiGIR (2001), BioCASE (2001) and TapirLink (2007) in use for publishing biodiversity data.
Idea for a simple, compressed, text-based file for publishing introduced at TDWG.
GBIF introduces IPT 1.0.
GBIF redevelops IPT with lower memory requirements.
GBIF introduces IPT 2.0.
IPT has more than 100 installations serving more than 800 datasets.
Nodes and aggregators (including GBIF Norway) begin to install and use IPTs.
Demo/test Event core developed by GBIF and EU BON.
Event core is released for use (October 2015).
Dataset DOIs with DataCite (March 2015). IPT becomes the dominant data-publishing solution in GBIF.
Research grade: an observation must have media, coordinates and a date, and must pass the quality metrics; in addition, the community ID must now be finer than family.
Needs ID: any observation that could become "Research" grade but needs more identifications.
Casual: any observation that cannot become "Research" grade.
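The three grades above can be read as a small decision procedure. The sketch below is a simplified interpretation, not the platform's actual implementation: the boolean fields, the short rank list, and treating any missing requirement as "casual" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Ranks ordered from coarse to fine; a simplified list assumed for this sketch.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species", "subspecies"]


@dataclass
class Observation:
    has_media: bool
    has_coordinates: bool
    has_date: bool
    passes_quality_metrics: bool
    community_id_rank: Optional[str] = None  # rank of the community ID, if any


def quality_grade(obs):
    """Classify an observation using the rules listed above (simplified)."""
    complete = (obs.has_media and obs.has_coordinates and obs.has_date
                and obs.passes_quality_metrics)
    if not complete:
        return "casual"  # assumed here to mean it cannot reach research grade
    finer_than_family = (obs.community_id_rank is not None
                         and RANKS.index(obs.community_id_rank) > RANKS.index("family"))
    return "research" if finer_than_family else "needs_id"


print(quality_grade(Observation(True, True, True, True, "species")))   # research
print(quality_grade(Observation(True, True, True, True, "family")))    # needs_id
print(quality_grade(Observation(False, True, True, True, "species")))  # casual
```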
More than 17 million Norwegian occurrence records published in GBIF
DATA PUBLISHING LANDSCAPE: STATUS 2016
The continued GBIF commitment to improving access to biodiversity data.
Refinement and expansion of standards and publishing software.
Evolving social norms.
Most data are still published with the simple occurrence core.
Portals lack the features needed to support richer data.
Many institutions still need convincing to publish biodiversity data.
http://www.gbif.org/page/82104
Node team at NHM, University of Oslo: Dag Endresen, Node manager; Christian Svindseth, Database manager; Fridtjof Mehlum, Research director; Einar Timdal, Associate professor; Geir Søli, Associate professor; Vidar Bakken, Consultant
Artsdatabanken, Trondheim: Wouter Koch, Nils Valland
NTNU University Museum: Anders Finstad, GBIF Science committee
Research Council of Norway: Per Backe-Hansen, Head of delegation
Contactusat:[email protected]