The BARCODE Data Standard
David E. Schindel, Executive Secretary
National Museum of Natural History
Smithsonian [email protected]; http://www.barcoding.si.edu
202/633-0812; fax 202/633-2938
BARCODE Data Standard is:
A set of required elements for a reserved Keyword (‘BARCODE’) in GenBank
A set of sequence quality requirements
Required or recommended formats for data interoperability with:
– Voucher specimens in biorepositories
– Georeferenced data
– Taxonomic literature
An Internal ID System for All Animals
The Mitochondrial Genome
[Figure: a typical animal cell and its mitochondrion, with a map of the mitochondrial DNA (mtDNA) showing the H-strand and L-strand and the labeled regions D-loop, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, cytochrome b, ATPase subunits 6 and 8, and small ribosomal RNA.]
Non-COI regions for other taxa
Land plants:
– Chloroplast matK and rbcL approved Nov 09
– 70-75% resolving ability, higher in angiosperms
– Non-coding plastid and nuclear regions being explored
Fungi:
– CBOL Working Group met this week in Amsterdam
– Agreed to recommend ITS; 72% effective
Protists:
– CBOL Working Group July meeting, Berlin
BARCODE Record Flow Chart
[Flow chart; labels: USER, GenBank, Key, Mirroring, Update Channel, Private Records]
BARCODE Records in GenBank
Submission of BARCODE Records to EBI and DDBJ
Required Elements for BARCODE
Taxonomic identification to species
Voucher specimen ID in standard format
Name of barcode region
Length, quality, 2 trace files
Forward/reverse primer sequences, names
Country/Ocean/Sea of origin
Highly Recommended Elements
Latitude/longitude
Name of Collector
Collection date
Name of identifier
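The required and recommended elements above can be read as a checklist applied to each record. The following is a minimal sketch of such a check, not the procedure GenBank actually uses. The field names (`species_name`, `voucher_id`, and so on) and the example record are hypothetical, and the sequence length and quality requirements are left out because the slides do not state thresholds.

```python
# Field names are hypothetical labels for the elements listed on the slide,
# not official GenBank or BOLD field names.
REQUIRED_FIELDS = (
    "species_name",      # taxonomic identification to species
    "voucher_id",        # voucher specimen ID in standard format
    "barcode_region",    # name of barcode region
    "sequence",          # the barcode sequence itself
    "forward_primer",    # forward primer sequence
    "reverse_primer",    # reverse primer sequence
    "country_or_ocean",  # country/ocean/sea of origin
)
RECOMMENDED_FIELDS = ("lat_lon", "collector", "collection_date", "identified_by")
MIN_TRACE_FILES = 2  # the slide asks for 2 trace files


def check_record(record):
    """Return (missing_required, missing_recommended) for a dict-like record.

    Length and quality thresholds and primer names are not checked here.
    """
    missing_required = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if len(record.get("trace_files", [])) < MIN_TRACE_FILES:
        missing_required.append("trace_files")
    missing_recommended = [f for f in RECOMMENDED_FIELDS if not record.get(f)]
    return missing_required, missing_recommended


# Illustrative record with made-up values: only one trace file is supplied,
# and two recommended elements are absent.
example = {
    "species_name": "Genus species",
    "voucher_id": "NHM:LEP:123456",
    "barcode_region": "COI",
    "sequence": "ACGT",
    "forward_primer": "ACGT",
    "reverse_primer": "ACGT",
    "country_or_ocean": "Costa Rica",
    "trace_files": ["forward.ab1"],
    "collector": "A. Collector",
    "collection_date": "2009-11-07",
}

missing_required, missing_recommended = check_record(example)
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```

Keeping the two lists separate mirrors the slides: a missing required element would block the BARCODE keyword, while a missing recommended element would only be flagged.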
Traditional Taxonomy
GSC Minimum Standards
(MI*)
Traditional GenBank
Voucher specimen ID XXX XXX
Species ID XXX X X
Identified by XXX
DNA sequence XXX XXX
Gene region XXX
Geographic origin (country, ocean) XXX X
Latitude/Longitude XXX XXX
Collection date, collector name XXX XXX
Trace files XXX XX
Primer information X XX
Barcode Sequence
Voucher Specimen
Species Name
Specimen Metadata
Literature citation
BARCODE Records in INSDC
Indices - Catalogue of Life - GBIF/ECAT
Nomenclators - Zoo Record - IPNI - NameBank
Publication links - New species
Georeference
Habitat
Character sets
Images
Behavior
Other genes
Trace files
Primers
Databases - Provisional sp.
Record in BOLD
Compliance with Standard (1)
1.37 million records in BOLD
514,390 BARCODE records in INSDC
395,774 have ordinal name plus Barcode Index Number for taxonomic ID
– Rapid data release versus time for annotation
– Exposure to data theft, risk of misidentification
– Added value of Linnean name
– Incidence of misidentifications in GenBank
– Danger of circular reasoning
Taxonomic Identification
The genus and species combination that can be found in:
– a taxonomic index such as Catalogue of Life, Zoological Record or IPNI;
– a taxonomic treatment of a previously published species name; or
– a published description of the species; or
A provisional label for a potential new species.
Rod Page’s ‘Dark Taxa’
R. Page, iPhylo blogspot, 12 April 2011
Taxonomic Content in iBOL Data
iBOL ‘Phase 1’
Org name: Order + BIN
Tentative Name: blank
GenBank ‘Phase 0’
Tentative name is in BOLD, unreleased
iBOL ‘Phase 2’
Org name: Order + BIN
Tentative Name: blank
GenBank ‘Phase 1’
Org name = Order + BIN plus
Tentative name
GenBank ‘Phase 2’
Org name = sp. name
Unique identifier for the voucher specimen
In standardized format based on Darwin Core:
Institutional acronym:Collection code:Specimen number
Institutional acronym:Specimen number
personal:Collection code:Specimen number
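The three colon-separated forms above are regular enough to parse mechanically. The sketch below is one possible reading of them, not an official parser. The names `VoucherID` and `parse_voucher` are hypothetical, and rejecting a two-part `personal:` identifier is an assumption based on the slide listing only the three-part form for personal collections.

```python
from typing import NamedTuple, Optional


class VoucherID(NamedTuple):
    institution: str           # institutional acronym, or the literal "personal"
    collection: Optional[str]  # collection code; None for the two-part form
    specimen: str              # specimen number


def parse_voucher(text: str) -> VoucherID:
    """Split a voucher ID into its parts, following the three listed forms."""
    parts = [p.strip() for p in text.split(":")]
    if any(not p for p in parts):
        raise ValueError(f"empty component in voucher ID: {text!r}")
    if len(parts) == 3:
        return VoucherID(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        # Assumption: personal collections always carry a collection code.
        if parts[0] == "personal":
            raise ValueError(f"personal voucher needs a collection code: {text!r}")
        return VoucherID(parts[0], None, parts[1])
    raise ValueError(f"expected 2 or 3 colon-separated parts: {text!r}")


for raw in ("NHM:LEP:123456", "NHM:123456", "personal:DHJanzen:SRNP12345"):
    print(parse_voucher(raw))
```

A free-text voucher string with no colons raises `ValueError`, which is how unformatted `/specimen_voucher` entries could be told apart from formatted ones.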
GTI/CBOL/iBOL Workshop, 7 November 2009
Compliance with Standard (2)
514,390 BARCODE records in INSDC
– Traces, primers, length, country, and presence of voucherID checked by GenBank
99.9% have entry for /specimen_voucher
13,151 have formatted voucher from 38 institutions
– 20 confirmed in biorepositories
– 11 unconfirmed
– 7 unlisted
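The voucher figures on this slide can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide.
insdc_barcode_records = 514_390
formatted_vouchers = 13_151
institutions_total = 38
institutions = {"confirmed": 20, "unconfirmed": 11, "unlisted": 7}

# The three institution categories should account for all 38 institutions.
categories_sum = sum(institutions.values())

# Share of BARCODE records in INSDC that carry a formatted voucher ID.
formatted_share = formatted_vouchers / insdc_barcode_records

print(f"Institution categories sum to {categories_sum} of {institutions_total}")
print(f"Formatted vouchers: {formatted_share:.2%} of BARCODE records")
```

The contrast is the point of the slide: nearly every record has some /specimen_voucher entry, but only about 2.6% have one in the structured format.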
Darwin Core Triplet
Structured Link to Vouchers
Institutional Acronym : Collection Code : Catalog ID
NHM:LEP:123456
personal:DHJanzen:SRNP12345
AMNH   Icelandic Institute of Natural History, Akureyri Division   Akureyri, Iceland
AMNH   American Museum of Natural History   New York, USA
UNL   Universidad Autónoma de Nuevo León   Monterrey, Nuevo León, Mexico
UNL   University of Nebraska State Museum   Lincoln, Nebraska, USA
UNL   Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa   Monte de Caparica, Portugal
ZMK   Zoological Museum, Kristiania   Oslo, Norway
ZMK   Zoologisches Museum der Universität Kiel   Kiel, Germany
ZMK   Zoological Museum, Copenhagen   Copenhagen, Denmark
CBOL/GBIF/NCBI Registry of Biorepositories
www.biorepositories.org
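The acronym list above shows why a registry is needed: the same institutional acronym can point to several institutions. The sketch below groups the slide's rows by acronym to surface the collisions. It works on a hard-coded copy of those rows and does not query the registry itself; the function name `ambiguous_acronyms` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the slide: (acronym, institution, country).
ROWS = [
    ("AMNH", "Icelandic Institute of Natural History, Akureyri Division", "Iceland"),
    ("AMNH", "American Museum of Natural History", "USA"),
    ("UNL", "Universidad Autónoma de Nuevo León", "Mexico"),
    ("UNL", "University of Nebraska State Museum", "USA"),
    ("UNL", "Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa", "Portugal"),
    ("ZMK", "Zoological Museum, Kristiania", "Norway"),
    ("ZMK", "Zoologisches Museum der Universität Kiel", "Germany"),
    ("ZMK", "Zoological Museum, Copenhagen", "Denmark"),
]


def ambiguous_acronyms(rows):
    """Map each acronym used by more than one institution to those institutions."""
    by_acronym = defaultdict(list)
    for acronym, name, country in rows:
        by_acronym[acronym].append((name, country))
    return {a: insts for a, insts in by_acronym.items() if len(insts) > 1}


for acronym, insts in ambiguous_acronyms(ROWS).items():
    print(f"{acronym}: {len(insts)} institutions")
    for name, country in insts:
        print(f"  {name} ({country})")
```

A voucher ID whose acronym appears in this result cannot be resolved to a single institution from the acronym alone.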