Building breeding databases in GDR, Genome Database for Rosaceae
Sook Jung, Taein Lee, Stephen Ficklin, Kate Evans, Cameron Peace and Dorrie Main
Introduction
GDR (Genome Database for Rosaceae): genomic, genetic and breeding data
Other databases: CottonGen, Citrus Genome Database, Cool Season Food Legume Database, Genome Database for Vaccinium
Using open-source tools (Chado, Tripal, Drupal) for efficient and flexible database construction
Chado, with the recent Natural Diversity module, allows integration of complex biological data from widely different projects and species
Outline
Part I: How to store data using Chado
Part II: Demo of GDR Breeding Database
Chado: Modular, Generic and Ontology-driven schema
feature: feature_id, name, uniquename, type_id, organism_id, residues
    type_id (cvterm): gene, mRNA, marker, QTL, etc.
feature_relationship: feature_relationship_id, subject_id, object_id, type_id
    example: Abc-mRNA (subject_id) part_of Abc-gene (object_id)
featureprop: featureprop_id, feature_id, type_id, value, rank
    type_id (cvterm): repeat_motif, product_size
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
cv: cv_id, name, definition
    examples: Sequence Ontology, Gene Ontology, etc.
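The ontology-driven pattern above can be sketched in a few lines. This is a minimal illustration, not the GDR implementation: it uses an in-memory SQLite database instead of Chado's PostgreSQL, keeps only a few columns per table, and the helper names `term` and `add_feature` as well as the cv names "sequence" and "relationship" are assumptions made for the example.

```python
import sqlite3

# Reduced stand-ins for four Chado tables (real Chado has more columns and constraints).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY,
                     cv_id INTEGER REFERENCES cv(cv_id), name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT,
                      type_id INTEGER REFERENCES cvterm(cvterm_id));
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES feature(feature_id),
    object_id INTEGER REFERENCES feature(feature_id),
    type_id INTEGER REFERENCES cvterm(cvterm_id));
""")

def term(cv_name, term_name):
    """Return the cvterm_id for a term, creating the cv and cvterm rows if needed."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute("SELECT cv_id FROM cv WHERE name = ?", (cv_name,)).fetchone()[0]
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
                       (cv_id, term_name)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO cvterm (cv_id, name) VALUES (?, ?)",
                        (cv_id, term_name)).lastrowid

def add_feature(uniquename, type_name):
    """Insert a feature whose type is a cvterm rather than a hard-coded column value."""
    return conn.execute("INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
                        (uniquename, term("sequence", type_name))).lastrowid

gene = add_feature("Abc-gene", "gene")
mrna = add_feature("Abc-mRNA", "mRNA")
# The mRNA is the subject, the gene is the object, and the relationship type is a cvterm.
conn.execute("INSERT INTO feature_relationship (subject_id, object_id, type_id) "
             "VALUES (?, ?, ?)", (mrna, gene, term("relationship", "part_of")))

parts = conn.execute("""
    SELECT s.uniquename, t.name, o.uniquename
    FROM feature_relationship fr
    JOIN feature s ON s.feature_id = fr.subject_id
    JOIN feature o ON o.feature_id = fr.object_id
    JOIN cvterm t ON t.cvterm_id = fr.type_id
""").fetchall()
print(parts)
```

Because feature types and relationship types are both rows in cvterm, a new data type (for example, QTL) needs a new term, not a new table.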
Storing Stock (from samples to population; pedigree)
stock: stock_id, name, uniquename, type_id, organism_id
    type_id (cvterm): population, cultivar, breeding line, clone, sample, etc.
stock_relationship: stock_relationship_id, subject_id, object_id, type_id
    example: Gala-001 (subject_id) sample_of Gala (object_id)
    example (pedigree): Gala maternal_parent_of Sonya
stockprop: stockprop_id, stock_id, type_id, value
    type_id (cvterm): description, population_size
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
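The two stock_relationship examples can be sketched as follows. This is a simplified illustration using SQLite: relationship and stock types are stored as plain text here, whereas Chado stores them as type_id references to cvterm, and the helper names (`add_stock`, `relate`, `related`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE, type TEXT);
CREATE TABLE stock_relationship (
    stock_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES stock(stock_id),
    object_id INTEGER REFERENCES stock(stock_id),
    type TEXT);
""")

def add_stock(uniquename, type_name):
    """Insert one stock (cultivar, sample, population, ...)."""
    return conn.execute("INSERT INTO stock (uniquename, type) VALUES (?, ?)",
                        (uniquename, type_name)).lastrowid

def relate(subject, rel, obj):
    """Store 'subject rel obj', looking both stocks up by uniquename."""
    conn.execute("""
        INSERT INTO stock_relationship (subject_id, object_id, type)
        SELECT s.stock_id, o.stock_id, ?
        FROM stock s, stock o
        WHERE s.uniquename = ? AND o.uniquename = ?""", (rel, subject, obj))

def related(obj, rel):
    """Return the uniquenames of all subjects that stand in relation `rel` to `obj`."""
    rows = conn.execute("""
        SELECT s.uniquename
        FROM stock_relationship r
        JOIN stock s ON s.stock_id = r.subject_id
        JOIN stock o ON o.stock_id = r.object_id
        WHERE o.uniquename = ? AND r.type = ?
        ORDER BY s.uniquename""", (obj, rel)).fetchall()
    return [row[0] for row in rows]

add_stock("Gala", "cultivar")
add_stock("Sonya", "cultivar")
add_stock("Gala-001", "sample")
relate("Gala-001", "sample_of", "Gala")
relate("Gala", "maternal_parent_of", "Sonya")

mothers = related("Sonya", "maternal_parent_of")
samples = related("Gala", "sample_of")
print(mothers, samples)
```

The same table therefore holds both sample-to-cultivar links and pedigree links; only the relationship type differs.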
Storing phenotype data (from measurements to projects)
stock: stock_id, name, uniquename, type_id, organism_id
nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
    type_id (cvterm): phenotyping, genotyping, cross_experiment
phenotype: phenotype_id, uniquename, value, attr_id
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
project: project_id, name, description
project_relationship
Linking tables: NE_stock (nd_experiment_stock), NE_phenotype (nd_experiment_phenotype), NE_project (nd_experiment_project)
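A rough sketch of how one measurement travels through these tables: each measurement is an nd_experiment of type phenotyping, linked to a stock on one side and a phenotype on the other. The schema is reduced to the columns needed here, type and trait are stored as text instead of cvterm references, and the sample name, trait names, values and helper functions are illustrative assumptions rather than GDR data or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY,
                        uniquename TEXT, attr TEXT, value TEXT);
CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER, phenotype_id INTEGER);
""")

def record_measurement(stock_name, trait, value):
    """Store one trait measurement for a stock as a phenotyping experiment."""
    cur = conn.cursor()
    row = cur.execute("SELECT stock_id FROM stock WHERE uniquename = ?",
                      (stock_name,)).fetchone()
    if row:
        stock_id = row[0]
    else:
        stock_id = cur.execute("INSERT INTO stock (uniquename) VALUES (?)",
                               (stock_name,)).lastrowid
    exp_id = cur.execute(
        "INSERT INTO nd_experiment (type) VALUES ('phenotyping')").lastrowid
    ph_id = cur.execute(
        "INSERT INTO phenotype (uniquename, attr, value) VALUES (?, ?, ?)",
        (f"{stock_name}_{trait}_{exp_id}", trait, str(value))).lastrowid
    # The linking tables tie the experiment to the stock and to the phenotype.
    cur.execute("INSERT INTO nd_experiment_stock VALUES (?, ?)", (exp_id, stock_id))
    cur.execute("INSERT INTO nd_experiment_phenotype VALUES (?, ?)", (exp_id, ph_id))

def measurements(stock_name):
    """Return (trait, value) pairs recorded for a stock, ordered by trait name."""
    return conn.execute("""
        SELECT p.attr, p.value
        FROM stock s
        JOIN nd_experiment_stock es ON es.stock_id = s.stock_id
        JOIN nd_experiment_phenotype ep ON ep.nd_experiment_id = es.nd_experiment_id
        JOIN phenotype p ON p.phenotype_id = ep.phenotype_id
        WHERE s.uniquename = ?
        ORDER BY p.attr""", (stock_name,)).fetchall()

record_measurement("Gala-001", "crisp", 2.2)
record_measurement("Gala-001", "firmness", 7.5)
results = measurements("Gala-001")
print(results)
```

Grouping experiments into projects would add one more linking table (nd_experiment_project) with the same shape.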
Genotypic data integrated with genomic/genetic data
nd_experiment
Nd_experiment_idNd_geolocation_idType_id
genotype
genotype_idnameUniquenamedescription
NE_genotype
feature_genotype
Feature
Feature_idNameUniquenameType_idOrganism_idresidues
project
stock
uniquename: CPSCT038_190|192 description: 190:192
Uniquename:CPSCT038Type:microsatellite
map
Explore sequences around marker in GBrowse
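The microsatellite example suggests a naming pattern for genotype records: marker uniquename, an underscore, then the alleles joined by "|", with the description holding the alleles joined by ":". The sketch below encodes that pattern; it is inferred from this single example, not a documented GDR rule, and the function names are hypothetical.

```python
def make_genotype(marker, alleles):
    """Build genotype fields from a marker name and its allele calls."""
    return {
        "uniquename": marker + "_" + "|".join(alleles),
        "description": ":".join(alleles),
    }

def split_genotype(uniquename):
    """Recover (marker, alleles) from a genotype uniquename.

    Splits on the last underscore, so marker names that themselves
    contain underscores are still handled.
    """
    marker, _, allele_part = uniquename.rpartition("_")
    return marker, allele_part.split("|")

record = make_genotype("CPSCT038", ["190", "192"])
marker, alleles = split_genotype(record["uniquename"])
print(record, marker, alleles)
```

Recovering the marker name from the genotype is what allows a genotype row to be linked back to its marker feature, and from there to map and genome positions.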
Relationship between genotype and phenotype(haplotype and haplotype effect)
nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
genotype: genotype_id, name, uniquename, description
    example: uniquename: MA_H3|H4b; description: H3|H4b
feature: feature_id, name, uniquename, type_id, organism_id, residues
    example: uniquename: Ma; type: MTL
feature_genotype (links genotype to feature)
NE_genotype (nd_experiment_genotype)
project
stock
map
phenotype: phenotype_id, uniquename, value, attr_id
    example: attr_id: crisp; value: 2.2
NE_phenotype (nd_experiment_phenotype)
phenstatement: phenstatement_id, type_id, genotype_id, phenotype_id, environment_id, pub_id
Germplasm with the H3|H4b alleles of the MA locus has a value of 2.2 for crisp
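The haplotype-effect statement above can be sketched as a three-table join. As before, this is a reduced SQLite illustration: phenstatement keeps only its two foreign keys (the real table also carries type, environment and publication references), the trait is stored as text rather than as an attr_id cvterm, and `haplotype_effects` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY,
                       uniquename TEXT, description TEXT);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, attr TEXT, value TEXT);
CREATE TABLE phenstatement (
    phenstatement_id INTEGER PRIMARY KEY,
    genotype_id INTEGER REFERENCES genotype(genotype_id),
    phenotype_id INTEGER REFERENCES phenotype(phenotype_id));
""")

# The example from the slide: haplotype H3|H4b at the MA locus, crisp value 2.2.
genotype_id = conn.execute(
    "INSERT INTO genotype (uniquename, description) VALUES (?, ?)",
    ("MA_H3|H4b", "H3|H4b")).lastrowid
phenotype_id = conn.execute(
    "INSERT INTO phenotype (attr, value) VALUES (?, ?)",
    ("crisp", "2.2")).lastrowid
conn.execute("INSERT INTO phenstatement (genotype_id, phenotype_id) VALUES (?, ?)",
             (genotype_id, phenotype_id))

def haplotype_effects(trait):
    """Map each haplotype description to its stated value for one trait."""
    rows = conn.execute("""
        SELECT g.description, p.value
        FROM phenstatement ps
        JOIN genotype g ON g.genotype_id = ps.genotype_id
        JOIN phenotype p ON p.phenotype_id = ps.phenotype_id
        WHERE p.attr = ?""", (trait,)).fetchall()
    return {description: float(value) for description, value in rows}

effects = haplotype_effects("crisp")
print(effects)
```

Unlike the nd_experiment links, which record what was observed on a particular stock, a phenstatement row is a general assertion about a genotype and can be reused for any germplasm carrying it.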
GDR Breeding Database Demo
Data Management (Browse, Search and Download)
Data Conversion (Generate Input Files for Pedimap)
Decision Support: Cross Assist, Trait Locus Warehouse, Marker Converter
Phenotypic Data Search
Genotypic Data Search
Cross Assist
o A web interface that generates a list of parents and the number of seedlings needed to obtain progeny with the desired traits
o Methods
   “Phenotype” (uses only phenotypic information of individuals in the dataset)
   “+Pedigree” (uses both phenotypic and pedigree information)
   “+Ped+DNA” (uses phenotypic and pedigree information plus information provided by DNA-based functional genotypes)
Step 1: Select Method
Step 2: Select target number and trait thresholds
Step 3: Filter results by data completeness, required number of seedlings, and parentage
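The slides do not describe how Cross Assist computes its numbers, so the following is only a toy sketch of Steps 2 and 3 under a stated assumption: each candidate cross comes with an estimated fraction of progeny that meet the trait thresholds, and the seedlings required are the target number divided by that fraction. The parent labels, fractions and function names are all hypothetical.

```python
import math

def seedlings_needed(target, success_rate):
    """Seedlings to raise so that, on average, `target` of them meet the thresholds."""
    return math.ceil(target / success_rate)

def rank_crosses(success_rates, target, max_seedlings):
    """Keep crosses that reach the target within the seedling budget, smallest first."""
    plans = [(cross, seedlings_needed(target, rate))
             for cross, rate in success_rates.items() if rate > 0]
    feasible = [plan for plan in plans if plan[1] <= max_seedlings]
    return sorted(feasible, key=lambda plan: plan[1])

# Hypothetical fractions of progeny expected to pass the trait thresholds.
rates = {("P1", "P2"): 0.25, ("P1", "P3"): 0.5, ("P2", "P3"): 0.125}

# Step 2: target of 10 progeny; Step 3: filter out crosses needing over 50 seedlings.
plans = rank_crosses(rates, target=10, max_seedlings=50)
print(plans)
```

In this sketch the three methods (“Phenotype”, “+Pedigree”, “+Ped+DNA”) would differ only in how the per-cross fraction is estimated, not in the filtering step.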
Future Development
o Data
   RosBreed QTLs and their genome positions
   More breeding data and DNA-based functional genotypes
   More re-sequencing data
o Functionality
   Data management: online data submission and editing
   Viewing data on screen and generating report pages
   Decision support tools
o Cross Assist
   Accommodate more complex situations (selfing, cross compatibility, etc.)
   Allow users to upload their own data
o Develop more tools
Acknowledgement
Natural Diversity module working group: Naama Menda, Seth Redmond, Robert M. Buels, Maren Friesen, Yuri Bendana, Lacey-Anne Sanderson, Hilmar Lapp, Taein Lee, Bob MacCallum, Kirstin E. Bett, Scott Cain, Dave Clements, Lukas A. Mueller and Dorrie Main
Main Lab team: Taein Lee, Stephen Ficklin, Chun-Huai Cheng, Ping Zheng, Anna Blenda, Sushan Ru, Dorrie Main, Jing Yu
All Project CoPIs (tfGDR, RosBreed and CottonGen)
Funding Sources: USDA NIFA SCRI, NSF Plant Genome Program, USDA-ARS, Washington Tree Fruit Research Commission, Cotton Incorporated, Washington State University, Clemson University, University of Florida, Boyce Thompson Institute, North Carolina State University
Thank You! Any Questions?