Integrating Phenotypic Data With Genomic, Genetic and Genotypic Data Using Chado Sook Jung, Taein Lee, Stephen Ficklin, Jing Yu, Dorrie Main

Post on 18-Dec-2015

Page 2

Outline

• Introduction of GDR and CottonGen
• Chado, the generic schema
• Storing Stock Data
• Storing Phenotypic Data (trait, dataset, etc.)
• Storing Genotypic Data
• Integration with genetic and genomic Data
• Conclusion

Page 3

Database projects of Main lab

Major databases with genomic, genetic, phenotypic and genotypic data

1. GDR: Genome Database for Rosaceae

Genomic, Genetic and Breeding data (private data and data from the RosBreed project)

• Fruit and Nut, Sat, 12 PM

• Computer Demo, Mon, 1:35 PM

• P0946, RosBreed BIM System, Mon, 10-11:30 AM

2. CottonGen: Replaced CottonDB and Cotton Marker Database

• Cotton Genome Initiative, Sun, 3:50 PM

• Computer Demo, Mon, 1:50 PM

Other databases:

Citrus Genome Database, Cool Season Food Legume Database, Genome Database for Vaccinium

Built using Chado schema and Tripal (Drupal front end for Chado)

Tripal presentation, GMOD workshop, Wed 11:50 AM

Page 4

Chado: Modular, Generic and Ontology-driven schema

[Diagram: Chado modules] natural diversity, general, cv, pub, organism, map, genetic, mage, companalysis, sequence, stock, phenotype

Page 5

Publication

Page 6

Chado: Modular, Generic and Ontology-driven schema

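Tables and columns shown in the diagram:

• feature: feature_id, name, uniquename, type_id, organism_id, residues
• feature_relationship: feature_relationship_id, subject_id, object_id, type_id
• featureprop: featureprop_id, feature_id, type_id, value, rank
• cvterm: cvterm_id, name, definition, cv_id, dbxref_id
• cv: cv_id, name, definition

Annotations on the diagram:

• feature type_id (a cvterm): gene, mRNA, marker, QTL, etc.
• feature_relationship example: Abc-mRNA (subject_id) part_of Abc-gene (object_id)
• featureprop type_id examples: Repeat_motif, Product_size
• cv examples: Sequence Ontology, Gene Ontology, etc.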
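
The typed-relationship pattern on this slide can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module with heavily simplified stand-ins for the Chado tables (the real schema runs on PostgreSQL and has more columns and constraints). The cv name demo_cv and all ID values are assumptions made up for illustration; only Abc-mRNA, Abc-gene and part_of come from the slide.

```python
import sqlite3

# Simplified stand-ins for Chado tables; most real columns are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT,
                      type_id INTEGER);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER, object_id INTEGER, type_id INTEGER);
""")

# "demo_cv" is a hypothetical vocabulary name used only in this sketch.
conn.execute("INSERT INTO cv VALUES (1, 'demo_cv')")
conn.executemany("INSERT INTO cvterm VALUES (?, 1, ?)",
                 [(1, "gene"), (2, "mRNA"), (3, "part_of")])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(1, "Abc-gene", 1), (2, "Abc-mRNA", 2)])
# subject_id = Abc-mRNA, object_id = Abc-gene, type_id = part_of
conn.execute("INSERT INTO feature_relationship VALUES (1, 2, 1, 3)")

# Every type (feature type, relationship type) is resolved through cvterm.
rows = conn.execute("""
SELECT s.uniquename, st.name, rt.name, o.uniquename, ot.name
FROM feature_relationship fr
JOIN feature s ON s.feature_id = fr.subject_id
JOIN cvterm st ON st.cvterm_id = s.type_id
JOIN cvterm rt ON rt.cvterm_id = fr.type_id
JOIN feature o ON o.feature_id = fr.object_id
JOIN cvterm ot ON ot.cvterm_id = o.type_id
""").fetchall()
print(rows)
```

Because both the feature types and the relationship type live in cvterm, a new kind of feature or relationship is a new row in cvterm rather than a new table or column.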

Page 7

Storing Stock (from samples to population; pedigree)

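Tables and columns shown in the diagram:

• stock: stock_id, name, uniquename, type_id, organism_id
• stock_relationship: stock_relationship_id, subject_id, object_id, type_id
• stockprop: stockprop_id, stock_id, type_id, value
• stockcollection: stockcollection_id, name, uniquename, type_id, contact_id
• cvterm: cvterm_id, name, definition, cv_id, dbxref_id

Annotations on the diagram:

• stock type_id (a cvterm): population, cultivar, breeding line, clone, sample, etc.
• stock_relationship example (sample): Gala-001 (subject_id) sample_of Gala (object_id)
• stock_relationship example (pedigree): Gala maternal_parent_of Sonya
• stockprop type_id examples: description, population_size
• stockcollection example: stock center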
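
As a rough illustration of how samples and pedigree share one relationship table, here is a minimal sketch using Python's sqlite3 module. The tables are simplified stand-ins for the Chado stock module, the ID values are invented, and the helper name related is hypothetical. The stock names and relationship types are the ones on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT,
                    type_id INTEGER);
CREATE TABLE stock_relationship (
    stock_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER, object_id INTEGER, type_id INTEGER);
""")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "cultivar"), (2, "sample"),
                  (3, "sample_of"), (4, "maternal_parent_of")])
conn.executemany("INSERT INTO stock VALUES (?, ?, ?)",
                 [(1, "Gala", 1), (2, "Sonya", 1), (3, "Gala-001", 2)])
conn.executemany("INSERT INTO stock_relationship VALUES (?, ?, ?, ?)",
                 [(1, 3, 1, 3),    # Gala-001 sample_of Gala
                  (2, 1, 2, 4)])   # Gala maternal_parent_of Sonya


def related(object_name, relationship):
    """Return subject stocks that stand in `relationship` to `object_name`."""
    rows = conn.execute("""
        SELECT s.uniquename
        FROM stock_relationship sr
        JOIN stock s ON s.stock_id = sr.subject_id
        JOIN stock o ON o.stock_id = sr.object_id
        JOIN cvterm t ON t.cvterm_id = sr.type_id
        WHERE o.uniquename = ? AND t.name = ?
        ORDER BY s.uniquename""", (object_name, relationship)).fetchall()
    return [r[0] for r in rows]


maternal_parents = related("Sonya", "maternal_parent_of")
samples = related("Gala", "sample_of")
print(maternal_parents, samples)
```

The same query shape serves both questions; only the relationship term changes.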

Page 8

Storing phenotype data (from measurements to projects)

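Tables and columns shown in the diagram:

• stock: stock_id, name, uniquename, type_id, organism_id
• nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
• phenotype: phenotype_id, uniquename, value, attr_id
• nd_geolocation: nd_geolocation_id, description, latitude, longitude, geodetic_datum
• cvterm: cvterm_id, name, definition, cv_id, dbxref_id
• project, with project_relationship linking projects to each other

Linking tables (abbreviated NE_ on the slide):

• nd_experiment_stock (NE_stock)
• nd_experiment_phenotype (NE_phenotype)
• nd_experiment_project (NE_project)

Annotations on the diagram:

• nd_experiment type_id (a cvterm): phenotyping, genotyping, cross_experiment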
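
To make the chain from a single measurement up to a project concrete, the following is a minimal sketch with Python's sqlite3 module. The tables are simplified stand-ins for the Chado natural diversity module. The names demo_dataset and demo_orchard, the phenotype uniquename, the ID values, and the pairing of Gala-001 with SkinCol_0 are assumptions for illustration, not data from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE nd_geolocation (nd_geolocation_id INTEGER PRIMARY KEY,
                             description TEXT);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY,
                            nd_geolocation_id INTEGER, type_id INTEGER);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT,
                        value TEXT, attr_id INTEGER);
CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER,
                                      phenotype_id INTEGER);
CREATE TABLE nd_experiment_project (nd_experiment_id INTEGER,
                                    project_id INTEGER);
""")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "phenotyping"), (2, "SkinCol_0")])
conn.execute("INSERT INTO stock VALUES (1, 'Gala-001')")
conn.execute("INSERT INTO project VALUES (1, 'demo_dataset')")       # hypothetical
conn.execute("INSERT INTO nd_geolocation VALUES (1, 'demo_orchard')")  # hypothetical
conn.execute("INSERT INTO nd_experiment VALUES (1, 1, 1)")  # a phenotyping event
conn.execute("INSERT INTO phenotype VALUES (1, 'Gala-001_SkinCol_0', '2', 2)")
conn.execute("INSERT INTO nd_experiment_stock VALUES (1, 1)")
conn.execute("INSERT INTO nd_experiment_phenotype VALUES (1, 1)")
conn.execute("INSERT INTO nd_experiment_project VALUES (1, 1)")

# nd_experiment is the hub: stock, phenotype and project all hang off it.
measurements = conn.execute("""
SELECT p.name, s.uniquename, g.description, et.name, a.name, ph.value
FROM nd_experiment e
JOIN cvterm et ON et.cvterm_id = e.type_id
JOIN nd_geolocation g ON g.nd_geolocation_id = e.nd_geolocation_id
JOIN nd_experiment_stock es ON es.nd_experiment_id = e.nd_experiment_id
JOIN stock s ON s.stock_id = es.stock_id
JOIN nd_experiment_phenotype ep ON ep.nd_experiment_id = e.nd_experiment_id
JOIN phenotype ph ON ph.phenotype_id = ep.phenotype_id
JOIN cvterm a ON a.cvterm_id = ph.attr_id
JOIN nd_experiment_project epj ON epj.nd_experiment_id = e.nd_experiment_id
JOIN project p ON p.project_id = epj.project_id
""").fetchall()
print(measurements)
```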

Page 9

Storing phenotype data (enabling comparison among datasets)

Tables and columns shown in the diagram:

• stock: stock_id, name, uniquename, type_id, organism_id
• nd_experiment
• phenotype: phenotype_id, uniquename, value, attr_id
• cvterm: cvterm_id, name, definition, cv_id, dbxref_id
• cv: cv_id, name, definition
• cvtermprop: cvtermprop_id, cvterm_id, type_id, value, rank

Example: a project-specific descriptor

phenotype: attr_id: SkinCol_0, value: 2

SkinCol_0 is a cvterm in the RB cv. Its cvtermprop rows give the meaning of each code:

value        rank
Orange       1
Orange-red   2
Pink-red     3
Red          4
Dark red     5

If skin_color_harvest is scored 1-10 in the Standard cv, we can store the value again under the standard descriptor:

phenotype: attr_id: Skin_color_harvest, value: 4
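
The code-to-label lookup described on this slide can be sketched as follows, using Python's sqlite3 module and simplified stand-ins for the Chado cv tables. The cv demo_types, the cvterm code, the phenotype uniquenames and all ID values are assumptions invented for the example. The descriptor names, codes and labels are taken from the slide. How a project-specific score is converted to the standard scale is not stated in the talk, so the sketch simply stores both records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE cvtermprop (cvtermprop_id INTEGER PRIMARY KEY, cvterm_id INTEGER,
                         type_id INTEGER, value TEXT, rank INTEGER);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT,
                        value TEXT, attr_id INTEGER);
""")
conn.executemany("INSERT INTO cv VALUES (?, ?)",
                 [(1, "RB"), (2, "Standard"), (3, "demo_types")])
conn.executemany("INSERT INTO cvterm VALUES (?, ?, ?)",
                 [(1, 1, "SkinCol_0"), (2, 2, "Skin_color_harvest"),
                  (3, 3, "code")])   # "code" is a hypothetical property type

# One cvtermprop row per allowed code of SkinCol_0: value = label, rank = code.
labels = ["Orange", "Orange-red", "Pink-red", "Red", "Dark red"]
conn.executemany(
    "INSERT INTO cvtermprop (cvterm_id, type_id, value, rank) VALUES (1, 3, ?, ?)",
    [(label, code) for code, label in enumerate(labels, start=1)])

# The same observation stored under the project descriptor and the standard one.
conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?)",
                 [(1, "demo_SkinCol_0", "2", 1),
                  (2, "demo_Skin_color_harvest", "4", 2)])

# Decode each stored value through cvtermprop where a code table exists.
decoded = conn.execute("""
SELECT cv.name, t.name, ph.value, cp.value
FROM phenotype ph
JOIN cvterm t ON t.cvterm_id = ph.attr_id
JOIN cv ON cv.cv_id = t.cv_id
LEFT JOIN cvtermprop cp
       ON cp.cvterm_id = t.cvterm_id AND cp.rank = CAST(ph.value AS INTEGER)
ORDER BY ph.phenotype_id
""").fetchall()
print(decoded)
```

Datasets that use different project descriptors become comparable once each also carries a record under the shared Standard descriptor.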

Page 10

Genotypic data integrated with genomic/genetic data

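Tables and columns shown in the diagram:

• nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
• genotype: genotype_id, name, uniquename, description
• feature: feature_id, name, uniquename, type_id, organism_id, residues
• project
• stock

Linking tables:

• nd_experiment_genotype (NE_genotype)
• feature_genotype

Example:

• genotype: uniquename: CPSCT038_190|192, description: 190:192
• feature: uniquename: CPSCT038, type: microsatellite
• From the marker feature: map; explore sequences around marker in GBrowse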
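
A minimal sketch of how a genotype record reaches both the germplasm and the marker feature, again with Python's sqlite3 module and simplified stand-in tables. The marker and genotype values are the slide's example. The stock Gala-001, the ID values, and the choice to split the description on ":" to get allele sizes are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT,
                      type_id INTEGER);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT,
                       description TEXT);
CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY,
                            type_id INTEGER);
CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
CREATE TABLE nd_experiment_genotype (nd_experiment_id INTEGER,
                                     genotype_id INTEGER);
""")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "microsatellite"), (2, "genotyping")])
conn.execute("INSERT INTO feature VALUES (1, 'CPSCT038', 1)")
conn.execute("INSERT INTO genotype VALUES (1, 'CPSCT038_190|192', '190:192')")
conn.execute("INSERT INTO feature_genotype VALUES (1, 1)")
conn.execute("INSERT INTO stock VALUES (1, 'Gala-001')")  # hypothetical pairing
conn.execute("INSERT INTO nd_experiment VALUES (1, 2)")
conn.execute("INSERT INTO nd_experiment_stock VALUES (1, 1)")
conn.execute("INSERT INTO nd_experiment_genotype VALUES (1, 1)")

# Walk stock -> experiment -> genotype -> marker feature.
calls = conn.execute("""
SELECT s.uniquename, f.uniquename, ft.name, g.description
FROM nd_experiment_genotype eg
JOIN genotype g ON g.genotype_id = eg.genotype_id
JOIN feature_genotype fg ON fg.genotype_id = g.genotype_id
JOIN feature f ON f.feature_id = fg.feature_id
JOIN cvterm ft ON ft.cvterm_id = f.type_id
JOIN nd_experiment_stock es ON es.nd_experiment_id = eg.nd_experiment_id
JOIN stock s ON s.stock_id = es.stock_id
""").fetchall()

# Assumed convention: the description holds allele sizes separated by ":".
alleles = [(stock, marker, desc.split(":")) for stock, marker, _type, desc in calls]
print(calls, alleles)
```

Since the marker is an ordinary row in feature, anything already attached to that feature (map positions, genome locations) is reachable from the genotype without extra tables.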

Page 11

Relationship between genotype and phenotype (haplotype and haplotype effect)

Tables and columns shown in the diagram:

• nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
• genotype: genotype_id, name, uniquename, description
• feature: feature_id, name, uniquename, type_id, organism_id, residues
• phenotype: phenotype_id, uniquename, value, attr_id
• phenstatement: phenstatement_id, type_id, genotype_id, phenotype_id, environment, pub
• project
• stock
• map

Linking tables:

• nd_experiment_genotype (NE_genotype)
• nd_experiment_phenotype (NE_phenotype)
• feature_genotype

Example:

• genotype: uniquename: MA_H3|H4b, description: H3|H4b
• feature: uniquename: Ma, type: MTL
• phenotype: attr_id: crisp, value: 2.2

Germplasm with the H3|H4b alleles of the MA locus has a value of 2.2 for crisp.
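
The haplotype effect statement above can be reproduced with a small sketch in Python's sqlite3 module. The tables are simplified stand-ins (the environment and pub columns of phenstatement are left out), and the term haplotype_effect, the phenotype uniquename, the ID values and the helper name haplotype_effects are hypothetical. The locus, haplotype, trait and value are the slide's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT,
                      type_id INTEGER);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT,
                       description TEXT);
CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT,
                        value TEXT, attr_id INTEGER);
CREATE TABLE phenstatement (phenstatement_id INTEGER PRIMARY KEY,
                            type_id INTEGER, genotype_id INTEGER,
                            phenotype_id INTEGER);
""")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "MTL"), (2, "crisp"),
                  (3, "haplotype_effect")])   # hypothetical statement type
conn.execute("INSERT INTO feature VALUES (1, 'Ma', 1)")
conn.execute("INSERT INTO genotype VALUES (1, 'MA_H3|H4b', 'H3|H4b')")
conn.execute("INSERT INTO feature_genotype VALUES (1, 1)")
conn.execute("INSERT INTO phenotype VALUES (1, 'demo_crisp_effect', '2.2', 2)")
# phenstatement ties one genotype to one phenotype.
conn.execute("INSERT INTO phenstatement VALUES (1, 3, 1, 1)")


def haplotype_effects(locus):
    """List (locus, haplotype, trait, value) statements for one locus."""
    rows = conn.execute("""
        SELECT f.uniquename, g.description, a.name, ph.value
        FROM phenstatement ps
        JOIN genotype g ON g.genotype_id = ps.genotype_id
        JOIN phenotype ph ON ph.phenotype_id = ps.phenotype_id
        JOIN cvterm a ON a.cvterm_id = ph.attr_id
        JOIN feature_genotype fg ON fg.genotype_id = g.genotype_id
        JOIN feature f ON f.feature_id = fg.feature_id
        WHERE f.uniquename = ?""", (locus,)).fetchall()
    return [(loc, hap, trait, float(value)) for loc, hap, trait, value in rows]


effects = haplotype_effects("Ma")
print(effects)
```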

Page 12

Conclusion

• The flexible, generic design of Chado enables us to store and integrate complex biological data from widely different projects and species.
• The ontology-driven design makes adding new data types relatively easy.
• Performance issues are mostly resolved by the use of materialized views.
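
The materialized-view point in the conclusion can be illustrated with a minimal sketch: a multi-table join is run once and its result stored in a flat table, so that later reads avoid the joins. SQLite, used here through Python's sqlite3 module, has no materialized views, so the sketch emulates one with CREATE TABLE ... AS SELECT (PostgreSQL offers CREATE MATERIALIZED VIEW for the same purpose). The table stock_trait_mview, the function names and the value 3.0 for Sonya are assumptions, not part of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, value TEXT,
                        attr_id INTEGER);
CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER,
                                      phenotype_id INTEGER);
INSERT INTO cvterm VALUES (1, 'crisp');
INSERT INTO stock VALUES (1, 'Gala'), (2, 'Sonya');
INSERT INTO phenotype VALUES (1, '2.2', 1);
INSERT INTO nd_experiment_stock VALUES (1, 1);
INSERT INTO nd_experiment_phenotype VALUES (1, 1);
""")

# The expensive join whose result is cached in a flat table.
MVIEW_QUERY = """
SELECT s.uniquename AS stock, a.name AS trait, ph.value AS value
FROM nd_experiment_stock es
JOIN stock s ON s.stock_id = es.stock_id
JOIN nd_experiment_phenotype ep ON ep.nd_experiment_id = es.nd_experiment_id
JOIN phenotype ph ON ph.phenotype_id = ep.phenotype_id
JOIN cvterm a ON a.cvterm_id = ph.attr_id
"""


def refresh_mview(conn):
    """Rebuild the emulated materialized view from the base tables."""
    conn.execute("DROP TABLE IF EXISTS stock_trait_mview")
    conn.execute("CREATE TABLE stock_trait_mview AS " + MVIEW_QUERY)
    conn.execute("CREATE INDEX stock_trait_mview_trait ON stock_trait_mview (trait)")


def mview_count(conn):
    return conn.execute("SELECT COUNT(*) FROM stock_trait_mview").fetchone()[0]


refresh_mview(conn)

# New data lands in the base tables only; the cached table is now stale.
conn.executescript("""
INSERT INTO phenotype VALUES (2, '3.0', 1);
INSERT INTO nd_experiment_stock VALUES (2, 2);
INSERT INTO nd_experiment_phenotype VALUES (2, 2);
""")
stale_count = mview_count(conn)

refresh_mview(conn)
fresh_count = mview_count(conn)
print(stale_count, fresh_count)
```

The trade-off is visible in the two counts: reads from the flat table are simple, but they only reflect the base tables as of the last refresh.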

Page 13

Acknowledgement

Natural diversity module working group

Naama Menda, Seth Redmond, Robert M. Buels, Maren Friesen, Yuri Bendana, Lacey-Anne Sanderson, Hilmar Lapp, Taein Lee, Bob MacCallum, Kirstin E. Bett, Scott Cain, Dave Clements, Lukas A. Mueller and Dorrie Main

Main Lab team

Taein Lee, Stephen Ficklin, Chun-Huai Cheng, Ping Zheng, Anna Blenda, Sushan Ru, Dorrie Main, Jing Yu

All Project CoPIs (tfGDR, RosBreed and CottonGen)

Funding Sources

USDA NIFA SCRI, NSF Plant Genome Program, USDA-ARS, Washington Tree Fruit Research Commission, Cotton Incorporated, Washington State University, Clemson University, University of Florida, Boyce Thompson Institute, North Carolina State University