church nhgri 2012
TRANSCRIPT
![Page 1: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/1.jpg)
@deannachurch
Deanna M. Church, NCBI
The Evolution of Genome Data
![Page 2: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/2.jpg)
Collins FS et al., 1998
Throughput: 500 Mb/year
Cost: < $0.25 per base
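Taken together, the two targets put a ceiling on annual sequencing cost at the goal throughput. This is our own arithmetic on the slide's numbers, not a figure from the slide:

$$
500\ \text{Mb/year} \times \$0.25/\text{base} = \left(5 \times 10^{8}\ \text{bases/year}\right)\left(\$0.25/\text{base}\right) = \$1.25 \times 10^{8}\ \text{per year}
$$

Because the cost target is strict ("less than 0.25 US dollars per base"), meeting both targets means spending under 125 million US dollars per year on sequencing.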
Variation: 100,000 SNPs mapped
![Page 3: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/3.jpg)
Figure: Twenty Two Years of Growth: NCBI Data and User Services, 1989 to 2011. Axes: GenBank base pairs (millions) and average users per weekday.
Services marked along the timeline:
BLAST, Entrez, GenBank at NCBI, dbEST
3D Structure, Network Entrez
WWW, dbSTS
BankIt, Genomes, Taxonomy
OMIM, GeneMap, Cn3D, UniGene
PubMed, PSI-BLAST, VAST, ePCR
Microbial Genomes, PHI-BLAST, CGAP
Human Genome, LinkOut, LocusLink, RefSeq, dbSNP
PubMed Central, BLINK, MapViewer, GEO, GeneRIFs
WGS, HLA Haplotypes, Human Genome-TPA
dbMHC, BookShelf, Human Genome-Transcripts Alignments
Entrez Genes, Mouse Composite Genome, Gnomon
PubChem, Trace Archive, CCDS, Cancer Chromosomes, Environmental Samples
Public Access, Influenza Seqs., GenSAT, GeneTests
Genome-Wide Association Studies, dbGaP, Entrez Portal
Seq Read Archive, UniSTS, RefSeqGene, Genome Reference Consortium
Discovery Initiative, Entrez Sensors, Primer BLAST
Peptidome, BioSystems, Flu H1N1
dbVar, Epigenomics, MyNCBI, 1000 Genomes Project
ClinVar, GTR, Genome Remapping Service, PubMed Health, CloneDB, Genome Decoration Page
![Page 4: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/4.jpg)
Steve Sherry, NCBI
Figure: NCBI dbSNP database growth, human variations (dbSNP build 135, November 2011), 1999 to 2011.
Left panel: millions of rs-ids (non-redundant annotations), split into SNP, STR & indel, and ambiguous mapping.
Right panel: millions of submissions by project (1000 Genomes, HapMap, TSC, other projects).
![Page 5: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/5.jpg)
Kidd et al., 2007: APOBEC cluster
Black: deletion; white: insertion
![Page 6: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/6.jpg)
http://www.ncbi.nlm.nih.gov/dbvar
![Page 7: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/7.jpg)
![Page 8: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/8.jpg)
Church et al., 2011 PLoS
http://genomereference.org
![Page 9: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/9.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
GRC Beginnings
![Page 10: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/10.jpg)
Build sequence contigs based on the contigs defined in the TPF (tiling path file):
Check for orientation consistencies
Select switch points
Instantiate sequence for further analysis
Switch point
Consensus sequence
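The "instantiate sequence" step can be pictured as stitching together the part of each tiling-path component that lies between its switch points, after putting every component into chromosome orientation. The sketch below assumes orientations and switch points have already been chosen and validated (the consistency checks on the slide are not shown). The `Component` record and its field names are hypothetical, not the GRC's actual TPF layout.

```python
from dataclasses import dataclass

# Hypothetical TPF-like component record; field names are illustrative,
# not the GRC's actual file format.
@dataclass
class Component:
    accession: str    # component accession.version (hypothetical here)
    sequence: str     # component sequence as deposited
    orientation: str  # "+" or "-" relative to the chromosome
    start: int        # 1-based first base used, in chromosome orientation
    end: int          # 1-based last base used (inclusive), i.e. the switch point

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def orient(seq: str, orientation: str) -> str:
    """Return seq in chromosome orientation (reverse complement for '-')."""
    if orientation == "+":
        return seq
    if orientation == "-":
        return seq.translate(_COMPLEMENT)[::-1]
    raise ValueError(f"unknown orientation {orientation!r}")


def build_consensus(components: list) -> str:
    """Join each component's slice between switch points into one sequence."""
    pieces = []
    for comp in components:
        if not 1 <= comp.start <= comp.end <= len(comp.sequence):
            raise ValueError(f"switch points out of range for {comp.accession}")
        oriented = orient(comp.sequence, comp.orientation)
        pieces.append(oriented[comp.start - 1 : comp.end])
    return "".join(pieces)


# Two overlapping toy components; the second was deposited on the minus strand.
a = Component("COMP_A", "AAACCCGG", "+", 1, 6)  # contributes AAACCC
b = Component("COMP_B", "AAACCGG", "-", 3, 7)   # oriented CCGGTTT, contributes GGTTT
print(build_consensus([a, b]))  # AAACCCGGTTT
```

Choosing the switch point inside the overlap means each base of the consensus comes from exactly one component, which is what makes the result traceable back to a single clone.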
![Page 11: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/11.jpg)
![Page 12: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/12.jpg)
http://genomereference.org
![Page 13: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/13.jpg)
![Page 14: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/14.jpg)
Community Input
![Page 15: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/15.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
Centralized Data
![Page 16: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/16.jpg)
Large-Scale Variation Complicates Genome Assembly
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
![Page 17: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/17.jpg)
NCBI36 (hg18)
UGT2B17 Region
![Page 18: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/18.jpg)
NCBI36 NC_000004.10 (chr4) tiling path: AC074378.4, AC079749.5, AC134921.2, AC147055.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7
Xue Y et al., 2008
Genes: TMPRSS11E, TMPRSS11E2
GRCh37 NC_000004.11 (chr4) tiling path: AC074378.4, AC079749.5, AC134921.1, AC147055.2, AC093720.2, AC021146.7
Gene: TMPRSS11E
GRCh37 NT_167250.1 (UGT2B17 alternate locus): AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7
Gene: TMPRSS11E2
UGT2B17 Region
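Comparing the component lists makes the change between assemblies concrete. The sketch uses the accessions exactly as they appear on the slide; list order follows the transcript and is not asserted to be tiling order. It reports which components the GRCh37 chr4 path and the alternate locus share, and which NCBI36 chr4 components GRCh37 moved off chr4 and onto the alternate locus.

```python
# Component accessions as they appear on the slide (UGT2B17 region).
ncbi36_chr4 = ["AC074378.4", "AC079749.5", "AC134921.2", "AC147055.2",
               "AC140484.1", "AC019173.4", "AC093720.2", "AC021146.7"]
grch37_chr4 = ["AC074378.4", "AC079749.5", "AC134921.1", "AC147055.2",
               "AC093720.2", "AC021146.7"]
grch37_alt = ["AC074378.4", "AC140484.1", "AC019173.4", "AC226496.2",
              "AC021146.7"]  # NT_167250.1


def compare_paths(primary: list, alternate: list) -> dict:
    """Split two component lists into shared and path-specific accessions."""
    shared = set(primary) & set(alternate)
    return {
        "shared": sorted(shared),
        "primary_only": sorted(set(primary) - shared),
        "alternate_only": sorted(set(alternate) - shared),
    }


result = compare_paths(grch37_chr4, grch37_alt)
# NCBI36 components that appear on the GRCh37 alternate locus but not on chr4.
moved = sorted((set(ncbi36_chr4) & set(grch37_alt)) - set(grch37_chr4))

for key, accessions in result.items():
    print(f"{key}: {', '.join(accessions)}")
print(f"moved to alternate locus: {', '.join(moved)}")
```

The `moved` set (AC019173.4 and AC140484.1) is the haplotype-2 sequence that NCBI36 had mixed into its single chr4 path.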
![Page 19: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/19.jpg)
GRCh37 (hg19)
http://genomereference.org
7 alternate haplotypes at the MHC
Alternate loci released as: FASTA, AGP, alignment to chromosome
Regions with alternate loci: UGT2B17, MHC, MAPT
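AGP files describe how components and gaps tile an object such as a chromosome or alternate locus. Below is a minimal reader for the nine tab-delimited columns of AGP 2.0, with two basic sanity checks: parts must be contiguous, and each object span must equal its component span or gap length. It is a sketch, not a full validator. The scaffold name, component IDs and coordinates in the example are invented.

```python
from dataclasses import dataclass

GAP_TYPES = {"N", "U"}  # AGP 2.0 gap component types


@dataclass
class AgpLine:
    obj: str             # column 1: object (e.g. a chromosome or scaffold)
    obj_beg: int         # column 2
    obj_end: int         # column 3
    part_number: int     # column 4
    component_type: str  # column 5: e.g. F (finished), W (WGS), N/U (gap)
    rest: tuple          # columns 6-9; meaning depends on component_type


def parse_agp(text: str) -> list:
    """Parse tab-delimited AGP lines, skipping blank and '#' comment lines."""
    records = []
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        cols = raw.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {raw!r}")
        records.append(AgpLine(cols[0], int(cols[1]), int(cols[2]),
                               int(cols[3]), cols[4], tuple(cols[5:])))
    return records


def check_agp(records: list) -> None:
    """Check that objects are tiled contiguously and spans match lengths."""
    last_end = {}
    for rec in records:
        if rec.obj_beg != last_end.get(rec.obj, 0) + 1:
            raise ValueError(f"{rec.obj} part {rec.part_number}: not contiguous")
        span = rec.obj_end - rec.obj_beg + 1
        if rec.component_type in GAP_TYPES:
            expected = int(rec.rest[0])                         # gap length
        else:
            expected = int(rec.rest[2]) - int(rec.rest[1]) + 1  # component span
        if span != expected:
            raise ValueError(f"{rec.obj} part {rec.part_number}: length mismatch")
        last_end[rec.obj] = rec.obj_end


# Hypothetical AGP for a toy scaffold; identifiers and coordinates are invented.
agp_text = "\n".join([
    "# toy example",
    "scaffold_1\t1\t1000\t1\tF\tCOMP1.1\t1\t1000\t+",
    "scaffold_1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends",
    "scaffold_1\t1101\t1600\t3\tF\tCOMP2.1\t201\t700\t-",
])
records = parse_agp(agp_text)
check_agp(records)  # raises ValueError if the layout is inconsistent
print(len(records), "lines OK")
```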
![Page 20: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/20.jpg)
![Page 21: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/21.jpg)
Assembly (e.g. GRCh37)
Primary Assembly
Non-nuclear assembly unit (e.g. MT)
Alternate loci: ALT 1 to ALT 9
PAR
Genomic regions: MHC, UGT2B17, MAPT
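The assembly model above can be sketched as a small data structure: an assembly holds named units (primary, non-nuclear, alternate), and each genomic region points at the alternate units that carry other haplotypes for it. The class names, the role labels and the choice of which ALT units hold the seven MHC haplotypes are all hypothetical, used only to illustrate the shape of the model rather than NCBI's schema.

```python
from dataclasses import dataclass, field


@dataclass
class AssemblyUnit:
    name: str  # e.g. "Primary Assembly", "ALT 1", "MT"
    role: str  # illustrative labels: "primary", "alternate", "non-nuclear"


@dataclass
class Assembly:
    name: str
    units: dict = field(default_factory=dict)    # unit name -> AssemblyUnit
    regions: dict = field(default_factory=dict)  # region name -> [unit names]

    def add_unit(self, name: str, role: str) -> None:
        self.units[name] = AssemblyUnit(name, role)

    def add_region(self, region: str, alt_units: list) -> None:
        missing = [u for u in alt_units if u not in self.units]
        if missing:
            raise KeyError(f"unknown units for {region}: {missing}")
        self.regions[region] = list(alt_units)

    def haploid_units(self) -> list:
        """Units a tool expecting one path per chromosome would use."""
        return [u.name for u in self.units.values() if u.role != "alternate"]


grch37 = Assembly("GRCh37")
grch37.add_unit("Primary Assembly", "primary")
grch37.add_unit("MT", "non-nuclear")
for i in range(1, 10):
    grch37.add_unit(f"ALT {i}", "alternate")
# Assumption for illustration only: which ALT units hold the 7 MHC haplotypes.
grch37.add_region("MHC", [f"ALT {i}" for i in range(1, 8)])

print(grch37.haploid_units())      # ['Primary Assembly', 'MT']
print(len(grch37.regions["MHC"]))  # 7
```

Keeping alternates in separate units means software that assumes one sequence per chromosome can simply skip them, while variation-aware tools can follow the region links.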
![Page 22: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/22.jpg)
Richa Agarwala
MHC Alternate locus
Alignment to chr6
![Page 23: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/23.jpg)
![Page 24: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/24.jpg)
Oh No! Not a new version of the human genome!
http://genomereference.org
![Page 25: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/25.jpg)
![Page 26: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/26.jpg)
Assembly (e.g. GRCh37.p5)
Primary Assembly
Non-nuclear assembly unit (e.g. MT)
Alternate loci: ALT 1 to ALT 9
PAR
…
Genomic regions: MHC, UGT2B17, MAPT
Patches
Genomic regions: ABO, SMA, PECAM1
![Page 27: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/27.jpg)
TBC1D3C TBC1D3
TBC1D3C
TBC1D3H
Myo19 region (17q21)
![Page 28: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/28.jpg)
70 fix patches: the chromosome sequence will be updated in GRCh38
71 novel patches: additional sequence added
(adds >1 Mb of novel sequence to the assembly)
(adds >800 kb of novel sequence to the assembly)
Releasing patches quarterly
![Page 29: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/29.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
Centralized Data
Updated Assembly Model
Genome in INSDC Database
Genome not in INSDC Database
![Page 30: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/30.jpg)
GenBank
Data Archives
Data in a common format
Data in a single location (and mirrored)
Most quality checked prior to deposition
Robust data tracking mechanism (accession.version)
Data owned by submitter
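The accession.version mechanism is easy to use programmatically: split on the final dot and compare versions as integers, never as strings. A minimal sketch follows. The regular expression is a loose assumption that covers the accessions on these slides (with or without a RefSeq-style underscore), not a full accession grammar, and NC_000004.9 is a hypothetical earlier version used only to show the string-comparison trap.

```python
import re

# Accession (letters, optional underscore, digits) followed by .version
_ACC_VER = re.compile(r"^([A-Z]+_?\d+)\.(\d+)$")


def split_accession(accver: str) -> tuple:
    """Split 'FP565796.3' into ('FP565796', 3)."""
    match = _ACC_VER.match(accver)
    if match is None:
        raise ValueError(f"not an accession.version: {accver!r}")
    return match.group(1), int(match.group(2))


def is_newer(a: str, b: str) -> bool:
    """True if a and b are the same accession and a has the higher version."""
    acc_a, ver_a = split_accession(a)
    acc_b, ver_b = split_accession(b)
    if acc_a != acc_b:
        raise ValueError(f"different accessions: {acc_a} vs {acc_b}")
    return ver_a > ver_b  # integers: as strings, "10" would sort before "9"


print(split_accession("FP565796.3"))             # ('FP565796', 3)
print(is_newer("NC_000004.11", "NC_000004.10"))  # True
print(is_newer("NC_000004.10", "NC_000004.9"))   # True
```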
![Page 31: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/31.jpg)
Data tracking
ABC14-1065514J1   Gaps   Phase   Length   Date
FP565796.1   1   1   21-Oct-2009
FP565796.2   1   0   14-Oct-2010
FP565796.3   3   0   07-Nov-2010
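Because every update gets a new version, a record's history can be checked and the current version picked out mechanically. This sketch uses the accession and date columns from the table above. It assumes dates are in the DD-Mon-YYYY form shown and that higher versions should never predate lower ones.

```python
from datetime import datetime

# Version history of one record, taken from the data-tracking slide.
history = [
    ("FP565796.1", "21-Oct-2009"),
    ("FP565796.2", "14-Oct-2010"),
    ("FP565796.3", "07-Nov-2010"),
]


def latest_version(records: list) -> tuple:
    """Return (accession.version, date) of the highest version in records."""
    parsed = []
    for accver, date_text in records:
        accession, version = accver.rsplit(".", 1)
        date = datetime.strptime(date_text, "%d-%b-%Y").date()
        parsed.append((accession, int(version), date))
    if len({acc for acc, _, _ in parsed}) != 1:
        raise ValueError("records mix different accessions")
    parsed.sort(key=lambda rec: rec[1])
    dates = [date for _, _, date in parsed]
    if dates != sorted(dates):
        raise ValueError("version order disagrees with release dates")
    accession, version, date = parsed[-1]
    return f"{accession}.{version}", date


print(latest_version(history))  # ('FP565796.3', datetime.date(2010, 11, 7))
```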
![Page 32: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/32.jpg)
Mouse chrX: 34,800,000-34,890,000
NC_000086.123456 CM001013.17 2
![Page 33: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/33.jpg)
Mouse chrX: 35,000,000-36,000,000
X
MGSCv3 MGSCv36
![Page 34: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/34.jpg)
hg19 / GRCh37
mm8 / MGSCv37
NCBIM37
danRer5 / Zv7
What’s in a name?
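One way to cope with multiple names for one assembly is to normalize every name to a single canonical form before comparing coordinates. The sketch below uses only pairings stated elsewhere in these slides (GRCh37/hg19, NCBI36/hg18) plus danRer5/Zv7 from this slide. It is a simplification: names can hide real differences between releases (for example in non-nuclear sequence), so resolving to assembly accessions is the more robust approach.

```python
# Name pairs taken from the slides; each canonical name maps to its aliases.
ASSEMBLY_SYNONYMS = {
    "GRCh37": {"GRCh37", "hg19"},
    "NCBI36": {"NCBI36", "hg18"},
    "Zv7": {"Zv7", "danRer5"},
}

# Case-insensitive alias -> canonical name lookup.
_ALIAS_TO_CANONICAL = {
    alias.lower(): canonical
    for canonical, aliases in ASSEMBLY_SYNONYMS.items()
    for alias in aliases
}


def canonical_assembly(name: str) -> str:
    try:
        return _ALIAS_TO_CANONICAL[name.strip().lower()]
    except KeyError:
        raise KeyError(f"unknown assembly name: {name!r}") from None


def same_assembly(a: str, b: str) -> bool:
    return canonical_assembly(a) == canonical_assembly(b)


print(same_assembly("hg19", "GRCh37"))  # True
print(same_assembly("hg18", "hg19"))    # False
```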
![Page 35: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/35.jpg)
By any other name…
chr21:8,913,216-9,246,964
![Page 36: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/36.jpg)
Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
By any other name…
![Page 37: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/37.jpg)
http://www.ncbi.nlm.nih.gov/genome/assembly
GRCh37 / hg19
![Page 38: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/38.jpg)
![Page 39: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/39.jpg)
Assembly (e.g. GRCh37.p5): GCA_000001405.6 / GCF_000001405.17
Primary Assembly: GCA_000001305.1 / GCF_000001305.13
ALT 1: GCA_000001315.1 / GCF_000001315.1
ALT 2: GCA_000001325.1 / GCF_000001325.2
ALT 3: GCA_000001335.1 / GCF_000001335.1
ALT 4: GCA_000001345.1 / GCF_000001345.1
ALT 5: GCA_000001355.1 / GCF_000001355.1
ALT 6: GCA_000001365.1 / GCF_000001365.2
ALT 7: GCA_000001375.1 / GCF_000001375.1
ALT 8: GCA_000001385.1 / GCF_000001385.1
ALT 9: GCA_000001395.1 / GCF_000001395.1
Patches: GCA_000005045.5 / GCF_000005045.4
Non-nuclear assembly unit (e.g. MT): GCA_000006015.1 / GCF_000006015.1
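Assembly accessions follow the same accession.version idea: a GCA_ prefix marks the GenBank (submitter) assembly and GCF_ the RefSeq copy. In every pair listed above, the two accessions share the same numeric part while their versions advance independently. The sketch parses these accessions and checks that pairing. The nine-digit pattern is an assumption based on the examples shown.

```python
import re

_ASM_ACC = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def parse_assembly_accession(acc: str) -> tuple:
    """Split e.g. 'GCA_000001405.6' into ('GenBank', '000001405', 6)."""
    match = _ASM_ACC.match(acc)
    if match is None:
        raise ValueError(f"not an assembly accession: {acc!r}")
    prefix, number, version = match.groups()
    source = "GenBank" if prefix == "GCA" else "RefSeq"
    return source, number, int(version)


def is_pair(gca: str, gcf: str) -> bool:
    """True if a GCA and a GCF accession share the same numeric part."""
    src_a, num_a, _ = parse_assembly_accession(gca)
    src_f, num_f, _ = parse_assembly_accession(gcf)
    return src_a == "GenBank" and src_f == "RefSeq" and num_a == num_f


# Pairs from the slide; versions of the two halves advance independently.
pairs = [
    ("GCA_000001405.6", "GCF_000001405.17"),  # GRCh37.p5
    ("GCA_000001305.1", "GCF_000001305.13"),  # Primary Assembly
    ("GCA_000005045.5", "GCF_000005045.4"),   # Patches
]
for gca, gcf in pairs:
    print(gca, gcf, is_pair(gca, gcf))
```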
![Page 40: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/40.jpg)
GenBank vs RefSeq

| | GenBank | RefSeq |
|---|---|---|
| Ownership | Submitter owned | RefSeq owned |
| Redundancy | Redundant | Non-redundant |
| Updates | Updated rarely | Curated |
| INSDC | INSDC | Not INSDC |
| BRCA1 records | 83 genomic, 31 mRNA, 27 protein | 3 genomic, 5 mRNA, 1 RNA, 5 protein |
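In practice the two collections can usually be told apart by accession shape. RefSeq accessions carry a two-letter prefix and an underscore (NM_000336, NC_000004, NT_167250). INSDC accessions such as AC074378 or FP565796 have no underscore. The sketch below is a heuristic based on that observation, with deliberately loose patterns, not an authoritative accession validator.

```python
import re

# RefSeq: two letters, an underscore, digits (e.g. NM_000336, NC_000004).
_REFSEQ = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")
# INSDC (GenBank/ENA/DDBJ): letters then digits, no underscore
# (e.g. AC074378, FP565796). Deliberately loose.
_INSDC = re.compile(r"^[A-Z]{1,6}\d{5,}(\.\d+)?$")


def accession_source(acc: str) -> str:
    if _REFSEQ.match(acc):
        return "RefSeq"
    if _INSDC.match(acc):
        return "INSDC"
    return "unknown"


for acc in ["NM_000336.2", "NT_167250.1", "AC074378.4", "FP565796.1"]:
    print(acc, accession_source(acc))
```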
![Page 41: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/41.jpg)
![Page 42: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/42.jpg)
RefSeq for Assemblies
Typical assembly edits
Addition of non-nuclear (e.g. MT) assembly units
Removal of contamination
Drop unlocalized/unplaced scaffolds
Mask contamination that is placed on chromosome
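Read as a rule, the contamination edits differ by placement: a contaminated unlocalized or unplaced scaffold is dropped, while contamination on a chromosome is masked with N so that chromosome coordinates do not shift. That reading of the slide is our interpretation. The function name, placement labels and toy data below are hypothetical.

```python
def apply_contamination_edits(sequences: dict, placement: dict,
                              contamination: dict) -> dict:
    """
    sequences:     name -> sequence
    placement:     name -> "chromosome", "unlocalized" or "unplaced"
    contamination: name -> list of (start, end), 1-based inclusive
    Returns the edited set of sequences.
    """
    edited = {}
    for name, seq in sequences.items():
        hits = contamination.get(name, [])
        if hits and placement[name] in ("unlocalized", "unplaced"):
            continue  # drop contaminated unlocalized/unplaced scaffolds
        bases = list(seq)
        for start, end in hits:
            if not 1 <= start <= end <= len(seq):
                raise ValueError(f"bad interval {start}-{end} on {name}")
            bases[start - 1:end] = "N" * (end - start + 1)  # mask, keep length
        edited[name] = "".join(bases)
    return edited


# Toy data; names and intervals are hypothetical.
seqs = {"chr1": "ACGTACGTAC", "scaf_un1": "GGGG", "scaf_un2": "TTTT"}
where = {"chr1": "chromosome", "scaf_un1": "unplaced", "scaf_un2": "unplaced"}
contam = {"chr1": [(3, 5)], "scaf_un1": [(1, 4)]}
print(apply_contamination_edits(seqs, where, contam))
# {'chr1': 'ACNNNCGTAC', 'scaf_un2': 'TTTT'}
```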
![Page 43: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/43.jpg)
http://www.ncbi.nlm.nih.gov/genome
![Page 44: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/44.jpg)
Understanding relationships between assemblies using alignments
First Pass
Second Pass
Reciprocal best hit
Non-reciprocal, duplicative hits
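The two passes can be sketched on a list of scored hits between sequences of assembly A and assembly B. The first pass keeps pairs that are each other's best hit. The second pass collects the remaining hits from A sequences that found no reciprocal partner, which is where duplicated regions show up. This is a toy illustration of reciprocal-best-hit logic over invented IDs and scores, not NCBI's remapping pipeline.

```python
def best_hits(hits: list) -> dict:
    """hits: (query, target, score) tuples; return query -> best target."""
    best = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def two_pass(a_to_b: list, b_to_a: list) -> tuple:
    forward = best_hits(a_to_b)
    reverse = best_hits(b_to_a)
    # First pass: a's best hit is b, and b's best hit is a.
    rbh = {(a, b) for a, b in forward.items() if reverse.get(b) == a}
    matched = {a for a, _ in rbh}
    # Second pass: hits from sequences left without a reciprocal partner.
    second = [hit for hit in a_to_b if hit[0] not in matched]
    return rbh, second


# Toy hits; a2 is a duplicate that also aligns best to b1.
a_to_b = [("a1", "b1", 100), ("a1", "b2", 40), ("a2", "b1", 90), ("a3", "b3", 50)]
b_to_a = [("b1", "a1", 100), ("b1", "a2", 90), ("b2", "a1", 40), ("b3", "a3", 50)]
rbh, second = two_pass(a_to_b, b_to_a)
print(sorted(rbh))  # [('a1', 'b1'), ('a3', 'b3')]
print(second)       # [('a2', 'b1', 90)]
```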
![Page 45: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/45.jpg)
![Page 46: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/46.jpg)
No second pass alignments in GRCh37.p5
NCBI36
GRCh37.p5
http://www.ncbi.nlm.nih.gov/tools/gbench/
![Page 47: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/47.jpg)
Genome Data is MORE than just the Genome
![Page 48: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/48.jpg)
Genome Data is MORE than just the Genome
ATGCGTGCAAAATGCAGTGAGT
ATGCGTGCAAAATGCAGTGAGT
ATGCGTGCAAAATGCAGTGAGT
ATGCGTGCAAAATGCAGTGAGT
NM_000336.2:c.800C>T
![Page 49: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/49.jpg)
ATGCGTGCAAAATGCAGTGAGT
ATGCGTGCAAAATGCAGTGAGT
ATGCGTGCAAAATGCAGTGAGT
ATGCGTGCAAAATGCAGTGAGT
NM_000336.2:c.800C>T
NC_000001.10:g.(?_20700513)_(21062644_?)del
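Both descriptions are anchored to a versioned reference sequence, so the coordinates only mean something together with that accession.version. The sketch parses just the simplest HGVS form, a single-base substitution like the first example. It deliberately rejects anything else, including the deletion with uncertain breakpoints. A real application should use a complete HGVS parser rather than this regular expression.

```python
import re

# Only the simplest HGVS form: <RefSeq acc.version>:<c|g>.<pos><ref>><alt>
_HGVS_SUB = re.compile(
    r"^(?P<ref_seq>[A-Z]{2}_\d+\.\d+):(?P<kind>[cg])\."
    r"(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_simple_substitution(hgvs: str) -> tuple:
    match = _HGVS_SUB.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple substitution: {hgvs!r}")
    return (match["ref_seq"], match["kind"], int(match["pos"]),
            match["ref"], match["alt"])


print(parse_simple_substitution("NM_000336.2:c.800C>T"))
# ('NM_000336.2', 'c', 800, 'C', 'T')
```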
![Page 50: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/50.jpg)
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
![Page 51: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/51.jpg)
http://www.ncbi.nlm.nih.gov/education/
http://www.youtube.com/NCBINLM @NCBI http://www.facebook.com/ncbi.nlm
![Page 52: Church nhgri 2012](https://reader036.vdocuments.net/reader036/viewer/2022062405/5550449eb4c9058f768b4cad/html5/thumbnails/52.jpg)
Thanks!
For slides: Francoise Thibaud-Nissen, Evan Eichler, Steve Sherry
The Genome Reference Consortium: The Genome Center at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute, The National Center for Biotechnology Information
Church group at NCBI: Valerie Schneider, Nathan Bouk, Hsiu-Chuan Chen, Peter Meric, Victor Ananiev, Chao Chen, John Lopez, John Garner, Tim Hefferon, Cliff Clausen
NCBI