Lecture 23. Genomic Futures
Michael Schatz
April 20, 2020
JHU 600.749: Applied Comparative Genomics
Part I. Metagenomics
Your second genome?
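Are We Really Vastly Outnumbered? Revisiting the Ratio of Bacterial to Host Cells in Humans
Sender et al (2016) Cell. http://doi.org/10.1016/j.cell.2016.01.013
• Human body: ~10 trillion cells
• Human brain: ~3.3 lbs
• Microbiome: ~100 trillion cells
• Microbiome total mass: ~3.3 lbs (about the mass of the brain)
• Sender et al revise these commonly quoted figures: roughly 30 trillion human cells and a similar number of bacteria (a ratio close to 1:1), with a total bacterial mass of about 0.2 kg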
Pre-PCR: Gram Staining
Gram staining differentiates bacteria by the chemical and physical properties of their cell walls, detecting peptidoglycan, which forms a thick layer in the cell walls of Gram-positive bacteria (Gram-negative bacteria have only a thin layer).
16S rRNA
The 16S rRNA gene is a section of prokaryotic DNA found in all bacteria and archaea. This gene codes for an rRNA, and this rRNA in turn makes up part of the ribosome.
The 16S rRNA gene is commonly used to identify bacteria for several reasons. First, traditional characterization depended on phenotypic traits such as Gram-positive or Gram-negative staining and bacillus or coccus shape, and taxonomists today consider analysis of an organism's DNA more reliable than classification based solely on phenotypes. Second, researchers may want to identify or classify only the bacteria within a given environmental or medical sample, which a prokaryote-specific gene makes possible. Third, the 16S rRNA gene is relatively short (~1.5 kb), making it faster and cheaper to sequence than many other unique bacterial genes.
http://greengenes.lbl.gov/cgi-bin/JD_Tutorial/nph-16S.cgi
16S versus shotgun NGS
16S
• Fast (minutes to hours)
• Directed analysis
• Cheap per sample
• Family/genus identification
Shotgun NGS
• Slower (hours to days)
• Whole metagenome
• More expensive per sample
• Species/strain identification
• Gene presence/absence
• Variant analysis
• Eukaryotic hosts
• Can identify fungi, viruses, etc.
Kraken
Kraken: ultrafast metagenomic sequence classification using exact alignments
Wood and Salzberg (2014) Genome Biology. DOI: 10.1186/gb-2014-15-3-r46
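Kraken classifies a read by looking up each of its k-mers in a database that maps the k-mer to the lowest common ancestor (LCA) of all genomes containing it, then picking the highest-weighted root-to-leaf path through the taxonomy. The following is a minimal Python sketch of that idea, not the tool's implementation: the taxonomy (`PARENT`), reference sequences, and helper names are hypothetical, it only uses forward-strand k-mers, and it uses a tiny k for readability, whereas the original Kraken defaults to k = 31 and uses a compact on-disk database.

```python
from collections import Counter

# Hypothetical toy taxonomy: each taxon maps to its parent; the root maps to itself.
PARENT = {
    "root": "root",
    "Bacteria": "root",
    "Listeria": "Bacteria",
    "L. monocytogenes": "Listeria",
    "L. innocua": "Listeria",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
}


def lineage(taxon):
    """Return the taxa from `taxon` up to the root, inclusive."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    return "root"


def build_db(references, k):
    """Map every k-mer to the LCA of all reference taxa that contain it."""
    db = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db


def classify(read, db, k):
    """Assign a read to a taxon, or return None if no k-mer is in the database."""
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    hits = Counter(db[kmer] for kmer in kmers if kmer in db)
    if not hits:
        return None  # unclassified
    # Score of a taxon = k-mers hitting it or any ancestor, i.e. the weight of
    # its root-to-node path in the read's classification tree.
    scores = {taxon: sum(hits[t] for t in lineage(taxon)) for taxon in hits}
    best = max(scores.values())
    winners = [taxon for taxon, score in scores.items() if score == best]
    # Ties between equally heavy paths are resolved by taking their LCA.
    result = winners[0]
    for taxon in winners[1:]:
        result = lca(result, taxon)
    return result


refs = {"L. monocytogenes": "ACGTACGGTCA", "L. innocua": "ACGTACGGTTT", "E. coli": "GGGCCCTTTAAA"}
db = build_db(refs, k=5)
print(classify("ACGTACGGTC", db, k=5))  # L. monocytogenes
```

K-mers shared by both Listeria references are stored at the genus node, so a read made only of shared k-mers is reported at genus level, while a single species-specific k-mer is enough to tip the path score toward that species.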
Global Ocean Survey
MetaSUB
Geospatial Resolution of Human and Bacterial Diversity with City-Scale Metagenomics
Afshinnekoo et al (2015) Cell Systems. http://dx.doi.org/10.1016/j.cels.2015.01.001
Microbes and Human Health
“MICROBE DIET Mice fed microbes from obese people tend to gain fat. Microbes from lean people protect mice from excessive weight gain, even when animals eat a high-fat, low-fiber diet.”
Gut Microbiota from Twins Discordant for Obesity Modulate Metabolism in Mice
Ridaura et al (2013) Science. doi: 10.1126/science.1241214
Microbes and Human Health
The human microbiome: at the interface of health and disease
Cho & Blaser (2012) Nature Reviews Genetics. doi:10.1038/nrg3182
Human Microbiome Project
Structure, function and diversity of the healthy human microbiome
The Human Microbiome Project Consortium (2012) Nature. doi:10.1038/nature11234
Functional (metabolic pathway) composition tends to be more stable than taxonomic composition
Structure, function and diversity of the healthy human microbiome
The Human Microbiome Project Consortium (2012) Nature. doi:10.1038/nature11234
[Figure: stacked bar charts of Relative Abundance (0 to 100%) per sample, for sample groups B, H, and U at time points 0 to 48. Species legend: Listeria monocytogenes, Anoxybacillus flavithermus, Thermus parvatiensis, Thermus thermophilus, Geobacillus stearothermophilus, Vibrio alginolyticus, StaphEpidermidis_d101_6055 Branch, Pseudomonas fulva, StaphEpidermidis_d99_6057 Branch, Enterococcus faecium, Pseudomonas sp. URMO17WK12:I11, Firmicutes bacterium JGI 0000112−M16, Vibrio antiquarius, Pasteurella multocida, Escherichia coli, Streptococcus_2055 Branch, Streptococcus thermophilus, Enterococcus faecalis, StaphEpidermidis_d100_6056 Branch, Staphylococcus aureus, Clostridium perfringens, Geobacillus_12818 Branch, Firmicutes bacterium JGI 0000112−P22, Enterococcus sp. GMD5E, others.]
Listeria in ice cream
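The relative-abundance plots on this slide show, for each sample, what fraction of classified reads each species accounts for, with rare species pooled as "others". The lecture does not say how the figure was produced; as a hedged sketch, assuming per-species read counts from a classifier such as Kraken and a hypothetical 1% cutoff for the "others" bucket, the normalization could look like this (the counts below are invented for illustration):

```python
def relative_abundance(counts, min_percent=1.0):
    """Turn per-species read counts into percentages of the sample total.

    Species below `min_percent` (a hypothetical cutoff) are pooled into
    "others", like the "others" category in the figure legend.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    result, others = {}, 0.0
    for species, n in counts.items():
        pct = 100.0 * n / total
        if pct >= min_percent:
            result[species] = pct
        else:
            others += pct
    if others > 0:
        result["others"] = others
    return result


# Hypothetical read counts for one sample (not data from the lecture).
sample = {"Listeria monocytogenes": 900, "Escherichia coli": 95, "Vibrio alginolyticus": 5}
print(relative_abundance(sample))
# {'Listeria monocytogenes': 90.0, 'Escherichia coli': 9.5, 'others': 0.5}
```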
Amerithrax Analysis
Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation
Rasko et al (2011) PNAS. doi: 10.1073/pnas.1016657108
Diagnosing Brain Infections with NGS
Next-generation sequencing in neuropathologic diagnosis of infections of the nervous system
Salzberg et al (2016) Neurol Neuroimmunol Neuroinflamm. dx.doi.org/10.1212/NXI.0000000000000251
The Future of Metagenomics
• Applications:
  – WGS metagenomics in the clinic for anaerobic infections and high-risk patients (NICU, etc.)
  – Surveillance: bioterror agents and epidemiology
• Methods:
  – Single-cell, Hi-C, and long-read sequencing
  – Computational challenges:
    • Species-level binning of large datasets
    • Plasmid analysis (antimicrobial resistance genes)
    • Going from associations to specific mechanisms
    • Functional analysis
Part II. Genetic Privacy
What are microsatellites?
• Tandemly repeated sequence motifs
  – Motifs are 1–6 nt long
  – So far, min. 8 nt length and min. 3 tandem repeats for our analyses
• Ubiquitous in the human genome
  – >5.7 million uninterrupted microsatellites in hg19
• Extremely unstable
  – Mutation rate thought to be ~10^-3 per generation in humans
• Unique mutation mechanism
  – Replication slippage during mitosis and meiosis
• May be under neutral selection
cCTCTCTCTCTCTCTCTCTCTCTCTCa → (CT)13
tTTGTCTTGTCTTGTCTTGTCTTGTCTTGTCc → (TTGTC)6
tCAACAACAACAACAACAACAAa → (CAA)7
cCATTCATTCATTCATTa → (CATT)4
Microsatellites: Simple Sequences with Complex EvolutionEllegren (2004) Nature Reviews Genetics. doi:10.1038/nrg1348
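The detection thresholds above (1 to 6 nt motifs, at least 8 nt, at least 3 tandem copies) and the (motif)n notation can be made concrete with a naive scanner. This is a minimal sketch under those assumptions, not the method used by lobSTR or any other STR caller: it only finds perfect, uninterrupted repeats, counts full copies only, and reports each run using the rotation of the motif at which the run starts. The function names are hypothetical.

```python
def is_primitive(motif):
    """True if `motif` is not itself a repeat of a shorter unit (e.g. 'CTCT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_microsatellites(seq, min_len=8, min_copies=3, max_motif=6):
    """Report perfect tandem repeats as sorted (start, motif, copies) tuples.

    A run is kept when it has at least `min_copies` full copies of a
    1..max_motif nt motif and those full copies span at least `min_len` nt.
    """
    seq = seq.upper()  # lowercase flanking bases are treated like any other base
    hits = []
    for m in range(1, max_motif + 1):
        j = m
        while j < len(seq):
            if seq[j] != seq[j - m]:
                j += 1
                continue
            # seq[start:j] stays periodic with period m while bases keep matching.
            start = j - m
            while j < len(seq) and seq[j] == seq[j - m]:
                j += 1
            copies = (j - start) // m
            motif = seq[start:start + m]
            # Non-primitive motifs (e.g. 'CTCT') are left to their shorter period.
            if copies >= min_copies and copies * m >= min_len and is_primitive(motif):
                hits.append((start, motif, copies))
    return sorted(hits)


print(find_microsatellites("c" + "CT" * 13 + "a"))  # [(1, 'CT', 13)]
```

Skipping non-primitive motifs keeps a (CT)n run from also being reported as (CTCT)n or (CTCTCT)n when the scanner tries longer periods.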
Replication slippage
• Out-of-phase re-annealing
  – Nascent and template strands dissociate and re-anneal out of phase
• Loops repaired by mismatch repair machinery (MMR)
  – Very efficient for small loops
  – Possible strand-specific repair
• Stepwise process
  – Nascent strand gains or loses full repeat units
  – Typically single-unit mutations
• Varies by motif length, motif composition, etc.
[Figure panels: Expansion; Contraction]
Microsatellites: Simple Sequences with Complex EvolutionEllegren (2004) Nature Reviews Genetics. doi:10.1038/nrg1348
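Because slippage typically adds or removes a single repeat unit, microsatellite evolution is often approximated with a stepwise mutation model. The sketch below simulates one allele under that model, assuming the ~10^-3 per-generation rate quoted earlier, equal odds of expansion and contraction, and a floor of one copy; all three are simplifying assumptions, and real rates vary by motif length and composition as noted above.

```python
import random


def simulate_stepwise(copies, generations, mu=1e-3, rng=None):
    """Simulate one microsatellite allele under a simple stepwise mutation model.

    Each generation, with probability `mu`, slippage adds or removes exactly one
    repeat unit (equal odds, an assumption); otherwise the copy number is
    inherited unchanged. Copy number is floored at 1.
    """
    rng = rng or random.Random()
    history = [copies]
    for _ in range(generations):
        if rng.random() < mu:
            copies = max(1, copies + rng.choice((-1, 1)))
        history.append(copies)
    return history


# Over 10,000 generations at mu = 1e-3, roughly 10 single-unit steps are expected.
trace = simulate_stepwise(13, 10_000, rng=random.Random(42))
steps = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
print(f"start={trace[0]}, end={trace[-1]}, mutations={steps}")
```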
lobSTR Algorithm Overview
lobSTR: A short tandem repeat profiler for personal genomes
Gymrek et al. (2012) Genome Research. doi:10.1101/gr.135780.111
Why should we care about microsatellites?
• Polymorphism and mutation rate variation
• Disease
  – Huntington’s disease
  – Fragile X syndrome
  – Friedreich’s ataxia
• Mutations as lineage markers
  – Organogenesis/embryonic development
  – Tumor development
Phylogenetic fate mapping
Salipante (2006) PNAS. doi: 10.1073/pnas.0601265103
Surname Inference
Whose sequence reads are these?
Identifying Personal Genomes by Surname Inference
Gymrek et al (2013) Science. doi: 10.1126/science.1229566
Step 1. Profile Y-STRs from the individual’s genome.
Step 2. Search for a surname hit in online genetic genealogy databases.
http://www.ysearch.org
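Steps 1 and 2 amount to comparing a Y-STR haplotype (repeat counts at Y-chromosome STR markers) against surname-tagged records. The sketch below illustrates that idea only; it is not the procedure of Gymrek et al and does not use any genealogy site's API. It assumes a simple stepwise distance (sum of absolute repeat-count differences over shared markers) and hypothetical thresholds; the marker names are real Y-STR loci, but the profiles, surnames, and function names are hypothetical.

```python
def genetic_distance(query, record):
    """Stepwise distance between two Y-STR haplotypes.

    Sums |repeat-count difference| over the markers typed in both profiles and
    also returns how many markers were compared.
    """
    shared = query.keys() & record.keys()
    return sum(abs(query[m] - record[m]) for m in shared), len(shared)


def best_surname_hit(query, database, max_distance=1, min_shared=5):
    """Return (surname, distance, shared_markers) for the closest record, or None."""
    best = None
    for surname, record in database:
        dist, shared = genetic_distance(query, record)
        if shared < min_shared or dist > max_distance:
            continue
        # Prefer the smaller distance, then the larger number of compared markers.
        if best is None or (dist, -shared) < (best[1], -best[2]):
            best = (surname, dist, shared)
    return best


# Hypothetical profile and records; the repeat counts are made up.
query = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 13}
database = [
    ("SurnameA", {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 13}),
    ("SurnameB", {"DYS19": 16, "DYS390": 22, "DYS391": 11, "DYS392": 13, "DYS393": 13}),
]
print(best_surname_hit(query, database))  # ('SurnameA', 1, 5)
```

Because Y chromosomes are inherited along the paternal line, as surnames typically are, a close haplotype match often points to a shared surname, which is what makes this step informative.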
Step 3. Search with additional metadata to narrow down the individual.
http://www.ussearch.com
Surname Inference
It’s Craig Venter!
Identifying Personal Genomes by Surname Inference
Gymrek et al (2013) Science. doi: 10.1126/science.1229566
Possible route for identity tracing
● Tracing attacks combine metadata and surname inference to triangulate the identity of an unknown individual.
● With no information, there are roughly 300 million matching individuals in the US, equating to ~28.2 bits of entropy.
● Knowing sex reduces the entropy by 1 bit; state of residence and age reduce it to ~16 bits; successful surname inference reduces it to ~3 bits.
● US population: ~313.9 million individuals
● log2 313,900,000 = 28.226 bits
● Sex ~ 1.0 information bits
● log2 156,950,000 = 27.226 bits
The risks of big data?
Genomic Futures?
The rise of a digital immune system
Schatz & Phillippy (2012) GigaScience 1:4
Computational Research Landscape
• Avoid
  – New Illumina/PacBio base callers
  – Entirely new genome assembler from scratch
• Good
  – Alignment/assembly/analysis methods robust to errors, polyploidy, and aneuploidy
  – Use insights from long reads to improve analysis of short reads
• Best
  – Synthesis of large numbers of samples (“pan-genome assembly”) and/or multiple data types (“multi-omics”)
  – Prioritization and interpretation of variations
http://schatz-lab.org
NGM+Sniffles, Ribbon, SURVIVOR, Assemblytics, LRSim, FALCON
Also consider starting a company!