Discovering the role of CNV in populations
Joseph R. Shaw
The School of Public and Environmental Affairs, and
The Center for Genomics and Bioinformatics
A sea of CNV
(Two people differ in copy number across 0.78% of their genomes)
“The genome is a dynamic playing field on which new genes are continually arising via the molecular processes that give rise to duplication events, with their fate (pseudogene or fixation: altered dosage, new function, sub‐function) regulated by the population genetic forces that provide fuel for evolution.”
Modified from Lynch 2007
Spontaneous deletion/insertion rates
• Phylogenetic comparisons of non‐functional DNA (pseudogene loci) suggest an excess of deletions over insertions, but estimated half‐lives are extremely long.
ORGANISM                  Rate                  Size (bp)             Half‐life
                          Deletion  Insertion   Deletion  Insertion
Caenorhabditis elegans    0.034     0.019       166       151         0.25
Drosophila melanogaster   0.115     0.028       42        12          0.15
Birds                     0.043     0.007       12        4           2.31
Mammals                   0.033     0.017       5         6           6.93
• Unbiased estimates of nuclear mutations in C. elegans suggest a 15:4 excess of insertions over deletions and much higher rates.
Mutation influences gene copy number plasticity
Rates of origin and loss of duplicate genes
SPECIES                   Birth    Death   B/D
Homo sapiens              0.0049   0.081   0.0605
Mus musculus              0.0030   0.134   0.0224
Fugu rubripes             0.0043   0.189   0.0228
Caenorhabditis elegans    0.0028   0.136   0.0206
Drosophila melanogaster   0.0011   0.229   0.0048
Scaled to 1% divergence at silent sites. Lynch and Conery 2003
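The B/D column in the table above is simply the birth rate divided by the death rate. The short Python sketch below recomputes it from the Birth and Death values on the slide; the function and variable names are illustrative only and are not taken from Lynch and Conery.

```python
# Birth and death rates copied from the table above (Lynch and Conery 2003),
# scaled to 1% divergence at silent sites.
rates = {
    "Homo sapiens": (0.0049, 0.081),
    "Mus musculus": (0.0030, 0.134),
    "Fugu rubripes": (0.0043, 0.189),
    "Caenorhabditis elegans": (0.0028, 0.136),
    "Drosophila melanogaster": (0.0011, 0.229),
}


def birth_death_ratio(birth, death):
    """Return the ratio of duplicate-gene birth rate to death rate."""
    return birth / death


# Illustrative helper names; the ratio itself is the slide's B/D column.
ratios = {species: birth_death_ratio(b, d) for species, (b, d) in rates.items()}

for species, ratio in ratios.items():
    print(f"{species}: B/D = {ratio:.4f}")
```

In every species listed the ratio is well below 1, which is the point of the slide: duplicates are lost far faster than they arise.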
• For vertebrates, this results in a 2% rate of duplication per gene per 10^6 generations; the estimated half‐life is 2.5 million generations.
• The balance between gene birth and death rates is fairly constant, but genes present in redundant copies are continually changing.
Consequences of copy number plasticity
(a framework for studying environmental contributions to CNV)
• CNV are common
• CNV alter phenotype
• CNV are associated with diseases
What is the evolutionary history of CNV, and do these important structural attributes of the genome represent a source of variation upon which ecologically/environmentally dependent selective forces can act?
Challenges in applying tests of selection to CNV
• Genotypes can be hard to establish since CNV are often multi‐allelic within populations.
• Tests that rely on amino acid substitutions and linkage‐based tests are not applicable to all CNV.
• High identity duplicates are difficult to assemble and they are often underrepresented in reference genomes.
Iskow et al., 2012
A genomic model for CNV and the environment
Figure: cyclical parthenogenesis under benign and stressful conditions, with panels a and b each labeled Control and Metals.
Daphnia now has a well‐characterized genome, rich in tandem gene duplications and gene copy number variants (Science, 2011, 331:555‐561).
The Daphnia pulex sequence
Gene richness attributed to a compact genome
Structural features           Daphnia   Arthropods (6)   Worm     Mouse
                                        Mean (SE)
Genome size (Mbp)             200       318 (74)         100      3,450
Genome size adj. (Mbp)        175       250 (48)         100      2,600
No. of protein coding genes   31,000    21,017 (3309)    20,100   27,600
Gene length (bp)              2,300     3,650 (249)      3,000    32,000
CDS size (bp)                 1,360     1,433 (80)       1,300    2,140
Exons/gene                    6.6       5.2 (0.6)        6        8
Exon size (bp)                210       297 (35)         200      280
Intron size (mean bp)         170       875 (179)        290      2,800
Intr > Exon                   10%       33% (3%)         33%      85%
UTR size (bp)                 370       490 (116)        260      --
Intergenic size (bp)          4,000     9,160 (2476)     2,400    78,000
* Bold indicates values outside the mean ± 2SE of other arthropod genomes (Science, 2011, 331:555‐561)
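The bold formatting mentioned in the footnote did not survive in this transcript. As a rough aid, the Python sketch below re‑applies the stated rule (Daphnia value outside the arthropod mean ± 2SE) to the numbers in the table above. It is a re‑derivation under that rule, not a record of which cells were actually bold on the slide, and the helper names are illustrative.

```python
# (Daphnia value, arthropod mean, arthropod SE), copied from the table above.
# Percentages are entered as plain numbers (10 means 10%).
features = {
    "Genome size (Mbp)": (200, 318, 74),
    "Genome size adj. (Mbp)": (175, 250, 48),
    "No. of protein coding genes": (31000, 21017, 3309),
    "Gene length (bp)": (2300, 3650, 249),
    "CDS size (bp)": (1360, 1433, 80),
    "Exons/gene": (6.6, 5.2, 0.6),
    "Exon size (bp)": (210, 297, 35),
    "Intron size (mean bp)": (170, 875, 179),
    "Intr > Exon (%)": (10, 33, 3),
    "UTR size (bp)": (370, 490, 116),
    "Intergenic size (bp)": (4000, 9160, 2476),
}


def outside_two_se(value, mean, se):
    """True when value lies outside the interval mean ± 2*SE."""
    return abs(value - mean) > 2 * se


# Features where Daphnia falls outside the arthropod mean ± 2SE.
flagged = [
    name
    for name, (value, mean, se) in features.items()
    if outside_two_se(value, mean, se)
]

for name in flagged:
    print(name)
```

Under this rule the overall genome size is not flagged, while gene count, gene length, intron size and intergenic size are, which fits the slide's heading about a compact, gene‑rich genome.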
Nearly half the Daphnia genes have no homologs
* Of 716 highly conserved single‐copy orthologs, Daphnia is missing only two (Science, 2011, 331:555‐561)
Elevated gene count is from gene duplication
Daphnia specific genes are most duplicated
• 50% of Daphnia genes are duplicates (35% in fly)
• 20% of all genes are within tandem duplicated gene (TDG) clusters
• Daphnia counts over 1,000 TDG clusters with 3+ paralogs
(Science, 2011, 331:555‐561)
Age distribution of duplicated genes
Duplication rates are elevated in Daphnia (and relatively unpunctuated)
                    D. pulex       C. elegans    H. sapiens
Single copy genes   16,285         13,768        15,002
Duplicated genes    14,655 (47%)   6,350 (31%)   7,678 (34%)
Total genes         30,940         20,118        22,680
Birth rate          9.3            3.3           7.3
Rates scaled to: duplicates/1,000 genes/1% divergence
(Science, 2011, 331:555‐561)
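The percentages in the table above follow from the gene counts. The Python sketch below recomputes the totals and the duplicated fraction for each species; the names are illustrative. The recomputed C. elegans value is about 31.6%, so the 31% on the slide may be truncated rather than rounded.

```python
# (single copy genes, duplicated genes), copied from the table above.
counts = {
    "D. pulex": (16285, 14655),
    "C. elegans": (13768, 6350),
    "H. sapiens": (15002, 7678),
}


def duplicated_fraction(single_copy, duplicated):
    """Fraction of all genes that are duplicates."""
    return duplicated / (single_copy + duplicated)


# Total gene counts and duplicated fractions per species.
totals = {sp: single + dup for sp, (single, dup) in counts.items()}
fractions = {sp: duplicated_fraction(*pair) for sp, pair in counts.items()}

for sp in counts:
    print(f"{sp}: total = {totals[sp]}, duplicated = {fractions[sp]:.1%}")
```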
Duplicate gene families are functionally related
Pathways enriched with duplicated genes provide evidence that gene families are functionally linked and maintained by selection
• 54 enzymes are lost in arthropods (BLUE)
• 38 enzymes are amplified in crustaceans plus insects (YELLOW)
• 32 enzymes are specifically amplified in Daphnia (RED)
(Science, 2011, 331:555‐561)
Man‐made selection experiment: Sudbury, Ontario
• 170 years of industrial mining
• 6000 square miles of forest damaged, over 100 square miles barren
• 7000 lakes decimated from acid and metal stress
Some Sudbury Daphnia populations adapt to cadmium
Figures removed at presenter's request
Genotyping across 30 microsatellite loci reveals two monophyletic clades, A (only populations from the mining region) and C, and a paraphyletic clade (B) that is closer to C. Clade A is adapted to cadmium stress, while populations from B and C show no fitness advantage when exposed to cadmium. There is variation in adaptation within clade A.
Array CGH identifies large number of segregating CNV
Figure removed at presenter's request
Comparison of genomic DNA from
• 12 isolates from adapted populations
• 12 isolates from non‐adapted populations
• Each hybridized against clonal isolates from the reference genome
Correlation of CNV with expression and phenotype
• 5% of segregating CNV are significantly correlated with differential expression.
• Only half of these are positively correlated with dosage.
% of CNV   Phenotype                          Tolerance
1.30%      Assimilation rate                  Adapted
1.20%      Assimilation rate                  Non‐adapted
4.30%      Performance reduction (0.5 ug/L)   Adapted
4.50%      Performance reduction (0.5 ug/L)   Non‐adapted
13.00%     Performance reduction (1 ug/L)     Adapted
3.20%      Performance reduction (1 ug/L)     Non‐adapted
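To make "positively correlated with dosage" concrete, the Python sketch below computes a Pearson correlation between copy number and expression level across isolates. The data are hypothetical toy values, not measurements from this study, and the sketch omits the significance testing that the slide's "significantly correlated" implies. It is only meant to show what a positive versus negative dosage relationship looks like.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical copy numbers for one CNV across five isolates.
copies = [1, 2, 2, 3, 4]
# Hypothetical expression values: one gene tracking dosage, one opposing it.
expr_with_dosage = [1.0, 2.1, 1.9, 3.2, 3.9]
expr_against_dosage = [4.0, 3.1, 2.9, 2.0, 1.1]

r_positive = pearson_r(copies, expr_with_dosage)
r_negative = pearson_r(copies, expr_against_dosage)
print(f"with dosage: r = {r_positive:.2f}")
print(f"against dosage: r = {r_negative:.2f}")
```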
A closer look at metallothionein‐1
Figure removed at presenter's request
Induced expression of the MT‐1 gene is greater in adapted isolates compared to non‐adapted isolates. This difference in transcript levels is due to increased basal expression of the gene in adapted animals.
From Shaw et al., 2007
C and T‐variant differ in copy number
Figure removed at presenter's request
Increased copies of the T‐variant are observed in adapted isolates. Copies of the C‐variant are greater in non‐adapted isolates, but these are lower than observed for the T‐variant in adapted animals.
T‐variant located in more dynamic genome region
MT1‐T, Scaffold 36: adapted π = 0.028; non‐adapted π = 0.027
MT1‐C, Scaffold 549: adapted π = 0.016; non‐adapted π = 0.008
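The π values above are nucleotide diversity, commonly defined as the average number of pairwise differences per site among aligned sequences. The Python sketch below implements that textbook definition on hypothetical toy sequences. It does not reproduce the Daphnia data or the presenter's pipeline, and it assumes gap‑free alignments of equal length.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and equally long")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical aligned sequences (toy data, 10 sites each).
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f}")
```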
Access to past populations allows direct evaluation
Figure: Daphnia cyclical parthenogenesis under benign and stressful conditions.
CNV selected for adaptive advantage in cadmium
Figure removed at presenter's request
Low and even copy numbers of the C and T variants are observed in adapted populations prior to mining stress (drift). However, following mining, increased copies of the T‐variant (associated with adaptation) are swept to fixation.
Summary of findings in Daphnia
• Documented cadmium‐specific adaptation that is associated with the phylogeographic structure.
• Observed CNV that segregate among adapted and non‐adapted animals in what appear to be complicated, yet functionally significant, ways.
• Shown that MT‐1 plays a role in the adaptive response via constitutive up‐regulation of the gene due to increased gene copy number.
• Demonstrated allele‐specific changes in MT‐1 copy number that segregate adapted from non‐adapted phenotypes through space and time.
Collaborators
John Colbourne (University of Birmingham)
Susanne Paland (IU, CGB)
Don Gilbert (IU, Biology; CGB)
Deenie Bugge (Dartmouth College)
Stephen Glaholt (IU, SPEA)
Norman Yan (York College)
Bill Keller (Laurentian College)
Mike Pfrender (Notre Dame)
Jeff Dudycha (University of South Carolina)
Michael Lynch (IU, Biology)
Carol Folt (Dartmouth College)
Celia Chen (Dartmouth College)
Xin Zhou (BGI)
Our lab group
This work benefits from and contributes to the Daphnia Genomics Consortium.
Direct measure of CNV rates in MA lines
How does stress affect rates and spectra of mutation (including large scale structural variation)?
Figure: mutation accumulation (MA) design
• Buck 4 (non‐adapted): with cadmium and without cadmium
• Simon 14 (adapted): with cadmium and without cadmium
• 50 generations, 250 sisters per line
• Sequence 100 lines from each of four conditions
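A direct rate estimate from a design like this is usually the number of observed events divided by the number of line‑generations surveyed. The Python sketch below shows that arithmetic using the design figures from the slide (100 sequenced lines per condition, 50 generations). The event count is hypothetical, since the slide reports no results, and the function name is illustrative rather than the presenter's method.

```python
LINES_PER_CONDITION = 100   # from the slide: 100 lines sequenced per condition
GENERATIONS = 50            # from the slide: 50 generations


def per_generation_rate(events, lines, generations):
    """Events per line per generation, pooled over all lines in a condition."""
    if lines <= 0 or generations <= 0:
        raise ValueError("lines and generations must be positive")
    return events / (lines * generations)


# Total opportunity for mutation within one condition.
line_generations = LINES_PER_CONDITION * GENERATIONS

# Hypothetical count of new CNV events in one condition (not real data).
hypothetical_events = 25
rate = per_generation_rate(hypothetical_events, LINES_PER_CONDITION, GENERATIONS)
print(f"{line_generations} line-generations, rate = {rate:.4f} per line per generation")
```

Comparing this rate between the with‑cadmium and without‑cadmium conditions, within each genotype, is one way to frame the question the slide poses.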