biostatistics 666 statistical models in human genetics · biostatistics 666 statistical models in...
TRANSCRIPT
![Page 1: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/1.jpg)
Biostatistics 666Statistical Models in Statistical Models in
Human Genetics
InstructorGonçalo AbecasisGonçalo Abecasis
![Page 2: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/2.jpg)
Course LogisticsCourse Logistics
GradingGrading Office HoursClass Notes
![Page 3: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/3.jpg)
Course Objective
Provide an understanding of statistical gmodels used in gene mapping studies
Survey commonly used algorithms and procedures in genetic analysisprocedures in genetic analysis
![Page 4: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/4.jpg)
Assessment
Weekly Assignmentsy g• About 60% of the final mark
2 Half Term Assessments• About 40% of the final markAbout 40% of the final mark
![Page 5: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/5.jpg)
Academic IntegrityAll i d i di id l All assignments are made on an individual basis – the solutions you provide must represent your own work.p y
Cheating, plagiarism and aiding and b ti th t tit t d iabeting these acts constitutes academic
misconduct and is a serious offense.
See also the School policy on academic conduct.
![Page 6: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/6.jpg)
Office Hours
Please cross out times for which you are yunavailable in the sheet going around
Room M4132School of Public Health II
![Page 7: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/7.jpg)
Class Website
PDF versions of notes and problem sets
http://genome.sph.umich.edu/wiki/666p g p
It is a wiki, so you should be able to add , ycomments, notes and discussion to each lecture.
![Page 8: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/8.jpg)
Course ContentsCourse Contents
Brief Overview
![Page 9: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/9.jpg)
Genetic Mapping
“Compares the inheritance patternof a trait with the inheritance pattern
of chromosomal regions”
Positional Cloning“Allows one to find where a gene isAllows one to find where a gene is,
without knowing what it is.”
![Page 10: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/10.jpg)
Some of the Topics Covered
Maximum Likelihood
Modeling Genes in Populations Modeling Genes in Populations
M d li G i P di Modeling Genes in Pedigrees
![Page 11: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/11.jpg)
Modeling Modeling Genes in Populations Hardy Weinberg Equilibrium
Linkage Disequilibrium Linkage Disequilibrium
The Coalescent The Coalescent
Methods for Haplotypingp yp g
Methods for Handling Shotgun Sequence Data
![Page 12: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/12.jpg)
Modeling Modeling Genes within Pedigrees
El S l i h Elston-Stewart algorithm
Lander-Green algorithm Lander-Green algorithm
Checking Genetic Data for Errorsg
Genetic Linkage Tests
Genetic Association Tests
![Page 13: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/13.jpg)
Let’s Get Started!Let’s Get Started!
The Basics
![Page 14: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/14.jpg)
Today – Primer In Genetics
How information is stored in DNA
How DNA is inherited
Types of DNA variation
Common designs for Genetic studies
![Page 15: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/15.jpg)
DNA – Information Store
Encodes the information required for cells and organisms to function and produce new cells and organisms.
DNA variation is responsible for many individual differences, some of which are ,medically important.
![Page 16: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/16.jpg)
Human Genome
Multiple chromosomes• 22 autosomes
• P t i 2 i i di id l• Present in 2 copies per individual• One maternally and one paternally inherited copy
• 1 pair of sex chromosomes• Females have two X chromosomes• Males have one X chromosome and one Y chromosome
Total of ~3 x 109 bases (each A, C, T or G)
![Page 17: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/17.jpg)
Inheritance of DNATh h bi i “DNA i ” i f d b Through recombination, a new “DNA string” is formed by combining two parental DNA strings
Th h h i i f th t Thus, each chromosome we carry is a mosaic of the two chromosomes carried by our parents
Only a small number of changeovers between the two Only a small number of changeovers between the two parental chromosomes • On average ~1 per Morgan (~108 bases)
Copying of DNA sequences is imperfect and, for typical sequences, the error rate is about 1 per 108 bases copied
![Page 18: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/18.jpg)
Human Variation
Every chromosome is unique …
… but when two chromosomes are compared most of their sequence is identicalq
About 1 per 1,000 bases differs between pairs of human chromosomes
![Page 19: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/19.jpg)
DNA Sequences That Vary… Genes (protein coding sequences, which total <2% of all DNA)
• ~20,000-25,000 in humans
P d Pseudogenes• Ancient genes, inactivated through mutation
Promoters and Enhancers Promoters and Enhancers• Sequences which control gene expression
Repeat DNA Repeat DNA• Often more variable than other types of sequences• Useful for tracking DNA through families or populations
Packaging sequences, “spacer” DNA, etc.
![Page 20: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/20.jpg)
Important Vocabulary … Locus Polymorphism
Allele
Phenotype• Mendelian Traits• Complex Traits Allele
Mutation Linkage
Complex Traits Chromosomal landmarks
• CentromeresLinkage Genetic Marker Genotype
• Telomeres Gene RNA RNA Protein
![Page 21: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/21.jpg)
Data for a Genetic Study
Pedigree• Set of individuals of known relationship
Observed marker genotypes Observed marker genotypes• SNPs, VNTRs, microsatellites
Phenotype data for individuals
![Page 22: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/22.jpg)
Genetic Markers Genetic variants that can be measured conveniently
Typically we characterize them by: Typically, we characterize them by:• Number of Alleles• Frequency of Each Allele
• These are summarized by the heterozygosity
The most commonly used genetic markers are microsatellites and SNPs
![Page 23: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/23.jpg)
Phenotypes
Measured characters of individuals
Mendelian Phenotypes• Completely determined by genes• e.g. Cystic Fibrosis, Retinoblastoma
Complex Phenotypes• Controlled by multiple genes and environmental factors• eg. Diabetes, Inflammatory Bowel Disease
![Page 24: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/24.jpg)
Ultimate Aim of Gene-Mapping Ultimate Aim of Gene Mapping Experiments
Localize and identify variants that control interesting traitsg• Susceptibility to human disease• Phenotypic variation in the populationy
The difficulty The difficulty…• Testing several million variants is impractical…
![Page 25: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/25.jpg)
3 Common Questions
Are there “genes” influencing this trait?• Epidemiological studies
Where are those “genes”?• Li k l i• Linkage analysis
What are those “genes”? What are those genes ?• Association analysis
![Page 26: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/26.jpg)
Is a trait genetic?
Examine distribution of trait in the population and among relatives
E.g. Inflammatory Bowel Disease (Crohn’s)• G l l ti• General population
• 1-3 cases per 1,000 individuals• Twins of affected individuals
• 44% of monozygotic twins also have Crohn’s• 3.8% of dizygotic twins also have Crohn’s
![Page 27: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/27.jpg)
Where are those genes?
Find genetic markers that co-segregate with disease
E.g. D16S3136 E.g. D16S3136co-segregateswith Crohn’s
![Page 28: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/28.jpg)
What are those genes? Identify genetic variants that are associated
with disease…
E.g. Mutations which disrupt NOD2 are much more common in Crohn’s patientsp
Crohn’s Controls• Arg702Trp: 11% 4%• Arg702Trp: 11% 4%• Gly908Arg: 4% 2%• Leu1007fs 8% 4%
![Page 29: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/29.jpg)
Common Designs for Common Designs for Genetic Studies
Parametric Linkage analysis
Allele-sharing methods
Association analysis
![Page 30: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/30.jpg)
Parametric Linkage Analysis Evaluate a specific model and location
• Allele frequencies at disease loci• Probability of disease for each genotype
Potentially very powerful Vulnerable to heterogeneity, model misspecification
![Page 31: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/31.jpg)
Allele Sharing Analysis
Reject null hypothesis that sharing is random at a particular region
Less powerful, but more robust Quantitative trait extensions exist
![Page 32: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/32.jpg)
Association Analysis Simplest case compares frequency of allele among
cases and controls Genome-wide search requires hundreds of thousands q
of markers Typically, focuses on candidate genes
![Page 33: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/33.jpg)
Which Design to Choose?The Right Choice Depends on the Alleles We Seek…
cte
of e
ffec
Rare, high penetrancemutations – use linkage
C l
agni
tude Common, low penetrance
variants – use association
M
Frequency in population
![Page 34: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/34.jpg)
Genetic Linkage Studies Identify variants with relatively large contributions to
disease risk
Require only a coarse measurement of genetic variation• 400 – 800 microsatellites can extract most of the linkage
information in typical pedigrees• Until recently, the only option for conducting whole genome
studies
High-throughput SNP genotyping has already sped up and facilitated these studies• Data analysis methods must select subset of independent
SNPs or model disequilibrium between markers
![Page 35: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/35.jpg)
Genetic Association StudiesId if i i i h l i l ll Identify genetic variants with relatively small individual contributions to disease risk
Require detailed measurement of genetic variation• 10 000 000 t l d ti i t• > 10,000,000 catalogued genetic variants, so …• Until recently, limited to candidate genes or regions
• A hit-and-miss approach…
SNP resources and decreasing assay costs now make it possible to examine 100,000s of p ,markers
![Page 36: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/36.jpg)
Recommended Reading
An introduction to important issues in genetics:
• Lander and Schork (1994)Science 265:2037-48
A good reference on molecular genetics:
• Human Molecular GeneticsTom Strachan and Andrew Read
![Page 37: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading](https://reader030.vdocuments.net/reader030/viewer/2022040211/5e748f49d1c509640c2774d5/html5/thumbnails/37.jpg)
Reading for Next Lecture Will be discussing Hardy-Weinberg equilibrium
• A basic feature of genotypes in human populations
Wigginton, Cutler, Abecasis (2005)A note on exact tests of Hardy-Weinberg equilibrium.Am J Hum Genet 76:887-93
This paper describes an efficient method for testing Hardy-Weinberg equilibrium and includes many importantHardy Weinberg equilibrium and includes many important historical references