MEDG 505 Pharmacogenomics
March 17, 2005
A. Brooks-Wilson
Reminder: What is Genomics?
According to http://genomics.ucdavis.edu/what.html:
“Genomics is operationally defined as investigations into the structure and function of very large numbers of genes undertaken in a simultaneous fashion”
Pharmacogenetics
• “The study of how genes affect people’s response to medicines” (NIH)
• A subset of complex genetics for which the traits relate to drugs
• First observed in 1957
• Part of “personalized medicine”
• 20-95% of variability in drug disposition and effects is thought to be genetic
• Non-genetic factors: age, interacting medications, organ function
• Drug absorption, distribution, metabolism, excretion
• >30 families of genes
Pharmacogenetics: Examples
• Drug metabolism genes
– NAT2, isoniazid anti-tuberculosis drug hepatotoxicity
– CYP3A5, many drugs
– Thiopurine S-methyltransferase (TPMT), 6-thioguanine
• Drug targets (receptors)
– β2 adrenergic receptor, inhaled β agonists for asthma
• Drug transporters
– P-glycoprotein (ABCB1, MDR1), resistance to anti-epileptic drugs
• The examples known today are those that come closest to simple genetic traits
Potential Consequences
• Extended / shortened pharmacological effect
• Adverse drug reactions
• Lack of pro-drug activation
• Increased / decreased effective dose
• Metabolism by alternative, deleterious pathways
• Exacerbated drug-drug interactions
The Goal of Pharmacogenomics
Picture from Perlegen website: www.perlegen.com
Complex Genetics: Concepts
• Family studies vs. population studies
• Penetrance
• Genetic heterogeneity
• Linkage vs. association
• Haplotypes in family and association studies
• Genetic variation, SNPs
• Genotyping
Types of Genetic Studies
• Family studies
– multi-generation families
• Association studies
– case / control (easiest to collect)
Penetrance
• Penetrance = the proportion of carriers who show the phenotype
• Expressivity = severity of the phenotype
Genetic Heterogeneity
• Locus heterogeneity (what we usually refer to when we talk about genetic heterogeneity)
• Allelic heterogeneity
Family Studies Identify Highly Penetrant Mutations
High penetrance disease allele(s)
Availability of suitable families is the limiting factor
Family studies are effective for only a minority of conditions
Association Studies Can Identify Variants with High or Low Penetrance
• Case / control groups
• Not limited to high penetrance alleles
• Amenable to the study of gene-environment interactions
• A preferred approach for the majority of complex genetic disorders
Complex Diseases / Phenotypes
• Multigenic (genetic heterogeneity)
• Environmental effects (multiple)
• Gene-gene interactions
• Gene-environment interactions (for pharmacogenetic traits: age, alcohol consumption, hepatitis exposure, etc.)
• Association studies will hold up under these complications but family-based linkage studies will not!
Linkage vs. Association
• Linkage is to a locus
– different families can be linked to the same locus but have different disease alleles
– how to take advantage of this in proving a gene is responsible for a disease
• Association is with an allele
– done in groups or populations
– the allele arose and was propagated in the population; the haplotype was degraded by recombination
Genetic Markers
SNPs: Substitutions, for example, C / T
• Most common type of genetic variation
• Ideal for association mapping over short distances
• 1 SNP every ~200 base pairs in a population
• 1 SNP every ~1000 base pairs between 2 individuals
• dbSNP: >10M putative SNPs, >5M validated SNPs
Microsatellites: (CA)n or other short repeats
• More polymorphic than SNPs
• Less common than SNPs
• 1 polymorphic microsatellite per ~100,000 base pairs
• Best for linkage mapping over long distances, in families
SNPs
• Single Nucleotide Polymorphisms
• Can also use “Indels”, though some investigators throw them away!
• Synonymous, non-synonymous SNPs
• Mutation vs. polymorphism vs. variant or variation
• The 1% definition
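The 1% definition can be made concrete with a short calculation. The sketch below assumes the usual convention that a variant counts as a polymorphism when its minor allele frequency (MAF) is at least 1%; the function names and the genotype counts are hypothetical and chosen only for illustration.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts at a biallelic SNP.

    n_aa, n_ab, n_bb are the numbers of individuals with each genotype.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each AA individual carries two A alleles, each AB individual one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def is_polymorphism(maf: float, threshold: float = 0.01) -> bool:
    """Apply the conventional 1% cut-off (assumed here; adjustable)."""
    return maf >= threshold


# Hypothetical counts: 180 CC, 19 CT and 1 TT individuals.
maf = minor_allele_frequency(180, 19, 1)
print(f"MAF = {maf:.4f}, polymorphism: {is_polymorphism(maf)}")
```

With these counts the T allele is seen on 21 of 400 chromosomes, so the variant passes both the 1% and the 5% thresholds used in the database figures.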
SNP Databases
• dbSNP (more than just human)
• Human Genome Variation Database
• At least 11 others!
• ~ 10 million SNPs with minor allele >1%
• ~ 7 million SNPs with minor allele >5%
• ~ 50,000 non-synonymous SNPs in the human genome
Case / Control Studies
1. Collect blood samples from patients and controls, with consent
2. Establish database of clinical and epidemiological data
3. Select ‘candidate’ genes of interest for each trait
4. Sequence the candidate genes in a small group of patients
5. Genotype selected variants in case / control groups
6. Analyze for association with a phenotype
7. Analyze for gene-gene and gene-environment interactions
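The association analysis in step 6 can be sketched, for a single SNP, as a chi-square test on allele counts in cases versus controls. This is one common choice of test and not necessarily the one used in any particular study; the function name and all counts are hypothetical. The p-value uses the identity that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df), p-value and odds ratio for a 2x2 allele table.

    Arguments are allele (chromosome) counts, not individual counts.
    Assumes all four cells are non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical data: 200 cases and 200 controls (400 chromosomes each),
# with the variant allele on 120 case and 80 control chromosomes.
chi2, p, odds = allelic_association(120, 280, 80, 320)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}, odds ratio = {odds:.2f}")
```

A test like this would be repeated for every genotyped variant, so in practice the p-values need correction for multiple testing.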
Genetic, Ethical, Legal and Social (GELS) issues investigations
Linkage Disequilibrium
• The difference between the observed frequency of a haplotype and its expected frequency if all alleles were segregating randomly
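• For adjacent loci with alleles A,a and B,b:
• D = P_AB - P_A × P_B, where P_AB is the frequency of the AB haplotype and P_A, P_B are the frequencies of alleles A and B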
• D is dependent on allele frequencies
• Other related measures also used
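As a worked illustration of D, the sketch below computes it from a haplotype frequency and two allele frequencies, together with two widely used related measures, D' (D scaled by its maximum possible value for the given allele frequencies) and r² (the squared correlation between the two loci). The slides do not say which related measures are meant, so D' and r² are an assumption here, and the input frequencies are hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the AB haplotype
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take
    # given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation coefficient between the loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: P_AB = 0.40, P_A = 0.50, P_B = 0.50.
d, d_prime, r2 = ld_measures(0.40, 0.50, 0.50)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

Under random segregation P_AB would be 0.25 in this example, so D = 0.15. Because the largest possible D shrinks as allele frequencies move away from 0.5, D itself is hard to compare across marker pairs, which is the motivation for normalized measures.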
Human Haplotype Blocks Simplify Association Studies
[Figure: ancestral chromosomes and the observed pattern of historical recombination in common haplotypes; scale bar 50 kb]
[Figure: ancestral chromosomes carrying alleles C or T at SNP1 and A or G at SNP2, near a gene. A disease-causing mutation arises on one chromosome, producing association with nearby SNPs; asterisks mark the location of the mutation.]
LD and Association
• Direct association
– asks about the effect of a variant
– if negative, the gene may still be involved!
• Indirect association
– uses LD
– can be more convincingly negative if haplotypes are assessed
Haplotype Blocks
• Became clear in October 2001
• 87% of the genome is in blocks ~> 30 kb
• Not all of the genome is in haplotype blocks!
• Average block 22 kb; 11 kb in African populations (Gabriel et al., 2002)
• A few common haplotypes at a given locus in a given population
• African populations generally have the greatest number of haplotypes and the shortest haplotype blocks
• Strength of LD and size of blocks varies greatly between regions
How to Generate Haplotypes
• Haplotyping in families
• Physical determination
– long-range PCR, separation of molecules
– cloning of single molecules
– labor intensive
• Estimate haplotype frequencies
– Expectation Maximization algorithm, others
– generate frequencies for case group, control group
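The Expectation Maximization approach can be illustrated with the simplest case, two biallelic SNPs. This is a minimal sketch and not any particular published program. Genotypes are coded as the number of copies (0, 1 or 2) of an arbitrarily labelled "1" allele at each SNP. Only double heterozygotes have ambiguous phase, so the E-step splits them between the two possible haplotype pairs according to the current frequency estimates, and the M-step recounts haplotypes. The function name and the sample data are hypothetical.

```python
from collections import Counter


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1, 2) of the '1' allele.
    Returns a dict keyed by haplotype (allele at SNP1, allele at SNP2).
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    n_chrom = 2 * len(genotypes)

    # Haplotypes that can be read directly from the genotype.
    fixed = Counter()
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase unknown: 00/11 or 01/10
        else:
            fixed[(1 if g1 == 2 else 0, 1 if g2 == 2 else 0)] += 1
            fixed[(1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)] += 1

    freq = {h: 0.25 for h in haps}  # uninformative starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5

        # M-step: expected haplotype counts, then new frequencies.
        counts = {h: float(fixed[h]) for h in haps}
        counts[(0, 0)] += w * n_double_het
        counts[(1, 1)] += w * n_double_het
        counts[(0, 1)] += (1 - w) * n_double_het
        counts[(1, 0)] += (1 - w) * n_double_het
        new_freq = {h: counts[h] / n_chrom for h in haps}

        converged = max(abs(new_freq[h] - freq[h]) for h in haps) < tol
        freq = new_freq
        if converged:
            break
    return freq


# Hypothetical sample: 4 individuals 0/0 at both SNPs, 4 individuals 2/2
# at both SNPs, and 2 double heterozygotes.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
for hap, f in sorted(em_haplotype_frequencies(sample).items()):
    print(hap, round(f, 4))
```

For a case / control study the same estimation would be run separately on the case genotypes and the control genotypes, and the resulting haplotype frequencies compared.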
Tag SNPs
[Figure: four chromosome copies (copy 1 to copy 4) illustrating tag SNPs]
The HapMap
• Reference map for association studies
• Expected to reduce the number of markers required to conduct effective genome scans for association
• 270 samples from 4 populations:
– 30 Yoruban trios (Nigeria)
– 45 unrelated Japanese (Tokyo)
– 45 unrelated Chinese (Beijing)
– 30 U.S. trios (CEPH, N/W European ancestry)
• >400,000 markers genotyped in all samples, nearly 1M in CEPH trios
Strategies
• Candidate gene based studies
– hypothesis-driven
– must guess (one of) the right gene(s)!!
– current state of the art
• Genome scans
– “hypothesis-free”
– scans of ~1 million markers are now possible
SNP Discovery is Still Necessary
• Many have been found by multi-read sequence mining
• Directed public SNP discovery in certain sets of genes, e.g.:
– SNP500Cancer
– Environmental Genome Project (EGP)
• Individuals used are usually “unaffected”
SNP Discovery
All exons and regulatory regions of each gene
Identify regulatory regions by comparative genomics
Bi-directional sequencing
Denaturing High Performance Liquid Chromatography (DHPLC)
Other methods
[Figure: SNP discovery sequencing pipeline with numbered steps 1-6. Labels: PCR products; template aliquotting (Robbins Hydra); PCR set-up (Packard Multiprobe II liquid handler); PCR and cycle sequencing (MJ Tetrads); purification of PCR products (Agencourt); cycle sequencing; sequencing (ABI 3700s)]
SNP Discovery: PolyPhred and Consed
PolyPhred: Debbie Nickerson; Consed: Phil Green
Sample Output
[Figure: sample output showing genotypes GG, GA and AA]
Genotyping, Technology
• Determining the allele(s) present in a particular sample at a particular (SNP) marker
• Many methods
TaqMan (ABI): Uniplex genotyping
TaqMan
TaqMan Output
[Figure: genotype groups labelled homozygous 1,1; heterozygous; homozygous 2,2]
MassEXTEND Reaction
[Figure: the same unlabeled primer (23-mer) is extended on allele 1 and allele 2 templates in the presence of enzyme, ddATP and dCTP/dGTP/dTTP, giving extended primers of different lengths (24-mer and 26-mer). Diagram courtesy of Sequenom.]
Sequenom MassARRAY: < 12-plex
[Figure: five labelled allele pairs (A/G, A/G, T/C, A/G, C/T). Diagram courtesy of Sequenom.]
Illumina BeadArray System: 1152-plex
• 1152-fold multiplexing
• 0.26 ng of genomic DNA per genotype
• $ 0.05 USD per genotype
[Figure: optical fiber showing the excitation beam and fluorescence emission, photons in and out, fiber core, fiber cladding and total internal reflection]
Illumina BeadArray System
[Figure: panels A and B; decoder oligos 1-16 read out over decode hybridizations 1 and 2]
[Figure: assay steps: allele-specific extension (A / G primers carrying an address sequence, T/C site in the template); PCR with common primers (P1, P2, P3); product capture by hybridization of the address sequence to the array]
ParAllele Molecular Inversion Probes: 10,000 Plex
Affymetrix Whole Genome Sampling Analysis: 500,000-plex
Kennedy et al., 2003
Affymetrix: Allele-Specific Hybridization
PM = perfect match
MM = mismatch
DNA Pooling Strategies
• Reduce the number of genotypes and genotyping cost, particularly for whole genome scans
• Pool of case DNAs vs. pool of control DNAs
• DNAs must be mixed in precisely equimolar proportions in the pools!
• Requires a quantitative genotyping technique
• E.g. 40% in cases vs. 20% in controls
• Verify positives by genotyping individual samples
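The 40% vs. 20% example can be turned into a rough significance check with a two-proportion z-test on the pooled allele frequency estimates. This is only a sketch: the pool sizes (100 individuals each) are hypothetical, and the test treats the measured frequencies as exact, ignoring the measurement error of the quantitative genotyping assay and any departure from equimolar mixing. That is one reason positives still need to be verified by individual genotyping.

```python
import math


def pooled_frequency_test(freq_cases, freq_controls, n_cases, n_controls):
    """Two-sided z-test for a difference in allele frequency between pools.

    freq_*: allele frequency estimated from each pool (0..1)
    n_*: number of diploid individuals in each pool (assumed values)
    Assay measurement error is NOT modelled here.
    """
    chrom_cases = 2 * n_cases
    chrom_controls = 2 * n_controls
    # Frequency under the null hypothesis of no difference.
    combined = (freq_cases * chrom_cases + freq_controls * chrom_controls) / (
        chrom_cases + chrom_controls
    )
    variance = combined * (1 - combined) * (1 / chrom_cases + 1 / chrom_controls)
    if variance == 0:
        raise ValueError("allele is absent or fixed in both pools")
    z = (freq_cases - freq_controls) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Slide example frequencies with hypothetical pools of 100 individuals each.
z, p = pooled_frequency_test(0.40, 0.20, 100, 100)
print(f"z = {z:.2f}, p = {p:.2g}")

# Genotyping effort per SNP: 2 pooled measurements instead of 200 individual ones.
print("individual genotypes:", 100 + 100, "pooled measurements:", 2)
```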