resequencing genomes -- today!
TRANSCRIPT
04/11/23 Confidential
Perlegen At-A-Glance
• San Francisco Bay Area
• Spun off from Affymetrix, Inc. in March 2001
• 95 employees
  – Approximately half in genetics/biology and half in bioinformatics
• Privately held
Using the Human Genome
Credits: Thomas Reid
95% of one human genome is now publicly available
One copy of the human genome consists of 3 billion bases:
AGTCCTAGCCTGTGATATAGGGCCCTAGATCA…
One copy of the human genome cost $100 million to obtain…
Why were we willing to spend so much money?
Variations in DNA sequence affect many aspects of our lives
• eye color
• height
• disease
• personality
• drug response
These are inherited traits, or phenotypes
Any two humans share 99.9% of their DNA sequence…
Traits are influenced to different degrees by genetics and environment
[Figure: genetic vs. environmental contribution, from cystic fibrosis (mostly genetic) to AIDS (mostly environmental)]
Most common traits are believed to be about 50-50 genetics and environment
• Diabetes
• Heart failure
• Schizophrenia
• Rheumatoid arthritis
• Obesity
• Height
• Skin color
• Osteoporosis
With knowledge of the genetic component of a trait…
• Diagnostics
• Determine how a patient will respond to a particular drug treatment
• Targets for drug development
• More effective consumer products
• Evaluate the role of lifestyle and environment in the trait
DNA is double-stranded, and the two strands are held together by the very specific pairing of the four bases A, G, T and C (A pairs with T, G pairs with C)
[Figure: base pairs C-G and T-A]
DNA can be “unwound” to single-stranded form and then “wound” again to double-stranded form, based on the specificity of base-pairing; this is called “hybridization”
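Base-pairing specificity is what makes hybridization predictable: a single strand re-pairs with the strand that carries its complementary bases, running in the opposite direction. A minimal Python sketch of that rule; the function names are illustrative, the example sequence is borrowed from the chip probes later in the deck, and the exact-match check is a simplification (in real hybridization a single mismatch weakens binding rather than abolishing it).

```python
# Watson-Crick pairing rule: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in seq.upper())

def reverse_complement(seq: str) -> str:
    """The partner strand written 5'->3' (the two strands run antiparallel)."""
    return complement(seq)[::-1]

def can_hybridize(strand: str, partner: str) -> bool:
    """Exact-match check: is `partner` (written 5'->3') the perfect partner of `strand`?"""
    return reverse_complement(strand) == partner.upper()

probe = "ACTTGACATAGGCTGTA"
print(complement(probe))                          # TGAACTGTATCCGACAT
print(reverse_complement(probe))                  # TACAGCCTATGTCAAGT
print(can_hybridize(probe, "TACAGCCTATGTCAAGT"))  # True
print(can_hybridize(probe, "TACAGCCTGTGTCAAGT"))  # False (one mismatch)
```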
Human DNA variation results from errors in DNA replication
DNA variations come in different forms…
Single nucleotide polymorphism (SNP)
AGCCTGTCACT  vs.  AGCCTATCACT
Deletion
AGCCTGTCACT  vs.  AGCCTTCACT
Insertion
AGCCTGTCACT  vs.  AGCCTGGTCACT
Variable number tandem repeat (VNTR)
CAGCAGCAG  vs.  CAGCAGCAGCAGCAG
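The variant types above can be told apart mechanically by comparing a variant against the reference. A simplified Python sketch, assuming the two sequences are already aligned at their ends and differ by one event only; the function names and example strings (modeled on the slide's AGCCT/TCACT and CAG sequences) are illustrative, not a standard tool.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label the difference between a reference and a variant sequence.

    Simplified: recognizes a single substitution (SNP) or a single
    contiguous deletion/insertion; anything else is reported as "other".
    """
    if len(ref) == len(alt):
        mismatches = sum(r != a for r, a in zip(ref, alt))
        return "SNP" if mismatches == 1 else "other"
    shorter, longer = (alt, ref) if len(alt) < len(ref) else (ref, alt)
    gap = len(longer) - len(shorter)
    # Try every start position for one removed block of length `gap`.
    for start in range(len(shorter) + 1):
        if longer[:start] + longer[start + gap:] == shorter:
            return "deletion" if len(alt) < len(ref) else "insertion"
    return "other"

def tandem_copies(seq: str, unit: str) -> int:
    """Count back-to-back copies of `unit` at the start of `seq` (VNTR length)."""
    if not unit:
        raise ValueError("unit must be non-empty")
    count = 0
    while seq.startswith(unit, count * len(unit)):
        count += 1
    return count

ref = "AGCCTGTCACT"
print(classify_variant(ref, "AGCCTATCACT"))     # SNP
print(classify_variant(ref, "AGCCTTCACT"))      # deletion
print(classify_variant(ref, "AGCCTGGTCACT"))    # insertion
print(tandem_copies("CAGCAGCAGCAGCAG", "CAG"))  # 5
```

A VNTR length change also shows up as an insertion or deletion of whole repeat units, which is why the repeat count is reported separately.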
The genetic contribution to a trait may be due to variation in one gene… a Mendelian trait
[Figure: A/G variant in a single gene]
Before the Human Genome Project, genes responsible for Mendelian traits were the only genes we could find…
…but it still took a decade or more to find one of these genes
Credits: Brandon Brylawski
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
Once the results of the Human Genome Project began to emerge, the number of Mendelian trait genes discovered increased exponentially and the time to discovery decreased
There are currently 8,309 genes in OMIM whose variants are associated with a disorder…
• Cystic Fibrosis
• Huntington’s Disease
• Familial Breast Cancer
• Severe Combined Immunodeficiency Disorder
…knowledge that is used for diagnostics and preventive therapies, drug development, and gene therapy
But Mendelian traits are the minority, and the genetic variants responsible for Mendelian disorders are rare in the general population…
How many genetic variants have we found that are responsible for traits and disorders that affect millions of people?
Very few
The vast majority of traits are not caused by variation in a single gene and are called complex traits…
• Probably the result of 10-30 genetic changes spread across the genome
• Any single genetic variant may be responsible for only a small contribution to the trait
You may not need to have all of the possible genetic changes to get the disease…
An example where 10 genes are involved in a disease…
Variants (green) in any 4 of the 10 genes cause disease.
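A toy version of this example, assuming the disease appears whenever at least 4 of the 10 genes carry a variant (the threshold rule and names are illustrative). Counting the combinations shows why two patients need not share the same variants: 848 of the 1,024 possible carrier patterns lead to disease.

```python
from itertools import product
from math import comb
from typing import Sequence

N_GENES = 10
THRESHOLD = 4  # illustrative: "any 4 of the 10 genes"

def has_disease(carries_variant: Sequence[bool], threshold: int = THRESHOLD) -> bool:
    """Toy threshold model: disease if at least `threshold` genes carry a variant."""
    return sum(carries_variant) >= threshold

# Enumerate all 2**10 carrier patterns and count those that lead to disease.
diseased = sum(has_disease(p) for p in product([False, True], repeat=N_GENES))

# Same count in closed form: choose which k genes carry variants, k = 4..10.
closed_form = sum(comb(N_GENES, k) for k in range(THRESHOLD, N_GENES + 1))

print(diseased, closed_form, 2 ** N_GENES)  # 848 848 1024
```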
What does all that mean?
• The genetic variants responsible for common disease are themselves common in the general population
• These genetic variants are found in both sick and healthy people
This makes associating these variants with the disease extremely difficult and expensive
Genetic Association Study
If a DNA variant is associated with a trait of interest, “affecteds” will have a different frequency of that variant than “unaffecteds”
Affected: 0.72 purple, 0.28 green
Unaffected: 0.56 purple, 0.44 green
Knowing with statistical confidence that a genetic variant with a small effect is associated with a trait requires looking at the DNA of large numbers of people
500 people with the disease (Cases) + 500 people without the disease (Controls) = 1,000 people
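One standard way to test the affected/unaffected frequency difference from the Genetic Association Study slide is a chi-square test on allele counts. A minimal sketch, assuming the 0.72/0.28 and 0.56/0.44 figures are allele frequencies among the 1,000 alleles carried by 500 diploid people per group (the slides give no raw counts). Running the same frequencies through 50 + 50 people shows why sample size matters: the statistic scales with the number of alleles. A real whole-genome study would also need to correct for testing many SNPs at once.

```python
import math

def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table of allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    chi2 = numerator / denominator
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# 500 cases and 500 controls -> 1,000 alleles per group (assumed).
chi2_big, p_big = allelic_chi_square(720, 280, 560, 440)
# Same frequencies with only 50 cases and 50 controls -> 100 alleles per group.
chi2_small, p_small = allelic_chi_square(72, 28, 56, 44)

print(f"1,000 people: chi2 = {chi2_big:.2f}, p = {p_big:.1e}")   # chi2 = 55.56
print(f"100 people:   chi2 = {chi2_small:.2f}, p = {p_small:.3f}")  # chi2 = 5.56
```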
At $100 million per genome, we certainly cannot sequence the genomes of a thousand people for each trait we want to find the genes for…
…we just need to look at the variants in the genomes of the 1,000 people
Single Nucleotide Polymorphisms (SNPs)
• SNPs are a frequent form of DNA variation and are scattered randomly across the genome
• Each SNP is characterized by only two bases
Genotyping “calls” the two variants that each person carries at one base position in the genome
[Figure: genotype calls G/G, G/A and A/A]
But to genotype, you need to know the two base variants and the genome position of each SNP
How many SNPs do we need to find across the genome and genotype in the 1,000 people to find the genes involved in complex traits?
The average cost of a single SNP genotype for one person is $0.50, or $500 for 1,000 people
There are ~3 million SNPs between two people: genotyping all of them in 1,000 people would cost $1.5 billion!
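The $1.5 billion figure is just SNPs × people × cost per genotype. A short sketch of that arithmetic using the slides' numbers ($0.50 per genotype, 1,000 people); the constant and function names are illustrative.

```python
COST_PER_GENOTYPE = 0.50  # dollars per SNP per person (slide figure)
PEOPLE = 1_000            # 500 cases + 500 controls

def genotyping_cost(n_snps: int, n_people: int = PEOPLE) -> float:
    """Total cost of genotyping every SNP in every person individually."""
    return n_snps * n_people * COST_PER_GENOTYPE

print(genotyping_cost(1))          # 500.0 -> $500 per SNP for 1,000 people
print(genotyping_cost(3_000_000))  # 1500000000.0 -> $1.5 billion
```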
Look only at SNPs in the known functional sequences of the human genome, because only functional regions are likely to be associated with a trait
Minimize the cost of finding the SNPs and genotyping the SNPs in a Genetic Association Study
Look at a dense set of common SNPs across the whole genome…
• The important changes in DNA may not lie in known functional sequences (which make up less than 3% of the genome)
• Even if all important changes are in known functional sequences, which do you select for research? (You need to have the correct hypothesis up front)
• Not all functional sequences have been discovered
Discover all the common SNPs by looking at the sequence of 25 copies of the genome from around the world
…but it takes 1 year to sequence one mammalian genome, so that would take 25 years, not to mention the cost!
Perlegen came up with a faster and cheaper way than sequencing to find the common SNPs, possible only because one copy of the human genome was already known and because of technology improvements…
Reading Human Genomic Sequence By Using Affymetrix DNA Chips
62,000 consecutive bases of known human genomic sequence are synthesized on a glass chip in single-stranded form
Take another copy of the human genome, label it with a fluorophore, and hybridize it to the chip
[Figure: Cy3- and Cy5-labeled probes or PCR products hybridized to an array spot on a silanized glass or plastic surface]
Detection of DNA Variation By Using DNA Chips
DNA synthesized on chip:
acttgacatAggctgta
acttgacatCggctgta
acttgacatGggctgta
acttgacatTggctgta
Known genomic sequence: ..TGAACTGTATCCGACAT..
[Figure: labeled DNA hybridized to the chip]
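The chip reads one base by comparing four probes that are identical except at one interrogated position (A, C, G or T, as in the acttgacat?ggctgta probes); labeled DNA binds most strongly to the probe that pairs with it perfectly. A simplified Python sketch, assuming a single probe strand and a "brightest probe wins" rule; the intensity values and the 1.2 cutoff are hypothetical, and real resequencing analysis uses more elaborate statistics.

```python
from typing import Dict, Optional

BASES = "ACGT"
# Probe base -> target base it pairs with.
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}

def probe_quartet(left: str, right: str) -> Dict[str, str]:
    """Four probes that differ only at the interrogated position: left + X + right."""
    return {base: left + base + right for base in BASES}

def call_base(intensities: Dict[str, float], min_ratio: float = 1.2) -> Optional[str]:
    """Return the probe base with the brightest signal.

    Returns None when the top signal is not at least `min_ratio` times the
    runner-up (hypothetical cutoff), e.g. a heterozygous sample where two
    probes light up.
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if intensities[best] < min_ratio * intensities[runner_up]:
        return None
    return best

probes = probe_quartet("ACTTGACAT", "GGCTGTA")
# Hypothetical fluorescence readings for one labeled sample.
signal = {"A": 9800.0, "C": 650.0, "G": 720.0, "T": 540.0}
probe_base = call_base(signal)
if probe_base is not None:
    print(probes[probe_base], "-> target base", PAIR[probe_base])
```

The probe that lights up carries A, so the target DNA carries the complementary base T at that position, matching the T in the known genomic sequence ..TGAACTGTATCCGACAT.. on the slide.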
How many chips do we have to process to discover SNPs from the 25 genomes?
600,000 chips. At 200 chips processed per day, it would take 8.4 years!
What Perlegen was able to do successfully that had never been done before…
Cover 15 million bases of genomic DNA on one wafer!
5,000 wafers to find the SNPs in 25 genomes
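The two wafer figures are consistent with each other: 25 genomes of 3 billion bases each, at 15 million bases per wafer, come to exactly 5,000 wafers. A quick check (constant names are illustrative):

```python
GENOME_BASES = 3_000_000_000   # one copy of the human genome
GENOMES = 25                   # copies scanned for SNP discovery
BASES_PER_WAFER = 15_000_000   # slide figure

total_bases = GENOME_BASES * GENOMES
wafers = total_bases // BASES_PER_WAFER
print(total_bases, wafers)  # 75000000000 5000
```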
Perlegen’s Technological Advantage
1 Perlegen technician using 3 wafers in 8 hours = 140+ DNA sequencers over 24 hours
Human Whole-Genome High-Density Oligonucleotide Arrays
Human genome: 3 billion base pairs
A collection of 223 high-density arrays containing more than 10 billion unique oligonucleotides
Perlegen finished SNP discovery across the entire human genome for 25 copies of the genome in under 2 years, in August 2002
• 1,717,015 common SNPs discovered and confirmed
• All the assays developed and working to genotype all the SNPs
Still, genotyping 1,000 people at all of these SNPs would require $850 million.
But we discovered something else…
SNPs
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
[Figure: three SNP positions marked across the five sequences]
SNP Space
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
There’s something amazing about SNPs...
SNPs occur in “blocks”!
Haplotype Pattern
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
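Reading a haplotype pattern amounts to keeping only the SNP positions of each aligned sequence. A short Python sketch on the five sequences above, assuming the "..." gaps can simply be dropped (they elide identical sequence); it finds the three SNP columns and shows that the five copies carry only two haplotypes.

```python
from collections import Counter
from typing import List

# The five sequences from the slide, with the "..." gaps dropped so the
# columns stay aligned (adjacent string literals are concatenated).
SEQS = [
    "ATTGCAATCCGTGG" "ATCGAGCCA" "TACGATTGCACGCCG",
    "ATTGCAAGCCGTGG" "ATCTAGCCA" "TACGATTGCAAGCCG",
    "ATTGCAAGCCGTGG" "ATCTAGCCA" "TACGATTGCAAGCCG",
    "ATTGCAATCCGTGG" "ATCGAGCCA" "TACGATTGCACGCCG",
    "ATTGCAAGCCGTGG" "ATCTAGCCA" "TACGATTGCAAGCCG",
]

def snp_columns(seqs: List[str]) -> List[int]:
    """Positions where the aligned sequences do not all carry the same base."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

def haplotypes(seqs: List[str]) -> List[str]:
    """Each sequence reduced to its bases at the SNP positions only."""
    cols = snp_columns(seqs)
    return ["".join(s[i] for i in cols) for s in seqs]

print(snp_columns(SEQS))          # [7, 17, 33]
print(Counter(haplotypes(SEQS)))  # Counter({'GTA': 3, 'TGC': 2})
```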
The number of haplotype patterns is limited
Possible patterns: 2 possible bases at each of 26 SNPs = 2^26 (about 67 million)
Observed patterns = 7
[Figure: observed patterns grouped into classes 1, 2, 3, 4]
The majority of the patterns fall into only 4 classes, which can be distinguished from each other by only 2 SNPs
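Saying the classes "can be distinguished by only 2 SNPs" means finding the fewest SNP positions whose alleles still give every haplotype class a unique signature. A brute-force Python sketch; the four 6-SNP haplotypes are hypothetical (the slide's 26-SNP patterns are not in this transcript), and exhaustive search is only practical for small blocks.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# Hypothetical haplotype classes over 6 SNPs (not data from the slide).
HAPLOTYPES = ["AACGTT", "AACCTA", "GTCGAT", "GTACAA"]

def tag_snps(haps: List[str]) -> Optional[Tuple[int, ...]]:
    """Smallest set of SNP positions whose alleles give every distinct
    haplotype a unique signature (exhaustive search, smallest sets first)."""
    n_snps = len(haps[0])
    n_distinct = len(set(haps))
    for size in range(1, n_snps + 1):
        for cols in combinations(range(n_snps), size):
            signatures = {tuple(h[i] for i in cols) for h in haps}
            if len(signatures) == n_distinct:
                return cols
    return None

print(2 ** 26)               # 67108864 possible patterns for 26 two-base SNPs
print(tag_snps(HAPLOTYPES))  # (0, 3)
```

With two bases per SNP, one SNP can split haplotypes into at most 2 groups, so 4 classes always need at least 2 SNPs.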
A SNP-Haplotype Map of the Human Genome
210,937 SNPs uniquely define haplotypes representing the pattern of DNA variation spanning the human genome
2.3 billion bases of genomic DNA sequence are covered in 175,309 haplotype blocks
13,000 bases is the average haplotype block size
6.5 SNPs is the average number of SNPs per haplotype block
The haplotype structure of Chr. 21 is available to the public
http://genome-hg8.cse.ucsc.edu/cgi-bin/hgGateway?db=hg8
Genotyping only haplotype-defining SNPs reduces the number of bases to be looked at in each individual, from 1.7 million genotypes/individual to 210,000 genotypes/individual
Whole Genome Scanning Approach
Looking across the entire genome in hundreds of people
• Does not require a hypothesis up front
• Does not require placing bets on a few locations
• Will reveal many places in the genome that play a role in the disease or trait
Whole Genome Association Methodology
500 “affecteds” and 500 “unaffecteds” = 1,000 DNA samples to assay
210,000 SNP assays per sample
210 million SNP assays per association study
$105 million
Genetic Association Analysis Using Pooled DNA Samples
SNP 1 (alleles T and G)
One tube containing all 500 DNAs from the “affecteds”: one assay gives 30% T, 70% G
One tube containing all 500 DNAs from the “unaffecteds”: one assay gives 20% T, 80% G
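In a pooled assay the allele frequency is read off the relative signal of the two alleles rather than counted person by person. A naive Python sketch, assuming signal is proportional to the number of copies of each allele; the signals are hypothetical. The correction factor `k` is an assumption too: one common approach estimates it from individuals known to be heterozygous, whose two alleles are present 1:1.

```python
def pooled_allele_freq(signal_1: float, signal_2: float, k: float = 1.0) -> float:
    """Estimated frequency of allele 1 in a DNA pool from the two allele signals.

    k rescales allele 2's signal to correct for unequal signal per copy;
    k = 1.0 assumes both alleles give the same signal per copy.
    """
    return signal_1 / (signal_1 + k * signal_2)

# Hypothetical raw signals for SNP 1 (allele T, allele G) in each pool.
freq_t_affected = pooled_allele_freq(3000.0, 7000.0)
freq_t_unaffected = pooled_allele_freq(2000.0, 8000.0)
print(f"T in affecteds: {freq_t_affected:.2f}")      # 0.30
print(f"T in unaffecteds: {freq_t_unaffected:.2f}")  # 0.20
print(f"difference: {freq_t_affected - freq_t_unaffected:.2f}")  # 0.10
```

The trade-off is that a pool yields only a frequency estimate with some measurement error, not individual genotypes.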
Whole Genome Association Methodology
All SNP assays per association study use one DNA pool of “affecteds” and one DNA pool of “unaffecteds”
210,000 SNP assays per sample (pool)
420,000 SNP assays per association study
$210,000
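All of the study-cost figures in this deck follow from one product: SNPs × samples × $0.50 per assay. A sketch comparing the three strategies; the first result is the figure the deck rounds to $850 million, and the names are illustrative.

```python
COST_PER_ASSAY = 0.50  # dollars per SNP assay (slide figure)

def study_cost(n_snps: int, n_samples: int) -> float:
    """Total assay cost for genotyping n_snps SNPs in n_samples samples."""
    return n_snps * n_samples * COST_PER_ASSAY

scenarios = {
    "all 1,717,015 common SNPs, 1,000 individuals": study_cost(1_717_015, 1_000),
    "210,000 haplotype-defining SNPs, 1,000 individuals": study_cost(210_000, 1_000),
    "210,000 haplotype-defining SNPs, 2 pools": study_cost(210_000, 2),
}
for name, dollars in scenarios.items():
    print(f"{name}: ${dollars:,.0f}")
# $858,507,500 / $105,000,000 / $210,000
```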
Association Studies Currently Underway at Perlegen
• Genetics of drug response to a highly effective drug, with GlaxoSmithKline
  – A small percentage of patients have an adverse reaction
• Genetics of Type 2 Diabetes, with a large international consortium of researchers
  – Affects 15 million people in the U.S. alone
• Genetics of common traits, with Unilever
  – Improve the effectiveness of beauty products