Resequencing Genomes -- Today!


Upload: pammy98

Post on 10-May-2015


Page 1: Resequencing Genomes -- Today!

04/11/23 Confidential

Perlegen At-A-Glance

• San Francisco Bay Area
• Spun off from Affymetrix, Inc. in March 2001
• 95 employees
  – Approximately half in genetics/biology and half in bioinformatics
• Privately held

Page 2: Resequencing Genomes -- Today!

Using the Human Genome

Credits: Thomas Reid

Page 3: Resequencing Genomes -- Today!

95% of one human genome is now publicly available

One copy of the human genome consists of 3 billion bases:

AGTCCTAGCCTGTGATATAGGGCCCTAGATCA…

Page 4: Resequencing Genomes -- Today!

One copy of the human genome cost $100 million to obtain…

Why were we willing to spend so much money?

Page 5: Resequencing Genomes -- Today!

Variations in DNA sequence affect many aspects of our lives

• eye color
• height
• disease
• personality
• drug response

Inherited traits or phenotypes

Page 6: Resequencing Genomes -- Today!

Any two humans share 99.9% of the same DNA sequence…

Page 7: Resequencing Genomes -- Today!

Traits are influenced to different degrees by genetics and environment

[Figure: genetic contribution versus environmental contribution, with Cystic Fibrosis and AIDS as examples]

Page 8: Resequencing Genomes -- Today!

Most common traits are believed to be about 50-50

[Figure: traits placed between Genetics and Environment]

• Diabetes
• Heart failure
• Schizophrenia
• Rheumatoid arthritis
• Obesity
• Height
• Skin color
• Osteoporosis

Page 9: Resequencing Genomes -- Today!

With knowledge of the genetic component of a trait…

• Diagnostics
• Determine how a patient will respond to a particular drug treatment
• Targets for drug development
• More effective consumer products
• Evaluate the role of lifestyle and environment on the trait

Page 10: Resequencing Genomes -- Today!

DNA is double-stranded and connected by very specific pairing of the four bases A, G, T, C

[Figure: base pairs C with G, and T with A]

DNA can be “unwound” to single-stranded form and then can be “wound” again to double-stranded form based on the specificity of base-pairing, a process called “hybridization”
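The pairing rule on this slide can be sketched in a few lines of Python. This is only an illustration, assuming perfect Watson-Crick pairing (A with T, G with C) and ignoring real-world effects such as temperature or partial mismatches; the function names and the example strand are hypothetical, not from the presentation.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the strand that pairs with `strand`, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def can_hybridize(strand_a, strand_b):
    """True if the two single strands are perfectly complementary (simplified model)."""
    return reverse_complement(strand_a) == strand_b.upper()


if __name__ == "__main__":
    example = "GATTACA"  # arbitrary example strand
    partner = reverse_complement(example)
    print(example, "pairs with", partner)
    print("hybridizes:", can_hybridize(example, partner))
```

In this simplified model, a single changed base is enough to make `can_hybridize` return False, which is the property the DNA chips described later rely on.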

Page 11: Resequencing Genomes -- Today!

Human DNA variation results from errors in DNA replication

Page 12: Resequencing Genomes -- Today!

DNA variations come in different forms…

Single nucleotide polymorphism (SNP)
AGCCTGTCACT → AGCCTATCACT

Deletion
AGCCTGTCACT → AGCCTTCACT

Insertion
AGCCTGTCACT → AGCCTGGTCACT

Variable number tandem repeat (VNTR)
CAGCAGCAG → CAGCAGCAGCAGCAG
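As a rough illustration of the four forms on this slide, the sketch below compares a sample sequence with a reference. It assumes the slide's examples read as AGCCTGTCACT (reference), AGCCTATCACT (SNP), AGCCTTCACT (deletion), and AGCCTGGTCACT (insertion). The function names are hypothetical, and the length-based classification is a deliberate simplification; real variant calling works on aligned reads.

```python
def classify_variant(reference, sample):
    """Classify `sample` against `reference` using only length and mismatches."""
    if len(sample) == len(reference):
        mismatches = [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]
        if not mismatches:
            return "identical"
        if len(mismatches) == 1:
            return "SNP"
        return "multiple substitutions"
    return "deletion" if len(sample) < len(reference) else "insertion"


def count_tandem_repeats(sequence, unit):
    """Count back-to-back copies of `unit` at the start of `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count = 0
    while sequence.startswith(unit, count * len(unit)):
        count += 1
    return count


if __name__ == "__main__":
    reference = "AGCCTGTCACT"
    for sample in ("AGCCTATCACT", "AGCCTTCACT", "AGCCTGGTCACT"):
        print(sample, "->", classify_variant(reference, sample))
    print("CAG repeats:", count_tandem_repeats("CAGCAGCAGCAGCAG", "CAG"))
```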

Page 13: Resequencing Genomes -- Today!

The genetic contribution to a trait may be due to variation in one gene…a Mendelian trait

[Figure: a single gene with variants A and G]

Page 14: Resequencing Genomes -- Today!

Before the Human Genome Project, genes responsible for Mendelian traits were the only genes we could find…

…but it still took a decade or more to find one of these genes

Page 15: Resequencing Genomes -- Today!

Credits: Brandon Brylawski

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

Once the results of the Human Genome Project began to emerge, the number of Mendelian trait genes discovered increased exponentially and the time to discovery decreased

Page 16: Resequencing Genomes -- Today!

There are currently 8309 genes whose variants are associated with a disorder in OMIM…

• Cystic Fibrosis
• Huntington’s Disease
• Familial Breast Cancer
• Severe Combined Immunodeficiency Disorder

…knowledge which is used for diagnostics and preventative therapies, drug development, and gene therapy

Page 17: Resequencing Genomes -- Today!

But Mendelian traits are the minority and the genetic variants responsible for Mendelian disorders are rare in the general population…

How many genetic variants have we found that are responsible for traits and disorders that affect millions of people?

Very few

Page 18: Resequencing Genomes -- Today!

The vast majority of traits are not caused by variation in a single gene and are called complex traits…

• Probably the result of 10-30 genetic changes spread across the genome
• Any single genetic variant may be responsible for only a small contribution to the trait

Page 19: Resequencing Genomes -- Today!

You may not need to have all of the possible genetic changes to get the disease…

An example where 10 genes are involved in a disease…

Variants (green) in any 4 of the 10 genes cause disease.
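To get a feel for how many different genetic routes lead to the same disease in this example, the combinations can be counted directly. The sketch below assumes each of the 10 genes is simply "variant" or "not variant" and reads "any 4" as "at least 4"; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
from math import comb


def combinations_with_at_least(n_genes, threshold):
    """Number of variant/non-variant combinations with `threshold` or more variant genes."""
    return sum(comb(n_genes, k) for k in range(threshold, n_genes + 1))


if __name__ == "__main__":
    n_genes, threshold = 10, 4
    print("ways to pick exactly 4 of 10 genes:", comb(n_genes, threshold))
    print("combinations with at least 4 variant genes:",
          combinations_with_at_least(n_genes, threshold))
    print("all possible combinations:", 2 ** n_genes)
```

Under these assumptions, two affected people can carry entirely different sets of variants, which is part of why no single variant stands out on its own.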

Page 20: Resequencing Genomes -- Today!

What does all that mean?

• The genetic variants responsible for common disease are themselves common in the general population
• These genetic variants are found in both sick and healthy people

This makes associating these variants with the disease extremely difficult and expensive

Page 21: Resequencing Genomes -- Today!

Genetic Association Study

If a DNA variant is associated with a trait of interest, “affecteds” will have a different frequency of that variant than “unaffecteds”

Affected: 0.72 purple, 0.28 green

Unaffected: 0.56 purple, 0.44 green
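One common way to ask whether a frequency difference like this is larger than chance is a chi-square test on the allele counts. The sketch below is an illustration only, not a description of Perlegen's analysis. The sample size of 1,000 chromosomes per group (500 people with two copies each) is an assumption, since this slide gives only frequencies, and the function name is hypothetical.

```python
from math import erfc, sqrt


def allele_chi_square(freq_affected, freq_unaffected, n_affected, n_unaffected):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    freq_*: frequency of one allele (e.g. "purple") in each group
    n_*:    number of chromosomes sampled in each group
    """
    a = freq_affected * n_affected        # purple alleles, affected
    b = n_affected - a                    # green alleles, affected
    c = freq_unaffected * n_unaffected    # purple alleles, unaffected
    d = n_unaffected - c                  # green alleles, unaffected
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))        # survival function for 1 degree of freedom
    return chi2, p_value


if __name__ == "__main__":
    # Assumed: 500 people per group, two chromosomes each.
    chi2, p = allele_chi_square(0.72, 0.56, 1000, 1000)
    print(f"chi-square = {chi2:.2f}, p = {p:.2e}")
```

With smaller groups or a smaller frequency difference the same test quickly loses power, which is why large numbers of people are needed for variants with small effects.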

Page 22: Resequencing Genomes -- Today!

Knowing, with statistical certainty, that a genetic variant with a small effect is associated with a trait requires looking at the DNA of large numbers of people

500 people with the disease (Cases) + 500 people without the disease (Controls) = 1,000 people

Page 23: Resequencing Genomes -- Today!

At $100 million per genome, we certainly cannot sequence the genomes of a thousand people for each trait we are interested in finding the genes for…

…we just need to look at the variants in the genomes of the 1000 people

Page 24: Resequencing Genomes -- Today!

Single Nucleotide Polymorphisms (SNPs)

• SNPs are a frequent form of DNA variation and are scattered randomly across the genome
• Each SNP is characterized by only two bases

Page 25: Resequencing Genomes -- Today!

Genotyping “calls” the two variants that each person carries at one base position in the genome

[Figure: example genotype calls at one SNP with alleles G and A]

But to genotype, you need to know the two base variants and genome position of the SNPs

Page 26: Resequencing Genomes -- Today!

How many SNPs do we need to find across the genome and genotype in the 1000 people to find the genes involved in complex traits?

The average cost of a single SNP genotype for one person is $0.50, or $500 for 1000 people

Page 27: Resequencing Genomes -- Today!

There are ~3 million SNPs between two people: $1.5 billion!
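The $1.5 billion figure follows from the per-genotype price on the previous slide. A minimal sketch of the arithmetic, using only the slide's numbers ($0.50 per genotype, 1000 people, ~3 million SNPs); the function and constant names are hypothetical:

```python
PRICE_PER_GENOTYPE = 0.50  # dollars per SNP per person, from the slide


def genotyping_cost(n_snps, n_people, price=PRICE_PER_GENOTYPE):
    """Total cost of genotyping every SNP in every person individually."""
    return n_snps * n_people * price


if __name__ == "__main__":
    print(f"one SNP, 1000 people:    ${genotyping_cost(1, 1000):,.0f}")
    print(f"3 million SNPs, 1000 people: ${genotyping_cost(3_000_000, 1000):,.0f}")
```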

Page 28: Resequencing Genomes -- Today!

Look only at SNPs in the known functional sequences of the human genome because only functional regions are likely to be associated with a trait

Minimize the cost of finding the SNPs and genotyping the SNPs in a Genetic Association Study

Page 29: Resequencing Genomes -- Today!

Look at a dense set of common SNPs across the whole genome…

• The important changes in DNA may not lie in known functional sequences (which comprise less than 3% of the genome)
• Even if all important changes are in known functional sequences, which do you select for research? (You need to have the correct hypothesis up-front)
• Not all functional sequences have been discovered

Page 30: Resequencing Genomes -- Today!

Discover all the common SNPs by looking at the sequence of 25 copies of the genome from around the world

…but it takes 1 year to sequence one mammalian genome, so that would take 25 years, not to mention the cost!

Page 31: Resequencing Genomes -- Today!

Perlegen came up with a faster and cheaper way to find the common SNPs compared to sequencing, possible only because we had one copy of the human genome already known and technology improvements…

Page 32: Resequencing Genomes -- Today!

Reading Human Genomic Sequence By Using Affymetrix DNA Chips

On a glass chip are synthesized 62,000 consecutive bases of known human genomic sequence in single-stranded form

Page 33: Resequencing Genomes -- Today!

Take another copy of the human genome, label it with a fluorophore, and hybridize it to the chip

[Figure: array spot on a silanized glass or plastic surface, with Cy3- and Cy5-labeled probes or PCR products hybridized by base pairing (G with C, A with T)]

Page 34: Resequencing Genomes -- Today!

Detection of DNA Variation By Using DNA Chips

DNA synthesized on chip:
acttgacatAggctgta
acttgacatCggctgta
acttgacatGggctgta
acttgacatTggctgta

Known genomic sequence:
..TGAACTGTATCCGACAT..

Labeled DNA hybridized to chip:
AAG CTGTATCCGACATTACGT
AAG CTGTACCCGACATTACGT
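The idea on this slide, four probes that differ only at the centre base, can be sketched as follows. This is a simplified model under stated assumptions: a probe "lights up" only when it pairs perfectly with the labeled DNA, position by position as the sequences are written on the slide. Real arrays compare fluorescence intensities rather than exact matches, strand orientation is ignored here, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Position-by-position complement, matching how the slide aligns probe and target."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def make_probes(left_flank, right_flank):
    """Four probes identical except for the centre base (A, C, G, or T)."""
    return {base: left_flank.upper() + base + right_flank.upper() for base in "ACGT"}


def call_bases(probes, labeled_dna):
    """Return the base(s) in `labeled_dna` opposite the centre of a perfectly matching probe."""
    calls = []
    for centre_base, probe in probes.items():
        if complement(probe) == labeled_dna.upper():
            calls.append(COMPLEMENT[centre_base])
    return calls


if __name__ == "__main__":
    probes = make_probes("acttgacat", "ggctgta")
    print("known sequence:", call_bases(probes, "TGAACTGTATCCGACAT"))
    print("variant sample:", call_bases(probes, "TGAACTGTACCCGACAT"))
```

If a person carries both alleles, mixing their DNA would light up two probes, so a fuller version would collect calls across both labeled strands.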

Page 35: Resequencing Genomes -- Today!

How many chips do we have to process to discover SNPs from the 25 genomes?

600,000 chips. At 200 chips processed per day, it would take 8.4 years!

Page 36: Resequencing Genomes -- Today!

What Perlegen was able to do successfully, that had never been done before…

Cover 15 million bases of genomic DNA on one wafer!

5000 wafers to find the SNPs in 25 genomes

Page 37: Resequencing Genomes -- Today!

Perlegen’s Technological Advantage

1 Perlegen technician using 3 wafers in 8 hours = 140+ DNA sequencers running for 24 hours

Page 38: Resequencing Genomes -- Today!

Human Whole-Genome High-Density Oligonucleotide Arrays

Human genome: 3 billion base pairs

A collection of 223 high-density arrays containing more than 10 billion unique oligonucleotides

Page 39: Resequencing Genomes -- Today!

Perlegen finished SNP discovery across the entire human genome for 25 copies of the genome in under 2 years in August 2002

• 1,717,015 common SNPs discovered and confirmed
• Had all the assays developed and working to genotype all the SNPs

Still, that would require $850 million for genotyping 1000 people.

But we discovered something else…

Page 40: Resequencing Genomes -- Today!

SNPs

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

[Figure labels mark the three SNP positions: T/G, G/T, and C/A]

Page 41: Resequencing Genomes -- Today!

SNP Space

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

Page 42: Resequencing Genomes -- Today!

There’s something amazing about SNPs...

SNPs occur in “blocks”!

Page 43: Resequencing Genomes -- Today!

Haplotype Pattern

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

Page 44: Resequencing Genomes -- Today!

The number of haplotype patterns is limited

Possible patterns: 26 SNPs × 2 bases = 2^26

Observed patterns = 7

[Figure: observed patterns grouped into classes 1, 2, 3, 4]

The majority of the patterns fall into only 4 classes, which can be distinguished from each other by only 2 SNPs
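The two numbers on this slide, 2^26 possible patterns and 2 SNPs to tell 4 classes apart, can be illustrated with a small brute-force search. This is a sketch under stated assumptions, not Perlegen's block-finding algorithm: the four 6-SNP patterns below are made up for the example (each SNP coded 0 or 1 for its two bases), and the function name is hypothetical.

```python
from itertools import combinations


def find_tag_snps(patterns):
    """Smallest set of SNP positions whose values differ between every pair of patterns."""
    n_snps = len(patterns[0])
    for size in range(1, n_snps + 1):
        for columns in combinations(range(n_snps), size):
            signatures = {tuple(p[i] for i in columns) for p in patterns}
            if len(signatures) == len(patterns):
                return columns
    return tuple(range(n_snps))  # only reached if some patterns are identical


if __name__ == "__main__":
    print("possible patterns for 26 two-base SNPs:", 2 ** 26)

    # Hypothetical haplotype classes over 6 SNPs (0/1 = the two bases at each SNP).
    classes = ["000000", "001011", "110100", "111111"]
    tags = find_tag_snps(classes)
    print("SNP positions that distinguish all classes:", tags)
```

Brute force is fine for a handful of SNPs per block; it would not scale to whole chromosomes, where a smarter selection method would be needed.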

Page 45: Resequencing Genomes -- Today!

A SNP-Haplotype Map of the Human Genome

• 210,937 SNPs uniquely define haplotypes representing the pattern of DNA variation spanning the human genome
• 2.3 billion bases of genomic DNA sequence are covered in 175,309 haplotype blocks
• 13,000 bases is the average haplotype block size
• 6.5 SNPs is the average number of SNPs per haplotype block

Page 46: Resequencing Genomes -- Today!

The haplotype structure of Chr.21 is available to the public

http://genome-hg8.cse.ucsc.edu/cgi-bin/hgGateway?db=hg8

Page 47: Resequencing Genomes -- Today!

Genotyping only haplotype-defining SNPs reduces the number of bases to be looked at in each individual

1.7 million genotypes/individual → 210,000 genotypes/individual

Page 48: Resequencing Genomes -- Today!

Whole Genome Scanning Approach

Looking across the entire genome in hundreds of people

• Does not require a hypothesis up front
• Does not require placing bets on a few locations
• Will reveal many places in the genome that play a role in the disease or trait

Page 49: Resequencing Genomes -- Today!

Whole Genome Association Methodology

500 “affecteds” and 500 “unaffecteds” = 1000 DNA samples to assay

210,000 SNP assays per sample

210 million SNP assays per association study

$105 million

Page 50: Resequencing Genomes -- Today!

Genetic Association Study

If a DNA variant is associated with a trait of interest, “affecteds” will have a different frequency of that variant than “unaffecteds”

Affected: 0.72 purple, 0.28 green

Unaffected: 0.56 purple, 0.44 green

Page 51: Resequencing Genomes -- Today!

Genetic Association Analysis Using Pooled DNA Samples

SNP 1 (alleles T and G)

One tube containing all 500 DNAs from the “affecteds”: one assay gives 30% T, 70% G

One tube containing all 500 DNAs from the “unaffecteds”: one assay gives 20% T, 80% G
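Pooling works because a single assay on the mixed tube reads out the allele frequency across all chromosomes in the pool, which is the quantity the association test needs. The sketch below illustrates this with made-up genotype counts chosen to reproduce the slide's 30% and 20% figures. The counts and the function name are hypothetical, and equal DNA contribution from each person is assumed.

```python
def pooled_allele_frequency(genotypes, allele):
    """Fraction of all chromosomes in the pool that carry `allele`."""
    total_chromosomes = sum(len(genotype) for genotype in genotypes)
    carrying = sum(genotype.count(allele) for genotype in genotypes)
    return carrying / total_chromosomes


if __name__ == "__main__":
    # Hypothetical pools of 500 people each; every genotype is two letters.
    affecteds = ["TT"] * 50 + ["TG"] * 200 + ["GG"] * 250
    unaffecteds = ["TT"] * 20 + ["TG"] * 160 + ["GG"] * 320

    print("affecteds   T frequency:", pooled_allele_frequency(affecteds, "T"))
    print("unaffecteds T frequency:", pooled_allele_frequency(unaffecteds, "T"))
```

The trade-off is that the pool gives only a group-level frequency; individual genotypes are not recovered.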

Page 52: Resequencing Genomes -- Today!

Whole Genome Association Methodology

All SNP assays per association study using one DNA pool of “affecteds” and one DNA pool of “unaffecteds”

210,000 SNP assays per sample

420,000 SNP assays per association study

$210,000
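The saving from pooling comes entirely from the number of DNA samples that have to be assayed: 2 pools instead of 1000 individuals. A short sketch of that comparison, assuming the $0.50 per-assay price quoted earlier in the deck applies to pooled assays as well (the slides imply this but do not state it); the names are hypothetical.

```python
PRICE_PER_ASSAY = 0.50        # dollars, the per-genotype price quoted earlier
HAPLOTYPE_SNPS = 210_000      # haplotype-defining SNPs assayed per sample


def study_cost(n_dna_samples, n_snps=HAPLOTYPE_SNPS, price=PRICE_PER_ASSAY):
    """Return (number of assays, total cost in dollars) for one association study."""
    assays = n_dna_samples * n_snps
    return assays, assays * price


if __name__ == "__main__":
    individual_assays, individual_cost = study_cost(1000)  # 500 affecteds + 500 unaffecteds
    pooled_assays, pooled_cost = study_cost(2)             # one pool per group

    print(f"individual: {individual_assays:,} assays, ${individual_cost:,.0f}")
    print(f"pooled:     {pooled_assays:,} assays, ${pooled_cost:,.0f}")
    print(f"reduction:  {individual_cost / pooled_cost:.0f}-fold")
```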

Page 53: Resequencing Genomes -- Today!

Association Studies currently underway at Perlegen

• Genetics of drug response to a highly effective drug with GlaxoSmithKline
  – A small percentage of patients have an adverse reaction
• Genetics of Diabetes Type 2 with a large international consortium of researchers
  – Affects 15 million in the U.S. alone
• Genetics of common traits with Unilever
  – Improve effectiveness of beauty products