Bioinformatics: SNPs and haplotypes
Kristel Van Steen, PhD, ScD ([email protected])
Université de Liege - Institut Montefiore, 2008-2009

Page 1: Acknowledgements

Bioinformatics

SNPs and haplotypes

Kristel Van Steen, PhD, ScD

([email protected])

Université de Liege - Institut Montefiore

2008-2009

Page 2: Acknowledgements

Acknowledgements

Parts of these slides have been adapted or taken over from existing course notes and online material:

Practical: Heather Cordell

Slides: Stuart M Brown

Page 3: Acknowledgements

Outline

Practical in R on genetic association analysis

SNPs and Haplotypes

A tour in FBAT

Page 4: Acknowledgements

Genetic Association Analysis in R

Page 5: Acknowledgements

Computer Practical Exercise

Heather Cordell

http://www.staff.ncl.ac.uk/heather.cordell/WTACcasecon2007.html

Using R for:
• Case-control association
• Gene-gene interactions (future class)
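The practical itself is carried out in R. Purely as an illustration of what a case-control association test computes, here is a minimal Python sketch of an allelic chi-square test on a 2x2 table of allele counts. The function names and the counts are hypothetical and are not taken from the practical.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 degree of freedom) for a 2x2 table of allele counts.

    case_counts and control_counts are (allele_A, allele_a) tuples.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def odds_ratio(case_counts, control_counts):
    """Odds of carrying allele A in cases relative to controls."""
    a, b = case_counts
    c, d = control_counts
    return (a * d) / (b * c)

# Hypothetical allele counts (A, a) in 100 cases and 100 controls.
cases = (120, 80)
controls = (90, 110)
chi2, p_value = allelic_chi_square(cases, controls)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, OR = {odds_ratio(cases, controls):.2f}")
```

The allelic test is only one of several possible codings (genotypic and trend tests are common alternatives); it is used here because it reduces to a single 2x2 table.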

Page 6: Acknowledgements

SNPs and Haplotypes

A gentle introduction to relevant issues

Page 7: Acknowledgements

Mutations create Alleles

• Mutations occur randomly throughout the DNA

• Most have no phenotypic effect (non-coding regions, equivalent codons, similar AAs)

• Some damage the function of a protein or regulatory element

• A very few provide an evolutionary advantage

Page 8: Acknowledgements

Human Alleles

• The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks all human mutations with known phenotypes.
• It contains a total of about 2,000 genetic diseases [and another ~11,000 genetic loci with known phenotypes - but not necessarily known gene sequences]
• It is designed for use by physicians:
  • can search by disease name
  • contains summaries from clinical studies

Page 9: Acknowledgements
Page 10: Acknowledgements

Population Genetics

• Chromosome pairs segregate and recombine in every generation.
• Every allele of every gene has its own independent evolutionary history (and future!)
• Frequencies of various alleles differ in different sub-populations of people.

Page 11: Acknowledgements

SNPs

• Single nucleotide polymorphisms (SNPs) are DNA sequence variations occurring when a single nucleotide (A, C, T, G) in the genome is altered.
• The inherited allelic variation must have >1% population frequency.
• SNPs can occur in both coding and non-coding regions, making up 90% of all human genetic variation
• Frequency: roughly one every 100 to 300 bases along the approximately 3-billion-base human genome
• Remark: Some definitions include methylated and deaminated dinucleotides
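The 1% frequency criterion and the density figure can be made concrete with a short Python sketch. The genotype counts below are invented for illustration, and the helper names are hypothetical.

```python
def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of the rarer allele, from genotype counts at a biallelic site."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    return min(freq_A, 1 - freq_A)

def is_snp(maf, threshold=0.01):
    """Apply the >1% population-frequency criterion stated on the slide."""
    return maf > threshold

# Hypothetical genotype counts in a sample of 1000 people.
maf = minor_allele_frequency(n_AA=900, n_Aa=95, n_aa=5)
print(maf, is_snp(maf))

# One SNP every 100 to 300 bases over about 3 billion bases
# corresponds to roughly 10 to 30 million SNP sites.
genome_length = 3_000_000_000
snp_sites_low = genome_length // 300
snp_sites_high = genome_length // 100
print(snp_sites_low, snp_sites_high)
```

A variant seen in only 2 of 2000 sampled chromosomes would have a frequency of 0.1% and would fall below the threshold, so under this definition it would be treated as a rare mutation and not as a SNP.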

Page 12: Acknowledgements

Distribution of SNPs and Power

Page 13: Acknowledgements

SNPs are Very Common

• SNPs are very common in the human population.
• Between any two people, there is an average of one SNP every 1000 bases.
• Most of these have no phenotypic effect
  • only <1% of all human SNPs impact protein function (most lie in non-coding regions)
  • Selection against mis-sense mutations (think about what would happen to dominant lethal mutations?)
• Some are alleles of genes.
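The "one SNP every 1000 bases" figure is a pairwise difference rate between two people. The sketch below shows how such a rate could be computed from two aligned sequences; the sequences are synthetic and the function name is hypothetical.

```python
def differences_per_kb(seq1, seq2):
    """Number of mismatching positions per 1000 aligned bases."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n_diffs = sum(a != b for a, b in zip(seq1, seq2))
    return 1000 * n_diffs / len(seq1)

# Synthetic 2000-base sequence and a copy with two substitutions.
seq1 = "ACGT" * 500
seq2 = seq1[:10] + "A" + seq1[11:1500] + "C" + seq1[1501:]
rate = differences_per_kb(seq1, seq2)
print(rate)

# At that rate, two genomes of about 3 billion bases differ at about 3 million sites.
expected_differences = 3_000_000_000 // 1000
```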

Page 14: Acknowledgements

Why are SNPs Important?

• Alleles of health related genes
• Genetic Markers that are linked to every gene (and to non-transcribed loci that may also affect health)
• Fast, cheap, accurate genotypes
• Population diversity & history
• Genetic Association studies in populations
• Pharmacogenomics

Page 15: Acknowledgements

Genome Sequencing finds SNPs

• The Human Genome Project involves sequencing DNA cloned from a number of different people.
• Even in a library made from one person’s DNA, the homologous chromosomes have SNPs
• This inevitably leads to the discovery of SNPs - any single base sequence difference

Page 16: Acknowledgements
Page 17: Acknowledgements

We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exon (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.

Page 18: Acknowledgements

GenBank has a dbSNP

• “As of Mar. 2007, dbSNP has submissions for 31,035,607 human SNPs”
• It is possible to search dbSNP by BLAST comparisons to a target sequence

Page 19: Acknowledgements

>gnl|dbSNP|rs1042574_allelePos=51 total len = 101 |taxid = 9606|snpClass = 1
Length = 101

Score = 149 bits (75), Expect = 3e-33
Identities = 79/81 (97%)
Strand = Plus / Plus

Query: 1489 ccctcttccctgacctcccaactctaaagccaagcactttatatttttctcttagatatt 1548
            ||||||||||||||||||||||||||||||||||||||||||||||| || |||||||||
Sbjct: 1    ccctcttccctgacctcccaactctaaagccaagcactttatattttcctyttagatatt 60

Query: 1549 cactaaggacttaaaataaaa 1569
            |||||||||||||||||||||
Sbjct: 61   cactaaggacttaaaataaaa 81

If a matching SNP is found, then it can be directly located on the Genome map
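To read such a hit, it helps to list where the query and the dbSNP record disagree. The sketch below does this for the first 60 aligned bases shown above; the helper name is hypothetical, and the table covers only the two-base IUPAC ambiguity codes.

```python
# Two-base IUPAC ambiguity codes (lower case, as in the BLAST output).
IUPAC = {"r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac"}

query = "ccctcttccctgacctcccaactctaaagccaagcactttatatttttctcttagatatt"
sbjct = "ccctcttccctgacctcccaactctaaagccaagcactttatattttcctyttagatatt"

def mismatches(query, sbjct):
    """Return (1-based position, query base, subject symbol) where the strings differ."""
    return [(i + 1, q, s) for i, (q, s) in enumerate(zip(query, sbjct)) if q != s]

diffs = mismatches(query, sbjct)
for pos, q_base, s_symbol in diffs:
    # A query base matches an ambiguity code if it is one of the bases the code allows.
    allowed = IUPAC.get(s_symbol, s_symbol)
    status = "compatible with SNP alleles" if q_base in allowed else "true mismatch"
    print(pos, q_base, s_symbol, status)
```

The two differences account for the 79/81 identities. The one at subject position 51 coincides with `allelePos=51` in the header, where `y` stands for C or T, so the query simply carries one allele of the SNP.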

Page 20: Acknowledgements

Linkage

• Meiosis (sexual cell division) involves a process of crossing over, which gives new combinations of alleles
• Genes that are located close to each other on the chromosome rarely show recombination of alleles

Page 21: Acknowledgements

HapMap Project

• The HapMap Project tests linkage between SNPs in various sub-populations.
• For a group of linked SNPs recombination may be rare over tens of thousands of bases
• A few "tag SNPs" can be used to identify genotypes for groups of linked SNPs
• Makes it possible to survey the whole genome with fewer markers (1/3-1/10th)

Page 22: Acknowledgements

Haplotype

• Linkage is common in the human population, particularly in genetically isolated sub-populations.
• A group of alleles for neighboring genes on a segment of a chromosome is very often inherited together.
• Such a combination of linked alleles is known as a haplotype.
• When alleles at linked loci occur together in a population more often than expected by chance, this is called linkage disequilibrium (LD).
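Linkage disequilibrium between two biallelic loci is usually summarised by D, D' and r². The slides do not give these formulas, so the sketch below uses the standard textbook definitions with made-up frequencies.

```python
def ld_measures(p_AB, p_A, p_B):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_AB is the frequency of the haplotype carrying alleles A and B;
    p_A and p_B are the frequencies of the two alleles.
    """
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2

# Hypothetical frequencies: under independence p_AB would be 0.25, not 0.40.
D, d_prime, r2 = ld_measures(p_AB=0.40, p_A=0.5, p_B=0.5)
print(D, d_prime, r2)
```

D equal to zero means the alleles are independent (linkage equilibrium); r² equal to 1 means one locus perfectly predicts the other.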

Page 23: Acknowledgements

Haplotype Map of the Human Genome

Goals:

• Define patterns of genetic variation across human genome
• Guide selection of SNPs efficiently to “tag” common variants
• Public release of all data (assays, genotypes)

Phase I: 1.3 M markers in 269 people
Phase II: +2.8 M markers in 270 people


Page 24: Acknowledgements

HapMap Samples

• 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)
• 90 individuals (30 trios) of European descent from Utah (CEU)
• 45 Han Chinese individuals from Beijing (CHB)
• 45 Japanese individuals from Tokyo (JPT)
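One practical benefit of sampling parent-parent-offspring trios is that genotypes can be checked for Mendelian consistency. The following sketch is an illustrative assumption about how such a check could look; it is not a description of the HapMap quality-control pipeline.

```python
def mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one paternal and one maternal allele.

    Genotypes are two-character strings such as 'AG'; allele order is ignored.
    """
    return any(
        sorted(child) == sorted(f + m)
        for f in father
        for m in mother
    )

print(mendelian_consistent("AG", "GG", "AG"))  # child inherits A from father, G from mother
print(mendelian_consistent("AA", "AA", "AG"))  # G cannot come from either parent
```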

Page 25: Acknowledgements

Recombination hotspots: widespread - LD structure

7q21

Page 26: Acknowledgements

Common Haplotypes

• For a single locus in a population, 55 percent of people may have one version of a haplotype, 30 percent may have another, 8 percent may have a third, and the rest may have a variety of less common haplotypes.
• These haplotype blocks may contain 5-20 SNPs
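A frequency breakdown like the 55 / 30 / 8 percent example is simply a tally of observed haplotypes. In the sketch below, the haplotype strings are invented and phased haplotypes are assumed to be available already.

```python
from collections import Counter

def haplotype_frequencies(haplotypes):
    """Relative frequency of each distinct haplotype, most common first."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: n / total for hap, n in counts.most_common()}

# Invented sample of 100 phased haplotypes over a 4-SNP block.
sample = ["ACGT"] * 55 + ["ATGT"] * 30 + ["GCGA"] * 8 + ["ACGA"] * 4 + ["GTGT"] * 3
freqs = haplotype_frequencies(sample)
for hap, freq in freqs.items():
    print(hap, freq)
```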

Page 27: Acknowledgements

Common Haplotypes

• All of these haplotypes can be identified by genotyping 1-3 "tag SNPs"
• Tag SNPs that contain most of the information about the patterns of human genetic variation are estimated to be about 300,000 to 600,000, which is far fewer than the 10 million common SNPs.
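The idea that a handful of tag SNPs identifies every common haplotype in a block can be sketched as a search for the smallest set of SNP positions that tells all haplotypes apart. This brute-force version is a toy under simplifying assumptions: real tag-SNP selection typically works from population LD statistics such as r², and the haplotypes here are invented.

```python
from itertools import combinations

def find_tag_snps(haplotypes):
    """Smallest set of SNP positions whose alleles distinguish every distinct haplotype."""
    n_snps = len(haplotypes[0])
    distinct = set(haplotypes)
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            signatures = {tuple(h[i] for i in positions) for h in distinct}
            if len(signatures) == len(distinct):
                return positions
    return tuple(range(n_snps))

# Invented block of 5 SNPs: positions 0, 1 and 3 always vary together,
# as do positions 2 and 4, so two tag SNPs are enough.
block = ["ACGTC", "ACATT", "GTGCC", "GTACT"]
tags = find_tag_snps(block)
print(tags)
```

The search is exponential in the number of SNPs, which is acceptable for a block of 5-20 SNPs but not for genome-wide selection.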

Page 28: Acknowledgements

Applications of HapMap

• Pick better SNPs for genotyping study
  • Choose SNPs with high heterozygosity in target population
  • Whole genome coverage with reduced set of "tag SNPs" (capture all "common variants")
• Interpret genotyping results
  • What genes are in LD with this SNP?
  • What coding variants and putative functional variants are in LD with this SNP?
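Choosing SNPs with high heterozygosity can be illustrated by ranking candidate markers by their expected heterozygosity, 2p(1 - p), which assumes Hardy-Weinberg proportions. The SNP names and allele frequencies below are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected fraction of heterozygotes at a biallelic SNP under Hardy-Weinberg proportions."""
    return 2 * p * (1 - p)

# Hypothetical allele frequencies in the target population.
allele_freqs = {"snp_a": 0.05, "snp_b": 0.50, "snp_c": 0.30}
ranked = sorted(
    allele_freqs,
    key=lambda snp: expected_heterozygosity(allele_freqs[snp]),
    reverse=True,
)
print(ranked)
```

Heterozygosity peaks at an allele frequency of 0.5, so markers with intermediate frequencies in the target population are the most informative.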

Page 29: Acknowledgements

Example: Complement Factor H - AMD

rs380390

Page 30: Acknowledgements

SNP Testing

Genotyping
• SNPs are permanent features of genomic DNA
• May be homozygous or heterozygous
• Many different technologies are available

Page 31: Acknowledgements

Genotyping Technologies

• Sequencing (whole genome or targeted)
• PCR (allele specific primers)
• Oligonucleotide ligation
• Primer extension (incorporate labeled nucleotides)
• Hybridization (microarray)

Page 32: Acknowledgements

TaqMan - rtPCR

• 4 oligos must be designed and tested for each SNP
• Fast & cheap for lots of samples

Page 33: Acknowledgements

Primer Extension

Page 34: Acknowledgements

Oligonucleotide Ligation (ABI)

can multiplex 48 SNPs

Page 35: Acknowledgements

Preliminary data from Affy 10K SNP

Page 36: Acknowledgements

Microarrays

• Screening large numbers of SNP markers on a sample of genomic DNA is one highly promising application for microarray technology.
• Many other “high-throughput” SNP genotyping technologies are under development.
• Affymetrix 1 million SNP product on sale now!

Page 37: Acknowledgements

Comparison of Methods?

• Array-based methods can cover the whole genome
• PCR (& variants) are cheaper for defined numbers of SNPs on lots of samples
• Whole genome: may be too much data
  • false positives
  • privacy concerns
• Whole genome may work for discovery research, but clinical applications favor targeted assays
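The false-positive concern with whole-genome scans is a multiple-testing effect. The slide does not name a correction method, so the Bonferroni threshold below is only a familiar illustration, applied to an array with one million SNPs.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of null markers passing an uncorrected threshold alpha."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

n_snps = 1_000_000
false_hits = expected_false_positives(0.05, n_snps)
threshold = bonferroni_threshold(0.05, n_snps)
print(false_hits, threshold)
```

Testing one million markers at the usual 0.05 level would flag about 50,000 markers by chance alone, which is one reason targeted assays with few markers are easier to interpret clinically.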

Page 38: Acknowledgements

Pharmacogenomics

The use of DNA sequence information to measure and predict the reaction of individuals to drugs.

• Personalized drugs
• Faster clinical trials
• Fewer drug side effects

Page 39: Acknowledgements

Some Gene Products Interact with Drugs

• There are proteins that chemically activate or inactivate drugs.
• Other proteins can directly enhance or block a drug's activity.
• There are also genes that control side effects

Page 40: Acknowledgements

Example

10% of African Americans have polymorphic alleles of Glucose-6-phosphate dehydrogenase that lead to haemolytic anemia when they are given the anti-malarial drug primaquine.

Page 41: Acknowledgements

Collect Drug Response Data

• These drug response phenotypes are associated with a set of specific gene alleles.
• Identify populations of people who show specific responses to a drug.
• In early clinical trials, it is possible to identify people who react well and react poorly.

Page 42: Acknowledgements

Make Genetic Profiles

• Scan these populations with a large number of SNP markers.
• Find markers linked to drug response phenotypes.
• It is interesting, but not necessary, to identify the exact genes involved.
• Can work with “associated populations,” does not require detailed information on disease in family history (pedigree).

Page 43: Acknowledgements

Huge Database Problem

• Physicians collect tons of data
  • patient age, sex, weight, blood pressure, family disease history, date of symptom onset
  • Cancer data: tumor size, location, stage, etc.
  • Data specific to each type of disease
• Now integrate thousands (or 100K’s) of SNPs that are correlated with some of these clinical factors in complex relationships

Page 44: Acknowledgements

Use the Profiles

• Genetic profiles of new patients can then be used to prescribe drugs more effectively & avoid adverse reactions.
• Can also speed clinical trials by testing on those who are likely to respond well.
• Can "rescue" drugs that don't work well on everybody, or that have bad side effects on a few.

Page 45: Acknowledgements

Real World Applications

• Most of the major pharmaceutical companies are currently collecting pharmacogenomic data in their clinical trials.
• Data is yet to be published.
• Genetic indications for drug use are becoming available.
• Plan to sell the drug with the gene test

Page 46: Acknowledgements

Multi-locus SNP Profiles

• There will be a few hundred to a few thousand SNPs linked to medically important alleles in the next ~10 years.
• Haplotypes will reduce the number that need to be screened (one SNP gives information about a group of linked genes)
• Some genes will turn out to be involved in many important pathways

Page 47: Acknowledgements

Will People Want This Information??

• Genetic determinism and possible discrimination.
• Even a simple test to see what drug you should take could reveal information about your risk of cancer or heart disease.

Page 48: Acknowledgements

A tour in FBAT testing

Page 49: Acknowledgements

A tour in Python

Page 50: Acknowledgements

Homework Assignment 4 (R)

Check website for exercise and supplementary info: due 28 Oct

Page 51: Acknowledgements

Homework Assignment 6 (FBAT)

Check website: due 4 Nov

Page 52: Acknowledgements
Page 53: Acknowledgements