genotyping, linkage mapping and binary data

Post on 20-Jun-2015


Genotyping, Linkage Mapping and Binary Data

Mohamed Atia Omar, Ph.D.

Genome Mapping Research Dept. – AGERI – ARC

FAO Training, 2014, Egypt

Genotyping

Overview

What is genotyping?

The analysis of DNA-sequence variation

Genotype = the genetic constitution of an individual

How much biodiversity?

1.7–2.0 million described species

Estimates of up to 10 million species

Important Terms

Variation: Any nucleotide change in the genome

Rare polymorphism: Variation found in < 1% of the population

Polymorphism: Variation found in ≥ 1% of the population

Locus: Chromosomal location of a gene

Allele: Alternative form of a gene or DNA sequence at a specific chromosomal location (locus)

Heterozygous: The two alleles differ for the feature of interest

Homozygous: The two alleles are identical for the feature of interest

Hemizygous: Only one allele exists (e.g., X-linked genes in males)
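As a sketch of how these terms could be applied to genotype data, the Python snippet below labels a variant as a polymorphism or a rare polymorphism from allele counts, and describes the zygosity of a genotype. It assumes the 1% cutoff is applied to the frequency of the least common allele; the function names `classify_variant` and `zygosity` and the counts are hypothetical.

```python
def classify_variant(allele_counts: dict[str, int]) -> str:
    """Label a variant by the frequency of its least common observed allele.

    Assumption: the 1% cutoff is applied to that minor-allele frequency.
    """
    observed = [count for count in allele_counts.values() if count > 0]
    if len(observed) < 2:
        return "monomorphic"  # only one allele seen: no variation at this site
    minor_freq = min(observed) / sum(observed)
    return "polymorphism" if minor_freq >= 0.01 else "rare polymorphism"


def zygosity(genotype: tuple[str, ...]) -> str:
    """Describe a genotype given as a tuple of the alleles an individual carries."""
    if len(genotype) == 1:
        return "hemizygous"  # a single copy, e.g. an X-linked locus in a male
    if len(genotype) != 2:
        raise ValueError("expected one or two alleles")
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


print(classify_variant({"A": 980, "G": 20}))  # polymorphism (minor allele at 2%)
print(classify_variant({"A": 995, "G": 5}))   # rare polymorphism (minor allele at 0.5%)
print(zygosity(("A", "G")))                   # heterozygous
```

A genotype with a single allele, such as `("A",)`, stands for a hemizygous locus.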

What are the Types of Mutations / Polymorphisms to be Genotyped?

There are six major classes of genetic variation:

1. Single base changes

2. Simple di-, tri-, tetranucleotide repeats

3. Small insertions or deletions

4. Larger, tandem repeats

5. Multi-gene (Megabase) duplication (CNV)

6. Complex rearrangements

Classes of Mutation

An example of one simple question:

How much variation is there?

What are the most Informative Classes for Genotyping Studies?

Polymorphism type | Nickname | Heterozygosity
1. Single base changes | SNP (single nucleotide polymorphism) | 1–50%
2. Simple di-, tri-, tetranucleotide repeats | STR (short tandem repeat) | 50–90%
3. Small insertions or deletions | Indel (insertion or deletion) | 1–50%
4. Larger tandem repeats | VNTR (variable number of tandem repeats) | 50–90%
5. Multi-gene (megabase) duplications | CNV (copy number variation) | 1–50%
6. Complex rearrangements | - | 1–50%
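The heterozygosity ranges in the table follow largely from how many alleles a marker has. A minimal sketch, assuming Hardy-Weinberg proportions, computes expected heterozygosity H = 1 - Σ p_i² (p_i = allele frequencies): a biallelic marker such as a SNP can never exceed H = 0.5, while a multi-allelic STR can approach 0.9. The helper name and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity under Hardy-Weinberg proportions: H = 1 - sum(p_i^2)."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Biallelic SNP: H peaks at 0.5 when both alleles are equally common.
print(expected_heterozygosity([0.5, 0.5]))               # 0.5
# Biallelic SNP with a rare allele: H is close to 0.
print(round(expected_heterozygosity([0.99, 0.01]), 4))   # 0.0198
# Hypothetical STR with 10 equally common alleles.
print(round(expected_heterozygosity([0.1] * 10), 3))     # 0.9
```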

How many loci should be assayed?

Two strategies for selecting markers are possible:

• Select a few highly informative markers

• Select numerous, poorly informative markers randomly distributed within the genome

To scan whole genomes…

[Figure: setting up the reactions, "not like this… but like this": a single microcentrifuge tube vs. 96-well plates, 384-well plates and the Affymetrix GeneChip]

[Figure: Applications enabled by HTP (high-throughput) genotyping, placed by number of genotyped loci vs. number of genotyped individuals (axis ticks from 10 to 1,000,000): GWAS (genome-wide association studies); validation and candidate gene association; plant and animal breeding for selected traits; candidate region fine mapping; fingerprinting, whole genome scans; diagnostics]

Applications enabled by HTP genotyping: diagnostics, MAS, disease-related genes, domestication traits, bar coding, industrial protection of genotypes

High-Throughput Genotyping Techniques

[Figure: high-throughput genotyping platforms placed by number of genotyped loci vs. number of genotyped individuals. Illumina: GoldenGate assay, VeraCode, Infinium BeadChips, iSelect, High-Density 1M-Duo chip. AB: SNPlex, GenPlex, TaqMan, OpenArrays. Sequenom: iPLEX Gold. Affymetrix: Targeted GeneChips, Genome-Wide Human SNP Array 6.0. Also shown: Pyroseq, SNaPshot, Invader.]

Genome-Wide Association Studies

Two main suppliers of arrays for genome-wide association (GWA): Illumina and Affymetrix

5 Basic Methodologies:

1) Hybridization

– Microarrays

– TaqMan, Molecular Beacons

2) Allele-specific PCR

– FRET

– Intercalating Dyes

3) Primer Extension

– MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry)

– SNaPshot (single nucleotide primer extension)

4) Ligation

– Padlock Probes

– Rolling Circle Amplification

5) Endonuclease Cleavage

– RFLP

– PIRA/RFL

RFLPs (Based on Endonuclease Cleavage)

Differences in DNA sequence create or abolish recognition sequences, and hence DNA cleavage sites, for specific restriction enzymes

Two different alleles will produce different fragment patterns when cut with the same restriction enzyme because of differences in their DNA sequence
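To illustrate, the sketch below "digests" two hypothetical alleles that differ by one base inside an EcoRI site (GAATTC, cut between G and A). The sequences and the `digest` helper are made up for illustration; only the EcoRI recognition site and cut position are real. Only one strand is modelled, so sticky ends are ignored.

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts G^AATTC


def digest(seq, site=ECORI_SITE, cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is where the enzyme cuts within the site (1 = after the G of GAATTC).
    """
    # A lookahead finds every occurrence, including overlapping ones.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    cuts = [s + cut_offset for s in starts]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


flank = "ATGC" * 5                      # hypothetical 20 bp flanking sequence
allele_1 = flank + "GAATTC" + flank     # site present
allele_2 = flank + "GAGTTC" + flank     # SNP (A -> G) destroys the site
print(digest(allele_1))  # [21, 25]
print(digest(allele_2))  # [46]
```

The single base change turns two fragments into one uncut fragment, which is the band-pattern difference an RFLP gel would show.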

Microarray (Based on Hybridization)

Purpose: multiple simultaneous measurements by hybridization of a labeled probe

DNA elements may be:

Oligonucleotides

cDNAs

Large insert genomic clones

Microarray technologies: DNA microarrays

Ordered arrangement of multiple sets of DNA on solid support

Microarray chip

Affymetrix 100k chip set

Entire genome with 100 000 SNPs (low density).

Affymetrix 500k chip (SNP array 5.0)

Entire genome with 500 000 SNPs (high density)

Affymetrix 1M chip (SNP array 6.0)

Entire genome with 1 000 000 SNPs (very high density)

Organization of a DNA microarray

[Figure: layout of a 1.28 cm microarray chip]

Hybridization of a labeled probe to the microarray

Detection of hybridization on microarray

[Figure: excitation of the hybridized label by light from a laser]

Hybridization intensities on DNA microarray following laser scanning

[Figure: relative signal of allele A for each genotype: BB (0), AB (0.5), AA (1)]
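A minimal sketch of turning the two probe intensities into a genotype call, assuming the relative A signal A / (A + B) is compared with fixed cutoffs. The thresholds, the `min_signal` cutoff and the function name are hypothetical; production calling software generally clusters many samples rather than applying fixed cutoffs.

```python
def call_genotype(intensity_a, intensity_b, low=0.25, high=0.75, min_signal=100.0):
    """Call a biallelic SNP genotype from A- and B-probe intensities.

    The relative A signal r = A / (A + B) is about 1 for AA, 0.5 for AB and 0 for BB.
    low, high and min_signal are hypothetical cutoffs chosen for illustration.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "no call"  # too little signal to trust either probe
    r = intensity_a / total
    if r >= high:
        return "AA"
    if r <= low:
        return "BB"
    return "AB"


print(call_genotype(950, 50))   # AA (r = 0.95)
print(call_genotype(480, 520))  # AB (r = 0.48)
print(call_genotype(30, 900))   # BB (r is about 0.03)
```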

SNPs

Single Nucleotide Polymorphisms

Change one nucleotide

Insert

Delete

Replace it with a different nucleotide

Many have no phenotypic effect

Some can disrupt or affect gene function

SNP genotyping methods

over 100 different approaches

Ideal SNP genotyping platform:

high-throughput capacity

simple assay design

robust

affordable price

automated genotype calling

accurate and reliable results

Overview of SNP array technology

A little more on SNPs

Most SNPs have only two alleles, so their scoring is easy to automate

Becoming extremely popular

Typing methods: sequencing, restriction site, hybridization

Linkage Mapping

Overview

Types of Maps

Physical Maps

Complete or partially sequenced organisms

Cytogenetic Maps

Breakpoints in disease

Direct binding of probes to chromosome

Genetic Linkage Maps

Markers

What happens in meiosis…

Leads to formation of haploid gametes from diploid cells

Assortment of genetic loci

Recombination or crossover

What is Linkage?

Linkage is defined genetically: the failure of two genes to assort independently.

Linkage occurs when two genes are close to each other on the same chromosome.

Two genes on the same chromosome are called syntenic, whether or not they are linked.

Linked genes are syntenic, but syntenic genes are not always linked. Genes far apart on the same chromosome assort independently: they are not linked.

The degree of linkage is measured by the frequency of crossing over between the two genes.

Crossing over occurs in prophase I of meiosis, when non-sister chromatids of homologous chromosomes break at corresponding locations and rejoin with each other.

Applications/Uses of Linkage Maps

Studying genome structure, organization and evolution.

Estimation of gene effects of important agronomic traits.

Tagging genes of interest to facilitate marker-assisted selection (MAS) programs.

Map-based cloning

Identifying genes responsible for traits in plants or animals, e.g.:

Disease resistance

Meat or milk production, etc.

Genetic Linkage Mapping Steps

Development of The Mapping Population

Genotyping of the Mapping Population (selection of suitable molecular markers, MM).

Linkage Analysis

Map Construction

QTL Identification (in the case of QTL mapping)

Marker-Assisted Selection.

Development of The Mapping Population

Linkage analysis

Linkage: alleles from two loci segregate together in a family.

Recombination fraction (θ): the probability that a recombination separates a marker from a susceptibility locus in a given meiosis.

θ = 0.5: no linkage; θ < 0.5: linked

Reasons why alleles at different loci may not assort independently:

1. Chance

2. Preferential segregation (nonrandom segregation of non-homologous chromosomes): hinted at but not shown in humans

3. Linkage: the presence of loci measurably close together on the same chromosome.

Types of Linkage Analysis

• Parametric LOD score

• Haseman-Elston sib-pair

• Affected sib-pair and affected relative pair

• Affected pedigree member method

• Variance components method

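Recombination frequency

$$\theta = \frac{\text{total recombinants}}{\text{total recombinants} + \text{total non-recombinants}}$$

[Figure: gametes from a parent carrying A B on one homologue and a b on the other]

100% non-recombinant → θ = 0

99% non-recombinant and 1% recombinant → θ = 0.01

90% non-recombinant and 10% recombinant → θ = 0.1

50% non-recombinant and 50% recombinant → θ = 0.5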
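The formula can be applied directly to gamete (or test-cross progeny) counts. The sketch below assumes a parent carrying AB on one homologue and ab on the other, so AB and ab gametes are parental and Ab and aB are recombinant; the counts and the function name are hypothetical.

```python
def recombination_fraction(gametes, parental=("AB", "ab")):
    """theta = recombinants / (recombinants + non-recombinants).

    gametes maps each gamete haplotype (e.g. "Ab") to the number observed;
    any haplotype not listed in parental is counted as recombinant.
    """
    total = sum(gametes.values())
    if total == 0:
        raise ValueError("no gametes counted")
    recombinants = sum(n for hap, n in gametes.items() if hap not in parental)
    return recombinants / total


# Hypothetical counts: 90% non-recombinant and 10% recombinant gametes.
counts = {"AB": 450, "ab": 450, "Ab": 50, "aB": 50}
print(recombination_fraction(counts))  # 0.1
```

For a parent in trans (Ab/aB), pass `parental=("Ab", "aB")`, so that AB and ab become the recombinant classes.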

In a double heterozygote:

Cis configuration = mutant alleles of both genes are on the same chromosome = ab/AB

Trans configuration = mutant alleles are on different homologues of the same chromosome = Ab/aB

Genes with recombination frequencies less than 50 percent are on the same chromosome (= linked)

Linkage group = all known genes on a chromosome

Two genes that undergo independent assortment have a recombination frequency of 50 percent and are located on nonhomologous chromosomes or far apart on the same chromosome (= unlinked)

Recombination

Recombination between linked genes occurs at the same frequency whether alleles are in cis or trans configuration

Recombination frequency is specific for a particular pair of genes

Recombination frequency increases with increasing distances between genes

No matter how far apart two genes may be, the maximum frequency of recombination between any two genes is 50 percent.

• Cross-over frequencies can be converted into map units.

• Example: a 5% cross-over frequency equals 5 map units.

– Gene A and gene B cross over 6.0 percent of the time

– Gene B and gene C cross over 12.5 percent of the time

– Gene A and gene C cross over 18.5 percent of the time
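Because map units are approximately additive over short intervals, the pair with the largest distance marks the two outer genes. A sketch using the three distances above (the helper name is hypothetical): A-C = 18.5 = 6.0 + 12.5, so B lies between A and C.

```python
def order_three_genes(dist):
    """Order three genes from pairwise map distances (map units, cM).

    The pair with the largest distance is taken as the two outer genes.
    Also returns outer - (inner1 + inner2): a negative gap hints at double
    crossovers that the outer pair fails to detect.
    """
    (left, right), outer = max(dist.items(), key=lambda item: item[1])
    genes = {gene for pair in dist for gene in pair}
    (middle,) = genes - {left, right}

    def d(x, y):
        # Distances are symmetric, so look the pair up in either orientation.
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    gap = outer - (d(left, middle) + d(middle, right))
    return (left, middle, right), gap


distances = {("A", "B"): 6.0, ("B", "C"): 12.5, ("A", "C"): 18.5}
print(order_three_genes(distances))  # (('A', 'B', 'C'), 0.0)
```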

Lod scores

1 cM ≈ 1 Mb (a rough average; the ratio varies between organisms and chromosome regions)

1 Mb = 1,000 kb

1 kb = 1,000 bp

So 1 cM ≈ 1,000,000 bp


Genetic Mapping

The map distance between two genes, in morgans, equals one half the average number of crossovers in that region per meiotic cell (multiply by 100 for cM)

The recombination frequency between two genes indicates how much recombination is actually observed in a particular experiment; it measures observed recombination, whereas map distance reflects the underlying crossovers

Over an interval so short that multiple crossovers are precluded (~ 10 percent recombination or less), the map distance equals the recombination frequency because all crossovers result in recombinant gametes.

Genetic map = linkage map = chromosome map


Gene Mapping: Crossing Over

Crossovers which occur outside the region between two genes will not alter their arrangement

The result of double crossovers between two genes is indistinguishable from independent assortment of the genes

Crossovers involving three pairs of alleles specify gene order = linear sequence of genes


Genetic vs. Physical Distance

Map distances based on recombination frequencies are not a direct measurement of physical distance along a chromosome

Recombination “hot spots” cause map distances to overestimate physical length

Low recombination rates in heterochromatin and centromeres cause map distances to underestimate actual physical length

Gene Mapping

Mapping function: the relation between genetic map distance and the frequency of recombination

Chromosome interference: crossovers in one region decrease the probability of a second crossover close by

Coefficient of coincidence = observed number of double recombinants divided by the expected number

Interference = 1-Coefficient of coincidence
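A worked sketch of these two formulas, assuming that without interference the two adjacent intervals recombine independently, so expected double recombinants = RF(A-B) × RF(B-C) × number of progeny. The recombination frequencies, counts and function name are hypothetical.

```python
def coincidence_and_interference(rf_1, rf_2, observed_doubles, progeny):
    """Coefficient of coincidence and interference for two adjacent intervals.

    Expected double recombinants assume the two intervals recombine independently:
    expected = rf_1 * rf_2 * progeny.
    """
    expected_doubles = rf_1 * rf_2 * progeny
    if expected_doubles == 0:
        raise ValueError("no double recombinants are expected")
    coc = observed_doubles / expected_doubles
    return coc, 1.0 - coc


# Hypothetical three-point cross: RF 0.1 and 0.2, 1000 progeny, 5 double recombinants.
coc, interference = coincidence_and_interference(0.1, 0.2, 5, 1000)
print(round(coc, 2), round(interference, 2))  # 0.25 0.75
```

An interference of 1 means no double recombinants were observed (complete interference); 0 means they occurred as often as expected by chance.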

Genetic distance

Genetic distance = the genetic length over which, on average, one crossover occurs in 1% of meiotic products (gametes). This distance is expressed in centimorgans (cM).

1 centimorgan = 0.01 recombinants (1% recombination) ≈ 1 Mb of physical distance on average (assuming that the recombination frequency is uniform along the chromosomes)

Because double recombinants occur more often the further apart two loci are, the frequency of recombination does not increase proportionately with distance.
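A mapping function (the relation between map distance and recombination frequency) corrects for this. The sketch below uses two standard ones, Haldane's, which assumes no interference, and Kosambi's, which allows for some interference; the slides do not name a specific function, so this choice is an assumption. Distances are in cM and r is the recombination fraction.

```python
import math


def haldane_cm(r):
    """Haldane map distance in cM from recombination fraction r (assumes no interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse of haldane_cm: recombination fraction expected at d_cm centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for some crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.1, 0.3, 0.4):
    print(r, round(haldane_cm(r), 2), round(kosambi_cm(r), 2))
```

At r = 0.01 both functions give about 1 cM, matching the rule that map distance equals recombination frequency over short intervals; at r = 0.1 they give about 11.16 and 10.14 cM, and at r = 0.4 Haldane already gives about 80.5 cM.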

Linkage related Concepts

Interference - A crossover in one region usually decreases the probability of a crossover in an adjacent region.

CentiMorgan (cM) - 1 cM is the distance between genes for which the recombination frequency is 1%.

LOD score - the base-10 logarithm of the odds of linkage versus no linkage; used to test for linkage and to estimate the distance between genes.
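A sketch of the LOD score for the simplest case, assuming phase-known, fully informative meioses (e.g., a backcross) in which each offspring is scored as recombinant (R) or non-recombinant (N): LOD(θ) = log10[θ^R (1 - θ)^N / 0.5^(R+N)], maximized at θ = R / (R + N). A LOD of 3 or more (odds of 1000:1) is the conventional threshold for declaring linkage. The counts and function names are hypothetical.

```python
import math


def lod(theta, recombinants, non_recombinants):
    """LOD score: log10 of L(theta) / L(0.5) for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if non_recombinants:
        log_linked += non_recombinants * math.log10(1.0 - theta)
    log_unlinked = (recombinants + non_recombinants) * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants, non_recombinants):
    """Return (theta_hat, LOD at theta_hat); theta_hat is capped at 0.5."""
    n = recombinants + non_recombinants
    if n == 0:
        raise ValueError("no informative meioses")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod(theta_hat, recombinants, non_recombinants)


theta_hat, score = max_lod(2, 18)
print(theta_hat, round(score, 2))  # 0.1 3.2
```

With 2 recombinants among 20 meioses, the estimate is θ = 0.1 with a LOD just above 3.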

Linkage vs. Association

Linkage analyses look for a relationship between a marker and disease within a family (could be a different marker allele in each family)

Association analyses look for a relationship between a marker and disease across families (must be the same marker allele in all families)

Binary Data

Overview

Binary Data definition

Binary data is data whose unit can take on only two possible states, traditionally termed 0 and 1 in accordance with the binary numeral system and Boolean algebra.
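For example, if marker bands were scored as present (1) or absent (0), each score needs a single bit, so eight scores fit in one byte. The sketch below packs and unpacks such a row; the scores and function names are hypothetical, and the most-significant-bit-first layout is an arbitrary choice.

```python
def pack_scores(scores):
    """Pack 0/1 marker scores into bytes, 8 scores per byte, most significant bit first."""
    if any(s not in (0, 1) for s in scores):
        raise ValueError("binary data may only contain 0 or 1")
    out = bytearray()
    for start in range(0, len(scores), 8):
        chunk = scores[start:start + 8]
        byte = 0
        for s in chunk:
            byte = (byte << 1) | s
        byte <<= 8 - len(chunk)  # pad a short final chunk with zeros on the right
        out.append(byte)
    return bytes(out)


def unpack_scores(data, n):
    """Recover the first n 0/1 scores from bytes produced by pack_scores."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return bits[:n]


row = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical band scores for 10 markers
packed = pack_scores(row)
print(packed.hex())                    # b2c0
print(unpack_scores(packed, len(row)) == row)  # True
```

Ten scores occupy two bytes here, compared with ten bytes if each score were stored as a separate character.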

Levels of Binary Data Storage

Thank You

Any questions?
