Genotyping, Linkage Mapping and Binary Data
Mohamed Atia Omar, Ph.D.
Genome Mapping Research Dept. – AGERI – ARC
FAO Training, 2014, Egypt
Genotyping
Overview
What is genotyping?
The analysis of DNA-sequence variation
Genotype = the genetic constitution of an individual
How much biodiversity?
1.7 to 2.0 million species
Estimates of up to 10 million
Important Terms
Variation: any nucleotide change in the genome
Rare polymorphism: variation found in < 1% of the population
Polymorphism: variation found in ≥ 1% of the population
Locus: chromosomal location of a gene
Allele: alternative form of a gene or DNA sequence at a specific chromosomal location (locus)
Heterozygous: the two alleles at a locus differ for the feature of interest
Homozygous: the two alleles at a locus are identical for the feature of interest
Hemizygous: only one allele exists (e.g. the X chromosome in males)
What are the Types of Mutations / Polymorphisms to be Genotyped?
There are six major classes of genetic variation:
1. Single base changes
2. Simple di-, tri-, tetranucleotide repeats
3. Small insertions or deletions
4. Larger, tandem repeats
5. Multi-gene (Megabase) duplication (CNV)
6. Complex rearrangements
Classes of Mutation
An example of one simple question:
How much variation is there?
What are the most Informative Classes for Genotyping Studies?
Polymorphism type | Nickname | Heterozygosity
1. Single base changes | SNP | 1-50%
2. Simple di-, tri-, tetranucleotide repeats | STR (short tandem repeats) | 50-90%
3. Small insertions or deletions | INDELs (insertions or deletions) | 1-50%
4. Larger, tandem repeats | VNTR (variable number of tandem repeats) | 50-90%
5. Multi-gene (Megabase) duplication | CNV (copy number variation) | 1-50%
6. Complex rearrangements | - | 1-50%
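The slides do not say how the heterozygosity column is computed. One common definition, assumed here, is expected heterozygosity under random mating: one minus the sum of squared allele frequencies. The sketch below uses hypothetical allele frequencies to illustrate why a two-allele marker such as a SNP cannot exceed 50% while a multi-allelic repeat marker can approach 90%.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity: 1 - sum(p_i^2) over allele frequencies p_i.

    Assumes random mating; this definition is an assumption,
    not something stated on the slide.
    """
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical biallelic SNP with two equally frequent alleles.
snp = expected_heterozygosity([0.5, 0.5])

# Hypothetical STR with ten equally frequent alleles.
str_marker = expected_heterozygosity([0.1] * 10)

print(f"SNP: {snp:.2f}, STR: {str_marker:.2f}")
```

With equal frequencies the two-allele case reaches its maximum of 0.5, which is consistent with the upper bound given in the table for SNPs.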
How many loci should be assayed?
Two strategies for selecting markers are possible:
• Select a few highly informative markers
• Select numerous, poorly informative markers randomly distributed within the genome
To scan whole genomes…
Setting up the reactions: not like this… but like this
[Figure labels: microcentrifuge tube, 96-well plates, 384-well plates, Affymetrix GeneChip]
Applications enabled by HTP genotyping
[Figure: applications placed by number of genotyped loci and genotyped individuals (axis ticks from 10 up to 1,000,000). Regions labeled: Genome-Wide Association Studies (GWAS); GWAS validation and candidate gene association; plant and animal breeding for selected traits; candidate region fine mapping; fingerprinting, whole genome scans; diagnostics.]
Diagnostics, MAS, disease-related genes, domestication traits, bar coding, industrial protection of genotypes
High-Throughput genotyping techniques
[Figure: genotyping platforms placed by number of genotyped loci and genotyped individuals (axis ticks from 10 up to 1,000,000). Platforms labeled: GoldenGate assay, Infinium BeadChips, iSelect, VeraCode GoldenGate, SNPlex, GenPlex, TaqMan OpenArrays, iPLEX Gold, Pyroseq, SNaPshot, Invader, TaqMan, BeadChips, targeted GeneChips. Suppliers labeled: Illumina, AB, Sequenom, Affymetrix.]
Genome-Wide Association Studies
Two main suppliers for GWA: Illumina and Affymetrix
Illumina High-Density 1M-Duo chip
Affymetrix Genome-Wide Human SNP Array 6.0
5 Basic Methodologies
1) Hybridization
– Microarrays
– TaqMan, Molecular Beacons
2) Allele-specific PCR
– FRET
– Intercalating Dyes
3) Primer Extension
– MALDI-TOF (Matrix Assisted Laser Desorption/Ionization Time-of-flight mass spectrometry)
– SNaPshot (Single nucleotide primer extension)
4) Ligation
– Padlock Probes
– Rolling Circle Amplification
5) Endonuclease Cleavage
– RFLP
– PIRA/RFL
RFLPs (Based on Endonuclease Cleavage)
Differences in DNA sequence generate different recognition sequences and DNA cleavage sites for specific restriction enzymes
Two different genes will produce different fragment patterns when cut with the same restriction enzyme due to differences in DNA sequence
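The RFLP principle above can be sketched in a few lines. This is only an illustration: the two sequences are invented, the molecule is treated as linear and single-stranded, and the enzyme is assumed to be EcoRI (recognition site GAATTC, cutting after the first G). A single base change in the second site removes one cut and changes the fragment pattern.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths (longest first) after cutting a linear sequence.

    site and cut_offset default to EcoRI (G^AATTC); both are parameters
    so that another enzyme could be substituted.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset          # position of the cut inside the site
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)  # remainder after the last cut
    return sorted(fragments, reverse=True)


# Hypothetical alleles: allele_2 carries one base change (A to G)
# that destroys the second recognition site.
allele_1 = "ATGCCGAATTCTTAGGCATCGGAATTCAAT"
allele_2 = "ATGCCGAATTCTTAGGCATCGGAGTTCAAT"

print(digest(allele_1))  # three fragments
print(digest(allele_2))  # two fragments
```

On a gel the two alleles would therefore show different banding patterns, which is what makes the restriction site usable as a marker.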
Microarray (Based on Hybridization)
Purpose: multiple simultaneous measurements by hybridization of labeled probe
DNA elements may be:
Oligonucleotides
cDNAs
Large insert genomic clones
Microarray technologies
DNA microarrays
Ordered arrangement of multiple sets of DNA on solid support
Microarray chip
Affymetrix 100k chip set
Entire genome with 100 000 SNPs (low density).
Affymetrix 500k chip (SNP array 5.0)
Entire genome with 500 000 SNPs (high density)
Affymetrix 1M chip (SNP array 6.0)
Entire genome with 1 000 000 SNPs (very high density)
Organization of a DNA microarray
1.28 cm
Hybridization of a labeled probe to the microarray
Detection of hybridization on microarray
Light from laser
Hybridization intensities on DNA microarray following laser scanning
[Figure: signal for alleles A and B, with genotype clusters BB (0), AB (0.5), AA (1)]
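The cluster values in the figure (0, 0.5, 1) suggest that genotypes are read from the share of total signal coming from allele A. A minimal sketch of such a call is shown below; the function name, the cut-offs of 0.25 and 0.75, and the intensity values are all assumptions chosen for illustration, not the algorithm used by any particular array platform.

```python
def call_genotype(intensity_a, intensity_b, low=0.25, high=0.75):
    """Call a biallelic genotype from two hybridization intensities.

    The ratio A / (A + B) is near 1 for AA, near 0.5 for AB and near 0
    for BB. The thresholds are illustrative assumptions.
    """
    total = intensity_a + intensity_b
    if total <= 0:
        return None                     # no signal: no call
    ratio = intensity_a / total
    if ratio < low:
        return "BB"
    if ratio > high:
        return "AA"
    return "AB"


# Hypothetical intensity pairs (allele A, allele B).
samples = [(950, 50), (480, 520), (40, 900), (0, 0)]
calls = [call_genotype(a, b) for a, b in samples]
print(calls)
```

Real platforms fit clusters across many samples rather than using fixed thresholds, so this only conveys the idea behind automated genotype calling.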
SNPs
Single Nucleotide Polymorphisms
Change one nucleotide
Insert
Delete
Replace it with a different nucleotide
Many have no phenotypic effect
Some can disrupt or affect gene function
SNP genotyping methods
over 100 different approaches
Ideal SNP genotyping platform:
high-throughput capacity
simple assay design
robust
affordable price
automated genotype calling
accurate and reliable results
Overview of SNP array technology
A little more on SNPs
Most SNPs have only two alleles
Easy to automate their scoring
Becoming extremely popular
Typing methods:
Sequencing
Restriction site
Hybridization
Linkage Mapping
Overview
Types of Maps
Physical Maps
Complete or partially sequenced organisms
Cytogenetic Maps
Breakpoints in disease
Direct binding of probes to chromosome
Genetic Linkage Maps
Markers
What happens in meiosis…
Leads to formation of haploid gametes from diploid cells
Assortment of genetic loci
Recombination or crossover
What is Linkage?
Linkage is defined genetically: the failure of two genes to assort independently.
Linkage occurs when two genes are close to each other on the same chromosome.
Two genes on the same chromosome are called syntenic, whether or not they are linked.
Linked genes are syntenic, but syntenic genes are not always linked. Genes far apart on the same chromosome assort independently: they are not linked.
Linkage is based on the frequency of crossing over between the two genes.
Crossing over occurs in prophase of meiosis I, where homologous chromosomes break at identical locations and rejoin with each other.
Applications/Uses of Linkage Maps
Studying genome structure, organization and evolution.
Estimation of gene effects of important agronomic traits.
Tagging genes of interest to facilitate marker-assisted selection (MAS) programs.
Map based cloning
Identify genes responsible for traits.
Plants or Animals
Disease resistance
Meat or Milk Production, …… etc
Genetic Linkage Mapping Steps
Development of The Mapping Population
Genotyping of Mapping Population (selection of suitable molecular markers).
Linkage Analysis
Map Construction
QTL Identification (in the case of QTL mapping)
Marker-Assisted Selection.
Development of The Mapping Population
Linkage analysis
Linkage: alleles from two loci segregate together in a family.
Recombination fraction (θ): the probability that a marker and a susceptibility locus are separated by recombination.
θ = 0.5: no linkage; θ < 0.5: linked together
Reasons why alleles at different loci may not assort independently:
1. Chance
2. Preferential segregation (nonrandom segregation of non-homologous chromosomes) - hinted at but not shown in humans
3. Linkage - the presence of loci measurably close together on the same chromosome.
Types of Linkage Analysis
• Parametric Lod-Score
• Haseman-Elston Sib-Pair
• Affected Sib-Pair and Affected Relative Pair
• Affected Pedigree Member Method
• Variance Components Method
Recombination frequency
θ = total number of recombinants / (total number of recombinants + total number of non-recombinants)
[Figure: parent carrying alleles A, B and a, b, and the gametes it produces]
Gametes | Theta
100% non-rec | 0
99% non-rec and 1% rec | 0.01
90% non-rec and 10% rec | 0.1
50% non-rec and 50% rec | 0.5
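The recombination frequency defined above is a simple proportion. The sketch below computes it from gamete counts and reproduces the values in the table; the function name and the counts (out of 100 gametes) are illustrative assumptions.

```python
def recombination_fraction(recombinants, non_recombinants):
    """theta = recombinants / (recombinants + non-recombinants)."""
    total = recombinants + non_recombinants
    if total == 0:
        raise ValueError("at least one gamete must be scored")
    return recombinants / total


# Hypothetical counts per 100 gametes, matching the table rows.
for rec, non_rec in [(0, 100), (1, 99), (10, 90), (50, 50)]:
    theta = recombination_fraction(rec, non_rec)
    status = "unlinked" if theta >= 0.5 else "linked"
    print(f"{non_rec}% non-rec, {rec}% rec -> theta = {theta} ({status})")
```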
In a double heterozygote:
Cis configuration = mutant alleles of both genes are on the same chromosome = ab/AB
Trans configuration = mutant alleles are on different homologues of the same chromosome = Ab/aB
Genes with recombination frequencies of less than 50 percent are on the same chromosome (= linked)
Linkage group = all known genes on a chromosome
Two genes that undergo independent assortment have a recombination frequency of 50 percent and are located on nonhomologous chromosomes or far apart on the same chromosome (= unlinked)
Recombination
Recombination between linked genes occurs at the same frequency whether alleles are in cis or trans configuration
Recombination frequency is specific for a particular pair of genes
Recombination frequency increases with increasing distances between genes
No matter how far apart two genes may be, the maximum frequency of recombination between any two genes is 50 percent.
• Cross-over frequencies can be converted into map units.
• Ex: A 5% cross-over frequency equals 5 map units.
– gene A and gene B cross over 6.0 percent of the time
– gene B and gene C cross over 12.5 percent of the time
– gene A and gene C cross over 18.5 percent of the time
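The three cross-over frequencies above are enough to order the genes: the pair with the largest distance sits at the ends, and the remaining gene lies between them. A minimal sketch follows, assuming one map unit per percent cross-over as stated on the slide; the function name and the data layout are illustrative choices.

```python
def infer_gene_order(distances):
    """Order three genes from pairwise map distances.

    distances maps a (gene, gene) tuple to map units. The two genes
    with the largest distance are the outer pair.
    """
    (left, right), longest = max(distances.items(), key=lambda item: item[1])
    genes = {gene for pair in distances for gene in pair}
    (middle,) = genes - {left, right}   # the one gene that is not outer
    return [left, middle, right], longest


# Cross-over percentages from the slide, read as map units.
distances = {("A", "B"): 6.0, ("B", "C"): 12.5, ("A", "C"): 18.5}
order, span = infer_gene_order(distances)
print(order, span)

# The two short intervals add up to the long one, as expected for
# short distances where double crossovers are rare.
additive = abs(distances[("A", "B")] + distances[("B", "C")] - span) < 1e-9
print(additive)
```

Here the distances are exactly additive (6.0 + 12.5 = 18.5), which supports the order A, B, C.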
Lod scores
1 cM ≈ 1 Mb (on average)
1 Mb = 1000 kb
1 kb = 1000 bp
1 cM ≈ 1,000,000 bp
Genetic Mapping
The map distance between two genes equals one half the average number of crossovers in that region per meiotic cell (this gives the distance in Morgans; multiply by 100 for cM)
The recombination frequency between two genes indicates how much recombination is actually observed in a particular experiment; it is a measure of recombination
Over an interval so short that multiple crossovers are precluded (~ 10 percent recombination or less), the map distance equals the recombination frequency because all crossovers result in recombinant gametes.
Genetic map = linkage map = chromosome map
Gene Mapping: Crossing Over
Crossovers which occur outside the region between two genes will not alter their arrangement
The result of double crossovers between two genes is indistinguishable from independent assortment of the genes
Crossovers involving three pairs of alleles specify gene order = linear sequence of genes
Genetic vs. Physical Distance
Map distances based on recombination frequencies are not a direct measurement of physical distance along a chromosome
Recombination “hot spots” overestimate physical length
Low rates in heterochromatin and centromeres underestimate actual physical length
Gene Mapping
Mapping function: the relation between genetic map distance and the frequency of recombination
Chromosome interference: crossovers in one region decrease the probability of a second crossover close by
Coefficient of coincidence = observed number of double recombinants divided by the expected number
Interference = 1-Coefficient of coincidence
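The two definitions above can be worked through with a three-point cross. In the sketch below the expected number of double recombinants is taken as the product of the two single-interval recombination frequencies times the number of progeny, which is the usual expectation when crossovers are independent. The function name and all numbers are hypothetical.

```python
def coincidence_and_interference(rf_region_1, rf_region_2,
                                 observed_doubles, total_progeny):
    """Return (coefficient of coincidence, interference).

    Expected doubles assume independent crossovers in the two regions:
    rf_region_1 * rf_region_2 * total_progeny.
    """
    expected_doubles = rf_region_1 * rf_region_2 * total_progeny
    if expected_doubles == 0:
        raise ValueError("expected number of double recombinants is zero")
    coincidence = observed_doubles / expected_doubles
    return coincidence, 1.0 - coincidence


# Hypothetical cross: 10% and 20% recombination in adjacent regions,
# 1000 progeny scored, 8 double recombinants observed (20 expected).
coc, interference = coincidence_and_interference(0.10, 0.20, 8, 1000)
print(f"coincidence = {coc:.2f}, interference = {interference:.2f}")
```

An interference of 0.6 in this made-up example would mean that 60% of the expected double crossovers did not occur.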
Genetic distance
Genetic distance = the genetic length over which one crossover occurs in 1% of meioses. This distance is expressed in centiMorgans (cM).
1 cM = a recombination fraction of 0.01 = on average 1 Mb of physical distance (assuming that the recombination frequency is uniform along the chromosomes)
Because double recombinants become more frequent the further apart two loci are, the frequency of recombination does not increase proportionately with distance.
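The non-proportional relation between distance and recombination frequency is what a mapping function captures. The slides do not name a specific function; the sketch below uses Haldane's mapping function, which assumes no interference, purely as one well-known example.

```python
import math


def haldane_rf(distance_cm):
    """Recombination fraction expected at a given map distance (Haldane).

    r = 0.5 * (1 - exp(-2d)), with d in Morgans. Assumes no interference.
    """
    d = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_cm(rf):
    """Map distance in cM for an observed recombination fraction (Haldane)."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * rf)


for cm in (1, 10, 50, 200):
    print(f"{cm:>3} cM -> recombination fraction {haldane_rf(cm):.4f}")
```

At short distances the two quantities are nearly equal, while at long distances the recombination fraction levels off below 0.5, in line with the 50 percent maximum stated earlier.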
Linkage related Concepts
Interference - A crossover in one region usually decreases the probability of a crossover in an adjacent region.
CentiMorgan (cM) - 1 cM is the distance between genes for which the recombination frequency is 1%.
LOD score (logarithm of the odds) - a statistical method to test for linkage and to estimate linkage distances (the distance between genes).
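A LOD score compares the likelihood of the observed data at a given recombination fraction with the likelihood under no linkage (θ = 0.5), on a log10 scale. The sketch below covers only the simplest case, assumed here for illustration: phase-known meioses in which every offspring can be scored as recombinant or non-recombinant. The counts are hypothetical.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses.

    L(theta) = theta^recombinants * (1 - theta)^non_recombinants.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = recombinants + non_recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical family data: 2 recombinants among 20 scored meioses.
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta = {theta:<4} LOD = {lod_score(2, 18, theta):.2f}")
```

The θ with the highest LOD is the estimate of the recombination fraction (here 0.1, the observed proportion), and a LOD of 3 or more is the conventional threshold for declaring linkage.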
Linkage vs. Association
Linkage analyses look for a relationship between a marker and disease within a family (could be a different marker in each family)
Association analyses look for a relationship between a marker and disease between families (must be the same marker in all families)
Binary Data
Overview
Binary Data definition
Binary data is data whose unit can take on only two possible states, traditionally termed 0 and 1 in accordance with the binary numeral system and Boolean algebra.
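In a genotyping context, binary data often arises when marker bands are scored as present (1) or absent (0). The slides do not show the scoring procedure, so the sketch below is an assumption about how such a matrix might be built; the sample names and band sizes are invented.

```python
def score_bands(bands_by_sample, all_band_sizes):
    """Build a presence/absence (1/0) row per sample.

    bands_by_sample maps a sample name to the set of band sizes seen.
    all_band_sizes fixes the column order of the matrix.
    """
    return {
        sample: [1 if size in bands else 0 for size in all_band_sizes]
        for sample, bands in bands_by_sample.items()
    }


# Hypothetical gel: band sizes (bp) observed in three samples.
observed = {
    "P1": {300, 750},
    "P2": {300, 500},
    "F1": {300, 500, 750},
}
columns = [300, 500, 750]
matrix = score_bands(observed, columns)

for sample, row in matrix.items():
    print(sample, row)
```

Each row is then a string of 0s and 1s that can be compared between individuals or passed to mapping software.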
Levels of Binary Data Storage
Thank You
Any Questions?