• Structural, functional
• Genome, Transcriptome, Proteome, Metabolome, Interactome
www.the-scientist.com
Genomics
“What's the Difference? Well, as a rule, genetics is the study of single genes in isolation. Genomics is the study of all the genes in the genome and the interactions among them and their environment(s).
Analogy 1: If genomics is like a garden, genetics is like a single plant. If the plant isn’t flowering, you could study the plant itself (genetics) or look at the surroundings to see if it is too crowded or shady (genomics) – both approaches are probably needed to find out how to make your plant blossom.”
http://www.genomebc.ca/education/articles/genomics-vs-genetics/
Genomics or Genetics?
Structural genomics for plant breeders and applied geneticists = molecular markers
• How many genes determine important traits?
• Where are these genes located?
• How do the genes interact?
• What is the role of the environment in the phenotype?
• Molecular breeding: Gene discovery, characterization, and selection using molecular tools
• Molecular markers are a key implement in the molecular breeding toolkit
Genomics and Molecular Markers
Markers are based on polymorphisms
• Amplified fragment length polymorphism
• Restriction fragment length polymorphism
• Single nucleotide polymorphism
• The polymorphisms become the alleles at marker loci
• The marker locus is not necessarily a gene: the polymorphism may be in the dark matter, in a UTR, in an intron, or in an exon
• Non-coding regions may be more polymorphic
What is a Molecular Marker
• Changes in the nucleotide sequence of genomic DNA that can be transmitted to the descendants.
• If these changes occur in the sequence of a gene, the altered version is called a mutant allele. The most frequent allele is called the wild type.
• A DNA sequence is polymorphic if there is variation among the individuals of the population.
DNA Mutations & Polymorphisms
5’-AGCTGAACTCGACCTCGCGATCCGTAGTTAGACTAG-3’ Wildtype
5’-AGCTGAACTCGGCCTCGCGATCCGTAGTTAGACTAG-3’ Substitution (transition: A → G)
5’-AGCTCAACTCGACCTCGCGATCCGTAGTTAGACTAG-3’ Substitution (transversion: G → C)
5’-AGCTAACTCGACCTCGCGATCCGTAGTTAGACTAG-3’ Deletion (single bp; deleted base shown: C)
5’-AGCTTCGCGATCCGTAGTTAGACTAG-3’ Deletion (DNA segment; deleted segment shown: CAACTCGACC)
Types of DNA Mutations (1)
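The two substitution classes on this slide can be checked mechanically. The following is a minimal sketch, assuming equal-length sequences; the helper name `classify_substitutions` is hypothetical. It relies only on the standard definition: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, anything else is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitutions(wildtype, variant):
    """Return (position, ref, alt, kind) for each mismatch; positions are 1-based."""
    if len(wildtype) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    calls = []
    for pos, (ref, alt) in enumerate(zip(wildtype, variant), start=1):
        if ref == alt:
            continue
        # Same chemical class on both sides of the change means a transition.
        same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
        calls.append((pos, ref, alt, "transition" if same_class else "transversion"))
    return calls

# Sequences copied from the slide.
WILDTYPE = "AGCTGAACTCGACCTCGCGATCCGTAGTTAGACTAG"
VARIANT_1 = "AGCTGAACTCGGCCTCGCGATCCGTAGTTAGACTAG"
VARIANT_2 = "AGCTCAACTCGACCTCGCGATCCGTAGTTAGACTAG"

print(classify_substitutions(WILDTYPE, VARIANT_1))
print(classify_substitutions(WILDTYPE, VARIANT_2))
```

Insertions and deletions change the sequence length, so they would need an alignment step first; this sketch deliberately rejects them.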
5’-AGCTGAACTCGACCTCGCGATCCGTAGTTAGACTAG-3’ Wildtype
5’-AGCTGAACTACGACCTCGCGATCCGTAGTTAGACTAG-3’ Insertion (single bp)
5’-AGCTGAACTAGTCTGCCCGACCTCGCGATCCGTAGTTAGACTAG-3’ Insertion (DNA segment)
5’-AGCAGTTGACGACCTCGCGATCCGTAGTTAGACTAG-3’ Inversion
5’-AGCTCGACCTCGCGATCCGTAGTTATGAACGACTAG-3’ Transposition
Types of DNA Mutations (2)
A way of dealing with the:
• Large number of genes per genome
• Huge genome size
• Technical challenges and cost of whole genome sequencing
The search for DNA polymorphisms was not driven by a desire to complicate things, but rather by the low number of naked eye polymorphisms (NEPs)
• Markers may be linked to target genes
• Markers in target genes are perfect markers
• What is a perfect marker for a gene deletion?
Why Use Markers?
• Polymorphisms can be visualized at the metabolome, proteome, or transcriptome level but for a number of reasons (both technical and biological) DNA-level polymorphisms are currently the most targeted
• Regardless of whether it is a “perfect” or a “linked” DNA marker, there are two key considerations that need to be addressed in order for the researcher/user to visualize the underlying genetic polymorphism
• Applications in Mapping and Marker Assisted Breeding
DNA Markers
1. Finding and understanding the genetic basis of the DNA-level polymorphism, which may be as small as a single nucleotide polymorphism (SNP) or as large as an insertion/deletion (INDEL) of thousands of nucleotides
2. Detecting the polymorphism via a specific assay or "platform". The same DNA polymorphism may be amenable to different detection assays
Key steps for DNA Markers
1. Establish evolutionary relations: homoeology, synteny and orthology
• Homoeology: chromosomes, or chromosome segments, that are similar in terms of the order and function of the genetic loci
– Homoeologous chromosomes may occur within a single allopolyploid individual (e.g. the A, B, and D genomes in wheat)
– They may also be found in related species (e.g. the 1A, 1B, 1D series of wheat and the 1H of barley)
• Orthology: refers to genes in different species which are so similar in sequence that they are assumed to have originated from a single ancestral gene
• Synteny:
– Classically refers to linked genes on the same chromosome
– Also used to refer to conservation of gene order across species
2. Associations due to linkage or pleiotropy
• Identify markers that can be used in marker assisted selection
3. Locate genes for qualitative and quantitative traits
• Map-based cloning strategies
Applications of Marker Maps
Polymorphisms vs. assays
An ever-increasing number of technology platforms have been, and are being, developed to deal with these two key considerations
These platforms lead to a bewildering array of acronyms for different types of molecular markers. To add to the complexity, the same type of marker may be assayed on a variety of platforms
The ideal marker is one that targets the causal polymorphism (a perfect marker), but such a marker is not always available.
Polymorphism Detection Issues
Labeled probe: 3’-TGGCTAGCT-5’
Probe hybridized to target:
              3’-TGGCTAGCT-5’
                 |||||||||
Target 1: 5’-CCTAACCGATCGACTGAC-3’
Target 2: 5’-GGATTGGCTAGCTGACTG-3’
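Which target the probe binds follows from base pairing alone. The sketch below is illustrative, not a hybridization model: it assumes perfect-match pairing only (no mismatch tolerance, no melting temperature), and the function names are hypothetical. Because the probe is written 3’ to 5’ and the targets 5’ to 3’, the binding site is simply the base-by-base complement of the probe as written.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def binding_site(probe_3to5):
    """Target sequence (5'->3') that pairs with a probe written 3'->5'."""
    return probe_3to5.translate(COMPLEMENT)

def hybridization_sites(probe_3to5, target_5to3):
    """0-based start positions of every perfect-match binding site on the target."""
    site = binding_site(probe_3to5)
    hits = []
    start = target_5to3.find(site)
    while start != -1:
        hits.append(start)
        start = target_5to3.find(site, start + 1)
    return hits

# Sequences copied from the slide.
PROBE = "TGGCTAGCT"                # written 3'->5'
TARGET_1 = "CCTAACCGATCGACTGAC"    # written 5'->3'
TARGET_2 = "GGATTGGCTAGCTGACTG"    # written 5'->3'

print(binding_site(PROBE))
print(hybridization_sites(PROBE, TARGET_1))
print(hybridization_sites(PROBE, TARGET_2))
```

Target 2 carries the same sequence as the probe rather than its complement, so under these assumptions only Target 1 gives a signal.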
Restriction Fragment Length Polymorphism (RFLP)
• RFLPs (Botstein et al. 1980) are differences in restriction fragment lengths caused by a SNP or INDEL that creates or abolishes a restriction endonuclease recognition site.
• RFLP assays are based on hybridization of a labeled DNA probe to a Southern blot (Southern 1975) of DNA digested with a restriction endonuclease
RFLP Steps
[Figure: two alleles at an RFLP locus drawn as double-stranded DNA, with the band patterns of eight individuals (Ind 1 to Ind 8) labelled by genotype (A, a)]
Allele A: 5’-ACATTCGGAATTCATGTACCGAT-3’
Allele a: 5’-ACATTCGGAATGCATGTACCGAT-3’
(top strands as read from the figure; the bottom strands are their complements)
Restriction Site: the GAATTC motif in Allele A, which reads GAATGC in Allele a
Co-Dominant RFLP Polymorphism
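The co-dominant scoring can be reproduced with an in-silico digest. This is a sketch under stated assumptions: the top strands of the two alleles in the figure are read as ACATTCGGAATTCATGTACCGAT (Allele A) and ACATTCGGAATGCATGTACCGAT (Allele a); the marked restriction site is taken to be GAATTC, the EcoRI recognition sequence, cut after the G; and the probe is assumed to detect every fragment. The function names are hypothetical, and real RFLP fragments are far longer than these toy sequences.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; return fragment lengths, left to right.

    cut_offset is the number of bases of the site left of the cut (1 for G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

ALLELES = {
    "A": "ACATTCGGAATTCATGTACCGAT",  # site present
    "a": "ACATTCGGAATGCATGTACCGAT",  # T -> G change abolishes the site
}
SITE, CUT_OFFSET = "GAATTC", 1  # assumption: EcoRI, G^AATTC

def band_pattern(genotype):
    """Sorted fragment lengths expected for a diploid genotype such as 'Aa'."""
    bands = set()
    for allele in genotype:
        bands.update(fragment_lengths(ALLELES[allele], SITE, CUT_OFFSET))
    return sorted(bands)

for genotype in ("AA", "Aa", "aa"):
    print(genotype, band_pattern(genotype))
```

The heterozygote shows the bands of both homozygotes, which is what makes the marker co-dominant. If the probe covered only one side of the site, some bands would go undetected and the pattern could become dominant.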
[Figure: the same two alleles drawn as double-stranded DNA, with the band patterns of eight individuals (Ind 1 to Ind 8) labelled by genotype (A, a)]
Allele A: 5’-ACATTCGGAATTCATGTACCGAT-3’
Allele a: 5’-ACATTCGGAATGCATGTACCGAT-3’
(top strands as read from the figure; the bottom strands are their complements)
Dominant RFLP Polymorphisms
Restriction Site: the GAATTC motif in Allele A, which reads GAATGC in Allele a
Features of RFLPs
• Co-dominant, unless probe contains restriction site
• Locus-specific
• Genes can be mapped directly
• Supply of probes and markers is unlimited
• Highly reproducible
• Requires no special instrumentation
• Radioactive detection……
Amplified Fragment Length Polymorphism (AFLP)
• Fragment genomic DNA with frequent and rare cutters
• AFLPs (Vos et al. 1995) are differences in restriction fragment lengths caused by SNPs or INDELs that create or abolish restriction endonuclease recognition sites.
• AFLP assays are performed by selectively amplifying a pool of restriction fragments using PCR.
1. Digestion with 2 restriction enzymes: EcoRI (1/4096) and MseI (1/256)
2. Restriction site adapter ligation
3. Selective preamplification
4. Amplification
AFLP Protocol
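The cutting frequencies quoted for EcoRI (1/4096) and MseI (1/256) follow from the length of each recognition site. A small worked sketch, assuming equal base frequencies and random sequence; the function names are hypothetical, and the three selective nucleotides per primer used at the end are an illustrative assumption rather than a value taken from the slide.

```python
from fractions import Fraction

def site_frequency(site):
    """Expected frequency of a recognition site at a random position (equal base frequencies assumed)."""
    return Fraction(1, 4 ** len(site))

def amplified_fraction(selective_nucleotides):
    """Fraction of restriction fragments expected to amplify, given the total number of selective bases."""
    return Fraction(1, 4 ** selective_nucleotides)

ECORI_SITE = "GAATTC"  # 6-bp rare cutter
MSEI_SITE = "TTAA"     # 4-bp frequent cutter

print(site_frequency(ECORI_SITE))   # 1/4096
print(site_frequency(MSEI_SITE))    # 1/256
# Assumption for illustration: 3 selective nucleotides on each primer.
print(amplified_fraction(3 + 3))    # 1/4096
```

Each added selective nucleotide cuts the expected number of amplified fragments roughly four-fold, which is how the protocol reduces a complex digest to a scorable number of bands.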
AFLP Polymorphisms
• Polymorphisms between genotypes may arise from:
– Sequence variation in one or both restriction sites
– Sequence variation in the region immediately adjacent to the restriction sites
– Insertions or deletions within an amplified fragment
• Band Detection
– Denaturing polyacrylamide gel electrophoresis & autoradiography or silver staining
– Sequencing
Features of AFLPs
• Very high multiplex ratio
• Very high throughput
• Off-the-shelf technology
• Fairly reproducible
• Dominant and co-dominant
• Radioactive detection but less hazardous options available
• Can convert favourite marker to SCAR
Simple Sequence Repeats (SSR)
• SSRs or microsatellites (Nakamura et al. 1987) are tandemly repeated mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs
• SSR length polymorphisms are caused by differences in the number of repeats
• Assayed by PCR amplification using pairs of oligonucleotide primers specific to unique sequences flanking the SSR
• Detection by autoradiography, silver staining, sequencing…
Repeat Motifs
• AC repeats tend to be more abundant than other di-nucleotide repeat motifs in animals (Beckmann and Weber 1992)
• The most abundant di-nucleotide repeat motifs in plants, in descending order, are AT, AG, and AC (Lagercrantz et al. 1993; Morgante and Olivieri 1993)
• Typically, SSRs are developed for di-, tri-, and tetra-nucleotide repeat motifs
• CA and GA have been widely used in plants
• Tetra-nucleotide repeats have the potential to be very highly polymorphic; however, many are difficult to amplify
SSR Repeats
Simple sequence repeat in hazelnut
Note the difference in repeat length AND the consistent flanking sequence
Individual 1: (AC)x9, 51 bp
Individual 2: (AC)x11, 55 bp
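The two fragment sizes differ only by the repeat count, which a short calculation makes explicit. In this sketch the 33 bp of combined flanking sequence is inferred from the slide's numbers (51 bp minus 9 x 2 bp), not stated on it; the function names and the toy flanking bases are hypothetical.

```python
import re

def amplicon_length(flank_bp, motif, repeats):
    """PCR product size: constant flanking sequence plus the repeat tract."""
    return flank_bp + len(motif) * repeats

def longest_repeat_run(sequence, motif):
    """Largest number of consecutive copies of `motif` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

FLANK_BP = 33  # inferred: 51 - 9 * 2 = 55 - 11 * 2 = 33

print(amplicon_length(FLANK_BP, "AC", 9))    # Individual 1
print(amplicon_length(FLANK_BP, "AC", 11))   # Individual 2

# Toy allele with made-up flanking bases, only to exercise the repeat counter.
toy_allele = "GGT" + "AC" * 9 + "TTG"
print(longest_repeat_run(toy_allele, "AC"))
```

Because the flanking sequence is constant, the 4 bp size difference between the two individuals corresponds to exactly two extra AC units.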
Powell et al. 1995. Proc Natl Acad Sci U S A. 92(17): 7759–7763.
Chloroplast SSRs of pine
SSR Protocol
Features of SSRs
• Highly polymorphic
• Highly abundant and randomly dispersed
• Co-dominant
• Locus-specific
• High throughput
• Can be automated
Diversity Arrays Technology - DArT
http://www.diversityarrays.com/
• 2,500 markers per sample
• 94 samples: ~$4,500
• ~2 cents per datapoint
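The per-datapoint figure follows directly from the other two numbers. A short worked check, using only the values on the slide (variable names are illustrative):

```python
markers_per_sample = 2500
samples = 94
total_cost_usd = 4500  # approximate, as quoted on the slide

datapoints = markers_per_sample * samples
cost_per_datapoint_usd = total_cost_usd / datapoints

print(datapoints)                         # 235000
print(round(cost_per_datapoint_usd, 4))   # 0.0191, i.e. about 2 cents
```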
DArT Analysis
Features of DArT
• Very high multiplex ratio
• Very high throughput
• Bi-allelic
• Dominant marker system
• Requires substantial investment
• Fairly reproducible
• DArT sequences now available
• DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered
Single Nucleotide Polymorphisms (SNP)
Alleles
…ATGCTCTTACTGCTAGCGC…
…ATGCTCTTACTGCTAGCGC…
…ATGCTCTTCCTGCTAGCGC…
…ATGCTCTTACTGCAAGCGC…
Single Nucleotide Polymorphisms (SNPs)
Consensus: …ATGCTCTTNCTGCNAGCGC…
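The consensus line can be derived from the four alleles column by column. A minimal sketch, assuming the sequences are already aligned and of equal length; the function name `consensus_with_snps` is hypothetical, and every variable column is written as N, as on the slide.

```python
def consensus_with_snps(aligned):
    """Return (consensus, snps); snps lists (1-based position, sorted bases) per variable column."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    snps = []
    for position, column in enumerate(zip(*aligned), start=1):
        bases = set(column)
        if len(bases) == 1:
            consensus.append(column[0])
        else:
            consensus.append("N")  # variable site
            snps.append((position, sorted(bases)))
    return "".join(consensus), snps

# Allele sequences copied from the slide.
ALLELES = [
    "ATGCTCTTACTGCTAGCGC",
    "ATGCTCTTACTGCTAGCGC",
    "ATGCTCTTCCTGCTAGCGC",
    "ATGCTCTTACTGCAAGCGC",
]

sequence, snps = consensus_with_snps(ALLELES)
print(sequence)
print(snps)
```

Both variable sites here carry two bases each, which matches the bi-allelic behaviour typical of SNPs.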
Features of SNP
• Highly abundant (1 every 200 bp in barley; Rostoks et al., 2005)
• Locus-specific
• Co-dominant and bi-allelic
• Basis for high-throughput and massively parallel genotyping technologies
• Genic rather than anonymous marker
• Phenotype due to SNP can be mapped directly
SNP Detection Strategy
• Locus specific system
– Many samples with few markers
  • Marker assisted selection in commercial breeding programmes for key target characters
  • Addition of characteristic major genes to e.g. mapping populations and association panels
  • KASP – buy master mix and synthesise own primers
• Genome wide system
– Fewer samples with many markers
  • Germplasm characterization, academic and breeding
  • Genotyping panels for GWAS
  • Illumina or Affymetrix for higher density arrays, costs↓
• What about bi-parental populations??
Affymetrix Axiom Technology
• Two colour ligation based assay
• Utilises unique oligonucleotide complementary to flanking genomic sequence
• Automated parallel processing
Wheat SNP Arrays
KASP™ Genotyping
More Information: http://www.lgcgroup.com/services/genotyping/#.VCMgyPldWJ0
Wheat SNP Resources: www.cerealsdb.uk.net
Wheat SNP Haplotypes: www.cerealsdb.uk.net
Sequencing Approaches
• RRL – Reduced Representation Library
• RAD-Seq – Restriction Site Associated DNA Sequencing
• GBS – Genotyping by Sequencing
• See Davey et al. (2011) Nature Reviews Genetics 12: 499-510
RADseq: Restriction-site Associated DNA markers
• Uses Illumina sequencing technology
• Based on digestion with restriction enzymes. An adapter binds to the restriction site and up to 5 kb fragments are sequenced around the target size.
• Bioinformatic analysis is used to find SNPs in the amplified regions
Genotyping by Sequencing
[Figure: GBS library workflow. Genomic DNA → digestion (PstI, MseI) → ligation of barcode adaptor and common adaptor (labels P1, P2) → pooling and cleanup → PCR enrichment → library size analysis → Illumina sequencing]
GP x Morex map
[Figure: GP x Morex linkage map for linkage groups 1, 2 and 3, listing GBS marker names (prefixes MR_, BK_, BW_; some tagged with chromosome arms such as 1H, 2HL, 3HS, 3HL) against their map positions]
SNPs vs GbS
• SNPs
– Minimal input, don’t even have to isolate DNA
– Rapid turn around and data is ready to use
– Markers in known genes and generally mapped
– More useful in GWAS
• GbS
– Now quite cheap and potentially many markers
– Rapid generation of sequence output but markers are anonymous
  • Find an expert bio-informatician to align your data and, if possible, align to reference sequence
– More useful in bi-parental mapping studies
Line/Marker   11_21508  ari-eGP  sdw1  11_20392
Derkado       a         a        a     a
B83-12/21/5   b         b        b     b
DH_001        b         b        a     b
DH_002        a         a        a     a
DH_003        b         b        b     b
DH_004        a         a        b     a
DH_005        b         b        a     b
DH_006        b         b        a     b
DH_007        a         a        a     a
DH_008        a         a        a     a
DH_009        a         a        a     a
DH_010        b         b        b     b
DH_011        a         a        a     a
DH_012        b         b        b     b
DH_013        a         a        b     a
DH_014        b         b        a     b
DH_015        b         b        b     b
DH_016        b         b        a     b
DH_018        a         a        a     a
DH_019        b         b        b     b
DH_020        b         b        b     b
DH_021        b         b        a     b
DH_022        a         a        a     a
DH_023        a         a        b     a
DH_024        b         b        a     b
DH_026        b         b        b     b
DH_027        a         a        a     a
DH_028        a         a        b     a
DH_029        a         a        b     a
DH_030        b         b        b     b
DH_031        b         b        b     b
DH_032        a         a        b     a
DH_033        a         a        b     a
DH_034        b         b        b     b
DH_035        a         a        b     a
DH_036        a         a        a     a
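A genotype table like this is usually read by counting lines in which two loci carry alleles from different parents. The sketch below does that for an excerpt of the table (the first six DH lines only), assuming Derkado contributes the "a" allele and B83-12/21/5 the "b" allele at every locus, as the two parental rows indicate. The function name is hypothetical, and six lines are far too few to estimate a recombination fraction; the point is only the counting logic.

```python
MARKERS = ["11_21508", "ari-eGP", "sdw1", "11_20392"]

# Excerpt of the table: one allele call per marker, in MARKERS order.
GENOTYPES = {
    "DH_001": "bbab",
    "DH_002": "aaaa",
    "DH_003": "bbbb",
    "DH_004": "aaba",
    "DH_005": "bbab",
    "DH_006": "bbab",
}

def count_recombinants(genotypes, markers, locus_1, locus_2):
    """Return (recombinant lines, total lines) for a pair of loci in a doubled haploid population."""
    i, j = markers.index(locus_1), markers.index(locus_2)
    # A DH line is recombinant for the pair when its two calls come from different parents.
    recombinants = [line for line, calls in genotypes.items() if calls[i] != calls[j]]
    return len(recombinants), len(genotypes)

print(count_recombinants(GENOTYPES, MARKERS, "11_21508", "ari-eGP"))
print(count_recombinants(GENOTYPES, MARKERS, "ari-eGP", "sdw1"))
```

In this excerpt 11_21508 and ari-eGP always agree, while ari-eGP and sdw1 disagree in four of six lines. Extending `GENOTYPES` to all rows of the table would give the full-population counts.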
[Figure: chromosome 4H marker map (SNP markers named 11_xxxxx, mlo-derived markers and af459084_02) with trait positions plotted alongside; legible trait labels include Viscosity, Ferm_Ext, Tot_Sugars, Glucose, Grains, GrainN, Yield and Head]
Marker to Candidate Gene