molecular tools. 1.knowing how many genes determine a phenotype, and where the genes are located, is...

48
Molecular Tools

Upload: lindsey-delmore

Post on 15-Dec-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

Molecular Tools

1. Knowing how many genes determine a phenotype, and where the genes are located, is a first step in understanding the genetic basis of a phenotype

2. A second step is determining the sequence of the gene, or genes, determining the phenotype and understanding how the expression of the genes is regulated at the transcriptional level 

3. Subsequent steps involve analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways and how these pathways interact with the environment

Steps in Genetic Analysis

1. Complete genome sequences are coming, but aren't yet available for many plants  

2. The trend is sequencing with multiple applications - e.g. whole genomes, specific targets within genomes, or genotyping by sequencing (GBS)

3. Even when complete genome sequence information is available, there will always be reason to study allelic diversity and interactions at specific loci and to compare genome sequences of multiple individuals

4. Molecular tools are starting point

Step 2 & Sequencing Bottleneck

Barley DNA Sequence

• Total sequence is 5,300,000,000 base pairs– 165 % of human genome – Enough characters for 11,000 large novels

• Expressed Genes - 60,000,000 base pairs– Approx 1% of total sequence, like humans– 125 large novels

1. Getting DNAOften a rate limiting step unless massive investment in equipment

2. Cutting the DNA with restriction enzymesReducing complexity

3. Managing the pieces of DNA in vectors; collections of

pieces are maintained in libraries

4. Selecting DNA targets via amplification and/or hybridization

5. Determining nucleotide sequence of the targeted DNA

Molecular Tools for Step 2

genomic DNA:   Order your kit today! One-by-one (artisanal) to high-throughput (DNA from seed chips + robotics)  Leaf segments to cheek swabsKey considerations are • Concentration• Purity• Fragment size

Extracting DNA – Hard way or Easy way

Pioneer Way Monsanto Way

A. genomic DNA  B. cDNA. From mRNA to DNA

Extracting DNA – the cDNA Way

A. genomic DNA  B. cDNAC. synthetic DNA – oligonucleotides

Extracting DNA – the Oligo Way

Restriction enzymes make cuts at defined recognition sites in DNA• A defense system for bacteria, where they attack and degrade the

DNA of attacking bacteriophages• The restriction enzymes are named for the organism from which

they were isolated • Harnessed for the task of systematically breaking up DNA into

fragments of tractable size and for various polymorphism detection assays

• Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence

Cutting the DNA – Restriction Enzymes

Cutting the DNA – Restriction Enzymes

Cutting the DNA – Restriction Enzymes

• An enzyme that has a four-base recognition site will cut approximately every 256 bp (44) and more frequently than one with a six base recognition site, which in turn will cut more often than one with an eight base recognition site

• Palindrome recognition sites – the same sequence is specified when each strand of the double helix is read in the opposite direction

Sit on a potato pan, OtisCigar? Toss it in a can, it is so tragicUFO. tofuGolf? No sir, prefer prison flogFlee to me remote elfGnu dungLager, Sir, is regalTuna nut

Cutting the DNA – Restriction Enzymes

Vectors: The role of the vector is to propagate and maintain the DNA fragments generated by the restriction digestion • Efficiency and simplicity of

inserting and retrieving the inserted DNA fragments

• A key feature of the cloning vector is size of the DNA fragment insert that it can efficiently and reliably handle

• Example: the principle of cloning a DNA fragment in a plasmid vector

DNA Libraries – Vectors

Vector  Insert size(kb) 

Plasmid  ~ 1  

Lambda phage  ~ 20  

Bacterial Artificial Chromosomes (BAC)

~ 200

Common vectors and approximate insert sizes

DNA Libraries – Vectors

Libraries are repositories of DNA fragments cloned in their vectors • Libraries can be classified based on the cloning vector – e.g.

plasmid, BAC  • Alternatively, the library can be described in terms of the

source of the cloned DNA fragments – e.g. genomic, cDNA

DNA Libraries

• Total genomic DNA digested and the fragments cloned into an appropriate vector

• In principle, this library should consist of samples of all the genomic DNA present in the organism, including both coding and non-coding sequences

• Ideally, every copy of every gene (or a portion of every sequence) should be represented somewhere in the genomic library

• There are strategies for enriching genomic libraries for specific types of sequences and removing specific types of sequences – e.g. highly repetitive sequences  

DNA Libraries – Genomic

• A cDNA (complementary DNA) library is generated from mRNA transcripts, using the enzyme reverse transcriptase, which creates a DNA complement to a mRNA template

• The cDNA library is based on mRNA: therefore the library will represent only the genes that are expressed in the tissue and/or developmental stage that was sampled  

DNA Libraries – cDNA

• Invented by K.B Mullis in 1983

• Allows in vitro amplification of ANY DNA sequence in large numbers

• Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.

DNA Amplification: Polymerase Chain Reaction (PCR)

A Polymerase extends the 3’ end of the primer sequence using the DNA strand as a template.

DNA Amplification - PCR

• Each cycle can be repeated multiple times if the 3’ end of the primer is facing the target amplicon. The reaction is typically repeated 25-50 cycles.

• Each cycle generates exponential numbers of DNA fragments that are identical copies of the original DNA strand between the two binding sites.

• The PCR reaction consists of:• A buffer• DNA polymerase (thermostable)• Deoxyribonucleotide triphosphates (dNTPs)• Two primers (oligonucleotides)• Template DNA• Labelling as required

• And has the following steps:• Denaturing: raising the temperature to 94 C to make DNA single stranded• Annealing: lowering the temperature to 35 – 65 C the primers bind to the target sequences

on the template DNA• Elongation: DNA polymerase extends the 3’ ends of the primer sequence. Temperature

must be optimal for DNA polymerase activity.

DNA Amplification - PCR Principles

1st cycle

2nd cycle

DNA Amplification - PCR Exponential

• The choice of what DNA will be amplified by the polymerase is determined by the primers (short pieces of synthesized DNA - oligonucleotides) that prime the polymerase reaction

• The DNA between the primers is amplified by the polymerase: in subsequent reactions the original template, plus the newly amplified fragments, serve as templates

• Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single stranded oligonucleotides, hybridization of the primers to the template, and primer extension 

DNA Amplification - PCR Priming

• The PCR process is repeated as necessary until the target fragment is sufficiently amplified that it can be isolated, visualized, and/or manipulated

• A key component of PCR is a thermostable polymerase, such as TAQ polymerase

• PCR can be used to amplify rare fragments from a pool of DNA, generate an abundance of a particular fragment from a single copy from a small sample (e.g. fossil DNA), generate samples of all DNA in a genome, and it is the foundation for many types of molecular markers

DNA Amplification - PCR Application

RT PCR / qRT PCR• Quantitative Real Time PCR

– Measurement of products during each cycle of PCR process

– Use in gene expression studies

TaqMan Assay qRT PCR for SNP detection

(de la Vega et al. 2005)

• Single strand nucleic acids have a natural tendency to find and pair with other single strand nucleic acids with a complementary sequence

• An application of this affinity is to label one single strand with a tag – radioactivity and fluorescent dyes are often used - and then to use this probe to find complementary sequences in a population of single stranded nucleic acids

• For example, if you have a cloned gene – either a cDNA or a genomic clone - you could use this as a probe to look for a homologous sequence in another DNA sample

DNA Hybridisation

• By denaturing the DNA in the sample, and using your labeled single stranded probe you can search the sample for the complementary sequence

• Pairing of probe and sample can be visualized by the label – e.g. on X-ray film or by measuring fluorescence

• The principle of hybridization can be applied to pairing events involving DNA: DNA; DNA: RNA; and protein: antibody

DNA Hybridisation

• Advances in technology have removed the technical obstacles to determining the nucleotide sequence of a gene, a chromosome region, or a whole genome

• The starting point for any sequencing project – be it of a single cloned fragment or of an entire genome - is a defined fragment of DNA 

Sequencing the DNA

1. Start with a defined fragment of DNA

2. Based on this template, generate a population of molecules differing in size by one base of known composition

3. Fractionate the population molecules based on size

4. The base at the truncated end of each of the fractionated molecules is determined and used to establish the nucleotide sequence

Sanger DNA Sequencing

A dideoxy nucleotide lacks a 3' OH and once incorporated, it will terminate strand synthesis. L-1. No free 3' OH

Sanger Sequencing - ddNTPs

deoxinucleotyde (dNTP) dideoxinucleotyde (ddNTP)

• Buffer• DNA polymerase• dNTPs• Labeled primer• Target DNA

ddGTP ddATP

ddTTPddCTP

Decoding DNA – Sanger Sequencing

G A C T

Separate gel lanes Single gel lane

*TTAAGTACATACCTAGTACCACTATATAATG

*TAAGTACATACCTAGTACCACTATATAATG

*TACATACCTAGTACCACTATATAATG

*TACCTAGTACCACTATATAATG

*TAGTACCACTATATAATG

*ACGCTTAAGTACATACCTAGTACCACTATATAATG

*AAGTACATACCTAGTACCACTATATAATG

*AGTACATACCTAGTACCACTATATAATG

*ATACCTAGTACCACTATATAATG

*ACCTAGTACCACTATATAATG

*AGTACCACTATATAATG

*GCTTAAGTACATACCTAGTACCACTATATAATG

*GTACATACCTAGTACCACTATATAATG

*GTACCACTATATAATG

*CGCTTAAGTACATACCTAGTACCACTATATAATG

*CATACCTAGTACCACTATATAATG

*CCTAGTACCACTATATAATG

*CTAGTACCACTATATAATG

Reading Sanger Sequencing

Link: http://www.wellcome.ac.uk/Education-resources/Education-and-learning/Resources/Animation/WTDV026689.htm

Next Generation Sequencing - Illumina

Next Generation Sequencing - Illumina

http://technology.illumina.com/technology/next-generation-sequencing/sequencing-technology.html

https://www.youtube.com/watch?v=womKfikWlxM

PAC Bio

https://www.youtube.com/watch?v=v8p4ph2MAvI

Single Molecule Real Time Sequencing

Sequencing Options

Method Read length Accuracy Reads per run Time per run Cost per 1 million bases (in US$) Advantages Disadvantages

Single-molecule real-time sequencing (Pacific Bio)

5,500 bp to 8,500 bp avg (10,000 bp); maximum read length >30,000 bases

99.999% consensus accuracy; 87% single-read accuracy

50,000 per SMRT cell, or ~400 megabases 30 minutes to 2 hours $0.33–$1.00

Longest read length. Fast. Detects 4mC, 5mC, 6mA.

Moderate throughput. Equipment can be very expensive.

Ion semiconductor (Ion Torrent sequencing)

up to 400 bp 98% up to 80 million 2 hours $1 Less expensive equipment. Fast. Homopolymer errors.

Pyrosequencing (454) 700 bp 99.9% 1 million 24 hours $10 Long read size. Fast. Runs are expensive. Homopolymer errors.

Sequencing by synthesis (Illumina) 50 to 300 bp 98% up to 3 billion

1 to 10 days, depending upon sequencer and specified read length

$0.05 to $0.15

Potential for high sequence yield, depending upon sequencer model and desired application.

Equipment can be very expensive. Requires high concentrations of DNA.

Sequencing by ligation (SOLiD sequencing)

50+35 or 50+50 bp 99.9% 1.2 to 1.4 billion 1 to 2 weeks $0.13 Low cost per base.

Slower than other methods. Have issue sequencing palindromic sequence.

Chain termination (Sanger sequencing) 400 to 900 bp 99.9% N/A 20 minutes to 3 hours $2400

Long individual reads. Useful for many applications.

More expensive and impractical for larger sequencing projects.

• Genomes come in many sizes: the units are base pairs (bp) kilobases (kb) or megabases (Mb)

• The genome of an organism is the complete set of genes specifying how its phenotype will develop (under a certain set of environmental conditions) plus the “dark matter”

• In this sense, then, diploid organisms (like ourselves) contain two genomes, one inherited from our mother, the other from our father

Whole Genome Sequencing (WGS)

• Arabidopsis thaliana has the smallest genome known in the plant kingdom (135 Mb) and for this reason has become a favorite of plant molecular biologists

• Even though Psilotum nudum (the "whisk fern") is a far simpler plant than Arabidopsis (it has no true leaves, flowers, or fruit the genome size is 2.5 x 1011 Mb

• The C value paradox!• Dealing with the C value paradox and whole genome

sequencing

Genome Size and WGS

Technologies for whole genome sequencing are evolving very rapidly and too fast for us to compare and contrast in this class

• Key words • Massively parallel • High throughput• Second generation• Next generation

Automated Sequencing

Technologies for whole genome sequencing are evolving very rapidly and too fast for us to compare and contrast in this class

• Key aspects • Library preparation • Sequencing by synthesis• Signal capture

• Key considerations • Cost • Speed • Read length • Assembly  

Sequencing Developments

RNA Sequencing (RNASeq)• Whole Transcriptome analysis

– Extract mRNA from your tissue of choice– Fragment and convert to cDNA and sequence– Assemble and look for abundance of transcripts

RNASeq and gene structure

Gene 1 Gene 2Primary target

Probe selection

Capture Target

•Exome refers to all of the exons in the genome (~1% barley genome)•Effectively reduces genome from 5300Mb to 60Mb

•Design based on Morex genome assembly v3, flcDNA and RNAseq data

•1 Lane HiSeq2000 – 40Gb = ~600X coverage

Exome Capture

Exome capture: processing

Exome capture: pilot testing

OSU Sequencing Resources

Utilisation of WGS Technologies

The Human Genome Project