Last lecture summary

Upload: ronald-jennings

Post on 23-Dec-2015


Page 1: Last lecture summary. recombinant DNA technology DNA polymerase (copy DNA), restriction endonucleases (cut DNA), ligases (join DNA) DNA cloning – vector

Last lecture summary

• recombinant DNA technology
• DNA polymerase (copy DNA), restriction endonucleases (cut DNA), ligases (join DNA)
• DNA cloning – vector (plasmid, BAC), PCR
• genome mapping
  • genetic map: relative locations of genes are established by following inheritance patterns
  • cytogenetic map: visual appearance of a chromosome when stained and examined under a microscope
  • physical map: the order and spacing of the genes, measured in base pairs
  • sequence map

• genetic markers
  • polymorphic (alternative alleles)
• restriction fragment length polymorphisms (RFLPs)
  • some restriction sites exist as two alleles
• simple sequence length polymorphisms (SSLPs)
  • repeat sequences: minisatellites (repeat unit up to 25 bp), microsatellites (repeat unit of 2-4 bp)
• single nucleotide polymorphisms (SNPs, pronounced "snips")
  • positions in a genome where some individuals have one nucleotide and others have a different nucleotide

[Figure: RFLP and SSLP]
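To make the RFLP idea concrete, here is a minimal sketch of a restriction digest on two alleles that differ in one restriction site. The function name `digest`, the toy allele sequences, and the choice of EcoRI (recognition site GAATTC, cut after the first G) are assumptions made for illustration; they do not come from the slides.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the cut position inside the site (1 for EcoRI: G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_2 has lost the second site (GAATTC -> GAGTTC).
allele_1 = "ATATAT" + "GAATTC" + "CGCGCGCG" + "GAATTC" + "TTAA"
allele_2 = "ATATAT" + "GAATTC" + "CGCGCGCG" + "GAGTTC" + "TTAA"

print(digest(allele_1))  # [7, 14, 9]
print(digest(allele_2))  # [7, 23]
```

The two alleles give different sets of fragment lengths, which is what the marker detects.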

New stuff

DNA sequencing
• Sanger method (chain-termination method), published in 1977, Nobel Prize in Chemistry 1980
• The key principle: use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.

source: http://openwetware.org/wiki/BE.109:Bio-material_engineering/Sequence_analysis

[Figure: dNTP and ddNTP]
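The chain-termination principle can be sketched as a toy simulation, assuming four separate reactions (one per ddNTP) and ideal conditions in which every possible terminated fragment is produced. The names `chain_termination_fragments` and `read_gel` are hypothetical, and the input is the newly synthesized strand written with the letters A, C, G, T only.

```python
def chain_termination_fragments(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, list the lengths of the terminated fragments.

    A fragment ending with ddA has the length of a position where the new
    strand carries an A, and likewise for C, G and T.
    """
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lengths[base].append(position)
    return lengths


def read_gel(fragments: dict[str, list[int]]) -> str:
    """Recover the sequence by ordering fragments from shortest to longest."""
    base_at_length = {
        length: base for base, sizes in fragments.items() for length in sizes
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


fragments = chain_termination_fragments("GATTACA")
print(fragments)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(fragments))  # GATTACA
```

Sorting fragments by length corresponds to reading the bands on a gel from bottom to top.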

Shotgun sequencing

1. Target DNA
2. Copies of target DNA
3. Shotgun (restriction endonuclease)
4. Sequence each short piece (read, ~1 kbp)
5. Sequence assembly (overlapping reads form a contig)
6. Consensus
7. Finalizing (directed read)

source: slides by Martin Farach-Colton
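The assembly step can be illustrated with a toy greedy overlap merger. This is a minimal sketch under strong assumptions (error-free reads, all from the same strand, no repeats, no read contained inside another); it is not the algorithm used by any particular assembler, and the function names and reads are invented for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


# Hypothetical reads sampled from the sequence ATGGCGTGCAATCCTGA.
reads = ["AATCCTGA", "ATGGCGTG", "CGTGCAAT"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATCCTGA']
```

With repeats in the target, several merges can look equally good, which is where this greedy approach starts to fail.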

Problems with repeats in the assembly

source: Brown T. A., Genomes, 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/

Human genome project (HGP)
• Determine the sequence of the haploid human genome
• Governmentally funded (DOE)
• Began in 1990, working draft published in 2001, complete in 2003, last chromosome finished in 2006
• Cost: $3 billion
• Whose genome was sequenced?
  • The "reference genome" is a composite from several people who donated blood samples.

Celera - competition begins

• In 1998, a similar privately funded quest was launched by the American researcher Craig Venter and his company Celera Genomics.
• Goal: finish the genome sequencing within 3 years for $300,000,000.
• Celera wanted to patent identified genes.
• Celera promised to release data annually (the HGP released data daily). However, unlike the HGP, Celera would not permit free redistribution or scientific use of the data.
• For this reason, the HGP was compelled to release the first draft of the human genome (7 July 2000) before Celera.

How did it finish?
• March 2000 – President Clinton announced that the genome sequence could not be patented and should be made freely available to all researchers.
• The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in two days.
• Celera and the HGP jointly announced the draft sequence in 2000.
• The drafts covered about 83% of the genome.
• Improved drafts were announced in 2003 and 2005, bringing the covered fraction to ≈92% of the sequence at present.

Human genome – some facts
• 3 billion bp, ~20 000 – 25 000 genes
• Only 1.1 – 1.4 % of the genome sequence codes for proteins.
• State of completion:
  • best estimate – 92.3% is complete
  • problematic unfinished regions: centromeres, telomeres (both contain highly repetitive sequences), some unclosed gaps
  • It is likely that the centromeres and telomeres will remain unsequenced until new technology is developed

Databases
• The genome is stored in databases
  • Primary database (NCBI) – GenBank (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucleotide)
  • Additional data and annotation, tools for visualizing and searching
    • UCSC (http://genome.ucsc.edu) … University of California – Santa Cruz
    • Ensembl (http://www.ensembl.org) … EBI+Sanger
• Chromosomes
  • largest #1 = 250 Mbp, smallest #21 = 48 Mbp
  • http://www.ensembl.org/Homo_sapiens/Location/Genome

Hierarchical genome shotgun – HGS
• Also called hierarchical shotgun sequencing, clone-by-clone sequencing, map-based shotgun sequencing, clone contig sequencing
• Adopted by HGP
• Strategy "map first, sequence second"
  • Create physical map
  • Divide chromosomes into smaller fragments.
  • Order (map) them to correspond to their respective locations on the chromosomes.
  • Determine the base sequence of each of the mapped fragments.

[Figure: idealized representation of hierarchical shotgun sequencing. Multiplied genomic DNA → (restriction endonuclease, clone in BAC) → BAC fragments, 160 kbp → minimum tiling path (MTP) → BAC to be sequenced → (restriction endonuclease, clone in plasmid) → shotgun clones, 1 kbp → sequencing and assembly]
source: http://www.nature.com/scitable/content/idealized-representation-of-the-hierarchical-shotgun-sequencing-48221

Minimum tiling path (MTP)
• MTP – the lowest possible number of BACs to cover the sequence.
• MTP BACs are selected for sequencing.

http://en.wikipedia.org/wiki/Shotgun_sequencing
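If each mapped BAC is treated as an interval on the chromosome, choosing a minimum tiling path becomes the classic minimum interval cover problem, which a greedy scan solves. The sketch below assumes exact, error-free map positions given as half-open intervals in kbp; the BAC names and coordinates are invented, and a real selection would also require enough overlap between neighbouring clones to join them.

```python
def minimum_tiling_path(
    clones: list[tuple[str, int, int]], region_start: int, region_end: int
) -> list[str]:
    """Pick the fewest clones that cover [region_start, region_end).

    Each clone is (name, start, end) and covers the half-open interval
    [start, end). Greedy rule: among clones covering the current position,
    take the one that reaches furthest.
    """
    path = []
    position = region_start
    while position < region_end:
        candidates = [c for c in clones if c[1] <= position < c[2]]
        if not candidates:
            raise ValueError(f"no clone covers position {position}")
        best = max(candidates, key=lambda c: c[2])
        path.append(best[0])
        position = best[2]
    return path


# Hypothetical BAC map, coordinates in kbp.
bacs = [
    ("BAC1", 0, 160),
    ("BAC2", 40, 200),
    ("BAC3", 150, 310),
    ("BAC4", 190, 350),
    ("BAC5", 300, 460),
]
print(minimum_tiling_path(bacs, 0, 460))  # ['BAC1', 'BAC3', 'BAC5']
```

Only three of the five mapped BACs need to be shotgun sequenced in this example.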

Hierarchical genome shotgun – HGS
1. Map genome
   • Sequence-tagged sites (STSs) were used as genetic markers (landmarks); an STS is a 200 to 500 bp DNA sequence that has a single occurrence in the genome
2. Copy target DNA
3. Make BAC library
   • cleave all target DNA copies randomly (partial cleavage by restriction endonuclease) into ~160 kbp fragments, clone them in BACs
4. Physically map all BACs
5. Identify the minimum tiling path (MTP) BACs
6. Shotgun sequence only the BACs on the MTP
   • Divide BACs into ~1 kbp fragments, do plasmid cloning, reconstruct the BAC sequence
7. Fill in gaps between BACs
8. Merge into consensus sequence

Coverage
• As shown earlier, individual nucleotides are read more than once.
• Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence.
• Let's say that for a source strand of length G = 100 kbp we sequence R = 1 500 reads of average length L = 500 bp.
• Thus, we collect N = RL = 750 kbp of data.
• So we have sequenced every bp in the source N/G = 7.5 times on average.
• The coverage is 7.5X.
• Coverage in the HGS adopted by the HGP was 8X.
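The coverage arithmetic above is simple enough to check in a few lines. The function names are hypothetical; `reads_needed` inverts the same formula to estimate how many reads a target coverage requires.

```python
import math


def coverage(genome_length_bp: int, num_reads: int, avg_read_length_bp: float) -> float:
    """Average coverage N / G, where N = R * L is the total number of bases read."""
    total_bases = num_reads * avg_read_length_bp
    return total_bases / genome_length_bp


def reads_needed(
    genome_length_bp: int, avg_read_length_bp: float, target_coverage: float
) -> int:
    """Smallest number of reads R such that R * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_length_bp / avg_read_length_bp)


# Values from the example: G = 100 kbp, R = 1 500 reads, L = 500 bp.
print(coverage(100_000, 1_500, 500))   # 7.5
print(reads_needed(100_000, 500, 8))   # 1600
```

Reaching 8X on the same 100 kbp strand with 500 bp reads would therefore take 1 600 reads.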

Whole genome shotgun – WGS
• Adopted by Celera
• The expensive and time-consuming mapping is skipped
• High coverage (20x) needed
• New algorithms for assembly
• Repeats are problematic (HGP data used by Celera)

http://en.wikipedia.org/wiki/Shotgun_sequencing

Genome assembly
• Can be very computationally intensive when done at the whole-genome level.
• Major challenges:
  • sequence errors – can be corrected by drawing a consensus sequence from an alignment of multiple overlapping sequences
  • contamination by bacterial vectors – can be removed using filtering programs prior to assembly
  • repeats
• Popular programs developed within the HGP and still used: PHRED and PHRAP
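How a consensus corrects sequence errors can be shown with a simple per-column majority vote. This is a minimal sketch that assumes the reads are already aligned and padded with "-" where a read does not cover a position; it ignores base quality and is not how PHRAP itself computes its consensus. The reads are invented, with one deliberate error.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Majority base per alignment column.

    "-" marks positions a read does not cover; a column with no coverage
    yields "N". Ties go to the base seen first in that column.
    """
    length = max(len(read) for read in aligned_reads)
    result = []
    for column in range(length):
        bases = [
            read[column]
            for read in aligned_reads
            if column < len(read) and read[column] != "-"
        ]
        result.append(Counter(bases).most_common(1)[0][0] if bases else "N")
    return "".join(result)


# Hypothetical alignment; the second read has an error (C instead of G).
alignment = [
    "ATGGCGTG---------",
    "----CGTCCAAT-----",
    "----CGTGCAATCC---",
    "---------AATCCTGA",
]
print(consensus(alignment))  # ATGGCGTGCAATCCTGA
```

The erroneous C is outvoted by the two reads that agree on G, which is why higher coverage makes the consensus more reliable.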

PHRED
• Base caller – converts raw data from a sequencing instrument into sequences and scores; the score reflects the likelihood that the base call is correct or incorrect

[Figure: ideal case vs. real case]
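The slide does not give the formula, but the standard Phred quality score is defined as Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below converts between the two; the function names are chosen for this example only.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred score Q = -10 * log10(P) for a base-call error probability P."""
    return -10 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse conversion: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


for q in (10, 20, 30, 40):
    p = error_probability(q)
    print(f"Q{q}: about 1 wrong call in {round(1 / p)}")
```

Each step of 10 in Q means a tenfold lower error probability, so Q20 corresponds to roughly 1 error in 100 calls and Q30 to roughly 1 in 1 000.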

PHRAP
• Sequence assembler
• Takes PHRED base-call files with quality scores as input.
• Aligns individual fragments in a pairwise fashion. The base quality information is taken into account during the pairwise alignment.