Alignment methods

Martijn Vermaat

Department of Human Genetics

Center for Human and Clinical Genetics

Alignment methods

Sequence alignment

Assembly vs alignment

Alignment methods

Common issues

Platform specifics

Software

Metagenomics course 1/28 Thursday, 7 February 2013

Sequence alignment

Identifying regions of similarity in sequences

In NGS

• Recovering the original nucleotide sequence

• ... from many short fragments

• ... using a known reference

Sequence alignment

Pairwise alignment

Sequence alignment

Multiple sequence alignment

Sequence alignment

Global vs local alignment

Sequence alignment

Structural alignment

Assembly vs alignment

Assembly

Alignment

Assembly vs alignment

Assembly

• Memory hungry

• Needs high coverage

Alignment

• Easy to do in parallel

• Restricted by reference sequence

    • highly polymorphic regions

    • large insertions

Alignment methods

Smith-Waterman

• Generalization of Needleman-Wunsch

• Guaranteed optimal alignment

        −   A   C   A   C   A   C   T   A
    −   0   0   0   0   0   0   0   0   0
    A   0   2   1   2   1   2   1   0   2
    G   0   1   1   1   1   1   1   0   1
    C   0   0   3   2   3   2   3   2   1
    A   0   2   2   5   4   5   4   3   4
    C   0   1   4   4   7   6   7   6   5
    A   0   2   3   6   6   9   8   7   8
    C   0   1   4   5   8   8  11  10   9
    A   0   2   3   6   7  10  10  10  12

match = +2, mismatch = −1, gap penalty = −1
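The score matrix on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming a linear gap penalty and the scores listed above; the function names `smith_waterman_matrix` and `best_cell` are illustrative and not taken from any particular aligner.

```python
def smith_waterman_matrix(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the local-alignment score matrix H (a along columns, b along rows)."""
    rows, cols = len(b) + 1, len(a) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if b[i - 1] == a[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in sequence a
            left = H[i][j - 1] + gap    # gap in sequence b
            H[i][j] = max(0, diag, up, left)  # the 0 makes the alignment local
    return H


def best_cell(H):
    """Return (score, row, column) of the highest-scoring cell."""
    return max((score, i, j) for i, row in enumerate(H) for j, score in enumerate(row))


H = smith_waterman_matrix("ACACACTA", "AGCACACA")
for row in H:
    print(" ".join(f"{score:2d}" for score in row))
print(best_cell(H))
```

Tracing back from the best cell until a 0 is reached would give the aligned region; that step is left out to keep the sketch short.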

Alignment methods

2-step alignment

Step 1: Find candidate positions

• Use read seeds

• Hash table-based or Burrows-Wheeler transform-based heuristic

• Balance between speed and accuracy
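As a rough illustration of the hash table-based variant of step 1, the sketch below indexes every k-mer of the reference and lets each seed of a read vote for the read start it implies. The names `build_kmer_index` and `candidate_positions`, the seed length `k`, and the toy sequences are assumptions for this example; real aligners use more elaborate seeding schemes.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Return implied read start positions, those supported by most seeds first."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, []):
            votes[hit - offset] += 1  # where the read would start if this seed is right
    return sorted(votes, key=lambda start: (-votes[start], start))


reference = "TTGACCATGCAAGGCTTACGATCGGATACCA"
index = build_kmer_index(reference, k=4)
read = "AAGGCTAACGAT"  # reference[10:22] with one substitution
print(candidate_positions(read, index, k=4))
```

A smaller `k` tolerates more mismatches but yields more spurious candidates, which is one face of the speed versus accuracy balance mentioned above.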

Step 2: Align and report

• Complete alignment with Smith-Waterman

• Evaluate alignment(s)
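Step 2 could look roughly like the sketch below: each candidate start is scored by a local alignment of the read against a small window of the reference, and the candidates are ranked by score. The helper names, the `pad` parameter, and the scoring values are assumptions made for illustration only.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman score between a and b, keeping two rows in memory."""
    prev = [0] * (len(a) + 1)
    best = 0
    for ch_b in b:
        curr = [0]
        for j, ch_a in enumerate(a, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best


def evaluate_candidates(read, reference, candidates, pad=2):
    """Score each candidate start; return (start, score) pairs, best first."""
    scored = []
    for start in candidates:
        lo = max(0, start - pad)                            # allow for small indels
        hi = min(len(reference), start + len(read) + pad)
        scored.append((start, local_score(read, reference[lo:hi])))
    return sorted(scored, key=lambda item: -item[1])


reference = "TTGACCATGCAAGGCTTACGATCGGATACCA"
print(evaluate_candidates("AAGGCTTACGAT", reference, candidates=[0, 10]))
```

The gap between the best and second-best score is one simple way to judge how trustworthy the reported position is.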

Common issues

Insertions and deletions (indels)

• Local realignment around indels

• Per-Base Alignment Qualities (BAQ)

Common issues

Non-unique alignment

How to report non-unique alignments?

• Discard entirely

• Choose one randomly

• Report all

    • with best quality

    • above some quality

Depends on the tool
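The reporting options above can be written down as a small policy function. This is only a sketch: the policy names, the `(position, quality)` tuple layout, and the `min_quality` parameter are invented for this example and do not correspond to the options of any specific aligner.

```python
import random


def report_alignments(hits, policy, min_quality=0, rng=None):
    """Apply a reporting policy to the hits of one read.

    hits is a list of (position, quality) tuples; a read with at most one hit
    is unique and is returned unchanged.
    """
    if len(hits) <= 1:
        return list(hits)
    if policy == "discard":
        return []
    if policy == "random":
        rng = rng or random.Random()
        return [rng.choice(hits)]
    if policy == "all_best":
        best = max(quality for _, quality in hits)
        return [hit for hit in hits if hit[1] == best]
    if policy == "all_above":
        return [hit for hit in hits if hit[1] >= min_quality]
    raise ValueError(f"unknown policy: {policy}")


hits = [(100, 30), (250, 30), (900, 12)]
for policy in ("discard", "all_best", "all_above"):
    print(policy, report_alignments(hits, policy, min_quality=20))
```

Which policy is appropriate depends on the downstream analysis; for example, discarding loses coverage in repeats, while reporting all hits inflates it.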

Common issues

Structural variation

• Chromosomal translocation

• Inversion

• Large indels

• Copy-number variation

Use specialized tools

Common issues

Split-read mapping

• Allow aligned read to be split

• For example, RNA reads on a DNA reference

Common issues

Circular alignment

• Circular genome (e.g. bacteria, mitochondria)

• Most aligners assume linear reference

• Trick: extend reference

    • copy first N bases to the end

    • restore alignment to original reference
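The extend-and-restore trick can be sketched as follows. The function names and the toy genome are assumptions for illustration; positions are 0-based, and N should be at least the read length minus one so that any read spanning the origin fits on the extended reference.

```python
def extend_circular_reference(reference, n):
    """Append the first n bases to the end of a linearized circular reference."""
    return reference + reference[:n]


def restore_position(pos, original_length):
    """Map a 0-based position on the extended reference back onto the circle."""
    return pos % original_length


def split_interval(start, length, original_length):
    """Express an alignment on the extended reference as half-open intervals
    on the original reference (assumes length <= original_length)."""
    start %= original_length
    end = start + length
    if end <= original_length:
        return [(start, end)]
    return [(start, original_length), (0, end - original_length)]


genome = "ACGTTGCA"                       # circular: the last A is followed by the first A
extended = extend_circular_reference(genome, 3)
read = "CAAC"                             # spans the origin
pos = extended.find(read)                 # stand-in for a real aligner
print(pos, split_interval(pos, len(read), len(genome)))
```

Reads that fall entirely inside the copied N bases align equally well at the start of the reference, so they may need to be treated as a single location rather than as non-unique hits.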

Platform specifics

Paired-end sequencing

• Align reads separately

• Choose from non-unique alignments based on pairing

Platform specifics

Color-space (or SOLiD) reads

• Used by SOLiD systems (454 and Solexa/Illumina report reads in base space)

• Di-nucleotide encoding

• Needs support from alignment software

Decoding
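Decoding can be sketched in a few lines, assuming the usual two-base color code in which each color (0 to 3) is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3) and the read starts with a known primer base. The function names are illustrative.

```python
BASES = "ACGT"  # index in this string is the 2-bit code of the base


def decode_color_space(read):
    """Decode a color-space read such as 'T0120' (primer base, then colors)."""
    prev = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        prev ^= int(color)          # color = previous base XOR next base
        decoded.append(BASES[prev])
    return "".join(decoded)


def encode_color_space(primer_base, sequence):
    """Encode a base sequence as colors, starting from the primer base."""
    prev = BASES.index(primer_base)
    colors = []
    for base in sequence:
        code = BASES.index(base)
        colors.append(str(prev ^ code))
        prev = code
    return primer_base + "".join(colors)


print(decode_color_space("T0120"))
print(decode_color_space("T0320"))  # one wrong color changes every later base
```

Because a single color error alters all bases after it, naive decoding before alignment is fragile, which is one reason aligners need explicit color-space support.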

Platform specifics

Error profile

• Homopolymers

• GC-content

• Positional (example shown)
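The three factors above can each be measured with a simple statistic. The helpers below are a hedged sketch with invented names; the positional profile assumes ungapped read/reference pairs of equal length, which is a simplification.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of identical bases."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def positional_mismatch_rate(pairs):
    """Mismatch rate per read position for (read, reference_segment) pairs."""
    totals, errors = [], []
    for read, ref in pairs:
        for i, (r, g) in enumerate(zip(read, ref)):
            if i == len(totals):      # first time this position is seen
                totals.append(0)
                errors.append(0)
            totals[i] += 1
            errors[i] += r != g
    return [e / t for e, t in zip(errors, totals)]


print(longest_homopolymer("ACGGGGTA"), gc_content("GGCA"))
print(positional_mismatch_rate([("ACGT", "ACGT"), ("ACGA", "ACGT")]))
```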

Software

Some popular aligners for NGS

Hash table-based

• Eland

• MAQ

Burrows-Wheeler Transform-based

• Bowtie

• BWA

Split-read alignment

• TopHat

• GSNAP

• Mosaik

Software

Viewers

• IGV, Savant, Geneious, Tablet

• samtools tview (console-based)

• UCSC Genome Browser, GBrowse (web-based)

Questions?

Acknowledgements:

Jeroen Laros

Bas E. Dutilh

Image sources

cbsu.tc.cornell.edu/ngw2010/day2 lecture1.pdf

en.wikipedia.org/wiki/Sequence_alignment

en.wikipedia.org/wiki/Multiple_sequence_alignment

www.pitt.edu/ mcs2/teaching/biocomp/tutorials/global.html

www.biology-direct.com/content/4/1/30/figure/F3?highres=y

www.genomesunzipped.org/2012/04/guest-post-accurate-identification-of-rna-editing-sites-from-high-throughput-sequencing-data.php

www.eplantscience.com/botanical biotechnology biology chemistry/biotechnology/genes genetic engineering/genes nature concept and synthesis/biotech physical nature dna.php

www.pnas.org/content/109/4/1347/F1.expansion.html

omega.rc.unesp.br/mauricio/curso/bibliografia/22/362/Dibase%20Sequencing%20and%20Color%20Space%20Analysis.pdf

cgrlucb.wikispaces.com/SAMtoolsSpring2012

and some of my own
