tree searching

Tree searching Kai Müller

Upload: aquila

Post on 24-Feb-2016



TRANSCRIPT

Page 1: Tree searching

Tree searching

Kai Müller

Page 2: Tree searching

Tree searching: exhaustive search

• branch addition algorithm
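Why exhaustive search stops being feasible can be seen by counting the trees that branch addition has to visit. The sketch below is an illustration, not part of the slides; the function name `num_unrooted_trees` is made up, and it assumes unrooted, fully bifurcating trees.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees for n_taxa terminals."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # exactly one tree for 3 taxa
    # Taxon k (k = 4..n) can be attached to any of the 2k - 5 branches
    # of a tree that already holds k - 1 taxa.
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_trees(n))
```

Already with 10 terminals there are more than two million trees, which is what motivates branch and bound and the heuristic searches.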

Page 3: Tree searching

Branch and bound

• Lmin = L(random tree)
• „search tree“ as in branch addition
– at each level, if L > Lmin, go back one level to try another path
– if at the last level, set Lmin = L and go back to the first level, unless all paths have already been tried
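The bound can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides: L is taken to be the Fitch parsimony length, trees are nested tuples rooted next to the first taxon, and the names `fitch`, `insertions` and `branch_and_bound` are hypothetical. As a simplification, Lmin starts at infinity rather than at the length of a random tree.

```python
def fitch(node, seqs):
    """Return (state sets per site, number of steps) for a nested-tuple tree."""
    if not isinstance(node, tuple):  # leaf: one state per site
        return [{c} for c in seqs[node]], 0
    left_sets, left_steps = fitch(node[0], seqs)
    right_sets, right_steps = fitch(node[1], seqs)
    sets, steps = [], left_steps + right_steps
    for a, b in zip(left_sets, right_sets):
        common = a & b
        if common:
            sets.append(common)
        else:  # no shared state: one extra step
            sets.append(a | b)
            steps += 1
    return sets, steps


def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def branch_and_bound(seqs):
    """Return (minimal length, list of shortest trees)."""
    taxa = sorted(seqs)
    best = {"length": float("inf"), "trees": []}

    def extend(subtree, next_idx):
        length = fitch((taxa[0], subtree), seqs)[1]
        if length > best["length"]:
            return  # bound: adding taxa can never shorten the tree
        if next_idx == len(taxa):  # last level: complete tree
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append((taxa[0], subtree))
            return
        for candidate in insertions(subtree, taxa[next_idx]):
            extend(candidate, next_idx + 1)

    extend((taxa[1], taxa[2]), 3)
    return best["length"], best["trees"]


if __name__ == "__main__":
    toy = {"A": "AA", "B": "AA", "C": "CC", "D": "CC"}
    print(branch_and_bound(toy))
```

The pruning is safe because the parsimony length of a partial tree is a lower bound on the length of every complete tree built from it.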

Page 4: Tree searching

Heuristic searches

• stepwise addition
– as branch addition, but on each level only the path that follows the shortest (best) tree at this level is searched
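The greedy character of stepwise addition can be sketched like this. It is an illustration only: `tree_length` is a hypothetical, user-supplied scoring function (for example a parsimony length), trees are nested tuples, and the addition order is simply the order of `taxa`.

```python
def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def stepwise_addition(taxa, tree_length):
    """Add taxa one by one, keeping only the shortest tree at each level."""
    subtree = (taxa[1], taxa[2])  # the single tree for the first three taxa
    for taxon in taxa[3:]:
        # Unlike branch addition, the other candidates are discarded.
        subtree = min(
            insertions(subtree, taxon),
            key=lambda t: tree_length((taxa[0], t)),
        )
    return (taxa[0], subtree)
```

Because only one path is followed, the result depends on the addition order and may be a local optimum, which is why it is usually followed by branch swapping.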

Page 5: Tree searching

Star decomposition

Page 6: Tree searching

Branch swapping

• NNI: nearest neighbour interchanges
• SPR: subtree pruning and regrafting
• TBR: tree bisection and reconnection

Page 7: Tree searching

Tree inference with many terminals

• general problem of getting trapped in local optima
• searches under parsimony: parsimony ratchet
• searches under likelihood: estimation of
– substitution model parameters
– branch lengths
– topology

Page 8: Tree searching

Parsimony ratchet

1) generate start tree
2) TBR on this and the original matrix
3) perturb characters by randomly upweighting 5-25%. TBR on best tree found under 2). Go to 2) [200+ times]
4) once more TBR on current best tree & original matrix
5) get best trees from those collected in steps 2) and 4)
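The perturbation in step 3) can be sketched as a weight vector over characters. This is a minimal sketch: the function name `ratchet_weights`, the default fraction of 15% and the upweighting factor of 2 are assumptions chosen for illustration, and the TBR search itself is not shown.

```python
import random


def ratchet_weights(n_chars, fraction=0.15, upweight=2, rng=None):
    """Return one weight per character; a random subset is upweighted."""
    if rng is None:
        rng = random.Random()
    n_up = round(fraction * n_chars)  # e.g. 5-25% of the characters
    chosen = set(rng.sample(range(n_chars), n_up))
    return [upweight if i in chosen else 1 for i in range(n_chars)]


if __name__ == "__main__":
    print(ratchet_weights(20, fraction=0.25, rng=random.Random(1)))
```

Each ratchet iteration would draw a fresh weight vector, search under it, and then return to the original (equal) weights.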

Page 9: Tree searching

Bootstrapping

• estimates properties of an estimator (such as its variance) by constructing a number of resamples of the observed dataset (each of equal size to the observed dataset), each of which is obtained by random sampling with replacement from the original dataset
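For sequence data, the resampling unit is the alignment column. The sketch below assumes an alignment stored as a dict of equal-length strings; the name `bootstrap_alignment` is illustrative.

```python
import random


def bootstrap_alignment(alignment, rng=None):
    """Return one pseudoreplicate: columns drawn with replacement."""
    if rng is None:
        rng = random.Random()
    n_sites = len(next(iter(alignment.values())))
    # Same number of columns as the original, sampled with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"A": "ACGT", "B": "AGGT", "C": "TCGA"}
    print(bootstrap_alignment(toy, random.Random(0)))
```

A tree is then inferred from each pseudoreplicate, and branch support is read from how often a group recurs.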

Page 10: Tree searching

Bootstrapping

• variants
– FWR (frequencies within replicates)
– SC (strict consensus)

Page 11: Tree searching

Bootstrapping

Page 12: Tree searching

Bremer support / decay

• Bremer support (decay analysis) is the number of extra steps needed to "collapse" a branch.
• searches under reverse constraints: keep only trees that do NOT contain a given node
• takes longer than bootstrapping: parsimony ratchet beneficial (~20 iterations)

Page 13: Tree searching

Homoplasy indices

• Consistency index CI = m/s
• m = the smallest theoretically possible number of steps the character could show on any tree
• s = the number of steps the character actually shows on a given tree
• Characters without homoplasy therefore have a CI of 1.
• As soon as "excess" steps are needed, e.g. s = 3 (with m = 1), homoplasy increases and the CI decreases, here to 1/3 = 0.33.

Page 14: Tree searching

Homoplasy indices (2)

• Ensemble consistency index
– The ensemble consistency index is 1 when no character is homoplastic, i.e. when all characters fit the tree perfectly.
• Drawbacks of the CI
– Parsimony-uninformative characters always contribute a CI of 1 and thus artificially inflate the summary CI.
– On the other hand, the CI can never reach 0. That would be a desirable property for a scale of all conceivable degrees of homoplasy, which should ideally range from 0 to 1.
– Third, the CI decreases as the number of taxa increases, even if nothing essential changes in the information content of the data set.

Page 15: Tree searching

Homoplasy indices (3)

• Retention index (RI)
– If g is the largest possible number of steps of a character on any conceivable tree (the number on a completely unresolved "bush"), then RI = (g-s)/(g-m)

Page 16: Tree searching

Homoplasy indices (4)

char  s  m  g  CI   RI
1     2  1  2  0.5  0
2     3  1  4  0.3  0.3
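The table values can be recomputed from s, m and g. The sketch below uses the CI and RI definitions from the previous slides; the ensemble CI is taken as the sum of m over the sum of s across characters, which is the usual definition but is not spelled out in the slides. Function names are illustrative.

```python
def consistency_index(m, s):
    """CI = m / s for a single character."""
    return m / s


def retention_index(m, s, g):
    """RI = (g - s) / (g - m) for a single character."""
    if g == m:
        raise ValueError("RI is undefined when g == m (uninformative character)")
    return (g - s) / (g - m)


def ensemble_ci(chars):
    """Ensemble CI over (m, s) pairs: sum of m divided by sum of s."""
    return sum(m for m, s in chars) / sum(s for m, s in chars)


if __name__ == "__main__":
    # (s, m, g) for characters 1 and 2 of the table
    for s, m, g in [(2, 1, 2), (3, 1, 4)]:
        print(round(consistency_index(m, s), 2), round(retention_index(m, s, g), 2))
```

Character 1 shows why the RI is useful: its CI is 0.5, but its RI of 0 says that it retains no grouping information on this tree.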

Page 17: Tree searching

Overview: treebuilding methods

Page 18: Tree searching

Data types: discrete characters vs. distances

Page 19: Tree searching

Distance methods

• observed number vs. actual number of substitutions

Page 20: Tree searching

Distance methods

• observed number vs. actual number of substitutions

Page 21: Tree searching

Types of substitutions

• transitions/transversions

• synonymous/non-synonymous

Page 22: Tree searching

Distance correction


Page 23: Tree searching

Substitution models

• p-distance: uncorrected
• substitution models
– characterized by substitution probability matrices:

Page 24: Tree searching

Substitution models

• Jukes-Cantor
– oldest (1969), simplest
– nucleotide frequencies all identical
– nucleotide substitutions all equally likely

Page 25: Tree searching

P(t)

• JC69:
– probability of a substitution after time t, if the mean instantaneous substitution rate is 10^-8 per site per year
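The curve can be reproduced with a few lines. This sketch assumes the usual JC69 parameterisation in which the rate is the total substitution rate per site, so that the probability of seeing a different nucleotide after time t is 3/4 (1 - exp(-4 rate t / 3)); the function name is illustrative.

```python
import math


def jc69_p_different(t, rate=1e-8):
    """Probability that a site shows a different base after time t (years)."""
    return 0.75 * (1.0 - math.exp(-4.0 * rate * t / 3.0))


if __name__ == "__main__":
    for years in (1e6, 1e7, 1e8, 1e9):
        print(f"{years:.0e} years: {jc69_p_different(years):.4f}")
```

The probability rises almost linearly at first and then saturates at 0.75, which is why observed differences underestimate the actual number of substitutions.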

Page 26: Tree searching

Distances

• simple considerations & rearrangements of Pij(t) show that the JC-corrected distance, when a fraction P of differing nucleotides is observed, is d = -(3/4) ln(1 - (4/3) P)
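As a sketch, the Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) P) can be applied to an observed p-distance like this. The function name is illustrative; values of P at or above 0.75 are rejected because the logarithm is then undefined.

```python
import math


def jc69_distance(p):
    """JC-corrected distance for an observed fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.3, 0.6):
        print(p, round(jc69_distance(p), 4))
```

For small P the correction is negligible; it grows quickly as P approaches saturation.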

Page 27: Tree searching

K2P

• Kimura 2-parameter model
– 2 different nucleotide substitution types
   • transitions
   • transversions
– nucleotide frequencies all identical
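A K2P distance can be sketched from the observed proportions of transitions (P) and transversions (Q) using Kimura's formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The formula is not shown in the slides; the function names are illustrative, and the counting helper assumes two aligned sequences of equal length containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (P, Q): proportions of transitions and transversions."""
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def k2p_distance(p, q):
    """Kimura 2-parameter distance from transition/transversion proportions."""
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    p, q = count_differences("AACCGGTTAA", "GACCGGTTAC")
    print(p, q, round(k2p_distance(p, q), 4))
```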

Page 28: Tree searching

More models

• Felsenstein (1981), F81:
– 1 nucleotide substitution type, 4 base frequencies
• HKY85
– 2 different nucleotide substitution types, 4 base frequencies
• GTR
– 6 different nucleotide substitution types, 4 base frequencies

Page 29: Tree searching

Heterogeneity among sites

Page 30: Tree searching

Among site rate variation modelled via gamma distribution
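What this means in practice can be sketched by drawing one relative rate per site from a gamma distribution. The sketch assumes the common convention of shape alpha and scale 1/alpha, so that the mean rate is 1; the function name is illustrative, and real programs typically use a few discrete rate categories instead of random draws.

```python
import random


def gamma_site_rates(n_sites, alpha, rng=None):
    """Draw one relative rate per site from Gamma(shape=alpha, scale=1/alpha)."""
    if rng is None:
        rng = random.Random()
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


if __name__ == "__main__":
    rates = gamma_site_rates(10, alpha=0.5, rng=random.Random(42))
    print([round(r, 3) for r in rates])
```

Small alpha gives strong heterogeneity (many nearly invariant sites and a few fast ones); large alpha approaches equal rates.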

Page 31: Tree searching

Hierarchical relationships among common models

Page 32: Tree searching

Amino acid models

Page 33: Tree searching

Codon models

• GY94, MG94
• 61 x 61 matrix (stop codons ignored)
– parameters: frequency of codon j, transition/transversion ratio, nonsynonymous/synonymous ratio

Page 34: Tree searching

Models getting more "realistic"

• example: covarion models
• DNA sites change between „on“ and „off“ states: changes allowed vs. forbidden.
– transition rates s01, s10; kappa = proportion of „on“:

Page 35: Tree searching

Additivity of distances

Page 36: Tree searching

Additivity of distances

• condition: triangle-inequality

• four-point-condition
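The four-point condition can be checked directly: for every quartet, the two largest of the three pairwise sums d(a,b)+d(c,d), d(a,c)+d(b,d), d(a,d)+d(b,c) must be equal. The sketch below assumes a symmetric nested dict of distances; the function name and tolerance are illustrative.

```python
from itertools import combinations


def satisfies_four_point(dist, taxa, tol=1e-9):
    """True if all quartets of taxa satisfy the four-point condition."""
    for a, b, c, d in combinations(taxa, 4):
        sums = sorted([
            dist[a][b] + dist[c][d],
            dist[a][c] + dist[b][d],
            dist[a][d] + dist[b][c],
        ])
        # The two largest sums must coincide for tree-additive distances.
        if sums[2] - sums[1] > tol:
            return False
    return True


if __name__ == "__main__":
    # Distances read off the tree ((A:1,B:2):1,(C:3,D:4))
    d = {
        "A": {"B": 3, "C": 5, "D": 6},
        "B": {"A": 3, "C": 6, "D": 7},
        "C": {"A": 5, "B": 6, "D": 7},
        "D": {"A": 6, "B": 7, "C": 7},
    }
    print(satisfies_four_point(d, ["A", "B", "C", "D"]))
```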

Page 37: Tree searching

Corrected distances are rarely tree additive!

• two approaches try to find the tree that minimizes the error e when fitting the distances on it
• both are 2-step tree-search methods
1. least-squares-fit criterion (general: goodness-of-fit methods)
2. minimum evolution
• length L = sum of all branch lengths
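The two criteria can be sketched as simple scoring functions. The formulas are not given in the slides; the sketch assumes the common least-squares form e = sum over pairs of (d_ij - p_ij)^2 / d_ij^power, where d is the observed (corrected) distance and p the path-length distance on the tree, with power 0 for the unweighted fit and power 2 for Fitch-Margoliash weighting. All names are illustrative.

```python
def least_squares_error(observed, patristic, power=0):
    """Weighted squared misfit between observed and tree path distances."""
    return sum(
        (observed[pair] - patristic[pair]) ** 2 / observed[pair] ** power
        for pair in observed
    )


def tree_length(branch_lengths):
    """Minimum evolution score: the sum of all branch lengths."""
    return sum(branch_lengths)


if __name__ == "__main__":
    obs = {("A", "B"): 3.0, ("A", "C"): 5.0}
    fit = {("A", "B"): 2.0, ("A", "C"): 5.0}
    print(least_squares_error(obs, fit), least_squares_error(obs, fit, power=2))
```

In both approaches, branch lengths are first fitted for each candidate topology and the topologies are then compared by their score, hence "2-step".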

Page 38: Tree searching

Clustering methods

• 1-step, algorithmic methods
– UPGMA
   • condition of an ultrametric tree
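A compact UPGMA sketch, assuming distances stored in a dict keyed by `frozenset` pairs and a tree returned as nested tuples `(left, right, height)`. The names and data layout are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA; returns nested tuples (left, right, height)."""
    d = dict(dist)
    size = {name: 1 for name in names}
    active = list(names)
    while len(active) > 1:
        # Join the closest pair of clusters.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new = (a, b, d[frozenset((a, b))] / 2)  # node height = half the distance
        size[new] = size[a] + size[b]
        active = [c for c in active if c not in (a, b)]
        for c in active:
            # Size-weighted average keeps all original taxa equally weighted.
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        active.append(new)
    return active[0]


if __name__ == "__main__":
    toy = {
        frozenset(("A", "B")): 2.0,
        frozenset(("A", "C")): 6.0,
        frozenset(("B", "C")): 6.0,
    }
    print(upgma(["A", "B", "C"], toy))
```

The result is only reliable when the distances are ultrametric, i.e. when all tips are equally far from the root.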

Page 39: Tree searching

Clustering methods

• neighbor joining
– star decomposition
– d(pair members, new node):
– d(other taxa, new node):
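One neighbor-joining step can be sketched as follows. The formulas are missing from the transcript, so the sketch uses the standard ones: the pair (i, j) minimising Q(i,j) = (n-2) d(i,j) - r_i - r_j is joined, with d(i,u) = d(i,j)/2 + (r_i - r_j) / (2(n-2)), d(j,u) = d(i,j) - d(i,u), and d(k,u) = (d(i,k) + d(j,k) - d(i,j)) / 2 for every other taxon k, where r_i is the sum of distances from i to all other taxa. Names and data layout are illustrative.

```python
from itertools import combinations


def nj_step(names, d):
    """Perform one NJ join; d is keyed by frozenset pairs, len(names) >= 3.

    Returns (pair, (d_iu, d_ju), {k: d_ku for the remaining taxa}).
    """
    n = len(names)
    r = {i: sum(d[frozenset((i, k))] for k in names if k != i) for i in names}
    i, j = min(
        combinations(names, 2),
        key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
    )
    d_ij = d[frozenset((i, j))]
    d_iu = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # pair member to new node
    d_ju = d_ij - d_iu
    d_new = {
        k: (d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij) / 2
        for k in names
        if k not in (i, j)
    }  # other taxa to new node
    return (i, j), (d_iu, d_ju), d_new


if __name__ == "__main__":
    # Distances read off the tree ((A:1,B:2):1,(C:3,D:4))
    pairs = {("A", "B"): 3.0, ("A", "C"): 5.0, ("A", "D"): 6.0,
             ("B", "C"): 6.0, ("B", "D"): 7.0, ("C", "D"): 7.0}
    dist = {frozenset(p): v for p, v in pairs.items()}
    print(nj_step(["A", "B", "C", "D"], dist))
```

On additive distances such as the toy matrix, the step recovers the true branch lengths of the joined pair.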