
Semi-algebraic conditions for phylogenetic reconstruction

Marina Garrote-Lopez

Seminari de Geometria Algebraica de Barcelona


Phylogenetics

Charles Darwin, 1859

December 11, 2020 Marina Garrote-Lopez Algebraic and semi-algebraic conditions in Phylo Reconstruction


Table of contents

1 Modelling evolution

2 Phylogenetic varieties

3 Phylogenetic reconstruction methods

4 Stochastic phylogenetic regions

5 SAQ: Semi-algebraic quartet reconstruction method


Phylogenetic reconstruction

Given an alignment of DNA sequences for some species,

Gorilla AACTTCGAGGCTTACCGCTG

Human AACGTCTATGCTCACCGATG

Chimpanzee AAGGTCGATGCTCACCGATG

Orangutan ATTGTCGCAACTCGTCGACG

our goal is to reconstruct the topology of the phylogenetic tree that relates them.

[Figure: the three possible tree topologies $T_{12|34}$, $T_{13|24}$, $T_{14|23}$]


Modeling Evolution


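Random variables at the nodes: $X_i \in K = \{A, C, G, T\}$.

Distribution at the root: $\pi = (\pi_A, \pi_C, \pi_G, \pi_T)$, with $\sum_{i \in K} \pi_i = 1$.

Transition matrices at the edges:

$$M_e = \begin{pmatrix}
P(A \to A \mid e) & \cdots & P(A \to T \mid e) \\
P(C \to A \mid e) & \cdots & P(C \to T \mid e) \\
P(G \to A \mid e) & \cdots & P(G \to T \mid e) \\
P(T \to A \mid e) & \cdots & P(T \to T \mid e)
\end{pmatrix}$$

A transition matrix is a square matrix with nonnegative entries and rows summing to one.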


Evolutionary models

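Jukes Cantor Model

$$M_e = \begin{pmatrix}
a_e & b_e & b_e & b_e \\
b_e & a_e & b_e & b_e \\
b_e & b_e & a_e & b_e \\
b_e & b_e & b_e & a_e
\end{pmatrix},$$

where $a_e + 3b_e = 1$.

Kimura Model

$$M_e = \begin{pmatrix}
a_e & b_e & c_e & d_e \\
b_e & a_e & d_e & c_e \\
c_e & d_e & a_e & b_e \\
d_e & c_e & b_e & a_e
\end{pmatrix},$$

where $a_e + b_e + c_e + d_e = 1$.

Strand Symmetric Model

$$M_e = \begin{pmatrix}
a_e & b_e & c_e & d_e \\
e_e & f_e & g_e & h_e \\
h_e & g_e & f_e & e_e \\
d_e & c_e & b_e & a_e
\end{pmatrix},$$

where rows sum up to 1.

General Markov Model

$$M_e = \begin{pmatrix}
a_e & b_e & c_e & d_e \\
e_e & f_e & g_e & h_e \\
j_e & k_e & l_e & m_e \\
n_e & o_e & p_e & q_e
\end{pmatrix},$$

where rows sum up to 1.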
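The first two models can be illustrated with a minimal sketch that builds the matrices from their free parameters and checks the transition-matrix conditions numerically. The function names `jukes_cantor`, `kimura3` and `is_transition_matrix` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def jukes_cantor(b):
    """Jukes Cantor matrix: off-diagonal entries b, diagonal a = 1 - 3b."""
    a = 1.0 - 3.0 * b
    return np.full((4, 4), b) + (a - b) * np.eye(4)

def kimura3(b, c, d):
    """Kimura matrix with the pattern shown above; a = 1 - b - c - d."""
    a = 1.0 - b - c - d
    return np.array([[a, b, c, d],
                     [b, a, d, c],
                     [c, d, a, b],
                     [d, c, b, a]])

def is_transition_matrix(M, tol=1e-12):
    """Square, nonnegative entries, rows summing to one (up to tol)."""
    M = np.asarray(M, dtype=float)
    return (M.ndim == 2 and M.shape[0] == M.shape[1]
            and bool(np.all(M >= -tol))
            and bool(np.allclose(M.sum(axis=1), 1.0, atol=tol)))

if __name__ == "__main__":
    print(is_transition_matrix(jukes_cantor(0.05)))
    print(is_transition_matrix(kimura3(0.10, 0.05, 0.02)))
    # b = 0.4 forces a negative diagonal, so this one is not stochastic.
    print(is_transition_matrix(jukes_cantor(0.4)))
```

Matrices whose parameters satisfy the row-sum constraint but have a negative entry are still valid inputs to the algebraic parametrization; they simply fall outside the stochastic region discussed later in the talk.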


Joint distribution

Definition

The joint distribution $p_{s_1 \ldots s_n}$ at the leaves of a rooted phylogenetic tree $T$ is the probability that the random variables $X_1, \ldots, X_n$ at the leaves take the states $s_1, \ldots, s_n$:

$$p_{s_1 \ldots s_n} = \mathrm{Prob}(X_1 = s_1, X_2 = s_2, \ldots, X_n = s_n).$$

$$p_{x_1 \ldots x_n} = \sum_{x_v,\; v \in \mathrm{Int}(T)} \pi_{x_r} \prod_{e \in E(T)} M_e(x_{pa(e)}, x_{ch(e)}),$$

For example,

$$p_{A,T,C,C} = \sum_{x_r, x_5, x_6 \in \{A,C,G,T\}} \pi_{x_r} \cdot M_1(x_r, A) \cdot M_6(x_r, x_6) \cdot M_5(x_6, x_5) \cdot M_2(x_5, T) \cdot M_3(x_5, C) \cdot M_4(x_6, C)$$
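The sum over internal states can be evaluated for all leaf patterns at once. The sketch below assumes the tree shape implied by the example formula (root $r$ with children leaf 1 and node 6; node 6 with children node 5 and leaf 4; node 5 with children leaves 2 and 3) and uses arbitrary Jukes Cantor matrices as placeholder parameters. The names `joint_distribution` and `jc` are illustrative.

```python
import numpy as np

STATES = "ACGT"

def jc(b):
    """Placeholder Jukes Cantor matrix with off-diagonal entry b."""
    return np.full((4, 4), b) + (1.0 - 4.0 * b) * np.eye(4)

def joint_distribution(pi, M1, M2, M3, M4, M5, M6):
    """Return the 4x4x4x4 tensor P[s1, s2, s3, s4].

    Index letters: r = root state, v = state at node 6, u = state at node 5,
    a, b, c, d = states at leaves 1, 2, 3, 4.
    """
    return np.einsum("r,ra,rv,vu,ub,uc,vd->abcd",
                     pi, M1, M6, M5, M2, M3, M4)

# Illustrative parameters (assumed values, not from the talk).
pi = np.full(4, 0.25)
M1, M2, M3, M4, M5, M6 = (jc(b) for b in (0.02, 0.05, 0.03, 0.04, 0.01, 0.06))

P = joint_distribution(pi, M1, M2, M3, M4, M5, M6)
p_ATCC = P[tuple(STATES.index(s) for s in "ATCC")]

if __name__ == "__main__":
    print("sum of all entries:", P.sum())
    print("p_{A,T,C,C} =", p_ATCC)
```

Each entry of `P` is a polynomial in the entries of `pi` and the six matrices, which is the property the algebraic approach relies on.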


The entries of the joint distribution at the leaves, $p^T = (p^T_{s_1 \ldots s_n})_{s_1, \ldots, s_n}$, can be expressed as polynomials in terms of the parameters of the model.

We can estimate $p^T$ easily (by the relative frequencies in an alignment) but NOT the parameters.
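A minimal sketch of the estimation by relative frequencies, using the four-species alignment shown earlier in the talk. The function name `empirical_distribution` and the choice to store the result as a 4x4x4x4 array are assumptions made for illustration.

```python
from collections import Counter

import numpy as np

STATES = "ACGT"

def empirical_distribution(seqs):
    """Relative frequency of each column pattern in an alignment.

    seqs: list of equal-length strings over A, C, G, T (one per species).
    Returns an array with one axis per species, indexed in STATES order.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    counts = Counter(zip(*seqs))  # one tuple of states per alignment column
    p = np.zeros((4,) * len(seqs))
    for column, c in counts.items():
        p[tuple(STATES.index(x) for x in column)] = c / length
    return p

alignment = {
    "Gorilla":    "AACTTCGAGGCTTACCGCTG",
    "Human":      "AACGTCTATGCTCACCGATG",
    "Chimpanzee": "AAGGTCGATGCTCACCGATG",
    "Orangutan":  "ATTGTCGCAACTCGTCGACG",
}
p_hat = empirical_distribution(list(alignment.values()))

if __name__ == "__main__":
    print("number of observed patterns:", int((p_hat > 0).sum()))
    print("p_hat[C,C,C,C] =", p_hat[1, 1, 1, 1])
```

With only 20 sites most of the 256 entries are zero; real analyses use much longer alignments.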


Phylogenetic variety

Definition

For a fixed tree $T$, model $M$ and position of the root $r$, we use $\varphi_T$ to denote the parametrization map,

$$\varphi_T : \mathbb{R}^d \longrightarrow \mathbb{R}^{4^n}$$

$$(\pi, \{M_e\}_{e \in E(T)}) \mapsto P = (p_{x_1,x_1,\ldots,x_1}, p_{x_1,x_1,\ldots,x_2}, \ldots, p_{x_n,x_n,\ldots,x_n})$$

The phylogenetic algebraic variety associated to a tree $T$ and a model $M$ is the Zariski closure of the image,

$$V_T = \overline{\operatorname{Im} \varphi_T}.$$

$I_T = I(V_T)$ is the phylogenetic ideal of $T$ and $M$.

Polynomials $f \in I_T$ are called phylogenetic invariants of $T$.

Polynomials $f \in I_T$ with $f \notin I_{T'}$ for some $T' \neq T$ are the topology invariants of $T$.


Using algebraic varieties in phylogenetics

An alignment produces a point $p = (p_{AA\ldots A}, p_{AA\ldots C}, \ldots, p_{TT\ldots T})$ in $\mathbb{R}^{4^n}$.

$p$ should be close to $V_{T_0}$ (if the tree $T_0$ and model $M$ fit the data).

Tree topology reconstruction using algebraic geometry. For each possible topology $T$, evaluate elements of $I(V_T)$ at $p$: the polynomials of $I(V_{T_0})$ should be $\approx 0$ when evaluated at $p$.


Problem: computation of invariants

Computational algebra software fails to compute the ideal for ≥ 4 species!

For example, the Kimura 3-parameter model with 4 species gives a toric variety whose ideal has 8002 generators.


Flattening

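$$\mathrm{Flatt}_{12|34}(P) = \begin{pmatrix}
p_{AAAA} & p_{AAAC} & p_{AAAG} & \cdots & p_{AATT} \\
p_{ACAA} & p_{ACAC} & p_{ACAG} & \cdots & p_{ACTT} \\
p_{AGAA} & p_{AGAC} & p_{AGAG} & \cdots & p_{AGTT} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_{TTAA} & p_{TTAC} & p_{TTAG} & \cdots & p_{TTTT}
\end{pmatrix}$$

Rows are indexed by the states AA, AC, AG, ..., TT at leaves 1 and 2; columns are indexed by the states AA, AC, AG, ..., TT at leaves 3 and 4.

Theorem [Allman – Rhodes]

Let $P = \varphi_T(\pi, \{M_e\}_{e \in E(T)})$ where $T = T_{12|34}$. Then

$$\operatorname{rank}(\mathrm{Flatt}_{12|34}(P)) \le 4.$$

$\mathrm{Flatt}_{13|24}(P)$ and $\mathrm{Flatt}_{14|23}(P)$ have rank 16 for generic $P$.

Therefore the $5 \times 5$ minors of $\mathrm{Flatt}_{12|34}(P)$ are topology invariants.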
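The rank statement can be checked numerically on one randomly chosen point of $V_{T_{12|34}}$. This is a sketch under assumptions: the parameters are arbitrary general Markov matrices made diagonally dominant for numerical stability, the root is placed at the internal node adjacent to leaves 1 and 2, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_transition_matrix(rng, weight=0.7):
    """Hypothetical helper: diagonally dominant random stochastic 4x4 matrix."""
    return weight * np.eye(4) + (1.0 - weight) * rng.dirichlet(np.ones(4), size=4)

def quartet_tensor(pi, M1, M2, M3, M4, M5):
    """Joint distribution on T_{12|34}; u, v are the two internal nodes.

    M1, M2 lead from u to leaves 1, 2; M5 is the internal edge u -> v;
    M3, M4 lead from v to leaves 3, 4.
    """
    return np.einsum("u,ua,ub,uv,vc,vd->abcd", pi, M1, M2, M5, M3, M4)

AXES = {"12|34": (0, 1, 2, 3), "13|24": (0, 2, 1, 3), "14|23": (0, 3, 1, 2)}

def flattening(P, split):
    """16x16 matrix whose rows/columns are indexed by the two sides of the split."""
    return P.transpose(AXES[split]).reshape(16, 16)

pi = np.array([0.1, 0.2, 0.3, 0.4])
mats = [random_transition_matrix(rng) for _ in range(5)]
P = quartet_tensor(pi, *mats)

# Numerical rank with an explicit tolerance on the singular values.
ranks = {s: int(np.linalg.matrix_rank(flattening(P, s), tol=1e-10)) for s in AXES}

if __name__ == "__main__":
    print(ranks)
```

On empirical data the flattening is never exactly of rank 4, which is why the methods that follow measure a distance to the rank-4 matrices instead of testing the rank directly.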


Algebraic phylogenetic reconstruction methods

The distance of an $m \times n$ matrix $M$ to the set

$$R_k = \{m \times n \text{ matrices of rank} \le k\}$$

can be computed easily using the following result.

Eckart-Young Theorem

$$d_k(M) = d_F(M, R_k) = \sqrt{\sum_{i \ge k+1} \sigma_i^2},$$

where $\sigma_i$ are the singular values of $M$ and $d_F$ is the distance in the Frobenius norm.

Phylogenetic reconstruction methods

Compute $d_4(\mathrm{Flatt}_{A|B}(P))$ for the three possible bipartitions. The lower the score, the more likely it is that the bipartition comes from an edge of $T$.
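The scoring step above can be sketched with a singular value decomposition. The names `dist_to_rank_k` and `flattening_scores` are illustrative, and the input tensor is a toy example built so that its 12|34 flattening has rank 1; it is not data from the talk.

```python
import numpy as np

def dist_to_rank_k(M, k):
    """Frobenius distance from M to the matrices of rank <= k (Eckart-Young)."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return float(np.sqrt(np.sum(s[k:] ** 2)))

SPLITS = {"12|34": (0, 1, 2, 3), "13|24": (0, 2, 1, 3), "14|23": (0, 3, 1, 2)}

def flattening_scores(P, k=4):
    """d_k of each of the three flattenings of a 4x4x4x4 tensor P."""
    return {name: dist_to_rank_k(P.transpose(axes).reshape(16, 16), k)
            for name, axes in SPLITS.items()}

# Toy tensor: P[a,b,c,d] = A[a,b] * B[c,d], so leaves {1,2} and {3,4}
# are independent and the 12|34 flattening has rank 1.
rng = np.random.default_rng(1)
A = rng.random((4, 4))
A /= A.sum()
B = rng.random((4, 4))
B /= B.sum()
P = np.einsum("ab,cd->abcd", A, B)

scores = flattening_scores(P)
best = min(scores, key=scores.get)

if __name__ == "__main__":
    for name, value in scores.items():
        print(name, value)
    print("lowest score:", best)
```

In practice `P` would be the vector of relative pattern frequencies from an alignment, reshaped to 4x4x4x4.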


Stochastic phylogenetic regions

Definition

The stochastic phylogenetic region, defined as

$$V_T^+ = \{P \in V_T \mid P = \varphi_T(s) \text{ and } s \in S \subset [0, 1]^d\},$$

is the subset of $V_T$ that contains distributions arising from stochastic parameters.

Stochastic Parameters

A vector $\pi$ is stochastic iff its entries are non-negative and $\sum_i \pi_i = 1$.

A matrix $M_e$ is stochastic iff its entries are non-negative and $\sum_j M_e(i, j) = 1$, $\forall i, e$.


Could the stochastic varieties be useful for phylogenetic reconstruction?


Computing the distance to a Phylogenetic variety

Let $P = (p_1, \ldots, p_{4^n}) \in \Delta^{4^n - 1}$ be a distribution. We want to compute the distance of $P$ to $V_T^+$,

$$d(P, V_T^+) = \min_{Q \in V_T^+} d(P, Q).$$

Since $Q \in V_T^+$, we can write $Q = \varphi_T(x)$ with stochastic parameters $x \in \mathbb{R}^d$.

Denote by $\Omega \subset \mathbb{R}^d$ the domain of stochastic parameters. Let

$$f_P(x) := d(P, \varphi_T(x))^2 = \sum_{i=1}^{4^n} (p_i - \varphi_i(x))^2.$$

If $P^+ = \varphi_T(x^*) \in V_T^+$ is such that $d(P, P^+) = d(P, V_T^+)$, then either

$(P - P^+) \perp T_{P^+} V_T$, i.e. $x^*$ is a critical point of $f_P(x)$, or

$x^*$ is not a critical point of $f_P(x)$ and $x^* \in \partial\Omega$.


Long branch attraction for JC model

Let $P = \varphi_{12|34}(M, Id, M, Id, M_e)$.

Proposition [Casanellas – Fernandez-Sanchez – G-L]

If $M_e$ has negative off-diagonal entries and $M$ is stochastic, then $P^+ = \varphi_{12|34}(M, Id, M, Id, Id)$ is a local minimum of the distance function $d(P, V_T^+)$.

Conjecture: Global minimum

$$d(P, V_T^+) = d(P, P^+).$$

Theorem [Casanellas – Fernandez-Sanchez – G-L]

Let $P_0 = \varphi_{12|34}(M, Id, M, Id, M_e)$ be such that $d(P_0, V_T^+) = d(P_0, P^+)$. Then, for any $P$ close enough to $P_0$, we have

$$d(P, V_T^+) \ge d(P, V_{T_2}^+).$$


Computing the distance to a Phylogenetic variety

Lemma [Draisma – Horobet – Ottaviani – Sturmfels – Thomas]

For general $P \in \mathbb{C}^{4^n}$, the number of critical points of $f_P$ on the manifold $V \setminus V_{sing}$ is finite and is called the Euclidean Distance degree of $V$.

Computational difficulties

ED degree for the Jukes Cantor model on 4-leaf trees is 290.

> 2.5 months with Macaulay2.

≈ 2.5 hours with Magma.

Numerical Algebraic Geometry: only PHCpack finds the 290 solutions.

The computations were performed on a machine with 10 Dual Core Intel(R) Xeon(R) Silver 64 Processor 4114 (2.20 GHz, 13.75M Cache) equipped with 256 GB RAM running Ubuntu 18.04.2.

Algorithm

1. Compute the Euclidean distance degree $d$ for the variety $V_T$.

2. Compute the $d$ critical points $x$ such that $\nabla f(x) = 0$ and $x \in \Omega$.

3. Compute the critical points $\nabla f = 0$ at the boundaries $\partial\Omega$.

4. Choose the point with the lowest value when evaluated at $f$.
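The structure of these steps can be illustrated on a deliberately small toy problem, not on a phylogenetic variety: the curve $\varphi(x) = (x, x^2)$ with parameter domain $\Omega = [0, 1]$. Here $f_P'(x)$ is a cubic, so there are three complex critical points, and the boundary consists of the two endpoints. The function name `closest_point_on_arc` and the use of `numpy.roots` are choices made for this sketch; the talk's computations used Macaulay2, Magma and PHCpack.

```python
import numpy as np

def closest_point_on_arc(p1, p2, lo=0.0, hi=1.0):
    """Minimise f(x) = (p1 - x)^2 + (p2 - x^2)^2 over x in [lo, hi].

    Returns (x_star, f(x_star)).
    """
    def f(x):
        return (p1 - x) ** 2 + (p2 - x ** 2) ** 2

    # Step 2: all critical points, f'(x) = 4x^3 + (2 - 4*p2)x - 2*p1 = 0,
    # keeping only the real ones strictly inside the domain.
    roots = np.roots([4.0, 0.0, 2.0 - 4.0 * p2, -2.0 * p1])
    interior = [float(r.real) for r in roots
                if abs(r.imag) < 1e-9 and lo < r.real < hi]

    # Step 3: the boundary of an interval is just its two endpoints.
    candidates = interior + [lo, hi]

    # Step 4: keep the candidate with the lowest value of f.
    x_star = min(candidates, key=f)
    return float(x_star), float(f(x_star))

if __name__ == "__main__":
    # P on the arc: the minimiser is an interior critical point.
    print(closest_point_on_arc(0.5, 0.25))
    # P = (2, 4) lies on the parabola but outside the arc: the only real
    # critical points fall outside [0, 1], so the minimum is on the boundary.
    print(closest_point_on_arc(2.0, 4.0))
```

The second call shows the situation described in the talk: the closest point of the full variety corresponds to parameters outside $\Omega$, so the minimum over the restricted region is attained on $\partial\Omega$.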


Simulations

We took trees with branch lengths a and b at the exterior edges. M is a JC matrix with eigenvalue m ∈ [0.94, 1.06].

For each set of parameters we considered 100 data points, each corresponding to 10000 independent samples from the corresponding multinomial distribution.
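
A minimal sketch of the sampling step, assuming the theoretical distribution is available as a probability vector of length 256 (one entry per site pattern on 4 leaves). The vector `p` below is a random placeholder, not a distribution from the JC model, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, for reproducibility only

# Placeholder for the theoretical distribution of the 4^4 = 256 site patterns.
p = rng.dirichlet(np.ones(256))

n_samples = 10_000                       # independent samples per data point
n_points = 100                           # data points per parameter set

# Each row holds the pattern counts of one simulated data point.
counts = rng.multinomial(n_samples, p, size=n_points)

# Empirical distributions: relative frequencies of the 256 patterns.
data_points = counts / n_samples
```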

Simulations a = 0.5 & b = 0.5

Simulations a = 0.75 & b = 0.1

Stochastic conditions for the General Markov Model

Theorem [Allman – Rhodes – Taylor]

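Let $P = \varphi_T(\pi, \{M_e\}_{e \in E(T)})$ be a 4-tensor that arises from nonsingular real parameters for the GM($\kappa$) model on $T_{12|34}$. If the marginalizations $P_{+\cdot\cdot\cdot}$ and $P_{\cdot\cdot\cdot+}$ arise from stochastic parameters and, moreover, the $\kappa^2 \times \kappa^2$ matrix

$$\mathrm{Flatt}_{13|24}\Big( P *_2 \big(\mathrm{adj}(P^T_{+\cdot\cdot+})\, P^T_{\cdot+\cdot+}\big) *_3 \big(\mathrm{adj}(P_{\cdot+\cdot+})\, P_{\cdot++\cdot}\big) \Big)$$

is positive semidefinite, then $P$ arises from stochastic parameters.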
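
The ingredients of this statement (a 4-tensor from tree parameters, its marginalizations, and its flattenings) can be sketched numerically. The block below makes several assumptions: κ = 4, the tree is rooted at the interior node next to leaves 1 and 2, and the parameters are random. The helpers `random_markov_matrix` and `flattening` are hypothetical names. The block does not build the transformed matrix of the theorem, since the convention for the products $*_2$ and $*_3$ is not spelled out here.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4  # number of states (kappa), e.g. the four nucleotides

def random_markov_matrix(eps=0.15):
    """Row-stochastic k x k matrix close to the identity (illustrative helper)."""
    R = rng.dirichlet(np.ones(k), size=k)
    return (1 - eps) * np.eye(k) + eps * R

pi = rng.dirichlet(np.ones(k))  # root distribution
M1, M2, M3, M4, M5 = (random_markov_matrix() for _ in range(5))

# Joint distribution at the leaves of 12|34; M5 sits on the interior edge.
P = np.einsum("r,ri,rj,rs,sk,sl->ijkl", pi, M1, M2, M5, M3, M4)

# Marginalizations: a "+" means that the leaf in that position is summed out.
P_sum14 = P.sum(axis=(0, 3))  # P_{+..+}, matrix indexed by leaves 2 and 3
P_sum24 = P.sum(axis=(1, 3))  # P_{.+.+}, matrix indexed by leaves 1 and 3
P_sum23 = P.sum(axis=(1, 2))  # P_{.++.}, matrix indexed by leaves 1 and 4

def flattening(P, rows, cols):
    """k^2 x k^2 matrix with rows indexed by `rows` leaves, columns by `cols`."""
    return P.transpose(*rows, *cols).reshape(k * k, k * k)

flat_12_34 = flattening(P, (0, 1), (2, 3))
flat_13_24 = flattening(P, (0, 2), (1, 3))

rank_12_34 = np.linalg.matrix_rank(flat_12_34, tol=1e-10)  # at most k
rank_13_24 = np.linalg.matrix_rank(flat_13_24, tol=1e-10)  # larger for generic parameters
```

The flattening along the true split 12|34 factors through the k states of the interior node, which is why its rank cannot exceed k.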

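Theorem [Casanellas – Fernandez-Sanchez – G-L]

Let $P = \varphi_T(\pi, \{M_e\}_{e \in E(T)})$ be a 4-tensor for the GM($\kappa$) model on $T_{12|34}$. Let $\tilde{P}$ be constructed as in the previous theorem. Then,

$$\mathrm{Flatt}_{13|24}(\tilde{P}) = \mathrm{Flatt}_{14|23}(\tilde{P}),$$

and $\mathrm{Flatt}_{12|34}(\tilde{P}) \neq \mathrm{Flatt}_{13|24}(\tilde{P})$.

In particular,

$$\det(P_{+\cdot\cdot+})\det(P_{\cdot+\cdot+})\,\mathrm{Flatt}_{13|24}\Big( \big(P *_2 (\mathrm{adj}(P^T_{+\cdot\cdot+})\, P^T_{\cdot+\cdot+})\big) *_3 \big(\mathrm{adj}(P_{\cdot+\cdot+})\, P_{\cdot++\cdot}\big) \Big)$$

$$= \det(P_{+\cdot\cdot+})\det(P_{\cdot+\cdot+})\,\mathrm{Flatt}_{14|23}\Big( \big(P *_2 (\mathrm{adj}(P^T_{+\cdot\cdot+})\, P^T_{\cdot+\cdot+})\big) *_3 \big(\mathrm{adj}(P_{\cdot+\cdot+})\, P_{\cdot++\cdot}\big) \Big)$$

gives rise to 256 topology invariants of degree 17.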
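
One way to account for the numbers 256 and 17, assuming κ = 4 and using that every entry of a marginal matrix is a linear form in the entries of $P$. This count is a sketch added for orientation and is not taken from the slides:

```latex
\begin{align*}
\#\,\text{equations} &= \kappa^2 \cdot \kappa^2 = 16 \cdot 16 = 256
  && \text{(one per entry of the flattening)} \\
\deg \det(P_{+\cdot\cdot+}) = \deg \det(P_{\cdot+\cdot+}) &= \kappa = 4
  && \text{(determinant of a $4 \times 4$ matrix)} \\
\deg \operatorname{adj}(\,\cdot\,) &= \kappa - 1 = 3
  && \text{(entries are $3 \times 3$ minors)} \\
\deg\big(P *_2 (\operatorname{adj}(P^T_{+\cdot\cdot+}) P^T_{\cdot+\cdot+})
  *_3 (\operatorname{adj}(P_{\cdot+\cdot+}) P_{\cdot++\cdot})\big) &= 1 + (3 + 1) + (3 + 1) = 9 \\
\text{total degree} &= 4 + 4 + 9 = 17
\end{align*}
```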

T Leaf-transformations

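[Diagram: the topologies $T_{12|34}$, $T_{13|24}$ and $T_{14|23}$, each shown with its leaf-transformations $\alpha^{12}_i(P_{12})$, $\alpha^{13}_i(P_{13})$ and $\alpha^{14}_i(P_{14})$.]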

12|34 Leaf-transformations

Resulting trees associated with the 12|34 leaf-transformations on the (theoretical) distribution from T

Original tree T

13|24 Leaf-transformations

Resulting trees associated with some 13|24 leaf-transformations on the (theoretical) distribution from T

Original tree T

Leaf-transformations on distributions of T = 12|34

(✓: the condition holds; ✗: the condition fails)

Transformation       Flattening                                    rank ≤ 4   PSD
$\alpha^{12}_i(P)$   $\mathrm{Flatt}_{12|34}(\alpha^{12}_i(P))$    ✓          ✗
$\alpha^{12}_i(P)$   $\mathrm{Flatt}_{13|24}(\alpha^{12}_i(P))$    ✗          ✓
$\alpha^{12}_i(P)$   $\mathrm{Flatt}_{14|23}(\alpha^{12}_i(P))$    ✗          ✓
$\alpha^{13}_i(P)$   $\mathrm{Flatt}_{13|24}(\alpha^{13}_i(P))$    ✗          ✗
$\alpha^{13}_i(P)$   $\mathrm{Flatt}_{12|34}(\alpha^{13}_i(P))$    ✓          ✗
$\alpha^{13}_i(P)$   $\mathrm{Flatt}_{14|32}(\alpha^{13}_i(P))$    ✗          ✗
$\alpha^{14}_i(P)$   $\mathrm{Flatt}_{14|23}(\alpha^{14}_i(P))$    ✗          ✗
$\alpha^{14}_i(P)$   $\mathrm{Flatt}_{12|43}(\alpha^{14}_i(P))$    ✓          ✗
$\alpha^{14}_i(P)$   $\mathrm{Flatt}_{13|42}(\alpha^{14}_i(P))$    ✗          ✗

SAQ: semi-algebraic quartet reconstruction method

Theorem [Casanellas – Fernandez-Sanchez – G-L]

The rank of the psd approximation of a real matrix M is less than or equal to rank(M).
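
A minimal sketch of a psd approximation, assuming it means the nearest positive semidefinite matrix in Frobenius norm: symmetrize M, then set the negative eigenvalues of the symmetric part to zero. This definition and the function name `psd_approximation` are assumptions made for illustration; the talk may define the approximation elsewhere.

```python
import numpy as np

def psd_approximation(M):
    """Nearest PSD matrix to M in Frobenius norm (assumed definition)."""
    sym = (M + M.T) / 2.0                    # symmetric part of M
    eigvals, eigvecs = np.linalg.eigh(sym)   # real spectrum, orthonormal eigenvectors
    clipped = np.clip(eigvals, 0.0, None)    # drop the negative eigenvalues
    return (eigvecs * clipped) @ eigvecs.T   # Q diag(clipped) Q^T

A = np.diag([2.0, -1.0, 0.5])                # symmetric, rank 3
B = np.array([[0.0, 1.0], [0.0, 0.0]])       # non-symmetric, rank 1

psd_A = psd_approximation(A)
psd_B = psd_approximation(B)

rank_A, rank_psd_A = np.linalg.matrix_rank(A), np.linalg.matrix_rank(psd_A)
rank_B, rank_psd_B = np.linalg.matrix_rank(B), np.linalg.matrix_rank(psd_B)
```

In both examples the rank of the approximation does not exceed the rank of the input, in line with the statement above.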

Lemma [Casanellas – Fernandez-Sanchez – G-L]

Let P be the theoretical distribution from a 3-parameter Kimura process on the quartet tree T = 12|34. Then the rank of the psd approximation of the flattening matrix $\mathrm{Flat}_{T'}(\alpha^{T'}(P))$ is greater than 4 for $T' \neq T$.

SAQ method

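Let P be a data point obtained from an alignment. Then the score for T = 12|34 is

$$s^i_{12|34} := \frac{\min\Big\{ \delta_4\big(\mathrm{psd}(\mathrm{Flatt}_{13|24}(\alpha^{12}_i(P)))\big),\ \delta_4\big(\mathrm{psd}(\mathrm{Flatt}_{14|23}(\alpha^{12}_i(P)))\big) \Big\}}{\delta_4\big(\mathrm{psd}(\mathrm{Flatt}_{12|34}(\alpha^{12}_i(P)))\big)}$$

and $s_{12|34} := \mathrm{mean}_i\, s^i_{12|34}$.

$$\mathrm{SAQ}(P) := \frac{1}{s_{12|34}(P) + s_{13|24}(P) + s_{14|23}(P)} \big(s_{12|34}(P),\ s_{13|24}(P),\ s_{14|23}(P)\big).$$

If $Q \in \mathbb{R}^{256}$ is a distribution that tends to $P$ generated on the tree 12|34 with generic stochastic parameters, then

$$\lim_{Q \to P} \mathrm{SAQ}(Q) = \mathrm{SAQ}(P) = (1, 0, 0).$$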
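
A minimal sketch of the arithmetic in the score, assuming that $\delta_4(M)$ is the Frobenius distance from M to the set of matrices of rank at most 4 (computed from the trailing singular values, by the Eckart-Young theorem). The inputs are taken to be the already transformed and psd-approximated flattenings; the leaf-transformations themselves are not implemented. All function names are hypothetical.

```python
import numpy as np

def delta_k(M, k=4):
    """Frobenius distance from M to the matrices of rank <= k (assumed meaning of delta_k)."""
    s = np.linalg.svd(M, compute_uv=False)   # singular values, in descending order
    return float(np.sqrt(np.sum(s[k:] ** 2)))

def topology_score(wrong_a, wrong_b, right, k=4):
    """min of the two 'wrong split' distances divided by the 'own split' distance.

    Raises ZeroDivisionError if `right` has rank <= k exactly, which is the
    limiting case described in the text.
    """
    return min(delta_k(wrong_a, k), delta_k(wrong_b, k)) / delta_k(right, k)

def normalize_scores(scores):
    """Rescale the three topology scores so that they sum to 1."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum()

D = np.diag([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])
d4 = delta_k(D)                                  # sqrt(1^2 + 0.5^2)
weights = normalize_scores([3.0, 1.0, 0.0])
```

For a distribution that lies exactly on the model, the denominator vanishes, so the value (1, 0, 0) is reached only as a limit.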

Simulations: Tree Space

[Figure: tree-space plots, panels a) and b); both axes run from 0.0 to 1.5.]

[Figure: two tree-space plots with 33% and 95% contour lines; both axes run from 0.0 to 1.4, and one axis is labelled "parameter b". Panel "GM; length 500 bp": mean = 0.846, sd = 0.22. Panel "GM; length 1 000 bp": mean = 0.888, sd = 0.207.]

base pairs   SAQ    Erik+2   NJ     ML
500          84.6   72.4     72.5   72.1
1 000        88.8   80.3     79.7   73.6

Simulations: Random branch lengths

A total of 10 000 alignments are considered, obtained from 4-taxa trees with random branch lengths uniformly distributed in the interval (0,1), and generated according to the General Markov substitution model.

Simulations: Mixture models

internal branch length   0.01   0.05   0.1   0.2   0.3
SAQ                      37     83     96    100   100
Erik+2 (2)               12     35     60    86    96
MP                       0      2      19    76    99
ML (GTR+2 Γ)             0      4      14    77    95

Thanks for your attention!
