Lecture 10 - Phylogeny and morphometrics


Page 1: Lecture 10 - Phylogeny and morphometrics

Department of Geological Sciences | Indiana University (c) 2012, P. David Polly

G562 Geometric Morphometrics

Cottonwood tree (Populus deltoides), New Harmony, Indiana

Hierarchical patterns in morphometric data

Phylogeny, trees and morphospace

[Figure: phylogenetic tree with tips WALLABY, HUMAN, LEOPARD, FOSSA, DOG and OTTER, internal nodes Node 0 to Node 4, and a scale from 0 to 80]


[Figure: examples labelled Apodemus, Marmota and Spermophilus]

Most phenotypic data have a phylogenetic component

Evolution - descent with modification (and diversification)

Consequence: organisms that share a common ancestor are expected to share similarities that are not shared with distantly related organisms. In such a situation, evolution introduces a hierarchical structure to morphometric data.

Thus: whenever three or more OTUs* are involved – regardless of whether they are populations, stratigraphic samples, species, genera, families, or whatever – their phylogenetic links introduce some degree of autocorrelation between the more closely related taxa.

* OTU – Operational Taxonomic Unit, shorthand for “group in question”


Approaches to hierarchy in morphometric data

1. Building trees from morphometric data to show hierarchical similarity (hierarchical clustering)

2. Finding groupings in morphometric data (non-hierarchical clustering)

3. Mapping morphometric data onto hierarchical structure derived from an independent source (e.g., phylogenetic tree)

4. Using phylogenetic statistical methods to account for (or remove) effects of hierarchy in statistical tests


Controversies about phylogenetic signal in morphometric data

Some researchers, usually parsimony-based phylogeneticists, have argued that morphometric data do not have “phylogenetic signal” because they measure “overall similarity”

Others have argued that morphometric data do not have phylogenetic signal because they are mostly “adaptive”

And yet others argue that morphometric data do not have phylogenetic signal because they are mostly “non-genetic”

And a few have argued that morphometric data do not have phylogenetic signal because they are merely morphological, not molecular....


Examples of evidence offered against “phylogenetic signal”

1. A UPGMA tree whose topology does not agree with the author’s conception of phylogeny;

2. Correlation of morphometric data with a factor such as diet;

3. A two-dimensional PCA plot whose pattern of scatter does not appear to reflect phylogenetic relationships;

4. A parsimony tree based on gap-coded morphometric data that does not correspond to the author’s conception of phylogeny;

5. A morphological tree that does not correspond to a molecular tree.


Under what conditions would morphometric data not have a phylogenetic component?

1. If the phenotypes are non-genetic, entirely environmentally plastic responses to local conditions met by an organism during its lifetime

2. If morphometric variation is non-existent

3. If morphometric variation is entirely due to measurement error

4. If species were specially created and have no evolutionary history

5. If phylogenetic history is completely erased by other factors, such as homoplasy due to parallel functional adaptations in different clades


The issues at stake:

1. Can phylogenetic relationships be reconstructed from morphometric data?

2. Do morphometric patterns reflect adaptation (e.g., diet) or phylogeny or something else?

3. Are observations based on morphometrics related to evolution or chance or environmental plasticity?


The scientific solution:

1. Measure the contribution of potential factors; don’t assume that one factor is or is not important

2. Evaluate with hypothesis-driven tests for association between morphometric data and all relevant factors

3. Synthesize findings to describe the scope of each factor for explaining morphometric variation (i.e., under what conditions is the factor likely to be important and under what conditions is it not)

How much of morphometric variation can be explained by phylogenetic history? Under what circumstances can phylogenetic history be recovered from morphometric data? With what accuracy can phylogenetic history be recovered from morphometric data? When does phylogenetic history interfere with recovering other relationships?


[Diagram: factors contributing to morphometric data: phylogenetic history, functional role, environmental interactions, measurement error, sample choice, unconsidered factors]

“Factor thinking” in morphometrics

Many factors, not just one, contribute to morphometric similarities and differences (variance).

The question is often not whether a factor does or does not contribute, but rather how much it contributes.

MANOVA and regression are two methods for partitioning variance among factors.

R² is one metric for measuring the association of a factor with the total variance.
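As a rough illustration of R² as a measure of association, the following Python sketch regresses a single shape score on a single factor and reports the proportion of variance explained. The data values and the function name r_squared are hypothetical and are not taken from the lecture; real shape data are multivariate and would need MANOVA or multivariate regression.

```python
def r_squared(x, y):
    """Proportion of variance in y associated with x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # For one predictor, R^2 is the squared correlation coefficient
    return (sxy ** 2) / (sxx * syy)


# Hypothetical data: a factor (e.g. body size) and a shape score (e.g. PC 1)
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
shape_score = [0.11, 0.18, 0.33, 0.37, 0.52]

r2 = r_squared(factor, shape_score)
print(f"R^2 = {r2:.3f}; proportion unexplained = {1 - r2:.3f}")
```

The remainder, 1 - R², is the share of variance left to the other factors and to error.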


Empirical study of factors in morphometric variation

Caumul, R. and P. D. Polly. 2005. Phylogenetic and environmental components of morphological variation: skull, mandible, and molar shape in marmots (Marmota, Rodentia). Evolution, 59: 2460-2472.

Three phenotypic structures, each with different expectations for functional role, genetic vs. environmental contributions, and complexity, from the same populations and, thus, with the same phylogenetic history.

Caumul and Polly, 2005

[Figure: tree of the six marmot taxa listed below; scale bar 0.01; values 100.0, 83.0 and 88.0 marked on the tree]

1. M. marmota

2. M. caudata caudata

3. M. caudata aurea

4. M. baibacina

5. M. himalayana robusta

6. M. sibirica sibirica


Caumul and Polly, 2005

Path analysis (controlled multiple regression)

Path coefficients (square to get R²)

Proportion unexplained (no need to square)


Caumul and Polly, 2005

Proportion of morphometric variation explained by phylogeny and other factors


Caumul and Polly, 2005

Trees recovered from morphometric data

“Real” phylogeny

Phylo signal=7%: Poor recovery

Phylo signal=15%: Good recovery

Phylo signal=5%: Good recovery


What do we actually know about phylogenetic signal in morphometrics?

1. Morphometric variation is largely, but not entirely, heritable. Typical heritability studies put the value at 40%-70% heritable (the percentage of variation that is passed from parent to offspring), which is high for traits that are measured by geneticists.

2. Morphometric traits evolve quickly. Compared to the gain and loss of structures (i.e., cladistic state changes of the ideal type), the size and shape of structures change rapidly. (Something that ought to be obvious based on logic alone.)

3. Size and shape of homologous structures are often constrained by common ‘homologous’ functions. Once a structure has arisen, it often maintains a similar function throughout phylogenetic history (though there are notable exceptions). Thus the size and shape of that structure have functional constraints imposed on them. And morphometric comparisons are normally limited to structures that are found in all the OTUs being studied.

4. Traits have a window of time when they are likely to be phylogenetically informative. The window depends on the rate of evolution in the trait, the degree of functionality, and the degree of heritability. Within that window, phylogeny is likely to be recovered with a suitable methodology (e.g., Maximum-likelihood); outside that window, phylogeny is unlikely to be recovered.


Populations versus collections of populations

Variation within populations is largely free of phylogenetic effects and so is an appropriate system for measuring the relationship of shape to factors such as body size, latitude, etc.

Variation between populations (or other OTUs) is normally influenced both by phylogenetic history and adaptive selection. The two may be difficult to disentangle (indeed, the two are themselves related). Variation among species is an appropriate system for measuring disparity, adaptive similarity, etc.


Tree building: advantages

1. Trees summarize similarities and differences in shape across the whole shape (i.e., across all dimensions of morphospace) in a single, intuitive diagram

2. For biological data drawn from more than one species, the null assumption is that shape differences should be distributed in a tree-like hierarchy because of phylogeny

3. Morphometric trees can easily be compared to other trees (e.g., molecular phylogenies) using standard methods

4. Trees look impressive


Tree building: disadvantages

1. Your data might not have a natural hierarchical distribution

2. A tree necessarily distorts shape relations in order to force multivariate relationships into a single diagram

3. Trees forcefully represent complicated relationships and don’t necessarily encourage thoughtful exploration of data

4. Assessing statistical support for trees based on morphometric shape is complicated and in its infancy

5. May attract unwanted criticism because of controversies about phylogeny and morphometrics


Tree terminology

OTU (Operational Taxonomic Unit). This is a generic way of referring to whatever the things are on your tree (specimens, species means, handaxes, whatever).

Tree. A branching diagram that connects OTUs by their similarity. Many criteria can be used for constructing a tree. In all cases, an assumption in building a tree is that object differences are structured such that they form a hierarchically nested pattern.

Algorithm. The programming steps used to calculate the tree

Cladistic parsimony. A method for constructing trees for purposes of phylogeny reconstruction that requires traits to be categorized into ‘primitive’ and ‘derived’, hence an algorithm that cannot be applied to continuous quantitative data such as geometric shape.

Quantitative or continuous data. Data that are ‘measured’ and which can have a state equal to any real number. Geometric shape is an example of quantitative data.

Meristic or discrete data. Data in which a trait can have only a specific ‘state’, often coded using integers. Often this is the presence or absence of things like digits.

Phenetics. Tree building based on quantitative data. The term is normally used in a derogatory sense for comparison to cladistic parsimony because quantitative tree building methods do not formally divide variables into primitive and derived states.


Tree terminology (cont.)

Pairwise distance. Quantitative ‘distance’ between two OTUs. In geometric morphometrics the pairwise distance is most logically the Procrustes distance, which is the same as a Euclidean distance. There are many kinds of ‘distances’ that can be calculated, however.
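A minimal Python sketch of a pairwise distance matrix, assuming each OTU is represented by a row of shape scores (for example PC scores of Procrustes-aligned shapes). The labels, scores and function names are hypothetical illustrations, not part of the course packages.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two score vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(scores, squared=False):
    """Symmetric matrix of pairwise (optionally squared) Euclidean distances."""
    n = len(scores)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = euclidean(scores[i], scores[j])
            d[i][j] = d[j][i] = dist ** 2 if squared else dist
    return d


# Hypothetical OTUs with two shape scores each
labels = ["A", "B", "C"]
scores = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

D = distance_matrix(scores)
D2 = distance_matrix(scores, squared=True)
for name, row in zip(labels, D):
    print(name, row)
```

The matrix keeps only one number per pair of OTUs, which is why distance methods cannot report which individual traits support a given branch.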

Patristic distance. The distance between two OTUs along the branches of the tree. This is usually different than the true pairwise distance because of compromises made in constructing the tree.

Exact algorithm. A tree building algorithm that follows a single train of steps to calculate a tree.

Optimizing algorithm. A tree building algorithm that finds the ‘best’ tree using a certain criterion. A more-or-less exhaustive search is made through all possible trees to find the one that best fits the criterion.

Statistical method. An optimizing algorithm that incorporates a probability model based on variances and statistical distributions. Maximum likelihood is such a method.

Maximum likelihood. A general statistical method for estimating parameters like regression lines. In this discussion it refers to a specific statistically-based tree algorithm.


Types of trees

Additive. The lengths of the branches on the tree correspond to change along the branch (i.e., to the shape distance between points on the tree). Neighbor-joining trees are an example of an additive tree.

Ultrametric tree. A tree constructed so that the branch lengths all end at the same distance from the root of the tree. This is a plausible requirement because many OTUs are taxa and have evolved from a common ancestor, all to the present day. However, the ‘true’ distance between OTUs may have to be distorted more to construct such a tree than with an additive tree. UPGMA trees are usually ultrametric.

Unnamed type of tree. One whose branches simply show the clustering relations of the OTUs, but don’t also show a ‘patristic distance’. Cladograms usually fall into this category.

[Figure: example of an additive tree and an ultrametric tree]


Approaches to tree construction

Distance matrix trees. For these trees, a pairwise distance matrix is first constructed. The algorithm fits the tree using those distances. A distance matrix is very closely related to a covariance matrix, but doesn’t keep track of individual traits (which is important to diagnose what supports different branches). Both the cluster and neighbor-joining methods are distance methods.

Trait based trees. A tree constructed from individual traits. Maximum likelihood is such a method. In theory, the value of each trait can be determined for each node. For geometric shape, this means that a shape can be constructed anywhere on the tree.

[Diagram: distance trees combine the trait values (Trait 1, Trait 2, Trait 3, Trait 4) of Taxon A and Taxon B into a single distance (D2); likelihood (or Bayesian) methods optimize across all traits individually.]


Types of distance trees

UPGMA: an ultrametric tree that can be calculated using any of several distances, as well as with different joining methods.

Neighbour joining: an additive tree that can be calculated using several distances.

Squared Euclidean distance. For shape data, the Euclidean distance is equivalent to Procrustes distance and is recommended, specifically a squared Euclidean distance (which corresponds to the theory of random evolution in quantitative traits).

UPGMA in Mathematica

<< HierarchicalClustering` (* load Mathematica package *)

tree = Agglomerate[scores[[1 ;;, 1 ;; 5]] -> labels, DistanceFunction -> SquaredEuclideanDistance, Linkage -> "Average"]

(* Agglomerate[] calculates the tree; "-> labels" causes your list of labels to be inserted into the tree; the DistanceFunction option selects the appropriate distance measure; the Linkage option specifies the rules for building the tree. "Average" linkage and SquaredEuclideanDistance give a UPGMA tree. *)

DendrogramPlot[tree, LeafLabels -> (# &), Orientation -> Bottom]

[Figure: resulting dendrogram with tips WALLABY, LEOPARD, OTTER, FOSSA, DOG, HUMAN]
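To show what average linkage does step by step, here is a small Python sketch of UPGMA on squared Euclidean distances. It is an illustration under stated assumptions, not the Agglomerate[] implementation: the function names, the toy one-dimensional scores, and the convention of recording each node height as half the distance between the merged clusters are all choices made for this example.

```python
from itertools import combinations


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def upgma(scores, labels):
    """Average-linkage clustering; returns nested tuples (left, right, height)."""
    # cluster id -> (subtree, number of tips); a tip's subtree is its label
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {frozenset((i, j)): squared_euclidean(scores[i], scores[j])
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = sorted(pair)
        height = dist[pair] / 2.0               # assumed height convention
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances
        merged = {k: (n_i * dist[frozenset((i, k))] +
                      n_j * dist[frozenset((j, k))]) / (n_i + n_j)
                  for k in clusters}
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        for k, d in merged.items():
            dist[frozenset((next_id, k))] = d
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical OTUs with a single shape score each
labels = ["A", "B", "C", "D"]
scores = [[0.0], [1.0], [4.0], [6.0]]
tree = upgma(scores, labels)
print(tree)
```

Because every merge places the new node at a single height above all of its tips, the resulting tree is ultrametric by construction.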


Trait-based trees: Maximum likelihood

ML optimizes the tree statistically, keeping all the traits separate rather than combining them into a single distance.

The ML algorithm finds the tree topology that maximizes the likelihood of the shape data in your sample having evolved given that tree and a Brownian motion model of evolution (i.e., no long-term directional selection and no strong stabilizing selection).

The algorithm combines the probabilities associated with each OTU to find a branching pattern that connects them, presuming that the probability distribution of each is centered on its own value and that the variance of the distribution is a portion of the variance among all the OTUs.


Programs for building and processing trees:

Phylogenetics for Mathematica package (reads trees, reconstructs ancestral states, simulates data on trees, independent contrasts, phylogenetic regression) http://mypage.iu.edu/~pdpolly/Software.html

Morphometrics for Mathematica package (trees into morphospace, reconstructs ancestral shape on trees, probabilities of ancestral shapes) http://mypage.iu.edu/~pdpolly/Software.html

PHYLIP (builds trees using several algorithms) http://evolution.genetics.washington.edu/phylip.html

Mesquite (lots of tree-based analyses and graphics) http://mesquiteproject.org/


Building an ML tree in PHYLIP

PHYLIP is a phylogenetics package written by Joseph Felsenstein and is available at http://evolution.genetics.washington.edu/phylip.html

1. Export scores in a comma delimited file with OTU names in the first column. If you use the following code, the labels will be properly formatted (10 characters max) and the number of OTUs and traits will be added to the first line of the file:

phyliplabels = MakePHYLIPLabels[labels];
phylipdata = Table[Prepend[scores[[x]], phyliplabels[[x]]], {x, Length[scores]}];
phylipdata = Prepend[phylipdata, {Length[phylipdata], Length[phylipdata[[1]]] - 1}];
Export["/Users/Data files/mammalscores.csv", phylipdata, "CSV"]

2. Edit the file in a text editor where you should replace all commas with spaces

3. Save the file with the name ‘infile’ in the PHYLIP folder.

4. Start the CONTML (continuous maximum likelihood) module by double-clicking it.

5. Choose options: ‘C’ for continuous traits, ‘J’ to jumble trees (enter any odd number as the random number seed).

6. Type ‘Y’ to start the program.

7. View results in ‘outfile’, which is a text file showing tree and statistics, and in ‘treefile’, which is a file that can be viewed and edited in TreeView.

8. Import the ‘treefile’ back into Mathematica with the ReadNewick[] function and display the tree with the DrawNewickTree[] function (both from the PollyPhylogenetics package)
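Steps 1 and 2 can also be done in one pass outside Mathematica. The Python sketch below formats scores as space-delimited text with a header line and 10-character names, which is the layout the steps above describe. The function name phylip_continuous and the score values are hypothetical, and the exact input requirements should be checked against the CONTML documentation for your PHYLIP version.

```python
def phylip_continuous(labels, scores):
    """Return space-delimited text: a header line, then one line per OTU."""
    n_otus = len(labels)
    n_traits = len(scores[0])
    lines = [f"{n_otus} {n_traits}"]          # header: number of OTUs and traits
    for name, row in zip(labels, scores):
        # Names are cut or padded to exactly 10 characters; names that are
        # identical in their first 10 characters would collide (not handled here)
        padded = name[:10].ljust(10)
        values = " ".join(f"{value:.6f}" for value in row)
        lines.append(f"{padded} {values}")
    return "\n".join(lines) + "\n"


# Hypothetical scores for three OTUs on two axes
labels = ["WALLABY", "HUMAN", "LEOPARD"]
scores = [[0.12, -0.03], [-0.20, 0.05], [0.08, -0.02]]

text = phylip_continuous(labels, scores)
print(text)
```

Writing text to a file named ‘infile’ in the PHYLIP folder would then replace the manual comma-to-space edit.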


Newick Trees

Common text format for storing trees. Named after Newick’s Lobster House in Dover, New Hampshire where Felsenstein, Rohlf, Maddison, Swofford and others had dinner as they discussed standards for phylogenetic software.

Format: branching pattern represented by parentheses, taxa separated by commas, tip names written as text, and each branch length written after a colon.

(WALLABY:80,(HUMAN:65,((LEOPARD:35,FOSSA:35):20,(DOG:35,OTTER:35):20):10):15);
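The format is simple enough to read with a few lines of code. The following Python sketch parses the example string above, sums branch lengths from the root to each tip, and computes a patristic distance between two tips. It is a minimal reader written for this illustration (no quoted names, comments, or error handling) and is not the ReadNewick[] function; all function names are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                               # skip "("
            children.append(read_node())
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1                               # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]                     # empty for unnamed nodes
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            start = pos + 1
            pos = start
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return read_node()


def root_paths(node, path=(), out=None):
    """Map each tip name to its branches, root to tip, as (branch id, length)."""
    out = {} if out is None else out
    path = path + ((id(node), node["length"]),)
    if not node["children"]:
        out[node["name"]] = path
    for child in node["children"]:
        root_paths(child, path, out)
    return out


def patristic(paths, a, b):
    """Sum the lengths of the branches that separate tips a and b."""
    unshared = set(paths[a]) ^ set(paths[b])
    return sum(length for _, length in unshared)


newick = "(WALLABY:80,(HUMAN:65,((LEOPARD:35,FOSSA:35):20,(DOG:35,OTTER:35):20):10):15);"
tree = parse_newick(newick)
paths = root_paths(tree)
depths = {tip: sum(length for _, length in p) for tip, p in paths.items()}
print(depths)
print(patristic(paths, "LEOPARD", "DOG"))
```

Every tip in this example lies the same distance from the root, so the tree is ultrametric.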

[Figure: the tree drawn from the Newick string above, with tips WALLABY, HUMAN, LEOPARD, FOSSA, DOG and OTTER, internal nodes Node 0 to Node 4, and a scale from 0 to 80]


Phylogenetic correlations

When statistically relating shape data to other variables (e.g., by regression or MANOVA), one must be careful of phylogenetic correlations when the data consist of different species or populations, some of which are more closely related and some of which are more distantly related.

Phylogeny makes such data non-independent because similarity in shape and the other variable may be due to common ancestry rather than direct causal association.

Use phylogenetic comparative statistics when you need to take phylogenetic correlations into account.
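One way to see the non-independence is to tabulate how much history each pair of tips shares. Under a Brownian motion model the expected covariance between two tips is proportional to the branch length they share between the root and their most recent common ancestor. The Python sketch below computes that matrix for the six-taxon example tree used in this lecture, with the tree transcribed by hand as nested tuples. The function names are hypothetical and this is not the PhylogeneticMatrices[] function.

```python
# (name, branch length, children); internal nodes are left unnamed
TREE = (None, 0.0, [
    ("WALLABY", 80.0, []),
    (None, 15.0, [
        ("HUMAN", 65.0, []),
        (None, 10.0, [
            (None, 20.0, [("LEOPARD", 35.0, []), ("FOSSA", 35.0, [])]),
            (None, 20.0, [("DOG", 35.0, []), ("OTTER", 35.0, [])]),
        ]),
    ]),
])


def tip_names(node):
    name, _, children = node
    if not children:
        return [name]
    return [tip for child in children for tip in tip_names(child)]


def phylo_covariance(tree):
    """Matrix of shared root-to-ancestor path lengths between all tips."""
    names = tip_names(tree)
    index = {name: i for i, name in enumerate(names)}
    C = [[0.0] * len(names) for _ in names]

    def walk(node, parent_depth):
        name, length, children = node
        depth = parent_depth + length
        if not children:
            C[index[name]][index[name]] = depth   # a tip shares all of its own history
            return [index[name]]
        groups = [walk(child, depth) for child in children]
        # tips in different child clades last shared history at this node
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for i in groups[a]:
                    for j in groups[b]:
                        C[i][j] = C[j][i] = depth
        return [i for group in groups for i in group]

    walk(tree, 0.0)
    return names, C


names, C = phylo_covariance(TREE)
for name, row in zip(names, C):
    print(f"{name:8s}", row)
```

Dividing each entry by the root-to-tip length (80 here) gives the expected correlation between tips under the same model.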


Phylogenetic comparative statistics

Phylogenetic independent contrasts (PIC). A simple method for removing effects of shared history from data. Invented by Joe Felsenstein. Elaborated by Andy Purvis. PIC transforms the original data into new data (contrasts) from which the phylogenetic effects have been removed. Perform regression or other statistical analyses on the contrasts to determine the non-phylogenetic correlation between data and a factor. In Mathematica: IndependentContrasts[trait, tree] (from Phylogenetics package)
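A compact Python sketch of the contrast calculation for one trait on a fully bifurcating tree, following the recursive procedure of Felsenstein (1985). The tree is the six-taxon example from this lecture transcribed as nested tuples; the trait values and the function name are hypothetical, and this is not the IndependentContrasts[] implementation.

```python
import math

# (name, branch length, children); internal nodes are left unnamed
TREE = (None, 0.0, [
    ("WALLABY", 80.0, []),
    (None, 15.0, [
        ("HUMAN", 65.0, []),
        (None, 10.0, [
            (None, 20.0, [("LEOPARD", 35.0, []), ("FOSSA", 35.0, [])]),
            (None, 20.0, [("DOG", 35.0, []), ("OTTER", 35.0, [])]),
        ]),
    ]),
])


def independent_contrasts(node, trait, out):
    """Return (value, adjusted branch length) for node; append contrasts to out."""
    name, length, children = node
    if not children:
        return trait[name], length
    # assumes exactly two descendants per internal node
    (x1, v1), (x2, v2) = (independent_contrasts(c, trait, out) for c in children)
    # standardized contrast: difference scaled by its expected standard deviation
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # node value is the average of the descendants weighted by 1 / branch length
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # lengthen the branch below the node to reflect uncertainty in that value
    return value, length + v1 * v2 / (v1 + v2)


# Hypothetical trait values for the six tips
trait = {"WALLABY": 2.0, "HUMAN": 4.0, "LEOPARD": 3.0,
         "FOSSA": 1.0, "DOG": 5.0, "OTTER": 3.0}

contrasts = []
root_value, _ = independent_contrasts(TREE, trait, contrasts)
print(contrasts)
print(root_value)
```

Six tips yield five contrasts. To relate two variables, compute contrasts for each on the same tree and regress one set on the other through the origin.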

Phylogenetic General Linear Models (PGLM). A more sophisticated method for assessing phylogenetic correlation and for mapping traits onto a tree. Developed by Martins and Hansen (1997). Can be used for the same purposes as PIC (e.g., regression), or can be used to reconstruct ancestral traits. In Mathematica: ReconstructNodes[tree, trait], PhylogeneticMatrices[tree] (Phylogenetics package), ReconstructAncestorShapes[proc, labels, tree], TreeToMorphospace[proc, labels, PCs, tree] (Morphometrics package)

Squared Change Parsimony. A cladist-friendly name for what is virtually the same thing as the maximum-likelihood version of PGLM. Developed by Maddison (1991). Can be used to reconstruct ancestral traits or to calculate evolutionary changes of the trait on a tree.

Phylogenetic Principal Components Analysis (pPCA). A method that produces a principal components space based on a covariance matrix from which phylogenetic covariance has been removed, then projects scores into that space (without removing phylogenetic components of their variance). Developed by Revell (2009), discussed in detail by Polly et al. (2013). In Mathematica: PhylogeneticPrincipalComponentsOfShape[proc, labels, PCs, tree] (Morphometrics package)


Morphospace trees in Mathematica

<<PollyMorphometrics`
<<PollyPhylogenetics`

tree = ReadNewick["/path/filename.txt"]

TreeToMorphospace[proc, labels, {1,2}, tree]

Tips:

1. labels must be in same order as Procrustes objects

2. labels and tree tip names must be identical


Phylogenetic tree projected into morphospace

[Figure: tree plotted in morphospace; horizontal axis PC 1 (-0.3 to 0.4), vertical axis PC 2 (-0.15 to 0.15); tips WALLABY, LEOPARD, HUMAN, OTTER, FOSSA and DOG; internal nodes Node 0 to Node 4]

• Ancestral shape scores reconstructed using maximum likelihood (assuming Brownian motion process of evolution)

• Ancestors plotted in morphospace

• Tree branches drawn to connect ancestors and nodes


Suggested Reading


Caumul, R. and P. D. Polly. 2005. Phylogenetic and environmental components of morphological variation: skull, mandible, and molar shape in marmots (Marmota, Rodentia). Evolution, 59: 2460-2472.

Felsenstein, J. 1985. Phylogenies and the comparative method. American Naturalist, 125: 1-15.

Klingenberg, C. P. and N. A. Gidaszewski. 2010. Testing and quantifying phylogenetic signals and homoplasy in morphometric data. Systematic Biology, 59: 245-261.

Maddison, W. P. 1991. Squared-change parsimony reconstructions of ancestral states for continuous-valued characters on a phylogenetic tree. Systematic Zoology, 40: 304-314.

Martins, E. P. and T. F. Hansen. 1997. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149: 646-667.

Revell, L. J. 2010. Phylogenetic signal and linear regression on species data. Methods in Ecology and Evolution, 1: 319-329.