Managing and analysing (next-generation) multivariate ecological data: new concepts and tools Steve C. Walker McMaster University Department of Mathematics and Statistics Bolker lab November 21, 2012 EEB seminar, McMaster University


Page 1: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Managing and analysing (next-generation) multivariate ecological data: new concepts and tools

Steve C. Walker

McMaster University, Department of Mathematics and Statistics

Bolker lab

November 21, 2012

EEB seminar, McMaster University

Page 2

Introduction
    Motivation from observational community ecology
    Illustrating the basic issue

Previous work on the data management-analysis interface, both inside and outside of community ecology
    The ‘old’ school
    The ‘middle’ school
    The ‘new’ school

The R multitable package

Thermocline deepening experiment

Page 3

Bythotrephes longimanus

Wisconsin Dept. of Natural Resources

Page 4

Bythotrephes longimanus

Yan et al. 2002

Page 5

Fourth-corner problem

[Figure: fourth-corner diagram with three data tables, abundance (sites × species), site properties (sites × properties), and species properties (species × properties), plus the unknown fourth corner relating species properties to site properties]

Legendre et al. 1997


Page 9

Statistical methods for analyzing ‘fourth-corner’-esque data

• Chessel et al. (1996): RLQ analysis
• Legendre et al. (1997): coined the term ‘fourth-corner’
• Ives and Godfray (2006): mixed models of phylogenetically-structured food webs
• Dray and Legendre (2008): extends Legendre et al.
• Pillar and Duarte (2010): phylogenetic null models
• Leibold et al. (2010): semi-partial correlations
• Ives and Helmus (2011): phylogenetic generalized linear mixed models (PGLMMs)
• ter Braak et al. (2012): multiple comparison tests

Page 10

The data frame

[Figure: a single table with replicates as rows and variables as columns]

Page 11

Thermocline manipulation experiment

[Figure: linked data tables for the experiment: abundance, traits, and env. vars., with dimensions labelled time, species, basin, and scales]

Cantin et al. (2011)



Page 18

How do we convert this into a data frame?

[Figure: the fourth-corner diagram again: abundance (sites × species), site properties (sites × properties), species properties (species × properties), and the fourth corner]

Page 19

Summarisation → lost information

[Figure: mean species properties (sites × properties) = abundance (sites × species) × species properties (species × properties)]

e.g. Leibold et al. 2010

Page 20

Summarisation → lost information

[Figure: one table with a row per site, holding mean species properties next to site properties]

e.g. Leibold et al. 2010

Page 21

Summarisation → lost information

[Figure: one table with a row per site, holding functional diversity indices next to site properties]

e.g. Leibold et al. 2010
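
The summarisation route on these slides can be sketched in a few lines of R. This is a minimal illustration with made-up numbers, not code from the talk: the object names `abundance`, `traits`, and `env` are assumptions, and the abundance-weighted mean is only one common way of summarising species properties per site.

```r
# Hypothetical toy data: 3 sites x 2 species, one trait, one site property.
abundance <- matrix(c(10,  0,
                       5,  5,
                       0, 20),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("site", 1:3), c("sp1", "sp2")))
traits <- matrix(c(1.0, 3.0), ncol = 1,
                 dimnames = list(c("sp1", "sp2"), "body_size"))
env <- data.frame(depth = c(2, 4, 8), row.names = rownames(abundance))

# Convert counts to relative abundances so each site's weights sum to one.
rel_abund <- abundance / rowSums(abundance)

# (sites x species) %*% (species x properties) = sites x mean species properties
mean_traits <- rel_abund %*% traits

# One row per site fits a single data frame, but species-level detail is gone.
site_frame <- cbind(env, mean_traits)
```

Two sites with very different species compositions can end up with the same mean trait value, which is the lost information the slide title refers to.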

Page 22

Repetition → redundant information

[Figure: one table with a row per species-site combination (species 1 to 3 by sites 1 to 6, 18 rows in total) and three column blocks: Abundance, Environment, Traits]
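
The repetition route can be sketched with base R's `merge()`. The data below are invented for illustration, and the names `long`, `env`, and `traits` are assumptions rather than anything shown in the talk.

```r
# Hypothetical toy data: 3 species x 6 sites, matching the layout on the slide.
sites   <- paste("site", 1:6)
species <- paste("species", 1:3)

# One row per species-site combination (18 rows).
long <- expand.grid(site = sites, species = species, stringsAsFactors = FALSE)
long$abundance <- seq_len(nrow(long))   # placeholder counts

# Site-level and species-level tables, each value stored once.
env    <- data.frame(site = sites, temperature = c(4, 6, 8, 10, 12, 14))
traits <- data.frame(species = species, body_size = c(0.5, 1.0, 2.0))

# Merging copies each environment value 3 times and each trait value 6 times.
full <- merge(merge(long, env, by = "site"), traits, by = "species")
```

Nothing is lost here, but every site property is repeated once per species and every trait once per site, which is the redundancy the slide title refers to.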


Page 26

When converting a fourth-corner problem into a single data frame you’ve got two choices:

• Summarisation → lost information
• Repetition → redundant information


Page 28

Linear algebra as data management

Ancient Chinese text (∼150 BCE)

Page 29

Linear algebra as data management

Hart (2009)

Page 31

Linear algebra as data management

Solve for the b’s

\begin{equation}
\begin{aligned}
y_1 &= b_1 x_{11} + b_2 x_{12} + \dots + b_m x_{1m} \\
y_2 &= b_1 x_{21} + b_2 x_{22} + \dots + b_m x_{2m} \\
    &\;\;\vdots \\
y_n &= b_1 x_{n1} + b_2 x_{n2} + \dots + b_m x_{nm}
\end{aligned}
\tag{1}
\end{equation}

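Page 32

Linear algebra as data management

\begin{gather}
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \dots  & x_{1m} \\
x_{21} & x_{22} & \dots  & x_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots  & x_{nm}
\end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \notag \\
\mathbf{y} = \mathbf{X}\mathbf{b} \notag \\
\mathbf{X}^{T}\mathbf{y} = \mathbf{X}^{T}\mathbf{X}\mathbf{b} \notag \\
(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} = \mathbf{b} \tag{2}
\end{gather}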
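
As a rough illustration of equation (2), the solution for b can be computed in R. The data here are simulated, so the names `b_true`, `b_hat`, and `b_qr` are assumptions; with noisy data the system has no exact solution and the same formula gives the least-squares estimate.

```r
set.seed(1)                            # simulated data, for illustration only
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))      # n x m design matrix, here m = 3
b_true <- c(0.5, 2, -1)
y <- drop(X %*% b_true) + rnorm(n, sd = 0.1)

# Solve (X^T X) b = X^T y; solve() avoids forming the inverse explicitly.
b_hat <- solve(crossprod(X), crossprod(X, y))

# The same least-squares solution from R's QR-based solver.
b_qr <- qr.solve(X, y)
```

Both routes return the same three coefficients up to rounding error; the matrix notation hides the bookkeeping of equation (1) entirely.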


Page 35

The importance of data management to science

• Good theories of data management (e.g. matrix algebra) allow us to think at a higher level of abstraction, thereby allowing us to focus on the interesting new parts of the problem (e.g. the meaning of Y, X, B).

• This is because the uninteresting old details (e.g. how to solve the linear equation) are automatically correct if the theory is correctly applied (e.g. because it has been previously learned).

• Therefore, we don’t need to actively think about such details until we step outside of the domain of the theory.



Page 47

Ihaka and Gentleman 1996

Page 48

The data frame

[Figure: a single table with replicates as rows and variables as columns]

Pages 49 to 52

The R framework for data management

[Figure, built up over four slides: data frame (replicates as rows; variables den, temp, precip) + formula (den ~ temp + precip) + function (lm / glmer / plot / xyplot) = analysis, shown as a plot of den against temp with p < 0.0001 and the coefficient table below]

          coef  s.e.
(intcpt)  -1.2   0.4
temp       2.1   0.1
precip    -0.1   0.1

Chambers and Hastie 1991

Page 53

> dataset
  den  temp precip
1 0.2  24.5   36.5
2 0.5 -26.4   36.0
3 0.8   4.9   15.5
4 1.5  12.2   34.8
5 0.6  18.7   99.3

Page 54

> dataset[1:2, ]
  den  temp precip
1 0.2  24.5   36.5
2 0.5 -26.4   36.0

Page 55

> lm(den ~ temp + precip, data = dataset)
Coefficients:
(Intercept)         temp       precip
   0.837385     0.001930    -0.002937
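
The three console excerpts above can be put together as one runnable sketch of the "data frame + formula + function" idea. The five rows are the ones printed on the slides; the object names `model_formula` and `fit` are assumptions added for illustration.

```r
# The five replicates shown on the slides.
dataset <- data.frame(
  den    = c(0.2, 0.5, 0.8, 1.5, 0.6),
  temp   = c(24.5, -26.4, 4.9, 12.2, 18.7),
  precip = c(36.5, 36.0, 15.5, 34.8, 99.3)
)

# The formula names the variables; the data frame supplies their values.
model_formula <- den ~ temp + precip

# A modelling function consumes both and returns the analysis.
fit <- lm(model_formula, data = dataset)
coef(fit)

# The formula also yields the design matrix X of the linear-algebra slides.
X <- model.matrix(model_formula, data = dataset)
```

Swapping `lm` for another formula-aware function leaves the data frame and the formula unchanged, which is the point of the framework.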

Page 56

The data frame

[Figure: a single table with replicates as rows and variables as columns]

Page 57

Fourth corner problem

[Figure: fourth-corner diagram: abundance (sites × species), site properties (sites × properties), species properties (species × properties), and the fourth corner]

Legendre et al. 1997

Page 58

Thermocline manipulation experiment

[Figure: linked data tables for the experiment: abundance, traits, and env. vars., with dimensions labelled time, species, basin, and scales]

Cantin et al. (2011)

Page 59

Goal: Analyze next-generation multiple-table data sets using this framework

Problem: R doesn’t do multiple tables ‘out-of-the-box’

Strategy: Develop some theory to better understand multiple-table data management, and then use that theory to extend the R framework to allow multiple-table data sets

data sources

data list

data frame + formula + function = analysis



Page 66

Rock star: Hadley Wickham

reshape2

plyr

ggplot2

...

Page 67

reshape2

[Figure: melting turns an abundance array (time × space × species) into a data frame (replicates × variables); casting turns the data frame back into the array]


Page 70

reshape2

> X
, , capybara

     midlatitude subtropical tropical equatorial arctic subarctic
2009           4           0        8          0      0         0
2008           0          10        0          7      0         0
1537           0           0        0          0      0         0

, , moss

     midlatitude subtropical tropical equatorial arctic subarctic
2009           0           0        9          0      5         0
2008           6           0        0          3      0         0
1537           0           0        0          0      0         0

, , vampire

     midlatitude subtropical tropical equatorial arctic subarctic
2009           0           0        0          0      0         0
2008           0           0        0          0      0         0
1537           0           1        0          0      0         0

Page 71

reshape2

> Xmelt <- melt(X, varnames = c('year', 'biome', 'species'),
                value.name = 'abundance')
> Xmelt
   year       biome  species abundance
1  2009 midlatitude capybara         4
2  2008 midlatitude capybara         0
3  1537 midlatitude capybara         0
4  2009 subtropical capybara         0
5  2008 subtropical capybara        10
6  1537 subtropical capybara         0
7  2009    tropical capybara         8
...
48 1537  equatorial  vampire         0
49 2009      arctic  vampire         0
50 2008      arctic  vampire         0
51 1537      arctic  vampire         0
52 2009   subarctic  vampire         0
53 2008   subarctic  vampire         0
54 1537   subarctic  vampire         0

Page 72: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

reshape2

> acast(Xmelt, year ~ biome ~ species)
, , capybara

     arctic equatorial midlatitude subarctic subtropical tropical
1537      0          0           0         0           0        0
2008      0          7           0         0          10        0
2009      0          0           4         0           0        8

, , moss

     arctic equatorial midlatitude subarctic subtropical tropical
1537      0          0           0         0           0        0
2008      0          3           6         0           0        0
2009      5          0           0         0           0        9

, , vampire

     arctic equatorial midlatitude subarctic subtropical tropical
1537      0          0           0         0           1        0
2008      0          0           0         0           0        0
2009      0          0           0         0           0        0
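
Note that the cast array above holds the same values as X but with its labels in sorted order. The sketch below illustrates that round trip on a small made-up array, using base R only: `as.data.frame(as.table(...))` stands in for melting and `xtabs` for casting. The array values and the names `Xlong`, `Xback`, and `Xsame` are assumptions made for this example.

```r
# Hypothetical toy array (not the slide data), deliberately in unsorted order.
X <- array(1:8, dim = c(2, 2, 2),
           dimnames = list(year    = c("2009", "2008"),
                           biome   = c("tropical", "arctic"),
                           species = c("moss", "capybara")))

# Melt: one row per cell, keeping the labels as plain character columns.
Xlong <- as.data.frame(as.table(X), responseName = "abundance",
                       stringsAsFactors = FALSE)

# Cast: xtabs() rebuilds the array; the labels come back in sorted order.
Xback <- xtabs(abundance ~ year + biome + species, data = Xlong)
dimnames(Xback)$year

# Reindex by label to compare cell by cell with the original.
Xsame <- Xback[dimnames(X)$year, dimnames(X)$biome, dimnames(X)$species]
all(Xsame == X)
```

Because cells are matched by label rather than by position, the reordering changes the printout but not the data.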

Page 73: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Thermocline manipulation experiment

[Figure: the data tables of the experiment and the dimensions they share: abundance, traits, time scales, and env. vars., indexed by time, species, and basin]

Cantin et al. (2011)

Page 74: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Peter Solymos

mefa / mefa4

vegan

dclone

...

Page 75: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

mefa / mefa4

count data matrix (x$xtab)

segments (x$segm)

data frame for samples (x$samp)

data frame for taxa (x$taxa)

Solymos 2009
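
The four components named on this slide can be pictured as a plain R list. The sketch below is only an analogy built with base R, not the mefa constructor or its API. All values are made up, and the assumption that segments are matrices of the same shape that add up to the count matrix is mine rather than something stated on the slide.

```r
# Hypothetical toy values; a plain list standing in for a mefa-style object.
adult    <- matrix(c(0, 2, 1, 7), nrow = 2,
                   dimnames = list(c("siteA", "siteB"), c("sppA", "sppB")))
juvenile <- matrix(c(1, 0, 0, 3), nrow = 2, dimnames = dimnames(adult))

x <- list(
  xtab = adult + juvenile,                          # samples-by-taxa counts
  segm = list(adult = adult, juvenile = juvenile),  # counts split into segments
  samp = data.frame(temperature = c(-0.24, 0.40),
                    row.names = rownames(adult)),   # one row per sample
  taxa = data.frame(bodysize = c(0.87, 1.52),
                    row.names = colnames(adult))    # one row per taxon
)

# The tables line up through shared row and column labels.
identical(rownames(x$xtab), rownames(x$samp))
identical(colnames(x$xtab), rownames(x$taxa))
```

This layout covers one count table with one table per margin, which is the structure the following slide tests against the thermocline data.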

Page 76: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Thermocline manipulation experiment

[Figure: the data tables of the experiment and the dimensions they share: abundance, traits, time scales, and env. vars., indexed by time, species, and basin]

Cantin et al. (2011)

Page 77: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Introduction
  Motivation from observational community ecology
  Illustrating the basic issue

Previous work on the data management-analysis interface, both inside and outside of community ecology
  The ‘old’ school
  The ‘middle’ school
  The ‘new’ school

The R multitable package

Thermocline deepening experiment

Page 79: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

multitable

data sources

data list

data frame + formula + function = analysis
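
As a concrete instance of this recipe, here is a minimal base-R sketch. The data values and the names `df`, `form`, and `fit` are made up for illustration.

```r
# Hypothetical toy data frame: rows are replicates, columns are variables.
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(2.1, 3.9, 6.2, 7.8, 10.1))

form <- y ~ x               # formula: which variables are related, and how
fit  <- lm(form, data = df) # function: the analysis applied to that relation
coef(fit)
```

Swapping `lm` for another modelling function, or the formula for another relationship, changes the analysis without touching the data frame.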

Page 80: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

The central distinction of multitable

Variables
- Things that can be related
- Axes on a scatterplot
- Columns in a data frame (or database)

Replicates
- Information about relationships
- Points on a scatterplot
- Rows in a data frame (or database)

Page 84: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

The scatterplot

[Figure: scatterplot of a y variable against an x variable]

Page 85: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

The data frame

[Figure: a data frame with variables as columns and replicates as rows]

Page 86: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

The data frame — bipartite graph

[Figure: bipartite graph with two sets of nodes, replicates and variables]

Page 87: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Variables and replicates in the fourth corner problem?

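[Figure: the fourth-corner layout: an abundance table (sites by species), a table of site properties, a table of species properties, and the fourth corner relating site properties to species properties]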

Legendre et al. 1997

Page 88: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Fourth corner problem — bipartite graph

[Figure: bipartite graph linking the replication dimensions (sites, species) to the variables (environment, abundance, traits)]

Page 89: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Thermocline manipulation experiment

[Figure: the data tables of the experiment and the dimensions they share: abundance, traits, time scales, and env. vars., indexed by time, species, and basin]

Cantin et al. (2011)

Page 90: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Thermocline manipulation experiment — bipartite graph

[Figure: bipartite graph linking the replication dimensions (sites, time, species) to the variables (environment, abundance, time scales, traits)]

Page 91: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Biadjacency matrices

[Figure: bipartite graph linking the replication dimensions (sites, species) to the variables (environment, abundance, traits)]

        abundance environment traits
sites           1           1      0
species         1           0      1
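
The matrix above can be derived mechanically from a record of which dimensions each variable is replicated along. A minimal base-R sketch follows; the names `replicated_along`, `dims`, and `B` are introduced here for illustration.

```r
# Which replication dimensions each variable is replicated along
# (taken from the fourth-corner example above).
replicated_along <- list(
  abundance   = c("sites", "species"),
  environment = "sites",
  traits      = "species"
)
dims <- c("sites", "species")

# Rows are replication dimensions, columns are variables;
# a 1 marks an edge of the bipartite graph.
B <- sapply(replicated_along, function(d) as.integer(dims %in% d))
rownames(B) <- dims
B
```

Adding a variable or a dimension to the list only adds a column or a row, so the same representation extends to designs with more than two replication dimensions.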

Page 92: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> install.packages('multitable')
> library(multitable)

Page 93: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> dl
abundance:
---------
      sppA sppB sppC
siteA    0    1   10
siteB    0    2   12
siteC    2    1    1
siteD    0    7    0
siteE    2    0    0
Replicated along: || sites || species ||

temperature:
-----------
siteA siteB siteC siteD siteE
-0.24  0.40  2.12 -0.72  5.95
Replicated along: || sites ||

continued...

Page 94: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

bodysize:
--------
sppA sppB sppC
0.87 1.52 2.67
Replicated along: || species ||

REPLICATION DIMENSIONS:
  sites species
      5       3

Page 95: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> summary(dl)
        abundance temperature bodysize
sites        TRUE        TRUE    FALSE
species      TRUE       FALSE     TRUE

Page 96: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> dl[1:3, ]
abundance:
---------
      sppA sppB sppC
siteA    0    1   10
siteB    0    2   12
siteC    2    1    1
Replicated along: || sites || species ||

temperature:
-----------
siteA siteB siteC
-0.24  0.40  2.12
Replicated along: || sites ||

continued...

Page 97: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

bodysize:
--------
sppA sppB sppC
0.87 1.52 2.67
Replicated along: || species ||

REPLICATION DIMENSIONS:
  sites species
      3       3
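
The behaviour shown above, where subsetting along sites affects abundance and temperature but leaves bodysize alone, can be imitated in base R. This is a rough sketch and not multitable's implementation: the data list is represented as a plain list of arrays with named dimnames, and `subset_sites` is a hypothetical helper.

```r
# Hypothetical stand-in for the data list: plain arrays with named dimnames.
sites   <- paste0("site", LETTERS[1:5])
species <- paste0("spp", LETTERS[1:3])
dl <- list(
  abundance   = array(c(0, 0, 2, 0, 2,  1, 2, 1, 7, 0,  10, 12, 1, 0, 0),
                      dim = c(5, 3),
                      dimnames = list(sites = sites, species = species)),
  temperature = array(c(-0.24, 0.40, 2.12, -0.72, 5.95), dim = 5,
                      dimnames = list(sites = sites)),
  bodysize    = array(c(0.87, 1.52, 2.67), dim = 3,
                      dimnames = list(species = species))
)

# Keep the chosen sites in every variable replicated along sites;
# variables without a sites dimension pass through unchanged.
subset_sites <- function(dl, keep) {
  lapply(dl, function(v) {
    dn <- names(dimnames(v))
    if (!"sites" %in% dn) return(v)
    idx <- lapply(dn, function(d) if (d == "sites") keep else TRUE)
    do.call(`[`, c(list(v), idx, list(drop = FALSE)))
  })
}

dl3 <- subset_sites(dl, 1:3)
lengths(dl3)
```

The key step is looking up each variable's dimension names before indexing, so that one subscript can be applied consistently across tables of different shapes.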

Page 98: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> df <- as.data.frame(dl)
> df
           abundance temperature bodysize
siteA.sppA         0       -0.24     0.87
siteB.sppA         0        0.40     0.87
siteC.sppA         2        2.12     0.87
siteD.sppA         0       -0.72     0.87
siteE.sppA         2        5.95     0.87
siteA.sppB         1       -0.24     1.52
siteB.sppB         2        0.40     1.52
siteC.sppB         1        2.12     1.52
siteD.sppB         7       -0.72     1.52
siteE.sppB         0        5.95     1.52
siteA.sppC        10       -0.24     2.67
siteB.sppC        12        0.40     2.67
siteC.sppC         1        2.12     2.67
siteD.sppC         0       -0.72     2.67
siteE.sppC         0        5.95     2.67

Page 99: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> lm(abundance ~ temperature + bodysize, data = df)

Coefficients:
(Intercept)  temperature     bodysize
    -0.3613      -0.4403       2.1083

> lm(abundance ~ temperature * bodysize, data = df)

Coefficients:
         (Intercept)           temperature              bodysize
             -2.1612                0.7580                3.1755
temperature:bodysize
             -0.7105
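
To make the coercion step concrete, the data frame can be rebuilt by hand in base R: site-level values are repeated across species and species-level values are repeated across sites. This is a sketch of the idea, not of how multitable does it internally; `df_sketch` and `fit` are names introduced here. Fitting the additive model to it should reproduce the first set of coefficients printed above.

```r
sites   <- paste0("site", LETTERS[1:5])
species <- paste0("spp", LETTERS[1:3])
abundance <- matrix(c(0, 0, 2, 0, 2,  1, 2, 1, 7, 0,  10, 12, 1, 0, 0),
                    nrow = 5, dimnames = list(sites, species))
temperature <- c(-0.24, 0.40, 2.12, -0.72, 5.95)  # one value per site
bodysize    <- c(0.87, 1.52, 2.67)                # one value per species

# One row per site-species pair, sites varying fastest (as in the printout).
df_sketch <- data.frame(
  abundance   = as.vector(abundance),
  temperature = rep(temperature, times = length(species)),
  bodysize    = rep(bodysize, each = length(sites)),
  row.names   = as.vector(outer(sites, species, paste, sep = "."))
)

fit <- lm(abundance ~ temperature + bodysize, data = df_sketch)
round(coef(fit), 4)
```

Each replicate in the coerced data frame is a site-species pair, so the 5 sites and 3 species yield 15 rows.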

Page 100: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> df <- as.data.frame(dims_to_vars(dl))
> df
           abundance temperature bodysize sites species
siteA.sppA         0       -0.24     0.87 siteA    sppA
siteB.sppA         0        0.40     0.87 siteB    sppA
siteC.sppA         2        2.12     0.87 siteC    sppA
siteD.sppA         0       -0.72     0.87 siteD    sppA
siteE.sppA         2        5.95     0.87 siteE    sppA
siteA.sppB         1       -0.24     1.52 siteA    sppB
siteB.sppB         2        0.40     1.52 siteB    sppB
siteC.sppB         1        2.12     1.52 siteC    sppB
siteD.sppB         7       -0.72     1.52 siteD    sppB
siteE.sppB         0        5.95     1.52 siteE    sppB
siteA.sppC        10       -0.24     2.67 siteA    sppC
siteB.sppC        12        0.40     2.67 siteB    sppC
siteC.sppC         1        2.12     2.67 siteC    sppC
siteD.sppC         0       -0.72     2.67 siteD    sppC
siteE.sppC         0        5.95     2.67 siteE    sppC

Page 101: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> library(lme4)
> form <- abundance ~ (temperature * bodysize) +
      (-1 + temperature | species)
> glmer(form, data = df, family = 'poisson')

Bates, Maechler, and Bolker

Page 102: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Generalized linear mixed model fit by maximum likelihood:

Random effects:
 Groups  Name        Variance Std.Dev.
 species temperature 0.003439 0.05864
Number of obs: 15, groups: species, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8456     0.5988  -1.412   0.1579
tmprt         0.2845     0.2491   1.142   0.2535
bdys          1.0077     0.2566   3.928 8.57e-05 ***
tmprtr:bdys  -0.2848     0.1391  -2.049   0.0405 *

Bates, Maechler, and Bolker

Page 103: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

library(ggplot2)
ggplot(df) +
    facet_wrap(~ species) +
    aes(x = temperature, y = abundance, size = bodysize) +
    geom_point()

[Figure: abundance against temperature in three panels (sppA, sppB, sppC), with point size scaled by bodysize]

Page 104: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> cor(as.data.frame(dl))
             abundance temperature  bodysize
abundance    1.0000000  -0.2839765 0.4176379
temperature -0.2839765   1.0000000 0.0000000
bodysize     0.4176379   0.0000000 1.0000000
> summary(dl)
        abundance temperature bodysize
sites        TRUE        TRUE    FALSE
species      TRUE       FALSE     TRUE
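
The zero correlation between temperature and bodysize appears to follow from the structure shown by `summary(dl)` rather than from the data values: the two variables share no replication dimension, so the coerced data frame pairs every temperature with every bodysize. A minimal base-R sketch of that point, with `crossed` and `r` as illustrative names:

```r
temperature <- c(-0.24, 0.40, 2.12, -0.72, 5.95)  # replicated along sites
bodysize    <- c(0.87, 1.52, 2.67)                # replicated along species

# Every site is paired with every species, as in the coerced data frame.
crossed <- expand.grid(temperature = temperature, bodysize = bodysize)

# In a full crossing the covariance factorises into two sums of deviations,
# each of which is zero, so the correlation vanishes whatever the values are.
r <- cor(crossed$temperature, crossed$bodysize)
isTRUE(all.equal(r, 0))
```

Any other pair of values for the two variables would give the same result, which is why this entry of the correlation matrix carries no ecological information.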

Page 105: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> dlmelt(dl)
$sites.species
           abndnc sites species
siteA.sppA      0 siteA    sppA
siteB.sppA      0 siteB    sppA
siteC.sppA      2 siteC    sppA
siteD.sppA      0 siteD    sppA
siteE.sppA      2 siteE    sppA
siteA.sppB      1 siteA    sppB
siteB.sppB      2 siteB    sppB
siteC.sppB      1 siteC    sppB
siteD.sppB      7 siteD    sppB
siteE.sppB      0 siteE    sppB
siteA.sppC     10 siteA    sppC
siteB.sppC     12 siteB    sppC
siteC.sppC      1 siteC    sppC
siteD.sppC      0 siteD    sppC
siteE.sppC      0 siteE    sppC

$sites
       temp sites
siteA -0.24 siteA
siteB  0.40 siteB
siteC  2.12 siteC
siteD -0.72 siteD
siteE  5.95 siteE

$species
     bodysize species
sppA     0.87    sppA
sppB     1.52    sppB
sppC     2.67    sppC
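
The output above contains one long data frame per distinct set of replication dimensions. The grouping logic can be sketched in base R as follows. This is an illustration under the assumption that a data list can be represented as a plain list of arrays with named dimnames; it is not the `dlmelt` implementation, and `key` and `melted` are names introduced here.

```r
sites   <- paste0("site", LETTERS[1:5])
species <- paste0("spp", LETTERS[1:3])
dl <- list(
  abundance   = array(c(0, 0, 2, 0, 2,  1, 2, 1, 7, 0,  10, 12, 1, 0, 0),
                      dim = c(5, 3),
                      dimnames = list(sites = sites, species = species)),
  temperature = array(c(-0.24, 0.40, 2.12, -0.72, 5.95), dim = 5,
                      dimnames = list(sites = sites)),
  bodysize    = array(c(0.87, 1.52, 2.67), dim = 3,
                      dimnames = list(species = species))
)

# Label each variable by the dimensions it is replicated along,
# for example "sites.species" for abundance.
key <- sapply(dl, function(v) paste(names(dimnames(v)), collapse = "."))

# One long data frame per distinct set of replication dimensions;
# variables sharing the same dimensions are merged side by side.
melted <- lapply(split(names(dl), key), function(vars) {
  long <- lapply(vars, function(nm)
    as.data.frame(as.table(dl[[nm]]), responseName = nm))
  Reduce(merge, long)
})
sapply(melted, nrow)
```

Grouping by replication dimensions avoids repeating site-level or species-level values, in the same way that separate tables do in a relational database.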

Page 106: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> identical(dl, dlcast(dlmelt(dl)))
TRUE

Page 107: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

> dlapply(dl, 2, mean)
abundance:
---------
sppA sppB sppC
 0.8  2.2  4.6
Replicated along: || species ||

bodysize:
--------
sppA sppB sppC
0.87 1.52 2.67
Replicated along: || species ||

REPLICATION DIMENSIONS:
species
      3
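
The result above keeps only the variables replicated along species, with abundance averaged over sites. A rough base-R imitation is sketched below. It assumes a data list can be represented as a plain list of arrays with named dimnames, and it is not the `dlapply` implementation; `margin` and `res` are names introduced here.

```r
sites   <- paste0("site", LETTERS[1:5])
species <- paste0("spp", LETTERS[1:3])
dl <- list(
  abundance   = array(c(0, 0, 2, 0, 2,  1, 2, 1, 7, 0,  10, 12, 1, 0, 0),
                      dim = c(5, 3),
                      dimnames = list(sites = sites, species = species)),
  temperature = array(c(-0.24, 0.40, 2.12, -0.72, 5.95), dim = 5,
                      dimnames = list(sites = sites)),
  bodysize    = array(c(0.87, 1.52, 2.67), dim = 3,
                      dimnames = list(species = species))
)

# Apply mean() within each level of the chosen dimension for every variable
# replicated along it; variables lacking that dimension are dropped.
margin <- "species"
res <- lapply(dl, function(v) {
  dn <- names(dimnames(v))
  if (!margin %in% dn) return(NULL)
  apply(v, match(margin, dn), mean)
})
res <- Filter(Negate(is.null), res)
res
```

Looking up the margin by name rather than by position is what lets a single call work across tables with different numbers of dimensions.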

Page 108: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Introduction
  Motivation from observational community ecology
  Illustrating the basic issue

Previous work on the data management-analysis interface, both inside and outside of community ecology
  The ‘old’ school
  The ‘middle’ school
  The ‘new’ school

The R multitable package

Thermocline deepening experiment

Page 109: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

[Figure: Thermocline.Depth against week, with points distinguished by basin (B1, B2, B3)]

Page 110: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

[Figure: sqrt(abundance) against Thermocline.Depth in twelve panels, one per taxon; legends: Length, basin (B1, B2, B3). Panel titles: (0.18) armoured rot, (0.18) nauplii, (0.18) unprotected rot, (0.33) Bosmina, (0.36) colonial rot, (0.45) Cycl adults, (0.75) Cal cope, (0.77) Holopedium, (0.80) Daphnia l&d, (0.96) Daphnia cat, (1.23) Cycl cope, (1.31) Cal adults]

Page 111: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

[Figure: sqrt(abundance) against Thermocline.Depth in twelve panels, one per taxon, with narrower y-axis ranges; legends: Length, basin (B1, B2, B3). Panel titles: (0.18) armoured rot, (0.18) nauplii, (0.18) unprotected rot, (0.33) Bosmina, (0.36) colonial rot, (0.45) Cycl adults, (0.75) Cal cope, (0.77) Holopedium, (0.80) Daphnia l&d, (0.96) Daphnia cat, (1.23) Cycl cope, (1.31) Cal adults]

Page 112: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Conclusion

- The fundamental distinction between variables and replicates that unifies most statistical software also applies to multiple-table next-generation data
- Therefore, we may not need all of the new statistical techniques being developed specifically for next-generation data in community ecology
- Although my field is observational community ecology, I think that many fields may benefit from a more systematic and formal treatment of the distinction between variables and replicates
- Current limitations:
  - multitable only deals with arrays (not phylogenies, distance matrices, etc.)
  - although data lists can be coerced to data frames, which can be used in virtually any R analysis function, it may be more efficient to pass data lists directly

Page 117: Managing and analysing (next-generation) multivariate ecological … · 2012. 11. 28. · November 21, 2012 EEB seminar, McMaster University. Introduction Motivation from observational

Acknowledgements

- Ben Bolker (for being my new postdoc supervisor... and for extremely useful and encouraging discussions on this topic long before that)
- Collaborators on the multitable project:
  - Pierre Legendre (previous postdoc supervisor)
  - Guillaume Guenard (Universite de Montreal)
  - Peter Solymos (University of Alberta)
  - Beatrix Beisner (Universite du Quebec a Montreal)
- Contributors to the free software I use
- Collectors of the free data I use
- Funding (NSERC, OGS, Pierre, Don, and the U of T)
- Laura Timms (McGill / ROM)