d1t3 enm workflows updated

ECOLOGICAL NICHE MODELING METHODS UPDATE

Town Peterson, University of Kansas

It Is A Bit Too Easy …

• Very easy access to lots of occurrence data

• Very easy access to rich geospatial data

• Easy-to-use modeling tools

• Lots of literature setting out the examples

Ecological Niche Modeling

1. Accumulate Input Data
2. Integrate Occurrence and Environmental Data
3. Model Calibration
4. Model Evaluation
5. Summary and Interpretation

Accumulate Input Data

Collate primary biodiversity data documenting occurrences

Process environmental layers to be maximally relevant to the distributional ecology of the species in question

Collate GIS database of relevant data layers

Assess spatial precision of occurrence data; adjust inclusion of data accordingly

Data subsetting for model evaluation

Occurrence and environmental data

Assess spatial autocorrelation

Occurrence Data in Niche Modeling

• Goal is to represent the full diversity of situations under which a particular species maintains populations

• Spatial biases (i.e., non-random or non-uniform distribution within G) are not damning

• Biases within E are catastrophic, and will translate directly into biases in any niche estimate

• More is usually better, but not always…

speciesLink Network

Uncertainty in Direction and Distance

Georeferencing should …

• Represent the place at which the species was found

• Represent the certainty and uncertainty with which that place is characterized

• Summarize the methods used to establish that place

• Preserve all of the original information for possible reinterpretation

Internal Consistency Testing

Data Cleaning

• Attempt to detect meaningfully erroneous records, so that they can be treated with caution in analysis

• Use internal consistency to detect initial problems
  – Species names consistent?
  – Terrestrial species on land, marine species in the ocean?
  – Latitude/longitude match country, state, district, etc.?

• Use external consistency to go deeper
  – Occurrence data match known distribution spatially?
  – Occurrence data match known distribution environmentally?

• If precision data are available, filter to retain only records that are precise enough for the study

• Iterative process with important consequences
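A minimal sketch of the internal-consistency and precision steps above, assuming each record carries a species name, decimal-degree coordinates, and an optional coordinate uncertainty in meters. The `Record` class, the `clean_records` function, and the 5-decimal duplicate rounding are hypothetical choices for illustration; land/ocean and country/state checks need boundary layers and are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    species: str
    lat: float
    lon: float
    uncertainty_m: Optional[float]  # coordinate uncertainty in meters, None if not reported


def clean_records(records, target_species, max_uncertainty_m, keep_unknown=False):
    """Keep records that pass simple internal-consistency and precision checks (hypothetical helper)."""
    target = " ".join(target_species.split()).lower()
    kept, seen = [], set()
    for r in records:
        # Species name consistent with the target taxon (ignoring case and extra spaces)
        if " ".join(r.species.split()).lower() != target:
            continue
        # Coordinates must be within valid ranges and not the (0, 0) placeholder
        if not (-90 <= r.lat <= 90 and -180 <= r.lon <= 180):
            continue
        if r.lat == 0 and r.lon == 0:
            continue
        # Precision filter: drop records known (or, by default, not known) to be precise enough
        if r.uncertainty_m is None:
            if not keep_unknown:
                continue
        elif r.uncertainty_m > max_uncertainty_m:
            continue
        # Drop duplicate localities (coordinates rounded to 5 decimal places)
        key = (round(r.lat, 5), round(r.lon, 5))
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept
```

Because cleaning is iterative, the flags and thresholds are exposed as parameters so the filter can be rerun as problems surface.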

Data Subsetting

• Must respond to the question at hand … why are you doing the study?

• Ideally, completely independent data streams

• Failing that, subsets can be
  – Macrospatial
  – Microspatial (but see spatial autocorrelation)
  – Random

• Will return to this point later…
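A minimal sketch of two of the fallback subsetting schemes, assuming points are (lon, lat) tuples. The function names, the 25% test fraction, and the use of the median longitude as the macrospatial cut are hypothetical choices; any regional division relevant to the study question could replace them.

```python
import random
from statistics import median


def random_split(points, test_fraction=0.25, seed=1):
    """Randomly assign test_fraction of the points to evaluation."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (calibration, evaluation)


def macrospatial_split(points):
    """Split by region: points at or west of the median longitude calibrate,
    points east of it evaluate."""
    cut = median(lon for lon, _ in points)
    calibration = [p for p in points if p[0] <= cut]
    evaluation = [p for p in points if p[0] > cut]
    return calibration, evaluation
```

The macrospatial split asks the model to predict into a region it never saw, a harder (and often more relevant) test than a random split.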

Generalities: Environmental Data

• Raster format, i.e., information exists across the entire region of interest

• Relevant information as regards the distributional potential of the species of interest

• More dimensions = better (generally), BUT
  – collinearity is bad
  – too many dimensions are bad

Major Sources

• Climate data – long time span, but low temporal resolution

• Remote-sensing data – high temporal resolution, diverse products, short time span

• Topographic data – high spatial resolution, uncertain connection to species’ distributional ecology

• Soils data – uneven global coverage, categorical data

• Others

Spatial Autocorrelation

Two Major Implications

• Non-independence in model evaluation
  – Available data are often split into data sets for calibration and evaluation
  – Data points that are not independent of one another may end up in different data sets, thereby compromising the robustness of the test

• Inflation of sample sizes
  – Because individual data points may be non-independent of one another, sample sizes may appear larger than they actually are
  – This inflation may create opportunity for Type I errors in model evaluation and model comparisons
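Spatial thinning is one frequently used mitigation for both problems. The sketch below assumes (lon, lat) points in decimal degrees and a user-chosen minimum distance; it keeps points greedily in input order, so results depend on ordering. Function names are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def thin(points, min_km):
    """Greedy thinning: keep a point only if it lies at least min_km from
    every point already kept."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept
```

The minimum distance is best chosen from an assessment of the autocorrelation range of the environmental layers rather than fixed in advance.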

Process for Maximum Relevancy

Integrate Occurrence and Environmental Data

Assess BAM scenario for species in question; avoid M-limited situations

Saupe et al. 2012. Variation in niche and distribution model performance: The need for a priori assessment of key causal factors. Ecological Modelling, 237–238, 11-22.

Estimate M and S as area of analysis in study

Barve et al. 2011. The crucial role of the accessible area in ecological niche modeling and species distribution modeling. Ecological Modelling, 222, 1810-1819.

Reduce dimensionality (PCA or correlation analysis)

Occurrence and environmental data

Occurrence and environmental data ready for analysis
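For the PCA route to reducing dimensionality, a minimal sketch with NumPy, assuming environmental values have been extracted for the cells of the study area as an (n_cells, n_vars) array. Standardizing first and keeping enough components to reach 95% of the variance are common conventions, not fixed rules; the function name is hypothetical.

```python
import numpy as np


def pca_reduce(X, var_to_keep=0.95):
    """Standardize X, then project onto the leading principal components that
    together explain at least var_to_keep of the total variance.
    Returns (component scores, explained-variance fractions)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # First index where the cumulative explained variance reaches the target
    k = int(np.searchsorted(np.cumsum(explained), var_to_keep) + 1)
    return Z @ Vt[:k].T, explained[:k]
```

The resulting component scores are uncorrelated by construction, which removes collinearity at the cost of less directly interpretable axes.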

Assess BAM Scenario

BAM I: Eltonian Noise Hypothesis

[Figure: BAM diagrams showing the B (biotic), A (abiotic), and M (accessible area) components]

BAM II

[Figure panels: Classic BAM; Hutchinson’s Dream; Wallace’s Dream; All OK]

Project onto Geography

Effect of BAM Scenarios

BAM Conclusions

• Some situations are not amenable to fitting ecological niche models that will have predictive power

• Models tend to fit the potential distribution much better than the actual distribution

• Must ponder carefully the BAM configuration in a particular study situation to avoid configurations that will not yield usable models

M and S as Study Area

Test Arena: The Lawrence Species

M and Model Training

Model Evaluation

M and Model Comparison

Model Comparison

M

• When the species has no history in an area:
  – Use a radius related to dispersal distances

• When history is short (i.e., environment constant):
  – Use a radius representing compounding of dispersal distances

• When history is long (i.e., environmental change is a factor):
  – Seek ways of assessing areas that the species’ distribution through time has covered…
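For the first two cases, M reduces to a buffer around known occurrences. A minimal sketch, assuming grid-cell centers and occurrences as (lon, lat) tuples and a radius already derived from dispersal considerations; the function names are hypothetical and the radius itself is the part that needs biological justification.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def buffer_m(cells, occurrences, radius_km):
    """Return the grid-cell centers lying within radius_km of any occurrence."""
    return [c for c in cells
            if any(haversine_km(c, o) <= radius_km for o in occurrences)]
```

The long-history case does not reduce to a single radius and is not covered by this sketch.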

Icterus cucullatus

Sampling

Reduce Dimensionality

Model Calibration

Estimate ecological niche (various algorithms)

Model calibration, adjusting parameters to maximize quality

Model thresholding

Peterson et al. 2007. Transferability and model evaluation in ecological niche modeling: A comparison of GARP and Maxent. Ecography, 30, 550-560.

Occurrence and environmental data ready for analysis

“No Silver Bullet” paper to appear

Warren, D. L. and S. N. Seifert. 2011. Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335-342.

Preliminary models
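Warren and Seifert (2011) evaluate information criteria such as AICc for choosing among Maxent models of differing complexity. A minimal sketch of the standard AICc formula and a ranking helper follows; it assumes the log-likelihood and parameter count k for each candidate are already available (for Maxent, k is commonly taken as the number of non-zero feature weights), and all labels and values are hypothetical.

```python
def aicc(log_likelihood, k, n):
    """AICc = 2k - 2 ln L + 2k(k + 1) / (n - k - 1); n = number of occurrences."""
    if n - k - 1 <= 0:
        return float("inf")  # undefined when k >= n - 1; treat as unusable
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)


def rank_by_aicc(candidates, n):
    """candidates: dict label -> (log_likelihood, k).
    Returns (label, AICc, delta AICc) tuples sorted from best to worst."""
    scores = {label: aicc(ll, k, n) for label, (ll, k) in candidates.items()}
    best = min(scores.values())
    return sorted(((label, s, s - best) for label, s in scores.items()),
                  key=lambda t: t[1])
```

The small-sample correction term grows quickly as k approaches n, which penalizes overly complex models fitted to few occurrences.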

Estimate Ecological Niche

No Silver Bullets in ENM

• Single algorithms may perform ‘best’ on average

• The best algorithm in any given situation, however, may differ from the overall ‘best’

• NSB thinking suggests that we should not rely on a single approach

• Use a suite of approaches (e.g., as implemented in OM, BIOMOD, BIOENSEMBLES, etc.), challenge each to predict, and choose the best for that situation

• Maxent is good, but it is not the only algorithm …
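The "challenge to predict, choose best" step can be sketched as a simple comparison on held-out evaluation data. This assumes each algorithm's thresholded prediction has already been scored, and it ranks by omission rate, breaking ties by the smaller proportion of area predicted (less commission); that ranking rule and the algorithm labels are hypothetical.

```python
def choose_best(results):
    """results: dict algorithm -> (omitted_eval_points, n_eval_points, prop_area_predicted).
    Returns the algorithm with the lowest omission rate, ties going to the
    one predicting the smaller area."""
    def key(item):
        _, (omitted, n, area) = item
        return (omitted / n, area)
    return min(results.items(), key=key)[0]
```

In practice, candidates would first need to pass a significance test before any performance-based ranking is meaningful.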

Model Thresholding

“Presence”

Thresholding

• Use an approach that prioritizes omission error over commission error, in view of the greater reliability of presence data

• Minimum training presence thresholding seeks the highest suitability value that includes 100% of the calibration data

• Suggest (strongly) using a parallel approach that seeks the highest suitability value that includes (100-E)% of the calibration data, where E is the estimated percentage of calibration records likely to contain meaningful error
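Both thresholds can be read directly off the sorted suitability values at the calibration points. A minimal sketch, assuming `train_suitability` holds model outputs at those points and E is given in percent; function names are hypothetical. With E = 0 the second rule reduces to minimum training presence.

```python
import math


def threshold_mtp(train_suitability):
    """Minimum training presence: highest threshold retaining all calibration points."""
    return min(train_suitability)


def threshold_e(train_suitability, e_percent):
    """Highest threshold retaining at least (100 - E)% of calibration points,
    i.e., allowing up to E% of them to fall below it."""
    vals = sorted(train_suitability)
    n_omit = math.floor(len(vals) * e_percent / 100)  # points allowed below threshold
    return vals[n_omit]
```

Cells with suitability at or above the chosen threshold would then be classed as predicted present.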

Model Optimization and Parameter Choice

Model Evaluation

Project niche model to geographic space

Model evaluation

Peterson et al. 2008. Rethinking receiver operating characteristic analysis applications in ecological niche modelling. Ecological Modelling, 213, 63-72.

Preliminary models

Reset data subsets based on evaluation results

Corroborated models ready for projection to geographic times/regions of interest

If predicted suitable area covers 15% of the testing area, then 15% of evaluation points are expected to fall in the predicted suitable area by chance.

• p = proportion of area predicted suitable

• s = number of successes

• n = number of evaluation points

The cumulative binomial distribution gives the probability of obtaining s or more successes out of n trials when a proportion p of the testing area is predicted present. If this probability is below 0.05, we interpret the model’s predictions as significantly better than random.
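The one-tailed test above fits in a few lines using only the standard library. This sketch sums binomial probabilities from s to n; the function name is hypothetical. SciPy users could instead call `scipy.stats.binomtest(s, n, p, alternative="greater")`.

```python
from math import comb


def binomial_p_value(s, n, p):
    """Probability of s or more successes in n trials when each evaluation
    point falls in predicted-suitable area with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))
```

With p = 0.15 and n = 20, about 3 successes are expected by chance. Four successes are not significant, but nine give a probability well below 0.05.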

Threshold-dependent Approach

Threshold-independent Approaches

http://shiny.conabio.gob.mx:3838/nichetoolb2/

Significance vs Performance

• Making predictions that are significantly better than random is important, and is a sine qua non for model interpretation

• BUT, it is also important to assure that the model performs sufficiently well for the intended uses of the output

• Performance measures include omission rate, correct classification rate, etc.

Summary and Interpretation

Evaluation of model transfer results

Transfer to other situations (time and space)

Assess extrapolation (MESS and MOP)

Owens, H. L., L. P. Campbell, L. Dornak, E. E. Saupe, N. Barve, J. Soberón, K. Ingenloff, A. Lira-Noriega, C. M. Hensz, C. E. Myers, and A. T. Peterson. 2013. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecological Modelling 263:10-18.

Refine estimate of current distribution via land use, etc.

Compare present and “other” to assess effects of change

Models calibrated and evaluated, and transferred to present and “other” situations

MESS and MOP

• Both are intended to detect extrapolative situations

• MESS is implemented within Maxent

• MESS compares the area in question to the centroid of the calibration cloud

• MOP compares the area in question to the nearest part of the calibration cloud

• The two agree on ‘out of range’ conditions

• MOP better characterizes similarities between calibration and transfer regions, and thus is more optimistic as regards in-range extrapolation
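A simplified, MOP-style sketch of the two ideas above: a strict out-of-range check against the calibration min-max of each variable, and a distance from a transfer point to the nearest part of the calibration cloud. It assumes variables are already standardized and uses Euclidean distance to the nearest `fraction` of calibration points; the published MOP (Owens et al. 2013) differs in details, and all names here are hypothetical.

```python
import math


def out_of_range(point, calib):
    """True if any variable of point lies outside the calibration min-max range."""
    return any(not (min(c[j] for c in calib) <= v <= max(c[j] for c in calib))
               for j, v in enumerate(point))


def mop_like_distance(point, calib, fraction=0.1):
    """Mean Euclidean distance from point to its nearest `fraction` of
    calibration points (variables assumed standardized)."""
    dists = sorted(math.dist(point, c) for c in calib)
    k = max(1, round(len(dists) * fraction))
    return sum(dists[:k]) / k
```

Larger distances flag transfer conditions less similar to anything seen during calibration, where model outputs deserve the most caution.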

Ecological Niche Modeling

1. Accumulate Input Data
2. Integrate Occurrence and Environmental Data
3. Model Calibration
4. Model Evaluation
5. Summary and Interpretation


