HIERARCHAL INDUCTIVE PROCESS MODELING AND ANALYSIS
Youri Noël Nelson
A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment
of the Requirements for the Degree of Master of Science
Department of Mathematics and Statistics
University of North Carolina Wilmington
2011
Approved by
Advisory Committee
Michael Freeze Xin Lu
Wei Feng Stuart Borrett
Chair Co-Chair
Accepted by
Dean, Graduate School
TABLE OF CONTENTS
ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
1 INTRODUCTION
2 METHOD
2.1 HIPM Description
2.1.1 Measure of Fit
2.1.2 Entities specification and model library
2.2 Experiment Design
3 COMPUTATIONAL RESULTS
3.1 Increase in number of time-series input
3.2 Value of Information
3.3 Summary
4 ANALYTICAL ANALYSIS
4.1 Most recurrent models
4.2 Preliminaries
4.3 Model A
4.4 Model B
4.5 Model C
4.6 Effects of increasing the number of constraints
5 CONCLUSION
REFERENCES
APPENDIX
A. Sample CIAO data - 1997
B. Full entity specification file
C. Full Ross Sea generic model library
D. Models selected in both experiments 8 and 19
E. Models selected in both experiments 8 and 21
ABSTRACT
Understanding phytoplankton dynamics in the Ross Sea polynya may yield knowledge
useful in the search for ways to curb the world's rising carbon dioxide levels. Modeling
such dynamics is a lengthy and tedious process that can be helped by computational
tools like HIPM. This system relies on knowledge that is already available, in the
form of time series data and a process library, to construct and then evaluate
models. In this research, models were ranked by sum of squared error from lowest to
highest, the lowest being the best-fit model. Some of the questions that arise from
the use of HIPM concern the amount and value of the time series provided to the
software, from which we formulated two hypotheses. Will having more time series
improve the output of the system? Will time series for different variables provide
different quality of output? Through 31 experiments and mathematical analysis, we
began to answer these questions. The computational results showed that our first
hypothesis does not always hold true, which is thought to be a consequence of the way
the fit is measured. The mathematical analysis, on the other hand, showed many
variations across the experiments in the structure of the zooplankton equation, which
may indicate that the process library needs to be better defined and that the system
needs to take into consideration not only the phytoplankton species Phaeocystis
antarctica but also diatoms. This thesis provides the start of an answer to these
hypotheses, but further research is still needed.
DEDICATION
This thesis is dedicated to all my friends and family who have supported me in this
incredible journey I started 5 years ago. More importantly, I want to dedicate it to our
Lord and Savior, as I certainly would not be here today without his help, support
and comfort.
“I can do anything through God who strengthens me.” (Philippians 4:13)
I also want to dedicate this to my nephew Noah Nelson and my niece Sarah Nelson
for always putting a smile on my face during the tough times, for their unconditional
love and for making me want to persevere always. I love you beyond words.
Thank you, Christel & Douglas Nelson, Lara Nelson, Celio & Elise Nelson, Sven
Diebold, Andrew & Robin Nelson, Ed & Pat Nelson, Joann Nelson, Philip Varvaris,
Luke Brown, Taylor Jackson and Bud Edwards (for always being there at the right
place at the right time) and all my other friends and family members that are not
named here but are present in my heart and to whom I am so grateful for all the
words of encouragement and support throughout the years.
ACKNOWLEDGMENTS
I would like to thank Dr. Feng, Dr. Borrett, Dr. Simmons, Dr. Freeze and Dr.
Lu for all their help and support in this endeavor and process, as well as my friend
Brevin Rock for his advice in completing a Master's thesis.
LIST OF TABLES
1 Example of entity definition and instantiation (P)
2 Example of process definition (Growth)
3 Data contained in CIAO set
4 Cutoff Value Chart
5 Model A Parameter Values
6 Model B Parameter Values
7 Model C Parameter Values
LIST OF FIGURES
1 Initial Conceptual Model
2 Tree diagram representing the process library
3 Map of the Ross Sea
4 reMSE summary - Part 1
5 reMSE summary - Part 2
6 reMSE summary - Part 3
7 Good fit Models vs. Number of inputted time-series
8 Mean Activation Values Graph
LIST OF SYMBOLS
P = Amount of phytoplankton present in the system (mg Chl a/m^3),
D = Detritus concentration (mg C/m^3),
F = Iron concentration (µM),
Z = Zooplankton concentration (mg C/m^3),
N = Nitrate concentration (µM),
E_ice(t) = Sea ice concentration
E_TH2O(t) = Temperature of the water (°C)
E_PUR(t) = Photosynthetically usable radiation (µmol photons m^-2 s^-1)
E_TH2Omax = Maximum water temperature
E_TH2Omin = Minimum water temperature
a_i = Optimal parameters of the system selected by HIPM software
1 INTRODUCTION
Whether in biology, mathematics, physics, ecology, or any other science, the common
objective is to explain and describe the world that surrounds us. All of these fields
build upon collections of observations to explain recurring phenomena. To explain
and depict some of these phenomena, scientists make use of models, which can take a
variety of forms including conceptual, formal, physical and diagrammatic (Haefner,
2005).
Models are widely used in science, and researchers continue to look for tools and
techniques that enhance their ability to construct new models or improve existing
ones. The modeling technique differs with the task. For instance, Haefner (2005)
uses a Forrester diagram, a qualitative model formulation, to model a hypothetical
agro-ecosystem. In biology, predator-prey interactions can be described with
differential equation models like those formulated by Lotka and Volterra (Berryman
1992). Models are useful for studying systems because they let researchers conduct
experiments and test theories that would otherwise be unethical or impossible to
perform on the system itself, and because they enable researchers to predict the
behavior of the components of an ecosystem.
Model construction is a difficult and lengthy endeavor. For a given system there
may be many different combinations of processes (e.g. grazing, decay, growth) that
could provide a plausible explanation for the behavior being studied, so exploring
and evaluating all these possibilities is a tedious task. In the past, limited
computational power restricted scientists' ability to investigate complex models:
certain known or suspected processes were left out to simplify calculations. As
computational power increased, so did our capacity to evaluate more intricate models
(Oreskes 2000). In addition, numerical models of natural systems are non-unique:
there are multiple ways to represent the same dynamics. Computational tools that
quickly and automatically evaluate multiple models therefore seemed a promising way
to search through the extensive model space. The success of machine learning and data
mining in commercial domains led scientists to investigate automated modeling for
that purpose (Fayyad et al., 1996).
Induction is the act of gathering small pieces of information and combining them
with prior knowledge to formulate a complex overview of the object or process
studied. Induction avoids searching the entire space of possible equations by piecing
together only the meaningful terms; for instance, a predator-prey model will need
terms specifying growth and death (Todorovski et al. 2005). Inductive modeling
methods (e.g. LAGRAMGE, HIPM, ARIMA, FUSE) use the principles of induction to
construct models of the studied system. Methods used for commercial applications,
such as the Knowledge Discovery in Databases (KDD) process, were insufficient for
scientific purposes because they described but did not explain the observed system
behavior (Langley et al. 2006). A simple example is the modeling of water consumption
in a city: a water company could easily create a numerical model based on previous
years that gives a good estimate of projected water consumption over time, but the
model may not explain why consumption fluctuates the way it does. In other words, the
commercial methods produce models that are useful for making accurate predictions
but very limited when trying to explain which processes drive a system's behavior,
and they did not explore the space of all possible models. Thus, induction methods
had to be enhanced to automate the task of building and evaluating multiple models
(Dzeroski et al. 1995).
In this thesis, I used the hierarchal inductive process modeling technique, which
is implemented in a computer program called HIPM (Langley et al. 2006; Bridewell et
al. 2005; Dzeroski et al. 1995; Borrett et al. 2007). Inductive process modeling
methods such as HIPM (Bridewell et al. 2008; Borrett et al. 2007; Langley et al.
2006; Todorovski et al. 2005) search through two spaces. The first is made up of
mathematical formulations and alternative model structures, which consist of
entities, processes and the connections binding the two; the second is made up of
parameter values (Borrett et al. 2007). The system takes as input a hierarchy of
generic processes (a process being an action on the system, defined by means of
fragments of mathematical equations and the rules for combining these fragments with
the rest of the equations), a set of entities (an entity being an object that groups
the properties of an organism or nutrient by means of variables and parameters) and
a set of observed time series of the entities' variables (Todorovski et al. 2005).
HIPM performs one of two searches over model structures: a heuristic search or an
exhaustive search. With the exhaustive option, HIPM creates all the possible model
structures allowed by the given background knowledge and selects the best set of
parameters for each model structure. Finally, the system ranks the models based on
their sum of squared error (Todorovski et al. 2005).
This system allows for model representation of complex system dynamics. For
example, in the study of photosynthesis regulation it generated a model that
reproduced both the qualitative shape and the quantitative details of the time series
data while incorporating processes that made biological sense (Langley et al. 2006).
In our case we studied the phytoplankton dynamics in the aquatic ecosystem of the
Ross Sea.
In this thesis I used the HIPM tool combined with the appropriate process library
to study the phytoplankton dynamics in the Ross Sea ecosystem. Here the term process
library is defined as the collection of processes (e.g. grazing, decay, growth)
and entities (e.g. phytoplankton, zooplankton, nitrate), with their relations to one
another. It is best represented by Figure 2.
Figure 1: This schematic represents the interactions between the entities and the exogenous variables driving the model. Here, P, Z, D, NO3 and Fe are the state variables. PUR, T and Ice are the exogenous variables acting on the system and influencing the state variables. The arrows represent the interaction of one variable with another (Borrett, unpublished research).
Arrigo, Borrett, Bridewell and Langley used HIPM and the Ross Sea process library
to create and search a space of 1120 possible model structures to explain the
phytoplankton and nitrogen temporal dynamics in the Ross Sea ecosystem; all models
contained five state variables: phytoplankton, zooplankton, detritus, nitrogen and
iron. Time series for both phytoplankton and nitrogen were available and were given
to HIPM along with the process library. Their initial research found that 200 model
structures were of good fit, where good fit was defined as a sum of squared error
less than or equal to 0.2. From a computer scientist's standpoint, reducing the
search space from 1120 model structures to 200 is a great accomplishment; for a
biologist, however, the solution is not specific enough and offers few insights into
the ecosystem dynamics. There is a need for ways to constrain the search further,
bringing down the number of good-fit models and making the output useful to
biologists.
Figure 2: A tree diagram representing the process library constructed for the Ross Sea ecosystem problem. The interaction between processes and entities is defined in the library as explained in Section 2.1.2 (Borrett et al. 2007).
Superficially, HIPM appears related to equation discovery, a subfield of machine
learning (Langley, 1995; Mitchell, 1997) that investigates collections of
measurements and observations, using different computational methods, in search of
quantitative laws (Todorovski, 2003). For example, the LAGRAMGE system takes as input
a dependent variable and background knowledge encoded as a grammar specifying the
space of possible equations, and outputs the best equation for that variable; it can
search for only one variable at a time (Dzeroski et al. 1993, Todorovski 2003).
Equation discovery is in turn related to the methods used in Ljung's work (1993) on
system identification, which are further removed from inductive process modeling.
The main assumption behind system identification is that the model structure
is known and that the primary concern is finding adequate parameter values;
equation discovery focuses on both the structure and the parameter values (Todorovski
et al. 1998). Both of these approaches produce descriptive models that summarize
and predict the data, but they fail to search through the space of alternative
explanations: they do not take into account models with theoretical variables
or consider alternate processes to explain certain dynamics (Bridewell et al. 2005).
The Southern Ocean covers an area equivalent to about 10% of the global ocean
and is a key element of the global ocean system, as it links all major ocean basins
and facilitates the global distribution of its deep water; it is considered to play an
important part in the global carbon (C) cycle (Arrigo et al. 2003). The Ross Sea
polynya (an area of open water surrounded by sea ice) is one of the most productive
ecosystems in the Southern Ocean, as it experiences some of the largest phytoplankton
blooms in the region (Arrigo et al. 1994, 1998, 2000, 2003). Phytoplankton
productivity is important to the carbon cycle because photosynthesis removes carbon
dioxide (CO2) from surface water, and part of that carbon is then exported to deep
ocean water. What makes the Ross Sea polynya so interesting for ecologists, compared
to other locations such as Terra Nova Bay, is the type of phytoplankton dominating
the ecosystem. In the Ross Sea polynya Phaeocystis antarctica dominates, as opposed
to diatoms (species such as Fragilariopsis spp.) in Terra Nova Bay. Phaeocystis
antarctica is thought to resist grazing more than other phytoplankton species, which
could imply that more carbon is taken from shallow water into the depths as the
uneaten phytoplankton, full of carbon taken up as CO2, sink to the bottom (Tagliabue
and Arrigo 2003). Deep ocean water has a longer residence time than shallow water,
meaning that carbon trapped in deep ocean water is effectively removed from
atmospheric circulation for a much longer time than the carbon contained in surface
water.
Figure 3: Map of the southwestern Ross Sea showing the Ross Sea polynya, located north of the Ross Ice Shelf, and the Terra Nova Bay polynya, located on the western continental shelf (Arrigo et al. 2003).
Thus, there is an incentive to understand the ecological processes that control
phytoplankton productivity and community composition (which species dominates)
in the Ross Sea. Fluctuations in phytoplankton populations could potentially affect
CO2 levels in the atmosphere (Carlson et al. 1998), and if we can figure out why
Phaeocystis antarctica is predominant, that would be useful information for
scientists as they entertain the idea of altering phytoplankton populations
around the world to create carbon sinks, providing a temporary solution to our CO2
problem. All these elements initiated the search for the best process explanation
of the phytoplankton dynamics in the Ross Sea. By determining which processes act
upon the system and which entities are most important, scientists will accumulate
knowledge that may prove valuable in the fight against rising CO2 levels.
As mentioned, the tool that I have chosen for model search relies on measurements
and observations of one or more variables of a system to make inferences about
the remaining variables, for which no data are available, and about the processes at
work in the system. In Borrett's study, the only state variables for which
measurements and observations were available were phytoplankton and nitrate.
Ultimately the goal is to select model structures that are good approximations of the
natural system and give good insight into the processes at work in it. However, here
I was faced with an under-constrained optimization problem: there were no data
available for 3 of the state variables. Indeed, one of the big challenges of using
HIPM for this particular ecosystem was that the data used to conduct the search are
very expensive to collect, especially for iron (Fe), which is difficult to measure.
From this arise two questions. Does knowing data for more than one state variable
significantly narrow down the number of possible good-fit models? Will knowledge
about certain variables have better optimization power than knowledge about others?
For example, if we could only afford to collect data for one of the five variables in
the system, would phytoplankton give us better model output (fewer good-fit models) in
HIPM than zooplankton, or would it be detritus?
This is an important question because, as scientists try to advance their knowledge
of the Ross Sea, they need to make educated decisions about what information to
collect in an effort to optimize the use of resources.
This thesis is structured in five parts. First, I describe the method used to
gather the data for my analysis, which includes the HIPM software as well as an
overview of the data sets. I then turn to the quantitative analysis, looking strictly
at the results generated by the HIPM software and discussing what they tell us from
an ecological standpoint. In Section 4, I enter the analytical part of the analysis,
picking and studying some of the best-fit models selected during the quantitative
analysis. I then discuss these analytical results and, in the next section, tie them
back to the biology in an effort to link the qualitative and quantitative research.
Through this analysis we see how we can help HIPM's model selection method as well as
assist scientists in finding the model that most accurately explains the processes
at work in the ecosystem observed.
2 METHOD
The method employed in this paper involves constructing process models from con-
tinuous data. To assist in this task we used a piece of software named HIPM. It
is the output and model selection efficiency of this computer software that we are
investigating. To better understand the task at hand it is important to define what
HIPM does, as well as the steps we are taking to test its efficiency.
2.1 HIPM Description
Ecologists rely heavily on system modeling to build ecological theory and to guide
environmental assessment and management (Borrett et al. 2007). Typically, scientists
build and study a couple of models, basing the model structure on previous research
or on a judgement call about which entities and processes should or should not be
included. One of the aspirations, and problems, of modeling natural systems is to
capture the essence of the system necessary for the model's purpose by figuring out
what can be left out. Deciding which entities and processes should be included, and
what the best mathematical formulation and parameter values are for a given
structure, therefore becomes an essential part of this search. Choosing among the
possible model structures presents an intricate and time-consuming challenge for
ecologists who want to navigate this space (Borrett et al. 2007). In searching
through this space of possible models, we are guided by the claim made by Langley et
al. (1987), which we support, that we must look for models that fit real-life
observations. In summary, we are faced with the problem of constructing models
anchored in domain theory, conducting a time-consuming search and linking the models
to empirical data (Borrett et al. 2007). This is where the HIPM (Hierarchal
Inductive Process Modeling) software comes into play to remedy these issues. This
scientific approach (Langley et al. 2005) assumes the following:
• Given: Time-series data for continuous variables.
• Given: Background knowledge about the entities of the system; in other words
constraints on variables and other parameters driving these entities.
• Given: Background knowledge on the type of processes that may be involved
in driving the ecosystem as well as the constraints that may exist for the said
processes.
Then the task for the software is to perform a search through the structure and
parameter space defined by the process-entity library to find the models that best
fit the data. HIPM operates in four phases.
1. In an exhaustive search, it first finds all the possible instantiations of the
generic processes for all variables. This means that the system finds all the
possible combinations of processes that can affect a given variable (an example is
given in Section 2.1.2). For our purposes we used the exhaustive search option
programmed into the software, but a heuristic search option is also available.
2. The system then assembles the candidate models. In other words, it puts together,
into a generic model, one instantiation of generic processes for each variable
present in the system. It uses the constraints given by the users to determine which
instantiations can be linked together into a generic model; the program goes through
an exhaustive search to find all the possible models. In our study it makes 1120
model structures, due mainly to the large number of different grazing processes that
are potentially present in the ecosystem.
3. It searches for the parameter values for each model using the constraints
defined by the users. To infer these parameters, the system picks a random set of
values that respect the constraints and, using the Levenberg-Marquardt least-squares
method, finds a local optimum. To avoid entrapment in local minima, the system
restarts the parameter estimation from multiple random points, retaining only the
parameters that produce the lowest error. In our experiment we set the number of
restarts to 128. This technique has been found to produce reasonable matches to time
series in multiple systems (Langley et al. 2007).
4. It evaluates the performance of the produced model structures (predicted values)
against the data series (observed values) by calculating the relative mean squared
error (reMSE); models with the lowest reMSE are considered the best-fit models.
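The four phases can be illustrated with a toy sketch. This is not HIPM's code: the two growth and two loss alternatives, the parameter ranges, the forward-Euler simulator and every function name below are invented for illustration, the ranking uses a plain sum of squared errors, and a simple shrinking-step coordinate search stands in for the Levenberg-Marquardt step.

```python
import itertools
import random

# Hypothetical alternatives for two process types acting on one variable x.
GROWTH = {
    "exponential": lambda x, a: a * x,
    "logistic": lambda x, a: a * x * (1.0 - x / 10.0),
}
LOSS = {
    "none": lambda x, b: 0.0,
    "linear": lambda x, b: -b * x,
}
BOUNDS = [(0.0, 2.0), (0.0, 2.0)]  # assumed user constraints on (a, b)


def simulate(growth, loss, params, x0, steps, dt=0.1):
    """Forward-Euler integration of dx/dt = growth(x, a) + loss(x, b)."""
    a, b = params
    xs = [x0]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append(x + dt * (growth(x, a) + loss(x, b)))
    return xs


def sse(predicted, observed):
    """Sum of squared errors between two equally long series."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def local_search(error, start, bounds, iters=100):
    """Coordinate search with a shrinking step (stand-in for Levenberg-Marquardt)."""
    best, best_err, step = list(start), error(start), 0.5
    for _ in range(iters):
        improved = False
        for j, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                cand = list(best)
                cand[j] = min(hi, max(lo, cand[j] + delta))
                err = error(cand)
                if err < best_err:
                    best, best_err, improved = cand, err, True
        if not improved:
            step /= 2.0
    return best, best_err


def fit_with_restarts(error, bounds, restarts, rng):
    """Phase 3: restart from random feasible points, keep the lowest error."""
    fits = []
    for _ in range(restarts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        fits.append(local_search(error, start, bounds))
    return min(fits, key=lambda fit: fit[1])


def rank_structures(observed, restarts=8, seed=0):
    """Phases 1-2 enumerate the structures; phase 4 ranks them by error."""
    rng = random.Random(seed)
    ranked = []
    for (g_name, g_fn), (l_name, l_fn) in itertools.product(GROWTH.items(), LOSS.items()):
        def error(params, g_fn=g_fn, l_fn=l_fn):
            return sse(simulate(g_fn, l_fn, params, observed[0], len(observed)), observed)
        params, err = fit_with_restarts(error, BOUNDS, restarts, rng)
        ranked.append((err, (g_name, l_name), params))
    return sorted(ranked, key=lambda item: item[0])


# Synthetic "observations" generated from one of the candidate structures.
observed = simulate(GROWTH["logistic"], LOSS["linear"], (1.0, 0.3), 0.5, 40)
ranking = rank_structures(observed)
for err, structure, params in ranking:
    print(structure, round(err, 4), [round(p, 3) for p in params])
```

The number of structures is the product of the alternatives per process type (here 2 x 2 = 4), which is why a library with many grazing alternatives quickly produces a large search space.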
2.1.1 Measure of Fit
As mentioned above, HIPM evaluates and selects the best model structure and set
of parameters according to a fitness measure. The system currently uses the sum
of squared errors (SSE) to evaluate fitness (Bridewell et al. 2007), which is defined as
follows:
$$\sum_{i=1}^{n} SSE(x_i, x_i^{obs}) = \sum_{i=1}^{n} \sum_{k=1}^{m} \left(x_{i,k} - x_{i,k}^{obs}\right)^2$$
where $x_1, \ldots, x_n$ are the variables being fitted, with $m$ observed values for
each. To take into account the modeling of variables of varying scale, the system
uses a relative mean squared error that we define in the following way:
$$reMSE = \frac{1}{nm} \sum_{i=1}^{n} \frac{SSE(x_i, x_i^{obs})}{s^2(x_i^{obs})}$$
Here $s^2(x_i^{obs})$ is the sample variance of the observations for $x_i$. Throughout this paper
we will refer to the relative mean squared error as reMSE. The biggest asset of this
rescaling is the ability to compare values across data sets. Typically, a reMSE of
1.0 or above signifies that the model performs poorly; conversely, the lower the
reMSE, the better the fit.
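As a minimal sketch of this measure, the function below computes the reMSE from the definition above. The function names and the dictionary layout (variable name mapped to a list of m values) are assumptions made for illustration, not HIPM's interface, and the numbers are invented.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def sse(predicted, observed):
    """Sum of squared errors for one variable."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def remse(predicted, observed):
    """Relative mean squared error over n variables with m observations each.

    Both arguments map a variable name to a list of m values.
    """
    n = len(observed)
    m = len(next(iter(observed.values())))
    total = 0.0
    for name, obs in observed.items():
        # Dividing by the sample variance puts variables of different scale
        # on a common footing before they are averaged.
        total += sse(predicted[name], obs) / variance(obs)
    return total / (n * m)


# Two invented variables whose scales differ by a factor of ten.
obs = {"P": [1, 2, 3, 4], "N": [10, 20, 30, 40]}
pred = {"P": [1, 2, 3, 5], "N": [10, 20, 30, 50]}
print(remse(pred, obs))
```

In this example the two variables contribute equally to the result even though their raw squared errors differ by a factor of 100, which is the point of the rescaling.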
2.1.2 Entities specification and model library
Each entity of a system is defined by a combination of variables and parameters,
which makes it both an actor and a receiver of action in the model. A distinction is
to be made between generic and instantiated entities. A formal generic entity has a
name and a set of properties, which can include both variables and parameters. In a
given model the parameters of the instantiated entity do not change, whereas the
variables do. Every variable in the entity has a name and a rule that determines how
multiple processes and their subprocesses are combined (e.g. sum, minimum, product).
Every parameter has a name and a range that constrains its possible values.
Instantiated entities, on the other hand, have their variables associated with either
a time series or an initial value, and their parameters assigned real values. A
field is also included to indicate the parent generic entity (Borrett et al. 2007).
One generic entity can be instantiated multiple times; the generic entity can be
thought of as a blueprint for the instantiated entities. For example, in our system
we defined the entity phytoplankton as presented in Table 1. Here the entity's name
is “P”; it contains the variables “conc”, “growth_rate” and “growth_lim”, with the
rules determining how they will be aggregated with other processes. The next part of
the entity definition is the list of parameters of concern for this entity, such as
“max_growth” with possible values in the (0.4, 0.8) range. Following the definition of
the generic entity in Table 1 is an instantiated entity, in which “pe” refers to the
parent generic entity. The variables are then given either the name of a time series
to which the model will be fitted, as for “conc”, where “PHA_c” refers to the
phytoplankton column of the CIAO data set, or an initial value, such as 0 for
“growth_rate”, indicating that this particular state variable will not be fitted to a
time series. The mention “system”, as opposed to “exogenous”, simply states that the
variable depends on the system, as opposed to being independent like solar radiation
or water temperature. The full instantiated entity library can be found in Appendix
B and the generic entity library in Appendix C.
For HIPM to be fully functional there needs to be a library of processes. Processes
are the physical, chemical, or biological actions that drive change in dynamic models.
Just as we distinguished between generic and instantiated entities, we distinguish
between generic and instantiated processes. Every generic process is defined by a
name through which entities can tie into the process, the subprocesses tied to that
process, and one or more equations. The generic process can also include a set of
Boolean conditions that determine whether the process is active, making the process
dynamic by turning it on and off depending on whether the conditions are satisfied
(Borrett et al. 2007). For instance, we could set the photosynthetic process to occur
only if an environmental light variable is greater than zero. Table 2 gives an example
of a generic process. It is named “growth”, and any of the entities “P”, “N”, “D” and
“E” can take a role in it. Next comes the list of subprocesses linked to this
process, each with the entities that can take a role in it, and finally the equation
that defines the process; this equation calls on the “conc” and “growth_rate”
variables that all entities must have. The instantiated process takes on a specific
name and is bound to a specific instantiated entity, one of P, N, D or E. The
instantiated entity takes its role in the equation of the instantiated process. All
the instantiated processes are aggregated according to the rule defined in the
generic entity. It is this organization in terms of entities and processes that
drives inductive process modeling. It makes the construction of systems of equations
easier by building them from fragments.
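The way aggregation rules and Boolean conditions interact can be sketched as follows. This is an illustrative toy, not HIPM's implementation: the dictionary layout, the "light" state entry, the function name and the death equation are assumptions, while the rule names and the growth equation mirror the thesis's entity and process definitions.

```python
import math

# Aggregation rules named in the entity definition: how contributions combine.
COMBINE = {"sum": sum, "prod": math.prod, "min": min}

# Hypothetical instantiated processes acting on the variable P.conc.
processes = [
    {"name": "growth",
     "condition": lambda s: s["light"] > 0.0,            # active only in light
     "equation": lambda s: s["growth_rate"] * s["conc"]},
    {"name": "death",
     "condition": lambda s: True,                        # always active
     "equation": lambda s: -0.025 * s["conc"]},          # assumed loss term
]


def rate_of_change(rule, processes, state):
    """Combine the fragments of all active processes with the variable's rule."""
    contributions = [p["equation"](state) for p in processes if p["condition"](state)]
    return COMBINE[rule](contributions)


day = {"conc": 2.0, "growth_rate": 0.5, "light": 10.0}
night = dict(day, light=0.0)
print(rate_of_change("sum", processes, day))    # growth and death both active
print(rate_of_change("sum", processes, night))  # growth switched off
```

Because each process contributes only a fragment, adding or removing a process changes the resulting equation without rewriting the rest of the model.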
Table 1: An example of a generic entity definition, with its variables and parameters, followed by an example of an instantiated entity, more specifically Phytoplankton (P), in which the variable “conc” is given a time series and the other variables initial values.
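# Generic entity "P": variables with their aggregation rules,
# then parameters with their allowed ranges.
pe = lib.add_generic_entity("P",
    {"conc": "sum",
     "growth_rate": "prod",
     "growth_lim": "min"},
    {"max_growth": (0.4, 0.8),
     "exude_rate": (0.001, 0.2),
     "death_rate": (0.02, 0.04),
     "Ek_max": (1, 100),
     "sinking_rate": (0.0001, 0.25),
     "biomin": (0.02, 0.04),
     "PhotoInhib": (200, 1500)})
# Instantiated entity "phyto" with parent generic entity pe: each variable is
# tied to a time series (e.g. "PHA_c") or given an initial value, and each
# parameter is assigned a value.
p1 = entity_instance(pe, "phyto",
    {"conc": ("system", "PHA_c", (0, 600)),
     "growth_rate": ("system", 0, (0, 1)),
     "growth_lim": ("system", 1, (0, 1))},
    {"max_growth": 0.59,
     "exude_rate": 0.19,
     "death_rate": 0.025,
     "Ek_max": 30,
     "biomin": 0.025,
     "PhotoInhib": 200})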
Table 2: Defining a process - Growth
lib.add_generic_process(
"growth", "",
[("P",[pe],1,1), ("N",[no3,fe],1,100),
("D",[de],1,1), ("E",[ee],1,1)],
[("limited_growth", ["P","N","E"], 0),
("exudation",["P"],1),
("nutrient_uptake",["P","N"],0)],
{},
{},
{"P.conc": "P.growth_rate * P.conc"} );
To sum it up, HIPM’s power resides in its knowledge of the modeled domain as
well as its ability to estimate parameters (Bridewell et al. 2007).
2.2 Experiment Design
Having established how HIPM works, let us consider the problem at hand.
Though in theory HIPM is an extremely powerful tool that permits a search
through a wide structure and parameter space, previous research has demonstrated
that a more thorough investigation of HIPM’s output is necessary to evaluate its
potential and usefulness to biologists. In our example of the Ross Sea ecosystem,
with the process-entity library set up as described, the search space comprises 1120
possible models; each model can take on a wide variety of parameter sets depending
on the constraints given to the software. The phytoplankton dynamics models of the
Ross Sea have five variables: Phytoplankton (P), Zooplankton (Z), Detritus (D),
Nitrate (N) and Iron (F). In previous research, real-life time series for Phytoplankton
and Nitrate were available for this particular ecosystem, and these
data were fed to HIPM. HIPM returned about 200 possible models
with an reMSE less than or equal to 0.2, which from a computer science standpoint
is a good improvement: the search space is reduced from 1120 possible
models to about 200. However, for a biologist that is still a large number of
models approximating the ecosystem studied; going through and testing every
one of these 200 models would be extremely time-consuming. Therefore, we
need to lower the number of candidate models to a point deemed
reasonable and useful to biologists. We assume that increasing the number of
constraints (i.e. adding real-life time series of a variable for which we had no previous
empirical data) would help model discrimination in HIPM. But this would require
the scientist to go into the field and collect a time series for one of
the variables in the system. Since that process is very expensive, can HIPM be used
to make an informed decision about which variable would yield the most
discriminatory power, if there is any difference between variables at all? This is what we are
investigating, and in light of these elements we have formulated two hypotheses:
• Hypothesis 1: Increasing the number of constraints: increasing the number of
time-series for which we have data in HIPM for model selection will induce
better fits. In other words, the increase in number of known time-series of
system variables leads to better model discrimination and therefore better
model selection.
• Hypothesis 2: Variables yield different values of information: some variables
will have more discriminatory power and restrict the best fit models more than
others.
To test our two hypotheses it was imperative to employ a full data set that includes
time series for all variables of the system, so that results can be compared depending
upon whether certain time series are included as constraints for HIPM. Since
no full data set with real-life data was available, we turned to a simulated data set
called the “Coupled Ice and Ocean model” dataset, otherwise referred to as the CIAO
dataset. This dataset is generated from a three-dimensional ecosystem model that
spans the entire water column and multiple stations across the Ross Sea. However,
for our purposes only a portion of this data, the top 5 meters at the Ross Sea Polynya
station 01, is used. The type of information contained in the CIAO dataset is listed
in Table 3.
Table 3: Information included in the CIAO data set. NOTE: A sample of the CIAO 1997 data can be found as Appendix A.
Symbol   Units                     Description
JDAY     Day                       Day of the measurements
TEMP     °C                        Temperature of the water
DPML     m                         Mixed layer depth
AI       -                         Sea ice concentration
NITR     µM                        Nitrate concentration
PHOS     mg Chla/m^3               Phosphate concentration
SILC     µM                        Silicate concentration
IRON     nM or µM                  Iron concentration
PARL     µmol photons m^-2 s^-1    Solar radiation used by organism in photosynthesis
PHA      mg Chla/m^3               Phaeo chlorophyll concentration
DIAT     mg Chla/m^3               Diatom chlorophyll concentration
ZOO      mg C/m^3                  Zooplankton concentration
DET      mg C/m^3                  Detritus concentration
PURL     µmol photons m^-2 s^-1    Photosynthetically usable radiation
In addition to a full data set, it is necessary to have a working library that, as
stated in Section 2.1.2, defines both entities and processes for HIPM. The
process-entity library that we used is available in Appendices B and C; it was previously
put together by Bridewell, Borrett, Langley and Arrigo. All the processes and
subprocesses in which the instantiated entities can take a role in our study are
represented in Figure 2.
Having the background knowledge necessary for HIPM to conduct successful runs,
we designed thirty-one experiments; each experiment represents a possible
combination of time-series constraints that could be entered into the software.
For example, if we had time series for Iron and Nitrate and fed them into
HIPM, they would act as additional constraints in the model selection process. To
be selected, models have to exhibit behavior close to the given time series. All the
experiments are summarized in Table 4.
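The count of thirty-one follows from the five state variables: each experiment supplies time series for one non-empty subset of {P, Z, D, N, F}, and there are 2^5 - 1 = 31 such subsets. The sketch below enumerates them. The function name and the ordering (smaller subsets first, variables in the order P, Z, D, N, F) are assumptions; this ordering happens to reproduce the panel labels visible in the figures, such as 6[P,Z] and 13[D,N], but Table 4 remains the authoritative list.

```python
from itertools import combinations

# State variables in the order they are introduced in the text.
VARIABLES = ["P", "Z", "D", "N", "F"]


def enumerate_experiments(variables):
    """Return every non-empty subset of variables, smallest subsets first."""
    experiments = []
    for size in range(1, len(variables) + 1):
        experiments.extend(combinations(variables, size))
    return experiments


experiments = enumerate_experiments(VARIABLES)

# Print each experiment in the "ID[variables]" style used by the figure panels.
for number, subset in enumerate(experiments, start=1):
    print(f"{number}[{','.join(subset)}]")
```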
3 COMPUTATIONAL RESULTS
The main aim of this paper is to determine how to optimize the use we make of
HIPM to assist scientists in their decision-making process when selecting
a model that most accurately represents an ecosystem. The first need is to narrow
down the number of possible good fit models capable of describing the system. We
did this by feeding additional time series for one or more state variables into HIPM,
thus providing more constraints; did the assumption that this reduces the number
of good fit models hold true? Second, if
adding more constraints to HIPM does reduce that number, do observations for a
specific state variable hold more reducing power than those for the other state variables?
The data collected helped us answer these questions as well as discuss the efficiency
of HIPM in its current state.
There were thirty-one different experiments performed, each returning a measure of
fit value (reMSE) for every one of the 1120 models tested. This
makes for a large amount of data to analyze. To get a better idea of what this data
looks like, the measure of fit values of models that had an reMSE between 0 and
2 were ranked from lowest to highest and graphed (see Figures 4, 5
and 6). We did not look at reMSE values higher than 2.0 since, as stated previously,
models with an reMSE higher than 1.0 are typically classified as poorly performing,
as this indicates a very large difference between observed and expected values.
We estimated that the (0,2) range would be sufficient for our purpose, as it would
encompass most models. Based on these initial results we picked an reMSE
of 0.5 as our good fit model cutoff; any model under that cutoff is considered a good
fit. This cutoff was chosen because many of the graphs seemed to exhibit a
turning point or slight step pattern around this reMSE value, as in
the graphs for experiments 1, 5 or 20.
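The ranking and cutoff procedure described above can be sketched as follows. The reMSE values in the example are hypothetical and are not thesis data; the function names are also assumptions. The sketch treats the plotting range as inclusive and counts a model as a good fit only when its reMSE is strictly under the cutoff, which is one reading of "under that cutoff".

```python
GOOD_FIT_CUTOFF = 0.5    # cutoff chosen in the text
PLOT_RANGE = (0.0, 2.0)  # range of reMSE values that were graphed


def rank_for_plot(remse_values, plot_range=PLOT_RANGE):
    """Sort reMSE values ascending, keeping only those inside the plot range."""
    low, high = plot_range
    return sorted(v for v in remse_values if low <= v <= high)


def count_good_fit(remse_values, cutoff=GOOD_FIT_CUTOFF):
    """Count models whose reMSE falls strictly under the good fit cutoff."""
    return sum(1 for v in remse_values if v < cutoff)


# Hypothetical reMSE values for the models of one experiment.
remse = [0.12, 3.4, 0.48, 0.51, 1.7, 0.95, 0.5, 2.6]

ranked = rank_for_plot(remse)   # values above 2.0 are dropped before plotting
n_good = count_good_fit(remse)  # number of good fit models in this experiment
```

Applied to the 1120 reMSE values of a real experiment, `n_good` would correspond to the "Good Fit Models" count annotated on each figure panel.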
[Figure 4 panels, labeled by experiment ID and constrained variables, with the annotated number of good fit models: 1[P] 197; 2[Z] 101; 3[D] 366; 4[N] 439; 5[F] 509; 6[P,Z] 5; 7[P,D] 61; 8[P,N] 25; 9[P,F] 79; 10[Z,D] 8; 11[Z,N] 1; 12[Z,F] 0. Vertical axis: reMSE from 0.0 to 2.0; horizontal axis: model rank.]
Figure 4: reMSE values ranked from lowest to highest. The value reMSE = 0.5 marks the good fit model cutoff; any model under that value is considered a good fit model. The experimental setup for each run, as well as its ID number, is indicated in the top right corner.
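The good fit counts annotated on the single-variable panels of Figure 4 (experiments 1 to 5) can be compared directly. The sketch below expresses each count as a share of the 1120-model search space and orders the variables from fewest to most retained models. The counts are read from the panel annotations; the function name and the use of "retained share" as a summary are illustrative assumptions, not the thesis's own measure of value of information.

```python
TOTAL_MODELS = 1120  # size of the search space stated in the text

# Good fit counts as annotated on the single-variable panels of Figure 4.
good_fit_counts = {"P": 197, "Z": 101, "D": 366, "N": 439, "F": 509}


def retained_fraction(count, total=TOTAL_MODELS):
    """Share of the search space still classified as a good fit."""
    return count / total


# Fewer retained models means the time series discriminates more strongly.
ranking = sorted(good_fit_counts, key=good_fit_counts.get)

for variable in ranking:
    share = retained_fraction(good_fit_counts[variable])
    print(f"{variable}: {good_fit_counts[variable]} models ({share:.1%})")
```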
[Figure panel residue: 13[D,N], 67 good fit models; vertical axis reMSE from 0.0 to 2.0.]