
HIERARCHAL INDUCTIVE PROCESS MODELING AND ANALYSIS

Youri Noël Nelson

A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment

of the Requirements for the Degree of Master of Science

Department of Mathematics and Statistics

University of North Carolina Wilmington

2011

Approved by

Advisory Committee

Michael Freeze
Xin Lu
Wei Feng, Chair
Stuart Borrett, Co-Chair

Accepted by

Dean, Graduate School

TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
1 INTRODUCTION
2 METHOD
2.1 HIPM Description
2.1.1 Measure of Fit
2.1.2 Entities specification and model library
2.2 Experiment Design
3 COMPUTATIONAL RESULTS
3.1 Increase in number of time-series input
3.2 Value of Information
3.3 Summary
4 ANALYTICAL ANALYSIS
4.1 Most recurrent models
4.2 Preliminaries
4.3 Model A
4.4 Model B
4.5 Model C
4.6 Effects of increasing the number of constraints
5 CONCLUSION
REFERENCES
APPENDIX
A. Sample CIAO data - 1997
B. Full entity specification file
C. Full Ross Sea generic model library
D. Models selected in both Experiments 8 and 19
E. Models selected in both Experiments 8 and 21


ABSTRACT

Understanding phytoplankton dynamics in the Ross Sea polynya may yield knowledge that is useful in addressing the world's rising carbon dioxide levels. Modeling such dynamics is a lengthy and tedious process that can be aided by computational tools such as HIPM. This system relies on knowledge that is already available, in the form of time-series data and a process library, to construct and then evaluate candidate models. In this research, models were ranked by sum of squared error from lowest to highest, the lowest being the best-fit model. Some of the questions that arise from the use of HIPM concern the amount and value of the time series provided to the software, from which we formulated two hypotheses. Will providing more time series improve the output of the system? Will time series for different variables yield output of different quality? Through 31 experiments and mathematical analysis, we began to answer these questions. The computational results showed that our first hypothesis does not always hold, which we believe is due to the way the fit is measured. The mathematical analysis, on the other hand, revealed many variations in the structure of the zooplankton equation across all the experiments, which may indicate that the process library needs to be better defined and that the system needs to take into consideration not only the phytoplankton species Phaeocystis antarctica but also diatoms. This thesis provides the start of an answer to these hypotheses, but further research is still needed.


DEDICATION

This thesis is dedicated to all my friends and family who have supported me in this incredible journey I started five years ago. Most importantly, I want to dedicate it to our Lord and Savior, as I certainly would not be here today without his help, support and comfort.

“I can do anything through God who strengthens me.” (Philippians 4:13)

I also want to dedicate this to my nephew Noah Nelson and my niece Sarah Nelson, for always putting a smile on my face during the tough times, for their unconditional love and for always making me want to persevere. I love you beyond words.

Thank you, Christel & Douglas Nelson, Lara Nelson, Celio & Elise Nelson, Sven Diebold, Andrew & Robin Nelson, Ed & Pat Nelson, Joann Nelson, Philip Varvaris, Luke Brown, Taylor Jackson and Bud Edwards (for always being there at the right place at the right time), and all my other friends and family members who are not named here but are present in my heart, and to whom I am so grateful for all the words of encouragement and support throughout the years.


ACKNOWLEDGMENTS

I would like to thank Dr. Feng, Dr. Borrett, Dr. Simmons, Dr. Freeze and Dr. Lu for all their help and support in this endeavor, as well as my friend Brevin Rock for his advice on completing a master's thesis.


LIST OF TABLES

1 Example of entity definition and instantiation (P)
2 Example of process definition (Growth)
3 Data contained in CIAO set
4 Cutoff Value Chart
5 Model A Parameter Values
6 Model B Parameter Values
7 Model C Parameter Values


LIST OF FIGURES

1 Initial Conceptual Model
2 Tree diagram representing the process library
3 Map of the Ross Sea
4 reMSE summary - Part 1

7 Good-fit models vs. number of input time series
8 Mean Activation Values Graph


LIST OF SYMBOLS

P = Amount of phytoplankton present in the system (mg Chl a/m³)
D = Detritus concentration (mg C/m³)
F = Iron concentration (µM)
Z = Zooplankton concentration (mg C/m³)
N = Nitrate concentration (µM)
E_ice(t) = Sea ice concentration
E_TH2O(t) = Temperature of the water (°C)
E_PUR(t) = Photosynthetically usable radiation (µmol photons m⁻² s⁻¹)
E_TH2O,max = Maximum water temperature
E_TH2O,min = Minimum water temperature
a_i = Optimal parameter values selected by the HIPM software


1 INTRODUCTION

Whether one talks about biology, mathematics, physics, ecology or any other science, all share a common objective: to explain and describe the world that surrounds us. All of these fields build upon the collection of observations to explain recurring phenomena. To explain and depict some of these phenomena, scientists make use of models, which can take a variety of forms, including conceptual, formal, physical and diagrammatic (Haefner, 2005).

Models are widely used in science, and researchers continue to look for tools and techniques that will enhance their ability to construct new models or improve existing ones. The appropriate modeling technique depends on the task. For instance, Haefner (2005) uses a Forrester diagram, a qualitative model formulation, to model a hypothetical agro-ecosystem. In biology, by contrast, predator-prey interactions can be described with differential equation models such as those formulated by Lotka and Volterra (Berryman 1992). Models are useful for studying systems because they let researchers conduct experiments and test theories that would otherwise be unethical or impossible to perform on the real system, and because they enable researchers to predict the behavior of the various components of an ecosystem.
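To make the idea of a differential equation model concrete, here is a minimal sketch that integrates the classic Lotka-Volterra predator-prey equations with a fourth-order Runge-Kutta scheme. The parameter values and initial populations are arbitrary illustrative assumptions, not values from any study cited here.

```python
import math


def lotka_volterra(x, y, alpha, beta, delta, gamma):
    """Return (dx/dt, dy/dt) for prey x and predator y."""
    dx = alpha * x - beta * x * y      # prey growth minus predation losses
    dy = delta * x * y - gamma * y     # predator gain from prey minus death
    return dx, dy


def simulate(x0, y0, params, dt=0.01, steps=2000):
    """Integrate the system with classical fourth-order Runge-Kutta."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        k1 = lotka_volterra(x, y, *params)
        k2 = lotka_volterra(x + dt / 2 * k1[0], y + dt / 2 * k1[1], *params)
        k3 = lotka_volterra(x + dt / 2 * k2[0], y + dt / 2 * k2[1], *params)
        k4 = lotka_volterra(x + dt * k3[0], y + dt * k3[1], *params)
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        xs.append(x)
        ys.append(y)
    return xs, ys


def invariant(x, y, alpha, beta, delta, gamma):
    """Quantity conserved along exact Lotka-Volterra trajectories."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


params = (1.0, 0.1, 0.075, 1.5)  # illustrative alpha, beta, delta, gamma
prey, predators = simulate(10.0, 5.0, params)
print(f"prey range: {min(prey):.2f} to {max(prey):.2f}")
```

Because exact Lotka-Volterra trajectories conserve the quantity computed by `invariant`, checking that it barely drifts over the run is a quick sanity check on the integrator.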

Model construction is a difficult and lengthy endeavor. For a given system there may be many different combinations of processes (e.g., grazing, decay, growth) that could provide a plausible explanation for the behavior being studied, so exploring and evaluating all these possibilities is a tedious task. In the past, limited computational power restricted scientists' ability to investigate more complex models: known or suspected processes were often left out to simplify calculations, and as computational power increased, so did our capacity to evaluate more intricate models (Oreskes 2000). In addition, numerical models of natural systems are non-unique; there are multiple ways to represent the same dynamic. Creating computational tools that quickly and automatically evaluate multiple models therefore seemed a promising way to search the extensive model space. The success of machine learning and data mining in commercial domains led scientists to investigate automated modeling for that particular purpose (Fayyad et al., 1996).

The act of gathering small pieces of information and combining them with prior knowledge to form a complex picture of the object or process studied is called induction. Induction avoids searching the entire space of possible equations by piecing together only meaningful terms; for instance, a predator-prey model will need terms specifying growth and death (Todorovski et al. 2005). Inductive modeling methods (e.g., LAGRAMGE, HIPM, ARIMA, FUSE) use the principles of induction to construct models of the studied system. Methods used for commercial applications, such as the Knowledge Discovery in Databases (KDD) process, were insufficient for scientific purposes because they only described, and did not explain, the observed system behavior (Langley et al. 2006). A simple example is the modeling of water consumption in a city: a water company could easily create a numerical model based on previous years that gives a good estimate of projected water consumption over time, but that model may not explain why consumption fluctuates the way it does. In other words, the commercial methods could produce models that are useful for making accurate predictions about a system, but they were very limited when it came to explaining which processes drive system behavior, and they did not explore the space of alternative models. Thus, induction methods had to be enhanced to automate the task of building and evaluating multiple models (Dzeroski et al. 1995).

In this thesis, I used the hierarchal inductive process modeling technique, which is implemented as a computer program called HIPM (Langley et al. 2006; Bridewell et al. 2005; Dzeroski et al. 1995; Borrett et al. 2007). Inductive process modeling methods such as HIPM (Bridewell et al. 2008; Borrett et al. 2007; Langley et al. 2006; Todorovski et al. 2005) search through two spaces: the first is made up of mathematical formulations and alternative model structures, which consist of entities, processes and the connections binding the two; the second is made up of parameter values (Borrett et al. 2007). The system takes as input (1) a hierarchy of generic processes, a process being an action on the system defined by means of equation fragments together with the rule for combining these fragments with the rest of the equations; (2) a set of entities, an entity being an object that groups the properties of an organism or nutrient by means of variables and parameters; and (3) a set of observed time series for the entities' variables (Todorovski et al. 2005). HIPM performs one of two searches for the model structure: a heuristic search or an exhaustive search. With the exhaustive search option selected, HIPM creates all the model structures possible with the given background knowledge and selects the best set of parameters for each model structure. Finally, the system ranks the models by their sum of squared error (Todorovski et al. 2005).
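The combinatorial nature of the structure space can be sketched in a few lines. The example below is a toy illustration rather than HIPM's actual algorithm: it picks one alternative formulation per state variable, discards combinations that break a user-supplied constraint and keeps the rest. The variable names, process alternatives and the exclusion rule are hypothetical placeholders, not entries of the Ross Sea library.

```python
from itertools import product

# Hypothetical alternative formulations for each state variable.
ALTERNATIVES = {
    "P": ["monod_growth", "light_limited_growth"],
    "Z": ["linear_grazing", "holling_II_grazing", "holling_III_grazing"],
    "D": ["linear_remineralization", "no_remineralization"],
}


def violates_constraint(structure):
    """Hypothetical exclusion rule: light-limited growth may not be
    combined with a nonlinear grazing formulation."""
    return (structure["P"] == "light_limited_growth"
            and structure["Z"] != "linear_grazing")


def enumerate_structures(alternatives):
    """Yield every admissible combination of one alternative per variable."""
    names = sorted(alternatives)
    for choice in product(*(alternatives[n] for n in names)):
        structure = dict(zip(names, choice))
        if not violates_constraint(structure):
            yield structure


structures = list(enumerate_structures(ALTERNATIVES))
print(len(structures), "admissible structures out of", 2 * 3 * 2)
# 8 admissible structures out of 12
```

Without constraints the count is simply the product of the number of alternatives per variable; constraints prune that product, which is one reason the background knowledge given to the system shapes the size of the search.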

This system allows complex system dynamics to be represented by models; in a study of photosynthesis regulation, for example, it generated a model that reproduced both the qualitative shape and the quantitative details of the time-series data while incorporating processes that made biological sense (Langley et al. 2006). In this thesis, I used the HIPM tool combined with the appropriate process library to study the phytoplankton dynamics of the Ross Sea ecosystem. Here the term process library refers to the collection of processes (e.g., grazing, decay, growth) and entities (e.g., phytoplankton, zooplankton, nitrate), together with their relations to one another. Figure 2 represents this library as a tree.

Figure 1: This schematic represents the interaction between the entities and the exogenous variables driving the model. Here, P, Z, D, NO3 and Fe are the state variables. PUR, T and Ice are the exogenous variables acting on the system and influencing the state variables. The arrows represent the effect of one variable on another (Borrett, unpublished research).

Arrigo, Borrett, Bridewell and Langley used HIPM and the Ross Sea process library to create and search a space of 1120 possible model structures to explain the phytoplankton and nitrogen temporal dynamics in the Ross Sea ecosystem; all models contained five state variables: phytoplankton, zooplankton, detritus, nitrogen and iron. Time series for both phytoplankton and nitrogen were available and were given to HIPM along with the process library. Their initial research found that 200 model structures were deemed a good fit, where a good fit was defined as a sum of squared error less than or equal to 0.2. From a computer scientist's standpoint, reducing the search space from 1120 model structures to 200 is a great accomplishment; for a biologist, however, the solution is not specific enough and offers few insights into the ecosystem dynamics. There is a need for ways to constrain the search further, bringing down the number of good-fit models and making the output useful to biologists.

Figure 2: A tree diagram representing the process library constructed for the Ross Sea ecosystem problem. The interaction between processes and entities is defined in the library as explained in Section 2.1.2 (Borrett et al. 2007).

Superficially, HIPM appears related to equation discovery methods, a subfield of machine learning (Langley, 1995; Mitchell, 1997) that investigates collections of measurements and observations, using different computational methods, in search of quantitative laws (Todorovski, 2003). For example, the LAGRAMGE system takes as input background knowledge, encoded as a grammar specifying the space of possible equations, together with a dependent variable, and outputs the best equation for that variable; it can only perform the search for one variable at a time (Dzeroski et al. 1993; Todorovski 2003). Equation discovery is in turn related to the methods used in Ljung's work (1993) on system identification, which is even further removed from inductive process modeling.
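Grammar-based equation discovery can be illustrated with a toy context-free grammar that generates candidate right-hand sides for a single equation, which is the setting in which LAGRAMGE operates. The grammar, its symbols and the budget on rule applications are illustrative assumptions; this does not reproduce LAGRAMGE's grammar format, and the constant c would still need to be fitted to data for each candidate.

```python
# Toy grammar: each nonterminal maps to its productions; any symbol that is
# not a key of GRAMMAR is a terminal. P and N stand for state variables and
# c for a constant to be fitted later.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("c", "*", "V"), ("c", "*", "V", "*", "V")],
    "V": [("P",), ("N",)],
}


def derive(symbols, budget):
    """Return every terminal sentence derivable from `symbols` using at most
    `budget` rule applications, always expanding the leftmost nonterminal."""
    for i, sym in enumerate(symbols):
        if sym in GRAMMAR:
            break
    else:
        return {" ".join(symbols)}  # only terminals left: a complete equation
    if budget == 0:
        return set()
    results = set()
    for production in GRAMMAR[sym]:
        expanded = symbols[:i] + list(production) + symbols[i + 1:]
        results |= derive(expanded, budget - 1)
    return results


for equation in sorted(derive(["E"], 6)):
    print("dP/dt =", equation)
```

Note that commuted duplicates such as `c * P * N` and `c * N * P` are not pruned here; a real system would normally canonicalize candidates before fitting them.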

The main assumption behind system identification is that the model structure is known and that the primary concern is finding adequate parameter values, whereas equation discovery focuses on both the structure and the parameter values (Todorovski et al. 1998). Both approaches produce descriptive models that summarize and predict the data, but they fail to search through the space of alternative explanations: these methods do not take into account models with theoretical variables or consider alternative processes to explain certain dynamics (Bridewell et al. 2005).

The Southern Ocean covers an area equivalent to about 10% of the global ocean and is a key element of the global ocean system, as it links all major ocean basins and facilitates the global distribution of deep water; it is considered to play an important part in the global carbon (C) cycle (Arrigo et al. 2003). The Ross Sea polynya (an area of open water surrounded by sea ice) is one of the most productive ecosystems in the Southern Ocean, as it experiences some of the largest phytoplankton blooms in the region (Arrigo et al. 1994, 1998, 2000, 2003). Phytoplankton productivity is important to the carbon cycle because photosynthesis removes carbon dioxide (CO2) from surface water, and part of the fixed carbon is then exported to deep ocean water. What makes the Ross Sea polynya so interesting for ecologists, compared to other locations such as Terra Nova Bay, is the type of phytoplankton dominating the ecosystem. In the Ross Sea polynya, Phaeocystis antarctica dominates, as opposed to diatoms (species such as Fragilariopsis spp.) in Terra Nova Bay. Phaeocystis antarctica is thought to resist grazing more than other phytoplankton species, which could imply that more carbon is transported from shallow water to depth as uneaten phytoplankton carrying fixed carbon sinks (Tagliabue and Arrigo 2003). Deep ocean water has a longer residence time than shallow water, meaning that carbon trapped in deep ocean water is effectively removed from atmospheric circulation for much longer than the carbon contained in surface water.

Figure 3: Map of the southwestern Ross Sea showing the Ross Sea polynya, located north of the Ross Ice Shelf, and the Terra Nova Bay polynya, located on the western continental shelf (Arrigo et al. 2003).

Thus, there is an incentive to understand the ecological processes that control phytoplankton productivity and community composition (which species dominates) in the Ross Sea. Fluctuations in phytoplankton populations could potentially affect CO2 levels in the atmosphere (Carlson et al. 1998), and knowing why Phaeocystis antarctica is predominant would be useful information for scientists as they entertain the idea of altering phytoplankton populations around the world to create carbon sinks, a temporary solution to our CO2 problem. These elements initiated the search for the best process explanation of the phytoplankton dynamics in the Ross Sea: by determining which processes act upon the system and which entities are most important, scientists will accumulate knowledge that may prove valuable in the fight against rising CO2 levels.

As mentioned, the tool that I have chosen for model search relies on measurements and observations of one or more variables of a system to make inferences about the remaining variables, for which no data are available, and about the processes at work in the system. In Borrett's study, the only state variables for which measurements and observations were available were phytoplankton and nitrate. Ultimately, the goal is to select model structures that are good approximations of the natural system and give good insight into the processes at work. Here, however, I was faced with an underconstrained optimization problem: there were no data available for three of the five state variables. Indeed, one of the big challenges of using HIPM for this particular ecosystem is that the data used to conduct the search are very expensive to collect, and collection becomes especially complicated for iron (Fe), which is difficult to measure. Two questions arise from this last statement. Does knowing data for more than one state variable narrow down the number of good-fit models in a significant manner? Do data for certain variables constrain the search more than data for others? For example, if we could only afford to collect data for one of the five variables in the system, would phytoplankton give us better model output (fewer good-fit models) in HIPM than zooplankton, or would detritus?

This is an important question because, as scientists try to advance their knowledge of the Ross Sea, they need to make educated decisions about what information to collect in order to optimize the use of resources.

This thesis is structured in five parts. First, I described the method used to gather the data used in my analysis, including the HIPM software and an overview of the data sets. I then went into the quantitative analysis, looking strictly at the results generated by the HIPM software and discussing what they tell us from an ecological standpoint. In Section 4, I entered the analytical part of the analysis, picking and studying some of the best-fit models selected during the quantitative analysis. I then discussed these analytical results and, in the next section, tied them back to the biology in an effort to link qualitative and quantitative research. Through this analysis we saw how we can help HIPM's model selection method and assist scientists in finding a model that most accurately explains the processes at work in the observed ecosystem.

2 METHOD

The method employed in this thesis involves constructing process models from continuous data. To assist in this task we used a piece of software named HIPM; it is the output and model selection efficiency of this software that we are investigating. To better understand the task at hand, it is important to define what HIPM does, as well as the steps we are taking to test its efficiency.

2.1 HIPM Description

Ecologists rely heavily on system modeling to build ecological theory and to guide environmental assessment and management (Borrett et al. 2007). Typically, scientists build and study a couple of models, basing the model structure on previous research or on a judgement call about which entities and processes should or should not be included. One of the aspirations, and problems, of modeling natural systems is to capture the essence of the system needed for the model's purpose by figuring out what can be left out; in that regard, deciding which entities and processes should be included, and what the best mathematical formulations and parameter values are for a given structure, becomes an essential part of this search. Choosing among the possible model structures presents an intricate and time-consuming challenge for ecologists who want to navigate this space (Borrett et al. 2007). In searching through this space of possible models, we are guided by the claim made by Langley et al. (1987), which we support, that we must look for models that fit real-life observations. In summary, we are faced with the problem of constructing models anchored in domain theory, conducting a time-consuming search and linking the models to empirical data (Borrett et al. 2007). This is where the HIPM software comes into play to remedy these issues. HIPM stands for Hierarchal Inductive Process Modeling. This scientific approach (Langley et al. 2005) assumes the following:

• Given: Time-series data for continuous variables.

• Given: Background knowledge about the entities of the system; in other words

constraints on variables and other parameters driving these entities.

• Given: Background knowledge on the types of processes that may be involved in driving the ecosystem, as well as the constraints that may exist for these processes.

Then the task for the software is to perform a search through the structure and

parameter space defined by the process-entity library to find the models that best

fit the data. HIPM operates in four phases.

1. In an exhaustive search, it first finds all the possible instantiations of the generic processes for all variables. This means that the system finds all the possible combinations of processes that can affect a given variable (we give an example in Section 2.1.2). For our purposes we used the exhaustive search option programmed into the software, but a heuristic search option is also available.

2. The system then assembles the models. In other words, it combines into a generic model one instantiation of the generic processes for each variable present in the system. It uses the constraints given by the user to determine which instantiations can be linked together into a generic model, and it searches exhaustively to find all the possible models. In our study it produces 1120 model structures, due mainly to the large number of alternative grazing processes that are potentially present in the ecosystem.

3. It searches for the parameter values of each model using the constraints defined by the user. To infer these parameters, the system picks a random set of values that respect the constraints and, using the Levenberg-Marquardt nonlinear least-squares algorithm, finds a local optimum. To avoid entrapment in local minima, the system restarts the parameter estimation from multiple random points, retaining only the parameters that produce the lowest error. In our experiments we set the number of restarts to 128. This technique has been found to produce reasonable matches to time series in multiple systems (Langley et al. 2007).

4. It evaluates the performance of the produced model structures (predicted values) against the data series (observed values) by calculating the relative mean squared error (reMSE); models with the lowest reMSE are considered best-fit models.
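Phases 3 and 4 can be sketched in miniature: fit the parameters of one model structure from several random starting points with a Levenberg-Marquardt loop and keep the run with the lowest error. This is a hand-written illustration under simplifying assumptions, not HIPM's implementation: the toy model y(t) = a·exp(bt), the synthetic data, the parameter bounds, the 16 restarts (our experiments used 128) and the clipping of trial steps to the bounds are all choices made for the example.

```python
import math
import random


def model(params, t):
    """Toy stand-in for one model structure: y(t) = a * exp(b * t)."""
    a, b = params
    return a * math.exp(b * t)


def residuals(params, ts, ys):
    return [model(params, t) - y for t, y in zip(ts, ys)]


def sse(params, ts, ys):
    return sum(r * r for r in residuals(params, ts, ys))


def clip(params, bounds):
    """Project parameters back into their (low, high) constraints."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(params, bounds)]


def levenberg_marquardt(start, ts, ys, bounds, iters=200, lam=1e-3, h=1e-7):
    """Two-parameter Levenberg-Marquardt with a forward-difference Jacobian.
    Trial steps are clipped to the bounds (a simplification)."""
    params = clip(start, bounds)
    cost = sse(params, ts, ys)
    for _ in range(iters):
        r = residuals(params, ts, ys)
        # J[j][i] approximates d r_i / d p_j.
        J = []
        for j in range(2):
            shifted = list(params)
            shifted[j] += h
            J.append([(rs - r0) / h
                      for rs, r0 in zip(residuals(shifted, ts, ys), r)])
        a11 = sum(v * v for v in J[0])
        a22 = sum(v * v for v in J[1])
        a12 = sum(u * v for u, v in zip(J[0], J[1]))
        g1 = sum(v * ri for v, ri in zip(J[0], r))
        g2 = sum(v * ri for v, ri in zip(J[1], r))
        # Solve (J^T J + lam * diag(J^T J)) delta = -J^T r by Cramer's rule.
        m11, m22 = a11 * (1 + lam), a22 * (1 + lam)
        det = m11 * m22 - a12 * a12
        if det <= 0:
            break
        d1 = (-g1 * m22 + a12 * g2) / det
        d2 = (-g2 * m11 + a12 * g1) / det
        trial = clip([params[0] + d1, params[1] + d2], bounds)
        trial_cost = sse(trial, ts, ys)
        if trial_cost < cost:
            params, cost, lam = trial, trial_cost, lam / 10  # accept, damp less
        else:
            lam *= 10                                        # reject, damp more
            if lam > 1e12:
                break
    return params, cost


def fit_with_restarts(ts, ys, bounds, restarts=16, seed=0):
    """Run the local optimizer from random points inside the bounds and keep
    the parameters with the lowest sum of squared errors."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        params, cost = levenberg_marquardt(start, ts, ys, bounds)
        if best is None or cost < best[1]:
            best = (params, cost)
    return best


ts = [0.1 * k for k in range(50)]
ys = [2.0 * math.exp(0.3 * t) for t in ts]  # noise-free synthetic data
bounds = [(0.1, 5.0), (-1.0, 1.0)]
best_params, best_cost = fit_with_restarts(ts, ys, bounds)
print(best_params, best_cost)
```

In a real search this fit would be repeated for every candidate structure, and the structures would then be ranked by their best error.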

2.1.1 Measure of Fit

As mentioned above, HIPM evaluates and selects the best model structure and set of parameters according to a fitness measure. The system currently uses the sum of squared errors (SSE) to evaluate fitness (Bridewell et al. 2007), which is defined as follows:

$$\sum_{i=1}^{n} \mathrm{SSE}(x_i, x_i^{obs}) = \sum_{i=1}^{n} \sum_{k=1}^{m} \left(x_{i,k} - x_{i,k}^{obs}\right)^2$$

where $x_1, \ldots, x_n$ are the variables being fitted, each with $m$ observed values. To account for variables of varying scale, the system uses a relative mean squared error, which we define as follows:

$$\mathrm{reMSE} = \frac{1}{nm} \sum_{i=1}^{n} \frac{\mathrm{SSE}(x_i, x_i^{obs})}{s^2(x_i^{obs})}$$

Here $s^2(x_i^{obs})$ is the sample variance of the observations of $x_i$. Throughout this thesis we refer to the relative mean squared error as reMSE. The main advantage of this rescaling is the ability to compare values across data sets. Typically, an reMSE of 1.0 or above signifies that the model performs poorly, since a model that simply predicted the mean of the observations would score roughly 1.0; conversely, the lower the reMSE, the better the fit.
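The two error measures translate directly into code. The sketch below assumes, as the formula does, that every fitted variable has the same number m of observations, and uses the sample variance from Python's statistics module for the variance term. The model names and data are hypothetical; the 0.2 cut-off mirrors the good-fit threshold used in this thesis.

```python
from statistics import variance


def sse(simulated, observed):
    """Sum of squared errors for one variable over its m observations."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def remse(simulated_by_var, observed_by_var):
    """Relative mean squared error over n variables with m observations each:
    each variable's SSE is divided by the sample variance of its observations,
    the ratios are summed, and the total is divided by n * m."""
    n = len(observed_by_var)
    m = len(next(iter(observed_by_var.values())))  # assumes equal m for all
    total = sum(sse(simulated_by_var[v], obs) / variance(obs)
                for v, obs in observed_by_var.items())
    return total / (n * m)


observed = {"P": [1.0, 2.0, 3.0, 4.0]}
perfect = {"P": [1.0, 2.0, 3.0, 4.0]}
flat = {"P": [2.5, 2.5, 2.5, 2.5]}  # always predicts the observed mean

# Hypothetical ranking step: keep models at or below the 0.2 threshold.
scores = {"model_a": remse(perfect, observed), "model_b": remse(flat, observed)}
good_fit = sorted(name for name, score in scores.items() if score <= 0.2)
print(round(scores["model_b"], 6), good_fit)
```

The flat model illustrates the rule of thumb above: predicting the observed mean gives (m - 1)/m, here 0.75, which approaches 1.0 as m grows.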

2.1.2 Entities specification and model library

Each entity of a system is defined by a combination of variables and parameters, which makes it both an actor and a receiver of action in the model. A distinction is made between generic entities and instantiated entities. A generic entity has a name and a set of properties, which can include both variables and parameters. In a given model the parameters of the instantiated entity do not change, whereas the variables do. Every variable in the entity has a name and a rule that determines how the contributions of multiple processes and their subprocesses are combined (e.g., sum, minimum, product). Every parameter has a name and a range that constrains its possible values. Instantiated entities, on the other hand, have their variables either associated with time series or given initial values, and their parameters are assigned real values. A field is also included to indicate the parent generic entity (Borrett et al. 2007). A given generic entity can be instantiated multiple times; the generic entity can be thought of as a blueprint for the instantiated entities. For example, in our system we defined the entity phytoplankton as presented in Table 1. Here the entity's name is “P”; it contains the variables “conc”, “growth_rate” and “growth_lim”, with the rules determining how contributions to them are aggregated across processes. The next part of the entity definition is the list of parameters relevant to this entity, such as “max_growth”, with possible values in the (0.4, 0.8) range. Following the definition of the generic entity in Table 1 is an instantiated entity, “phyto”, whose first argument, pe, refers to the parent generic entity. Its variables are given either the name of a time series to which the model will be fitted, as for “conc”, where “PHA_c” refers to the phytoplankton column of the CIAO data set, or an initial value, such as 0 for “growth_rate”, indicating that this variable will not be fitted to a time series. The label “system”, as opposed to “exogenous”, indicates that the variable is determined by the system, unlike exogenous variables such as solar radiation or water temperature, which are independent of it. The full instantiated entity library can be found in Appendix B and the generic entity library in Appendix C.
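The blueprint relationship between generic and instantiated entities can be sketched with two small data classes. These classes are hypothetical and are not HIPM's API, which Table 1 shows; they only illustrate the rules described above: an instance may only use variables declared by its parent, each parameter value must fall inside the parent's range, and a variable is fitted when it is tied to a time-series name rather than an initial value. The ranges reuse two entries of Table 1.

```python
from dataclasses import dataclass, field


@dataclass
class GenericEntity:
    """Hypothetical blueprint: aggregation rule per variable and a
    (low, high) range per parameter."""
    name: str
    variables: dict   # variable name -> rule ("sum", "prod", "min")
    parameters: dict  # parameter name -> (low, high)


@dataclass
class EntityInstance:
    """Hypothetical instance: each variable gets a time-series column name
    (fitted) or an initial value, and each parameter gets a fixed value."""
    parent: GenericEntity
    name: str
    variables: dict
    parameters: dict = field(default_factory=dict)

    def __post_init__(self):
        for var in self.variables:
            if var not in self.parent.variables:
                raise ValueError(f"{var!r} is not a variable of {self.parent.name}")
        for par, value in self.parameters.items():
            if par not in self.parent.parameters:
                raise ValueError(f"{par!r} is not a parameter of {self.parent.name}")
            low, high = self.parent.parameters[par]
            if not low <= value <= high:
                raise ValueError(f"{par}={value} outside ({low}, {high})")

    def fitted_variables(self):
        """Variables tied to an observed time series (given as a column name)."""
        return [v for v, spec in self.variables.items() if isinstance(spec, str)]


P = GenericEntity("P", {"conc": "sum", "growth_rate": "prod", "growth_lim": "min"},
                  {"max_growth": (0.4, 0.8), "death_rate": (0.02, 0.04)})
phyto = EntityInstance(P, "phyto",
                       {"conc": "PHA_c", "growth_rate": 0, "growth_lim": 1},
                       {"max_growth": 0.59, "death_rate": 0.025})
print(phyto.fitted_variables())  # ['conc']
```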

For HIPM to be fully functional there needs to be a library of processes. Processes are the physical, chemical or biological actions that drive change in dynamic models. Just as we distinguished between generic and instantiated entities, we distinguish between generic and instantiated processes. Each generic process is defined by a name, the entity roles through which entities tie into the process, the subprocesses attached to it, and one or more equations. A generic process can also include a set of Boolean conditions that determine whether the process is active, making the process dynamic by turning it on and off depending on whether the conditions are satisfied (Borrett et al. 2007). For instance, we could set the photosynthetic process to occur only when the environmental light variable is greater than zero. Table 2 gives an example of a generic process. It is named “growth”, and the entities “P”, “N”, “D” and “E” can take a role in it; next comes the list of subprocesses linked to this process, with the entities that can take a role in each; finally comes the equation that defines the process, which refers to the “conc” and “growth_rate” variables that the entity in role P must therefore have. An instantiated process takes on a specific name and is bound to specific instantiated entities filling the roles P, N, D and E, which then take their places in the equation of the instantiated process. All the instantiated processes are aggregated according to the rules defined in the generic entities. It is this organization in terms of entities and processes that drives inductive process modeling: it makes it easier to construct systems of equations by building them from fragments.
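The aggregation of process fragments can be sketched as follows. Everything here is a hypothetical illustration, not HIPM code: the process list, the saturating nutrient term, the light term and the constants (half-saturation 1.0, light scale 30, maximum growth 0.59 and death rate 0.025, loosely echoing Table 1) are assumptions. Each process contributes a fragment to a target variable only when its Boolean condition holds, and the contributions are combined with the variable's rule: minimum for growth_lim, product for growth_rate and sum for conc.

```python
import math
import operator
from functools import reduce

# Aggregation rules named in the generic entity definition (Table 1).
RULES = {
    "sum": sum,
    "prod": lambda xs: reduce(operator.mul, xs, 1.0),
    "min": min,
}

# Hypothetical instantiated processes: (target variable, condition, fragment).
# The condition is a Boolean test on the state; the fragment returns the
# process's contribution to the target variable.
PROCESSES = [
    ("growth_lim", lambda s: True,
     lambda s: s["N"] / (s["N"] + 1.0)),             # nutrient limitation
    ("growth_lim", lambda s: True,
     lambda s: 1.0 - math.exp(-s["light"] / 30.0)),  # light limitation
    ("growth_rate", lambda s: True, lambda s: 0.59),  # maximum growth rate
    ("growth_rate", lambda s: True, lambda s: s["growth_lim"]),
    ("conc", lambda s: s["light"] > 0,               # growth needs light
     lambda s: s["growth_rate"] * s["conc"]),
    ("conc", lambda s: True, lambda s: -0.025 * s["conc"]),  # death
]


def aggregate(variable, rule, state):
    """Combine the contributions of all active processes targeting `variable`."""
    parts = [fragment(state) for target, active, fragment in PROCESSES
             if target == variable and active(state)]
    return RULES[rule](parts)


def dP_dt(conc, N, light):
    """Evaluate the assembled right-hand side for phytoplankton concentration."""
    state = {"conc": conc, "N": N, "light": light}
    # Auxiliary variables are evaluated first, in dependency order.
    state["growth_lim"] = aggregate("growth_lim", "min", state)
    state["growth_rate"] = aggregate("growth_rate", "prod", state)
    return aggregate("conc", "sum", state)


print(dP_dt(conc=10.0, N=4.0, light=0.0))   # growth inactive: only death acts
print(dP_dt(conc=10.0, N=4.0, light=30.0))  # growth and death both active
```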

Table 1: This table first gives an example of a generic entity definition with its variables and parameters, followed by an example of an instantiated entity, more specifically phytoplankton (P), in which the variable “conc” is given a time series and the other variables are given initial values.

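# Generic entity "P" (phytoplankton).
pe = lib.add_generic_entity("P",
    # Variables and the rule used to aggregate process contributions to each.
    {"conc": "sum",
     "growth_rate": "prod",
     "growth_lim": "min"},
    # Parameters and their admissible (min, max) ranges.
    {"max_growth": (0.4, 0.8),
     "exude_rate": (0.001, 0.2),
     "death_rate": (0.02, 0.04),
     "Ek_max": (1, 100),
     "sinking_rate": (0.0001, 0.25),
     "biomin": (0.02, 0.04),
     "PhotoInhib": (200, 1500)})

# Instantiated entity "phyto" whose parent is the generic entity pe.
# Each variable gets ("system", time-series column or initial value, range);
# "conc" is fitted to the PHA_c column of the CIAO data set.
p1 = entity_instance(pe, "phyto",
    {"conc": ("system", "PHA_c", (0, 600)),
     "growth_rate": ("system", 0, (0, 1)),
     "growth_lim": ("system", 1, (0, 1))},
    # Fixed parameter values for this instance.
    {"max_growth": 0.59,
     "exude_rate": 0.19,
     "death_rate": 0.025,
     "Ek_max": 30,
     "biomin": 0.025,
     "PhotoInhib": 200})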

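Table 2: Defining a process: growth

```python
lib.add_generic_process(
    "growth", "",
    # roles: the entities that can take part in the process
    [("P", [pe], 1, 1), ("N", [no3, fe], 1, 100),
     ("D", [de], 1, 1), ("E", [ee], 1, 1)],
    # subprocesses linked to this process and the roles they use
    [("limited_growth", ["P", "N", "E"], 0),
     ("exudation", ["P"], 1),
     ("nutrient_uptake", ["P", "N"], 0)],
    {},
    {},
    # equation defining the process, in terms of P's conc and growth_rate
    {"P.conc": "P.growth_rate * P.conc"})
```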

To sum up, HIPM’s power resides in its knowledge of the modeled domain as well as in its ability to estimate parameters (Bridewell et al. 2007).
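To make the parameter-estimation step concrete, here is a minimal sketch that fits a single rate to a time series. It makes three assumptions. First, the Table 2 equation is read as a rate of change, dP/dt = growth_rate · P. Second, growth_rate is treated as a constant parameter searched within bounds. The bounds (0.4, 0.8) mirror the max_growth range of Table 1. Third, the optimizer is a plain grid search chosen for transparency; it is not HIPM's own estimation routine. The functions `simulate` and `fit_growth_rate` are hypothetical, and the "observations" are synthetic, not CIAO data.

```python
from typing import List, Tuple


def simulate(growth_rate: float, p0: float, dt: float, n_steps: int) -> List[float]:
    """Forward-Euler integration of dP/dt = growth_rate * P, starting from P(0) = p0."""
    conc = [p0]
    for _ in range(n_steps):
        conc.append(conc[-1] + dt * growth_rate * conc[-1])
    return conc


def sum_squared_error(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_growth_rate(observed: List[float], p0: float, dt: float,
                    bounds: Tuple[float, float], n_grid: int = 401) -> float:
    """Grid search over the allowed range for the rate with the smallest squared error."""
    lo, hi = bounds
    candidates = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    n_steps = len(observed) - 1
    return min(candidates,
               key=lambda r: sum_squared_error(simulate(r, p0, dt, n_steps), observed))


# Synthetic observations generated with a known rate of 0.59 (not CIAO data).
observed = simulate(0.59, 1.0, 0.1, 100)
best = fit_growth_rate(observed, 1.0, 0.1, bounds=(0.4, 0.8))
print(f"estimated growth rate: {best:.3f}")
```

Because the synthetic series was generated with a rate inside the allowed range, the search recovers that rate. With real data the fitted value would only approximate the unknown true rate.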

2.2 Experiment Design

Having established how HIPM works, let us now consider the problem at hand. In theory, HIPM is an extremely powerful tool that permits a search through a wide structure and parameter space. However, previous research has demonstrated that a more thorough investigation of HIPM’s output is necessary to evaluate its potential and usefulness to biologists. In our example of the Ross Sea ecosystem, with the process-entity library set up as described, the search space contains 1120 possible models. Each model can take on a wide variety of parameter sets depending on the constraints given to the software. The phytoplankton dynamics models of the Ross Sea have five variables: Phytoplankton (P), Zooplankton (Z), Detritus (D), Nitrate (N) and Iron (F).

In previous research, real-life time series of Phytoplankton and Nitrate were available for this particular ecosystem, and these data were fed to HIPM. HIPM then returned about 200 possible models with an reMSE less than or equal to 0.2. From a computer science standpoint this is a good improvement, since the search space is reduced from 1120 possible models to about 200. However, for a biologist this is still a large number of candidate models for the ecosystem studied, and going through and testing every one of these 200 models would be extremely time-consuming. We therefore need to lower the number of possible models to a point deemed reasonable and useful by biologists.

Logically, we assume that increasing the number of constraints would help model discrimination in HIPM, for example by adding real-life time series of a variable for which we previously had no empirical data. But this would require scientists to go into the field and collect time series for one of the variables in the system, which is very expensive. Can HIPM therefore be used to make an informed decision about which variable would yield the most discriminatory power, if there is a difference between variables at all? This is what we investigate. In light of these elements, we formulated two hypotheses:

• Hypothesis 1 (increasing the number of constraints): increasing the number of time series supplied to HIPM for model selection will induce better fits. In other words, increasing the number of known time series of system variables leads to better model discrimination and therefore better model selection.

• Hypothesis 2 (variables yield different values of information): some variables have more discriminatory power than others and restrict the set of best fit models more.

To test our two hypotheses, we needed a full data set that included time series for all variables of the system. This allowed us to compare the results depending on whether certain time series are included as constraints for HIPM or not. Since no full data set of real-life data was available, we turned to a simulated data set from the “Coupled Ice and Ocean” model, referred to as the CIAO datasets. These data are generated by a three-dimensional ecosystem model that spans the entire water column and multiple stations across the Ross Sea. For our purposes, however, only a portion of these data is used: the top 5 meters at Ross Sea Polynya station 01. The type of information contained in the CIAO dataset is listed in Table 3.

Table 3: Information included in the CIAO data set. NOTE: A sample of the CIAO 1997 data can be found in Appendix A.

| Symbol | Units | Description |
|---|---|---|
| JDAY | Day | Day of the measurements |
| TEMP | °C | Temperature of the water |
| DPML | m | Mixed layer depth |
| AI | | Sea ice concentration |
| NITR | µM | Nitrate concentration |
| PHOS | mg Chla/m³ | Phosphate concentration |
| SILC | µM | Silicate concentration |
| IRON | nM or µM | Iron concentration |
| PARL | µmol photons m⁻² s⁻¹ | Solar radiation used by organisms in photosynthesis |
| PHA | mg Chla/m³ | Phaeo chlorophyll concentration |
| DIAT | mg Chla/m³ | Diatom chlorophyll concentration |
| ZOO | mg C/m³ | Zooplankton concentration |
| DET | mg C/m³ | Detritus concentration |
| PURL | µmol photons m⁻² s⁻¹ | Photosynthetically usable radiation |

In addition to a full data set, HIPM needs a working library that, as stated in Section 2.1.2, defines both the entities and the processes. The process-entity library that we used is available in Appendices B and C; it was previously put together by Bridewell, Borrett, Langley and Arrigo. Figure 2 shows all the processes and subprocesses in which the instantiated entities can take a role in our study.


With the background knowledge necessary for HIPM to conduct successful runs in place, we designed thirty-one experiments. Each experiment represents one possible combination of time-series constraints that could be entered into the software; the thirty-one experiments correspond to the 2⁵ − 1 = 31 non-empty subsets of the five state variables. For example, if we had time series for Iron and Nitrate and fed this information into HIPM, they would act as additional constraints in the model selection process: to be selected, models have to exhibit behavior close to the given time series. All the experiments are summarized in Table 4.
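The thirty-one experiments can be enumerated mechanically as every non-empty subset of the five state variables, taken from the smallest subsets to the largest. The sketch below assumes the variable order P, Z, D, N, F. With that order, it reproduces the experiment IDs visible in the reMSE plots, from 1[P] through 13[D,N]. Table 4 remains the authoritative list; the helper `constraint_sets` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

STATE_VARIABLES = ["P", "Z", "D", "N", "F"]


def constraint_sets(variables: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets of the variables, smallest subsets first."""
    sets: List[Tuple[str, ...]] = []
    for size in range(1, len(variables) + 1):
        sets.extend(combinations(variables, size))
    return sets


experiments: Dict[int, Tuple[str, ...]] = {
    exp_id: combo
    for exp_id, combo in enumerate(constraint_sets(STATE_VARIABLES), start=1)
}
for exp_id, combo in experiments.items():
    print(f"{exp_id}[{','.join(combo)}]")
```

`itertools.combinations` emits subsets in the order of its input, which is why the variable order determines the numbering.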


3 COMPUTATIONAL RESULTS

The main goal of this thesis is to determine how to make the best use of HIPM to assist scientists in their decision-making process when selecting the model that most accurately represents an ecosystem. The first need is to narrow down the number of good fit models capable of describing the system. We attempted this by feeding additional time series of state variables into HIPM, thus providing more constraints; did this assumption hold true? Secondly, if adding more constraints to HIPM does reduce that number, do observations of a specific state variable have more reducing power than those of the other state variables? The data collected helped us answer these questions and discuss the efficiency of HIPM in its current state.

Thirty-one different experiments were performed, each returning a measure of fit value (reMSE) for every one of the 1120 models tested. This makes for a large amount of data to analyze. To get a better idea of what these data look like, the models with an reMSE between 0 and 2 were ranked from lowest to highest value and graphed (see Figures 4, 5 and 6). We did not look at reMSE values higher than 2.0. As stated previously, models with an reMSE higher than 1.0 are typically classified as poorly performing, because such values indicate a very large difference between observed and predicted values. We estimated that the (0, 2) range would be sufficient for our purpose, as it would encompass most models.

Based on these initial results, we chose an reMSE of 0.5 as our good fit cutoff; any model below that cutoff is considered a good fit. We chose this cutoff because the graphs seemed to exhibit a turning point or slight step pattern around this reMSE value, as portrayed in the graphs for experiments 1, 5 and 20.
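The ranking and cutoff procedure above can be sketched as follows. The sketch assumes that reMSE is the mean squared error divided by the variance of the observed series; Section 2.1.1 holds the authoritative definition. Under this assumption, a model that always predicts the observed mean scores exactly 1.0, which is consistent with treating reMSE above 1.0 as poor. The functions `remse` and `rank_and_filter` and the model scores are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence, Tuple


def remse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Assumed reMSE: mean squared error divided by the variance of the observations."""
    mse = fmean((o - s) ** 2 for o, s in zip(observed, simulated))
    return mse / pvariance(observed)


def rank_and_filter(scores: Dict[str, float], cutoff: float = 0.5,
                    plot_max: float = 2.0) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Rank models from lowest to highest reMSE; return the plotted ones and the good fits."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    plotted = [(model, score) for model, score in ranked if score <= plot_max]
    good_fit = [model for model, score in ranked if score < cutoff]
    return plotted, good_fit


observed = [1.0, 2.0, 3.0, 4.0]
print(remse(observed, observed))       # perfect fit
print(remse(observed, [2.5] * 4))      # always predicting the observed mean

# Hypothetical reMSE scores for four candidate models.
scores = {"model_a": 0.12, "model_b": 0.48, "model_c": 0.73, "model_d": 3.1}
plotted, good_fit = rank_and_filter(scores)
print(len(plotted), good_fit)
```

Only models below the 0.5 cutoff count as good fits. Models above 2.0 are ranked but left out of the plotted range.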

[Figure 4 plot panels: reMSE values (y-axis, 0.0 to 2.0) of the models ranked from lowest to highest (x-axis, model rank). Panel titles and good fit counts: 1[P], 197 Good Fit Models; 2[Z], 101; 3[D], 366; 4[N], 439; 5[F], 509; 6[P,Z], 5; 7[P,D], 61; 8[P,N], 25; 9[P,F], 79; 10[Z,D], 8; 11[Z,N], 1; 12[Z,F], 0.]

Figure 4: reMSE values ranked from lowest to highest. The value reMSE = 0.5 marks the good fit model cutoff; any model below that value is considered a good fit model. The experimental setup for each run, as well as its ID number, is indicated in the top right corner of each panel.
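As a quick descriptive check of the two hypotheses, the good fit counts printed on the twelve panels of Figure 4 can be tabulated. For each two-variable experiment, the tally compares its count with the counts of its two single-variable counterparts. This is only a tally of the panel labels, not the analysis of Sections 3.1 and 3.2. The dictionary and the helper `pair_reductions` are illustrative names.

```python
from typing import Dict, Tuple

# Good fit model counts (reMSE < 0.5) as labelled on the panels of Figure 4.
GOOD_FIT_COUNTS: Dict[Tuple[str, ...], int] = {
    ("P",): 197, ("Z",): 101, ("D",): 366, ("N",): 439, ("F",): 509,
    ("P", "Z"): 5, ("P", "D"): 61, ("P", "N"): 25, ("P", "F"): 79,
    ("Z", "D"): 8, ("Z", "N"): 1, ("Z", "F"): 0,
}


def pair_reductions(counts: Dict[Tuple[str, ...], int]) -> Dict[Tuple[str, ...], Tuple[int, int, bool]]:
    """For each two-variable run: (its count, best single-variable count, whether it is lower)."""
    result = {}
    for combo, count in counts.items():
        if len(combo) == 2:
            best_single = min(counts[(variable,)] for variable in combo)
            result[combo] = (count, best_single, count < best_single)
    return result


singles = [combo for combo in GOOD_FIT_COUNTS if len(combo) == 1]
most_restrictive_single = min(singles, key=lambda combo: GOOD_FIT_COUNTS[combo])

for combo, (count, best_single, reduced) in pair_reductions(GOOD_FIT_COUNTS).items():
    print(combo, count, best_single, reduced)
print("most restrictive single constraint:", most_restrictive_single)
```

For Hypothesis 1, the last column shows whether adding a second series shrank the good fit set below the better of the two single-series runs. For Hypothesis 2, the single-variable counts can be compared directly.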


[Plot panel continuing the reMSE figures: 13[D,N], 67 Good Fit Models (reMSE axis 0.0 to 2.0).]
