chapter 1: mathematical models and data...

24
9/2/2011 1 Chapter 1: Mathematical Models and Data Analysis 1.1 Introduction 1.2 Mathematical models 1.3 Types of problems in mathematical modeling 1.4 What is data analysis? 1.5 Types of uncertainty in data 1.6 T ypes of applied data analysis and modeling methods 1.7 Example of a data collection and analysis system 1.8 Decision analysis and data mining 1.9 Structure of book 1 Chap 1Data AnalysisReddy Energy Use-Background Energy use is central to the functioning and development of societies societies Large amounts of energy are used in the world (and the United States in particular) Even small annual increases in energy use has drastic results- exponential growth ( rates in developing countries are relatively large) We are close to running out of conventional energy resources (currently supplies 85% of needs) Energy use is causing other societal problems Chap 1Data AnalysisReddy 2

Upload: others

Post on 31-Aug-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

1

Chapter 1: Mathematical Models and Data Analysis

1.1 Introduction

1.2 Mathematical models

1.3 Types of problems in mathematical modeling

1.4 What is data analysis?

1.5 Types of uncertainty in data

1.6 Types of applied data analysis and modeling methodsyp pp y g

1.7 Example of a data collection and analysis system

1.8 Decision analysis and data mining

1.9 Structure of book

1Chap 1‐Data Analysis‐Reddy

Energy Use-Background

• Energy use is central to the functioning and development of societiessocieties

• Large amounts of energy are used in the world (and the United States in particular)

• Even small annual increases in energy use has drastic results-exponential growth

( rates in developing countries are relatively large)

• We are close to running out of conventional energy resources (currently supplies 85% of needs)

• Energy use is causing other societal problems

Chap 1‐Data Analysis‐Reddy 2

Page 2: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

2

World primary energy use has increased exponentially

Worldwide energy use: 500 ExaJoules (16 TW)Annual growth is about 2% : doubling time ~ 35 years

Chap 1‐Data Analysis‐Reddy 3

Population growth from 1950: About 220% Energy growth: About 420%

From Boyle et al., 2003

Carbon Wedges

Chap 1‐Data Analysis‐Reddy 4

Source: Socolow & Pacala,

“A plan to carbon in check”

Scientific American, Sept. 2006

Page 3: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

3

U.S. Primary Energy Consumption by Source and Sector, 2008(Quadrillion Btu)

Chap 1‐Data Analysis‐Reddy 5

Source: Energy Information Administration, Annual Energy Review 2008, Tables 1.3, 2.1b‐2.1f , 10.3, and 10.4

U.S. Building Energy Scene Buildings consume over 40% of the total energy Also about 70% of electricity produced in US Responsible for 49% SOx and 35% of CO2p $785 billion (6.1% of GDP)- new construction $483 billion (3.3% of GDP)- retrofits and repairs 2003 CBECS study- over 85% of building stock build > 1990

About 2% undergo major retrofits each year

Opportunities in Commercial Building energy saving

6

Opportunities in Commercial Building energy saving(OTA,1998):-Current cost-effective technology ~ $20 B/yr (25% energy savings)- Advanced/Intelligent technology ~ $50 B/yr (over 50%)

Chap 1‐Data Analysis‐Reddy

Page 4: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

4

A typical building undergoes various phases during its life span in terms of its energy use:

(a) Design phase (typical duration is 1-2 years) includes designing for energy efficiency (i) conceptual or schematic design, where number of design alternatives involving different types of feasible systems and their associated sub-systems(ii) detailed design and construction document preparation of the final design selected.

(b) Construction (typical duration is 1-2 years) which includes:(i) civil construction(ii) installation of the various building systems,(iii) start-up commissioning to assure that various systems are installed properly andare operating as intended.

(c) Operation and maintenance (O&M) over its active life (typically 50-60 years for the shell, and 20-25 years for the MEP systems):

(i) di (ii) O&M

Chap 1‐Data Analysis‐Reddy 7

(i) audits (ii) O&M assessment(iii) commissioning (iv) maintenance(v) condition monitoring, (vi) fault detection and diagnostics(vii) active load control (viii) supervisory control(ix) monitoring and verification of energy conservation measures (ECM)

Often 10 timesmore than theembodied orInitial energy

Energy Conservation• In most cases energy conservation measures are more cost

effective than supply side measures (such as renewables or new conventional power)new conventional power)

• Over next 20 years, expected energy efficiency gains of:

- 25-30% in most industrialized countries

- over 40% in developing countries

• MAJOR driver of sustainable development

• Energy management allows one to better use existing power resources and defer new generation capacity

Along with low energy design, important need to operate existing buildings efficiently- need for different skill sets

Chap 1‐Data Analysis‐Reddy 8

Page 5: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

5

Data analysis and modeling methods

Used when performance data of the system is available for:for:

- predicting the behavior of the system under different operating conditions,

- identifying energy conservation opportunities,

- verifying the effect of energy conservation measures and commissioning practices once implemented (M&V),

- verifying that the system is performing as intended (called condition monitoring)

- Operating the various sub-systems optimally (supervisory control)

Chap 1‐Data Analysis‐Reddy 9

1.2 Mathematical Models

1.2.1 One way of classifying data

Categorical (nominal qualitative): non numerical quality ex- Categorical (nominal, qualitative): non-numerical quality, ex. Male/female, pass/fail

- Ordinal: has order or rank, ex. hot/mild/cold day

- Count: number of items in certain classes

- Metric: measurements of time, weight, height,…

Chap 1‐Data Analysis‐Reddy 10

Dimensionality of data: univariate, bi-variate, multi-variatey=f(x1) y=f(x1, x2)

Page 6: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

6

Discrete and Continuous Data

Fig.1.1 Fig. 1.2Philadelphia, PA

10

n=60

1 2 3 4 5 6

Num

ber

0 20 40 60 80 1000

40

80

120

160

200

240

Chap 1‐Data Analysis‐Reddy 11

1 2 3 4 5 6

Dry bulb temperature0 20 40 60 80 100

Rolling of a dice

(discrete data can only take on finite or countable number of values )categorical, ordinal, count) 

(data can take on any continuous value,discretization is due to instrument)

Source of data:

‐ Population

‐ Sample

‐ single sample: single or successive readings taken at different times but under identical conditions

‐multi‐sample: repeated measurements 

of same quantity by different 

instruments or observers

“Round‐robin” experiment of testing the same solar thermal collector in six different test facilities 

Fig. 1.3

‐ Two stage: staged experiments, second reading taken after first reading is analyzed

Chap 1‐Data Analysis‐Reddy 12

(shown by different symbols) following the same testing methodology

Page 7: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

7

1.2.2 What is a system model?

A system is the object under study which could be as simple or as complex as one may wish to consider. It is any ordered,as complex as one may wish to consider. It is any ordered, inter‐related set of things, and their attributes. 

A model is a construct which allows one to represent the real‐life system so that it can be used to predict the future behavior of the system under various “what‐if” scenarios. 

Chap 1‐Data Analysis‐Reddy 13

1.2.3 Types of Models(i) intuitive models (or qualitative or descriptive models) - non-analytical because only

general qualitative trends of the system are known. Such a model which relies on quantitative or ordinal data is an aid to thought or to communication. Ex: Sociological or anthropological behavior

(ii) empirical models -use metric or count data are those where the properties of the system can be summarized in a graph, a table or a curve fit to observation points. Such models presume some knowledge of the fundamental quantitative trends but lack accurate understanding. Econometric models are typical examples; and

(iii) mechanistic models (or structural models) which use metric or count data are based on mathematical relationships used to describe physical laws such as Newton’s laws, the laws of thermodynamics, etc... Such models can be used for prediction (system design) or for proper system operation and control (data analysis).

Further such models can be separated into two sub-groups: • exact structural model where the model equation is thought to apply rigorously,• inexact structural models where the model equation applies only approximately, either

because the process is not fully known or because one chose to simplify the exact model so as to make it more usable.

Chap 1‐Data Analysis‐Reddy 14

Page 8: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

8

System model

A system model is a description of the system. Empirical and mechanistic models are made up of three components:

1x 1yf(x)

p p

(i) input variables (also referred to as regressor, forcing, exciting, exogenous or independent variables which act on the system:

- controllable by the experimenter, - uncontrollable or extraneous such as climatic variables;

(ii) system structure and parameters/properties provide the necessary physical description of the systems in terms of physical and materialphysical description of the systems in terms of physical and material constants ; for example, thermal mass, overall heat transfer coefficients, mechanical properties of the elements; and

(iii) output variables (also called response, state, endogenous or dependent variables) which describe system response to the input variables.

Chap 1‐Data Analysis‐Reddy 15

Examples of Simple Models

Pressure drop of fluid flowing through pipe2

2h

L vp f

D

( )sp s i

dTMc P UA T T

d

Lumped model of water storage tank

Chap 1‐Data Analysis‐Reddy 16

( )p s idt

Can you identify the input variables,system parameters and output variables?

Page 9: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

9

Different classification methods

1 Distributed vs lumped parameter

Table 1.1 Ways of classifying mathematical models

2 Dynamic vs static or steady-state

3 Deterministic vs stochastic

4 Continuous vs discrete

5 Linear vs non-linear in the functional model

6 Linear vs non-linear in the model parameters

7 Time invariant vs time variant

Chap 1‐Data Analysis‐Reddy 17

8 Homogeneous vs non-homogeneous

9 Simulation vs performance models

10 Physics based (white box) vs data based (black box) and mix of both (grey box)

Fig. 1.4. Example of solid sphere cooling

Radius ofsphere

Thi t b d l d l d d l id d th Bi t b Bi <0 1

1/k 1/h Heat flow 0.1ehLBi

k

Chap 1‐Data Analysis‐Reddy 18

This system can be modeled as a lumped model provided the Biot number Bi <0.1. This number is proportional to the ratio of the heat conductive resistance (1/k) inside the sphere to the convective resistance (1/h) from the outer envelope of

the sphere to the air.

Page 10: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

10

iT

X X

0T

Heatflow

Wallsection

andx

R C c A x

ih0h

(a)

.

Schematic

C

0TiT

0TiT

and pR C c A xkA

1

1

2 i

RR

h A 2

0

1

2

RR

h A

(b)

R1 R2 Rn Rn+1Ts1 Ts2 Tsn

2R1C lumped model

Chap 1‐Data Analysis‐Reddy 19

1C

(c)2C nC

xFigure 1.5 Thermal networks to model heat flow through a homogeneous plane wall of surface area A and wall thickness

n th order lumped model

1x 1y

Linear vs Non-linear

1 1y

2y2x

1 1c xc y c y

Chap 1‐Data Analysis‐Reddy 20

2 2c x 1 1 2 2c y c y

Figure 1.6 Principle of superposition of a linear system

Page 11: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

11

Linear vs Non-linear

• An important distinction needs to be made between a linear model and a model which is linear in its parameters. Formodel and a model which is linear in its parameters. For example,

• is linear in both model and parameters a and b,

• is a non-linear model but is linear in its parameters, and

1 2y ax bx

1 2siny a x bx

• is non-linear in both model and parameters.

Chap 1‐Data Analysis‐Reddy 21

1exp( )y a bx

Simulation vs Performance Based

Simulation models are used to predict system performance during the design phase when no actual system exists and alternatives are being evaluated.

Performance based models rely on measured performance data of the actual system to provide insights into model structure and to estimate its parameters.

• White-box models (also called detailed mechanistic models, reference models or small-time step models) are based on the laws of physics and permit accurate and microscopic modeling

• Black-box models (or empirical or curve-fit or data-driven models) areBlack box models (or empirical or curve fit or data driven models) are based on little or no physical behavior of the system and rely on the available data to identify the model structure.

• Gray-box models fall in-between the two above categories and are best suited for performance models

Chap 1‐Data Analysis‐Reddy 22

Page 12: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

12

Models for Sensor ResponseSteady-state models (also called zeroeth order models):

- simplest model one can use.

- They apply when input variables (and hence, the output variables) are y pp y p ( p )maintained constant.

A zeroeth order model for the dynamic performance of measuring systems is used (i) when the variation in the quantity to be measured is very slow as compared to how quickly the instrument responds, or (ii) as a standard of comparison for other more sophisticated models.

For a zero-order instrument, the output is directly proportional to the input, such that (Doebelin, 1995):

1.7a 1.7b

where a0 and b0 are the system parameters, assumed time invariant,

qo and qi are the output and the input quantities respectively,

K = b0 /a0 is called the static sensitivity of the instrument (only parameter needed to specify sensor response)

Chap 1‐Data Analysis‐Reddy 23

a q b q0 o 0 i q Kqo i

The first-order model:or

where is the time constant of the instrument = a1 / a0 , and K is the static sensitivity of the instrument which is identical to the value defined for the zeroeth model. Thus, two numerical parameters are used to completely specify

adq

dt+ a q b q1

00 o 0 i dq

dt+ q Kq0

o i

1.8b

a first-order instrument.

)1(K.q)(q /iso

tet

The solution to eq.1.8b for a step change in input is:

isqwhere is the value of the input quantity after the step change.The time constant characterizes the speed of response; the smaller its valuethe faster its response and vice versa

1.9

Chap 1‐Data Analysis‐Reddy 24

the faster its response, and vice versa.Numerically, the time constant represents the time taken for the responseto reach 63.2% of its final change, or to reach a value within 36.8% of thefinal value. This is easily seen from eq. 1.9, by setting t= , in which case :

1o

is

q ( )(1 ) 0.632

K.q

te

Page 13: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

13

Steady-state value

t re

adin

g

15

20

25

Steady-state value

63.2% of changeSmall time constant instrument

Large time constant instrument

0 5 10 15 20 25 30

Instr

um

en

0

5

10

Dynamic responseDynamic response

of two instruments

Large timeconstant

Small timeconstant

Time from step change in input (seconds)

g

Chap 1‐Data Analysis‐Reddy 25

Fig 1.2.7 Step-responses of two first-order instruments with different response times with assumed numerical values of time (x-axis) and instrument reading (y-axis).

The response is characterized by the time constant which is the time for the instrument reading to reach 63.2% of the steady-state value

Block Diagrams• Information flow or block diagram is a standard shorthand manner of

schematically representing the inputs and output quantities of an element or a system as well as the computational sequence of variablesvariables.

• Widely used during system simulation since a block implies that its output can be calculated provided the inputs are known.

• They are very useful for setting up the set of model equations to solve in order to simulate or analyze systems or components.

• Block diagrams should not be confused with material flow diagramswhich for a given system configuration are unique. On the other hand, there can be numerous ways of assembling block diagrams depending on how the problem is framed.

Chap 1‐Data Analysis‐Reddy 26

Page 14: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

14

1p

2pv

s

Fig. 1.8 Schematic of a centrifugal pump rotating at speed s (say in rpm)pump rotating at speed s (say, in rpm)

which pumps a water flow rate v from lower pressure p1 to higher pressure p2

Pump

Pump

Pump

1p

1p

1p

2p

2p

2p

v

v

v

(a)

(b)

(c)

Chap 1‐Data Analysis‐Reddy 27

Pump2ps +

Fig. 1.9 Different block diagrams for modelinga pump depending on how the problem is formulated

1.3 Types of Problems in Mathematical ModelingConsider the dynamic model of a component or system represented by the block diagram in Fig. 1.11. For simplicity, let us assume a linear model with no lagged terms in the forcing variables. Then, the model can be represented in matrix form as:

Yt= AYt‐1 + BUt + CWt with  Y1=d 1.10

where the output or state variable at time t is Yt . The forcing  variables are of two types: 

‐vector U denoting observable and controllable input variables and {A B C}

W

controllable input variables, and ‐vector W indicating uncontrollableinput variables or disturbing inputs. The parameter vectors of the model are {A, B, C} while d represents the initial condition vector.

Chap 1‐Data Analysis‐Reddy 28

{A,B,C}U Y

Fig. 1.11

Page 15: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

15

Chap 1‐Data Analysis‐Reddy 29

Fig. 1.12 Different types of mathematical models used in forward and inverse approaches. The dotted line indicates that control problems

often need model selection and parameter estimation as a first step.

1.3.2 Forward Problems

Such problems are framed as one where:

Given {U,W} and {B,C,d}, determine Y 1.11

The objective is to predict the response or state variables of a specified model with known structure and known parameters when subject to specified input or forcing variables (Fig.1.12).

This is the type of models which is implicitly studied in classical mathematics and also in system simulation design courses.

Chap 1‐Data Analysis‐Reddy 30

Unique solutions: Well‐defined or well‐specified problems: Degrees of freedom (df)=0

Page 16: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

16

Solving the two equations simultaneously yields the

21 1 1

22 2 2

. . for the pump

. . for the pipe network

p a b V c V

p a b V c V

1 1 1

2 2 2 2

1a b cp

Va b cp

V

Pressuredrop

Pump curve

Operating point

simultaneously yields the performance conditions of the operating point, i.e., pressure drop and flow rate .

Note that the numerical values of the model parameters are

V Y                   B              U

Chap 1‐Data Analysis‐Reddy 31

Volume flow rate (V)

System curve

Fig. 1.13 Example of a forward problem where solving two simultaneous equations,

one representing the pump curve and the other the system curve, yields the operating point

pknown, and that and V are the two variables, while the two equations provide the two constraints.

( )p

1.3.3 Inverse Problems

Generally speaking, inverse problems are those which involve identification of 

d l t t d/ ti t fmodel structure and/or estimates of model parameters 

where the system under study already exists, and one uses measured or observed system behavior to aid in the model building. 

iff d l f hDifferent model forms may capture the data trend; this is why some argue that inverse problems are generally “ill‐defined” or “ill‐posed”. 

Chap 1‐Data Analysis‐Reddy 32

More than one possible solution, Degrees of freedom     0

Page 17: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

17

Three types of inverse models all of which require some sort of identification or estimation (Fig. 1.12):

(a) calibrated forward models where ones uses a mechanistic model originally ( ) f g ydeveloped for the purpose of system simulation, and modifies or “tunes” the numerous model parameters so that model predictions match observed system behavior as closely as possible.

Given {Y”, U”, W”, d”}, determine {A”,B”,C”} 1.13a

where the “ notation is used to represent limited measurements or reduced parameter set;parameter set;

Chap 1‐Data Analysis‐Reddy 33

Over‐constrained or under‐parameterized problems: Degrees of freedom < 0

Pressuredrop

0

0

0

0

0

0

0

0

0

0

(b) model selection and parameter estimation where a suite of plausible model structures are formulated from basic scientific and engineering principles involving known influential and physically-relevant regressors, and

Volume flow rate (V)

0 00

0

Fig. 1.14 Example of a parameter estimation problem where the model parameters of a presumed function of pressure drop versus

volume flow rate are identified from discrete experimental data points

performing experiments (or identifying system performance data) which allows these competing models to be evaluated and the “best” model identified

Given {Y, U, W, d}, determine {A,B,C}

1.13b

Chap 1‐Data Analysis‐Reddy 34

(c) models for system control and diagnostics

Statistical problems: Over‐constrained or under‐parameterized problems:             Degrees of freedom > 0

Page 18: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

18

qctb

Example 1.3.1 Simulation of a chiller Forward problem

Expansion Valve

Condenser

CompressorEvaporator

Ptc

te

Chap 1‐Data Analysis‐Reddy 35

1

qeta

te

Fig. 1.15 Schematic of the cooling plant

2 2

2 2 2 2

239.5 10.073 0.109 3.41 0.00250

- 0.2030 0.00820 0.0013 0.000080005

e e e c c

e c e c e c e c

q t t t t

t t t t t t t t

2 22.634 0.3081 0.00301 1.066 0.00528e e c cP t t t t

Five equations and five unknowns (d.f = 0): { , , , , }e c e ct t q P qY

Cooling capacity:

2 2 2 2 - 0.0011 0.000306 0.000567 0.0000031e c e c e c e ct t t t t t t t

c eq q P

( )[1 exp( )]ee e p a e

e p

UAq m c t t

m c

UA

Chiller power

1st law balance

HX in evaporator

Chap 1‐Data Analysis‐Reddy 36

( )[1 exp( )]cc c p c b

c p

UAq m c t t

m c

0 02.84 , 34.05 , 134.39 and 28.34e c et C t C q kWt P kW Solutions:

HX in condenser

Page 19: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

19

Example 1.3.2: Dose-Response Curves Inverse problems

The dots represent observed laboratory tests performed at high doses. Three types of models are fit to the data and all of them agree at high doses. However, they deviate substantially at low doses because the models are functionally differentthe models are functionally different. - Model I is a nonlinear model applicable to highly toxic agents, - Model II is generally taken to apply to contaminants that are quite harmless as low doses (i.e., the body is able to metabolize the toxin at low doses). - Model III is an intermediate one between the other two curves.

Chap 1‐Data Analysis‐Reddy 37

Fig. 1.16 Three different inverse models depending on toxin type for extrapolating dose‐response observations at high doses to the response at low doses 

The above models are somewhat empirical (or black-box) and are useful as performance models. However, they provide little understanding of the basic process itself

Issues relevant to inverse modeling

(i) can the observed data of dose versus response provide some insights into the process which induces cancer in biological cells?

(ii) H lid th lt t l t d d t l d ?(ii) How valid are these results extrapolated down to low doses?

(iii) Since laboratory tests are performed on animal subjects, how valid are these results when extrapolated to humans?

There are no simple answers to these queries (until the basic process itself is completely understood). The manner one chooses to extrapolate the dose-response curve downwards is dependent on either the assumption one

k di h b i i lf h h ( hi hmakes regarding the basic process itself or how one chooses to err (which has policy-making implications). For example, erring conservatively in terms of risk would overstate the risk and prompt implementation of more precautionary measures, which some critics would fault as is improper use of limited resources.

Chap 1‐Data Analysis‐Reddy 38

Page 20: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

20

1.4 What is Data Analysis?

• “an evaluation of collected observations so as t t t i f ti f l f ifito extract information useful for a specific purpose”

• “the process of systematically applying statistical and logical techniques to describe, summarize, and compare data”., p

Chap 1‐Data Analysis‐Reddy 39

Basic Steps in Statistical Data Analysis

• Defining the problem 

The context of the problem and the exact definition of the problem being studied need to be defined. This allows one to design both the data collection system and the subsequent analysis procedures to be followed.

• Collecting the dataNowadays the data is overwhelming, less care seems to be given

to proper experimental design, different sources o data available,

• Analyzing the data• Reporting the results

Chap 1‐Data Analysis‐Reddy 40

Page 21: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

21

1.5 Types of Uncertainty in Data• purely stochastic variability (or aleatory uncertainty) where the ambiguity

in outcome is inherent in the nature of the process, and no amount of additional measurements can reduce the inherent randomness. Examples involve coin tossing, or card games. These processes are inherently random g, g p y(either on a temporal or spatial basis), and whose outcome, while uncertain, can be anticipated on a statistical basis;

• epistemic uncertainty or ignorance or lack of complete knowledge of the process which result in certain influential variables not being considered (and, thus, not measured);

• inaccurate measurement of numerical data due to instrument or sampling errors;

• cognitive vagueness involving human linguistic description. For example, people use words like tall/short or very important/not important which cannot be quantified exactly. This type of uncertainty is generally associated with qualitative and ordinal data where subjective elements come into play.

Chap 1‐Data Analysis‐Reddy 41

1.6a Types of Methods

(a) Exploratory data analysis and descriptive statistics, which entails performing “numerical detective work” on the data and developing methods for screening, organizing, summarizing and detecting basic trends in the data (such as graphs and tables) which would help in informationin the data (such as graphs, and tables) which would help in information gathering and knowledge generation.

(b) Model building and point estimation which involves (i) taking measurements of the various parameters (or regressor variables) affecting the output (or response variables) of a device or a phenomenon, (ii) identifying a causal quantitative correlation between them by regression, and (iii) using it to make predictions about system behavior under future operating conditions.

(c) Inferential problems are those which involve making uncertainty inferences or calculating uncertainty or confidence intervals of population estimates from selected samples. They also apply to regression, i.e., uncertainty in model parameters, and in model predictions.

Chap 1‐Data Analysis‐Reddy 42

Page 22: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

22

1.6b Types of Methods

(d) Design of experiments is the process of prescribing the exact manner in which samples for testing need to be selected, and the conditions and sequence under which the testing needs to be performed such that the

l ti hi d l b t i bl d t frelationship or model between a response variable and a set of regressorvariables can be identified in a robust and accurate manner.

(e) Time series analysis and signal processing. Time series analysis involves the use of a set of tools that include traditional model building techniques as well as those involving the sequential behavior of the data and its noise.

(f) Classification and clustering problems: Classification problems are those where one would like to develop a model to statistically distinguish or “discriminate” differences between two or more groups when one knows beforehand that such groupings exist in the data set provided, and, to

b l i ll l if f l ifi d b isubsequently assign, allocate or classify a future unclassified observation into a specific group with the smallest probability of error. Clustering, on the other hand, is a more difficult problem, involving situations when the number of clusters or groups is not known beforehand, and the intent is to allocate a set of observation sets into groups which are similar or “close” to one another with respect to certain attribute(s) or characteristic(s).

Chap 1‐Data Analysis‐Reddy 43

1.6c Types of Methods

(f) Inverse modeling is an approach to data analysis methods which includes three classes: statistical calibration of mechanistic models, model selection and parameter estimation, and inferring forcing functions and boundary/initial conditions It combines the basic physics of the processboundary/initial conditions. It combines the basic physics of the process with statistical methods so as to get a better understanding of the system dynamics, and thereby use it to predict system performance either within or outside the temporal and/or spatial range used to develop the model.

(h) Risk analysis and decision making: Analysis is often a precursor to decision-making in the real world. Along with engineering analysis there are other aspects such as making simplifying assumptions, extrapolations into the future, financial ambiguity,… that come into play while making decisions Decision theory is the study of methods for arriving at “rational”decisions. Decision theory is the study of methods for arriving at rational decisions under uncertainty. The decisions themselves may or may not prove to be correct in the long term, but the process provides a structure for the overall methodology

Chap 1‐Data Analysis‐Reddy 44

Page 23: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

23

Data Sampling(1 sec-1 min)

Clean (andStore Raw

Data)

Averageand Store

Data(1-30 min)

-Initial cleaning and flagging (missing,misrecorded, dead channels)

- Gross error detection-Removal of spikes

SystemMonitoring

Formulate intention of client

MeasurementSystemDesign

1.7 Example of adata collectionsystem

- Data transformation- Data filtering

- Outlier detection- Data validation

Extract DataSub-set forIntendedAnalyses

PerformEngineering

Analysis

- Statistical inference- Identify patterns in data

- Regression analysis-Parameter estimation- System identification

- Data adequate for sound decision?

Define Issue tobe Analyzed

- Formulate intention of clientas engineering problem

- Determine analysis approach- Determine data needed

Chap 1‐Data Analysis‐Reddy 45

PerformDecisionAnalysis

PresentDecision to

Client

Perform additionalanalyses

Redesign and takeadditional

measurements

q- Is prior presumption correct?

- How to improve operation and/or effy?- Which risk-averse strategy to select?

- How to react to catastrophic risk?

End

Fig. 1.17 Flowchart depicting various stages in data analysis and decision making as applied to continuous monitoring of thermal systems

1.8 Decision analysis and data mining

There are two disciplines with overlapping/complementary aims to that of data analysis and modeling.

The first deals with decision analysis whose objective is to provideThe first deals with decision analysis whose objective is to provide both an overall paradigm and a set of tools with which decision makers can construct and analyze a model of a decision situation. This is done after the data analysis phase

The second discipline is data mining which is defined as the science of extracting useful information from large/enormous data sets. Though it is based on a range of techniques, from the very simple to the sophisticated (involving such methods as clustering techniquesthe sophisticated (involving such methods as clustering techniques, artificial neural networks, genetic algorithms,…), it has the distinguishing feature that it is concerned with shifting through large/enormous amounts of data with no clear aim in mind except to discern hidden information, discover patterns and trends, or summarize data behavior

Chap 1‐Data Analysis‐Reddy 46

Page 24: Chapter 1: Mathematical Models and Data Analysisauroenergy.com/wp-content/uploads/2011/12/Chap1... · 1.2 Mathematical models 1.3 Types of problems in mathematical modeling ... 1.2.3

9/2/2011

24

Table 1.3 Analysis Methods Covered in Book

Chapter Topic First course Second course 1 Introduction: Mathematical models and data-driven methods X X1 Introduction: Mathematical models and data-driven methods X X 2 Probability and statistics, important probability distributions X 3 Exploratory data analysis and descriptive statistics X 4 Inferential statistics, non-parametric tests and sampling X 5 OLS regression, residual analysis, point and interval estimation X 6 Design of experiments X 7 Traditional optimization methods and dynamic programming X 8 Classification and clustering analysis X 9 Time series analysis, ARIMA, process monitoring and control X 10 Parameter estimation methods X

Chap 1‐Data Analysis‐Reddy 47

11 Inverse methods (calibration, system identification, control) X 12 Decision-making and risk analysis X