Hydrologic Statistics

READING: CHAPTER 11 IN APPLIED HYDROLOGY

SOME SLIDES BY VENKATESH MERWADE

04/04/2006

Hydrologic Models

Classification based on randomness.

Deterministic (e.g., rainfall-runoff analysis)
◦ Analysis of hydrological processes using deterministic approaches
◦ Hydrological parameters are based on physical relations of the various components of the hydrologic cycle.
◦ Do not consider randomness; a given input always produces the same output.

Stochastic (e.g., flood frequency analysis)
◦ Probabilistic description and modeling of hydrologic phenomena
◦ Statistical analysis of hydrologic data.

Probability

A measure of how likely an event is to occur

A number expressing the ratio of favorable outcomes to all possible outcomes (when all outcomes are equally likely)

Probability is usually represented as P(.)
◦ P (getting a club from a deck of playing cards) = 13/52 = 0.25 = 25%
◦ P (getting a 3 after rolling a die) = 1/6

Random Variable

Random variable: a quantity used to represent probabilistic uncertainty
◦ Incremental precipitation
◦ Instantaneous streamflow
◦ Wind velocity

Random variable (X) is described by a probability distribution

Probability distribution is a set of probabilities associated with the values in a random variable's sample space

Sampling terminology

Sample: a finite set of observations x1, x2, …, xn of the random variable

A sample comes from a hypothetical infinite population possessing constant statistical properties

Sample space: set of possible samples that can be drawn from a population

Event: subset of a sample space

Example
◦ Population: streamflow
◦ Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
◦ Sample: 100 observations of annual max. streamflow
◦ Event: daily average streamflow > 100 cfs

Types of sampling

Random sampling: the likelihood of selection of each member of the population is equal
◦ Pick any streamflow value from a population

Stratified sampling: the population is divided into groups, and then random sampling is used within them
◦ Pick a streamflow value from annual maximum series.

Uniform sampling: data are selected such that the points are uniformly far apart in time or space
◦ Pick streamflow values measured on Monday midnight

Convenience sampling: data are collected according to the convenience of the experimenter.
◦ Pick streamflow during summer
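As a rough illustration of random versus uniform sampling, the sketch below draws both kinds of sample from a made-up daily streamflow record. The function names, the 28-day record, and its values are all hypothetical and are not part of the slides.

```python
import random

def random_sample(population, k, seed=None):
    """Random sampling: every member is equally likely to be selected."""
    rng = random.Random(seed)
    return rng.sample(population, k)  # draws k members without replacement

def uniform_sample(series, step, start=0):
    """Uniform sampling: keep values a fixed number of positions apart."""
    return series[start::step]

# Hypothetical daily streamflow record (cfs), 28 days long
daily_flow = [100 + 3 * day for day in range(28)]

picked_at_random = random_sample(daily_flow, k=5, seed=42)
weekly = uniform_sample(daily_flow, step=7)  # one value every 7 days

print(picked_at_random)
print(weekly)
```

With daily data, a step of 7 mimics picking the value recorded at the same time every Monday.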

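Summary statistics

Also called descriptive statistics

◦ If x1, x2, …, xn is a sample then

Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$

Standard deviation: $S = \sqrt{S^2}$

Coeff. of variation: $CV = S / \bar{X}$

For continuous data, the corresponding population quantities are $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$, $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$, and $\sigma = \sqrt{\sigma^2}$, where $f(x)$ is the probability density function.

Also included in summary statistics are median, skewness, and correlation coefficient.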
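The sample mean, variance (with the n − 1 divisor), standard deviation, and coefficient of variation can be computed directly from their definitions. The following is a minimal sketch using only the Python standard library; the flow values are invented for illustration.

```python
import math

def summary_statistics(sample):
    """Return (mean, variance, standard deviation, coefficient of variation)."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance divides by n - 1, not n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    std_dev = math.sqrt(variance)
    cv = std_dev / mean
    return mean, variance, std_dev, cv

# Hypothetical annual maximum flows (10^3 cfs)
flows = [120.0, 85.0, 240.0, 60.0, 145.0]
mean, variance, std_dev, cv = summary_statistics(flows)
print(mean, variance, std_dev, cv)
```

The standard library's `statistics.mean`, `statistics.variance`, and `statistics.stdev` return the same mean, variance, and standard deviation for this sample.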

Graphical display
◦ Time series plots
◦ Histograms/Frequency distribution
◦ Cumulative distribution functions
◦ Flow duration curve

Time series plot

Plot of variable versus time (bar/line/points)

Example: annual maximum flow series

[Figure: time series plots of annual max flow (10^3 cfs) versus year, 1905 to 1998, Colorado River near Austin]

Histogram

Plots of bars whose height is the number ni, or fraction (ni/N), of data falling into one of several intervals of equal width

[Figure: histograms of annual max flow (10^3 cfs) versus number of occurrences, for interval widths of 50,000 cfs, 25,000 cfs, and 10,000 cfs]

Dividing the number of occurrences in each interval by the total number of points gives the probability mass function
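A small sketch of that calculation: count the observations in equal-width intervals, then divide each count by the total number of points. The data values, the interval width, and the function name are hypothetical.

```python
def histogram_counts(data, width, n_bins):
    """Count observations in n_bins equal-width intervals starting at 0."""
    counts = [0] * n_bins
    for value in data:
        index = int(value // width)  # interval [index*width, (index+1)*width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical annual maximum flows (10^3 cfs)
flows = [12, 48, 55, 73, 102, 110, 149, 151, 230, 310]

counts = histogram_counts(flows, width=50, n_bins=10)
# Relative frequencies: occurrences divided by total number of points
fractions = [c / len(flows) for c in counts]

print(counts)
print(fractions)
```

The relative frequencies sum to 1 as long as every observation falls inside one of the intervals.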

Using Excel to plot histograms

1) Make sure the Analysis ToolPak is added in Tools. This will add the Data Analysis command in Tools

2) Fill one column with the data, and another with the intervals (e.g., for 50 cfs interval, fill 0, 50, 100, …)

3) Go to Tools → Data Analysis → Histogram

4) Organize the plot in a presentable form (change fonts, scale, color, etc.)

Probability density function

The continuous form of the probability mass function is the probability density function (pdf)

[Figure: histogram of annual max flow (10^3 cfs) versus number of occurrences, alongside the corresponding probability curve versus annual max flow (10^3 cfs)]

The pdf is the first derivative of the cumulative distribution function: $f(x) = dF(x)/dx$

Cumulative distribution function

Cumulate (integrate) the pdf to produce a cdf

The cdf describes the probability that a random variable is less than or equal to a specified value x: $F(x) = P(X \le x)$

[Figure: cdf of annual max flow (10^3 cfs), probability from 0 to 1, annotated with the values below]

P (Q ≤ 50000) = 0.8

P (Q ≤ 25000) = 0.4
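For a finite sample, a simple estimate of the cdf at a value x is the fraction of observations that are less than or equal to x. The sketch below assumes this empirical estimate (the slides do not say how their curve was built), and the data are hypothetical.

```python
def empirical_cdf(sample, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)

# Hypothetical annual maximum flows (10^3 cfs)
flows = [12, 48, 55, 73, 102, 110, 149, 151, 230, 310]

p_100 = empirical_cdf(flows, 100)  # estimate of P(Q <= 100)
p_all = empirical_cdf(flows, 310)  # largest observation, so the estimate is 1

print(p_100, p_all)
```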

Flow duration curve

A cumulative frequency curve that shows the percentage of time that specified discharges are equaled or exceeded.

Steps
◦ Arrange flows in chronological order
◦ Find the number of records (N)
◦ Sort the data from highest to lowest
◦ Rank the data (m = 1 for the highest value and m = N for the lowest value)
◦ Compute exceedance probability for each value using the following formula

$p = 100 \times \frac{m}{N + 1}$

◦ Plot p on x axis and Q (sorted) on y axis
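The steps above can be sketched in a few lines, assuming the exceedance probability is computed as p = 100 · m / (N + 1). The function name and the flow values are hypothetical.

```python
def flow_duration_curve(flows):
    """Return (p, Q) pairs: percent of time each flow is equaled or exceeded."""
    n = len(flows)                        # number of records, N
    ranked = sorted(flows, reverse=True)  # highest flow gets rank m = 1
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]

# Hypothetical flows (cfs) in chronological order
flows = [35.0, 120.0, 60.0, 15.0]
curve = flow_duration_curve(flows)

for p, q in curve:
    print(f"{p:5.1f}% of time Q >= {q}")
```

Plotting the first element of each pair on the x axis and the second on the y axis gives the curve.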

Flow duration curve in Excel

[Figure: flow duration curve, Q (1000 cfs) versus % of time Q will be exceeded, with the median flow marked]

Statistical analysis
◦ Regression analysis
◦ Mass curve analysis
◦ Flood frequency analysis
◦ Many more which are beyond the scope of this class!

Linear Regression

A technique to determine the relationship between two random variables.
◦ Relationship between discharge and velocity in a stream
◦ Relationship between discharge and water quality constituents

A regression model is given by:

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$

yi = ith observation of the response (dependent variable)

xi = ith observation of the explanatory (independent) variable

β0 = intercept

β1 = slope

εi = random error or residual for the ith observation

n = sample size

Least squares regression

We have x1, x2, …, xn and y1, y2, …, yn observations of the independent and dependent variables, respectively.

Define a linear model for yi:

$\hat{y}_i = \beta_0 + \beta_1 x_i, \quad i = 1, 2, \ldots, n$

Fit the model (find b0 and b1, the estimates of β0 and β1) such that the sum of the squares of the vertical deviations is minimum

◦ Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

Regression applet: http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
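For a single explanatory variable, minimizing the sum of squared vertical deviations has a well-known closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the fitted line passes through the point of means. The sketch below applies that standard result; the data and the function name are hypothetical, and the slides themselves do not show this derivation.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept: line passes through the means
    return b0, b1

# Hypothetical observations
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

b0, b1 = fit_line(x, y)
print(b0, b1)
```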

Linear Regression in Excel

Steps:
◦ Prepare a scatter plot
◦ Fit a trend line

[Figure: scatter plot of TDS (mg/L) versus specific conductance (µS/cm) with fitted trend line TDS = 0.5946 (sp. cond) − 15.709, R² = 0.9903]

Alternatively, one can use Tools → Data Analysis → Regression

Data are for Brazos River near Highbank, TX
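Once a trend line has been fitted, it can be used to estimate the response for a new value of the explanatory variable. The sketch below evaluates the trend line reported on this slide, TDS = 0.5946 (sp. cond) − 15.709. The function name and the input value of 1000 are illustrative, and the conductance unit is assumed to be µS/cm.

```python
def tds_from_conductance(specific_conductance):
    """Estimate TDS (mg/L) from specific conductance using the slide's trend line."""
    return 0.5946 * specific_conductance - 15.709

# Illustrative input; the plotted data span roughly 0 to 3000 on the x axis
estimate = tds_from_conductance(1000.0)
print(round(estimate, 3))
```

The relation is empirical, so it should only be applied within the range of the data it was fitted to.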

Coefficient of determination (R²)

It is the proportion of observed y variation that can be explained by the simple linear regression model

$R^2 = 1 - \frac{SSE}{SST}$

$SST = \sum (y_i - \bar{y})^2$: total sum of squares, where $\bar{y}$ is the mean of yi

$SSE = \sum (y_i - \hat{y}_i)^2$: error sum of squares

The higher the value of R², the more successful is the model in explaining y variation.

If R² is small, search for an alternative model (nonlinear or multiple regression model) that can more effectively explain y variation
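A minimal sketch of the R² calculation, computing the total and error sums of squares and combining them as 1 − SSE/SST. The observations and fitted values are hypothetical; the fitted values come from an assumed line ŷ = 1.15 + 1.94x evaluated at x = 1, 2, 3, 4.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    y_mean = sum(y) / len(y)
    sst = sum((yi - y_mean) ** 2 for yi in y)             # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error sum of squares
    return 1.0 - sse / sst

# Hypothetical observations and fitted values
y = [3.1, 4.9, 7.2, 8.8]
y_hat = [3.09, 5.03, 6.97, 8.91]

r2 = r_squared(y, y_hat)
print(round(r2, 4))
```

A perfect fit (every fitted value equal to its observation) gives SSE = 0 and therefore R² = 1.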