

Page 1: UCLA STAT 110B Ivo Dinov Text: Statistics for …dinov/courses_students.dir/03/Spr...Text: Introduction to Probability and Statistics for Engineering and the Sciences 5th edition--


Stat 110B, UCLA, Ivo Dinov Slide 1

UCLA STAT 110B Applied Statistics for Engineering and the Sciences

• Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology

• Teaching Assistants: Brian Ng, UCLA Statistics

University of California, Los Angeles, Spring 2003
http://www.stat.ucla.edu/~dinov/courses_students.html

Stat 110B, UCLA, Ivo DinovSlide 2

Course Organization

http://www.stat.ucla.edu/~dinov/courses_students.html

Software: No specific software is required. SYSTAT, R, SOCR resource, etc.

Text: Introduction to Probability and Statistics for Engineering and the Sciences 5th edition -- Jay Devore

Course Description, Class homepage,

online supplements,VOH’s, etc.

Stat 110B, UCLA, Ivo DinovSlide 3

• Material Covered: (Devore, Chapters 7-14)

• Review of Key Concepts (ch 01-06)
• Confidence Intervals (ch 07)
• Single-Sample Hypothesis Testing (ch 08)
• Inferences Based on 2 Samples (ch 09)
• One-, Two-, and Three-Factor ANOVA (ch 10)
• 2^k Factorial Designs (ch 11)
• Linear Regression (ch 12)
• Multiple & Nonlinear Regression (ch 13)
• Goodness-of-Fit Testing (ch 14)

Course Organization

Stat 110B, UCLA, Ivo DinovSlide 4

Overall Review

What is a statistic?
- Any quantity whose value can be calculated from sample data; it does not depend on any unknown parameter.

What are Random Variables?
- A function from the sample space to the real number line. Before any data is collected, we view all observations and statistics as random variables.

Stat 110B, UCLA, Ivo DinovSlide 5

Properties of Expectation and Variance

• Let X be a random variable and a, b be constants. It follows that:

E[aX + b] = aE[X] + b
Var[aX + b] = a^2 · Var[X]
Var[X] = E[X^2] − (E[X])^2
SD(X) = √Var[X]

Stat 110B, UCLA, Ivo DinovSlide 6

Linear Combinations of Random Variables

Consider the collection of independent random variables X1, …, Xn, where E[Xi] = µi and Var[Xi] = σi^2, and let a1, …, an be constants. Define a random variable Y by

Y = a1·X1 + … + an·Xn,

which is a linear combination of the Xi's. It follows that

E[a1X1 + … + anXn] = a1·E[X1] + … + an·E[Xn] = a1µ1 + … + anµn

Var[a1X1 + … + anXn] = a1^2·Var[X1] + … + an^2·Var[Xn] = a1^2σ1^2 + … + an^2σn^2

What if dependent?!?
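The expectation and variance rules above can be checked by simulation. Below is a minimal sketch in plain Python; the coefficients and the two normal distributions are made-up illustration values, not from the slides:

```python
import random

random.seed(0)
N = 100_000

# Independent X1 ~ N(1, 2^2) and X2 ~ N(-3, 1^2); take Y = 2*X1 + 5*X2.
# Theory: E[Y] = 2*1 + 5*(-3) = -13, Var[Y] = 2^2*4 + 5^2*1 = 41.
ys = [2 * random.gauss(1, 2) + 5 * random.gauss(-3, 1) for _ in range(N)]

mean = sum(ys) / N
var = sum((y - mean) ** 2 for y in ys) / (N - 1)
print(round(mean, 2), round(var, 2))  # close to -13 and 41
```

For dependent Xi's the variance formula picks up cross-terms, which is the point of the closing question: Var[a1X1 + a2X2] = a1^2·Var[X1] + a2^2·Var[X2] + 2·a1·a2·Cov(X1, X2).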


Stat 110B, UCLA, Ivo DinovSlide 7

Random Sample

X1,…,Xn are an IID random sample of size n if:

1. The Xi’s are independent random variables

2. Every Xi has the same (identical) probability distribution

These conditions are equivalent to the Xi’s being independent and identically distributed (iid) random variables

Stat 110B, UCLA, Ivo DinovSlide 8

Sample Mean and Total of a Random Sample

The sample mean is given by the random variable X̄ defined as

X̄ = (1/n) · Σ_{i=1}^{n} Xi

The sample total is given by the random variable To defined as

To = Σ_{i=1}^{n} Xi

Stat 110B, UCLA, Ivo DinovSlide 9

Mean and Variance of To

For the total-sum random variable T0 = X1 + … + Xn of a random sample from a N(µ, σ^2) population,

T0 ~ N(nµ, nσ^2).

Stat 110B, UCLA, Ivo DinovSlide 10

Mean and Variance of X̄

For the sample-mean random variable

X̄ = (1/n)·(X1 + … + Xn),

X̄ ~ N(µ, σ^2/n).

Stat 110B, UCLA, Ivo DinovSlide 11

Let X1, …, Xn be a random sample from a normally distributed population with mean µ and variance σ^2, i.e. Xi ~ N(µ, σ^2). It follows that the random variable Y = a1X1 + … + anXn is normally distributed with mean (a1 + … + an)·µ and variance (a1^2 + … + an^2)·σ^2. Hence, the
sample mean and the sample total of the random sample will be normally distributed.

Linear Combinations of Normal Random Variables from a Random Sample

Stat 110B, UCLA, Ivo DinovSlide 12

The central limit theorem gives us information about the sample mean and the sample total for a “large” (n>30) random sample from a population that is not normally distributed. Specifically, it tells us that these will be approximately normally distributed. The larger n is, the better the approximation.

Central Limit TheoremArguably the most important theorem in Statistics (GUT theory)


Stat 110B, UCLA, Ivo DinovSlide 13

When a certain type of electrical resistor is manufactured, the mean resistance is 4 ohms with a standard deviation of 1.5 ohms. If 36 batches are independently produced, what is the probability that the sample average resistance of the batches is between 3.5 and 4.5 ohms? What is the probability that the sample total resistance is greater than 140 ohms?

Do InteractiveNormalCurve & CLT_Sampling Distribution Applets from SOCR resource

Example – Central Limit Theorem
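A sketch of the resistor computation in plain Python, assuming (per the results above) X̄ ≈ N(4, 0.25^2) and T0 ≈ N(144, 9^2), with the normal CDF built from math.erf. The printed values are this sketch's own results, not taken from the slides:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF, expressed through the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 4.0, 1.5, 36

# Sample mean: Xbar ~ N(mu, sigma^2/n), so SD(Xbar) = 1.5/6 = 0.25.
sd_mean = sigma / sqrt(n)
p_mean = phi((4.5 - mu) / sd_mean) - phi((3.5 - mu) / sd_mean)

# Sample total: T0 ~ N(n*mu, n*sigma^2), so SD(T0) = 1.5*6 = 9.
sd_total = sigma * sqrt(n)
p_total = 1 - phi((140 - n * mu) / sd_total)

print(round(p_mean, 4), round(p_total, 4))  # ~0.9545 and ~0.6716
```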

Stat 110B, UCLA, Ivo DinovSlide 14

Uni- vs. Multi-modal histograms

• The number of clear humps in a frequency histogram determines the modality of the histogram plot.

[Figure: two example frequency histograms illustrating unimodal vs. multimodal shapes.]

Stat 110B, UCLA, Ivo DinovSlide 15

Skewness & Symmetry of histograms

• A histogram is symmetric if the bars (bins) to the left of some point (the mean) are approximately mirror images of those to the right of it.

• A histogram is skewed if it is not symmetric: the histogram is heavy to the left or right, i.e., non-identical on the two sides of the mean.

[Figure: example histograms illustrating symmetric and skewed (left- and right-heavy) shapes.]

file:///C:/Ivo.dir/UCLA_Classes/Applets.dir/HistogramApplet.html

Stat 110B, UCLA, Ivo DinovSlide 16

Skewness & Kurtosis

• What do we mean by symmetry and positive and negative skewness? Kurtosis? Properties?!?

• Skewness is linearly invariant: Sk(aX + b) = Sk(X)

• Skewness is a measure of asymmetry

• Kurtosis (also linearly invariant) is a measure of flatness

• Both are used to quantify departures from the standard normal

• Skewness(StdNorm) = 0; Kurtosis(StdNorm) = 3

Skewness = (1/((N−1)·SD^3)) · Σ_{k=1}^{N} (Yk − Ȳ)^3

Kurtosis = (1/((N−1)·SD^4)) · Σ_{k=1}^{N} (Yk − Ȳ)^4

Stat 110B, UCLA, Ivo DinovSlide 17

Comparing 3 plots of the same data

Stem-and-leaf of strength, N = 33, Leaf Unit = 10

[Figure 2.4.5: Three graphs of the breaking-strength data for gear-teeth in positions 4 & 10 – a stem-and-leaf display and two plots over the range 2000–2600 (Minitab output).]

Stat 110B, UCLA, Ivo DinovSlide 18

Important points

1. The distinction between a randomized experiment and an observational study is made at the time of result interpretation; the very same statistical analysis is carried out in the two situations.

2. We've already stressed the importance of plotting data prior to statistical analysis. Plots play many important roles – they prevent dangerous misconceptions from arising (data overlaps, clusters, outliers, skewness, trends in the data, etc.)


Stat 110B, UCLA, Ivo DinovSlide 19

Analyzing Histogram Plots

• Modality – uni- vs. multi-modal (Why do we care?)

• Symmetry – how skewed is the histogram?

• Center of gravity of the histogram plot – does it make sense?

• If a center of gravity exists, quantify the spread of the frequencies around this point.

• Strange patterns – gaps, atypical frequencies lying away from the center.

Stat 110B, UCLA, Ivo DinovSlide 20

Measures of central tendency (location)

• Mean – the sum of all observations divided by their number.
• Median – (second quartile, Q2) the half-way point of the distribution: 50% of the data are greater than it and 50% are smaller than Q2.
• Mode – the (list of) most frequently occurring observation(s).

[Diagram: a distribution split into four 25% regions by the quartiles, with the mean, median, mode, and the range [min : max] marked.]

Stat 110B, UCLA, Ivo DinovSlide 21

Measures of variability (deviation)

• Mean Absolute Deviation (MAD):

MAD = (1/(n−1)) · Σ_{i=1}^{n} |yi − ȳ|

• Variance:

Var = s^2 = (1/(n−1)) · Σ_{i=1}^{n} (yi − ȳ)^2

• Standard Deviation:

SD = s = √Var = √[ (1/(n−1)) · Σ_{i=1}^{n} (yi − ȳ)^2 ]

Stat 110B, UCLA, Ivo DinovSlide 22

Measures of variability (deviation)

• Example: X = {1, 2, 3, 4}, so the mean is m = 2.5.

• Mean Absolute Deviation: MAD = (1/(n−1)) · Σ |xi − m| = 4/3 ≈ 1.33

• Variance: Var = (1/(n−1)) · Σ (xi − m)^2 = 5/3 ≈ 1.67

• Standard Deviation: SD = √(5/3) ≈ 1.3
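The same example can be verified mechanically; a minimal Python check of the slide's numbers (note the slide's n−1 convention in all three measures):

```python
from math import sqrt

x = [1, 2, 3, 4]
n = len(x)
m = sum(x) / n                                  # mean = 2.5

# Slide's convention: divide by (n - 1) in all three measures.
mad = sum(abs(v - m) for v in x) / (n - 1)      # 4/3 ~ 1.33
var = sum((v - m) ** 2 for v in x) / (n - 1)    # 5/3 ~ 1.67
sd = sqrt(var)                                  # ~ 1.29

print(round(mad, 2), round(var, 2), round(sd, 2))
```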

Stat 110B, UCLA, Ivo DinovSlide 23

Trimmed, Winsorized means and Resistancy

• A data-driven parameter estimate is said to be resistant if it does not greatly change in the presence of outliers.

• k-times trimmed mean (y(i) denotes the i-th order statistic):

ȳ_tk = (1/(n − 2k)) · Σ_{i=k+1}^{n−k} y(i)

• Winsorized k-times mean:

ȳ_wk = (1/n) · [ (k+1)·y(k+1) + Σ_{i=k+2}^{n−k−1} y(i) + (k+1)·y(n−k) ]
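A sketch of both estimators in plain Python (a made-up data set with one outlier and k = 1), illustrating why they are called resistant:

```python
def trimmed_mean(data, k):
    # Drop the k smallest and k largest order statistics (assumes 1 <= k < n/2).
    ys = sorted(data)
    kept = ys[k:len(ys) - k]
    return sum(kept) / len(kept)

def winsorized_mean(data, k):
    # Replace the k smallest values by y_(k+1) and the k largest by y_(n-k).
    ys = sorted(data)
    ys[:k] = [ys[k]] * k
    ys[-k:] = [ys[-k - 1]] * k
    return sum(ys) / len(ys)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # 100 is an outlier
print(sum(data) / len(data))              # plain mean: 14.5, dragged up by 100
print(trimmed_mean(data, 1))              # 5.5 -- resistant to the outlier
print(winsorized_mean(data, 1))           # 5.5
```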

Stat 110B, UCLA, Ivo DinovSlide 24

Stationary or Non-Stationary Process?

• To assess stationarity:

• Rigorous assessment: a stationary process has a constant mean, variance, and autocorrelation through time/place.

• Visual assessment: plot the data – observed values vs. time/place (the parameter we argue stationarity with respect to).

Time-Series Plot of the KWH Data

[Figure: time-series plot – KWH vs. date.]


Stat 110B, UCLA, Ivo DinovSlide 25

Stationary or Non-Stationary Process?

• Visual assessment: plot the data – observed values vs. time/place, etc. (the parameter we argue stationarity with respect to).

Scatter Plot of the KWH Data

[Figure: scatter plot – KWH vs. date.]

Stat 110B, UCLA, Ivo DinovSlide 26

Moving Averages

• Signal, Noise, Filtering: oftentimes high-frequency oscillations in the data make it difficult to read/interpret the data.

[Figure: raw KWH series, dominated by high-frequency oscillations.]
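A minimal moving-average smoother in plain Python (synthetic data standing in for the KWH series, which is not reproduced here):

```python
def moving_average(xs, window):
    # Average each consecutive block of `window` values (simple smoother).
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

# Noisy "signal": a slow upward trend plus alternating high-frequency noise.
raw = [10 + 0.1 * t + (5 if t % 2 == 0 else -5) for t in range(40)]
smooth = moving_average(raw, 10)

print(raw[:4])     # jumps by ~10 between neighbours
print(smooth[:4])  # noise largely cancelled, trend visible
```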

Stat 110B, UCLA, Ivo DinovSlide 27

Moving Averages – next 10 values are averaged

• Signal, Noise, Filtering: oftentimes high-frequency oscillations in the data make it difficult to read/interpret the data.

[Figure: moving-average effects on the raw KWH data – the raw series and its moving average, KWH vs. date.]

Stat 110B, UCLA, Ivo Dinov Slide 28

Properties of probability distributions

• A sequence of numbers {p1, p2, p3, …, pn} is a probability distribution for a sample space S = {s1, s2, s3, …, sn} if pr(sk) = pk for each 1 <= k <= n. The two essential properties of a probability distribution p1, p2, …, pn:

pk >= 0 ; Σ_k pk = 1

• How do we get the probability of an event from the probabilities of the outcomes that make up that event?

• If all outcomes are distinct & equally likely, how do we calculate pr(A)? If A = {a1, a2, a3, …, a9} and pr(a1) = pr(a2) = … = pr(a9) = p, then

pr(A) = 9 × pr(a1) = 9p.

Stat 110B, UCLA, Ivo DinovSlide 29

The conditional probability of A occurring given that B occurs is given by

pr(A | B) = pr(A and B) / pr(B)

Conditional Probability

Suppose we select one out of the 400 patients in the study and we want to find the probability that the cancer is on the extremities given that it is of type nodular:

P(C. on Extremities | Nodular) = (# nodular patients with cancer on extremities) / (# nodular patients) = 73/125

Stat 110B, UCLA, Ivo DinovSlide 30

pr(A and B) = pr(A | B)·pr(B) = pr(B | A)·pr(A)

Multiplication rule – what percentage of Israelis are both poor and Arabic?

Of all people in Israel, 14% are Arabic, and 52% of this 14% are poor. So 7.28% of Israelis are both poor and Arabic (0.52 × 0.14 = 0.0728).

Figure 4.6.1 Illustration of the multiplication rule. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.


Stat 110B, UCLA, Ivo DinovSlide 31

Permutation: number of ordered arrangements of r objects chosen from n distinctive objects

e.g. P(6,3) = 6·5·4 = 120.

Permutation & Combination

P(n,r) = n(n−1)(n−2)…(n−r+1) ; P(n,n) = P(n,r) · P(n−r, n−r)

Stat 110B, UCLA, Ivo DinovSlide 32

Combination: number of non-ordered arrangements of r objects chosen from n distinctive objects:

C(n,r) = P(n,r)/r! = n! / ((n−r)!·r!)

Or use the notation (n choose r) = C(n,r). e.g. 3! = 6, 5! = 120, 0! = 1

Permutation & Combination

C(7,3) = 7! / (4!·3!) = 35

Stat 110B, UCLA, Ivo DinovSlide 33

Combinatorial Identity:

C(n,r) = C(n−1, r−1) + C(n−1, r)

Analytic proof: expand both sides.
Combinatorial argument: Given n objects, focus on one of them (obj. 1). There are C(n−1, r−1) groups of size r that contain obj. 1 (since each such group contains r−1 other elements out of n−1). Also, there are C(n−1, r) groups of size r that do not contain obj. 1. But the total of all r-size groups of n objects is C(n,r)!

Permutation & Combination

Stat 110B, UCLA, Ivo DinovSlide 34

Combinatorial Identity:

C(n,r) = C(n, n−r)

Analytic proof: expand both sides.
Combinatorial argument: Given n objects, the number of combinations of choosing any r of them is equivalent to choosing the remaining n−r of them (order of objects not important!).

Permutation & Combination

Stat 110B, UCLA, Ivo DinovSlide 35

Examples

1. Suppose car plates have 6 characters, like AB1234. If any letter can be used in the first 2 places, and any digit in the last 4, how many different plates can be made? How many plates are there with no repeating letters or digits?

Solution: a) 26·26·10·10·10·10

b) P(26,2) · P(10,4) = 26·25·10·9·8·7

Stat 110B, UCLA, Ivo DinovSlide 36

Examples

2. How many different letter arrangements can be made from the 11 letters of MISSISSIPPI?

Solution: There are 1 M, 4 I, 4 S, 2 P letters.

Method 1: consider the distinct permutations directly:
11! / (1!·4!·4!·2!) = 34,650

Method 2: consider combinations (choose positions letter by letter):
C(11,1)·C(10,4)·C(6,4)·C(2,2) = 34,650
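Both counting methods can be checked with Python's math module:

```python
from math import factorial, comb

# Method 1: permutations of 11 letters with repeats 1 M, 4 I, 4 S, 2 P.
m1 = factorial(11) // (factorial(1) * factorial(4) * factorial(4) * factorial(2))

# Method 2: choose positions letter by letter (M, then I's, S's, P's).
m2 = comb(11, 1) * comb(10, 4) * comb(6, 4) * comb(2, 2)

print(m1, m2)  # both 34650
```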


Stat 110B, UCLA, Ivo DinovSlide 37

Examples

3. There are N telephones, and any 2 phones are connected by 1 line. How many lines are needed all together?

Solution: C(N,2) = N(N − 1)/2

If N = 5, the complete graph with 5 nodes has C(5,2) = 10 edges.

Stat 110B, UCLA, Ivo DinovSlide 38

Binomial theorem & multinomial theorem

Binomial theorem:

(a + b)^n = Σ_{k=0}^{n} C(n,k) a^k b^{n−k}

Deriving from this (setting a = b = 1), we get the useful formula

C(n,0) + C(n,1) + … + C(n,n) = (1 + 1)^n = 2^n

Also, from (1+x)^{m+n} = (1+x)^m (1+x)^n we obtain:

C(m+n, k) = Σ_{i=0}^{k} C(m,i)·C(n, k−i)

On the left is the coefficient of x^{m+n−k}. On the right is the same coefficient in the product of (… + C(m,i)·x^{m−i} + …) × (… + C(n, k−i)·x^{n−k+i} + …).

Binomial theorem

Stat 110B, UCLA, Ivo DinovSlide 39

Multinomial theorem

(a + b)^n = Σ_{k=0}^{n} C(n,k) a^k b^{n−k}

Generalization: divide n distinctive objects into r groups, with the sizes of the groups n1, …, nr, where n1 + n2 + … + nr = n:

(x1 + x2 + … + xr)^n = Σ C(n; n1, n2, …, nr) · x1^{n1} x2^{n2} … xr^{nr}

where the multinomial coefficient is

C(n; n1, n2, …, nr) = C(n, n1) · C(n−n1, n2) · … · C(n−n1−…−n_{r−1}, nr) = n! / (n1!·n2!·…·nr!)

Stat 110B, UCLA, Ivo DinovSlide 40

Stirling's Formula for the asymptotic behavior of n!

Stirling's formula:

n! ≈ √(2πn) × (n/e)^n

Stat 110B, UCLA, Ivo DinovSlide 41

Probability and Venn diagrams

Proposition (inclusion-exclusion):

P(A1 ∪ A2 ∪ … ∪ An) =
Σ_{i=1}^{n} P(Ai) − Σ_{i1<i2} P(Ai1 Ai2) + …
+ (−1)^{r+1} Σ_{i1<i2<…<ir} P(Ai1 Ai2 … Air) + …
+ (−1)^{n+1} P(A1 A2 … An)

Stat 110B, UCLA, Ivo Dinov Slide 42

Discrete Variables, Probabilities


Stat 110B, UCLA, Ivo DinovSlide 43

Binomial Probabilities – the moment we all have been waiting for!

• Suppose X ~ Binomial(n, p); then the probability

P(X = x) = C(n,x) · p^x (1 − p)^{n−x}, 0 <= x <= n

• where the binomial coefficients are defined by

C(n,x) = n! / (x!·(n−x)!), with n! = 1·2·3·…·(n−1)·n (n-factorial)

Stat 110B, UCLA, Ivo DinovSlide 44

Expected values

• The game of chance: cost to play: $1.50; prizes are {$1, $2, $3}, and the probabilities of winning each prize are {0.6, 0.3, 0.1}, respectively.

Prize ($) x:        1    2    3
Probability pr(x):  0.6  0.3  0.1

What we would "expect" from 100 games: 0.6×100 wins of $1, 0.3×100 wins of $2, and 0.1×100 wins of $3. Total prize money = Sum; average prize money = Sum/100 = 1×0.6 + 2×0.3 + 3×0.1 = 1.5

• Should we play the game? What are our chances of winning/losing?

Theoretically Fair Game: the price to play equals the expected return!
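A quick check of the expected-value arithmetic in Python:

```python
prizes = [1, 2, 3]
probs = [0.6, 0.3, 0.1]
cost = 1.50

# E[prize] = sum of (value x probability).
expected_prize = sum(x * p for x, p in zip(prizes, probs))
print(expected_prize)            # 1.5
print(expected_prize - cost)     # ~0.0 -> a theoretically fair game
```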

Stat 110B, UCLA, Ivo DinovSlide 45

For the Binomial distribution, the mean is E(X) = n·p and sd(X) = √(n·p·(1 − p)).

X ~ Binomial(n, p) ⇒ X = Y1 + Y2 + Y3 + … + Yn, where Yk ~ Bernoulli(p),

E(Y1) = p ⇒ E(X) = E(Y1 + Y2 + Y3 + … + Yn) = np

Stat 110B, UCLA, Ivo DinovSlide 46

Poisson Distribution – Definition

• Used to model counts – the number of arrivals (k) in a given interval …

• The Poisson distribution is also sometimes referred to as the distribution of rare events. Examples of Poisson-distributed variables are the number of accidents per person, the number of sweepstakes won per person, or the number of catastrophic defects found in a production process.

Stat 110B, UCLA, Ivo Dinov Slide 47

Functional Brain Imaging - Positron Emission Tomography (PET)

http://www.nucmed.buffalo.edu

Stat 110B, UCLA, Ivo Dinov Slide 48

Poisson Distribution – Mean

• Used to model counts – the number of arrivals (k) in a given interval …

• If Y ~ Poisson(λ), then P(Y = k) = λ^k e^{−λ} / k!, k = 0, 1, 2, …

• The mean of Y is µY = λ, since

E(Y) = Σ_{k=0}^{∞} k · e^{−λ} λ^k / k!
     = Σ_{k=1}^{∞} k · e^{−λ} λ^k / k!
     = λ e^{−λ} · Σ_{k=1}^{∞} λ^{k−1} / (k−1)!
     = λ e^{−λ} · Σ_{k=0}^{∞} λ^k / k!
     = λ e^{−λ} e^{λ} = λ
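The pmf and its mean can be verified numerically by truncating the infinite sums (plain Python; λ = 4 is an arbitrary choice for illustration):

```python
from math import exp, factorial

lam = 4.0

def pmf(k):
    # Poisson pmf: P(Y = k) = lam^k e^{-lam} / k!
    return lam ** k * exp(-lam) / factorial(k)

# Truncate the infinite sums at 100 terms (remainder is negligible).
total = sum(pmf(k) for k in range(100))
mean = sum(k * pmf(k) for k in range(100))
print(round(total, 6), round(mean, 6))   # 1.0 and 4.0
```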


Stat 110B, UCLA, Ivo DinovSlide 49

Poisson Distribution - Variance

• If Y ~ Poisson(λ), then P(Y = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, …

• The variance of Y is σY = λ^{1/2}, since

Var(Y) = σY^2 = Σ_{k=0}^{∞} (k − λ)^2 · e^{−λ} λ^k / k! = … = λ

• For example, suppose that Y denotes the number of blocked shots (arrivals) in a randomly sampled game for the UCLA Bruins men's basketball team. Then a Poisson distribution with mean = 4 may be used to model Y.

Stat 110B, UCLA, Ivo DinovSlide 50

Poisson as an approximation to Binomial

• Suppose we have a sequence of Binomial(n, p_n) models, with lim (n·p_n) → λ as n → ∞.

• For each 0 <= y <= n, if Yn ~ Binomial(n, p_n), then

P(Yn = y) = C(n,y) · p_n^y (1 − p_n)^{n−y}

• But this converges to:

C(n,y) · p_n^y (1 − p_n)^{n−y} → λ^y e^{−λ} / y!, as n → ∞ (WHY?)

• Thus, Binomial(n, p_n) → Poisson(λ)

Stat 110B, UCLA, Ivo DinovSlide 51

Poisson as an approximation to Binomial

• The rule of thumb is that the approximation is good if:

• n >= 100
• p <= 0.01
• λ = n·p <= 20

• Then, Binomial(n, p_n) ≈ Poisson(λ)

Stat 110B, UCLA, Ivo DinovSlide 52

Example using Poisson approx to Binomial

• Suppose P(defective chip) = 0.0001 = 10^{−4}. Find the probability that a lot of 25,000 chips has more than 2 defective!

• Y ~ Binomial(25,000, 0.0001); find P(Y > 2). Note that Z ~ Poisson(λ = n·p = 25,000 × 0.0001 = 2.5), so

P(Z > 2) = 1 − P(Z <= 2) = 1 − Σ_{z=0}^{2} e^{−2.5} 2.5^z / z!
         = 1 − (e^{−2.5}·2.5^0/0! + e^{−2.5}·2.5^1/1! + e^{−2.5}·2.5^2/2!) = 0.456
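The example can be checked in Python, comparing the exact Binomial tail with the Poisson approximation:

```python
from math import comb, exp, factorial

n, p = 25_000, 0.0001
lam = n * p   # 2.5

# Exact Binomial tail P(Y > 2) = 1 - P(Y <= 2).
exact = 1 - sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(3))

# Poisson(2.5) approximation of the same tail.
approx = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(3))

print(round(exact, 4), round(approx, 4))   # both ~0.456
```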

Stat 110B, UCLA, Ivo DinovSlide 53

Geometric, Hypergeometric, Negative Binomial

• If X ~ Geometric(p), then the probability mass function is

P(X = x) = p·(1 − p)^{x−1} ; E(X) = 1/p ; Var(X) = (1 − p)/p^2

Probability that the first "failure" occurs at the x-th trial.

• Ex: The Stat dept purchases 40 light bulbs; 5 are defective. Bulbs are used one at a time.

Find: P(3rd bulb used is the first that does not work) = ?

Stat 110B, UCLA, Ivo DinovSlide 54

Geometric, Hypergeometric, Negative Binomial

• Hypergeometric – X ~ HyperGeom(x; N, n, M)
Total objects: N. Successes: M. Sample size: n (drawn without replacement). X = number of successes in the sample:

P(X = x) = C(M,x) · C(N−M, n−x) / C(N,n)

E(X) = n·M/N ; Var(X) = n · (M/N) · (1 − M/N) · (N−n)/(N−1)

• Ex: 40 components in a lot; 3 components are defective.

Select 5 components at random.

P(obtain one defective) = P(X = 1) = ?


Stat 110B, UCLA, Ivo DinovSlide 55

Hypergeometric Distribution & Binomial

• Binomial approximation to Hypergeometric: when n/N is small (usually n/N < 0.1), M/N ≈ p and

HyperGeom(x; N, n, M) ≈ Bin(x; n, p = M/N)

• Ex: 4,000 out of 10,000 residents are against a new tax. 15 residents are selected at random.

P_HyperGeom(at most 7 favor the new tax) = ? (0.78706)
Demo: Applets.dir/ProbCalc.htm (P_Bin(Y <= 7) = 0.7869)
HyperGeom(x; N = 10^4, n = 15, M = 4×10^3) ≈ Bin(x; n = 15, p = 0.4)
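Both numbers in the example can be reproduced exactly with math.comb:

```python
from math import comb

N, M, n = 10_000, 4_000, 15
p = M / N   # 0.4

def hyper_pmf(x):
    # Hypergeometric pmf: C(M,x)*C(N-M,n-x)/C(N,n).
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x):
    # Binomial pmf with p = M/N.
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_hyper = sum(hyper_pmf(x) for x in range(8))   # P(X <= 7)
p_binom = sum(binom_pmf(x) for x in range(8))

print(round(p_hyper, 5), round(p_binom, 5))     # ~0.78706 vs ~0.7869
```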

Stat 110B, UCLA, Ivo DinovSlide 56

Geometric, Hypergeometric, Negative Binomial

• Negative binomial pmf [X ~ NegBin(r, p); if r = 1 ⇒ Geometric(p)]

Number of trials until the r-th success (negative, since the number of successes (r) is fixed & the number of trials (X) is random):

P(X = n) = C(n−1, r−1) · p^r (1 − p)^{n−r}

E(X) = r/p ; Var(X) = r(1 − p)/p^2

Example: find E(X) and Var(X), where X = the number of times one must roll a die until the outcome 1 occurs 4 times:
X ~ NegBin(x; r = 4, p = 1/6), so E(X) = 24 and Var(X) = 120.
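A simulation sketch of the die example in plain Python; the empirical mean and variance should be close to the theoretical E(X) = 24 and Var(X) = 120:

```python
import random

random.seed(1)
r, p = 4, 1 / 6   # roll a die until "1" appears r = 4 times

def trials_until_rth_success():
    # Count Bernoulli(p) trials until the r-th success.
    successes, n = 0, 0
    while successes < r:
        n += 1
        if random.random() < p:
            successes += 1
    return n

sims = [trials_until_rth_success() for _ in range(100_000)]
mean = sum(sims) / len(sims)
var = sum((s - mean) ** 2 for s in sims) / (len(sims) - 1)
print(round(mean, 1), round(var, 0))   # theory: 24 and 120
```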

Stat 110B, UCLA, Ivo DinovSlide 57

Continuous RV’s

• A RV is continuous if it can take on any real value in a non-trivial interval (a, b).

• The PDF (probability density function) of a continuous RV Y is a non-negative function pY(y), defined for every real value y, such that for each interval (a, b) the probability that Y takes on a value in (a, b), P(a < Y < b), equals the area under pY(y) over the interval (a, b).

[Figure: a density curve pY(y) with the area between a and b shaded, equal to P(a < Y < b).]

Stat 110B, UCLA, Ivo DinovSlide 58

Convergence of density histograms to the PDF

• For a continuous RV, the density histograms converge to the PDF as the size of the bins goes to zero.

Stat 110B, UCLA, Ivo DinovSlide 59

Measures of central tendency/variability for Continuous RVs

• Mean: µY = ∫_{−∞}^{∞} y · pY(y) dy

• Variance: σY^2 = ∫_{−∞}^{∞} (y − µY)^2 · pY(y) dy

• SD: σY = √[ ∫_{−∞}^{∞} (y − µY)^2 · pY(y) dy ]

Stat 110B, UCLA, Ivo Dinov Slide 60

Facts about PDF’s of continuous RVs

• Non-negative: pY(y) >= 0, for all y

• Completeness: ∫_{−∞}^{∞} pY(y) dy = 1

• Probability: P(a < Y < b) = ∫_{a}^{b} pY(y) dy


Stat 110B, UCLA, Ivo DinovSlide 61

Continuous Distributions

• Uniform distribution
• Normal distribution
• Student's T distribution
• F-distribution
• Chi-squared (χ^2) distribution
• Cauchy's distribution
• Exponential distribution
• Poisson distribution, …

Stat 110B, UCLA, Ivo DinovSlide 62

(Continuous) Uniform Distribution

• X ~ Uniform distribution with parameters α and β if

f(x) = 1/(β − α) for α < x < β ; 0 otherwise

E(X) = (α + β)/2 ; Var(X) = (β − α)^2 / 12

Ex) Uniform, α = 2, β = 7:

(a) P(X >= 4) = (7 − 4)/5 = 0.6
(b) P(3 < X < 5.5) = (5.5 − 3)/5 = 0.5

• Random numbers follow the Uniform distribution between 0 and 1.

Stat 110B, UCLA, Ivo DinovSlide 63

(General) Normal Distribution

• Normal Distribution PDF: Y ~ Normal(µ, σ^2) ⇒

pY(y) = (1/(σ√(2π))) · e^{−(y−µ)^2/(2σ^2)}, for all −∞ < y < ∞

FY(y) = ∫_{−∞}^{y} pY(x) dx = ∫_{−∞}^{y} (1/(σ√(2π))) · e^{−(x−µ)^2/(2σ^2)} dx

Stat 110B, UCLA, Ivo DinovSlide 64

Continuous Distributions – Student's T

• Student's T distribution [an approximation of Normal(0,1)]
• Y1, Y2, …, YN IID from a Normal(µ, σ^2)
• The variance σ^2 is unknown

• In 1908, William Gosset (pseudonym Student) derived the exact sampling distribution of the following statistic:

T = (Ȳ − µ) / σ̂_Ȳ, where σ̂^2 = (1/(N−1)) · Σ_{k=1}^{N} (Yk − Ȳ)^2 and σ̂_Ȳ = σ̂/√N

• T ~ Student(df = N − 1)

Stat 110B, UCLA, Ivo DinovSlide 65

Density curves for Student’s t

[Figure 7.6.1: Student(df) density curves for df = 2, df = 5, and df = ∞ (i.e., Normal(0,1)), plotted over the range −4 to 4.]

We will come back to theT-distribution at the endof this chapter!

Stat 110B, UCLA, Ivo DinovSlide 66

Continuous Distributions – χ^2 [Chi-Square]

• χ^2 [Chi-Square] goodness-of-fit test:
• Let {X1, X2, …, XN} be IID N(0, 1)
• W = X1^2 + X2^2 + X3^2 + … + XN^2
• Then W ~ χ^2(df = N), with E(W) = N and Var(W) = 2N
• Note: if {Y1, Y2, …, YN} are IID N(µ, σ^2), then with

SD^2(Y) = (1/(N−1)) · Σ_{k=1}^{N} (Yk − Ȳ)^2

the statistic W = (N−1)·SD^2(Y)/σ^2 ~ χ^2(df = N−1)


Stat 110B, UCLA, Ivo DinovSlide 68

Continuous Distributions – F-distribution

• The F-distribution is the ratio of two χ^2 random variables (each divided by its degrees of freedom).

• Snedecor's F distribution is most commonly used in tests of variance (e.g., ANOVA). The ratio of two chi-squares divided by their respective degrees of freedom is said to follow an F distribution:

SD^2(Y) = (1/(N−1)) · Σ_{k=1}^{N} (Yk − Ȳ)^2 ; SD^2(X) = (1/(M−1)) · Σ_{l=1}^{M} (Xl − X̄)^2

W_Y = (N−1)·SD^2(Y)/σ_Y^2 ; W_X = (M−1)·SD^2(X)/σ_X^2

F_o = (W_Y/(N−1)) / (W_X/(M−1)) ~ F(df1 = N−1, df2 = M−1)

Stat 110B, UCLA, Ivo DinovSlide 71

Continuous Distributions – Cauchy's

• Cauchy's distribution, X ~ Cauchy(t, s), t = location, s = scale
• PDF(X): f(x) = 1 / (sπ·(1 + ((x − t)/s)^2)), x ∈ R (reals)
• PDF of the standard Cauchy(0, 1): f(x) = 1 / (π·(1 + x^2))

• The Cauchy distribution is (theoretically) important as an example of a pathological case. Cauchy distributions look similar to a normal distribution; however, they have much heavier tails. When studying hypothesis tests that assume normality, seeing how the tests perform on data from a Cauchy distribution is a good indicator of how sensitive the tests are to heavy-tail departures from normality. The mean and standard deviation of the Cauchy distribution are undefined!!! The practical meaning of this is that collecting 1,000 data points gives no more accurate an estimate of the mean and standard deviation than does a single point. (Cauchy = T_{df=1}; as df → ∞, T_{df} → Normal.)

Stat 110B, UCLA, Ivo DinovSlide 72

Continuous Distributions – Exponential

• Exponential distribution, X ~ Exponential(λ):

f(x) = λe^{−λx}, x >= 0

• The exponential model, with only one unknown parameter, is the simplest of all life distribution models.
• E(X) = 1/λ; Var(X) = 1/λ^2
• Another name for the exponential mean is the Mean Time To Fail, or MTTF, and we have MTTF = 1/λ.
• If X is the time between occurrences of rare events that happen on average with a rate λ per unit of time, then X is distributed exponentially with parameter λ. Thus, the exponential distribution is frequently used to model the time interval between successive random events. Examples of variables distributed in this manner would be the gap length between cars crossing an intersection, the lifetimes of electronic devices, or the arrivals of customers at the check-out counter in a grocery store.

Stat 110B, UCLA, Ivo DinovSlide 73

Continuous Distributions – Exponential

• Exponential distribution, example:

• On weeknight shifts between 6 pm and 10 pm, there are an average of 5.2 calls to the UCLA medical emergency number. Let X measure the time needed for the first call on such a shift. Find the probability that the first call arrives (a) between 6:15 and 6:45, (b) before 6:30. Also find the median time needed for the first call. (34.578%; 72.865%)

• We must first determine the correct mean of this exponential distribution. If we consider the time interval to be 4×60 = 240 minutes, then on average there is a call every 240/5.2 (or 46.15) minutes. Then X ~ Exp(1/46), [E(X) = 46], measures the time in minutes after 6:00 pm until the first call.

By-hand vs. ProbCalc.htm
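A sketch of the computation in plain Python. The slide's quoted 34.578% for part (a) is reproduced up to rounding of the mean gap (240/5.2 ≈ 46.15 min vs. the slide's 46); the other printed values are this sketch's own results:

```python
from math import exp, log

rate = 5.2 / 240                     # calls per minute; mean gap ~ 46.15 min

def F(t):
    # CDF of the exponential waiting time: P(X <= t).
    return 1 - exp(-rate * t)

p_a = F(45) - F(15)                  # first call between 6:15 and 6:45
p_b = F(30)                          # first call before 6:30
median = log(2) / rate               # half the shifts see a call by this time

print(round(p_a, 4), round(p_b, 4), round(median, 1))
```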

Stat 110B, UCLA, Ivo DinovSlide 74

Normal approximation to Binomial

• Suppose Y ~ Binomial(n, p)
• Then Y = Y1 + Y2 + Y3 + … + Yn, where

• Yk ~ Bernoulli(p), E(Yk) = p & Var(Yk) = p(1 − p) ⇒

• E(Y) = np & Var(Y) = np(1 − p), SD(Y) = (np(1 − p))^{1/2}

• Standardize Y:
• Z = (Y − np) / (np(1 − p))^{1/2}

• By the CLT, Z ~ N(0, 1). So Y ~ N[np, (np(1 − p))^{1/2}]

• The Normal approximation to the Binomial is reasonable when np >= 10 & n(1 − p) > 10 (p & (1 − p) are NOT too small relative to n).

Stat 110B, UCLA, Ivo DinovSlide 75

Normal approximation to Binomial – Example

- Roulette wheel investigation: Compute P(Y ≥ 58), where Y ~ Binomial(100, 0.47) –

the probability that a Binomial(100, 0.47) process yields at least 58 reds (successes) out of 100 roulette spins (trials).

- Since np = 47 ≥ 10 & n(1−p) = 53 > 10, the Normal approximation is justified.

- z = (y − np)/sqrt(np(1−p)) = (58 − 100×0.47)/sqrt(100×0.47×0.53) ≈ 2.2

- P(Y ≥ 58) ≈ P(Z ≥ 2.2) = 0.0139. True P(Y ≥ 58) = 0.0177, using SOCR (demo!). The Normal approximation is useful when no access to SOCR is available.

A roulette wheel has 38 slots: 18 red, 18 black, 2 neutral.
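A short Python sketch of the approximation, with the exact binomial tail computed alongside for comparison (an illustration, not part of the course software; `math.comb` requires Python 3.8+):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, y = 100, 0.47, 58
mu = n * p                       # E(Y) = np = 47
sd = math.sqrt(n * p * (1 - p))  # SD(Y) = sqrt(np(1-p)) ≈ 4.99

z = (y - mu) / sd                # standardized value, ≈ 2.2
p_approx = 1.0 - phi(z)          # P(Y >= 58) ≈ P(Z >= 2.2)

# exact tail: P(Y >= 58) = sum_{k=58}^{100} C(100,k) p^k (1-p)^(100-k)
p_exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
              for k in range(y, n + 1))

print(round(z, 2), round(p_approx, 4), round(p_exact, 4))
```

The approximation can be sharpened with a continuity correction (standardize 57.5 instead of 58), which brings it much closer to the exact tail.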

Page 13: UCLA STAT 110B Ivo Dinov Text: Statistics for …dinov/courses_students.dir/03/Spr...Text: Introduction to Probability and Statistics for Engineering and the Sciences 5th edition--

13

Stat 110B, UCLA, Ivo DinovSlide 76

Normal approximation to Poisson

- Let X1 ~ Poisson(λ) & X2 ~ Poisson(µ). Then X1 + X2 ~ Poisson(λ + µ).

- Let X1, X2, X3, …, Xk ~ Poisson(λ), independent. Then Yk = X1 + X2 + ··· + Xk ~ Poisson(kλ), with E(Yk) = Var(Yk) = kλ.

- The random variables in the sum on the right are independent and each has the Poisson distribution with parameter λ.

- By CLT the distribution of the standardized variable (Yk − kλ) / (kλ)^(1/2) → N(0, 1) as k increases to infinity.

- So, for kλ ≥ 100, Zk = (Yk − kλ) / (kλ)^(1/2) ~ N(0,1), i.e., Yk ~ N(kλ, (kλ)^(1/2)).

Stat 110B, UCLA, Ivo DinovSlide 77

Normal approximation to Poisson – example

- Let X1 ~ Poisson(λ) & X2 ~ Poisson(µ). Then X1 + X2 ~ Poisson(λ + µ).

- Let X1, X2, X3, …, X200 ~ Poisson(2), independent. Then Yk = X1 + X2 + ··· + X200 ~ Poisson(400), with E(Yk) = Var(Yk) = 400.

- By CLT the distribution of the standardized variable (Yk − 400) / (400)^(1/2) → N(0, 1) as k increases to infinity.

- Zk = (Yk − 400) / 20 ~ N(0,1), so Yk ~ N(400, 400). Then P(2 < Yk < 400) = (standardize 2 & 400) = P( (2−400)/20 < Zk < (400−400)/20 ) = P(−19.9 < Zk < 0) ≈ 0.5
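A minimal Python check of this standardization (illustration only):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lam = 400.0                    # Y ~ Poisson(400): E(Y) = Var(Y) = 400
mu, sd = lam, math.sqrt(lam)   # sd = 20

# P(2 < Y < 400) after standardizing both endpoints
z_lo = (2 - mu) / sd           # = -19.9, essentially -infinity
z_hi = (400 - mu) / sd         # = 0
prob = phi(z_hi) - phi(z_lo)   # ≈ Φ(0) - 0 = 0.5

print(round(prob, 4))
```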

Stat 110B, UCLA, Ivo DinovSlide 78

Poisson or Normal approximation to Binomial?

- Poisson approximation (Binomial(n, p_n) → Poisson(λ)):
  n ≥ 100 & p ≤ 0.01 & λ = np ≤ 20

- Normal approximation (Binomial(n, p) → N(np, (np(1−p))^(1/2))):
  np ≥ 10 & n(1−p) > 10

As n → ∞ with p = λ/n:

  C(n, y) p^y (1 − p)^(n−y)  →  e^(−λ) λ^y / y!

WHY?
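The limit can be observed numerically. Here is a Python sketch holding λ = np fixed while n grows (λ = 5 and y = 3 are arbitrary illustration values, not taken from the slides):

```python
import math

lam, y = 5.0, 3   # target Poisson parameter λ and a fixed count y

def binom_pmf(n, p, k):
    """Binomial pmf C(n,k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

poisson_limit = math.exp(-lam) * lam**y / math.factorial(y)

# as n grows with p = λ/n, the binomial pmf approaches the Poisson pmf
for n in (10, 100, 1000, 10000):
    print(n, round(binom_pmf(n, lam / n, y), 6))
print("Poisson:", round(poisson_limit, 6))
```

The printed binomial values converge toward the Poisson value, which is the "why" behind the rule of thumb above.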

Stat 110B, UCLA, Ivo DinovSlide 79

Areas under Standard Normal Curve – Example

- Many histograms are similar in shape to the standard normal curve. For example, a person's height. The height of all incoming female army recruits is measured for custom training and assignment purposes (e.g., very tall people are inappropriate for constricted-space positions, and very short people may be disadvantaged in certain other situations). The mean height is computed to be 64 in and the standard deviation is 2 in. Only recruits shorter than 65.5 in will be trained for tank operation, and recruits within ½ standard deviation of the mean will have no restrictions on duties. What percentage of the incoming recruits will be trained to operate

armored combat vehicles (tanks)?

� About what percentage of the recruits will have no restrictions on training/duties?

Stat 110B, UCLA, Ivo DinovSlide 80

Areas under Standard Normal Curve - Example � The mean height is 64 in and the standard deviation is 2 in.

� Only recruits shorter than 65.5 in will be trained for tank operation.What percentage of the incoming recruits will be trained to operate armored combat vehicles (tanks)?

� Recruits within ½ standard deviations of the mean will have no restrictions on duties. About what percentage of the recruits will have no restrictions on training/duties?

Tank training: standardize X → (X − 64)/2; for 65.5, z = (65.5 − 64)/2 = ¾, so P(X < 65.5) = P(Z < 0.75) = 0.7734 and the percentage is 77.34%.

No restrictions: for 65, z = (65 − 64)/2 = ½; for 63, z = (63 − 64)/2 = −½; so P(63 < X < 65) = P(−0.5 < Z < 0.5) = 0.3830 and the percentage is 38.30%.
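Both percentages can be reproduced with the standard normal CDF, written here in Python via `math.erf` (illustration only):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sd = 64.0, 2.0   # recruit heights, in inches

# P(X < 65.5): eligible for tank-operation training
p_tank = phi((65.5 - mu) / sd)          # Φ(0.75) ≈ 0.7734

# P(63 < X < 65): within half a standard deviation of the mean
p_unrestricted = phi(0.5) - phi(-0.5)   # ≈ 0.3829

print(round(100 * p_tank, 2), round(100 * p_unrestricted, 2))
```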

Stat 110B, UCLA, Ivo DinovSlide 81

Gamma and Exponential Distributions

- Gamma distribution. Gamma function:

  Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx, α > 0

Properties:

  Γ(α) = (α − 1)Γ(α − 1)
  Γ(n) = (n − 1)! for positive integer n
  Γ(1) = 1, Γ(0.5) = √π

- X ~ Gamma with parameters α > 0 and β > 0 if

  f(x) = [1 / (β^α Γ(α))] x^(α−1) e^(−x/β), x > 0
  f(x) = 0, otherwise


Stat 110B, UCLA, Ivo DinovSlide 82

Gamma and Exponential Distributions

- Exponential distribution (cont'd)

CDF: F(x) = P(X ≤ x) = ∫₀^x (1/β) e^(−t/β) dt = 1 − e^(−x/β), x > 0

Tail probability: P(X > x) = 1 − F(x) = e^(−x/β), x > 0

Ex 1) X = response time at a certain on-line computer terminal; X ~ exponential with E(X) = 5 (sec.).

(a) P(X ≤ 10) = ?
(b) P(5 ≤ X ≤ 10) = ?
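A small Python sketch answering (a) and (b) with the CDF above (illustration only):

```python
import math

beta = 5.0   # E(X) = β = 5 seconds

def F(x):
    """Exponential CDF, F(x) = 1 - e^(-x/β)."""
    return 1.0 - math.exp(-x / beta)

p_a = F(10)          # (a) P(X <= 10) = 1 - e^(-2) ≈ 0.8647
p_b = F(10) - F(5)   # (b) P(5 <= X <= 10) = e^(-1) - e^(-2) ≈ 0.2325

print(round(p_a, 4), round(p_b, 4))
```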

Stat 110B, UCLA, Ivo DinovSlide 83

#events in time t: Poisson with mean λt

Gamma and Exponential Distributions

Relationship to the Poisson process: if the number of events in any time interval t has a Poisson distribution with parameter λt, then the distribution of the elapsed time between two successive events is exponential with parameter β = 1/λ.

Why? Poisson: P(no events in t) = P(0; λt) = (λt)⁰ e^(−λt) / 0! = e^(−λt)

Let X = time until the first event. Then P(no events in t) = P(X > t) = e^(−λt),

i.e., P(0 ≤ X ≤ t) = 1 − e^(−λt), which is the CDF of the exponential with β = 1/λ.

X ~ Exponential with mean 1/λ.

Stat 110B, UCLA, Ivo DinovSlide 84

Lognormal Distribution

- X ~ lognormal with parameters µ and σ if ln(X) ~ N(µ, σ²), i.e.,

  f(x) = [1 / (√(2π) σ x)] e^(−(ln x − µ)² / (2σ²)), x ≥ 0
  f(x) = 0, otherwise

- E(X) = exp(µ + σ²/2)
  Var(X) = exp(2µ + σ²){exp(σ²) − 1}

Ex) Let X ~ lognormal with parameters µ = 3.2 and σ = 1.

  P(X > 8) = ?
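P(X > 8) follows by moving to the normal scale: P(X > 8) = P(ln X > ln 8) = P(Z > (ln 8 − µ)/σ). A Python sketch (the numeric answer, ≈ 0.869, is computed here rather than quoted from the slide):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3.2, 1.0

# P(X > 8) = P(ln X > ln 8) = P(Z > (ln 8 - µ)/σ)
z = (math.log(8) - mu) / sigma    # ≈ -1.12
p_tail = 1.0 - phi(z)             # ≈ 0.869

# moments of the lognormal, from the formulas above
mean = math.exp(mu + sigma**2 / 2)
var = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)

print(round(p_tail, 3), round(mean, 1))
```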

Stat 110B, UCLA, Ivo DinovSlide 85

Weibull Distribution

- X ~ Weibull distribution with parameters α and β if

  f(x) = αβ x^(β−1) e^(−αx^β), x > 0
  f(x) = 0, otherwise

- If β = 1, f(x) = α e^(−αx) (exponential with parameter 1/α)

- E(X) = α^(−1/β) Γ(1 + 1/β)
  Var(X) = α^(−2/β) {Γ(1 + 2/β) − [Γ(1 + 1/β)]²}
  CDF: F(x) = 1 − e^(−αx^β)

- Useful in reliability and life-testing problems

Stat 110B, UCLA, Ivo DinovSlide 86

Beta Distribution

- Provides positive density only on an interval of finite length.

X ~ Beta distribution with parameters α and β if

  f(x) = [Γ(α + β) / (Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1), 0 < x < 1 (α > 0, β > 0)
  f(x) = 0, otherwise

  E(X) = α / (α + β),  Var(X) = αβ / [(α + β)²(α + β + 1)]

Ex) X = proportion of TV sets requiring service during the first year, X ~ beta, α = 3, β = 2.

  P(at least 80% of the models sold this year will require service in 1 year) = P(X ≥ 0.8) = ?
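A Python check of the exercise: for Beta(3, 2) the density is 12x²(1 − x), whose antiderivative 4x³ − 3x⁴ gives the tail probability directly (illustration only):

```python
import math

alpha, beta_p = 3, 2

def beta_pdf(x):
    """Beta(3, 2) density: Γ(5)/(Γ(3)Γ(2)) x^2 (1-x) = 12 x^2 (1-x)."""
    c = math.gamma(alpha + beta_p) / (math.gamma(alpha) * math.gamma(beta_p))
    return c * x**(alpha - 1) * (1 - x)**(beta_p - 1)

# P(X >= 0.8) via the antiderivative of 12x^2 - 12x^3, namely 4x^3 - 3x^4
G = lambda x: 4 * x**3 - 3 * x**4
p_service = G(1.0) - G(0.8)        # = 1 - 0.8192 = 0.1808

mean = alpha / (alpha + beta_p)    # E(X) = 3/5
print(round(p_service, 4), mean)
```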

Stat 110B, UCLA, Ivo Dinov Slide 87

Marginal & Joint PDF's; Central Limit Theorem (CLT)


Stat 110B, UCLA, Ivo DinovSlide 88

Joint probability mass function

- The joint probability mass function of the discrete random variables X and Y, denoted f_XY(x,y), satisfies:

  (1) f_XY(x, y) ≥ 0
  (2) Σ_x Σ_y f_XY(x, y) = 1
  (3) f_XY(x, y) = P(X = x, Y = y)

Stat 110B, UCLA, Ivo DinovSlide 89

Joint probability mass function – example. The joint distribution, P{X = i, Y = k}, of the number of minutes waiting to catch the first fish, X, and the number of minutes waiting to catch the second fish, Y, is given below.

  P{X = i, Y = k}       k = 1   k = 2   k = 3   Row sum P{X = i}
  i = 1                 0.01    0.02    0.08    0.11
  i = 2                 0.01    0.02    0.08    0.11
  i = 3                 0.07    0.08    0.63    0.78
  Column sum P{Y = k}   0.09    0.12    0.79    1.00

• The (joint) chance of waiting 3 minutes to catch the first fish and 3 minutes to catch the second fish is:
• The (marginal) chance of waiting 3 minutes to catch the first fish is:
• The (marginal) chance of waiting 2 minutes to catch the first fish is (circle all that are correct):
• The chance of waiting at least two minutes to catch the first fish is (circle none, one or more):
• The chance of waiting at most two minutes to catch the first fish and at most two minutes to catch the second fish is (circle none, one or more):
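The bulleted questions can be answered directly from the joint table; a Python sketch (illustration only):

```python
# joint pmf P(X=i, Y=k) for the fishing example; rows X = 1, 2, 3, columns Y = 1, 2, 3
joint = {
    (1, 1): 0.01, (1, 2): 0.02, (1, 3): 0.08,
    (2, 1): 0.01, (2, 2): 0.02, (2, 3): 0.08,
    (3, 1): 0.07, (3, 2): 0.08, (3, 3): 0.63,
}

# marginal pmf of X: sum each row over k
px = {i: sum(joint[(i, k)] for k in (1, 2, 3)) for i in (1, 2, 3)}

p_joint_33 = joint[(3, 3)]          # P(X=3, Y=3)
p_x3 = px[3]                        # P(X=3)
p_x2 = px[2]                        # P(X=2)
p_x_ge2 = px[2] + px[3]             # P(X >= 2)
p_both_le2 = sum(joint[(i, k)] for i in (1, 2) for k in (1, 2))  # P(X<=2, Y<=2)

print(p_joint_33, round(p_x3, 2), round(p_x2, 2),
      round(p_x_ge2, 2), round(p_both_le2, 2))
```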

Stat 110B, UCLA, Ivo DinovSlide 90

Marginal probability distributions

� Individual probability distribution of a random variable is referred to as its Marginal Probability Distribution.

� Marginal probability distribution of X can be determined from the joint probability distribution of X and other random variables.

- Example: the marginal probability distribution of X is found by summing the joint probabilities over all values of y; for the marginal of Y, the summation is done over all values of x.

Stat 110B, UCLA, Ivo DinovSlide 91

Marginal probability distributions (Cont.)

- If X and Y are discrete random variables with joint probability mass function f_XY(x,y), then the marginal probability mass functions of X and Y are

  f_X(x) = P(X = x) = Σ_{R_x} f_XY(x, y)
  f_Y(y) = P(Y = y) = Σ_{R_y} f_XY(x, y)

where R_x denotes the set of all points in the range of (X, Y) for which X = x, and R_y denotes the set of all points in the range of (X, Y) for which Y = y.

Stat 110B, UCLA, Ivo DinovSlide 92

Mean and Variance

- If the marginal probability distribution of X has probability function f_X(x), then

  E(X) = µ_X = Σ_x x f_X(x) = Σ_x x [ Σ_{R_x} f_XY(x, y) ] = Σ_R x f_XY(x, y)

  V(X) = σ_X² = Σ_x (x − µ_X)² f_X(x) = Σ_R (x − µ_X)² f_XY(x, y) = Σ_R x² f_XY(x, y) − µ_X²

- R = set of all points in the range of (X, Y).

- Example.

Stat 110B, UCLA, Ivo DinovSlide 93

Central Limit Theorem – heuristic formulation

When sampling from almost any distribution, the sample mean X̄ is approximately Normally distributed in large samples.

Show Sampling Distribution Simulation Applet: file:///C:/Ivo.dir/UCLA_Classes/Winter2002/AdditionalInstructorAids/SamplingDistributionApplet.html


Stat 110B, UCLA, Ivo DinovSlide 94

Central Limit Theorem – theoretical formulation

Let {X1, X2, …, Xk, …} be a sequence of independent observations from one specific random process. Let E(X) = µ and SD(X) = σ, both finite (0 < σ < ∞; |µ| < ∞). For the sample average

  X̄ = (1/n) Σ_{k=1}^n Xk,

X̄ has a distribution which approaches N(µ, σ²/n) as n → ∞.
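A quick Monte Carlo illustration of the theorem in Python (Uniform(0,1) parent, so µ = 0.5 and σ² = 1/12; sample size n = 48 and 4000 replications are arbitrary choices for this sketch):

```python
import random
import statistics

random.seed(0)

# draw many sample means from a decidedly non-normal parent, Uniform(0,1):
# the CLT says X̄ with n = 48 should be approximately N(0.5, 1/(12*48))
n, reps = 48, 4000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(round(statistics.fmean(means), 3),    # ≈ 0.5
      round(statistics.pstdev(means), 3))   # ≈ sqrt(1/(12*48)) ≈ 0.042
```

A histogram of `means` would show the familiar bell shape even though the parent distribution is flat.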

Stat 110B, UCLA, Ivo DinovSlide 95

The standard error of the mean

The standard error of the sample mean is an estimate of the SD of the sample mean

- i.e., a measure of the precision of the sample mean as an estimate of the population mean,

given by SE(x̄) = (sample standard deviation) / √(sample size) = s/√n.

- Note the similarity with SD(X̄) = σ/√n.

Stat 110B, UCLA, Ivo DinovSlide 96

TABLE 7.2.1 Cavendish's Determinations of the Mean Density of the Earth (g/cm³)

  5.50 5.61 4.88 5.07 5.26 5.55 5.36 5.29 5.58 5.65
  5.57 5.53 5.62 5.29 5.44 5.34 5.79 5.10 5.27 5.39
  5.42 5.47 5.63 5.34 5.46 5.30 5.75 5.68 5.85

Source: Cavendish [1798].

Cavendish's 1798 data on the mean density of the Earth, g/cm³, relative to that of H2O.

Sample mean x̄ = 5.447931 g/cm³ and sample SD S_X = 0.2209457 g/cm³.

Then the standard error for these data is:

  SE(X̄) = S_X/√n = 0.2209457/√29 = 0.04102858
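The summary statistics can be reproduced in Python (illustration only):

```python
import math
import statistics

# Cavendish's 29 determinations of the Earth's mean density (g/cm^3)
density = [5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
           5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
           5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85]

n = len(density)                  # 29
xbar = statistics.fmean(density)  # sample mean ≈ 5.4479
s = statistics.stdev(density)     # sample SD ≈ 0.2209
se = s / math.sqrt(n)             # SE(x̄) ≈ 0.0410

print(n, round(xbar, 4), round(s, 4), round(se, 4))
```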

Stat 110B, UCLA, Ivo DinovSlide 97

Student's t-distribution

- For random samples from a Normal distribution,

  T = (X̄ − µ) / SE(X̄)

is exactly distributed as Student(df = n − 1),

- but the methods we shall base upon this distribution for T work well even for small samples from distributions which are quite non-Normal.

- df = number of observations − 1, the degrees of freedom.

Recall that for samples from N(µ, σ²),

  Z = (X̄ − µ) / SD(X̄) = (X̄ − µ) / (σ/√n) ~ N(0, 1)

(approximate vs. exact distributions).

Stat 110B, UCLA, Ivo Dinov Slide 98

Inference & Estimation

Stat 110B, UCLA, Ivo DinovSlide 99

Parameters, Estimators, Estimates …

- E.g., we are interested in the population mean diameter (parameter) of washers; the sample-average formula represents an estimator we can use, whereas the value of the sample average for a particular dataset is the estimate (for the mean parameter).

  Parameter: µ = E(Y); estimator: Ȳ = (1/N) Σ_{k=1}^N Yk

  Data: 0.1896, 0.1913, 0.1900

  Estimate: ȳ = (1/3)(0.1896 + 0.1913 + 0.1900) = 0.1903


Stat 110B, UCLA, Ivo DinovSlide 100

A 95% confidence interval

- A type of interval that contains the true value of a parameter for 95% of samples taken is called a 95% confidence interval for that parameter; the ends of the CI are called confidence limits.

- (For the situations we deal with) a confidence interval (CI) for the true value of a parameter is given by

  estimate ± t standard errors (SE)

TABLE 8.1.1 Value of the Multiplier, t, for a 95% CI

  df: 7     8     9     10    11    12    13    14    15    16    17
  t:  2.365 2.306 2.262 2.228 2.201 2.179 2.160 2.145 2.131 2.120 2.110

  df: 18    19    20    25    30    35    40    45    50    60    ∞
  t:  2.101 2.093 2.086 2.060 2.042 2.030 2.021 2.014 2.009 2.000 1.960

Stat 110B, UCLA, Ivo DinovSlide 101

(General) Confidence Interval (CI)

- A level L confidence interval for a parameter θ is an interval (θ̂1, θ̂2), where θ̂1 & θ̂2 are estimators of θ, such that P(θ̂1 < θ < θ̂2) = L.

- E.g., C+E model: Y = µ + ε, where ε ~ N(0, σ²); then by CLT we have Ȳ ~ N(µ, σ²/n), i.e.,

  n^(1/2)(Ȳ − µ)/σ ~ N(0, 1).

- L = P( z_(1−L)/2 < n^(1/2)(Ȳ − µ)/σ < z_(1+L)/2 ), where z_q is the qth quantile.

- E.g., 0.95 = P( z_0.025 < n^(1/2)(Ȳ − µ)/σ < z_0.975 ).

Stat 110B, UCLA, Ivo DinovSlide 102

[Figure: 1000 samples of size 10 and their 95% confidence intervals, with the running sample coverage to date (e.g., 88.9%, 90.0%, …, 95.2%, 96.0%) hovering near 95% as more samples accumulate; most intervals cover the true mean 24.83.]

Figure 8.1.2 Samples of size 10 from a Normal(µ = 24.83, σ = .005) distribution and their 95% confidence intervals for µ.

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.

How many of the previous samples contained the true mean?

Stat 110B, UCLA, Ivo DinovSlide 103

Confidence interval for the true (population) mean µ:

  sample mean ± t standard errors, i.e., x̄ ± t se(x̄), where se(x̄) = SE(x̄) = s/√n and df = n − 1

TABLE 8.1.1 Value of the Multiplier, t, for a 95% CI

  df: 7     8     9     10    11    12    13    14    15    16    17
  t:  2.365 2.306 2.262 2.228 2.201 2.179 2.160 2.145 2.131 2.120 2.110

  df: 18    19    20    25    30    35    40    45    50    60    ∞
  t:  2.101 2.093 2.086 2.060 2.042 2.030 2.021 2.014 2.009 2.000 1.960

Stat 110B, UCLA, Ivo DinovSlide 104

Effect of increasing the confidence level

  80% CI: x̄ ± 1.282 se(x̄)
  90% CI: x̄ ± 1.645 se(x̄)
  95% CI: x̄ ± 1.960 se(x̄)
  99% CI: x̄ ± 2.576 se(x̄)

Figure 8.1.3 The greater the confidence level, the wider the interval.

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Increasing the confidence level increases the size of the CI. Why?

Stat 110B, UCLA, Ivo DinovSlide 105

Effect of increasing the sample size

[Figure: three random samples of passage times, with n = 10, 40, and 90 data points, and their 95% confidence intervals; the intervals shrink as n grows.]

Figure 8.1.4 Three random samples from a Normal(µ = 24.83, σ = .005) distribution and their 95% confidence intervals for µ.

From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

To double the precision we need four times as many observations. Increasing the sample size decreases the size of the CI.


Stat 110B, UCLA, Ivo DinovSlide 106

Confidence intervals – non-symmetric case

- A marine biologist wishes to use male angelfish for an experiment and hopes their weights don't vary much. In fact, a previous random sample of n = 16 angelfish yielded the data below:

- {y1; …; yn} = {5.1; 2.5; 2.8; 3.4; 6.3; 3.6; 3.9; 3.0; 2.7; 5.7; 3.5; 3.6; 5.3; 5.1; 3.5; 3.3}

- Sample statistics from these data: avg. = 3.96 lbs, s² = 1.35, n = 16.

- Problem: obtain a 100(1 − α)% CI(σ²).
- Point estimator for σ²? How about the sample variance, s²?
- Sampling theory for s²? Not in general, but under Normal assumptions...
- If a random sample {Y1; …; Yn} is taken from a normal population with mean µ and variance σ², then standardizing, we get a sum of squared N(0,1)'s.

Stat 110B, UCLA, Ivo DinovSlide 107

Confidence intervals – non-symmetric case

- {y1; …; yn} = {5.1; 2.5; 2.8; 3.4; 6.3; 3.6; 3.9; 3.0; 2.7; 5.7; 3.5; 3.6; 5.3; 5.1; 3.5; 3.3}

- Problem: obtain a 100(1 − α)% CI(σ²).
- If a random sample {Y1; …; Yn} is taken from a normal population with mean µ and variance σ², then standardizing, we get a sum of squared N(0,1)'s:

  (n − 1)S²/σ² = Σ_{k=1}^n (Yk − Ȳ)²/σ² ~ χ²_(df = n−1)

  1 − α = P( χ²_(1−α/2, n−1) ≤ Σ_{k=1}^n (Yk − Ȳ)²/σ² ≤ χ²_(α/2, n−1) )

For α = 0.05, say. Need: 100(1 − α)% CI(σ²).

Stat 110B, UCLA, Ivo DinovSlide 108

Confidence intervals – non-symmetric case

- {y1; …; yn} = {5.1; 2.5; 2.8; 3.4; 6.3; 3.6; 3.9; 3.0; 2.7; 5.7; 3.5; 3.6; 5.3; 5.1; 3.5; 3.3}

- Problem: obtain a 100(1 − α)% CI(σ²). Inverting the probability statement gives

  Σ_{k=1}^n (Yk − Ȳ)² / χ²_(α/2, n−1)  ≤  σ²  ≤  Σ_{k=1}^n (Yk − Ȳ)² / χ²_(1−α/2, n−1)

- χ²(15; 0.025) = 27.49 and χ²(15; 0.975) = 6.26.
- This yields the CI; the sample variance is s² = 1.35. Note the CI is NOT symmetric: (0.74; 3.24).
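A Python sketch reproducing the interval from the raw data, using the chi-square quantiles quoted on the slide (working from the unrounded sum of squares gives (0.74, 3.25); the slide's 3.24 reflects rounding of s² and the quantiles):

```python
import statistics

# angelfish weights (lbs) from the slide
y = [5.1, 2.5, 2.8, 3.4, 6.3, 3.6, 3.9, 3.0, 2.7, 5.7,
     3.5, 3.6, 5.3, 5.1, 3.5, 3.3]
n = len(y)                    # 16
s2 = statistics.variance(y)   # sample variance ≈ 1.35

# chi-square quantiles for df = 15, as quoted on the slide
chi2_upper = 27.49            # χ²(15, 0.025)
chi2_lower = 6.26             # χ²(15, 0.975)

ss = (n - 1) * s2             # Σ (y_k - ȳ)² ≈ 20.32
ci = (ss / chi2_upper, ss / chi2_lower)   # 95% CI(σ²), non-symmetric around s²

print(round(s2, 2), tuple(round(v, 2) for v in ci))
```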

Stat 110B, UCLA, Ivo DinovSlide 109

Prediction vs. Confidence intervals

- Confidence intervals (for the population mean µ):

  ( Ȳ − t_(n−1, (1+L)/2) × σ̂/√n ;  Ȳ + t_(n−1, (1+L)/2) × σ̂/√n )

- Prediction intervals: an L-level prediction interval (PI) for a new value of the process Y is defined by:

  ( Ŷ_new − t_(n−1, (1+L)/2) × σ̂_Ŷnew ;  Ŷ_new + t_(n−1, (1+L)/2) × σ̂_Ŷnew ),

where the predicted value Ŷ_new = Ȳ is obtained as an estimator of the unknown process mean.

Stat 110B, UCLA, Ivo DinovSlide 110

Prediction vs. Confidence intervals – Differences?

- Confidence intervals (for the population mean µ):

  ( Ȳ − t_(n−1, (1+L)/2) × σ̂/√n ;  Ȳ + t_(n−1, (1+L)/2) × σ̂/√n ),  where σ̂² = Σ_{k=1}^n (yk − ȳ)² / (n − 1)

- Prediction intervals:

  ( Ŷ_new − t_(n−1, (1+L)/2) × σ̂_Ŷnew ;  Ŷ_new + t_(n−1, (1+L)/2) × σ̂_Ŷnew ),  where Ŷ_new = Ȳ and σ̂²_Ŷnew = σ̂²(1 + 1/n)

Which SD is bigger?!?

Stat 110B, UCLA, Ivo Dinov Slide 111

Significance Testing – Using Data to Test Hypotheses


Stat 110B, UCLA, Ivo DinovSlide 112

Example – Carbon content in Steel

The percentage of C (carbon) in 2 random samples taken from 2 steel shipments is measured and summarized below. The question is to determine whether there are statistically significant differences between the shipments.

  #   N    Ȳ     S²
  1   10   3.62  0.086
  2   8    3.18  0.082

Stat 110B, UCLA, Ivo DinovSlide 113

Measuring the distance between the true-value and the estimate in terms of the SE’s

- Intuitive criterion: an estimate is credible if it is not far away from its hypothesized true value!
- But how far is far away?
- Compute the distance in standardized terms:

  T = (Estimator − TrueParameterValue) / SE

- The reason is that the distribution of T is known in some cases (Student's t, or N(0,1)).
- The estimator (observed value) is typical/atypical if it is close to the center/tail of the distribution.

Stat 110B, UCLA, Ivo DinovSlide 114

Comparing CI’s and significance tests

� These are different methods for coping with the uncertainty about the true value of a parameter caused by the sampling variation in estimates.

� Confidence interval: A fixed level of confidence is chosen. We determine a range of possible values for the parameter that are consistent with the data (at the chosen confidence level).

� Significance test: Only one possible value for the parameter, called the hypothesized value, is tested against the data. We determine the strength of the evidence (confidence) provided by the data against the proposition that the hypothesized value is the true value.

Stat 110B, UCLA, Ivo DinovSlide 115

Review

- Are the carbon contents in the two steel shipments any different?

  #   N    Ȳ     S²
  1   10   3.62  0.086
  2   8    3.18  0.082

  t0 = (Est_1 − Est_2 − 0)/SE = (3.62 − 3.18 − 0)/SE(µ̂1 − µ̂2)

  SE(µ̂1 − µ̂2) = sqrt(0.086/10 + 0.082/8) ≈ 0.137, so t0 = 0.44/0.137 ≈ 3.2

  With df = 7 and α = 0.025, the critical value is t_(7, 0.025) = 2.365 < 3.2, so the difference is significant (upper-tail area 0.025).
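The test statistic can be verified in Python (illustration only; the df = 7 critical value is read from the t table earlier in the notes):

```python
import math

# carbon-content summaries from the slide
n1, ybar1, s2_1 = 10, 3.62, 0.086   # shipment 1
n2, ybar2, s2_2 = 8, 3.18, 0.082    # shipment 2

se = math.sqrt(s2_1 / n1 + s2_2 / n2)   # SE(µ̂1 - µ̂2) ≈ 0.137
t0 = (ybar1 - ybar2 - 0) / se           # ≈ 3.2

# compare against the two-sided 5% critical value with df = 7
t_crit = 2.365                          # t(7, 0.025), from the t table
print(round(t0, 2), t0 > t_crit)
```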

Stat 110B, UCLA, Ivo DinovSlide 116

Guiding principles

We cannot rule in a hypothesized value for a parameter; we can only determine whether there is evidence, provided by the data, to rule out a hypothesized value.

The null hypothesis tested is typically a skeptical reaction to a research hypothesis

Hypotheses

Stat 110B, UCLA, Ivo DinovSlide 117

The t-test: using θ̂ to test H0: θ = θ0 versus some alternative H1.

STEP 1 Calculate the test statistic,

  t0 = (θ̂ − θ0)/se(θ̂) = (estimate − hypothesized value)/(standard error)

[This tells us how many standard errors the estimate is above the hypothesized value (t0 positive) or below the hypothesized value (t0 negative).]

STEP 2 Calculate the P-value using the following table.

STEP 3 Interpret the P-value in the context of the data.


Stat 110B, UCLA, Ivo DinovSlide 118

The t-test

  Alternative       Evidence against H0: θ = θ0         P-value
  hypothesis        provided by
  H1: θ > θ0        θ̂ too much bigger than θ0           P = pr(T ≥ t0)
                    (i.e., θ̂ − θ0 too large)
  H1: θ < θ0        θ̂ too much smaller than θ0          P = pr(T ≤ t0)
                    (i.e., θ̂ − θ0 too negative)
  H1: θ ≠ θ0        θ̂ too far from θ0                   P = 2 pr(T ≥ |t0|)
                    (i.e., |θ̂ − θ0| too large)

where T ~ Student(df).

Stat 110B, UCLA, Ivo DinovSlide 119

Interpretation of the p-value

TABLE 9.3.2 Interpreting the Size of a P-Value

  Approximate size of P-value    Translation
  > 0.12 (12%)                   No evidence against H0
  0.10 (10%)                     Weak evidence against H0
  0.05 (5%)                      Some evidence against H0
  0.01 (1%)                      Strong evidence against H0
  0.001 (0.1%)                   Very strong evidence against H0

Stat 110B, UCLA, Ivo DinovSlide 120

Is a second child's gender influenced by the gender of the first child, in families with more than one child?

� Research hypothesis needs to be formulated first before collecting/looking/interpreting the data that will be used to address it. Mothers whose 1st child is a girl are more likely to have a girl, as a second child, compared to mothers with boys as 1st children.

� Data: 20 yrs of birth records of 1 Hospital in Auckland, NZ.

                        Second-child gender
                        Male    Female   Total
  1st       Male        3,202   2,776    5,978
  child     Female      2,620   2,792    5,412
            Total       5,822   5,568    11,390

Stat 110B, UCLA, Ivo DinovSlide 121

Analysis of the birth-gender data

- The samples are large enough to use the Normal approximation. Since the two proportions come from totally different mothers, they are independent → use formula 8.5.5.a.

  Let p̂1 = 2792/5412 (girl second, given girl first) and p̂2 = 2776/5978 (girl second, given boy first).

  SE(p̂1 − p̂2) = sqrt( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )

  t0 = (Estimate − HypothesizedValue)/SE = (p̂1 − p̂2 − 0)/SE(p̂1 − p̂2) = 5.49986

  P-value = Pr(T ≥ t0) = Pr(T ≥ 5.49986) ≈ 1.9 × 10⁻⁸
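A Python sketch reproducing the test from the table counts, with the normal tail computed via `math.erf` (illustration only):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# second-child gender data (Auckland hospital, from the slide)
n1, girls1 = 5412, 2792   # mothers whose first child was a girl
n2, girls2 = 5978, 2776   # mothers whose first child was a boy

p1, p2 = girls1 / n1, girls2 / n2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # ≈ 0.00937

t0 = (p1 - p2 - 0) / se    # ≈ 5.50
p_value = 1.0 - phi(t0)    # one-sided tail, ≈ 1.9e-8

print(round(p1 - p2, 4), round(t0, 2), p_value)
```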

Stat 110B, UCLA, Ivo DinovSlide 122

Analysis of the birth-gender data

- We have strong evidence to reject H0, and hence conclude that mothers whose first child is a girl are more likely to have a girl as a second child.

- Practical vs. statistical significance: how much more likely? A 95% CI:

  CI(p1 − p2) = estimate ± z × SE = (p̂1 − p̂2) ± 1.96 × sqrt( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )
              = 0.0515 ± 1.96 × 0.0093677 = [0.033; 0.070], i.e., roughly [3%; 7%].

Stat 110B, UCLA, Ivo DinovSlide 123

Relation Among Various Continuous Distributions

[Diagram, summarized as a list of relationships:]

- Normal(µ, σ²) → standardize Z = (X − µ)/σ → Normal(0, 1)
- Lognormal(µ, σ²) ↔ Normal: X = ln Y, Y = e^X
- Chi-square(n) = Σ_{i=1}^n Z_i², where Z_i ~ N(0, 1)
- Gamma(α, β) with α = n/2, β = 2 → Chi-square(n)
- Gamma(α, β) with α = 1 → Exponential(β); Chi-square with n = 2 is also exponential
- Weibull(γ, β) with γ = 1 → Exponential(β)
- Uniform(0, 1): X = −β ln U → Exponential(β)
- Uniform(α, β) ↔ Uniform(0, 1): U = (X − α)/(β − α), X = α + (β − α)U
- Beta(α, β) with α = β = 1 → Uniform(0, 1)