2-16-2012

CPS 424/552
Discrete-Event Simulation Techniques
Spring 2012

Chapter 4.1 Sample Statistics

Zhongmei Yao
Department of Computer Science
University of Dayton


Review: Chapter 1 Models

1.1 Introduction
– Model characterization, development

1.2 A Single-Server Queue
1) Conceptual model
2) Specification model
3) Output statistics
4) Computational model

1.3 A Simple Inventory System
– Conceptual model, specification model
– Output statistics
– Computational model

Textbook copyright © 2006, Prentice Hall

Review: Chapter 2 RNG

2.1 Lehmer Random-Number Generators
– Introduction

2.2 Lehmer Random-Number Generators
– Implementation

2.3 Monte Carlo Simulation

2.4 Monte Carlo Simulation Examples

2.5 Finite-State Sequences


Review: Chapter 3 DES

3.1 Discrete-event simulation
– Exponential random variate, geometric random variate

3.2 Multi-stream Lehmer RNGs
– Streams, examples

3.3 Discrete-event simulation examples
– SSQ with immediate feedback
– Simple inventory systems with delivery lag
– Single-server machine shop


Chapter 4 Statistics

4.1 Sample statistics
– Sample mean, sample standard deviation, examples

4.2 Discrete-data histograms
– Histograms, empirical cumulative distribution functions

4.3 Continuous-data histograms
– Histograms, empirical cumulative distribution functions

4.4 Correlation


Chapter Overview

• Discrete-event simulations generate a lot of experimental data

• This chapter considers how we can compress data into meaningful statistics and interpret sample statistics

• A sample is data collected from a much larger population

• If the sample size is small, essentially all that can be done is compute the sample mean and standard deviation
– Section 4.1

• If the sample size is not small, a sample-data histogram can be computed and then used to analyze the distribution of data in the sample
– Sections 4.2 and 4.3

Sample Mean and Standard Deviation

• How to collect data in DES?
– Within-the-run (e.g., the job averages and time averages used to characterize the performance of an SSQ system)
– Between-the-run: simulate the system repeatedly by simply changing the initial seed from run to run

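• Def. 4.1.1: Given a sample $x_1, x_2, \ldots, x_n$ (continuous or discrete)
– Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
– Sample variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
– Sample standard deviation: $s = \sqrt{s^2}$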

• Sample mean: a measure of the central tendency of the data values
• Sample variance and sample standard deviation are measures of dispersion
– The spread of the data about the sample mean
– If the data are measured in seconds, then the sample mean and sample standard deviation are in seconds as well (the sample variance is in seconds squared)
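The definitions of the sample mean, variance, and standard deviation can be sketched in a few lines of Python. This is a minimal illustration that assumes the 1/n form of the variance; the function name `sample_stats` and the data values are hypothetical and not taken from the textbook.

```python
import math

def sample_stats(xs):
    """Return (mean, variance, standard deviation) using the 1/n definition."""
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    mean = sum(xs) / n
    # Average squared deviation about the sample mean (1/n form).
    variance = sum((x - mean) ** 2 for x in xs) / n
    return mean, variance, math.sqrt(variance)

# Hypothetical observations, e.g. delays in seconds.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, std = sample_stats(data)
print(mean, variance, std)
```

With data in seconds, `mean` and `std` are in seconds, while `variance` is in seconds squared.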

Sample Mean and Standard Deviation

[Figure: normal distribution PDFs illustrating the effect of the mean and the variance. From http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg]

Sample Variance

• A common alternative definition of the sample variance $s^2$: $\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ rather than $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

• The 1/(n − 1) version appears almost universally in statistics texts
– With it, $s^2$ is undefined for n = 1
– The 1/(n − 1) form is an unbiased estimate of the population variance (its expected value equals the population variance)

• Why consider the 1/n form?
– The sample size n is typically large in simulations
– If n is large, the difference is negligible
– We will use the 1/n version
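To see how little the choice matters for large n, the two forms can be compared with Python's standard `statistics` module, where `pvariance` uses the 1/n form and `variance` uses the 1/(n − 1) form. The data values are hypothetical; this is only a sketch of the comparison.

```python
import statistics

# Hypothetical small sample; the two definitions differ noticeably here.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

var_n = statistics.pvariance(data)   # 1/n form (the version used in these notes)
var_n1 = statistics.variance(data)   # 1/(n - 1) form

# The two always differ by the factor n / (n - 1).
ratio = var_n1 / var_n
print(var_n, var_n1, ratio)

# For a simulation-sized sample the factor is very close to 1.
print(10000 / (10000 - 1))
```

The two results always differ by the factor n/(n − 1), which approaches 1 as n grows.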

Relating the Mean and Standard Deviation

• The root-mean-square (rms) function $d(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - x)^2}$ measures dispersion about any value x

• Theorem 4.1.1
– The sample mean gives the smallest possible value for d(x)
– The standard deviation s is that smallest value: $d(\bar{x}) = \min_x d(x) = s$
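One way to see why the theorem holds is the short derivation below. It is a sketch that assumes the usual rms definition $d(x) = \sqrt{\frac{1}{n}\sum_i (x_i - x)^2}$ and the 1/n form of $s^2$; it is not necessarily the proof given in the textbook.

```latex
\begin{aligned}
d^2(x) &= \frac{1}{n}\sum_{i=1}^{n} (x_i - x)^2
        = \frac{1}{n}\sum_{i=1}^{n} \bigl((x_i - \bar{x}) + (\bar{x} - x)\bigr)^2 \\
       &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
        + 2(\bar{x} - x)\,\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})
        + (\bar{x} - x)^2 \\
       &= s^2 + (\bar{x} - x)^2
       \qquad \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) = 0 .
\end{aligned}
```

Because $(\bar{x} - x)^2 \ge 0$, we get $d(x) \ge s$ for every x, with equality exactly when $x = \bar{x}$.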

Relating the Mean and Standard Deviation

• Example 4.1.1:
– Collect 50 observations
– The sample mean is 1.095
– The sample standard deviation is 0.354
– The smallest value of d(x) is s, as shown in the figure

Chebyshev’s Inequality

• To better understand how the mean and s are related, consider the number of points that lie within k standard deviations of the mean
– The parameter k > 1

• Let the set $S_k$ contain the points satisfying: $\bar{x} - ks < x_i < \bar{x} + ks$ (an interval of width 2ks centered on the mean)

• Let $p_k = |S_k| / n$ be the proportion of $x_i$ that lie within ks of the mean

• Chebyshev’s inequality states: $p_k \ge 1 - 1/k^2$

Chebyshev’s Inequality

• For k = 2, we have from Chebyshev’s inequality that $p_2 \ge 1 - 1/4 = 75\%$
– For any sample, at least 75% of the data values lie within 2s of the sample mean. What is the bound on $p_k$ for k = 3?
– Example 4.1.1: 95% of the points lie within 2s of the sample mean
– Chebyshev’s inequality is very conservative for k = 2

• Chebyshev’s inequality and practical experience suggest that the 4s-wide interval $\bar{x} \pm 2s$ is the “effective width” of a sample
– Most (but not all) points will lie in this interval
– Outliers must be viewed with suspicion
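Chebyshev's bound can be checked empirically. The sketch below computes the proportion of points strictly within k standard deviations of the mean and compares it with 1 − 1/k². The helper name `proportion_within`, the seed, and the exponential test data are illustrative assumptions; the standard deviation uses the 1/n form.

```python
import math
import random

def proportion_within(xs, k):
    """Proportion p_k of points strictly within k standard deviations of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # 1/n form
    return sum(1 for x in xs if abs(x - mean) < k * s) / n

rng = random.Random(12345)  # arbitrary seed, only for repeatability
sample = [rng.expovariate(1.0) for _ in range(1000)]  # hypothetical skewed data

for k in (2, 3):
    p_k = proportion_within(sample, k)
    bound = 1 - 1 / k**2
    print(f"k={k}: p_k={p_k:.3f}, Chebyshev lower bound={bound:.3f}")
```

For k = 3 the bound is 1 − 1/9, roughly 88.9%. In practice the observed proportion is usually well above the bound, which is why the inequality is described as conservative.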

Linear Data Transformation

• Often the output data generated by simulations should be converted to different units
– Example 4.1.2: Suppose $x_1, x_2, \ldots, x_n$ are measured in seconds. To convert to minutes, we let $x_i' = x_i / 60$

• Let $x_i' = a x_i + b$ be the new data
• Sample mean: $\bar{x}' = a \bar{x} + b$
• Sample variance: $(s')^2 = a^2 s^2$
• Sample standard deviation: $s' = |a| s$
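The effect of a linear transformation $x_i' = a x_i + b$ on the sample statistics can be verified numerically: the mean should become $a\bar{x} + b$ and the standard deviation $|a| s$. The sketch below uses hypothetical times in seconds and a helper named `mean_std` that is not from the textbook.

```python
import math

def mean_std(xs):
    """Sample mean and standard deviation (1/n form)."""
    n = len(xs)
    mean = sum(xs) / n
    return mean, math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

seconds = [30.0, 45.0, 60.0, 36.0, 54.0]  # hypothetical observations in seconds
a, b = 1 / 60, 0.0                        # seconds -> minutes
minutes = [a * x + b for x in seconds]

m, s = mean_std(seconds)
m_new, s_new = mean_std(minutes)

# Transformed statistics versus the values predicted from the originals.
print(m_new, a * m + b)
print(s_new, abs(a) * s)
```

Only the scale factor a affects the standard deviation; the shift b moves the mean but leaves the dispersion unchanged.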

Linear Data Transformation

• Example 4.1.2: Suppose $x_1, x_2, \ldots, x_n$ are measured in seconds. To convert to minutes, we let $x_i' = x_i / 60$
– Given $\bar{x}$ is 45 sec, what is $\bar{x}'$?
– Given s is 15 sec, what is s’?

• Example 4.1.3: Standardize data by subtracting the sample mean and dividing the result by s
– For sample $x_1, x_2, \ldots, x_n$, the standardized sample is $x_i' = (x_i - \bar{x}) / s$
– Used to avoid issues with very large (or small) valued data
– What is $\bar{x}'$?
– What is s’?
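Standardization is the linear transformation with a = 1/s and b = −x̄/s, so the linear-transformation rules predict a standardized mean of 0 and a standardized standard deviation of 1. By the same rules, a mean of 45 sec and a standard deviation of 15 sec become 0.75 min and 0.25 min. The sketch below checks the standardization case on hypothetical data; the function name `standardize` is an assumption.

```python
import math

def standardize(xs):
    """Subtract the sample mean and divide by the sample standard deviation (1/n form)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if s == 0:
        raise ValueError("cannot standardize a constant sample")
    return [(x - mean) / s for x in xs]

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical data

# Recompute the statistics of the standardized sample.
z_mean = sum(z) / len(z)
z_std = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(z_mean, z_std)
```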

Nonlinear Data Transformation

• When data are used to generate a Boolean (1 or 0) outcome, we need a nonlinear data transformation
– The value of $x_i$ is not as important as the effect
– E.g., consider the effect “it will rain tomorrow”: how much rain we will have is not important

• Let A be a fixed set and let $x_i' = 1$ if $x_i \in A$, and $x_i' = 0$ otherwise

• Let p be the proportion of $x_i$ that fall in A

• Then $\bar{x}' = p$ and $s' = \sqrt{p(1 - p)}$

Nonlinear Data Transformation

• Example 4.1.4: An SSQ system
– Let $x_i = d_i$ be the queueing delay for job i
– Let $A = R^+$ be the set of all positive numbers
– Then $x_i' = 1$ if and only if $d_i > 0$
– From Exercise 1.2.3, the proportion of jobs delayed is p = 0.723
– Therefore, $\bar{x}' = 0.723$
– What about s’?
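The indicator transformation can be sketched as follows: each delay is mapped to 1 if it is positive and 0 otherwise, and the statistics of the 0/1 sample are then computed directly. The delay values and the helper name `indicator_stats` are hypothetical; the standard deviation uses the 1/n form, for which the result equals $\sqrt{p(1-p)}$.

```python
import math

def indicator_stats(xs, in_A):
    """Map each x to 1 if in_A(x) is true, else 0; return (mean, std) of the 0/1 sample."""
    flags = [1 if in_A(x) else 0 for x in xs]
    n = len(flags)
    p = sum(flags) / n  # the sample mean of the flags is the proportion in A
    s = math.sqrt(sum((f - p) ** 2 for f in flags) / n)
    return p, s

# Hypothetical queueing delays; a delay of 0.0 means the job did not wait.
delays = [0.0, 1.2, 0.0, 3.4, 0.7, 0.0, 2.1, 0.5, 0.0, 0.9]
p, s = indicator_stats(delays, lambda d: d > 0)

# The directly computed s matches the closed form sqrt(p * (1 - p)).
print(p, s, math.sqrt(p * (1 - p)))
```

Applying the closed form to p = 0.723 gives a standard deviation of approximately 0.448.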

Computational Considerations

• Recall that the sample standard deviation is given by $s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
– Requires two passes through the data
1. Compute the sample mean
2. Compute the squared differences from the mean

• The two-pass approach is undesirable for large n since the data must be stored between the passes
– Can we find a one-pass algorithm for computing s?

Conventional One-Pass Algorithm

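• A one-pass equation for $s^2$: $s^2 = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right) - \bar{x}^2$

– Thus, $s^2$ can be computed in one pass by accumulating these two partial sums: $\sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} x_i^2$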
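A minimal sketch of the conventional one-pass approach follows. It accumulates the sum of the values and the sum of their squares while reading the data once, then uses the identity that the 1/n variance equals the mean of the squares minus the square of the mean. The function name `one_pass_stats` and the data are assumptions; the clamp to zero is an added safeguard, not part of the formula.

```python
import math

def one_pass_stats(stream):
    """Return (mean, std) from a single pass, storing only three running values."""
    n = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for x in stream:
        n += 1
        sum_x += x
        sum_x2 += x * x
    if n == 0:
        raise ValueError("sample must be non-empty")
    mean = sum_x / n
    variance = sum_x2 / n - mean * mean
    # Round-off can make the difference slightly negative; clamp before the square root.
    variance = max(variance, 0.0)
    return mean, math.sqrt(variance)

# The data can come from any iterator, so nothing needs to be stored.
mean, s = one_pass_stats(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(mean, s)
```

One caveat: the final step subtracts two quantities that can be nearly equal when the mean is large relative to s, so this form can lose precision in floating-point arithmetic.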

Next Time

• Section 4.1
– Welford’s one-pass algorithm
– Time-averaged sample statistics

• Section 4.2
