Review of the Quantitative Business Analysis Section of the ETS Exam*
(Probability and Statistics and Management Science)
• Probability and Statistics
  – Measure of set operations
  – Conditional/joint probabilities
  – Counting rules
    • Multiplication
    • Addition
    • Permutations
    • Combinations
    • Permutations when not all objects are different
  – Measures of central tendency and dispersion
  – Distributions (including normal and binomial)
  – Sampling and estimation
  – Hypothesis testing
  – Correlation and regression
  – Time-series forecasting
  – Statistical concepts in quality control
*Slides available at H:\ditri\Teaching\Review for the ETS exam
• Management science
  – Linear programming
  – Project scheduling (including PERT and CPM)
  – Inventory and production planning
  – Special topics
    • queuing theory
    • simulation, and
    • decision analysis
Review of the Quantitative Business Analysis Section of the ETS Exam
Probability and Statistics
• Measure of set operations
  – “Set” is a collection of objects
  – Sets are defined by the elements in the set (usually numbers)
  – Sets are usually labeled with letters and there will be a universal set (U) containing all elements in question
  – In statistics we are frequently interested in the set of real numbers from 0 to 1
  – Usually interested in “subsets” of the universal set that meet our requirements; for example, out of the universal set of 4 answer choices on an exam question, we want the subset that contains the right answer
  – There can be a null set, that is, one that is empty because no element meets all the requirements
  – Venn diagrams can be used to describe set operations
Venn Diagrams
[Figure: Venn diagrams of sets A and B illustrating A union B, A intersection B, and A complement]
Conditional/Joint Probabilities
• Probability
  – Real number from 0 to 1 mapping the likelihood of an “event” to the set of real numbers
    • An event is a set of possible outcomes
    • Events come in two flavors
      – Independent
        • Knowledge of one event does not provide insight into another event
      – Non-independent events
        • Correlated events where knowledge of the outcome of one event changes the likelihood of another event
• Conditional probabilities
• Joint probabilities
  – Multidimensional situation where we are interested in more than one numeric characteristic
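$P(\text{Heads}) = \frac{\text{Number of Heads}}{\text{Number of Flips}}$
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
$P(\text{Ace of Hearts} \mid \text{Red}) = \frac{P(\text{Red \& Ace of Hearts})}{P(\text{Red})} = \frac{1/52}{1/2} = \frac{1}{26}$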
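The conditional probability rule can be checked by brute-force counting over a deck of cards. The sketch below is only an illustration; the helper name `prob` and the deck encoding are assumptions made for this example, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event, given=None):
    """Probability of `event`, optionally restricted to cards where `given` holds."""
    space = [card for card in deck if given is None or given(card)]
    hits = [card for card in space if event(card)]
    return Fraction(len(hits), len(space))

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_ace_of_hearts(card):
    return card == ("A", "hearts")

# Direct count inside the conditioning event.
p_conditional = prob(is_ace_of_hearts, given=is_red)

# Same value from the rule: joint probability divided by P(Red).
p_joint = prob(lambda card: is_red(card) and is_ace_of_hearts(card))
p_red = prob(is_red)

print(p_conditional, p_joint / p_red)
```

Both routes give the same fraction, which is the point of the rule: conditioning shrinks the sample space to the red cards.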
Counting Rules
• Multiplication
  – (Decision tree) where subsequent outcomes depend on prior outcomes
  – N (total number of events) = n1 * n2 * … * nk, where ni is the number of events at each stage
• Addition
  – Parallel activities
  – N = n1 + n2 + … + nk
• Permutations (number of distinct orders)
  – Ordering n distinct items
  – An orange, apple, and a grapefruit
    • OAG, OGA, AOG, AGO, GAO, and GOA
  – $N = n! = n(n-1)(n-2)(n-3)\cdots(1)$
    • With 0! = 1
[Figure: diagrams labeled n1 and n2 illustrating the multiplication and addition rules]
Counting Rules (continued)
• Permutations (continued)
  – Subset of r items from n distinct items: $N = \frac{n!}{(n-r)!}$
  – Number of distinct poker hands (including order dealt, with each card regarded as unique): $N = \frac{n!}{(n-r)!} = \frac{52!}{(52-5)!} = 311{,}875{,}200$
• Combinations
  – Distinct hands without regard to order: $N = \frac{n!}{r!\,(n-r)!} = \frac{52!}{5!\,(52-5)!} = 2{,}598{,}960$
  – Or “n choose r”: $N = \binom{n}{r}$
• Permutations when not all objects are different: $N = \frac{n!}{n_1!\,n_2!\cdots n_k!}$
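The counting formulas above can be reproduced with Python's standard `math` module. This is a minimal sketch; the helper `distinct_arrangements` and the word "MISSISSIPPI" are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

def distinct_arrangements(items):
    """Permutations when not all objects are different: n! / (n1! n2! ... nk!)."""
    total = math.factorial(len(items))
    for count in Counter(items).values():
        total //= math.factorial(count)
    return total

fruit_orders = math.factorial(3)        # orange, apple, grapefruit
ordered_hands = math.perm(52, 5)        # n! / (n - r)!
unordered_hands = math.comb(52, 5)      # n! / (r! (n - r)!)
word_orders = distinct_arrangements("MISSISSIPPI")

print(fruit_orders, ordered_hands, unordered_hands, word_orders)
```

Dividing the ordered count by 5! gives the unordered count, since each hand can be dealt in 5! orders.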
Measures of Central Tendency and Dispersion
• Random variables described by distributions
  – Captures the fact that a variable can take more than one value and that some values are more common than others
• Measures of central tendency
  – Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
  – Mode (most frequent)
  – Median (“middle” value)
• Measures of dispersion
  – Range (largest minus the smallest)
  – Standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
[Figure: bell-shaped distribution curve, horizontal axis from -4 to 4]
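A short sketch of these measures using Python's `statistics` module. The data values are made up for illustration; `pstdev` is the population standard deviation (dividing by N), matching the formula above.

```python
import statistics

# Hypothetical population of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)
range_value = max(data) - min(data)
std_dev = statistics.pstdev(data)   # population form: divide by N

print(mean_value, median_value, mode_value, range_value, std_dev)
```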
Distributions (including normal and binomial)
• As the name implies, we are describing how the frequency of various outcomes is distributed
• Comes in two flavors
  – Probability distribution function
    • Measures the rate at which we accumulate probability (analogous to velocity)
    • Usually labeled f(x)
  – Cumulative distribution function
    • Measures the total accumulated probability (analogous to distance)
    • $F(x) = P(X \le x)$
    • For discrete random variables: $F(x) = \sum_{x_j \le x} p(x_j)$
    • For continuous random variables: $F(x) = \int_{-\infty}^{x} f(s)\,ds$
Normal Probability and Cumulative Distribution Functions
[Figure: normal probability distribution function and cumulative distribution function, each plotted for x from -4 to 4]
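The velocity/distance analogy can be made concrete: accumulating the density f(x) numerically should reproduce the cumulative function F(x). The sketch below assumes a standard normal (mean 0, standard deviation 1) and uses a simple trapezoid rule; the helper `integrate_pdf` and the lower cutoff of -8 are choices made for this example.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)   # assumed standard normal

def integrate_pdf(dist, lower, upper, steps=10_000):
    """Trapezoid-rule approximation of the area under dist.pdf on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width

# Accumulate density from far in the left tail up to x = 1.
approx_cdf_at_1 = integrate_pdf(z, -8.0, 1.0)
exact_cdf_at_1 = z.cdf(1.0)

print(round(approx_cdf_at_1, 4), round(exact_cdf_at_1, 4))
```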
Binomial Distribution
• Discrete, considering “yes” or “no” phenomena
• Common in quality where products are “good” or “bad”
• Usual case to have a sample of size n inspected and we are interested in the probability of seeing k (k = 0, 1, 2, …, n) defects
• Formally: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
• Probability and Cumulative Distribution functions
[Figure: binomial probability distribution and cumulative distribution function plots]
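A minimal sketch of the binomial formula and its cumulative version. The sample size of 5 and the 10% defect rate are hypothetical numbers chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k defects in a sample of n."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): probability of k or fewer defects."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

n, p = 5, 0.10                       # hypothetical sample size and defect rate
p_no_defects = binomial_pmf(0, n, p)
p_at_most_one = binomial_cdf(1, n, p)

print(round(p_no_defects, 5), round(p_at_most_one, 5))
```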
Sampling and Estimation
• Defined measures of central tendencies and dispersion earlier
• Often, the parameters describing the population are not accessible
  – Might be too large to measure
  – Measuring process can be destructive
  – Measurement process itself might be biased (as is the case with how the census is compiled in your country)
• Addressed by drawing a representative subset (a sample) and using inference to estimate the analogous parameters in the population
• We can estimate mu and sigma in the population using the X-bar and the standard deviation of the sample
• X-bar and s² are unbiased estimators of mu and sigma²; that is, on average they accurately predict the estimated parameters
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
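The sample formulas differ from the population ones only in the n - 1 divisor. A small sketch with made-up data, comparing a hand calculation against `statistics.variance` (which also divides by n - 1):

```python
import statistics

sample = [4, 8, 6, 5, 7]             # hypothetical sample of n = 5

n = len(sample)
x_bar = sum(sample) / n
s_squared = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
s = s_squared ** 0.5

# Cross-check against the standard library's sample variance.
library_s_squared = statistics.variance(sample)

print(x_bar, s_squared, round(s, 4))
```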
Hypothesis Testing
• A testable idea
• Based on the premise that science never knows for certain; it only rejects the null hypothesis at higher levels of statistical significance
• My favorite example: our legal system
  – People are presumed innocent until it is too unlikely (“beyond a reasonable doubt”) that the person is actually innocent
  – Upon rejecting the null, the person is sent to jail (there is always a chance the perp was indeed innocent)
• Commonly used to test for differences in means (ANOVA)
  – Question: is the difference in sales levels for different cereal box designs the result of random chance, or does one design sell better than another?
  – H0: Sales levels are the same
  – HA: There is a difference in sales levels
• Rejection of the null is based on the “p value”: the probability of seeing a difference of the observed magnitude or more by chance
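The p value definition can be illustrated directly. The sketch below is not ANOVA; it swaps in an exact permutation test for two groups, which counts how often a random relabeling of the data produces a difference in means at least as large as the one observed. The sales figures and the function name are hypothetical.

```python
from itertools import combinations

def exact_permutation_p_value(group_a, group_b):
    """Share of all relabelings whose mean difference is at least the observed one."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_gap(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = mean_gap(sum(group_a))
    splits = 0
    as_extreme = 0
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        if mean_gap(sum(pooled[i] for i in indices)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / splits

# Hypothetical weekly sales for two cereal box designs.
design_1 = [12, 15, 14, 16]
design_2 = [9, 10, 11, 13]

p_value = exact_permutation_p_value(design_1, design_2)
print(round(p_value, 4))
```

A small p value says the observed gap would rarely arise if H0 (same sales levels) were true.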
Correlation and Regression
• Correlation
  – If random variables are not independent, we are interested in describing the relationship
  – Pearson Correlation Coefficient (r)
[Figure: four scatter plots of Y against X]
• Positive r: Y increases as X increases.
• Negative r: Y decreases as X increases.
• r = -1: a perfect negative correlation between X and Y.
• r = 1: a perfect positive correlation between X and Y.
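The slides name the Pearson coefficient without giving its formula. The sketch below uses the standard definition (covariation of X and Y divided by the product of their spreads); the data points are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                  # hypothetical observations

r = pearson_r(x, y)
r_perfect_negative = pearson_r(x, [-value for value in x])

print(round(r, 4), round(r_perfect_negative, 4))
```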
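Correlation and Regression
• Regression
  – Captures the relationship between an independent and dependent variable
  – Goal is to achieve a “best fit” of a math model: $Y = \beta_0 + \beta_1 X$
  – Fit is measured by the sum of squared errors: $SSE = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$
[Figure: scatter plot of Hole Deviation against Hours of Service with the fitted line $Y = \beta_0 + \beta_1 X_1$ and the distance from a data point to the line marked]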
Correlation and Regression
• Regression
  – Objective is to explain variation in the data set
  – Coefficient of determination (R²) describes the proportion of variation explained by the model
  – In addition to R², quality of model is assessed by
    • Significance of the model (measured with an F statistic)
    • Significance of estimated beta values (measured with t statistics)
    • Assurance the residuals exhibit “white noise”
      – Normally distributed
      – Mean zero
      – Constant variance
      – Independent
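A minimal sketch of a one-variable least-squares fit and its R², using the usual closed-form estimates for the slope and intercept. The data and function names are illustrative assumptions; the F and t statistics mentioned above are not computed here.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """Proportion of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                  # hypothetical observations

b0, b1 = fit_line(x, y)
r2 = r_squared(x, y, b0, b1)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(round(b0, 3), round(b1, 3), round(r2, 3))
```

The residuals from a least-squares fit with an intercept always average to zero, which is the "mean zero" check in the list above.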
Time-series Forecasting
• Two issues
  – Creating forecasts
  – Evaluating forecasts
• Time-series methods
  – Average sets of data to make predictions of future events
  – Do not address causal issues; data sets “speak” for themselves
  – Issue is seeking a balance between “responsive” forecasting methods (respond to changes rapidly but are noisy) and filtering (averaging out of random fluctuations might be over damped)
• Nomenclature:
  – At = actual demand in period t
  – Ft = forecast for period t
  – n = number of observations in forecast
  – t = number of the current period
Time-series Forecasting
• Simple moving average: $F_t = \frac{A_{t-1} + A_{t-2} + A_{t-3} + \cdots + A_{t-n}}{n}$
• Weighted moving average: $F_t = w_1 A_{t-1} + w_2 A_{t-2} + \cdots + w_n A_{t-n}$
  – With $\sum_{i=1}^{n} w_i = 1$
• Exponential smoothing
  – Simple formula: $F_t = F_{t-1} + \alpha (A_{t-1} - F_{t-1})$
  – with $0 \le \alpha \le 1$
  – Forecast is adjusted for the error
  – Masks a very sophisticated weighting scheme that considers all data points
  – Can be modified (Holt’s and Winters’ methods) to consider trends and seasonal effects
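The three forecasting formulas can be sketched in a few lines. The demand history, weights, smoothing constant, and starting forecast below are all hypothetical; the function names are chosen for this example.

```python
def simple_moving_average(actuals, n):
    """Forecast for the next period: average of the last n actuals."""
    return sum(actuals[-n:]) / n

def weighted_moving_average(actuals, weights):
    """weights[0] applies to the most recent actual; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    recent_first = reversed(actuals[-len(weights):])
    return sum(w * a for w, a in zip(weights, recent_first))

def exponential_smoothing(actuals, alpha, initial_forecast):
    """Apply F_t = F_(t-1) + alpha * (A_(t-1) - F_(t-1)) across the history."""
    forecast = initial_forecast      # forecast for the first period
    for actual in actuals:
        forecast = forecast + alpha * (actual - forecast)
    return forecast                  # forecast for the period after the last actual

demand = [100, 110, 120, 130]        # hypothetical actual demand, oldest first

sma = simple_moving_average(demand, n=3)
wma = weighted_moving_average(demand, weights=[0.5, 0.3, 0.2])
ses = exponential_smoothing(demand, alpha=0.2, initial_forecast=100)

print(sma, wma, round(ses, 2))
```

On this steadily rising series the smoothed forecast lags furthest behind, which illustrates the responsiveness versus filtering trade-off.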
Time-series Forecasting
• Wide assortment of forecasting methods (and possible parameters)
• Wide assortment of evaluation methods
• All look at averaging the errors: $\text{Error} = e_t = A_t - F_t$
  – Bias is a simple average of the errors
  – Mean Absolute Deviation is an average of the absolute values of the errors
  – Standard Deviation and Mean Squared Error measures are averages of the squared errors
  – Mean Average Percentage Error is an average of the error scaled period by period
• Two major groups
  – Absolute (average of errors)
  – Relative (average errors are scaled)
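Major Evaluation Procedures
(Here Dt is the actual demand in period t, and the error is written as forecast minus demand.)
• Bias
  – What is done: $e_t = F_t - D_t$
  – Absolute version: $\text{Bias} = \frac{\sum_{i=1}^{N} e_i}{N}$
  – Relative counterpart: Relative Forecast Error = Bias / AD*
• Mean Absolute Deviation
  – What is done: $e_t = |F_t - D_t|$
  – Absolute version: $\text{MAD} = \frac{\sum_{i=1}^{N} |e_i|}{N}$
  – Relative counterpart: Mean Deviation = MAD / AD*
• Standard Deviation
  – What is done: $e_t = (F_t - D_t)^2$
  – Absolute version: $\text{SD} = \sqrt{\frac{\sum_{i=1}^{N} (F_i - D_i)^2}{N - 1}}$
  – Relative counterpart: Coefficient of Variation = SD / AD*
• Mean Squared Error
  – What is done: $e_t = (F_t - D_t)^2$
  – Absolute version: $\text{MSE} = \frac{\sum_{i=1}^{N} (F_i - D_i)^2}{N}$
• Mean Average Percentage Error
  – Relative counterpart: $\text{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{|F_t - D_t|}{D_t} (100)$
*AD = average demand: $AD = \frac{\sum_{t=1}^{N} D_t}{N}$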
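A compact sketch that computes the evaluation measures from a list of forecasts and demands. The numbers are hypothetical, the error is taken as forecast minus demand as in the table, and the absolute value inside MAPE is an assumption of this sketch.

```python
import math

def forecast_error_measures(forecasts, demands):
    """Absolute and relative error measures for paired forecasts and demands."""
    n = len(demands)
    errors = [f - d for f, d in zip(forecasts, demands)]
    average_demand = sum(demands) / n
    bias = sum(errors) / n
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    sd = math.sqrt(sum(e * e for e in errors) / (n - 1))
    mape = 100 / n * sum(abs(e) / d for e, d in zip(errors, demands))
    return {
        "bias": bias,
        "mad": mad,
        "mse": mse,
        "sd": sd,
        "mape": mape,
        "relative_forecast_error": bias / average_demand,
        "mean_deviation": mad / average_demand,
        "coefficient_of_variation": sd / average_demand,
    }

forecasts = [110, 110, 120, 130]     # hypothetical forecasts
demands = [100, 120, 110, 140]       # hypothetical actual demand

measures = forecast_error_measures(forecasts, demands)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this data the errors alternate in sign, so the bias is zero even though the MAD is not: bias measures direction, MAD measures size.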
Statistical Concepts in Quality Control
• Been there, done that
  – Process control charts are nothing but the sampling distributions turned on their sides
  – Mean as the center and then upper and lower control limits drawn at three standard deviations (need to decide on the size of the sample)
  – Acceptance sampling is based on the binomial distribution discussed earlier
    • Again, decide on the sample size
    • Set “cutoff” values for the number of defects found in the sample
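A rough sketch of three-sigma limits for a chart of sample means. It assumes the process standard deviation is known (a simplification; in practice it is usually estimated from the data), and all measurements and names are hypothetical.

```python
import math
from statistics import mean

def xbar_limits(baseline_samples, process_sigma):
    """Return (LCL, center, UCL) for sample means, three standard errors wide."""
    n = len(baseline_samples[0])                  # sample size
    center = mean(mean(sample) for sample in baseline_samples)
    half_width = 3 * process_sigma / math.sqrt(n)
    return center - half_width, center, center + half_width

def is_in_control(sample, lcl, ucl):
    return lcl <= mean(sample) <= ucl

# Hypothetical baseline: three samples of four measurements each.
baseline = [
    [10.1, 9.9, 10.0, 10.0],
    [10.2, 10.0, 9.8, 10.0],
    [9.9, 10.1, 10.0, 10.0],
]
lcl, center, ucl = xbar_limits(baseline, process_sigma=0.1)

new_sample = [10.3, 10.2, 10.4, 10.3]
print(round(lcl, 3), round(center, 3), round(ucl, 3), is_in_control(new_sample, lcl, ucl))
```

The sample size matters because the limits scale with sigma divided by the square root of n.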
Process Control Charts
[Figure: control chart with a center line at the Mean, an upper control limit (UCL), and a lower control limit (LCL)]
Control Chart. Used to continuously monitor a process.
Control Charts
Type: Usage
• X-bar and R charts: Continuous data
• “P” charts: Data in binary form
• “C” charts: Data in integer form
Illustrates when there is variation due to an assignable cause
Statistical Quality Control Models
• Statistical Q.C. Models
  – Acceptance Sampling
    • O.C. Curves
    • Single Sampling
    • Double Sampling
    • Sequential Sampling
  – Process Control
    • Control Charts
      – Mean Chart
      – Range Chart
      – Proportion Defective
      – Defects Per Unit
Linear Programming
Definition:
Allocation of scarce resources among competing activities.
Activities are variables - Xi, where Xi is "how many" of activity i
Resources are in the form of constraints
Maximize:
c1X1 + c2X2 + . . . + cnXn
Subject to:
a11X1 + a12X2 + . . . + a1nXn ≤ b1
a21X1 + a22X2 + . . . + a2nXn ≤ b2
. . .
am1X1 + am2X2 + . . . + amnXn ≤ bm
X1, X2, . . ., Xn ≥ 0
Linear Programming Topic Summary
Issue is the optimal allocation of scarce resources among competing activities.
Variables are the activities
Constraints are the resources
Linear programming requires three assumptions:
• Additivity
• Divisibility
• Linearity in the variables
Feasible region is the set of points that satisfy all constraints simultaneously
Optimization involves finding the best point(s) in the feasible region. Best is defined by an objective function that is made as large or as small as possible.
Solution procedures include graphical methods (suitable for 2 decision variables) and computer algorithms.
Dual or shadow prices measure the marginal value of one additional unit of a resource.
The dual price remains constant over a range of right-hand-side values. This range is defined by other constraints. Within this range
$\Delta Z = \Delta \text{RHS} \times \text{Shadow Price}$
We can evaluate the effect of changing a cost coefficient on the objective function for “small” changes in the coefficients. A small change is one where the solution (value of the Xs) stays the same. For “small” changes
$\Delta Z = \Delta C_i \times X_i$
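For two decision variables, the graphical method amounts to checking the corner points of the feasible region. The sketch below does that by brute force and then estimates a shadow price by adding one unit to a right-hand side and re-solving. The problem data (maximize 3X1 + 5X2 with three resource constraints) is a made-up illustration, the function names are hypothetical, and the approach assumes a bounded, non-empty feasible region.

```python
from itertools import combinations

def solve_lp_2d(objective, constraints):
    """Maximize c1*x1 + c2*x2 subject to a1*x1 + a2*x2 <= b rows and x1, x2 >= 0.

    Checks every intersection of two constraint boundaries (corner candidates).
    """
    # Non-negativity written as -x1 <= 0 and -x2 <= 0.
    rows = list(constraints) + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best_value, best_point = None, None
    for (a1, a2, b), (d1, d2, e) in combinations(rows, 2):
        det = a1 * d2 - a2 * d1
        if abs(det) < 1e-12:
            continue                 # parallel boundaries: no single corner
        x1 = (b * d2 - a2 * e) / det
        x2 = (a1 * e - b * d1) / det
        feasible = all(r1 * x1 + r2 * x2 <= rhs + 1e-9 for r1, r2, rhs in rows)
        if feasible:
            value = objective[0] * x1 + objective[1] * x2
            if best_value is None or value > best_value:
                best_value, best_point = value, (x1, x2)
    return best_value, best_point

def shadow_price(objective, constraints, index):
    """Change in the optimal Z when one unit is added to constraint `index`."""
    base_value, _ = solve_lp_2d(objective, constraints)
    bumped = list(constraints)
    a1, a2, b = bumped[index]
    bumped[index] = (a1, a2, b + 1.0)
    new_value, _ = solve_lp_2d(objective, bumped)
    return new_value - base_value

objective = (3.0, 5.0)
constraints = [(1.0, 0.0, 4.0), (0.0, 2.0, 12.0), (3.0, 2.0, 18.0)]

z, point = solve_lp_2d(objective, constraints)
prices = [shadow_price(objective, constraints, i) for i in range(len(constraints))]
print(z, point, prices)
```

The first constraint is not binding at the optimum, so its extra unit is worth nothing; the binding constraints carry positive shadow prices.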
A reduced cost is the amount that a coefficient will have to be changed to make it worthwhile to engage in that activity, i.e., if X = 0, how much do we have to change X’s coefficient in the objective function in order to make it worthwhile to engage in that activity (X > 0).
We then considered a smattering of classical applications:
• Mixing problems
• Transportation problems
• Assignment problems
Project Scheduling (including PERT and CPM)
• Some history
  – Both date from the 1950s
    • CPM => DuPont
    • PERT => Polaris submarine project
• Fundamentals
  – Activities are unique branches (or nodes) capturing
    • time and
    • precedence
  – Additional precedence relationships can be modeled using dummy activities
  – CPM (Critical Path Method) is deterministic
  – PERT (Program Evaluation and Review Technique) is stochastic
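Example Problem
• Issue is determining the critical path
• Activities where the slack is zero
• Activities that determine the expected completion time for the network
[Figure: activity network; each node shows the activity (duration) with its earliest start, latest start, earliest finish, and latest finish]
Activity (duration): earliest start, latest start, earliest finish, latest finish
• A(2): 0, 0, 2, 2
• B(5): 2, 2, 7, 7
• C(4): 2, 3, 6, 7
• D(3): 7, 7, 10, 10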
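The example network can be worked with a forward and a backward pass. The durations come from the example; the precedence relationships (A before B and C, both before D) are an assumption inferred from the start and finish times, since the diagram itself did not survive.

```python
durations = {"A": 2, "B": 5, "C": 4, "D": 3}
# Assumed precedence, inferred from the times in the example.
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
order = ["A", "B", "C", "D"]         # a valid topological order

# Forward pass: earliest start (ES) and earliest finish (EF).
es, ef = {}, {}
for act in order:
    es[act] = max((ef[p] for p in predecessors[act]), default=0)
    ef[act] = es[act] + durations[act]
project_length = max(ef.values())

# Backward pass: latest finish (LF) and latest start (LS).
successors = {a: [b for b in order if a in predecessors[b]] for a in order}
ls, lf = {}, {}
for act in reversed(order):
    lf[act] = min((ls[s] for s in successors[act]), default=project_length)
    ls[act] = lf[act] - durations[act]

slack = {act: ls[act] - es[act] for act in order}
critical_path = [act for act in order if slack[act] == 0]

print(project_length, slack, critical_path)
```

Under that assumption the results match the example: only C has slack, and the zero-slack activities form the critical path.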
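Converting Intuition into a Mean and Standard Deviation
$t_e = \frac{a + 4m + b}{6}$
$\sigma_{t_e} = \frac{b - a}{6}$
[Figure: activity time distribution marking the optimistic time (a), most likely (modal) time (m), expected time (te), and pessimistic time (b)]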
Mean and Variance of the Entire Project
• Based on summing the expected times and expected variability along the critical path
• Assumptions:
  – Activity times are independent
  – There is one critical path
  – Project completion time is normally distributed
• Te is the sum of the tes on the critical path: $T_e = \sum_{i=1}^{N} t_{e_i}$
  – Where tei is the expected activity time for each of the N activities along the critical path.
Measure of the Variability in the Expected Completion Time for the Entire Project
• Nature adds variability through the variance not the standard deviation
$\sigma_{A+B} = \sqrt{\sigma_A^2 + \sigma_B^2} \ne \sigma_A + \sigma_B$
$\sigma_{T_e} = \sqrt{\sum_{i=1}^{N} \sigma_{t_{e_i}}^2}$
[Figure: normal distribution of project completion time centered on Te]
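The PERT formulas chain together: three time estimates per activity give an expected time and a standard deviation, which are summed (times directly, variability through the variances) along the critical path. The three activities and the 11-period deadline below are hypothetical.

```python
import math
from statistics import NormalDist

def pert_estimates(a, m, b):
    """Expected time and standard deviation from optimistic, modal, pessimistic times."""
    expected = (a + 4 * m + b) / 6
    std_dev = (b - a) / 6
    return expected, std_dev

# Hypothetical (a, m, b) estimates for activities on the critical path.
critical_activities = [(1, 2, 3), (3, 5, 7), (2, 3, 4)]

estimates = [pert_estimates(a, m, b) for a, m, b in critical_activities]
project_mean = sum(te for te, _ in estimates)
project_variance = sum(sd ** 2 for _, sd in estimates)   # variances add
project_sd = math.sqrt(project_variance)

# Probability of finishing within 11 periods, using the normality assumption.
completion = NormalDist(mu=project_mean, sigma=project_sd)
p_on_time = completion.cdf(11)

print(project_mean, round(project_sd, 4), round(p_on_time, 4))
```

Adding the standard deviations directly would overstate the project's variability, which is why the variances are summed first.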
Inventory and Production Planning
Forms of Inventory
• Consider the transformation process:
  – Purchasing => Operations => Distribution
• Forms of inventory
  – Raw materials (RM)
  – Work-in-Progress (WIP)
  – Finished Goods (FG)
  – Maintenance, Repair and Operating Expenses (MRO)
• Manufacturing Inventory
  – RM + WIP + MRO
Classical Production Planning Hierarchy
Business Plan
Sales Plan
Production Plan (aggregate production plan)
Master Production Scheduling
Material Requirements Planning
Purchasing (external factory)
Shop Floor Control
Rough Cut Capacity Planning
Capacity Requirements Planning
Forecasts
Cycle Inventory Levels
[Figure: on-hand inventory in units plotted against time, showing the order quantity Q, the demand rate, and the time between orders (f)]
Q/2 = Average Cycle inventory
Types of Costs
• Inventory holding rate
  – Time value of money
  – Insurance
  – Storage space
  – Risk
    • Obsolescence
    • Loss
    • Damage
• Ordering Cost/Setup Cost
• Stockout Costs
• Backorder cost
Measuring Inventory Management
Turnover = Annual Sales (at cost) / Average Inventory
Generally assumed that larger numbers are better, if . . . customer service stays high.
Coverage = Average Inventory / per period sales
Weeks of Supply = Average Inventory (at cost) / weekly sales
Generally assumed that smaller numbers are better if . . . customer service stays high.
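MRP - The System
[Figure: MRP system diagram; the MPS, BOM File, and Inventory Records feed MRP, which produces Planned Orders]
[Figure: bill of materials tree with nodes A, C, B, and C, annotated “(2 req’d)”]
Item A: Order Q. = 20; Lead T. = 1; Safety S. = 0
Rows: Gross Req’ts, Sched. Rec’ts, On Hand, Planned Orders; periods 1 2 3 4 5 6
Values shown: 5 15 18 8 12 42; 21
Item B: Order Q. = 40; Lead T. = 2; Safety S. = 0
Rows: Gross Req’ts, Sched. Rec’ts, On Hand, Planned Orders; periods 1 2 3 4 5 6
Values shown: 3220
Item C: Order Q. = LFL; Lead T. = 1; Safety S. = 10
Rows: Gross Req’ts, Sched. Rec’ts, On Hand, Planned Orders; periods 1 2 3 4 5 6
Values shown: 50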
Special Topics
Queuing Theory
• Part of field of Stochastic Processes (those based on uncertainty)
• Among others, Markov processes
  – Works from “states” where issue is probability of jumping to another state (for example, a customer switching to another brand)
  – Seeks long term equilibrium state
• “Birth Death” processes
  – Describe maintenance problems where
  – Machines “die” when they fail and are “reborn” when repaired
• Interested in cost effective trade-off: customer’s time versus “servers”
The Queuing System
• Arrivals: entities entering the system from a population (time between arrivals follows a statistical distribution, usually exponential)
• Population
  – Finite, limited size or group of customers
    • Statistical properties change when someone enters the queuing system
  – Infinite size (usual case)
• Queue discipline
  – Service rules (FIFO, LIFO, etc)
  – “Balking”
  – Number of servers
  – Line switching
Common Waiting Line Models
Model: Layout; Source Population; Service Pattern
• 1: Single channel; Infinite; Exponential
• 2: Single channel; Infinite; Constant
• 3: Multichannel; Infinite; Exponential
• 4: Single or Multi; Finite; Exponential
These four models share the following characteristics:
• Single phase
• Poisson arrival
• FCFS
• Unlimited queue length
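The slides list the models without their formulas. As an illustration, the sketch below applies the standard steady-state results for Model 1 (single channel, Poisson arrivals, exponential service, infinite population). The arrival and service rates are hypothetical, and the function name is chosen for this example.

```python
def single_channel_metrics(arrival_rate, service_rate):
    """Steady-state measures for a single-channel queue with exponential service."""
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must be below service rate for a steady state")
    utilization = arrival_rate / service_rate
    in_system = arrival_rate / (service_rate - arrival_rate)
    in_queue = arrival_rate ** 2 / (service_rate * (service_rate - arrival_rate))
    time_in_system = in_system / arrival_rate
    time_in_queue = in_queue / arrival_rate
    return {
        "utilization": utilization,
        "average_in_system": in_system,
        "average_in_queue": in_queue,
        "average_time_in_system": time_in_system,
        "average_time_in_queue": time_in_queue,
    }

# Hypothetical rates: 2 arrivals per hour, 3 customers served per hour.
metrics = single_channel_metrics(arrival_rate=2.0, service_rate=3.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

As utilization approaches 1 the waiting measures grow without bound, which is the customer-time versus server-cost trade-off in numbers.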
Special Topics
Simulation
Simulation Defined
• A simulation is a model used to run experiments that stand in for experiments on a real system
  – Typically done on a computer
  – Determines reactions to different operating rules or change in structure
  – Can be used in conjunction with traditional statistical and management science techniques
Major Phases in a Simulation Study
Start
Define Problem
Construct Simulation Model
Specify values of variables and parameters
Run the simulation
Evaluate results
Validation
Propose new experiment
Stop
Data Collection & Random No. Interval Example
Suppose you timed 20 athletes running the 100-yard dash and tallied the information into the four time intervals below
You then count the tallies and make a frequency distribution
Then convert the frequencies into percentages
You then can add the percentages into a cumulative distribution
Finally, use the cumulative percentages to develop the random number intervals
Seconds: Freq.; %; Accum. %; RN Intervals
• 0-5.99: 4; 20; 20; 00-19
• 6-6.99: 10; 50; 70; 20-69
• 7-7.99: 4; 20; 90; 70-89
• 8 or more: 2; 10; 100; 90-99
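The interval construction can be sketched in code: cumulative frequencies define the upper end of each two-digit random number interval, and a random draw is then mapped back to a time class. The function names and the seed are choices made for this example.

```python
import random

labels = ["0-5.99", "6-6.99", "7-7.99", "8 or more"]
frequencies = [4, 10, 4, 2]          # tallies for the 20 athletes

def random_number_intervals(freqs, digits=100):
    """Map cumulative frequencies onto inclusive intervals covering 00..99."""
    total = sum(freqs)
    intervals = []
    start = 0
    cumulative = 0
    for f in freqs:
        cumulative += f
        end = cumulative * digits // total - 1
        intervals.append((start, end))
        start = end + 1
    return intervals

intervals = random_number_intervals(frequencies)

def simulate_time_class(rng):
    """Draw a two-digit random number and return the matching time class."""
    rn = rng.randrange(100)
    for label, (low, high) in zip(labels, intervals):
        if low <= rn <= high:
            return label

rng = random.Random(42)              # fixed seed so the run is repeatable
draws = [simulate_time_class(rng) for _ in range(10)]
print(intervals)
print(draws)
```

Over many draws the share of each class approaches its percentage in the frequency distribution.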
Types of Simulation Models
• Continuous
  – Based on mathematical equations
  – Used for simulating continuous values for all points in time
  – Example: The amount of time a person spends in a queue
• Discrete
  – Used for simulating specific values or specific points
  – Example: Number of people in a queue
Desirable Features of Simulation Software
• Be capable of being used interactively as well as allowing complete runs
• Be user-friendly and easy to understand
• Allow modules to be built and then connected
• Allow users to write and incorporate their own routines
• Have building blocks that contain built-in commands
• Have macro capability, such as the ability to develop machining cells
• Have material-flow capability
• Output standard statistics such as cycle times, utilization, and wait times
• Allow a variety of data analysis alternatives for both input and output data
• Have animation capabilities to display graphically the product flow through the system
• Permit interactive debugging
Advantages of Simulation
• Often leads to a better understanding of the real system
• Years of experience in the real system can be compressed into seconds or minutes
• Simulation does not disrupt ongoing activities of the real system
• Simulation is far more general than mathematical models
• Simulation can be used as a game for training experience
• Simulation provides a more realistic replication of a system than mathematical analysis
• Simulation can be used to analyze transient conditions, whereas mathematical techniques usually cannot
• Many standard packaged models, covering a wide range of topics, are available commercially
• Simulation answers what-if questions
Disadvantages of Simulation
• There is no guarantee that the model will, in fact, provide good answers
• There is no way to prove reliability
• Building a simulation model can take a great deal of time
• Simulation may be less accurate than mathematical analysis because it is randomly based
• A significant amount of computer time may be needed to run complex models
• The technique of simulation still lacks a standardized approach
Overview of Decision Analysis
• Definitions
  – i indexes decision alternatives
  – j indexes states of nature
  – Decision Alternative (Ai)
  – States of Nature (Sj)
  – Payoff (Vij) (intersection of Ai and Sj)
  – Regret (Rij)
• Without information (game theory)
  – MaxiMax
  – MaxiMin
  – MiniMax regret
• Decisions with uncertainty
  – Expected value of perfect information
  – Payoff tables (newsperson problem)
• Decision trees
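The criteria listed above can be applied to a small payoff table. The table below is illustrative (three development sizes against three demand states, with assumed probabilities), and regret is taken as the best payoff in a state minus the payoff received, which is the usual definition rather than one spelled out in the slides.

```python
# Illustrative payoff table: rows are alternatives Ai, columns are states Sj.
states = ("Low", "Modest", "High")
payoffs = {
    "Small": (3.5, 3.0, 2.7),
    "Medium": (1.0, 12.5, 12.4),
    "Large": (-0.5, -0.25, 25.0),
}
probabilities = (0.1, 0.4, 0.5)      # assumed state probabilities

# Criteria that ignore probabilities.
maximax = max(payoffs, key=lambda a: max(payoffs[a]))
maximin = max(payoffs, key=lambda a: min(payoffs[a]))

best_per_state = [max(row[j] for row in payoffs.values()) for j in range(len(states))]
regret = {
    a: [best_per_state[j] - value for j, value in enumerate(row)]
    for a, row in payoffs.items()
}
minimax_regret = min(regret, key=lambda a: max(regret[a]))

# Expected value and the expected value of perfect information (EVPI).
expected = {a: sum(p * v for p, v in zip(probabilities, row)) for a, row in payoffs.items()}
best_expected_choice = max(expected, key=expected.get)
value_with_perfect_info = sum(p * b for p, b in zip(probabilities, best_per_state))
evpi = value_with_perfect_info - expected[best_expected_choice]

print(maximax, maximin, minimax_regret, best_expected_choice, round(evpi, 3))
```

The three probability-free criteria can each pick a different alternative, which is why the choice of criterion matters.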
Decision Analysis-Decision Trees
• A decision tree is a graphical representation of every possible sequence of decision and random outcomes (states of nature) that can occur within a given decision making problem.
• A decision tree is composed of a collection of nodes (represented by circles and squares) interconnected by branches (represented by lines).
Decision Analysis-Decision Trees
General Form of a Decision Tree
Two flavors of nodes: decision and event
Decision Analysis-Decision Trees
• The value assigned to an event node is the expectation of the values that correspond to adjacent nodes.
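Evaluation of event nodes
[Figure: event node with value V4 and three branches with probabilities p1, p2, and p3 leading to values V1, V2, and V3]
V4 = V1 x p1 + V2 x p2 + V3 x p3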
Example Decision Tree
(Payoffs for each development are listed in the order Low / Modest / High.)
Payoffs
• Small Development: 3.5 / 3.0 / 2.7
• Medium Development: 1.0 / 12.5 / 12.4
• Large Development: -0.5 / -0.25 / 25
Do not retain E.I.: value 12.35
• Prob. Low = 0.1; Prob. Modest = 0.4; Prob. High = 0.5
  – Small Development: 2.9
  – Medium Development: 11.3
  – Large Development: 12.35
  – Best: 12.35
Retain E.I. (-.5): value 13.415 before the 0.5 cost, 12.915 after
• Proportion of the time E.I. predicts “low” = 0.230
  – Prob. Low = 0.261; Prob. Modest = 0.522; Prob. High = 0.217
  – Small Development: 3.065
  – Medium Development: 9.478
  – Large Development: 5.174
  – Best: 9.478
• Proportion of the time E.I. predicts “modest” = 0.430
  – Prob. Low = 0.070; Prob. Modest = 0.465; Prob. High = 0.465
  – Small Development: 2.895
  – Medium Development: 11.651
  – Large Development: 11.477
  – Best: 11.651
• Proportion of the time E.I. predicts “high” = 0.340
  – Prob. Low = 0.029; Prob. Modest = 0.235; Prob. High = 0.735
  – Small Development: 2.794
  – Medium Development: 12.088
  – Large Development: 18.309
  – Best: 18.309
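The tree can be rolled back in code: event nodes take the probability-weighted average of their branches, and decision nodes take the best alternative. The sketch uses the payoffs and probabilities from the example as printed; because those probabilities are rounded to three decimals, the computed values differ slightly from the slide figures. The function names are choices made for this example.

```python
# Payoffs in the order Low / Modest / High.
payoffs = {
    "Small": (3.5, 3.0, 2.7),
    "Medium": (1.0, 12.5, 12.4),
    "Large": (-0.5, -0.25, 25.0),
}

def event_value(values, probs):
    """Event node: expectation of the values on its branches."""
    return sum(v * p for v, p in zip(values, probs))

def best_alternative(probs):
    """Decision node: pick the development with the highest expected value."""
    expected = {name: event_value(row, probs) for name, row in payoffs.items()}
    choice = max(expected, key=expected.get)
    return choice, expected[choice]

# Branch 1: decide without E.I., using the prior probabilities.
no_info_choice, no_info_value = best_alternative((0.1, 0.4, 0.5))

# Branch 2: retain E.I. at a cost of 0.5, then decide after each prediction.
predictions = {
    "low": (0.230, (0.261, 0.522, 0.217)),
    "modest": (0.430, (0.070, 0.465, 0.465)),
    "high": (0.340, (0.029, 0.235, 0.735)),
}
ei_cost = 0.5
retain_value = sum(
    share * best_alternative(posterior)[1]
    for share, posterior in predictions.values()
) - ei_cost

decision = "Retain E.I." if retain_value > no_info_value else "Do not retain E.I."
print(no_info_choice, round(no_info_value, 3), round(retain_value, 3), decision)
```

The gap between the two branch values is what the information is worth after paying for it.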
Final Word
Good Luck!!