2007-06-17 tpc dd basicstatistics

Upload: ravichandran-srinivasan

Post on 14-Apr-2018


  • 7/30/2019 2007-06-17 TPC DD BasicStatistics

    1/91

    Tech-Pro Consultants

    Six Sigma Basic Statistics

    March 2005, Dr. K.S. Ravichandran

    Basic Statistics


    Objectives

    Review & Enhance The Basic Statistical & Quality Terms Needed

    For Six Sigma Process Improvement

    Begin To Enhance Minitab Operating Skills

    A politician's promise: "If elected, I'd make certain that everybody gets an above-average income."


    What is Statistics?

    Statistics is the science that develops methods to effectively derive information from numerical data.

    Statistics is a collection of scientific methods for collecting, organizing, and interpreting data, usually with the goal of inferring certain properties of the population from a representative sample of the population.

    Statistics is the science of collecting and classifying a group of facts according to their relative number and determining certain values that represent characteristics of the group.

    "There are three kinds of lies: lies, damned lies, and statistics." (Mark Twain)


    Types of data

    Measures of the Center of the data

    Mean

    Median

    Mode

    Measures of the Spread of Data

    Range

    Variance

    Standard Deviation

    Normal Distribution and Normal Probabilities

    Process Stability and Process Capability

    Basic Statistics

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Ask a statistician for her phone number... and get an estimate with 95% confidence


    What sorts of data do you see being collected around your area?

    (List them below)

    ___________________________________________________

    In God we trust. All others must bring data.


    2 General Kinds of Data (but 3 families)

    ATTRIBUTE DATA - The data is discrete (counted). Results from using go/no-go gages, or from the inspection of visual defects, visual problems, missing parts, or from pass/fail or yes/no decisions.

    VARIABLE DATA - The data is continuous (measured). Results from the actual measuring of a characteristic such as impedance of a motor winding, tensile strength of steel, diameter of a pipe, flow rate of a pump, etc.

    Statisticians do it discretely and continuously.


    2 General Kinds of Data (but 3 families)

    Different Types Of Data Require Different Analysis Tools

    ATTRIBUTE DATA (Count Data)

    (#1) Number of Items in a Category (Count-Based Proportions): Type-I Attributes Data (Binomial)
      Heads / Tails (i.e., counting # of Heads and # of Tails)
      Yes / No (Order Form Filled Out Accurately or Not)
      Pass / Fail; Good / Bad (Accurate Billing / Overcharged)

    (#2) Counts of Discrete Event Occurrences: Type-II Attributes Data (Poisson)
      # of Scratches on a Car Hood
      # of Errors on a Form
      # of Insulation Breaks in a Spool of Wire
      # of times customer hangs up before receiving response

    VARIABLE DATA (Continuous Measurement Scale)

    (#3) Continuous Data
      Decimal subdivisions are meaningful
      Ex: Time to answer the telephone (exact # of secs. per call)

    Just ask yourself, "Am I counting things here?" If yes, you have attributes data.
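The "Am I counting things?" rule can be written down as a small decision function. This is only an illustrative sketch: the function name `data_family` and its two yes/no parameters are made up for this example and are not part of the course material.

```python
def data_family(counting_things: bool, whole_item_judged: bool = False) -> str:
    """Return the data family suggested by the 'Am I counting things?' rule.

    Both parameters are hypothetical names used only for this illustration.
    """
    if not counting_things:
        # Measured on a continuous scale, e.g. time to answer the telephone.
        return "Variables data (continuous)"
    if whole_item_judged:
        # Each item lands in a category, e.g. pass/fail or yes/no.
        return "Attributes data, Type-I (Binomial)"
    # Occurrences are counted, e.g. scratches on a car hood.
    return "Attributes data, Type-II (Poisson)"


print(data_family(counting_things=True, whole_item_judged=True))
print(data_family(counting_things=True))
print(data_family(counting_things=False))
```

The three calls correspond to a pass/fail decision, a count of errors on a form, and a measured time.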


    3 Families of Data: Am I Counting Things?

    Manufacturing Process: Making Sheets of Glass

                                                          Sample #1   Sample #2   Sample #3   Sample #4
    ATTRIBUTES DATA (Discrete Data)
      TYPE-I (Binomial Distribution)
        Any Bubbles? (accept / reject the entire item)    Reject      Reject      Accept      Reject
      TYPE-II (Poisson Distribution)
        Number of Bubbles?                                3           2           0           4
    VARIABLES DATA (Continuous Data / Measurement Data; Normal Distribution or Other)
        Glass Weight                                      12.2        12.4        12.1        11.9


    3 Families of Data: Am I Counting Things?

    Transactional Process: Converting an expense account form into a reimbursement check

                                                          Form #1     Form #2     Form #3     Form #4
    ATTRIBUTES DATA (Discrete Data)
      TYPE-I (Binomial Distribution)
        Any Errors? (accept / reject the entire item)     Reject      Reject      Accept      Reject
      TYPE-II (Poisson Distribution)
        Number of Errors on Form?                         3           2           0           4
    VARIABLES DATA (Continuous Data / Measurement Data; Normal Distribution or Other)
        Time to Reimburse Employee                        36.1 hrs    24.6 hrs    21.0 hrs    29.2 hrs


    [Figure: Pass/Fail Data. Samples are drawn at 8:00am, 9:00am, and 10:00am; for each sample a data-collection table records Sample (n), Number (np, c), Proportion (p, u), and Date (Shift, Time, etc.), and the proportion is charted over time on a scale from 10% to 40%.]

    She tells you that you are just Average: never mind, she is just being Mean


    [Figure: Number of Blemishes Data. Counts (scale 1 to 4) are charted at ten-minute intervals: 8:00am, 8:10am, 8:20am, 8:30am, 8:40am, 8:50am, 9:00am, 9:10am, etc. The data-collection table records Sample (n), Number (np, c), Proportion (p, u), and Date (Shift, Time, etc.).]


    Exercise: Which Type of Data Is It?

    DIRECTIONS: For each of the following applications, identify the type of data you would be investigating (Attributes Type-I, Attributes Type-II, or Variables Data) ... AND EXPLAIN YOUR CHOICE

    (1) Percent defective parts in hourly production

    (2) Percent cream content in milk bottles (comes in four-bottle container sets)

    (3) Amount of time it takes to respond to a request

    (4) Number of blemishes per square yard of cloth, where pieces of cloth may be of variable size

    (5) Daily test of water acidity (pH)

    (6) Number of raisins per box of Raisin Bran

    (7) Number of defective parts in lots of size 100

    (8) Length of screws in samples of size ten from production lots

    (9) Number of errors on a purchase order


    The Probability Test

    What is the largest probability possible? _______ What does this mean?

    What is the smallest probability possible? _______ What does this mean?

    What does a probability of 0.50 mean? _______________

    What is the probability you will be struck by lightning during your lifetime? _____________________

    What are your chances of appearing on The Tonight Show? ___________________

    What is the probability of being killed by terrorists overseas? ____________________

    What are your chances of being killed by an American in Baltimore? _______________


    The Probability Test (Instructor Page: Answers)

    What is the largest probability possible? 1.0 = 100%. What does this mean?

    What is the smallest probability possible? 0.0 = 0%. What does this mean?

    What does a probability of 0.50 mean? 50%: just flip a coin.

    What is the probability you will be struck by lightning during your lifetime? 0.000001667 = 1/600,000

    What are your chances of appearing on The Tonight Show? 0.00000204 = 1/490,000

    What is the probability of being killed by terrorists overseas? 0.000001538 = 1/650,000

    What are your chances of being killed by an American in Baltimore? 0.00025 = 1/4,000


    The Probability Test (cont.)

    Roll a fair die once, what is Prob(a six)? ______

    Roll a fair die twice, what is Prob(a six on the second roll)? ______

    Roll two fair dice, what is Prob(get two sixes)? ____________

    What do you think of the recent headline, "Education research shows 49.5% of all American high school students fall below the national average!"?


    Probability: The Practical Problem Statement ...

    The Customer Requirements

    Suppose a certain customer permits only those combinations which yield 3, 4, 5, . . . , or 11.

    What is the process capability?

    What is the probability of meeting the requirements?

    Are capability and probability related?

    Used With Permission
    6 Sigma Academy Inc. 1995


    Probability of Defect

    Computing the Risks - The Statistical Problem Statement

    Total of two dice (one die across the top, the other down the side):

          1    2    3    4    5    6
     1    2    3    4    5    6    7
     2    3    4    5    6    7    8
     3    4    5    6    7    8    9
     4    5    6    7    8    9   10
     5    6    7    8    9   10   11
     6    7    8    9   10   11   12

    Ways to form a 2 in =

    Ways to form a 12 in =

    Used With Permission
    6 Sigma Academy Inc. 1995


    Deeper Insight Into Probability

    What is the probability of rolling a 5 using a fair pair of dice?

    Die 1   Die 2   Probability
    1       4       .0278
    2       3       .0278
    3       2       .0278
    4       1       .0278
    Total           .1111

    Probability of each Die 1 / Die 2 combination:

          1       2       3       4       5       6
     1    .0278   .0278   .0278   .0278   .0278   .0278
     2    .0278   .0278   .0278   .0278   .0278   .0278
     3    .0278   .0278   .0278   .0278   .0278   .0278
     4    .0278   .0278   .0278   .0278   .0278   .0278
     5    .0278   .0278   .0278   .0278   .0278   .0278
     6    .0278   .0278   .0278   .0278   .0278   .0278

    Used With Permission
    6 Sigma Academy Inc. 1995


    Establishing the Odds

    Value   Combinations   Probability
    2       1              .0278
    3       2              .0556
    4       3              .0833
    5       4              .1111
    6       5              .1389
    7       6              .1667
    8       5              .1389
    9       4              .1111
    10      3              .0833
    11      2              .0556
    12      1              .0278
    Total   36             1.0000

    Probability of any given value on Die 1 = 1/6 = .1667

    Probability of any given value on Die 2 = 1/6 = .1667

    Probability of any given combination = 1/6 x 1/6 = 1/36 = .0278

    Used With Permission
    6 Sigma Academy Inc. 1995
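The odds table can be rebuilt by listing all 36 equally likely combinations of two fair dice and counting how many give each total. The sketch below is only an illustration using the Python standard library; the variable names are chosen for this example.

```python
from collections import Counter
from fractions import Fraction

faces = range(1, 7)

# Count how many of the 36 (Die 1, Die 2) combinations produce each total.
combinations = Counter(die1 + die2 for die1 in faces for die2 in faces)

print("Value  Combinations  Probability")
for value in sorted(combinations):
    probability = Fraction(combinations[value], 36)
    print(f"{value:>5}  {combinations[value]:>12}  {float(probability):.4f}")

print(f"Total  {sum(combinations.values()):>12}  1.0000")
```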


    Graphing the Results

    Suppose a certain customer permits only those combinations which yield 3, 4, 5, . . . , or 11.

    [Figure: bar chart of probability (scale 2 to 18 percent) versus Total of Dice Values. The LSL and USL bound the Zone of Customer Satisfaction (94.4%); the totals 2 and 12 fall outside it, at 2.8% each.]

    Value   Combinations   Probability
    2       1              .0278
    3       2              .0556
    4       3              .0833
    5       4              .1111
    6       5              .1389
    7       6              .1667
    8       5              .1389
    9       4              .1111
    10      3              .0833
    11      2              .0556
    12      1              .0278
    Total   36             1.0000

    . . . Hence, the probability of Customer Satisfaction is 94.4%

    Used With Permission
    6 Sigma Academy Inc. 1995
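The 94.4% figure can be checked by counting the dice combinations whose total falls inside the customer's limits. A minimal sketch, treating 3 and 11 as the lower and upper specification limits; the variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

LSL, USL = 3, 11  # lowest and highest totals the customer permits

rolls = list(product(range(1, 7), repeat=2))          # all 36 combinations
accepted = [roll for roll in rolls if LSL <= sum(roll) <= USL]

p_satisfaction = Fraction(len(accepted), len(rolls))  # 34/36
p_defect = 1 - p_satisfaction                         # totals of 2 or 12

print(f"Probability of customer satisfaction: {float(p_satisfaction):.1%}")
print(f"Probability of a defect: {float(p_defect):.1%}")
```

Only the totals 2 and 12 are rejected, one combination each, which is where the 2.8% in each tail comes from.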


    Statistical Distributions

    We can describe the behavior of any process or

    system by plotting multiple data points for the

    same variable

    Over time

    Across products or business

    By different people, machines, etc...

    The accumulation of these data can be viewed as

    a distribution of values

    Represented by:
      Dot plots
      Histograms
      Normal curve or other smoothed distribution

    Used With Permission
    AlliedSignal 1995 - Dr. Steve Zinkgraf


    Process = Hose

    1 Drop = 1 Unit of Output

    Y = Weight (lbs), on a scale from 100 to 220

    A Histogram is ... a pile of individual values

    Dotplot: :

    : :

    . : . : :

    : : : : : : : : : : : . :

    . . ::.::::: :.:::.:.:.:.: : : : : . : . .

    -----+---------+---------+---------+---------+---------+-C1

    100 125 150 175 200 225


    Dot Plots

    Suppose we have a manufacturing line that is producing shafts. Diameters range from 1.0 to 1.4 inches. As we make a measurement of a shaft, we record the value with a dot on a Diameter scale.

    Ex:
    1st Observation = 1.4 inches
    2nd Observation = 1.1 inches

    [Figure: dot plot on a Diameter axis running from 1.0 to 1.4 inches in steps of 0.05, with the 1st and 2nd observations marked.]


    Dot Plots

    And suppose we continue sampling until 150 shafts have been measured.

    :: :::. . .
    :.. :::::: : :
    . :.. ..::::::::::::::: :: ::.:..: .::::::::::::::::::::::::.:::..:.: .
    Diameter: 1.0   1.05   1.1   1.15   1.2   1.25   1.3   1.35   1.4

    What Statements Can You Make About Our Process?


    Dot Plots

    [Figure: the same dot plot of the 150 shaft diameters, on a Diameter axis from 1.0 to 1.4 inches.]

    Now imagine the same data, grouped into intervals, with bars used to represent how the data looks.


    Histogram Distribution

    [Figure: histogram of the shaft diameters. Y axis: Frequency (0 to 35); X axis: diameter from 1.0 to 1.4 in steps of 0.05.]

    Data represented just with dots is called a Dot Plot.

    Data represented in the above bar format is called a Histogram.
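Grouping measurements into intervals and drawing one bar per interval can be sketched in a few lines. The diameters below are made up for illustration (the 150 shaft measurements themselves are not listed in the deck), and the 0.05 inch interval width simply follows the axis labels.

```python
from collections import Counter

# Hypothetical diameters in inches, invented for this sketch.
diameters = [1.02, 1.08, 1.11, 1.13, 1.14, 1.17, 1.18, 1.19, 1.21, 1.22,
             1.22, 1.23, 1.24, 1.26, 1.27, 1.28, 1.31, 1.33, 1.37]

BIN_WIDTH_HUNDREDTHS = 5  # interval width of 0.05 inch


def bin_start(value: float) -> int:
    """Lower edge of the interval holding value, in hundredths of an inch."""
    hundredths = round(value * 100)  # work in integers to avoid float drift
    return hundredths - hundredths % BIN_WIDTH_HUNDREDTHS


counts = Counter(bin_start(d) for d in diameters)

# One text "bar" per interval from 1.00 up to (but not including) 1.40.
for start in range(100, 140, BIN_WIDTH_HUNDREDTHS):
    print(f"{start / 100:.2f} | {'#' * counts[start]}")
```

Each `#` is one measurement, so the printout is the dot plot turned on its side and stacked by interval.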


    Histogram

    Now we've combined the Histogram with our Lower and Upper Specifications.

    [Figure: the shaft-diameter histogram (1.0 to 1.4 inches) with Lower Specification = .001 and Upper Specification = 2.0 marked.]

    Question #1: What are Specifications? Where do they come from?

    Question #2: What can you say about our process now?


    Histogram

    Suppose the customer has given us new specifications!

    [Figure: the shaft-diameter histogram (1.0 to 1.4 inches) with Lower Specification = 1.1 and Upper Specification = 1.3 marked.]

    Question: What can you say about our process now?


    Dotplot Distribution

    Imagine a customer service help line in which the business knows that, to stay competitive, it must return the customers' telephone calls in less than 30 minutes. The actual response time was measured 150 times and plotted below.

    : : :::. . .

    :.. :::::: : :

    . :.. ..::::::::::::::: :: ::.:..: .::::::::::::::::::::::::.:::..:.: .

    -+---------+---------+---------+---------+-------

    28.0 29.0 30.0 31.0 32.0


    Smoothed (Normal) Distribution

    [Figure: histogram of Time (axis from 26 to 34) overlaid with a smoothed normal curve (red line), with the Lower Spec and Upper Spec marked.]

    Finally, we can view the data as a smoothed distribution (red line), in this example using the normal distribution assumption. It provides an approximation of how the data might look if we were to collect an infinite number of data points.

    Used With Permission
    AlliedSignal 1995 - Dr. Steve Zinkgraf


    Forming the Normal Curve

    [Figure: histogram bars over Units of Measure, with a smooth curve interconnecting the center of each bar. The Performance Limit a separates the Area of Yield from the tail beyond it, the Probability of a Defect.]

    $$p(x > a) = \int_{a}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left[(x - \mu)/\sigma\right]^{2}} \, dx$$

    Given that 100% of the area under the normal curve lies between - infinity and + infinity, we may calculate that area which lies beyond the performance limit. Doing so would reveal the random chance probability of creating a defect.

    Note: The tails of the normal curve will touch the baseline at infinity.

    Used With Permission
    6 Sigma Academy Inc. 1995
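The area beyond a performance limit can be evaluated numerically with the complementary error function, which gives the same tail area as the integral of the normal curve from the limit to + infinity. A minimal sketch; the mean, standard deviation, and limit used below are hypothetical values, not taken from the deck.

```python
import math


def prob_beyond(a: float, mu: float, sigma: float) -> float:
    """P(x > a) for a normal distribution with mean mu and std dev sigma."""
    z = (a - mu) / sigma                   # distance to the limit in std devs
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical process: mean 30, standard deviation 1, performance limit 32.
p_defect = prob_beyond(32, 30, 1)
print(f"Probability of a defect: {p_defect:.4f}")
```

With the limit two standard deviations above the mean, about 2.3% of the area lies beyond it.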


    Types of data

    Measures of the Center of the Data

    Mean

    Median

    Mode

    Measures of the Spread of Data

    Range

    Variance

    Standard Deviation

    Shape: Normal Distribution and Normal Probabilities

    Process Stability and Process Capability

    Basic Statistics

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf


    Data Example (Actual # of Days from Order to Ship)

    140 170 215 130 136 130 150
    145 175 150 155 123 120 110
    160 175 145 150 155 130 116
    190 170 155 148 140 131 108
    155 180 155 155 120 120  95
    165 135 150 150 130 118 125
    150 170 155 140 138 125 133
    190 157 150 180 121 135 110
    195 130 180 190 125 125 150
    138 185 160 145 116 118 108
    160 190 135 150 145 122
    155 155 160 164 150 115
    153 170 140 112 102
    145 155 142 125 115

    Where is the Center of the Data?


    Where is the Center of the Data?

    Described in 2 ways:

    Mean = The average value (the Center of Gravity)

    $$\bar{X} = \frac{\text{Sum of the data points}}{\text{Number of data points}}$$

    - Uses all data points
    - Heavily influenced by extreme values

    Median = the 50% point (or the middle number)

    To find the median of a data set,
    (1) arrange data in order from smallest to largest
    (2) the middle number is the median!

    1, 2, 3, 14, 85
    The median is 3

    - Not heavily influenced by extreme values
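The pull of an extreme value on the mean, and its lack of effect on the median, can be seen with the same five numbers. A small sketch using Python's standard `statistics` module; the variable names are chosen for this illustration.

```python
import statistics

data = [1, 2, 3, 14, 85]

mean = statistics.mean(data)      # uses every point: (1+2+3+14+85) / 5 = 21
median = statistics.median(data)  # middle of the sorted values: 3

# Swap the extreme value 85 for 15: the mean moves a lot, the median not at all.
tamer = [1, 2, 3, 14, 15]
tamer_mean = statistics.mean(tamer)      # 7
tamer_median = statistics.median(tamer)  # still 3

print(f"with 85: mean = {mean}, median = {median}")
print(f"with 15: mean = {tamer_mean}, median = {tamer_median}")
```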

    Where is the Center?

    As head of the university's Communications Dept., you are asked to summarize the average starting salaries of Communications graduates.

    $10, 20, 30, 40, 50 ($ in thousands)

    What is the average income (or center of gravity)?

    What is the median income?

    However, under the advice of the Public Relations Dept., you consider including one of your former Communications majors: Shaquille O'Neal (a rather wealthy rookie basketball star).

    $10, 20, 30, 40, 5000 ($ in thousands)

    What is the average income (or center of gravity)?

    What is the median income?


    Where is the Center?

    Mode (not used as much): The value that occurs most often.
    - The Mode may not exist; and if it does exist, it may not be unique.
    - Can be used with categorical/attribute data

    What is the mode for the following set of defect data?

    # of change notices issued:
    - Price change: 13
    - Spec change: 112
    - Ship to address change: 40
    - Delivery date changed: 79

    What does Bimodal mean?
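For categorical data like the change notices, the mode is simply the category with the highest count. The sketch below collects every category tied for the highest count, so it would also report two categories for a bimodal set; the variable names are chosen for this illustration.

```python
from collections import Counter

# Counts of change notices issued, by category.
change_notices = Counter({
    "Price change": 13,
    "Spec change": 112,
    "Ship to address change": 40,
    "Delivery date changed": 79,
})

highest = max(change_notices.values())
# Keep every category that reaches the highest count (more than one if tied).
modes = [category for category, n in change_notices.items() if n == highest]

print(modes)
```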


    Example

    Suppose your son or daughter is considering going to work for a small, family-owned business after graduation. The owner of the business proudly states that, of the last 7 college graduates hired, the mean salary was $25,000; the salaries were bimodal, with modes of $18,000 and $20,000; and the median salary was $19,000. He refuses to identify the individual salaries.

    From Introductory Statistics, William D. Ergle


    Exercise

    Minitab can easily calculate the Mean and Median

    1. Open up Minitab

    2. Open file: Distskew.mtw

    3. Perform the following: Stat>Basic Statistics>Descriptive Statistics>

    4. Enter the variable names

    5. Evaluate results


    Descriptive Statistics For 3 Distributions

    TABULAR FORM

    Variable    N     Mean     Median   TrMean   StDev
    Normal      500   70.000   69.977   70.014   10.000
    Pos Skew    500   70.000   65.695   68.554   10.000
    Neg Skew    500   70.000   73.783   71.368   10.000

    Look For This In Your Session Window!


    Graphical Form

    Different Distributions


    [Figure: three histograms, each titled "Comparison of Distributions", with Frequency on the Y axis. C1: Symmetric Distribution (X axis 20 to 110, Frequency 0 to 100). C2: Positive Skew, with the Tail marked (X axis 60 to 130, Frequency 0 to 300). C3: Negative Skew, with the Tail marked (X axis 0 to 80, Frequency 0 to 300).]

    Sketch in the Means and Medians on each Distribution.

    Used With Permission
    AlliedSignal 1995 - Dr. Steve Zinkgraf


    Graphical Reminder

    * The 3 Charts On The Previous Page Were Created Under The Minitab Histogram Option: Graph>Histogram

    Relationship Of The Mean & Median


    [Figure: three histograms (Y axis: Frequency) with the Mean and Median marked. Normal (X axis 20 to 110): the Mean and Median coincide. Neg Skew (X axis 0 to 80): the Mean lies below the Median. Pos Skew (X axis 60 to 130): the Median lies below the Mean.]

    Used With Permission
    AlliedSignal 1995 - Dr. Steve Zinkgraf


    Types of data

    Measures of the Center of the Data

    Mean

    Median

    Mode

    Measures of the Spread of Data

    Range

    Variance

    Standard Deviation

    Normal Distribution and Normal Probabilities

    Process Stability and Process Capability

    Basic Statistics

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf


    Population Parameters vs Sample Statistics

    μ = Population Mean
    σ = Population Standard Deviation

    Examples of POPULATION:
      Entire United States
      Year's Worth of Accts. Payable
      Every Grain of Sand On The Beach

    X-bar = Sample Mean
    s = σ-hat = Sample Standard Deviation

    Examples of SAMPLE:
      1000 US Citizens
      Hour's Worth of Accts. Payable
      Handful of Sand


    3 ways to describe how far the data is spread:

    Range = R, the difference between largest and smallest observations

    Standard Deviation = s

    Variance = s^2 (just the square of the std dev!)


    CLASS EXERCISE

    Calculate manually the Variance and Standard Deviation of These 5 Data Points

      X     X - X-bar     (X - X-bar)^2
      5         2               4
      4        ___             ___
      3        ___             ___
      1        ___             ___
      2        -1               1

    X-bar = (Sum of the data points) / (Number of data points)

    Avg = ___

    Sum of the last column = _______

    Divide the Sum by (n-1): Variance = S^2 = __________

    Square Root of the Variance: Std. Dev. = S = sqrt(S^2) = _________


    CLASS EXERCISE (Instructor Page)

    Calculate manually the Variance and Standard Deviation of These 5 Data Points

      X     X - X-bar     (X - X-bar)^2
      5         2               4
      4         1               1
      3         0               0
      1        -2               4
      2        -1               1

    X-bar = (Sum of the data points) / (Number of data points)

    Avg = 3

    Sum of the last column = 10

    Divide the Sum by (n-1): Variance = S^2 = 10 / 4 = 2.5

    Square Root of the Variance: Std. Dev. = S = sqrt(S^2) = 1.58
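The manual calculation for the five data points (5, 4, 3, 1, 2) can be mirrored step by step in code and cross-checked against Python's built-in sample variance. This is a sketch for checking the arithmetic; the variable names are chosen for this illustration.

```python
import math
import statistics

x = [5, 4, 3, 1, 2]

mean = sum(x) / len(x)                                # average = 3.0
deviations = [xi - mean for xi in x]                  # 2, 1, 0, -2, -1
squared_deviations = [d ** 2 for d in deviations]     # 4, 1, 0, 4, 1
sum_of_squares = sum(squared_deviations)              # 10.0

variance = sum_of_squares / (len(x) - 1)              # divide by n - 1
std_dev = math.sqrt(variance)                         # square root of variance

# statistics.variance also divides by n - 1, so the two should agree.
assert math.isclose(variance, statistics.variance(x))

print(f"Avg = {mean}, Variance = {variance}, Std. Dev. = {std_dev:.2f}")
```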


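    Computational Equations

    Population Mean

    $$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

    Sample Mean

    $$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

    Population Standard Deviation

    $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

    Sample Standard Deviation

    $$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

    Used With Permission
    6 Sigma Academy Inc. 1995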


    The Standard Deviation

    [Figure: normal curve centered on μ, with the Point of Inflection marked 1σ from the mean. The distance from T to the USL spans 3σ, and p(d) is the tail area beyond the USL.]

    Upper Specification Limit (USL)
    Target Specification (T)
    Lower Specification Limit (LSL)
    Mean of the distribution (μ)
    Standard Deviation of the distribution (σ)

    The distance between the point of inflection and the mean constitutes the size of a standard deviation. If three such deviations can be fit between the target value and the specification limit, we would say the process has three sigma capability.

    Used With Permission
    6 Sigma Academy Inc. 1995
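The idea of "how many standard deviations fit between the target and the specification limit" is a one-line ratio. The sketch below uses hypothetical numbers, and the function name `sigma_capability` is invented for this illustration.

```python
def sigma_capability(target: float, spec_limit: float, sigma: float) -> float:
    """Number of standard deviations that fit between target and spec limit."""
    return abs(spec_limit - target) / sigma


# Hypothetical process: target 30, upper specification limit 33, sigma 1.
print(sigma_capability(target=30.0, spec_limit=33.0, sigma=1.0))  # three sigma

# Halving the standard deviation doubles the number of sigmas that fit.
print(sigma_capability(target=30.0, spec_limit=33.0, sigma=0.5))
```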


    Types of data

    Measures of the Center of the Data

    Mean

    Median

    Mode

    Measures of the Spread of Data

    Range

    Variance

    Standard Deviation

    Normal Distribution and Normal Probabilities

    Process Stability and Process Capability

    Basic Statistics

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf


    The Normal Distribution

    The Normal Distribution is a distribution of data which has certain consistent properties.

    These properties are very useful in our understanding of the characteristics of the underlying process from which the data were obtained.

    Many natural phenomena and man-made processes are approximately normally distributed, or can be represented as normally distributed.

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf


    The Normal Distribution

    Property 1: A normal distribution can be described completely by knowing only the:
      mean, and
      standard deviation

    [Figure: three normal curves labeled Distribution One, Distribution Two, and Distribution Three.]

    What is the difference among these three normal distributions?

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Statistical Number Line

    Exercise: Suppose the weights of players on a football team had μ = 300 lbs and σ = 10 lbs. You fill in the X-axis values (weights) below.

    X Axis (pounds):
      μ-3σ   μ-2σ   μ-1σ   μ     μ+1σ   μ+2σ   μ+3σ
      ____   ____   ____   300   ____   ____   ____

    (add 10 for each step to the right of the mean)


    Statistical Number Line (Instructor Page)

    Exercise: Suppose the weights of players on a football team had μ = 300 lbs and σ = 10 lbs. You fill in the X-axis values (weights).

    X Axis (pounds):
      μ-3σ   μ-2σ   μ-1σ   μ     μ+1σ   μ+2σ   μ+3σ
      270    280    290    300   310    320    330

    Instructor Page

    X Axis (pounds):
      μ-3σ   μ-2σ   μ-1σ   μ     μ+1σ   μ+2σ   μ+3σ
      270    280    290    300   310    320    330

    μ ± 1σ = 68% of the individuals (290 to 310 pounds)

    μ ± 2σ = 95% of the individuals (280 to 320 pounds)

    μ ± 3σ = 99.7% of the individuals (270 to 330 pounds)

    The Normal Curve and Probability Areas Associated with the Standard Deviation

    [Figure: normal curve. X axis: Number of standard deviations from the mean (-4 to 4); Y axis: Probability of sample value (0% to 40%). The marked areas show the cumulative probability of obtaining a value between two values: 68% within 1, 95% within 2, and 99.73% within 3 standard deviations of the mean.]

    Property 2: The area under sections of the curve can be used to estimate the cumulative probability of a certain event occurring.

    Used With Permission
    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Empirical Rule of Standard Deviation

    The previous rules of cumulative probability apply even when a set of data is not perfectly normally distributed. Let's compare the values for a theoretical (perfect) normal distribution to empirical (real-world) distributions.

    Number of Standard Deviations   Theoretical Normal   Empirical Normal
    +/- 1                           68%                  60-75%
    +/- 2                           95%                  90-98%
    +/- 3                           99.7%                99-100%

    Used With Permission
    AlliedSignal 1995 - Dr. Steve Zinkgraf
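The theoretical percentages come straight from the area under the normal curve. For a normal distribution, the area within k standard deviations of the mean equals erf(k / sqrt(2)), which the standard library can evaluate. A minimal sketch; the function name `area_within` is chosen for this illustration.

```python
import math


def area_within(k: float) -> float:
    """Area of a normal curve within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} standard deviations: {area_within(k):.2%}")
```

The printed values (68.27%, 95.45%, 99.73%) round to the 68%, 95%, and 99.7% of the theoretical column.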


    How can I tell if my data is bell-shaped? (i.e., Normally Distributed)


    Normal Probability Plots

    We can test whether a given data set can be described as normal with a test called a Normal Probability Plot.

    If a distribution is close to normal, the normal probability plot will be a straight line.

    Minitab makes the normal probability plot easy. Using Distskew.mtw, choose: Stat>Basic Statistics>Normality Test

    Produce a normal plot of each of the first 3 columns. Which appear to be normal?
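Outside Minitab, the "straight line" idea can be approximated by pairing the sorted data with the matching normal quantiles and measuring how close the pairs are to a line. The sketch below is not Minitab's procedure: it uses a simple correlation coefficient, the plotting positions (i - 0.5) / n are one common convention among several, and both samples are randomly generated stand-ins for the course data file.

```python
import math
import random
from statistics import NormalDist


def probability_plot_correlation(data):
    """Correlation between sorted data and the matching normal quantiles.

    Values near 1 mean the normal probability plot is close to a straight line.
    """
    n = len(data)
    ordered = sorted(data)
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_x = sum(ordered) / n
    mean_q = sum(quantiles) / n
    sxq = sum((x - mean_x) * (q - mean_q) for x, q in zip(ordered, quantiles))
    sxx = sum((x - mean_x) ** 2 for x in ordered)
    sqq = sum((q - mean_q) ** 2 for q in quantiles)
    return sxq / math.sqrt(sxx * sqq)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
normal_sample = [rng.gauss(70, 10) for _ in range(500)]          # bell-shaped
skewed_sample = [60 + rng.expovariate(0.1) for _ in range(500)]  # long right tail

r_normal = probability_plot_correlation(normal_sample)
r_skewed = probability_plot_correlation(skewed_sample)
print(f"normal sample: r = {r_normal:.3f}")
print(f"skewed sample: r = {r_skewed:.3f}")
```

The bell-shaped sample should give a correlation very close to 1, while the skewed sample bends away from the line and scores noticeably lower.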

    3 Ways To See If Your Data Is Normally Distributed

    [Figure: histograms (frequency) of columns C1, C2, and C3, each paired with a normal probability plot and Anderson-Darling Normality Test results.]

    Panel                           Average    Std Dev    N of data    A-Squared    p-value
    Normal Distribution             70         10         500          0.418        0.328
    Positive Skewed Distribution    70         10         500          46.447       0.000
    Negative Skewed Distribution    70         10         500          43.953       0.000

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    If the Normality Test shows a P-value that is less than 0.05, then the data is NOT represented well by a normal distribution.
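The A-Squared statistic and its p-value can be approximated by hand. The sketch below assumes the widely used D'Agostino-Stephens piecewise approximation for the p-value; the function names are invented for this example, and the result is not guaranteed to match Minitab in every case.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

def anderson_darling(data):
    """A-squared statistic for normality, using the sample mean and standard deviation."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    cdf = [fitted.cdf(x) for x in ordered]
    total = sum((2 * i - 1) * (log(cdf[i - 1]) + log(1.0 - cdf[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

def ad_p_value(a_squared, n):
    """Approximate p-value from the D'Agostino-Stephens piecewise formulas."""
    a = a_squared * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        return exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    if a > 0.34:
        return exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    if a > 0.2:
        return 1 - exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    return 1 - exp(-13.436 + 101.14 * a - 223.73 * a ** 2)

# A-Squared and N taken from the "Normal" panel above
print(round(ad_p_value(0.418, 500), 3))
```

With A-Squared = 0.418 and N = 500, this approximation gives about 0.328, the p-value reported for the Normal panel.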

    P Value for Normality Test

    If your P value is less than .05, then the data is NOT approximately normal.

    Mystery Distribution

    Generate a Normal Probability Plot for the Mystery variable

    in Mystery.mtw

    What is your conclusion? Is this a normal distribution?

    [Figure: normal probability plot of the Mystery variable (axis 50 to 150), titled "Mystery Distribution". Anderson-Darling Normality Test: Average: 100, Std Dev: 32.3849, N of data: 500, A-Squared: 27.108, p-value: 0.000.]

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Central Limit Theorem

    The central limit theorem states that the distribution of the sample means, our estimate of μ, can be approximated with a normal distribution even though the original population may be non-normal.

    Given this, we may say that the grand average (resulting from averaging sets of samples) approaches the universe mean as the number of sample sets approaches infinity. This property is at the core of many statistical tests and is very important for resolving a wide array of industrial problems.
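A small simulation can make the theorem concrete. The sketch below assumes an exponential population (clearly non-normal) with mean 1.0 and standard deviation 1.0; the population, the set sizes, and the function names are all chosen for illustration only.

```python
import random
from statistics import mean, stdev

def sample_means(draw, g, n, seed=0):
    """Means of g sets, each with n measurements drawn by draw(rng)."""
    rng = random.Random(seed)
    return [mean(draw(rng) for _ in range(n)) for _ in range(g)]

# Non-normal population: exponential with mean 1.0 and standard deviation 1.0
def draw_exponential(rng):
    return rng.expovariate(1.0)

means = sample_means(draw_exponential, g=2000, n=25)
grand_average = mean(means)
print("grand average:", grand_average)
print("std dev of the sample means:", stdev(means))
```

The grand average should land close to the population mean of 1.0, and the standard deviation of the sample means close to 1.0 / sqrt(25) = 0.2.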

    Random sample of g sets with n measurements assigned to each set

    Various sampling distributions of individual measurements

    XX

    Used With Permission

    6 Sigma Academy Inc. 1995

    For more detail, see

    the next few pages.

    Important Distinctions:

    The Distribution of Averages vs. The Distribution of Individuals

    What would the Distribution of Individuals look like?

    [Figure: flashlight example, Y = Lifetime (Hrs), axis marked 74, 85, 96. Legend: individual measurement; average of the subgroup. Panel title: The Distribution of Individuals (shape left as a question).]

    What would the Distribution of Individuals look like?

    [Figure: flashlight example, Y = Lifetime (Hrs), axis marked 74, 85, 96. Legend: individual measurement; average of the subgroup. Panel title: The Distribution of Individuals.]

    What would the Distribution of Averages look like?

    [Figure: Y = Weight (lbs), axis marked 9.5, 10, 10.5. Legend: individual measurement; average of the subgroup. Panel title: The Distribution of Averages?]

    What would the Distribution of Averages look like?

    [Figure: Y = Weight (lbs), axis marked 9.5, 10, 10.5. Legend: individual measurement; average of the subgroup. Panel title: The Distribution of Averages.]

    Distribution of Individuals vs. Distribution of Averages

    Histogram is...
    Individuals: A Pile of Individuals
    Averages: A Pile of X-Bars

    1 point is...
    Individuals: 1 Individual
    Averages: 1 Avg (i.e., 1 X-Bar)

    Spread is...
    Individuals: σ_X
    Averages: σ_X-bar = σ_X / √n, i.e., SE(Mean)

    The question might be...
    Individuals: What is the probability that an individual battery will last beyond 87 hours?
    Averages: What is the probability that the average lifetime of an n=20 sample will exceed 87 hours?

    Graphically...
    Individuals: distribution over the lifetime axis (74, 85, 96)
    Averages: same axis, with the spread compressed by √n
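The two questions in the comparison can be worked numerically. The sketch below assumes normally distributed lifetimes with a mean of 85 hours (the center of the axis on the slide) and a purely hypothetical standard deviation of 4 hours, since the slide does not state sigma. The function names are also invented for this example.

```python
from math import sqrt
from statistics import NormalDist

MEAN_HOURS = 85.0   # center of the lifetime axis on the slide
SIGMA_HOURS = 4.0   # hypothetical; the slide does not state sigma

def prob_individual_exceeds(limit, mean=MEAN_HOURS, sigma=SIGMA_HOURS):
    """P(one battery lasts beyond limit), using the spread of individuals."""
    return 1.0 - NormalDist(mean, sigma).cdf(limit)

def prob_average_exceeds(limit, n, mean=MEAN_HOURS, sigma=SIGMA_HOURS):
    """P(average of n batteries exceeds limit), using SE(Mean) = sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return 1.0 - NormalDist(mean, standard_error).cdf(limit)

print(prob_individual_exceeds(87))
print(prob_average_exceeds(87, n=20))
```

Under these assumptions, an individual battery exceeds 87 hours far more often than the average of 20 batteries does, because the distribution of averages is much narrower.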

    Dist. of Avgs spread compresses by factor of √n

    [Figure: distributions of averages on a lifetime axis from 73 to 97 for n = 1 (individuals), 2, 4, 12, 20, and 50. The spread narrows as n increases: σ_X-bar = σ_X / √n.]

    Basic Statistics

    Types of data

    Measures of the Center of the Data

    Mean

    Median

    Mode

    Measures of the Spread of Data

    Range

    Variance

    Standard Deviation

    Normal Distribution and Normal Probabilities

    Process Stability and Process Capability

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Basic Statistics

    Variability

    Is the process on target with minimum variability?

    We use the mean to determine if the process is on target.

    We use the Standard Deviation to determine variability.

    Stability

    How does the process perform over time? Represented by a constant mean and predictable variability over time.

    Which process is the best process?

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    [Figure: X-Bar Chart for Process B (sample mean vs. sample number, 0 to 25): center line X = 70.98, UCL = 77.27, LCL = 64.70. X-Bar Chart for Process A (sample mean vs. sample number, 0 to 25): center line X = 70.91, UCL = 77.20, LCL = 64.62.]

    Variation

    While every process displays Variation, some processes display controlled variation, while other processes display uncontrolled variation (Walter Shewhart).

    Controlled Variation is characterized by a stable and consistent pattern of variation over time. Associated with Common Causes.

    Uncontrolled Variation is characterized by variation that changes over time. Associated with Special Causes.

    Process A shows controlled variation.

    Process B shows uncontrolled variation.
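One simple signal of uncontrolled variation is a sample mean falling outside the control limits. The sketch below uses the limits shown on the X-Bar chart for Process B (UCL = 77.27, LCL = 64.70); the sample means themselves are hypothetical, and the function name is made up for this example. Points beyond the limits are only one of several signals a control chart can give.

```python
def out_of_control(sample_means, lcl, ucl):
    """Return (sample number, mean) pairs that fall outside the control limits."""
    return [(i, m) for i, m in enumerate(sample_means, start=1)
            if m < lcl or m > ucl]

# Limits read from the X-Bar chart for Process B; the sample means are hypothetical
LCL, UCL = 64.70, 77.27
means = [70.2, 71.5, 69.8, 79.1, 70.9, 55.3, 71.0]
print(out_of_control(means, LCL, UCL))
```

Here samples 4 and 6 would be flagged as candidates for special-cause investigation.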

    [Figure: X-Bar Chart for Process A (center line X = 70.91, UCL = 77.20, LCL = 64.62) and X-Bar Chart for Process B (center line X = 70.98, UCL = 77.27, LCL = 64.70), sample mean vs. sample number, with an annotation marking Special Causes.]

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Can We Tolerate Variability?

    There will always be variability present in any process.

    We can tolerate variability if:

    The total variability of the Output is relatively small compared to the process specifications and the process is on target

    The process is stable over time

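    [Figure: two Cost curves drawn over LSL, Nom, and USL. Old: "Traditional Goal Post Mentality", with the region between the limits labeled Acceptable. New: second cost curve over the same limits.]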

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Expanding On The Goal Post Mentality

    LSL   Nom   USL

    UNDER THE OLD RULES, the field goal kicker gets 3 points for his team as long as the ball falls between the LSL and USL.

    3 Points

    Expanding On The Goal Post Mentality

    LSL   Nom   USL

    UNDER THE NEW RULES, the field goal kicker might get...

    3 points: Target & +/-1

    2 points: Between +/-1 & +/-2

    1 point: > +/-2 out to the LSL & USL

    Points: 1  2  3  2  1

    Data Analysis Tasks For Improvement

    Determine If Process Is Stable

    If process is not stable, identify and remove causes of instability

    Determine The Location Of The Process Mean. Is It On Target?

    If not, identify the variables which affect the mean and determine optimal settings to achieve target value

    Estimate The Magnitude Of The Total Variability. Is it acceptable with respect to the customer requirements (spec limits)?

    If not, identify the sources of the variability and eliminate or reduce their influence on the process
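The second and third tasks can be sketched numerically once the process is judged stable. The example below is an assumption-laden illustration: it uses the standard capability indices Cp and Cpk, which this slide does not name, and the function name, data, and spec limits are hypothetical.

```python
from statistics import mean, stdev

def process_summary(data, lsl, usl, target):
    """Compare the process mean to target and the process spread to the spec limits."""
    center = mean(data)
    spread = stdev(data)
    return {
        "mean": center,
        "offset_from_target": center - target,
        "std_dev": spread,
        # Cp: spec width relative to the 6-sigma process width
        "cp": (usl - lsl) / (6 * spread),
        # Cpk: distance from the mean to the nearer spec limit, in units of 3 sigma
        "cpk": min(usl - center, center - lsl) / (3 * spread),
    }

# Hypothetical measurements and spec limits, for illustration only
summary = process_summary([9.5, 10.0, 10.5], lsl=8.5, usl=11.5, target=10.0)
print(summary)
```

A Cpk well below Cp would indicate an off-target mean, while a low Cp would indicate too much variability relative to the spec limits.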

    Used With Permission

    AlliedSignal 1995 - Dr. Steve Zinkgraf

    Visualizing the Process Dynamics - Is The Process Stable?

    Inherent Capability of the Process . . . also called short-term capability

    Sustained Capability of the Process . . . also called long-term capability

    General Assumption: Over time, a typical process will shift and drift by approx. 1.5σ

    [Figure: process distribution at Time 1, Time 2, Time 3, and Time 4, shown against LSL, T, and USL.]
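The effect of the assumed 1.5 standard deviation shift on long-term performance can be estimated with normal tail areas. This is a sketch assuming a normal process with symmetric spec limits; the function name is invented for this example.

```python
from statistics import NormalDist

def defects_per_million(sigma_level, shift=0.0):
    """Defects per million for spec limits at +/- sigma_level standard deviations,
    with the process mean shifted by `shift` standard deviations toward one limit."""
    z = NormalDist()
    near_tail = z.cdf(-(sigma_level - shift))   # tail beyond the nearer limit
    far_tail = z.cdf(-(sigma_level + shift))    # tail beyond the farther limit
    return (near_tail + far_tail) * 1_000_000

print(defects_per_million(6))             # short-term, centered process
print(defects_per_million(6, shift=1.5))  # long-term, with the 1.5 shift
```

With limits at 6 standard deviations and a 1.5 shift, the result is roughly 3.4 defects per million, the figure commonly quoted for Six Sigma.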

    Used With Permission

    6 Sigma Academy Inc. 1995

    The Goal Is ...

    Variables Data: Target

    Attributes Data: 0% Rejected

    How We Progress Toward The Goal

    PHASE ONE - Unpredictable Performance

    - VARIATION (SPECIAL / NATURAL CAUSES)

    - UNPREDICTABLE (HOURLY, DAILY)

    - DETECT AND ELIMINATE SPECIAL CAUSES

    PHASE TWO - Stability

    - IN CONTROL

    - NATURAL VARIATION ONLY

    Not capable of getting all the water output into the clown's mouth?

    How We Progress Toward The Goal

    IN CONTROL, BUT NOT CAPABLE

    (Variation from common causes excessive)

    IN CONTROL AND CAPABLE

    (Variation from common causes reduced)

    [Figure: distribution of SIZE shown between the LOWER SPECIFICATION LIMIT and the UPPER SPECIFICATION LIMIT.]

    Now it is capable of getting all the water output into the clown's mouth

    Is The Process on Target? - Accurate?

    Recognize that the process center (μ) is independent of the design center (T). In other words, the ability of a process to repeat any given centering condition is independent of the design specifications.

    [Figure: manufacturing distribution of the widget part, centered at μ, on an axis from 1.233 to 1.247 with LSL, T, and USL marked. Annotation: increase in nonconformance due to shift in process centering.]

    Used With Permission

    6 Sigma Academy Inc. 1995

    Is The Process on Target? - Precise?

    Recognize that the process width is independent of the design width. In other words, the inherent precision of a process is not determined by the design specifications.

    [Figure: manufacturing distribution of the widget part on an axis from 1.235 to 1.247 with LSL, T, and USL marked.]

    Used With Permission

    6 Sigma Academy Inc. 1995

    Is The Variability Acceptable To Customer Requirements?

    Y = f(X1 . . . XN)

    The variation inherent to any dependent variable (Y) is determined by the variations inherent to each of the independent variables.

    [Figure: two distributions shown against LSL and USL. Poor Process Capability: very high probability of defects in both tails. Excellent Process Capability: very low probability of defects in both tails.]
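How variation in the Xs carries through to Y can be illustrated with a small simulation. Everything specific here is an assumption made for the example: the transfer function Y = 2*X1 + 3*X2, the independent normal inputs, and the function names. For a linear function of independent inputs, the variance of Y is the sum of the squared coefficients times the input variances.

```python
import random
from statistics import pvariance

def simulate_y(f, x_dists, trials=20_000, seed=0):
    """Draw independent normal Xs and return the resulting Y values."""
    rng = random.Random(seed)
    return [f(*(rng.gauss(mu, sigma) for mu, sigma in x_dists))
            for _ in range(trials)]

# Hypothetical transfer function and inputs, for illustration only
def f(x1, x2):
    return 2 * x1 + 3 * x2

x_dists = [(10.0, 1.0), (5.0, 0.5)]   # (mean, standard deviation) of X1 and X2
y = simulate_y(f, x_dists)
predicted_variance = (2 * 1.0) ** 2 + (3 * 0.5) ** 2   # 4 + 2.25 = 6.25
print(pvariance(y), predicted_variance)
```

Reducing the standard deviation of either input lowers the variance of Y, which is the lever the slide points to for improving capability.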

    Used With Permission

    6 Sigma Academy Inc. 1995

    Summary

    Reviewed & Enhanced The Basic Statistical & Quality Terms Needed For Six Sigma Process Improvement

    Began to Build Up Minitab Operating Skills

    Six Sigma

    Q&A

    Six Sigma

    Thank You