4. six sigma descriptive statistics

QUALITY TOOLS & TECHNIQUES: SIX SIGMA: STATISTICS. By: Hakeem–Ur–Rehman, MS-TQM, M.I.O.M (Operations Research), Certified Six Sigma Black Belt (Singapore), Lead Auditor ISO 9001 (UK), IQTM–PU

Upload: hakeem-ur-rehman

Post on 12-Jan-2017


TRANSCRIPT

Page 1: 4. six sigma descriptive statistics

QUALITY TOOLS & TECHNIQUES

SIX SIGMA: STATISTICS

By: Hakeem–Ur–Rehman

MS-TQM, M.I.O.M (Operations Research)

Certified Six Sigma Black Belt (Singapore)

Lead Auditor ISO 9001 (UK)

IQTM–PU

Page 2: 4. six sigma descriptive statistics

WHAT IS STATISTICS?

Why?

1. Collecting Data

– e.g., Survey

2. Presenting Data

– e.g., Charts & Tables

3. Characterizing Data

– e.g., Average

Data Analysis

Decision-Making


Page 3: 4. six sigma descriptive statistics

KEY TERMS

Population (Universe)

All Items of Interest

Sample

Portion of Population

Parameter

Summary Measure about Population

Statistic

Summary Measure about Sample

• P in Population & Parameter

• S in Sample & Statistic

Page 4: 4. six sigma descriptive statistics

TYPES OF DATA

Attribute Data (Qualitative)

Is always binary; there are only two possible values (0, 1):

1. Yes, No
2. Success, Failure
3. Go, No Go
4. Pass, Fail

Variable Data (Quantitative)

Discrete (Count) Data: Can be categorized in a classification and is based on counts.

1. Number of defects
2. Number of defective units
3. Number of customer returns

Continuous Data: Can be measured on a scale; it has decimal subdivisions that are meaningful.

1. Time, Pressure
2. Money
3. Material feed rate

Page 5: 4. six sigma descriptive statistics

DISCRETE & CONTINUOUS VARIABLES

DISCRETE VARIABLE | POSSIBLE VALUES FOR THE VARIABLE

The number of defective needles in boxes of 100 diabetic syringes | 0, 1, 2, 3, …, 100

The number of individuals in groups of 30 with a Type–A Personality | 0, 1, 2, 3, …, 30

The number of surveys returned out of 300 mailed in a customer satisfaction study | 0, 1, 2, 3, …, 300

CONTINUOUS VARIABLE | POSSIBLE VALUES FOR THE VARIABLE

The length of prison time served for individuals convicted | All the real numbers between ‘a’ and ‘b’, where ‘a’ is the smallest amount of time served and ‘b’ is the largest.

The household income for households with incomes less than or equal to $30,000 | All the real numbers between ‘a’ and $30,000, where ‘a’ is the smallest household income in the population.

Page 6: 4. six sigma descriptive statistics

DEFINITIONS OF SCALED DATA

Understanding the nature of data and how to represent it can affect the types of statistical tests possible.

1. NOMINAL SCALE: “Numbers representing nominal data can be used only to classify or categorize.”

Data consist of names, labels, or categories. A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is.

A few examples of nominal data: Sex, Religion, Geographic Location, Place of Birth, Employee ID Numbers, etc.

2. ORDINAL SCALE: “Ordinal level data measurement is higher than the nominal level. In addition to the nominal level capabilities, ordinal level measurement can be used to rank or order objects.”

Ordinal data involve the categorization of people or objects, or the ranking of items. Nominal and ordinal data are non–metric data and are sometimes referred to as qualitative data.

EXAMPLES:

– “AUTOMOBILE SIZES”: Subcompact, compact, intermediate, full size, luxury
– “PRODUCT RATING”: Poor, Good, Excellent
– “CUSTOMER SATISFACTION”: Very poor, Poor, Neither good nor bad, Good, Excellent

Page 7: 4. six sigma descriptive statistics

DEFINITIONS OF SCALED DATA (Cont…)

3. INTERVAL SCALE: “The distances between consecutive numbers have meaning and the data are always numerical.”

FOR EXAMPLE, when measuring temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable.

EXAMPLE: IQ scores of students in Black Belt Training. The difference between scores is measurable and has meaning, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent.

4. RATIO SCALE: “Data that can be ranked and for which all arithmetic operations including division can be performed. (Division by zero is of course excluded.) Ratio level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.”

FOR EXAMPLE, grams of fat consumed per adult in Pakistan. If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.

OTHER EXAMPLES: Production cycle time, Work measurement time, Number of trucks sold, Number of employees, etc.

Page 8: 4. six sigma descriptive statistics

DEFINITIONS OF SCALED DATA (Cont…)

TYPE OF DATA | OPERATOR | DESCRIPTION | EXAMPLES

Nominal | =, ≠ | Categories | Types of defects, Types of colors

Ordinal | <, > | Rankings | Severity of defects: critical, major, minor

Interval | +, - | Differences but no absolute zero | Temperature of a ship

Ratio | / | Absolute zero | Pressure, Speed

Page 9: 4. six sigma descriptive statistics

STATISTICAL METHODS

Statistical Methods:

– Descriptive Statistics

– Inferential Statistics

Page 10: 4. six sigma descriptive statistics

DESCRIPTIVE STATISTICS

1. Involves

– Collecting Data

– Presenting Data

– Characterizing Data

2. Purpose

– Describe Data

[Figure: chart of $ by quarter (Q1 to Q4), vertical axis from 0 to 50, annotated with X̄ = 30.5 and S² = 113]

Page 11: 4. six sigma descriptive statistics

INFERENTIAL STATISTICS

1. Involves

– Estimation

– Hypothesis Testing

2. Purpose

– Make Decisions About Population Characteristics

Population?

Page 12: 4. six sigma descriptive statistics

DESCRIPTIVE ANALYSIS OF QUALITATIVE DATA

QUALITATIVE DATA

TABLES: One–Way Table, Two–Way Table, …, N–Way Table

GRAPHS: Bar Chart, Pie Chart, Multiple Bar Chart, Component Bar Chart

NUMBERS: Percentages

Page 13: 4. six sigma descriptive statistics

DESCRIPTIVE ANALYSIS OF QUANTITATIVE DATA

QUANTITATIVE DATA

TABLES: Frequency Distribution, Stem and Leaf Plot

GRAPHS: Histogram, Box and Whisker’s Plot

NUMBERS:

– Center: Mean, Median, Mode, Geometric Mean, Harmonic Mean, Trimmed Mean
– Distribution Important Points: Median, Quartiles, Deciles, Percentiles
– Variation: Range, Inter-Quartile Range, Variance, Standard Deviation, Skewness, Kurtosis

Page 14: 4. six sigma descriptive statistics

MINITAB: AN INTRODUCTION

BEGINNING AND ENDING A MINITAB SESSION:

To start a Minitab session from the menu, select Start > All Programs > MINITAB 15 English > MINITAB 15 English.

To exit Minitab, select File > Exit.

When you first enter Minitab, the screen will appear as in the figure:

– The session window contains comments, tables, descriptive summaries, and inferential statistics.
– The data window consists of all the data and variable names.
– Graph windows contain high resolution graphs.

[Figure: Minitab start-up screen, with the SESSION WINDOW and DATA WINDOW labeled]

Page 15: 4. six sigma descriptive statistics

DESCRIPTIVE ANALYSIS USING MINITAB

In the Minitab Data folder, open the worksheet Pulse.mtw.

Conduct Descriptive Analysis on the Pulse1 data.

Page 16: 4. six sigma descriptive statistics

MEASURES OF LOCATION

Mean:

– The average of a group of numbers
– Applicable for interval and ratio data
– Not applicable for nominal or ordinal data
– Affected by each value in the data set, including extreme values
– Computed by summing all values in the data set and dividing the sum by the number of values in the data set

SAMPLE: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

POPULATION: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

In Minitab: Stat > Basic Statistics > Display Descriptive Statistics. Select Statistics (and choose appropriate measures); select Graphs > Histogram of data, with normal curve.

Descriptive Statistics: Pulse1

Page 17: 4. six sigma descriptive statistics

MEASURES OF LOCATION

Median:

– The middle value in an ordered array of numbers
– For an array with an odd number of terms, the median is the middle number
– For an array with an even number of terms, the median is the average of the middle two numbers

Trimmed Mean: a compromise between the MEAN and the MEDIAN

1. The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.

2. Useful for data with potential extreme values.

MODE:

– The most frequently occurring value in a data set
– Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
– Can be used to determine which categories occur most frequently
– Bimodal: in a tie for the most frequently occurring value, two modes are listed
– Multimodal: data sets that contain more than two modes
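The measures of location above can be checked by hand on a small data set. The following is a minimal Python sketch, not Minitab output: the `pulse` values are hypothetical (they are not taken from Pulse.mtw), and `trimmed_mean` is an illustrative helper that assumes the stated percentage is removed from each end of the ordered data.

```python
from statistics import mean, median, multimode


def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the observations from EACH end.

    Assumption: the percentage applies per tail; other conventions exist.
    """
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # observations dropped per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


# Hypothetical pulse readings with one extreme value (140).
pulse = [62, 64, 68, 68, 70, 72, 74, 76, 80, 140]

print("mean        :", mean(pulse))              # pulled upward by 140
print("median      :", median(pulse))            # average of the middle two
print("trimmed mean:", trimmed_mean(pulse, 10))  # drops 62 and 140
print("mode(s)     :", multimode(pulse))         # list, so ties are visible
```

With one extreme value in the data, the mean moves noticeably while the median and trimmed mean stay close together, which is the compromise the slide describes.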

Page 18: 4. six sigma descriptive statistics

MEASURES OF VARIATION

RANGE:

– The difference between the largest and the smallest values in a set of data
– Advantage: easy to compute
– Disadvantage: is affected by extreme values

INTER–QUARTILE RANGE:

– Range of values between the first and third quartile
– Range of the “middle half”; middle 50%
– Used in the construction of box and whisker plots

STANDARD DEVIATION:

$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

VARIANCE: $S^2$ = square of S
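As a rough illustration of these measures, the sketch below uses Python's standard `statistics` module on hypothetical data. It assumes the sample (n - 1) forms of the variance and standard deviation; note that quartile conventions differ between software packages, so the quartiles here may not match another tool exactly.

```python
from statistics import quantiles, stdev, variance

# Hypothetical sample; 140 is an extreme value.
data = [62, 64, 68, 68, 70, 72, 74, 76, 80, 140]

data_range = max(data) - min(data)   # largest minus smallest
q1, q2, q3 = quantiles(data, n=4)    # quartiles (default "exclusive" method)
iqr = q3 - q1                        # spread of the middle 50%
s = stdev(data)                      # sample standard deviation (n - 1)
s2 = variance(data)                  # sample variance, equal to s squared

print("range:", data_range)
print("IQR  :", iqr)
print("S    :", round(s, 3))
print("S^2  :", round(s2, 3))
```

The range reacts strongly to the single extreme value, while the inter-quartile range ignores it, which is why the IQR is preferred for box plots.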

Page 19: 4. six sigma descriptive statistics

SHAPE OF THE DISTRIBUTION

Skewness: indicator used in distribution analysis as a sign of asymmetry and deviation from a normal distribution.

Skewness > 0 - Right skewed distribution - most values are concentrated on the left of the mean, with extreme values to the right.

Skewness < 0 - Left skewed distribution - most values are concentrated on the right of the mean, with extreme values to the left.

Skewness = 0 - Expected for a distribution that is symmetrical around the mean, in which case mean = median.

Kurtosis: indicator used in distribution analysis as a sign of flattening or "peakedness" of a distribution.

Kurtosis > 3 - Leptokurtic distribution, sharper than a normal distribution, with values concentrated around the mean and thicker tails. This means a high probability for extreme values.

Kurtosis < 3 - Platykurtic distribution, flatter than a normal distribution, with a wider peak. The probability for extreme values is less than for a normal distribution, and the values are spread wider around the mean.

Kurtosis = 3 - Mesokurtic distribution - a normal distribution, for example.
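One way to see these indicators at work is to compute them from central moments. The sketch below is an assumption-laden illustration: it uses the plain moment-based definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2, so a normal distribution gives 3), whereas statistical packages often apply small-sample corrections or report excess kurtosis (kurtosis minus 3). The data sets are hypothetical.

```python
from statistics import fmean


def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = fmean(values)
    return fmean([(x - m) ** k for x in values])


def skewness(values):
    """Moment-based skewness; 0 for perfectly symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based (non-excess) kurtosis; 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
left_skewed = [-x for x in right_skewed]

print("symmetric   :", skewness(symmetric), kurtosis(symmetric))
print("right skewed:", skewness(right_skewed))
print("left skewed :", skewness(left_skewed))
```

The evenly spread `symmetric` sample has kurtosis below 3, so by the slide's classification it is platykurtic.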

Page 20: 4. six sigma descriptive statistics

INTRODUCTION TO GRAPHING

The purpose of Graphing is to:

1. Identify the shape of the distribution of the data
2. Locate the Average, Spread and Outliers of the Distribution
3. Compare the shapes and variation of different variables
4. Observe the trends, drifts and shifts in the collected data

Here we will discuss:

– Histogram
– Box Plots (Box & Whisker’s Plot)

Page 21: 4. six sigma descriptive statistics

INTRODUCTION TO GRAPHING (Cont…)

When you start Minitab–15, if your toolbars do not look like the figure below, do the following to get the tools where you need them.

Click on Tools > Customize > Toolbars tab. In the dialog box that opens, check and uncheck as needed so that it matches the figure below.

Page 22: 4. six sigma descriptive statistics

WHAT IS A HISTOGRAM?

A histogram is a summary graph showing the distribution of measured data points that fall within various class intervals.

WHAT QUESTIONS DOES THE HISTOGRAM ANSWER?

– What distribution (center, variation and shape) does the data have?
– Does the data look symmetric or is it skewed to the left or right?
– Does the data contain outliers?
– Is the process within specification limits?

Page 23: 4. six sigma descriptive statistics

GUIDELINES FOR CONSTRUCTING A HISTOGRAM

1. Determine the number of data points in the data set. Call this number ‘n’.

2. Determine the range, R, of the values in the data set.

3. Determine the number of classes. There are no set rules; however, there are some rules of thumb that can be used.

a) # of Classes = 1 + 3.3 log(n), with log taken to base 10

b) The logarithm (base 2) rule.

# of Classes = K = [log2 n] + 1 = [(log n) / (log 2)] + 1

c) The following table [Goal 88] gives a range of classes.

# of Classes = K =

4. Determine the class width by dividing the range (R) by the number of classes (K) and rounding up.
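The rules of thumb in steps 3 and 4 are simple enough to sketch in a few lines of Python. The function names are illustrative, and one assumption is made loudly: the square brackets in rule (b) are read as "integer part" (floor). The table from rule (c) is not reproduced in the transcript, so it is not covered here.

```python
import math


def classes_rule_a(n):
    """Rule (a): 1 + 3.3 * log10(n); usually rounded to a whole number."""
    return 1 + 3.3 * math.log10(n)


def classes_log2_rule(n):
    """Rule (b): [log2(n)] + 1, reading [ ] as the integer part (assumption)."""
    return math.floor(math.log2(n)) + 1


def class_width(data_range, k):
    """Step 4: range divided by number of classes, rounded up."""
    return math.ceil(data_range / k)


n = 50  # hypothetical number of data points
print("rule (a):", round(classes_rule_a(n), 1))
print("rule (b):", classes_log2_rule(n))
print("width   :", class_width(169, 6))  # range 169 split into 6 classes
```

For n = 50 both rules suggest roughly six or seven classes, which shows that they are close relatives rather than competing methods.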


Page 24: 4. six sigma descriptive statistics

THE HISTOGRAM

Open Bears.MTW. You will create a frequency histogram of the variable Age.

Page 25: 4. six sigma descriptive statistics

THE HISTOGRAM (Cont…)

CONTROLLING HISTOGRAMS:

What you get in this case is a histogram with 10 classes. To get the right number of classes, get into the "X Scale" editing dialog box and click on the "Binning" tab. For "Interval Type" click on "Cut point" and for "Interval Definition" click on "Number of intervals:" and change it to 6. Now click "OK".

This graph still does not conform to standards because the class width and class boundaries were not calculated according to the rules. To get what we want, we must define the class boundaries (what Minitab calls "cut points") ourselves.

The minimum value of the data is 8 and the maximum is 177. Our formula for the class width with 6 classes is (177–8)/6 = 28.17 (approximately), which rounds up to 29. (Remember: always round up unless the division yields an integer.) If we choose 8 as the lowest class limit, then the lowest class boundary will be 7.5, and the rest will be 36.5, 65.5, 94.5, 123.5, 152.5 and 181.5.

Now get back into the "Binning" dialog box, click on "Midpoint/Cutpoint positions:", delete the existing cutpoints, then enter the first 2 class boundaries listed above into the box (separate with spaces, not commas) and click "OK".
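The class boundary arithmetic above can be reproduced with a short Python sketch. The helper name `class_boundaries` and its `unit` parameter (the measurement resolution, assumed to be 1 for whole-number ages) are illustrative and are not part of Minitab.

```python
import math


def class_boundaries(minimum, maximum, k, unit=1):
    """Boundaries for k classes, starting half a unit below the minimum.

    The width is (maximum - minimum) / k rounded up, as in the slide.
    """
    width = math.ceil((maximum - minimum) / k)
    first = minimum - unit / 2
    return [first + i * width for i in range(k + 1)]


# Values from the slide: minimum 8, maximum 177, 6 classes.
print(class_boundaries(8, 177, 6))
# [7.5, 36.5, 65.5, 94.5, 123.5, 152.5, 181.5]
```

Six classes need seven boundaries, and the last boundary (181.5) lies above the maximum of 177, so every observation falls inside a class.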

Page 26: 4. six sigma descriptive statistics

THE HISTOGRAM (Cont…)

EXERCISE: The data in C:\Program Files\Minitab15\English\Sample Data\Grades.MTW consists of verbal and math SAT scores and corresponding GPAs.

i. Create a frequency histogram with 7 classes of the verbal SAT scores.
ii. Create a relative frequency histogram with 7 classes of the verbal SAT scores.
iii. Create a frequency polygon with 7 classes of the verbal SAT scores.

Page 27: 4. six sigma descriptive statistics

BOX & WHISKER’S PLOT

Use a Box & Whisker’s Plot to assess and compare distribution characteristics such as median, range, and symmetry, and to identify outliers.

A minimum of 10 observations should be included in generating the Box Plot.

Page 28: 4. six sigma descriptive statistics

BOX & WHISKER’S PLOT USING MINITAB

CONSTRUCTING BOX PLOT (One Y):

You want to examine the overall durability of your carpet products. Samples of the carpet products are placed in four homes and you measure durability after 60 days. Create a Box Plot to examine the distribution of durability scores.

1. Open worksheet Carpet.mtw
2. Choose Graph > Boxplot
3. Under One Y, choose Simple. Click OK
4. In Variable, enter Durability. Click OK

Page 29: 4. six sigma descriptive statistics

BOX & WHISKER’S PLOT USING MINITAB

Constructing Box Plot: (One Y–with Groups)

You want to assess the durability of four experimental carpet products. Samples of the carpet products are placed in four homes and you measure durability after 60 days. Create a box plot with median labels and color-coded boxes to examine the distribution of durability for each carpet product.

Open the worksheet CARPET.MTW

Page 30: 4. six sigma descriptive statistics

BOX & WHISKER’S PLOT USING MINITAB


Constructing Box Plot: (One Y–with Groups)(Cont…)

Page 31: 4. six sigma descriptive statistics

BOX & WHISKER’S PLOT USING MINITAB

Constructing Box Plot: (One Y–with Groups) (Cont…)

Interpreting the results:

– Median durability is highest for Carpet 4 (19.75). However, this product also demonstrates the greatest variability, with an inter-quartile range of 9.855. In addition, the distribution is negatively skewed, with at least one durability measurement of about 10.
– Carpets 1 and 3 have similar median durabilities (13.52 and 12.895, respectively). Carpet 3 also exhibits the least variability, with an inter-quartile range of only 2.8925.
– Median durability for Carpet 2 is only 8.625. This distribution and that of Carpet 1 are positively skewed, with inter-quartile ranges of about 5-6.

Page 32: 4. six sigma descriptive statistics

BOX & WHISKER’S PLOT

EXERCISE # 1: A random sample of 50 observations on the mileage per gallon of a particular brand of gasoline is shown:

33.2 29.1 34.5 32.6 30.7 34.9 30.2 31.8 30.8 33.5
29.4 32.2 33.6 30.4 31.9 32.8 26.8 29.2 31.8 27.4
36.5 38.1 30.0 29.5 36.0 31.5 27.4 30.4 28.4 31.8
29.8 34.6 32.3 28.2 27.5 28.8 28.4 27.7 27.8 30.5
28.5 28.5 27.5 28.6 29.1 26.9 34.2 28.5 34.8 30.5

Develop a Box & Whisker’s Plot to analyze the data.

EXERCISE # 2: The following data represent the percentage of calories that come from fat for burgers and chicken items from a sample of fast food chains.

BURGER: 43 51 48 47 51 50 55 55 59 57

CHICKEN: 60 54 53 57 57 46 45 56 57

Construct Box & Whisker’s Plots to analyze the data.
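Before drawing the plots for Exercise # 2, it can help to compute the numbers a box plot is built from. The sketch below is one possible approach in Python, not the Minitab procedure: `box_summary` is an illustrative helper, the 1.5 × IQR fence for flagging outliers is the common convention and is assumed here rather than taken from the slides, and quartile values depend on the software's quartile method.

```python
from statistics import median, quantiles

burger = [43, 51, 48, 47, 51, 50, 55, 55, 59, 57]
chicken = [60, 54, 53, 57, 57, 46, 45, 56, 57]


def box_summary(values):
    """Quantities behind a box plot, using 1.5 * IQR fences (assumption)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }


for name, values in (("BURGER", burger), ("CHICKEN", chicken)):
    print(name, box_summary(values))
```

Comparing the two summaries side by side mirrors what the grouped box plot shows visually: location from the medians, spread from the IQRs, and skewness from how the median sits inside the box.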

Page 33: 4. six sigma descriptive statistics

PROBABILITY DISTRIBUTION OF DATA

The process that generates the data is known as the Distribution of the Data.

For Example:

– In the manufacturing sector, measurements such as length, diameter, etc. usually follow a NORMAL Distribution
– In the service sector, say banks, the customer waiting time follows an EXPONENTIAL Distribution
– In the service sector, say banks, the number of customers arriving follows a POISSON Distribution

Page 34: 4. six sigma descriptive statistics

NORMAL DISTRIBUTION

Characteristics of the normal distribution:

– Continuous distribution - Line does not break
– Symmetrical distribution - Each half is a mirror of the other half
– Asymptotic to the horizontal axis - it does not touch the x axis and goes on forever
– Unimodal - means the values mound up in only one portion of the graph
– Area under the curve = 1; total of all probabilities = 1
– Normal distribution is characterized by the mean and the Std Dev
– Values of μ and σ produce a normal distribution

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$

Where:

μ = mean of X
σ = standard deviation of X
π = 3.14159 . . .
e = 2.71828 . . .
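The density function and two of the listed characteristics (symmetry, total area of 1) can be checked numerically. This is a minimal Python sketch; `normal_pdf` is an illustrative name, and the area is approximated with a simple Riemann sum over ±8 standard deviations, which is an arbitrary but generous cut-off.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density f(x) for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 10.0, 2.0  # hypothetical parameters

# Symmetry: equal heights at equal distances on either side of the mean.
print(normal_pdf(mu - 3, mu, sigma), normal_pdf(mu + 3, mu, sigma))

# Area under the curve, approximated on a fine grid.
step = 0.01
grid = [mu - 8 * sigma + i * step for i in range(int(16 * sigma / step) + 1)]
area = sum(normal_pdf(x, mu, sigma) for x in grid) * step
print("area ~", round(area, 4))

# Cross-check against the standard library's implementation.
print(normal_pdf(11.0, mu, sigma), NormalDist(mu, sigma).pdf(11.0))
```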

Page 35: 4. six sigma descriptive statistics

STANDARD NORMAL DISTRIBUTION

A normal distribution with

– a mean of zero (μ = 0), and
– a standard deviation of one (σ = 1)

Z Formula

– standardizes any normal distribution

$Z = \frac{X - \mu}{\sigma}$

Z Score

– computed by the Z Formula
– the number of standard deviations which a value is away from the mean
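As a small illustration of the Z formula, the sketch below standardizes a value and shows that the probability to its left is the same whether it is computed on the original scale or on the standard normal scale. The numbers (mean 100, standard deviation 15) are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


mu, sigma = 100.0, 15.0  # hypothetical population parameters
x = 130.0

z = z_score(x, mu, sigma)
p_original = NormalDist(mu, sigma).cdf(x)  # P(X <= 130) on the original scale
p_standard = NormalDist(0, 1).cdf(z)       # P(Z <= z) on the standard scale

print("z =", z)
print("P(X <= x) =", round(p_original, 4))
print("P(Z <= z) =", round(p_standard, 4))
```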

Page 36: 4. six sigma descriptive statistics

NORMALITY TEST FROM GRAPHIC SUMMARY OF DATA

Open the worksheet CRANKSH.MTW

If Sk < 0, the distribution is negatively skewed (skewed to the left).

If Sk = 0, the distribution is symmetric (not skewed).

If Sk > 0, the distribution is positively skewed (skewed to the right).

In this output, the value of skewness suggests the data are not normal.

The P-value is less than 5% (the value of alpha, the level of significance), which indicates the data are not normal.

If the P-value is > alpha, treat the data as normal (normality is not rejected); otherwise treat the data as not normal.

Page 37: 4. six sigma descriptive statistics

NORMALITY TEST (Cont…)

NORMALITY TEST:

o Generates a normal probability plot and performs a hypothesis test to examine whether or not the observations follow a normal distribution. For the normality test, the hypotheses are:

o Ho: Data follow a normal distribution vs. H1: Data do not follow a normal distribution

o If the P-value is > alpha, do not reject (accept) the Null Hypothesis (Ho)

NORMALITY TEST EXAMPLE: In an operating engine, parts of the crankshaft move up and down. AtoBDist is the distance (in mm) from the actual (A) position of a point on the crankshaft to a baseline (B) position. To ensure production quality, a manager took five measurements each working day in a car assembly plant, from September 28 through October 15, and then ten per day from the 18th through the 25th.

You wish to see if these data follow a normal distribution, so you use the Normality test. Open the worksheet CRANKSH.MTW

Page 38: 4. six sigma descriptive statistics

NORMALITY TEST (Cont…)

INTERPRETING THE RESULTS:

– The graphical output is a plot of normal probabilities versus the data. The data depart from the fitted line most evidently in the extremes, or distribution tails.
– The Anderson–Darling test’s p-value indicates that, at α levels greater than 0.022, there is evidence that the data do not follow a normal distribution.
– There is a slight tendency for these data to be lighter in the tails than a normal distribution because the smallest points are below the line and the largest point is just above the line.
– A distribution with heavy tails would show the opposite pattern at the extremes.
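For readers without Minitab, the decision rule (compare the p-value with alpha) can be sketched in Python. The code below computes the Anderson–Darling A² statistic for normality with the mean and standard deviation estimated from the sample. The p-value uses a piecewise approximation commonly attributed to D'Agostino and Stephens; its coefficients are an assumption to verify against a trusted reference, and the result may differ slightly from Minitab's. The two data sets are synthetic (ideal normal and exponential quantiles), not the CRANKSH.MTW data.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """Return (A2, approximate p-value) for H0: data follow a normal distribution."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))  # parameters estimated from the sample
    cdf = [fitted.cdf(v) for v in x]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Piecewise p-value approximation (coefficients are an assumption to verify).
    if a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    elif a >= 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    elif a > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)
    return a2, p


alpha = 0.05
n = 50
normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
    a2, p = anderson_darling_normal(sample)
    verdict = "do not reject Ho" if p > alpha else "reject Ho"
    print(f"{name}: A2 = {a2:.3f}, p = {p:.4f} -> {verdict}")
```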

Page 39: 4. six sigma descriptive statistics

SCATTER PLOT

WHAT IS A SCATTER PLOT?

A scatter plot is a graphical presentation of any possible relationship between two variables, which may or may not be dependent, using a simple X-Y plot.

Page 40: 4. six sigma descriptive statistics

SCATTER PLOT

What is the relationship between X and Y in the plot?

Page 41: 4. six sigma descriptive statistics

SCATTER PLOT

EXAMPLE: You are interested in how well your company's camera batteries are meeting customers' needs. Market research shows that customers become annoyed if they have to wait longer than 5.25 seconds between flashes.

You collect a sample of batteries that have been in use for varying amounts of time and measure the voltage remaining in each battery immediately after a flash (VoltsAfter), as well as the length of time required for the battery to be able to flash again (flash recovery time, FlashRecov). Create a scatter plot to examine the results. Include a reference line at the critical flash recovery time of 5.25 seconds.

Open the worksheet BATTERIES.MTW

Page 42: 4. six sigma descriptive statistics

SCATTER PLOT

EXAMPLE (Cont…):

Page 43: 4. six sigma descriptive statistics

SCATTER PLOT

INTERPRETING THE RESULTS:

As expected, the lower the voltage in a battery after a flash, the longer the flash recovery time tends to be.

The reference line helps to illustrate that there were many flash recovery times greater than 5.25 seconds.
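The two observations above can also be expressed numerically. The sketch below is an illustrative companion to the scatter plot, not part of the Minitab example: the data pairs are hypothetical (they are not the BATTERIES.MTW values), and the Pearson correlation coefficient is used as an assumed summary of the direction of the relationship.

```python
import math

# Hypothetical (VoltsAfter, FlashRecov) measurements.
volts_after = [1.45, 1.40, 1.35, 1.30, 1.25, 1.20, 1.15, 1.10]
flash_recov = [4.2, 4.4, 4.7, 5.0, 5.3, 5.6, 6.1, 6.5]

CRITICAL_RECOVERY = 5.25  # seconds; the reference line from the example


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(volts_after, flash_recov)
too_slow = [(v, t) for v, t in zip(volts_after, flash_recov) if t > CRITICAL_RECOVERY]

print("correlation:", round(r, 3))  # negative: lower voltage, longer recovery
print("points above the reference line:", len(too_slow))
```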


Page 44: 4. six sigma descriptive statistics

QUESTIONS
