Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement. Emilio L. Cano, Andrés Redchuk and Javier M. Moguerza, Departamento de Estadística e Investigación Operativa, Universidad Rey Juan Carlos (Madrid). XXXIII Congreso Nacional de Estadística e Investigación Operativa, SEIO 2012, 17/04/2012.

Upload: emilio-lopez-cano

Post on 19-Jun-2015


DESCRIPTION

Presentation at the XXXIII Congreso Nacional de Estadística e Investigación Operativa (Madrid, April 2012)

TRANSCRIPT

Page 1: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Using R for Statistical Training

17/04/2012

EL Cano, JM Moguerza, A Redchuk

Statistical Training

The Problem

Approaches

The R Choice

The R framework

Sweave

Application

Six Sigma

Examples

Environments

Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Emilio L. Cano, Andrés Redchuk and Javier M. Moguerza

Departamento de Estadística e Investigación Operativa, Universidad Rey Juan Carlos (Madrid)

XXXIII Congreso Nacional de Estadística e Investigación Operativa

SEIO 2012

Page 2: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement


Contents

1 Statistical Training: The Problem, Approaches

2 The R Choice: The R framework, Sweave

3 Application: Six Sigma, Examples, Environments

Page 6: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

The Problem

Elements of Statistical Training

Page 7: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Copy-paste Approach

Approaches

Inconsistencies

Errors

Out-of-date

Non-reproducible

Painful changes

Page 8: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Reproducible Research Approach

Approaches

Reproducible Research: The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.

Literate Programming: Literate programming is a methodology that combines a programming language with a documentation language.

Page 9: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Reproducible Research

Workflow

Page 11: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

The R System

Choosing R

What is R? R is a language and environment for statistical computing and graphics.

Open Source

Platform independent

Huge community

Extensible

3,730 available packages

http://www.r-project.org

Page 12: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

LaTeX, Beamer, PDF

Choosing R

LaTeX: LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation.

Beamer: Beamer is a LaTeX class for creating presentations that are held using a projector, but it can also be used to create transparency slides.

LaTeX files can easily be converted to PDF.

Page 13: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement


Sweave Documents

An Efficient Framework

Sweave: A Sweave document is a plain-text file which merges LaTeX code and R code. The R function Sweave() converts the Sweave document (*.Rnw) into a LaTeX file (*.tex). The code chunks are executed and the results embedded into the LaTeX file.
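A minimal sketch of that conversion, assuming only the utils package that ships with R. The file name demo.Rnw, the chunk label and the toy data are illustrative and do not come from the talk. The sketch writes a tiny Sweave document to a temporary directory, weaves it, and reads back the generated LaTeX file.

```r
# Illustrative only: file name, chunk label and data are made up for this sketch.
old_wd <- setwd(tempdir())   # Sweave() writes its output to the working directory

rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The sample mean is computed by R when the document is woven.",
  "<<sample-mean>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "\\end{document}"
)
writeLines(rnw_lines, "demo.Rnw")

# Convert demo.Rnw into demo.tex; the chunk is executed and its output embedded.
utils::Sweave("demo.Rnw", quiet = TRUE)

tex_lines <- readLines("demo.tex")
setwd(old_wd)

# The woven file should contain both the echoed code and its printed result.
has_code   <- any(grepl("mean(x)", tex_lines, fixed = TRUE))
has_result <- any(grepl("[1] 4", tex_lines, fixed = TRUE))
```

Changing the data in the chunk and weaving again updates the LaTeX output with no copy-paste step, which is the property the slides rely on.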

Page 15: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Methodology at a Glance

Six Sigma

The Essence: The application of the Scientific Method to process improvement, using an easy language.

DMAIC Cycle: Define, Measure, Analyze, Improve, Control

Roles: Champion, Master Black Belt, Black Belt, Green Belt

Page 16: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

SixSigma Package

Six Sigma

[Figure: Six Sigma with R, paper helicopter template, with cut, fold, tape and clip marks. Body length: min (6.5cm), std (8cm), max (9.5cm). Body width: min (4cm), max (6cm). Wings length: min (6.5cm), std (8cm), max (9.5cm).]

Using packages

Manuals

Data sets

Templates

Learn-by-Code

[Figure: Six Sigma Process Map, Paper Helicopter Project]

INPUTS (X): operators, tools, raw material, facilities

INSPECTION. Inputs: sheets, ... Param. (x): width (NC), operator (C), Measure pattern (P), discard (P). Featur. (y): ok

ASSEMBLY. Inputs: sheets. Param. (x): operator (C), cut (P), fix (P), rotor.width (C), rotor.length (C), paperclip (C), tape (C). Featur. (y): weight

TEST. Inputs: helicopter. Param. (x): operator (C), throw (P), discard (P), environment (N). Featur. (y): time

LABELING. Inputs: helicopter. Param. (x): operator (C), label (P). Featur. (y): label

OUTPUTS (Y): helicopter

LEGEND: (C)ontrollable, (Cr)itical, (N)oise, (P)rocedure

Page 17: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Book

Six Sigma

Six Sigma with R. A live example: the entire book has been produced using Sweave.

The roadmap: the DMAIC Cycle

The case study: paper helicopter

SixSigma package: data sets, functions

Easy explanations, further readings

Page 18: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Sweave Example I

Six Sigma Application

\documentclass[a4paper]{article}
\usepackage{Sweave}
\title{Design of Experiments}
\author{EL Cano and JM Moguerza and A Redchuk}
\begin{document}
\maketitle
\section{Introduction}
Design of experiments is the most important tool in the Improve phase of the
DMAIC cycle \ldots.
<<>>=
library(SixSigma)
doe.model1 <- lm(score ~ flour + salt + bakPow +
    flour * salt + flour * bakPow +
    salt * bakPow + flour * salt * bakPow,
    data = ss.data.doe1)
summary(doe.model1)
@
This is the general model:
\begin{equation}
\label{eq:doe:model}

Page 19: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Sweave Example II

Six Sigma Application

y_{ijkl}=\mu+\alpha_i+\beta_j+\gamma_k+(\alpha\beta)_{ij}+
(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+(\alpha\beta\gamma)_{ijk}+
\varepsilon_{ijkl},
\end{equation}
And here we have a plot of effects:
<<maineff, echo=FALSE, fig=TRUE>>=
plot(c(-1, 1), ylim = range(ss.data.doe1$score),
    coef(doe.model1)[1] + c(-1, 1) * coef(doe.model1)[2],
    type = "b", pch = 16)
abline(h = coef(doe.model1)[1])
@
%\input{section2}
\end{document}


Page 20: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Design of Experiments

EL Cano and JM Moguerza and A Redchuk

April 10, 2012

1 Introduction

Design of experiments is the most important tool in the Improve phase of the DMAIC cycle ...

> library(SixSigma)
> doe.model1 <- lm(score ~ flour + salt + bakPow +
+     flour * salt + flour * bakPow +
+     salt * bakPow + flour * salt * bakPow,
+     data = ss.data.doe1)
> summary(doe.model1)

Call:
lm(formula = score ~ flour + salt + bakPow + flour * salt + flour *
    bakPow + salt * bakPow + flour * salt * bakPow, data = ss.data.doe1)

Residuals:
    Min      1Q  Median      3Q     Max
-0.5900 -0.2888  0.0000  0.2888  0.5900

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)            5.5150     0.3434  16.061 2.27e-07 ***
flour+                 1.8350     0.4856   3.779 0.005398 **
salt+                 -0.8350     0.4856  -1.719 0.123843
bakPow+               -2.9900     0.4856  -6.157 0.000272 ***
flour+:salt+           0.1700     0.6868   0.248 0.810725
flour+:bakPow+         0.8000     0.6868   1.165 0.277620
salt+:bakPow+          1.1800     0.6868   1.718 0.124081
flour+:salt+:bakPow+   0.5350     0.9712   0.551 0.596779
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4856 on 8 degrees of freedom
Multiple R-squared: 0.9565, Adjusted R-squared: 0.9185
F-statistic: 25.15 on 7 and 8 DF, p-value: 7.666e-05

This is the general model:

y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}, (1)
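The formula in the lm() call above spells out every main effect and interaction by hand. As a hedged aside, R's formula operator `*` already expands to all main effects and interactions, so the same full factorial model can be written more compactly. The sketch below uses simulated scores on a hypothetical replicated 2x2x2 design (16 runs) rather than the ss.data.doe1 data set, so it runs without the SixSigma package; only the factor names are borrowed from the slides.

```r
# Simulated stand-in for the experiment: factor names follow the slides,
# the scores are random numbers and carry no meaning.
set.seed(1)
design <- expand.grid(
  flour  = c("-", "+"),
  salt   = c("-", "+"),
  bakPow = c("-", "+"),
  rep    = 1:2,
  stringsAsFactors = TRUE
)
design$score <- rnorm(nrow(design), mean = 5)

# Long form, written term by term as in the talk.
long_form <- lm(score ~ flour + salt + bakPow +
                  flour * salt + flour * bakPow +
                  salt * bakPow + flour * salt * bakPow,
                data = design)

# Short form: a * b * c expands to the same eight model terms.
short_form <- lm(score ~ flour * salt * bakPow, data = design)

# Both calls estimate the same coefficients under the same names.
same_fit <- isTRUE(all.equal(coef(long_form), coef(short_form)))
```

With 16 runs and 8 coefficients, this toy fit also leaves 8 residual degrees of freedom, the same count reported in the summary output.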

Page 21: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

And here we have a plot of effects:

[Figure: plot of effects. x-axis: c(-1, 1), ticks from -1.0 to 1.0. y-axis: coef(doe.model1)[1] + c(-1, 1) * coef(doe.model1)[2], ticks from 3 to 7.]

Page 22: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement


Project Example

Divide and Conquer!

Strategies: Partial Sweave files can be compiled to get partial LaTeX files. R scripts can Sweave .Rnw files and "source" .R files. The final document is obtained by compiling the "master" LaTeX file.

> source("code/myoptions.R")
> source("code/myfunctions.R")
> source("code/mydata.R")
> Sweave("rnw/theorem01.Rnw")
> Sweave("rnw/lesson01.Rnw")
> Sweave("rnw/exercises01.Rnw")
> ...
> texi2pdf("master.tex")
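The sequence of calls above could be wrapped in a small build helper. The following is only a sketch under stated assumptions: build_materials() is a hypothetical function, not part of the talk or of any package, and the toy file names and contents are invented for illustration. It relies on utils::Sweave() writing each .tex file to the working directory and on tools::texi2pdf(), which needs a LaTeX installation and is therefore skipped in the demonstration.

```r
# Hypothetical helper (not from the talk): source R scripts, weave partial
# .Rnw files, and optionally compile a master LaTeX file.
build_materials <- function(r_scripts, rnw_files, master = NULL) {
  for (script in r_scripts) {
    source(script)                     # options, functions, data used by the chunks
  }
  for (rnw in rnw_files) {
    utils::Sweave(rnw, quiet = TRUE)   # writes <name>.tex to the working directory
  }
  if (!is.null(master)) {
    tools::texi2pdf(master)            # requires a LaTeX installation
  }
  invisible(sub("\\.Rnw$", ".tex", basename(rnw_files)))
}

# Toy project in a temporary directory; all file names are illustrative.
old_wd <- setwd(tempdir())
dir.create("code", showWarnings = FALSE)
dir.create("rnw", showWarnings = FALSE)
writeLines("lesson_data <- c(10, 12, 14)", "code/mydata.R")
writeLines(c("The mean is \\Sexpr{mean(lesson_data)}.",
             "<<>>=",
             "summary(lesson_data)",
             "@"),
           "rnw/lesson01.Rnw")

# Weave the partial file only; no master document is compiled here.
tex_files  <- build_materials("code/mydata.R", "rnw/lesson01.Rnw")
lesson_tex <- readLines("lesson01.tex")
setwd(old_wd)
```

The partial file has no preamble of its own, so it is meant to be pulled into a master LaTeX file, in line with the divide-and-conquer strategy on the slide.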

Page 23: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Some useful extensions

Packages

knitr, pgfSweave: enhanced options for Sweave

RGIFT: automatic generation of questionnaires for Moodle

exams: automatic generation of printable exams

odfWeave: generation of Open Document format documents

More in the "Reproducible Research" Task View at CRAN: http://cran.r-project.org/web/views/ReproducibleResearch.html

Page 24: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement


R GUI

Integrated Environments

Page 25: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

R Studio

Integrated Environments

Page 26: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

EMACS + ESS

Integrated Environments

Page 27: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Eclipse + StatET

Integrated Environments

Page 28: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Summary

Statistical training entails some challenges regarding contents and materials.

R is the perfect partner for statistical training.

Reproducible research and literate programming enhance the quality of training materials.

The use of R and LaTeX through Sweave comprises a complete framework for generating statistical documentation.

Extensions and integrated environments make it easy to exploit the capabilities of R.

Page 33: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Acknowledgements

R Core Team and R enthusiasts in general.

Springer

This work has been partially funded by the projects:

AGORANET project (IPT-430000-2010-32)

VRTUOSI (www.vrtuosi.org: 502869-LLP-1-2009-ES-ERASMUS-EVC)

HAUS: IPT-2011-1049-430000

EDUCALAB: IPT-2011-1071-430000

DEMOCRACY4ALL: IPT-2011-0869-430000

CORPORATE COMMUNITY: IPT-2011-0871-430000

Page 34: Using R for Statistical Training: An Application to Six Sigma Methodology for Process Improvement

Discussion

Thanks for your attention!
