
Continued on page 3

STATISTICS DIGEST
The Newsletter of the ASQ Statistics Division

Chair’s Message ...........................1

Editor’s Corner ..............................3

MINI-PAPER
Big Data Terminology—Key to Predictive Analytics Success ......... 4

Design of Experiments ..................12

Statistical Process Control .............13

Statistics for Quality Improvement ...............................15

Stats 101 ....................................19

Testing and Evaluation .................25

Standards InSide-Out ...................28

FEATURE
Agile Teams: A Look at Agile Project Management Methods .................................. 30

Upcoming Conference Calendar ...36

Statistics Division Committee Roster 2016 ............................................37

IN THIS ISSUE

Vol. 35, No. 3 October 2016

Is a picture really worth a thousand words? I would argue that a good picture or graph is worth more than a thousand words. A graph that communicates the results of data quickly, effectively, and accurately will save time and often enable a quick and informed decision. In my experience, senior managers (and most others) do not want to spend a lot of time hearing about the details of an analysis. They do not really care about the super-cool technique that was used or that the estimators are unbiased. They trust that as a statistician you will use the proper techniques for the given situation. To them, it is all about the bottom-line. As applied statisticians, it is our job to be able to effectively communicate the results without getting into too many details, unless they are requested. A good graph can do that.

You might be asking yourself “What is a good graph?” There has been a lot written on what a good graph looks like—one that is not too busy, has appropriate coloring, and has just enough labeling to be informative. I think an argument can be made that Charles Minard created one of the best graphs of all time when he mapped Napoleon’s Russian campaign of 1812. If you haven’t seen this graph, it is worth taking a look.

Edward Tufte coined the term "chartjunk" in his book "The Visual Display of Quantitative Information." Chartjunk refers to elements in a graph that serve no purpose in helping the reader comprehend the information the graph contains. An example of chartjunk would be adding a third dimension to a graph when there are really only two dimensions to the data. People are notorious for doing this with bar and pie charts. Keep in mind that just because the software will let you do something doesn't mean that you should.

When communicating the results of analyses, especially to senior managers, the time and attention you have are limited. Their days are overflowing with decisions that need to be made, so make it easy on them and provide them with a graph that, when viewed for about 30 seconds, tells them the story or at least the bottom-line of the story. If your graph can do that, then it is a good graph.

I could go on and on about creating effective graphs and avoiding bad ones, but it is probably a good idea to talk a little about what is going on with the Statistics Division. By the time you read this, the Fall Technical Conference (FTC) will be over. It is one of my favorite conferences—it is a smaller conference so that you are able to network with almost all who attend, and each time I attend I get something useful from the talks. The Statistics Division sponsors the opening reception and I am in the middle of planning that now.

Message from the Chair
by Theresa I. Utlaut

Theresa I. Utlaut


Submission Guidelines

Mini-Paper
Interesting topics pertaining to the field of statistics; should be understandable by non-statisticians with some statistical knowledge. Length: 1,500-4,000 words.

Feature
Focus should be on a statistical concept; can either be of a practical nature or a topic that would be of interest to practitioners who apply statistics. Length: 1,000-3,000 words.

General Information
Authors should have a conceptual understanding of the topic and should be willing to answer questions relating to the article through the newsletter. Authors do not have to be members of the Statistics Division. Submissions may be made at any time to [email protected].

All articles will be reviewed. The editor reserves the discretionary right to determine which articles are published. Submissions should not be overly controversial. Confirmation of receipt will be provided within one week of receipt of the email. Authors will receive feedback within two months. Acceptance of articles does not imply any agreement that a given article will be published.

Vision
The ASQ Statistics Division promotes innovation and excellence in the application and evolution of statistics to improve quality and performance.

Mission
The ASQ Statistics Division supports members in fulfilling their professional needs and aspirations in the application of statistics and development of techniques to improve quality and performance.

Strategies
1. Address core educational needs of members
• Assess member needs
• Develop a "base-level knowledge of statistics" curriculum
• Promote statistical engineering
• Publish featured articles, special publications, and webinars

2. Build community and increase awareness by using diverse and effective communications

• Webinars
• Newsletters
• Body of Knowledge
• Website
• Blog
• Social Media (LinkedIn)
• Conference presentations (Fall Technical Conference, WCQI, etc.)
• Short courses
• Mailings

3. Foster leadership opportunities throughout our membership and recognize leaders

• Advertise leadership opportunities/positions
• Invitations to participate in upcoming activities
• Student grants and scholarships
• Awards (e.g., Youden, Nelson, Hunter, and Bisgaard)
• Recruit, retain and advance members (e.g., Senior and Fellow status)

4. Establish and Leverage Alliances
• ASQ Sections and other Divisions
• Non-ASQ (e.g., ASA)
• CQE Certification
• Standards
• Outreach (professional and social)

Updated October 19, 2013

Disclaimer

The technical content of material published in the ASQ Statistics Division Newsletter may not have been refereed to the same extent as the rigorous refereeing that articles undergo for publication in Technometrics or J.Q.T. The objective of this newsletter is to be a forum for new ideas and to be open to differing points of view. The editor will strive to review all articles and to ask other statistics professionals to provide reviews of all content of this newsletter. We encourage readers with differing points of view to write to the editor; they will be given an opportunity to present their views via a letter to the editor. The views expressed in material published in this newsletter represent the views of the author of the material, and may or may not represent the official views of the Statistics Division of ASQ.

Vision, Mission, and Strategies of the ASQ Statistics Division

The Statistics Division was formed in 1979 and today it consists of both statisticians and others who practice statistics as part of their profession. The division has a rich history, with many thought leaders in the field contributing their time to develop materials, serve as members of the leadership council, or both. Would you like to be a part of the Statistics Division's continuing history? Feel free to contact [email protected] for information or to see what volunteer opportunities are available. No statistical knowledge is required, but a passion for statistics is expected.


Message from the Chair
Continued from page 1

We had our business planning meeting in Chicago in July and are working on finalizing the plan for next year. We have a lot of great ideas and need to figure out how much we can actually get done next year. I don’t want to go into the details until they are finalized but you should expect our usual items to remain (free webinars, technical newsletter, sponsorship of Fall Technical Conference, etc.). We are meeting after the FTC to finalize the plans for next year.

This is my last newsletter article as Chair. I want to thank the incredibly talented council members that I have had the opportunity to work with this year. I have enjoyed getting to know them and working closely with them to benefit our members. They have made being the Chair an absolute pleasure. I particularly want to thank Adam Pintar, our 2016 past-chair. He was always very gracious and patient with my many, many questions.

Herb McGrath will be the Statistics Division Chair in 2017. He has been very involved in the division in the past and is a natural leader. I have every confidence that he will do an excellent job. Following Adam's example as past-chair, I will be happy to help Herb with anything that is needed.

If you have ever considered volunteering to participate as a member leader in the Statistics Division, please don’t hesitate to contact me. We have a great group of volunteers and are always looking for more.

Editor's Corner
by Matt Barsalou

Welcome to the October issue of Statistics Digest. This issue's Mini-Paper is Big Data Terminology-Key to Predictive Analytics Success by our own Standards InSide-Out columnist Mark Johnson. We also have our regular columns on Design of Experiments, Stats 101, Statistics for Quality Improvement, and Testing and Evaluation. This issue's Statistical Process Control column is a reprint of Dr. Lloyd S. Nelson's classic Technical Aids article, The Shewhart Control Chart-Tests for Special Causes. Our feature is Agile Teams: A Look at Agile Project Management Methods by L. Allison Jones-Farmer and Timothy C. Krehbiel.

In other news, the new College of Interdisciplinary Arts and Sciences at Arizona State University has created a scholarship in honor of our former SPC columnist, Dr. Connie Borror. The scholarship is for “women pursuing academic studies in STEM fields.” Find out more about the scholarship here.

Matt Barsalou

Free Webinar: Development and Use of Standards in Metrology: Perspective from a National Metrology Institute

25 Oct 2016, 12:00 to 1:00

Join the Statistics Division on October 25th at 12:00–1:00 PM Eastern for a webinar to be given by William Guthrie, Statistical Engineering Division Chief at the National Institute of Standards and Technology. Physical standards provide the basis for accurate and traceable measurements. This talk provides an overview of the development and use of physical standards, sometimes referred to as etalons, from the perspective of a researcher at the National Institute of Standards and Technology (NIST), the National Metrology Institute for the United States. Generic types of standards to be discussed include primary standards, artifact standards, transfer standards, and check standards. Development and use of these types of standards will be based on examples drawn from NIST work. Some of the statistical aspects of interest in the development of standards to be described are the assessment of heterogeneity between units of batch-produced standards and the probabilistic interpretation of the reference values and uncertainties provided to characterize the values assigned to each quantity associated with a standard. Measurement traceability and international harmonization of standards also will be briefly described. Web site: https://attendee.gotowebinar.com/register/4768952842442349569


MINI-PAPER
Big Data Terminology—Key to Predictive Analytics Success
by Mark E. Johnson, Dept. of Statistics, Univ. Cent. Florida

Abstract
With all of the hype surrounding Big Data, Business Intelligence, and Predictive Analytics (with the Statistics stepchild lurking in the background), quality managers and engineers who wish to get involved in the area may be quickly dismayed by the terminology in use by the various participants. Singular concepts may have multiple names depending on the discipline or problem origin (business analytics, machine learning, neural networks, nonlinear regression, artificial intelligence, and so forth). Hence, there is a pressing need to develop a coherent and comprehensive standardized vocabulary. Subcommittee One of ISO TC69 is currently developing such a terminology standard to reside in the ISO 3534 series. In addition to the technical statistical-type terms, it could also include a discussion of some of the software facilities in use in dealing with massive data sets (HADOOP, Tableau, etc.). A benefit of this future standard is to shorten the learning curve for a Big Data hopeful. This paper describes the initial steps in addressing the terminology challenges with Big Data and offers some descriptions of forthcoming products to assist practitioners eager to plunge into this area.

Introduction
Big Data and predictive analytics are at the forefront of discussions involving the state of the statistics profession and its future prospects. Many statisticians who formerly made a living off the Six Sigma juggernaut are gravitating to Big Data applications for the usual reason—that is where the money is! Of course, to take advantage of industry thirst for consultants who are well-versed in the area, considerable training may be entailed, as some of the old standby weapons (think the seven basic tools) are not exactly ready for Big Data prime time. One of the first barriers to entry into this area is the terminology associated with the tools, hardware, and software involved in extracting actionable information from massive data sets. It is not just the statistical terms—many of which have different aliases depending on the application area, but also the many other computer and information technology terms and proper names that are emerging in the Big Data world. The following is a potpourri of terms, packages, and buzzwords that could be encountered in short order when looking into the Big Data world:

Apache Accumulo, Apache Hadoop, BigMemory, Cask, Cloudera, Data Analytics, data lifecycle, Distributed computing, federated databases, Hortonworks, HPCC, IaaS, Internet of Things, MapReduce, NoSQL, R, Rattle, schema on read, Spark, SQL, Sqrrl, Tableau, Talend, Transreality gaming, Tuple space, Unstructured data, plus many others—not an exhaustive list by any means.

With a bit more exploring, an A-to-Z list could probably be concocted. Is it even worth sorting through these names and determining those that are worth pursuing? Does one need to go back to school to participate in the Big Data enterprise? If one asks about the job market associated with Big Data, then for the avariciously inclined, Bill Snyder of InfoWorld posted the following salaries for analysts with expertise in these Big Data packages or software:

MapReduce: $127,315
Cloudera: $126,816
HBase: $126,369
Pig: $124,563
Flume: $123,186
Hadoop: $121,313
Hive: $120,873
Zookeeper: $118,567
Data Architect: $118,104


These salaries certainly get the attention of our graduate and undergraduate students. Amazingly, there are reports of upwards of 200,000 positions in the Big Data industry! Such information has not gone unnoticed by university administrators. Our university is no exception, having added new faculty positions in big data for the past few years. Also, the sponsored research group offered a Big Data Grants Day last fall to foster the search for external funding and to encourage collaborations with local industries and the government sector who had large data sets yearning to be analyzed. No doubt other universities are also on such quests.

Knowing that packages flourish and wane and programming for others may not be a long-term objective, a broader viewpoint could be sought. Here terminology plays a role, although admittedly it is perceived as rather dry.

As noted, terminology entails not only the lexicon of statistics but also the areas of information technology, including the hardware possibilities involved in storage and retrieval ("the cloud"), software tools (e.g., commercial packages Hadoop and Tableau), and techniques (distributed computing and methods for handling unstructured data). Hence, getting started in Big Data poses formidable challenges for the traditional Six Sigma practitioner or the recent/current statistics student, whose faculty may also be struggling to keep up with the changing times.

The bottom line is that there is no magic elixir available to conquer the wall of Big Data expertise. A first step is to confront terminology with a healthy respect for the ecosystem in which the data resides. In this paper some efforts underway will be described. These include work within ISO TC69 SC1 (International Standards Organization Technical Committee 69 on Application of Statistical Methods, Sub-Committee 1 on Terminology and Nomenclature), the Ad Hoc Working Group 7 within TC69 on Big Data, and finally the joint efforts of TC69 with the IEC/ISO JTC1 WG9 led by Wo Chang of NIST. Nancy Grady of Scientific Applications, Inc. (SAI) has also been extremely helpful in these efforts.

Section 2 of this paper addresses the case for consideration of terminology and attempts to describe Big Data, as a consensus definition is elusive. The next three sections of this paper deal with three groups actively involved in this effort. Section 3 describes the terminology efforts within ISO TC69, SC1 in particular, with the somewhat narrow focus on predictive analytics terms. A broader group, namely the ISO TC69 Ad Hoc Group 7, is covered in Section 4. This group at the recent June 2016 London meeting tackled a gap analysis of TC69 standards versus Big Data needs. Finally, the status of the joint efforts of TC69 with IEC/ISO JTC1 WG9 will be highlighted in Section 5. Under the sponsorship of NIST, this group has produced some excellent documents on a tentative Big Data computational ecosystem, which has propelled TC69 greatly in its own progress.

The Case for Terminology and What is Big Data?
The need for terminology standards was illustrated at the Fall TCC talk upon which this paper is based by noting the following list of terms:

Figure 1: Terminology


In the absence of a terminology standard containing these words, one basically has a collection of seemingly unrelated gibberish. In fact, each of the 11 items above is a synonym for the same term (see the end of the paper for the answer). Terminology standards attempt to define terms in a coherent system so that each term so defined relates to a single, unique concept. "One term-one concept" is a mantra of ISO TC37 on Terminology and Other Language and Content Resources. This notion is critical in the realm of international standards, in which the English writing style of a standard should be amenable to translation to other languages. Without terminology standards, the standards community ends up wasting time haggling over the wording of documents, owing frequently to an inherent disagreement on terms.

Big Data poses a particular challenge for a definition as it embodies a number of concepts which have a perspective that depends on the individual user. NIST has made a valiant effort at characterizing big data with the following abstract in NIST Big Data Interoperability Framework: Volume 1 Definitions.

“Big Data is a term used to describe the new deluge of data in our networked, digitized, sensor-laden, information-driven world. While great opportunities exist with Big Data, it can overwhelm traditional technical approaches and its growth is outpacing scientific and technological advances in data analytics.” (page iii)

The NIST document goes on to say:

“The term Big Data has been used to describe a number of concepts, in part because several distinct aspects are consistently interacting with each other. To understand this revolution, the interplay of the following four aspects must be considered: the characteristics of the datasets, the analysis of the datasets, the performance of the systems that handle the data, and the business considerations of cost effectiveness.” (page 4.)

In light of the one term-one concept principle, a definition of Big Data would appear hopeless. However, a working definition, if not following the requirements of ISO TC37, is provided by this group:

Big Data consists of extensive datasets—primarily in the characteristics of volume, variety, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis. (page 5)

Listed explicitly in this definition are the so-called four "V's": volume, variety, velocity, and variability. Volume refers to the absolute size of the data set, for which "big" is in the size of the beholder's storage capacity. Variety refers to diverse data types (nominal, continuous, text, etc.) from various domains and residing possibly on multiple repositories. Velocity naturally concerns the rate of data generation. The NIST document refers to Variability as the change in other characteristics. What this means is elaborated in a later section, in which the variability is described as the change in data over time, including the flow rate, the format, or the composition. This is not variability in the statistical sense. A change in flow rate could necessitate an increase in devoted resources or nodes to handle a surge in volume, while a format change could require a separate node for special processing. The statistical folks naturally object to variability being used to describe non-constancy of the other V's. Our current preference is "volatility" to reflect non-constancy in the on-going generation of the data. Other "V's" not at the stature of the aforementioned ones, but that ought to be considered in the Big Data context, include:

Veracity: accuracy of the data
Value: value of the analytics to the organization
Validity: appropriateness of the data for its intended use

The verdict is still out with respect to the ultimate set of V's, although the NIST gang of four offers a starting point. Other candidates mentioned at the Fall TCC conference for tongue-in-cheek consideration were variations on the words verisimilitude, vapulatory, and veridicous.


Terminology for Predictive Analytics, ISO TC69 SC1
Within ISO TC69, the home for terminology is Sub-Committee One (officially ISO TC69/SC1 Terminology and symbols), for which the author is the current Chair (with term limits, this chairmanship will end in 2019). The other sub-committees in TC69 are as follows:

SC4 Applications of statistical methods in process management
SC5 Acceptance sampling
SC6 Measurement methods and results
SC7 Applications of statistical and related techniques for the implementation of Six Sigma
SC8 Application of statistical and related methodology for new technology

Even if there were an SC on Big Data, terminology related to Big Data would be the responsibility of SC1. SC1 is responsible for the ISO 3534 series on terminology, consisting of the following four parts:

ISO 3534-1 Part 1: General statistical terms and terms used in probability
ISO 3534-2 Part 2: Applied statistics
ISO 3534-3 Part 3: Design of experiments
ISO 3534-4 Part 4: Survey sampling

A new Part 5: Predictive Analytics for Big Data is under development. The structure of this document is standardized by ISO to be as follows:

Foreword
Introduction
1 Scope
2 Normative References
3 Terms and Definitions
Annex A (informative) Concept diagrams
Annex B (informative) Methodology used to develop the vocabulary

The normative references include the other four parts of ISO 3534, while the challenge is to develop Section 3. The following is a rough outline of Section 3 as of the London meeting in 2016:

Initial Outline of Terms and Definitions for Predictive Analytics in Big Data

3.1 Supervised Problems
  3.1.1 types of models (response variables)
    y continuous (classical regression)
    y binary (logistic regression)
    y categorical (discriminant analysis/classification rules)
  3.1.2 types of model fitting
    least squares (including linear regression)
    decision trees (including boosting, bagging)
    neural networks (from a non-linear regression perspective)
    principal components regression


  3.1.3 process of fitting
    model selection
    quality of fit
    overfitting
  3.1.4 methods for big data applications
    lasso to achieve zero coefficients
    kernel smoothing methods
    ridge regression to control multi-collinearity
  3.1.5 other considerations
    partial least squares
    scalability of methods
    distributed computing aspects
3.2 Unsupervised Models
  3.2.1 outlier detection (in particular, multi-dimensional)
  3.2.2 cluster analysis
  3.2.3 market basket analysis (graphs/networks/linkages)
  3.2.4 miscellaneous
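To make item 3.1.4 concrete, the following minimal sketch (an illustration only, assuming scikit-learn and synthetic data; it is not part of the draft standard) contrasts the lasso, which can drive some coefficients exactly to zero, with ridge regression, which shrinks coefficients to stabilize estimates when predictors are nearly collinear:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200

# Two nearly collinear predictors plus three irrelevant ones.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # almost a copy of x1
noise = rng.normal(size=(n, 3))              # unrelated predictors
X = np.column_stack([x1, x2, noise])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

# Lasso (L1 penalty): tends to zero out redundant or irrelevant coefficients.
lasso = Lasso(alpha=0.1).fit(X, y)
# Ridge (L2 penalty): keeps all coefficients but shrinks them,
# stabilizing the estimates despite the collinearity of x1 and x2.
ridge = Ridge(alpha=1.0).fit(X, y)

print("lasso coefficients:", np.round(lasso.coef_, 3))
print("ridge coefficients:", np.round(ridge.coef_, 3))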

This outline is sufficient at this point to launch a new work item proposal within ISO TC69, which will lead to a call for member country experts who will ultimately write the standard and respond to comments from participating countries. This effort will proceed in parallel with the ongoing activities within the ISO TC69 Ad Hoc group on Big Data, described in the next section.

ISO Ad Hoc Committee 7
The present structure of ISO TC69 consists of the following six sub-committees:

SC1 Terminology and symbols
SC4 Applications of statistical methods in process management
SC5 Acceptance sampling
SC6 Measurement methods and results
SC7 Applications of statistical and related techniques for the implementation of Six Sigma
SC8 Application of statistical and related methodology for new technology

From these sub-committee names, the positioning of Big Data is obviously not apparent. Consequently, in 2015 TC69 established an ad hoc committee (AHC7) to investigate the future role of TC69 with respect to Big Data standards. This group has performed a gap analysis of existing statistical standards and has recognized that many of these standards apply to Big Data even if they were not originally conceived with Big Data in mind. In particular, this group concluded that all the statistical process control standards apply, as control charts fundamentally handle streaming data. Likewise, the acceptance sampling standards could also serve, since they can handle arbitrarily large populations.

Some isolated efforts for specific standards on Big Data have been suggested but have not advanced to the new work item status. Sub-committee 1 as noted previously is working on a terminology document with emphasis on predictive analytics.


Joint effort with ISO/JTC1
Supported by NIST, the ISO/Joint Technical Committee 1 has drafted seven documents of interest to the Big Data community by providing an Interoperability Framework consisting of the following seven volumes:

Volume 1, Definitions
Volume 2, Taxonomies
Volume 3, Use Cases and General Requirements
Volume 4, Security and Privacy
Volume 5, Architectures White Paper Survey
Volume 6, Reference Architecture
Volume 7, Standards Roadmap

These tomes do a great service to the statistical community since they provide a computing ecosystem within which arguably any big data situation can be framed. Also, very neatly, this group has designated a placeholder for statistical contributions. A much simplified representation of their framework is given in Figure 2. A much more detailed version of this figure appears in Volume 2, page 13.

A massive amount of detail is provided in the NIST seven-volume set, and it is great reading to get a feel for the manner in which IT folks address Big Data (and for elaborating on the contents of the four boxes other than the statistical provider box). The center box in Figure 2 provides the placeholder as envisioned by our IT partners, with the statistical role to be further elucidated. The five specific statistical items are elaborated in a nutshell below:

Collection: Moving data from its repository to an accessible location (large volumes with attention to confidentiality), possibly including sampling.
Preparation/Curation: Validating the data as pulled from the repository, cleansing of obvious mistakes and duplicate records, partitioning the data for distributed computing (as required).
Analytics: Discovering value in the large data sets (e.g., correlations and trends) and establishing a structure useful for further summaries and analyses (possibly parallel, summary tables or relational databases), addressing complexity (execution time of methods), handling real time or streaming data, human-in-the-loop discovery lifecycle.
Visualization: Exploratory data presentations for understanding, explicatory views of analytical results (real-time presentation of analytics), telling the story.
Access: Data export for querying, consumer analytics hosting, analytics as a service hosting.

Figure 2: (Over) Simplified representation of the computing ecosystem


These descriptions are not necessarily likely to match the typical statistician's understanding of the five steps provided. Collection may involve some sampling from the full data set, but we detect a great interest by the IT professionals in using all of the data (after all, they have gone to a great deal of trouble to generate it in their computing frameworks!). Preparation/curation is a bit like what most of us think of as data preparation, although we would emphasize preliminary calculations to facilitate some of our predictive analytics tools (e.g., discretizing some non-linear responses, grouping some variables such as regions rather than states, etc.). With regard to Analytics, we could offer considerably more than correlations and trends (not that these two are irrelevant). The five activities are placeholders rather than final prescriptions. Visualization is very reminiscent of the Tukey approach of exploratory and confirmatory analysis followed by implementation. Some might think that visualization would take place early in an analytical investigation, but mega-millions of data points are not necessarily amenable to the tools used on small data sets. Access is a bit more cryptic but seems to entail implementation of the discoveries made in the full-blown process (e.g., real time scoring of future items based on the analysis, with analyses to be updated efficiently). Volume 2, page 21 notes, "The access activity of the Big Data Application Provider should mirror all actions of the Data Provider, since the Data Consumer may view this system as the Data Provider for their follow-on tasks."

The good news here is that the NIST experts (Wo Chang in particular) involved with ISO/JTC1 recognize the role of statistics in this enterprise, and we in turn recognize the great value in having a computational ecosystem in place to guide our own work. M. Boulanger, T. Kubiak and I are collaborating with Wo Chang using an actual large data set on health care fraud to exercise the ecosystem. Already this exercise has helped us to understand their IT ecosystem and allowed us to refine what we bring to the table for the IT professionals. We hope to complete a white paper on this topic by the end of 2016.

Conclusions
Successful Big Data applications are due to collaborative efforts of information technologists, computer experts, and statisticians. Terminology awareness is an initial barrier for the statistics community to participate effectively. As noted, much of the heavy lifting in terms of describing the computational infrastructure has been laid out by the ISO/JTC1 group, with thoughtful consideration of the role of statisticians in the endeavor. Obviously, the development of international standards designed directly for Big Data problems is in its infancy. It has been argued here that terminology for predictive analytics is a necessary first step, with further standards driven by continued collaborations among the interested parties.

In closing, perhaps one wonders why we do not simply rely on normative dictionaries to handle the terminology "problem." The Oxford English Dictionary is a recognized authority on definitions and offers as its definition of Big Data:

Computing (also with capital initials) data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data.

Such a definition, devoid of any sense of uncertainty, is unpalatable to statisticians. Another partial definition encountered in preparing the paper is attributable to Cathy O'Neill: "'Big data' is more than one thing, but an important aspect is its use as a rhetorical device, something that can be used to deceive or mislead or overhype." We are clearly far from a consensus definition of Big Data, so meanwhile we shall move ahead in terminology, tackling those areas of Big Data that support standards and are in harmony with successful Big Data applications.

Acknowledgements
This paper came to be owing to a presentation that I gave at the Joint Technical Communities Conference held in Orlando, FL, October 22-23, 2015, entitled "Big Data Terminology—Key to Predictive Analytics Success." Matthew Barsalou, editor of the ASQ Statistics Division Statistics Digest, kindly invited me to write a mini-paper based on this presentation.

This was the second year of the Joint Technical Committees Conference (partially subsidized by the ASQ Statistics Division). Gordon Clark is the ASQ Statistics Division Representative to this group and has worked with this conference since its inception. Mindy Hotchkiss was instrumental in having this presentation be a part of the program.

[Answer to the Section 2 riddle: each of these expressions was a Google translation of "Communication."]


References
ISO 3534-1 (2006). Statistics—Vocabulary and symbols—Part 1: General statistical terms and terms used in probability. Geneva: ISO.
ISO 3534-2 (2006). Statistics—Vocabulary and symbols—Part 2: Applied statistics. Geneva: ISO.
ISO 3534-3 (2013). Statistics—Vocabulary and symbols—Part 3: Design of experiments. Geneva: ISO.
ISO 3534-4 (2014). Statistics—Vocabulary and symbols—Part 4: Survey sampling. Geneva: ISO.
NIST Big Data Public Working Group Definitions and Taxonomies Subgroup, Draft NIST Big Data Interoperability Framework: Volume 1, Definitions. April 6, 2015.
NIST Big Data Public Working Group Definitions and Taxonomies Subgroup, Draft NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies. April 6, 2015. http://dx.doi.org/10.6028/NIST.SP.1500-1


COLUMN
Design of Experiments
by Bradley Jones, PhD, JMP Division of SAS
and Douglas Montgomery, Arizona State University

Generating Orthogonal Designs on a Sphere

Historically, screening designs have required all the factors to have two levels (coded -1 and 1). Examples are the regular fractional-factorial designs of Box and Hunter (1961). These designs all require the number of runs to be a power of two. Plackett-Burman designs (1946) exist for numbers of runs that are multiples of four, allowing more flexibility in the design effort.

Recently Jones and Nachtsheim (2011) introduced a class of screening designs where each factor has three levels (coded -1, 0, and 1). Subsequently, Xiao et al (2012) called these designs Definitive Screening Designs (DSDs). They showed how to construct orthogonal DSDs whenever the number of factors is even. DSDs with a minimum number of runs require 2m + 1 runs, where m is the number of factors.

Using singular value decomposition, it is possible to create an orthogonal design for the first-order model for any number of runs larger than the number of factors.

Here is how to generate the smallest such design, having m + 1 runs.

1. Generate an m + 1 by m + 1 matrix of random numbers and call it R.
2. Center the columns by subtracting the average of each column from each element in that column.
3. Create X = 1 | R, where 1 is a column of ones and | is the horizontal join operator.
4. Perform a singular value decomposition of X (i.e., X = U*S*V^t).
5. Remove the first column of U.
6. Multiply each element of U by the square root of (m + 1)/m.

The matrix U is an orthogonal design with m + 1 runs. Each row is a distance of 1 from the center of the design. That is, every point in the design lies on an m-sphere of radius 1. If m = 2, the design is an equilateral triangle inscribed in a circle.
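As an illustration, here is a minimal NumPy sketch of the six steps above (not code from the column itself; the constant column of U is located by matching against the ones vector rather than assumed to be first, in case the SVD orders the singular values differently):

import numpy as np

def min_run_orthogonal_design(m, seed=None):
    """Generate an orthogonal first-order design with m factors and m + 1 runs.

    Follows the six-step SVD recipe: random matrix, center columns, prepend a
    column of ones, take the SVD, drop the column of U aligned with the
    constant vector, and rescale so every run lies on a sphere of radius 1.
    """
    rng = np.random.default_rng(seed)
    n = m + 1

    # Steps 1-2: random (m+1) x (m+1) matrix with centered columns.
    R = rng.standard_normal((n, n))
    R -= R.mean(axis=0)

    # Step 3: horizontally join a column of ones with R.
    X = np.column_stack([np.ones(n), R])

    # Step 4: singular value decomposition X = U*S*V^t (economy size).
    U, _, _ = np.linalg.svd(X, full_matrices=False)

    # Step 5: drop the column of U proportional to the ones vector
    # (the "first column" in the recipe; located explicitly here).
    ones_dir = np.ones(n) / np.sqrt(n)
    const_col = np.argmax(np.abs(U.T @ ones_dir))
    D = np.delete(U, const_col, axis=1)

    # Step 6: rescale so each row has distance 1 from the center.
    return D * np.sqrt(n / m)

D = min_run_orthogonal_design(4, seed=1)
print(np.round(D.T @ D, 10))              # mutually orthogonal factor columns
print(np.round((D**2).sum(axis=1), 10))   # each run on the unit sphere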

Note that these designs are not two-level designs. Performing such a design requires the ability to set the factors precisely.

One problem with using the minimum-run design in a screening experiment is that there are no degrees of freedom for error. Another drawback for using the minimum-run design is that a two-factor interaction effect can be highly correlated with a main effect. This could result in substantial bias in the main effect estimate. A way to avoid both of these problems is to stack the minimum-run design on top of the same design with every element multiplied by -1. This results in a foldover design having 2(m + 1) runs. All the factor columns are uncorrelated with every two-factor interaction column so that main effect estimates are unbiased by any active second-order effect. Note that the foldover of the two-factor design above yields the equiradial hexagon design. Adding a center run to this design would allow it to fit a full quadratic response surface model with high efficiency.
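Continuing the sketch above (these two lines assume the hypothetical D returned by the function defined earlier), the foldover and the optional center run are each one operation:

# Foldover: stack the minimum-run design on its negative for 2(m + 1) runs,
# de-aliasing main effects from two-factor interactions.
F = np.vstack([D, -D])

# Optionally add a center run so a full quadratic model can be fit.
F_with_center = np.vstack([F, np.zeros(D.shape[1])])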

References
Box, G. E. P. and Hunter, J. S. (1961) "The 2^(k-p) Fractional Factorial Designs Part I." Technometrics 3:311–351.
Jones, B. and Nachtsheim, C. J. (2011) "A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects." Journal of Quality Technology 43:1–15.
Plackett, R. L. and Burman, J. P. (1946) "The Design of Optimum Multifactorial Experiments." Biometrika 33:305–325.
Xiao, L.; Lin, D. K. J.; and Bai, F. (2012) "Constructing Definitive Screening Designs Using Conference Matrices." Journal of Quality Technology 44:1–7.

Bradley Jones Douglas C. Montgomery


Technical Aids: The Shewhart Control Chart-Tests for Special Causes

Three years ago, for purposes of convenience and uniformity of application, I collected a set of tests for assignable causes (Figure 1) to be applied to Shewhart control charts for means of normally distributed data. Figure 2 is a set of comments on these tests. Deming (1982) refers to assignable causes as “special causes” in order to contrast them with what he calls “common causes.” A common cause is one that affects all the points on the chart, as when a centerline is too high. A common cause is fixed by changing the system. A special cause is fixed by removing the perturbing influence that caused the out-of-control signal.

COLUMN
Statistical Process Control
Lloyd S. Nelson

Figure 1: Illustrations of test for special causes applied to Shewhart control charts

Reprinted with permission from Journal of Quality Technology © 1984 ASQ, www.asq.org. No further distribution allowed without permission.


Figure 2: Comments on tests for special causes

For my use, Figures 1 and 2 were printed back to back on 8.5″ × 11″ yellow card stock and issued to all areas where Shewhart charts are applied. One of the main objectives was to standardize on this schedule of tests so that discussion would be focused on the behavior of the process rather than on what test should be used. Further, control limits are taken to be three sigma away from the mean unless specified otherwise. If it is desirable to use what otherwise might be called "two sigma control limits," test one is simply redefined to be "one point beyond Zone B."
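For readers without the original figures at hand, here is a minimal sketch of the zone idea just described (an illustration assuming NumPy and hypothetical subgroup means, not a reproduction of Figures 1 and 2): test one flags any point beyond Zone A, i.e., more than three sigma from the centerline, and changing the threshold to two sigma gives the "one point beyond Zone B" variant.

import numpy as np

def test_one_signals(xbar, center, sigma, n_sigma=3.0):
    """Flag points beyond the +/- n_sigma limits around the centerline.

    n_sigma=3.0 corresponds to "one point beyond Zone A" (the usual test one);
    n_sigma=2.0 gives the "one point beyond Zone B" two-sigma variant.
    """
    xbar = np.asarray(xbar, dtype=float)
    z = (xbar - center) / sigma           # distance from centerline in sigmas
    return np.where(np.abs(z) > n_sigma)[0]

# Example: hypothetical subgroup means with a known centerline and sigma.
means = np.array([10.1, 9.8, 10.4, 12.4, 10.0, 9.7, 6.5])
print(test_one_signals(means, center=10.0, sigma=1.0))              # 3-sigma limits
print(test_one_signals(means, center=10.0, sigma=1.0, n_sigma=2.0)) # 2-sigma limits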


Tests one, three, and four can be used with p, np, c, and u charts. If the distributions are close enough to being symmetrical, test two can also be used with these charts. Use binomial or Poisson tables to check specific situations.

Conditions that can cause each of these tests to give a signal are illustrated in the Western Electric Statistical Quality Control Handbook (1956). The serious user should consult this source. I am pleased to be able to say that the Society has given permission for readers to reproduce Figures 1 and 2 without copyright restriction.

References
Deming, W. E. (1982). Quality, Productivity and Competitive Position. Center for Advanced Engineering Study, Massachusetts Institute of Technology, Cambridge, MA, Chapter 7.
Western Electric (1956). Statistical Quality Control Handbook. American Telephone and Telegraph Company, Chicago, IL.

COLUMN
Statistics for Quality Improvement
by Gordon Clark, PhD, Professor Emeritus of the Department of Integrated Systems at The Ohio State University and Principal Consultant at Clark-Solutions, Inc.

Gordon Clark

Use of Lean Six Sigma to Improve System Performance

Lean Six Sigma (LSS) has become a popular strategy for achieving continuous improvement in many sectors of our economy. We once had separate Lean and Six Sigma performance improvement methods, so how do we combine them to get an integrated LSS performance improvement strategy? Are we implementing the integrated LSS in an effective manner?

The Lean approach to process improvement had its roots in the Toyota production system (Arnheiter and Maleyeff, 2005). A key element of Lean is the reduction of waste, such as motion, overproduction, over processing, lead time, rework, inventory and defects (Albliwi, Antony, et al, 2015). Lean also addresses the reduction of total cycle time. In addition, Lean attempts to reduce variability, including demand variability, manufacturing variability and supplier variability (Arnheiter and Maleyeff, 2005). Lean uses tools such as the Kanban system, 5S Workplace Organization, Cause and Effect Analysis, and Value Stream Mapping (Albliwi, Antony, et al, 2015). The Six Sigma process improvement methodology has objectives such as reducing variation in any process, reducing costs, making savings to the bottom line, increasing customer satisfaction, improving product quality, and reducing defects (Albliwi, Antony, et al, 2015). The Six Sigma analytical and statistical tools include Quality Function Deployment, Failure Mode and Effects Analysis, Statistical Process Control, Design of Experiments, and Analysis of Variance.

How can Lean and Six Sigma projects benefit from an integrated LSS process improvement methodology?

• A Lean project can use the more scientific Six Sigma approach to improving quality. For example, defects and unsatisfactory quality produce waste.

• A Six Sigma project can benefit from the Lean approach to reduce waste. For example, a project to improve quality can benefit from reduced lead times and cost.

These examples show that an integrated LSS process improvement methodology is more holistic.


Possible Lean Six Sigma Frameworks

Snee-Hoerl LSS Framework
Snee and Hoerl (2007) proposed the development of a holistic improvement approach that:

• Works in all areas of the business
• Addresses all key measures of performance
• Addresses all types of improvement including manufacturing, service and administrative
• Includes both Lean and Six Sigma principles and tools.

The Snee-Hoerl framework identifies three major LSS project types, which require different completion times. They are:

• Quick-hit projects take the least amount of time to complete, and they can be accomplished almost immediately.
• Kaizen projects take about 30 days or less to complete.
• Six Sigma projects usually require four to six months to complete but might be accomplished more quickly.

All of the above project types are LSS projects and may use the DMAIC process employed by Six Sigma. That is, the steps in the DMAIC process are Define, Measure, Analyze, Improve and Control. Also, the tools used in the project types would include both Six Sigma and Lean tools.

Project selection would be a key step in the Snee-Hoerl LSS framework.

Figure 1: Project selection


Note that a project originally identified as a Six Sigma project type requiring the complete DMAIC process may be found to be completed more quickly as a Kaizen or Quick-hit project.

Mader LSS Framework
Mader (2008) describes four major Six Sigma and Lean frameworks. They are:

• Traditional Six Sigma (TSS)
• Lean Six Sigma plus (LSS+)
• Lean Six Sigma light (LSSL)
• Traditional Lean (TL)

All of the above frameworks use the DMAIC process. A major difference between the frameworks is the skill sets and tools required for each framework. Mader presents these skill sets for each phase of the DMAIC process and each framework in Table 1 of his paper.

Figure 2 presents a flowchart of processing steps for the LSS+ and LSSL frameworks. A typical LSS+ project would take about 16 weeks to complete and an LSSL project about 6 weeks.

The Snee-Hoerl and Mader LSS frameworks are similar; however, Mader (2008) provides more detail about implementation by including the skill sets for each phase of the DMAIC process. Notice that both LSS frameworks start by using a Value Stream map to define the project. The Value Stream map is a lean management tool.

Pepper and Spedding (2010) propose another LSS framework. It simply uses Lean tools to determine key areas for improvement, called "hot spots". Then it uses Six Sigma to target these hot spots and improve system performance.

Figure 2: LSS + Flowchart


Review of Manufacturing Sector Lean Six Sigma Papers
Albliwi and Antony et al (2015) reviewed 37 LSS papers published in top journals from 2000 to 2013. The authors state that LSS has become the most popular strategy for continuous improvement (CI) in manufacturing and service sectors. Nineteen of these papers included case studies in the manufacturing sector from seven different countries, which are the USA, the UK, India, the Netherlands, China and New Zealand.

The top five most common tools that were used in the case studies are:

• Cause and Effect Analysis (13 case studies)
• Value Stream Mapping (12 case studies)
• 5S (Workplace organization) (11 case studies)
• Design of Experiments (DOE, 8 case studies)
• Pareto Chart (7 case studies)

The Cause and Effect Analysis, DOE and Pareto Chart are Six Sigma tools.

Twelve papers cited limitations to applying LSS in the manufacturing sector. The top five limitations they cited are:

1. Absence of clear guidelines for LSS in the early stage of implementation
2. Availability of LSS curricula
3. An understanding of the usage of LSS tools and techniques
4. Availability of a roadmap to be followed
5. Few practical applications of the LSS integrated framework

What actions should we take to enhance the application of LSS and increase its use?

• Agree on an integrated framework that users can implement effectively.
• Prepare written user guides that users can understand.

o Describe the framework and the steps to implement LSS in diverse situations.
o Describe the usage of LSS tools.
o Describe example applications and case studies.

• Present webinars that present the LSS framework and example applications.

References
Albliwi, S., J. Antony, et al. (2015). "A Systemic Review of Lean Six Sigma for the Manufacturing Industry." Business Process Management Journal 21(3): 665-691.
Arnheiter, E. D. and J. Maleyeff (2005). "The Integration of Lean Management and Six Sigma." The TQM Magazine 17(1): 5-18.
Mader, D. P. (2008). "Lean Six Sigma's Evolution: Integrated Method Uses Different Deployment Models." Quality Progress 41(1): 40-48.
Pepper, M. P. J. and T. A. Spedding (2010). "The Evolution of Lean Six Sigma." International Journal of Quality & Reliability Management 27(2): 138-155.
Salah, S., A. Rahim, et al. (2010). "The Integration of Six Sigma and Lean Management." International Journal of Lean Six Sigma 1(3): 249-274.
Snee, R. and R. Hoerl (2007). "Integrating Lean and Six Sigma—A Holistic Approach." Six Sigma Forum Magazine 6(3): 15-21.


Histogram


Description
A Histogram is a graphical representation of the distribution of Variable Data (data generated by taking measurements) using a Bar Chart. It is commonly used to visually communicate information about a process or a product as well as to help make decisions regarding prioritization of improvement initiatives.

This information is represented by a series of equal-width columns of varying heights. The columns are of equal width because they all represent a specific (class) interval within a range of observations. Column height is a function of (directly proportional to) the number of observations (frequency of occurrence) within the interval covered by each column. Thus, column height varies according to the number of “hits” within a specified interval.

With most naturally-occurring data, there is a tendency for many observations to occur towards the center of the distribution (known in statistical circles as the “central tendency” with progressively fewer points occurring further from the center.

Histograms offer a quick look at data at a single point in time, e.g., for the last hour, last shift, last day, etc. They do not display variation or trends over time. A Histogram displays how the cumulative data looks now. It is useful in understanding the relative frequencies (percentages) or frequency (quantity) of the data and how that data is distributed.

Many candidate processes or products for improvement can be identified using this one basic tool. The frequency and shape of the data distribution provide insights that would not be apparent from data tables (Check Sheets/Lists) alone. Histograms also form the basis for two other frequently used Six Sigma/TQM/CI tools: Pareto Analysis, and Process Capability Analysis (Cp).

COLUMN: Stats 101
by Jack B. ReVelle, PhD
Consulting Statistician at ReVelle Solutions, LLC


Reprinted with permission from Quality Press © 2004 ASQ; www.asq.org. No further distribution allowed without permission.


Types of Histograms
Useful information about a population can be obtained by examining the shape and spread of a histogram constructed from either sampling or census data drawn from the population. There are many typical shapes, each with its own subset of possible spreads of the data. There are, in fact, far more types of Histograms than there is space to discuss here. The following is a limited discussion of several combinations of Histogram shapes and spreads.

• Normal (symmetrical or bell-shaped): The mean, median and mode are all of approximately the same value and located in the center of the range of data. The frequency of occurrence is the greatest in the center and gradually gets less and less towards the skirts or tails of the distribution of the data.

• Positive Skew (asymmetrical): The mean value is located to the left of the center of the range of data. The frequency of occurrence declines rapidly to the left of the mean and gradually to the right of the mean. Note: This shape is likely to occur when the lower limit is restricted, either theoretically or by a specification value, or when values less than a certain value cannot occur.

• Negative Skew (asymmetrical): The mean value is located to the right of the center of the range of data. The frequency of occurrence declines rapidly to the right of the mean and gradually to the left of the mean. Note: This shape is likely to occur when the upper limit is restricted, either theoretically or by a specification value, or when values greater than a certain value cannot occur.

• Plateau (rectangular): The frequency of occurrence in each class is approximately the same except for those at the extremes of the range of data. This type of Histogram appears to form a plateau or mesa. Note: This shape can occur naturally when dealing with the number of different types of cards in a deck or when counting the frequency of occurrence of the number of dots on the face of a single die or the faces of multiple dice.

• Twin-peak (bi-modal): The frequency of occurrence is low near the center of the range of data with a peak on either side. This shape is known to occur when two frequency distributions with quite different mean values are mixed together.

Figure 1: Histogram – Several types
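The shapes above are easy to reproduce by simulation. The following sketch is not from the original column; it assumes Python with numpy and matplotlib available and simply generates one sample for several of the shapes described, including a bi-modal mixture of two distributions with different means.

```python
# Illustrative sketch (not from the article): simulating data that produces
# several of the histogram shapes described above.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

samples = {
    "Normal (bell-shaped)": rng.normal(loc=100, scale=10, size=1000),
    "Positive skew": rng.lognormal(mean=0.0, sigma=0.5, size=1000),
    "Plateau (rectangular)": rng.uniform(low=0, high=1, size=1000),
    # Mixing two distributions with quite different means yields a twin-peak shape.
    "Twin-peak (bi-modal)": np.concatenate([
        rng.normal(loc=90, scale=5, size=500),
        rng.normal(loc=120, scale=5, size=500),
    ]),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (title, x) in zip(axes.ravel(), samples.items()):
    ax.hist(x, bins=int(round(np.sqrt(len(x)))))  # square-root rule for the bin count
    ax.set_title(title)
fig.tight_layout()
plt.show()
```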


Construction
Understanding how to construct a Histogram is best explained using an example. Let's suppose that a Process Improvement Team (PIT) collects a random sample composed of 70 data points (n = 70), and that the greatest value is 120 (Xmax = 120) and the smallest value is 80 (Xmin = 80). The first step is to determine the range (R), which is mathematically defined as R = Xmax − Xmin. In this case R = 120 − 80 = 40.

The second step is to determine the number of columns in the Histogram. This is done by taking the square root of the number of data points, which in this example is n = 70. If you're like most folks who hear this for the first time, you're probably saying to yourself, "Where did that come from?" Up front, there is no scientific basis for this technique, but it's time tested and it works, so bear with me on this. The number of columns in a Histogram is important because if there are too few or too many, you don't really gain an understanding of the true shape or profile of the frequency distribution of the data that has been collected. Taking the square root of the sample size normally provides a useful guideline as to the appropriate number of columns to use. In this case the square root of 70 is between 8 (8² = 64) and 9 (9² = 81), so we know the number of columns in the Histogram will be in that general area.

The third step is to calculate the size of the class intervals (CI) in the Histogram. To complete this step, divide the range (R) by the square root of the sample size. For this example, we'll divide R = 40 by the square root of n (we'll use 8 since it evenly divides into 40). Since 40 divided by 8 = 5, this is our class interval.

The fourth step is to develop the full range of class intervals, each with a class size of 5.

As is traditional, we start with the smallest value, Xmin = 80, and proceed to the largest value, Xmax = 120, in increments equal to the class interval of 5, as follows:

Table 1: Increments equal to 5

The fifth step is to create a Tally Sheet. Since this is a simulated example, we’ll create a frequency of occurrence figure for each class.


The sixth and final step is to develop the Histogram, which in this case appears this way.


Figure 2: Histogram – Correct number of classes
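For readers who want to reproduce the six construction steps in software, here is a minimal sketch in Python. The individual data values are hypothetical placeholders chosen to match the article's summary figures (n = 70, values roughly between 80 and 120); the article's actual tally sheet is not reproduced here.

```python
# Sketch of the six construction steps, using hypothetical data with the
# article's summary values (n = 70, Xmin about 80, Xmax about 120).
import math
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
data = np.clip(rng.normal(loc=100, scale=8, size=70), 80, 120)  # hypothetical sample

# Step 1: range R = Xmax - Xmin
r = data.max() - data.min()

# Step 2: number of columns is approximately sqrt(n); sqrt(70) is between 8 and 9
k = round(math.sqrt(len(data)))

# Step 3: class interval CI = R / sqrt(n); in the article, 40 / 8 = 5
ci = r / k

# Step 4: class boundaries from Xmin to Xmax in increments of CI
edges = np.arange(data.min(), data.max() + ci, ci)

# Step 5: tally the frequency of occurrence in each class
counts, edges = np.histogram(data, bins=edges)

# Step 6: draw the Histogram
plt.hist(data, bins=edges, edgecolor="black")
plt.xlabel("Measurement")
plt.ylabel("Frequency")
plt.show()
```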

As was noted earlier in this discussion, be careful not to use too many or too few classes. Either choice can disguise the true frequency distribution of your sampling data as the following figures demonstrate.

Figure 3: Histogram – Too many classes


Figure 4: Histogram – Too few classes (4)


Figure 5: Histogram – Too few classes (5)


When There Are Specification Limits
It is not unusual for there to be specification limits associated with a Histogram. When this is the case, the limits (upper, lower, or both) should be drawn on the Histogram so as to compare the frequency distribution with the specification limit(s). The intent of including the specification limit(s) is to determine the spread and positioning of the Histogram with respect to the specification limit(s). This can be accomplished mathematically using Cp (the Process Capability Index) and Cpk (the Process Performance Index).

Figure 6: Histogram – With specification limits
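As a rough illustration of the capability calculations mentioned above, the following sketch computes Cp and Cpk for a hypothetical, roughly normal process. The specification limits and data are invented for illustration and are not from the article.

```python
# Minimal sketch of the Cp and Cpk calculations mentioned above.
# Spec limits and data are hypothetical; a stable, roughly normal process is assumed.
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(loc=100, scale=5, size=200)   # hypothetical measurements
lsl, usl = 85.0, 115.0                       # hypothetical spec limits

mean = x.mean()
s = x.std(ddof=1)                            # sample standard deviation

cp = (usl - lsl) / (6 * s)                   # spec spread vs. process spread
cpk = min(usl - mean, mean - lsl) / (3 * s)  # also penalizes an off-center mean

print(f"Cp  = {cp:.2f}")
print(f"Cpk = {cpk:.2f}")
```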


When there are specification limits associated with a Histogram, any of the following five situations can occur:

1. When the Histogram satisfies the specification limit(s):
   a. Maintenance of the existing process is sufficient unless there is a compelling need for continuous improvement.
   b. The specification limit(s) is/are marginally satisfied; however, there is no room for a shifting of the Mean or an increase in the Standard Deviation.

2. When the Histogram does not satisfy the specification limit(s):
   a. Corrective action is necessary to shift the Mean closer to the target/nominal specification.
   b. Corrective action is necessary to decrease the Standard Deviation.
   c. Corrective action is necessary to shift the Mean closer to the target/nominal specification and to decrease the Standard Deviation.

COLUMN: Testing and Evaluation
by Laura Freeman, PhD
Assistant Director of the Operational Evaluation Division and Test Science Task Leader at the Institute for Defense Analyses
with Dr. Rebecca Dickinson (also at IDA)

Censored Data Analysis for Performance Data

Previously, I emphasized the importance of using continuous metrics to make testing efficient and informative. A common challenge that arises when converting from probability-based requirements to continuous variables is figuring out how to account for failed outcomes. For example, if a chemical agent detector has a requirement to detect 85% of agents within one minute, there may be runs when the detection time exceeds one minute and runs when the detector fails to detect the agent at all. If we simply convert to the continuous metric of detection time, how do we account for the trials when there is no detection time?

A common mistake is to treat the detection probability and the detection time as two separate analyses. Consider the following 10 data points to illustrate the problems with this approach.

Table 1: Detection times


The two separate analyses proceed by first analyzing the eight detection times; then a second analysis calculates the percentage of metrics that were under one minute. The table below summarizes a potential outcome from this two-stage analysis. Here we use a lognormal distribution fit to the data and extract quantiles, which accounts for the skewness in the data. Notice, the 85th percentile of the detection times is below the requirement of 60 seconds, but the overall percentage estimate is lower than 85 percent.


Table 2: Metric, estimate and confidence interval
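To make the two-stage calculation concrete, the sketch below reproduces its two pieces using the rounded lognormal estimates quoted later in this article (μ̂ = 3.6, σ̂ = 0.36) and the 7-of-10 count of detections within one minute. The confidence level and the Clopper-Pearson interval choice are assumptions, so the numbers will not exactly match Table 2; the article goes on to explain why this two-stage approach is problematic.

```python
# Sketch of the two-stage analysis described above. Uses the rounded estimates
# quoted later in the text (mu-hat = 3.6, sigma-hat = 0.36); the 80% level and
# Clopper-Pearson interval are assumptions, not taken from the article.
import numpy as np
from scipy import stats

mu_hat, sigma_hat = 3.6, 0.36
lognorm_fit = stats.lognorm(s=sigma_hat, scale=np.exp(mu_hat))

# Stage 1: 85th percentile of the observed detection times (seconds)
p85 = lognorm_fit.ppf(0.85)

# Stage 2: binomial estimate of P(detect within 60 s), 7 successes in 10 trials
k, n = 7, 10
p_hat = k / n
alpha = 0.2  # 80% two-sided interval (assumed level)
lower = stats.beta.ppf(alpha / 2, k, n - k + 1)        # Clopper-Pearson lower bound
upper = stats.beta.ppf(1 - alpha / 2, k + 1, n - k)    # Clopper-Pearson upper bound

print(f"85th percentile of detection time: {p85:.1f} s")
print(f"P(detect within 60 s): {p_hat:.2f} (80% CI {lower:.2f}, {upper:.2f})")
```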

These results are hard to interpret and leave decision makers with an unclear understanding of whether the system is meeting requirements. Additionally, the wide confidence intervals on the probability estimate make the decision fairly ambiguous. Finally, we have inadvertently double counted one piece of our data: the 75-second detection time ended up counting against the system twice, influencing both the lognormal fit to the data and the probability-based binomial calculation.

The next step someone might take is to try to combine the analyses using conditional probability:

P(Detection ∩ (Time < 60 seconds)) = P(Time < 60 seconds | Detect) × P(Detect)

The first obvious problem is that we need to stop double counting the 75-second detection time. Since there is valuable information in the detection time, we can include it in the lognormal analysis and drop it from the binomial probability calculation. We can fit the lognormal distribution to the data by maximizing the likelihood function:

L = ∏_{i=1}^{8} f(μ, σ | t_i)

where f(μ, σ) is the probability density of the lognormal distribution, μ and σ are the location and scale parameters of that distribution, and t_i are the eight observed detection times. Using the lognormal fit we can then calculate:

P(Time < 60 seconds | Detect) = F(60 | μ̂ = 3.6, σ̂ = 0.36) = 0.924

where μ̂ and σ̂ are the maximum likelihood estimates of the parameters of the lognormal distribution. Plugging the lognormal and binomial components into the conditional probability gives:

P(Detection ∩ (Time < 60 seconds)) = 0.924 × 0.80 = 0.74

Now, our analysis shows a 74% probability of detection before 60 seconds, which is below the 85% requirement within one minute. This analysis is more informative than a simple 7/10 binomial approach to addressing the requirement because it takes into account the information in the shape of the continuous detection times.
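A minimal sketch of this conditional-probability calculation follows, using the rounded estimates quoted above. Because the estimates are rounded, the computed values are approximately, rather than exactly, the 0.924 and 0.74 reported in the text.

```python
# Sketch of the conditional-probability calculation above, using the rounded
# maximum likelihood estimates quoted in the text (mu-hat = 3.6, sigma-hat = 0.36).
import numpy as np
from scipy import stats

mu_hat, sigma_hat = 3.6, 0.36

# P(Time < 60 s | Detect) from the fitted lognormal distribution
p_time = stats.lognorm.cdf(60, sigma_hat, scale=np.exp(mu_hat))

# P(Detect): 8 detections in 10 trials
p_detect = 8 / 10

p_requirement = p_time * p_detect
print(f"P(Time < 60 s | Detect)       = {p_time:.3f}")
print(f"P(Detection and Time < 60 s)  = {p_requirement:.3f}")
```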

However, there are challenges implementing this approach. First, the method for calculating confidence intervals is not straightforward. They could be constructed using bootstrapping or propagation of error, but neither of those techniques is generally available in software and therefore not widely accessible. Additionally, for this illustrative example we are only considering a single set of conditions. Most tests actually include multiple conditions, ideally from an experimental design. The goal of the analysis then is to develop a regression model to characterize what factors impact the detection time. For example, for an actual chemical agent detector we are interested in how agent concentration level, temperature, and water vapor concentration impact the detection time and probability.


As a solution to these challenges, an ad hoc approach we have used leverages a technique from reliability analysis in the form of censored data. In a reliability context, right censored data are observations of units that have not failed at the end of the test. Meeker and Escobar (1998) provide an excellent introduction to censored data analysis. This ad hoc solution has been adopted in the DoD because it can be implemented in software as opposed to a potentially more rigorous mixture model.

To implement the censored data analysis in this example, we assume the last two detection times are censored, that is, that a detection would occur at some future time. We can then update the likelihood function:

L = ∏_{i=1}^{10} [f(μ, σ | t_i)]^{δ_i} [1 − F(μ, σ | t_i)]^{1−δ_i}

where F(μ, σ) is the cumulative distribution function of the lognormal distribution and δ_i is an indicator variable that is equal to 1 when a detection occurs and 0 when there is no detection. We can now proceed just as before and maximize the likelihood function to obtain estimates of μ and σ and the distribution percentiles needed to answer the requirement.
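Here is a sketch of this censored-data fit using scipy's general-purpose optimizer rather than JMP or Minitab, which the article mentions. The detection times and the 120-second censoring value below are hypothetical placeholders, not the Table 1 data; they only illustrate how the likelihood above can be coded and maximized.

```python
# Sketch of a right-censored lognormal fit by maximizing the likelihood above.
# The detection times and 120-second censoring value are hypothetical placeholders,
# not the article's Table 1 data. Changing the censoring time changes the estimates.
import numpy as np
from scipy import stats, optimize

times = np.array([28.0, 32.0, 35.0, 38.0, 41.0, 45.0, 52.0, 75.0,   # observed detections
                  120.0, 120.0])                                     # censored at end of test
detected = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])                  # delta_i indicator

def neg_log_likelihood(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # keep sigma positive during optimization
    dist = stats.lognorm(s=sigma, scale=np.exp(mu))
    # log f(t_i) for detections, log[1 - F(t_i)] for censored observations
    ll = np.where(detected == 1, dist.logpdf(times), dist.logsf(times))
    return -ll.sum()

result = optimize.minimize(neg_log_likelihood, x0=[np.log(45.0), np.log(0.5)])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

fit = stats.lognorm(s=sigma_hat, scale=np.exp(mu_hat))
print(f"mu-hat = {mu_hat:.2f}, sigma-hat = {sigma_hat:.2f}")
print(f"P(Time < 60 s) = {fit.cdf(60):.3f}")
```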

The primary advantage of this methodology over the previous conditional probability approach is that it is widely implemented in many statistical software packages, including JMP and Minitab, which enables widespread use. A challenge is that we have to think carefully about what information we are imparting into our inferences based on the right censoring. This is because the shape and the scale of the continuous distribution now depend not only on the observed detection times, but also on the "unobserved" detections.

Consider three different analyses of the detection time data. In the first analysis, the censoring time is set to the last observed detection time of 75 seconds. In the second, the censoring time is set to the end of the test period at 2 minutes. In the third analysis, we extend the duration of the test to 5 minutes and assume the last two points still failed to detect.

Table 3: Detection time data

Clearly, there are large practical differences in the interpretation of the results based on the different censoring times. In comparing back to the conditional probability approach, you will note that setting the censoring time to the last observed detection time closely matches the conditional probability, but now we have confidence intervals from the software, which are also slightly narrower than the binomial intervals. Using the larger values for censoring time (i.e., 2 or 5 minutes) decreases the detection probability. This is because the parameters of the distribution must now accommodate potential values that are much larger than all of the observed detection times.

So which value is appropriate if you want to use this right-censored data approach? The answer depends on the situation and on whether the censoring time provides meaningful information to the analysis. The critical assumption we are making when employing a right-censored data analysis is that the detections that have not yet occurred will eventually occur. In this case, one could argue that if the detector had not detected by 75 seconds, it may never detect, which would make the conditional probability approach the most appropriate (and therefore the lowest censoring time if a censored analysis is used for computational convenience).

References
Meeker, W. Q., & Escobar, L. A. (1998). Statistical Methods for Reliability Data. John Wiley & Sons.


COLUMN: Standards InSide-Out
by Mark Johnson, PhD
University of Central Florida
Standards Representative for the Statistics Division

TC69 London Recap

The annual ISO TC69 (Applications of Statistical Methods) international meeting was held June 6–10, 2016 in London, England at the British Standards Institution (BSI) headquarters near Hammersmith. For those familiar with the London metro system, BSI is above the Gunnersbury Station on the District (green) Line, making it accessible for those who can mind the gap. The local meeting arrangements were handled expertly by Bernd Borchert of BSI.

There were approximately 60 official delegates representing China, Denmark, Germany, India, Italy, Japan, Malaysia, South Africa, the United Kingdom and the United States. Some delegates participated by WebEx in spite of the time difference, although this arrangement was poorly implemented at times, leaving some folks in the lurch in the middle of the night.

The US was well represented with its delegation of experts. The US holds the Secretariat (formerly held by France) of ISO TC69, so the ASQ staff supporting this effort in attendance were Jennifer Admussen (Secretary to ISO TC69) and Julie Sharp (Secretary to ISO TC69/SC1). The Chairperson of ISO TC69 is Dr. Michèle Boulanger, who is on the faculty of Rollins College, Winter Park, Florida. US TAG Leader Kelly Black of Neptune and Company, Inc. unfortunately had to cancel just before the meeting. Brenda Bishop of Mead Johnson Nutrition ably substituted as the Head of the US Delegation in addition to her regular duties with WG3.

The other US delegates and their primary roles in TC69 are as follows:

Brian Dodson, SKF, SC7
Nancy Grady, SAIC, ad hoc Big Data Committee
Mark Johnson, Univ. Central Florida, SC1 Chair
Tom Kubiak, SC7 and ad hoc Big Data Committee
Glenn Mazur, QFD Institute, SC8
Michael Morton, Altria Client Services, SC6
Daniel Tholen, Dan Tholen Statistical Consulting, SC6
John Vandenbemden, SC5
Nien-fan Zhang, NIST, SC6

ISO TC69 has the following subcommittees and one working group reporting to TC69:

SC1 Terminology and Symbols
SC4 Applications of statistical methods in product and process management
SC5 Acceptance Sampling
SC6 Measurement methods and results
SC7 Applications of statistical and related techniques for the implementation of Six Sigma
SC8 Application of statistical and related methodology for new technology and product development
WG3 Statistical interpretation of data



Since the previous meeting in Dalian, China in 2015, the US TAG was joined by John Vandenbemden, who participated in SC5 work (principally involving ISO 2859 and ISO 39511). The former SC5 Chair David Baillee, OBE, who retired last year from standards work, made a brief appearance (he lives in the London area), and we learned that he is completing work on a Ph.D. dissertation at the ripe old age of 75+ (on acceptance sampling, of course). Although the SC5 gap within the US TAG has been filled thanks to John, we are always interested in attracting additional experts to serve on the various subcommittees. Any US expert interested in statistical standards and capable of attending international meetings would be welcome to apply to join the US TAG (contact Jennifer Admussen at [email protected] if interested).

Here is a brief recap of some of the activities and highlights of the various subcommittees and working groups from the London meeting. Efforts at the meeting typically involved progressing documents through the various stages of standards development (New Work Item, Committee Draft, Draft International Standard, Final Draft International Standard), depending on the working groups and the state of the standards or technical reports.

WG3 is under the auspices of TC69 (this working group was formerly under SC2, which no longer exists, so it was assigned to TC69 many years ago). The convenor of this working group has been Jørgen Grandfeldt from Denmark, who announced at this meeting that he is retiring from standards work. Jørgen has been involved for many years on the multipart standard ISO 16269, "Statistical Interpretation of Data." Interestingly, experts in Denmark who wish to work on statistical standards must pay a large annual fee of over $2,000 US equivalent simply to be a representative, and their travel expenses are picked up by their employer or themselves! In this sense Americans have it rather better than the Danes.

SC1, of which I am Chair, delivered its annual terminology workshop for the benefit of new delegates and as a refresher for the regulars. This subcommittee also reviewed a detailed outline in support of a new work item proposal on terminology in Predictive Analytics for Big Data, and it interacted and consulted with other SCs.

SC4 has a particular interest in statistical process control and control charts. They had a lengthy discussion on some documents in which the ISO Secretariat in Geneva had made changes to a near-final document, following a directive that all plots have simple x and y labels for axes to facilitate translation. Of course, this is not sensible for control charts in general, and especially when they are produced from statistical software, so a resolution was developed to make an exception regarding this generic rule.

SC7 focuses on topics of interest to the Six Sigma community. The latest projects include selected applications of ANOVA and selected applications in distribution selection (particularly, reliability distributions). We were reminded at this meeting that SC7 has existed now for nine years, as its current co-chair Dr. Jing Sun from China is stepping down (there is now a 9-year total term limit for Chairs). Dr. Wenxing Ding of China will become the next Co-chair, serving along with Chris Harris of the UK.

SC-8 continues to develop its multi-part standard on the application of statistical and related methods to new technology and product development. The parts are:

Part 1: General principles and perspectives of Quality Function Deployment (QFD)
Part 2: Acquisition of Voice of Customer and Voice of Stakeholder – Non-quantitative approaches
Part 3: Acquisition of Voice of Customer and Voice of Stakeholder – Quantitative approaches
Part 4: Analysis of non-quantitative and quantitative Voice of Customer and Voice of Stakeholder
Part 5: Solution strategy
Part 6: Optimization—Robust Parameter Design
Part 7: Optimization—Tolerance Design
Part 8: Guidelines for commercialization and life cycle


This subcommittee has both an ambitious and productive work program.

One of the highlights for me at the meeting was the progress made in conjunction with the TC69 ad hoc group on Big Data. An applicability analysis of the TC69 standards for Big Data was conducted, and it was determined that almost a majority of the TC69 statistical standards are relevant to Big Data, even though they were not originally developed with massive data sets in mind. In fact, the entire suite of control chart and acceptance sampling standards apply to observational data and are scalable to arbitrarily large data sets. Future standards to address Big Data issues specifically are anticipated (relative to validation, terminology and the computational ecosystem), and it is hoped to have a designated convenor for the ad hoc group later this year (M. Boulanger is currently interim convenor).

Any US readers interested in contributing to international statistical standards work should contact Jennifer Admussen of ASQ at [email protected]. International experts who are not US citizens should contact their country's official standards organization. By applying in the next few months, delegate status could be achieved in time to allow official participation in next year's meeting, tentatively to be held in Cape Town, South Africa.

FEATURE: Agile Teams: A Look at Agile Project Management Methods
by L. Allison Jones-Farmer, PhD, Department of Information Systems & Analytics, Miami University
and Timothy C. Krehbiel, PhD, Department of Management, Miami University

Agile Project Management

Agile project management methods emerged from the software development community, but they have migrated into many aspects of business and industrial project management. Software development and IT projects are often large and uncertain undertakings where planning is difficult, changes in scope are frequent, and customer requirements are unknown until solutions are released. Unfortunately, the complexities and uncertainties of these projects have led to historically high and costly failure rates (Charette, 2005). Most of us can recall large software releases that have resulted in customer backlash followed by numerous patches and updates. Recent approaches in software development have migrated to more evolutionary development cycles, where new releases incorporate fewer changes and are rapidly deployed, allowing for more flexibility and responsiveness to customer feedback.

In "Why Evolutionary Software Development Works," the author discusses research into successful software development projects and notes that the best outcomes were obtained from an evolutionary development approach (MacCormack, 2001a). When companies first released a low-functioning version to select customers, obtained rapid feedback, and quickly incorporated that feedback into subsequent versions, success was much more likely. MacCormack (2001b) noted four best practices for software development: (1) an early release of the evolving product design; (2) daily incorporation of new code and rapid feedback; (3) a team with broad-based experience; and (4) major investment in the design of the product architecture.

In 2001, seventeen software developers who were passionate about evolutionary approaches to software development joined together and wrote the Agile Manifesto (Beck et al., 2001), in which they described twelve principles of Agile software development. These include prioritizing customer satisfaction, having working software as the primary measure of success, working in cross-functional teams, attention to technical excellence, and periodically reflecting on team performance. Out of this manifesto grew the widespread implementation of Agile project management methods in software development and IT organizations. Many internet companies use Agile project management to accelerate time to market, manage changing priorities, and better align projects with business objectives (VersionOne, 2014).


Scrum Methodology
Agile is an umbrella term that describes a philosophy behind software or product development practice. Thus, if a team is using an Agile approach, it may be using one of a number of methodologies. The most popular is a process developed by Jeff Sutherland and Ken Schwaber known as scrum (Sutherland and Sutherland, 2014). The term "scrum" was borrowed from Takeuchi and Nonaka (1986), who compared high-functioning teams to the scrum formation in rugby. Merriam-Webster's defines a scrum as

a rugby play in which the forwards of each side come together in a tight mass and struggle to gain possession of the ball when it is tossed in among them (Scrum, 1986).

The scrum methodology includes a few guiding principles. These include

• Divide and Conquer. Divide complex entities into simple pieces. This includes dividing large teams into smaller focused teams, as well as large projects into small pieces that can be completed in a short period of time. These short time periods for task completion are a hallmark of the scrum method and are known as sprints.

• Inspect and Adapt. Once the small tasks are completed, rapid feedback is gained. Reviews are conducted to gain insight from stakeholders, and this feedback is quickly incorporated into the deliverable from the sprint. In addition, team members conduct a retrospective to discuss the process of completing the sprint. Simple, brief meetings are held to discover what worked, what did not work, and what plans the team has for improved team functioning.

• Transparency. Everyone involved in a project is aware of who is working on what and their progress. The scrum method uses shared visual tools to track the activities, making it easier to know the status of all parts of the complex project. These tools can be as simple as a whiteboard or wall divided into categories, with post-it notes containing small tasks. Figure 1 gives an example of a scrum board.

There are many more aspects to the scrum methodology that can be found in, e.g., Schwaber (2004) and Sutherland and Sutherland (2014).

Figure 1: An example scrum board


When Agile Works Best
Agile project management is most applicable when a project is unstructured and the outcome is unknown or unknowable. An often-used analogy is an artist developing a painting, such as Da Vinci and the Mona Lisa. Using a traditional project management approach, the artist would be required to conceive of the entire completed work prior to painting. Then he would complete sequential portions of the painting, as illustrated in Figure 2.

For many projects, it is difficult or impossible to know what the outcome will be at the onset of a project. Planning and budgeting are difficult, and the project scope and outcomes evolve as the team members gain more information and knowledge. Continuing with our Mona Lisa example, Figure 3 illustrates an Agile approach to creating the masterpiece.

Figure 2: Traditional Project Management. Reprinted with permission from Patton (2008), Illustration is based on concepts originally presented in Armitage (2004)

Figure 3: Agile Project Management. Reprinted with permission from Patton (2008), Illustration is based on concepts originally presented in Armitage (2004)

Is Agile New?
No. The principles underlying Agile draw heavily from project management methods that have been around for decades, including Lean, Six Sigma, and Total Quality Management.

Agile vs. Lean. The underlying principles of Agile project management seem to draw heavily from the Lean philosophy which focuses on creating better customer value while minimizing waste. Both Lean and Agile emphasize fast deliverables. The emphasis of Lean is on reducing waste and unnecessary steps; however, Agile emphasizes breaking large tasks into small ones and delivering in short sprints. Both Lean and Agile use some sort of an action loop. In Lean, this is the build-measure-learn cycle, while Agile’s scrum methodology uses an iterative sprint approach. In addition, both Lean and Agile were developed in specific functional areas (manufacturing and software development, respectively), but their implementation quickly spread to other functional areas. Interestingly, many practitioners and consultants have substantially blurred the lines between the use of Lean and Agile methods, using an Agile approach for smaller, more focused teams, and using a Lean approach to integrate their projects into enterprise level solutions (Woods, 2012).


Agile vs. Six Sigma. The underlying philosophies behind Agile and Six Sigma differ quite a bit. Six Sigma is a set of data-driven methodologies (statistical and graphical tools as well as project management approaches) designed to reduce variation in a well-defined process. Fundamental to the Six Sigma approach is the DMAIC (Define, Measure, Analyze, Improve, Control) cycle. Like both Lean and Agile, organizational commitment is required for success, particularly from top management and project champions. Six Sigma emphasizes clear measurable objectives and decision making based on verifiable data. Compared to Six Sigma, Agile methodologies appear loose and unstructured. Agile enthusiasts, however, argue that the Agile approach encourages innovation and creativity while strict adherence to structured problem solving approaches such as Six Sigma stifles innovation and creativity. Agile methods are evolutionary and are most applicable for unstructured projects when the outcome is difficult to imagine. Six Sigma, on the other hand, works well in a more structured environment (e.g. manufacturing), when improvement of a specific process is the goal of the project.

Design for Six Sigma (DFSS) is a process methodology that is related to Six Sigma, and it is more readily applied in service industries. DFSS emphasizes using Six Sigma tools for designing products or solutions. The structure behind DFSS project management varies quite a bit across applications and industries. Although both DFSS and Agile are methodologies for designing products and solutions (rather than improving existing processes), the underlying principles differ fundamentally. DFSS emphasizes designing quality into a product and getting it "right the first time." Agile, on the other hand, emphasizes an "inspect and adapt" approach where small releases are delivered, inspected, modified, and rereleased. To many traditionalists, the Agile approach may seem ill-advised when compared to the DFSS "right the first time" methodology. However, we suggest that both methods have their place, and the use of one method in some circumstances does not negate the use of the other in another circumstance. If a product or process can be well-defined, there is a suitable history detailing customer expectations, and the goal and specifications are known in advance, a structured DFSS methodology will work best. If, however, the product or process is difficult to define, there is limited customer history, and specifications are undefined, incorrect, or have changed dramatically, Agile methods will prove more flexible and prevent time wasted building a perfect solution to the wrong problem.

Total Quality Management. The philosophies underlying Total Quality Management (TQM) are generally attributed to Deming’s work, including his 14 Points and System of Profound Knowledge (SoPK) (see, e.g., Deming 1986). Miller and Krehbiel (2016) provide a detailed comparison of Agile methods to Deming’s 14 points and SoPK. We will not replicate this comparison here, but simply restate their main conclusions.

The Agile philosophy underscores several of the 14 Points, but remains silent on others. Through its focus on iterative development and the inspect/adapt approach, the Agile philosophy addresses points #3 (Cease dependence on mass inspection) and #4 (End the practice of awarding business on the basis of price tag). Through its focus on cross-functional teams, iterative development, retrospective evaluation, and transparency, the Agile philosophy addresses points #8 (Drive out fear), #9 (Break down barriers between departments), #12 (Remove barriers that rob people of pride of workmanship), and #13 (Encourage education and self-improvement).

Miller and Krehbiel (2016) noted that in comparison to the TQM philosophy, Agile is weak in a systems focus and understanding of variation in a system. However, they noted that Agile methods were particularly strong in terms of promoting interactions among people and understanding the need for intrinsic motivation.

The Power of the Post-It
Organizations have had differing levels of success in integrating the problem-solving methodologies discussed above. Each of the methods shares several common traits, but they all differ to some degree in their underlying philosophies and how they are implemented. We see tremendous value that can be gained by the use of Agile methods along with existing project management frameworks. Although Agile lacks a systems focus, the Agile principles apply directly to managing smaller projects within enterprise-level initiatives. Analytics and data science projects are often exploratory in nature, require cross-functional teams to work together, and the scope is often developed through team discovery. Thus, we see Agile methods as particularly suited to moving analytics and data science projects forward, preventing backlogs and roadblocks that can occur due to uncertainty and poor communication.


We believe that analytics and data science projects can produce higher-quality deliverables by using scrum methodologies. As noted earlier, three of the guiding principles of scrum are Divide and Conquer, Inspect and Adapt, and Transparency. Dividing complex projects into sprints, i.e., simpler pieces that can be completed in a short period of time by smaller focused teams, produces preliminary results quickly (divide and conquer). Once the smaller pieces are completed and results shared with the larger team and other stakeholders, the rapid feedback on the newly discovered knowledge can help guide the next step of the discovery process (inspect and adapt). At all times, everyone involved in a project needs to be aware of who is working on what and their progress (transparency).

In our opinion, one of the most valuable principles of the scrum methodology is that of transparency. Visual project management tools such as the scrum board (see Figure 1), when used and maintained, provide a central source of communication for a team. In our experience, the use of post-it notes on a scrum board for brainstorming necessary tasks, determining the level of time or expertise involved in completing the tasks, and tracking team progress is a surprisingly powerful tool in breaking down barriers and roadblocks. Those who have facilitated process improvement teams have likely used a similar approach to project management for many years. As with many methodologies and even entire fields, these tried and true methods have been tweaked and rebranded. For better or for worse, the branding of Agile project management is a highly desired skill set in many industries.

Through our work with students on experiential learning projects in analytics and data science, we have seen the "power of the post-it" quickly transform teams from frustration, duplicated efforts, and miscommunication to well-functioning and successful teams. Our Center for Analytics and Data Science at Miami University is embracing many of the Agile philosophies. We teach these along with project framing, statistical, and technical skills related to data. It is an understatement to say that our students are highly sought after and have no difficulties attaining employment upon graduation. Lessons of teamwork, project management, communication skills, and the importance of transparency and reproducibility are invaluable to our future analytics workforce. Although the Agile principles do not provide a comprehensive system-wide framework for quality improvement, they do promote innovation, responsiveness, and transparency in solving unstructured problems. We believe Agile methods can be important tools in the toolbox of many industrial statisticians.

References
Austin-Walker, D., & Kerr, J. (2015). The Agile and Lean Mindset. Digital Transformation Conference, May 21, 2015.
Beck, K., Beedle, M., Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M., Grenning, J., Highsmith, J., Hunt, A., Jeffries, R., Kern, J., Marick, B., Martin, R., Mellor, S., Schwaber, K., Sutherland, J., & Thomas, D. (2001). Manifesto for Agile Software Development. Retrieved July 18, 2016, from http://www.agilemanifesto.org/.
Charette, R. N. (2005). Why Software Fails. Retrieved July 18, 2016, from http://spectrum.ieee.org/computing/software/why-software-fails.
Deming, W. E. (1986). Out of the Crisis. Massachusetts Institute of Technology, Center for Advanced Engineering Study, Cambridge, MA.
MacCormack, A. (2001b). Product-development practices that work: How Internet companies build software. MIT Sloan Management Review, 42(2), 75.
MacCormack, A. (2001a). Why Evolutionary Software Development Works. Retrieved July 18, 2016, from http://hbswk.hbs.edu/item/why-evolutionary-software-development-works.
Miller, D. P., and Krehbiel, T. C. (2016). Was Deming Agile? Looking at Information Technology Frameworks and Practices. Decision Sciences Institute Annual Meeting, Nov. 21–24, 2016.
Schwaber, K. (2004). Agile Project Management with Scrum. Microsoft Press.
Scrum. (2011). In Merriam-Webster's Ninth New Collegiate Dictionary. Springfield, MA.
Sutherland, J., and Sutherland, J. J. (2014). Scrum: The Art of Doing Twice the Work in Half the Time. Crown Business.
Takeuchi, H., & Nonaka, I. (1986). The New New Product Development Game. Harvard Business Review, 64(1), 137–146.
VersionOne (2014). 8th Annual State of Agile Survey. Retrieved July 18, 2016, from https://www.versionone.com/pdf/2013-state-of-agile-survey.pdf.
Woods, D. (2012). Why Lean and Agile Go Together. Retrieved July 18, 2016, from http://www.forbes.com/2010/01/11/software-lean-manufacturing-technology-cio-network-agile.html.


About the Authors
Timothy C. Krehbiel is Professor of Management in the Farmer School of Business at Miami University. He has won numerous teaching awards, including MBA Professor of the Year on three different occasions and the prestigious Instructional Innovation Award from the Decision Sciences Institute. Dr. Krehbiel's current research focus is on quality management systems and methodologies including Lean, Agile, and Six Sigma. His work appears in numerous journals including ASQ's Quality Management Journal and Quality Progress, and he has co-authored three statistics textbooks: Basic Business Statistics, Statistics for Managers Using Microsoft Excel, and Business Statistics: A First Course. Dr. Krehbiel earned his PhD in statistics from the University of Wyoming and is a Senior Member of ASQ.

Dr. L. Allison Jones-Farmer is the Van Andel Chair of Analytics, the founding director of the Miami University Center for Analytics and Data Science, and a Professor in the Department of Information Systems and Analytics at Miami University in Oxford, Ohio. Her research focuses on developing practical methods for analyzing data in industrial and business settings. She is on the editorial review board of the Journal of Quality Technology, a former Associate Editor of Technometrics, and was recently awarded the Lloyd Nelson Award (2014) for the paper in the Journal of Quality Technology with the most immediate impact to practitioners. Dr. Jones-Farmer enjoys developing innovative curricula and teaching analytics and statistics to both undergraduate and graduate students. She is the 2005 recipient of the Colonial Company Teaching Excellence Award, the 2008 recipient of the MBA Teacher of the Year Award, and the 2012 recipient of the Outstanding Teaching Award at Auburn University. Prior to joining Miami University of Ohio, Dr. Jones-Farmer served on the faculty at the University of Miami in Coral Gables, Florida, and at Auburn University in Auburn, Alabama. She is a Senior Member of ASQ.


Upcoming Conference Calendar

25th Annual Quality Audit Conference
20–21 October 2016
Memphis, TN
www.asqauditconference.org

Technological innovation matched with innovative auditing approaches and techniques, significant changes in evolving standards, new foundations and new synergies between disciplines—all present New Horizons for the quality professional of the 21st century. With these new horizons come new challenges and opportunities for auditing. This conference focuses on the auditor's role in identifying quality issues to improve performance, processes and systems, and controlling risk specific to new horizons in auditing.

72nd Annual Deming Conference on Applied Statistics
5–9 December 2016
Atlantic City, NJ
http://www.demingconference.com/

The purpose of the 3-day Deming Conference on Applied Statistics is to provide a learning experience on recent developments in statistical methodologies, stressing biopharmaceutical applications. The conference is followed by two parallel 2-day short courses. The conference is composed of twelve three-hour tutorials on current applied statistical topics.

Lean and Six Sigma Conference
27–28 February 2017
Phoenix, AZ
http://asq.org/conferences/six-sigma/about.html

Do you have technical proficiencies and leadership responsibilities within your organization? Are you actively involved in process improvement, organizational change, and development dynamics related to a successful lean and Six Sigma culture? This conference is for you!

2017 World Conference on Quality and Improvement
1–3 May 2017
Charlotte, NC
www.asq.org/wcqi

ASQ’s World Conference on Quality and Improvement has a 70-year tradition of educating, engaging, connecting, and inspiring leading professionals from around the globe. Each year thousands gather to share best practices, expand their network and further develop their professional growth. The theme was chosen as a way of centering on current and future business leaders and the growth they seek to better influence the work they do, organizations they work for, and lives they lead. The body of tools, techniques, and methods that aid in this is ever growing. The conference sessions will feature thought leaders and knowledge that best demonstrate the successes, tested solutions, and proven results these disciplines can bring.

Joint Statistical Meetings
29 July–3 August 2017
Baltimore, MD
http://www.amstat.org/meetings/jsm.cfm

JSM (the Joint Statistical Meetings) is the largest gathering of statisticians held in North America. The JSM program consists not only of invited, topic-contributed, and contributed technical sessions, but also poster presentations, roundtable discussions, professional development courses and workshops, award ceremonies, and countless other meetings and activities.


Statistics Division Committee Roster 2016

CHAIR: Theresa [email protected]
CHAIR-ELECT: Herb [email protected]
TREASURER: Mindy [email protected]
SECRETARY: Gary Gehring [email protected]
PAST CHAIR: Adam [email protected]

Operations
OPERATIONS CHAIR: Joel [email protected]
MEMBERSHIP CHAIR: Gary Gehring [email protected]
VOICE OF THE CUSTOMER CHAIR: Joel [email protected]
CERTIFICATION CHAIR: Brian [email protected]
STANDARDS CHAIR: Mark [email protected]

Member Development
MEMBER DEVELOPMENT CHAIR: Mindy Hotchkiss [email protected]
OUTREACH/SPEAKER LIST CHAIR: Steve Schuelka [email protected]
EXAMINING CHAIR: Daksha [email protected]

Content
CONTENT CHAIR: Amy Ste. Croix [email protected]
NEWSLETTER EDITOR: Matthew [email protected], +49-152-05421794
WEBINAR COORDINATOR: Ashley [email protected]
SOCIAL MEDIA MANAGER: Brian [email protected]
WEBSITE AND INTERNET LIAISON: Landon [email protected]
STATISTICS BLOG EDITOR: Gordon [email protected]
STATISTICS DIGEST REVIEWER AND MEMBERSHIP COMMUNICATIONS COORDINATOR: Alex [email protected]

Awards
AWARDS CHAIR: Scott [email protected]
OTT SCHOLARSHIP CHAIR: Lynne [email protected]
FTC STUDENT/EARLY CAREER GRANTS: Jennifer [email protected]
HUNTER AWARD CHAIR: Joel [email protected]
NELSON AWARD CHAIR: Open
BISGAARD AWARD CHAIR: Scott [email protected]
YOUDEN AWARD CHAIR: Adam [email protected]

Conferences
WCQI/TCC CONFERENCE: Gordon [email protected]
FTC STEERING COMMITTEE: Peter Parker [email protected]
FTC PROGRAM REPRESENTATIVE: Mindy Hotchkiss [email protected]
FTC SHORT COURSE CHAIR: Yongtao [email protected]

Auditing
AUDIT CHAIR: Steve [email protected]

By-Laws
BY-LAWS CHAIR: Adam [email protected]

Nominating
NOMINATING CHAIR: Adam [email protected]

Planning
PLANNING CHAIR: Theresa [email protected]



The ASQ Statistics Division Newsletter is published three times a year by the Statistics Division of the American Society for Quality.

All communications regarding this publication, EXCLUDING CHANGE OF ADDRESS, should be addressed to:

Matthew Barsalou, Editor
email: [email protected]

Other communications relating to the ASQ Statistics Division should be addressed to:

Theresa I. Utlaut, Division Chair
email: [email protected]
phone: (503) 613-7763

Communications regarding change of address should be sent to ASQ at:

ASQ
P.O. Box 3005
Milwaukee, WI 53201-3005

This will change the address for all publications you receive from ASQ. You can also handle this by phone (414) 272-8575 or (800) 248-1946.

Upcoming Newsletter Deadlines for Submissions

Issue      Vol.   No.   Due Date
January    36     1     December 16

ASQ Statistics Division

VISIT THE STATISTICS DIVISION WEBSITE:
www.asq.org/statistics

ASQ Periodicals with Applied Statistics content

Journal of Quality Technology
http://www.asq.org/pub/jqt/

Quality Engineering
http://www.asq.org/pub/qe/

Six Sigma Forum
http://www.asq.org/pub/sixsigma/

STATISTICS DIVISION RESOURCES:

LinkedIn Statistics Division Group
https://www.linkedin.com/groups/ASQ-Statistics-Division-2115190

Scan this to visit our LinkedIn group!

Connect now by scanning this QR code with a smartphone (requires free QR app)

Check out our YouTube channel at youtube.com/asqstatsdivision

When professional statisticians analyze data, we quickly learn to check both the context and the data themselves for homogeneity. We learn to look for anomalies. Based on these somewhat subliminal clues, experienced analysts will avoid pitfalls that would trap the unwary.

Dr. Donald J. Wheeler in Dirk Dusharme’s “An Interview with Donald J. Wheeler.” Quality Digest. 04 May 2011