Extreme Metrics Analysis for Fun and Profit
Paul Below
February 20, 2003

Agenda
• Statistical Thinking
• Metrics Use: Reporting and Analysis
• Measuring Process Improvement
• Surveys and Sampling
• Organizational Measures
“Experiments should be reproducible. They should all fail in the same way.”
Agenda
Statistical Thinking
Statistical Thinking
• You already use it, at home and at work
• We generalize in everyday thinking
• Often, our generalizations or predictions are wrong
Uses for Statistics
• Summarize our experiences so others can understand
• Use information to make predictions or estimates
• Goal is to do this more precisely than we would in everyday conversation
Listen for Questions
• We are not used to using numbers in our professional lives
– “What does this mean?”
– “What should we do with this?”
• We need to take advantage of our past experience
Statistical Thinking
Statistical thinking is more important than methods or technology. Analysis is iterative, not one-shot.
[Diagram: a learning cycle that alternates induction and deduction between data and model. Modification of the Shewhart/Deming cycle by George Box, 2000 Deming Lecture, “Statistics for Discovery.”]
"It ain't so much the things we don't know that get us in trouble. It's the things we know that ain't so." Artemus Ward, 19th Century American Humorist
Agenda
Metrics Use: Reporting and Analysis
Purpose of Metrics
• The purpose of metrics is to take action. All types of analysis and reporting have the same high-level goal: to provide information to people who will act upon that information and thereby benefit.
• Metrics offer a means to describe an activity in quantitative form, allowing a knowledgeable person to make rational decisions. However,
– Good statistical inference on bad data is no help.
– Bad statistical analysis, even on the right variable, is still bad statistics.
Therefore…
• Metrics use requires implemented processes for:
– metrics collection,
– reporting requirements determination,
– metrics analysis, and
– metrics reporting.
Types of Metrics Use
“You go to your tailor for a suit of clothes and the first thing that he does is make some measurements; you go to your physician because you are ill and the first thing he does is make some measurements. The objects of making measurements in these two cases are different. They typify the two general objects of making measurements. They are: (a) To obtain quantitative information (b) To obtain a causal explanation of observed phenomena.”
Walter Shewhart
The Four Types of Analysis
1. Ad hoc: Answer specific questions, usually in a short time frame. Example: Sales support
2. Reporting: Generate predefined output (graphs, tables) and publish or disseminate to defined audience, either on demand or on regular schedule.
3. Analysis: Use statistics and statistical thinking to investigate questions and reach conclusions. The questions are usually analytical (e.g., “Why?” or “How many will there be?”) in nature.
4. Data Mining: Data mining starts with data definition and cleansing, followed by automated knowledge extraction from historical data. Finally, analysis and expert review of the results are required.
Body of Knowledge (suggestions)
• Reporting
– Database query languages, distributed databases, query tools, graphical techniques, OLAP, Six Sigma Green Belt (or Black Belt), Goal-Question-Metric
• Analysis
– Statistics and statistical thinking, graphical techniques, database query languages, Six Sigma black belt, CSQE, CSQA
• Data Mining
– Data mining, OLAP, data warehousing, statistics
Analysis Decision Tree
[Decision tree leading to the four types:
• One-time question → Ad hoc
• Recurring, enumerative question → Reporting
• Recurring, analytical question, few factors → Analysis
• Recurring, analytical question, many factors → Data Mining and Analysis]
Extreme Analysis
• Short deadlines, small releases
• Overall high-level purposes defined up front, prior to analysis start
• Specific questions prioritized prior to analysis start
• Iterative approach with frequent stakeholder reviews to obtain interim feedback and new direction
• Peer synergy: metrics analysts work in pairs
• Advanced query and analysis tools; saved work can be reused in future engagements
• Data warehousing techniques, combining data from multiple sources where possible
• Data cleansing done prior to analysis start (as much as possible)
• Collective ownership of the results
Extreme Analysis Tips
Produce clean graphs and tables displaying important information. These can be used by various people for multiple purposes. Explanations should be clear, and the organization should make it easy to find information of interest. However,
It takes too long to analyze everything; we cannot expect to produce interpretations for every graph we produce. And even when we do, the results are superficial because we don't have time to dig into everything.
“Special analysis,” where we focus on one topic at a time and study it in depth, is a good idea: we can complete it in a reasonable time, and the result should be something of use to the audience.
Therefore, ongoing feedback from the audience is crucial to obtaining useful results.
Measuring Process Improvement
“Is there any way that the data can show improvement when things aren’t improving?” -- Robert Grady
Agenda

Measuring Process Improvement
• Analysis can determine if a perceived difference could be attributed to random variation
• Inferential techniques are commonly used in other fields; we have used them in software engineering for years
• This is an overview, not a training class
Expand our Set of Techniques
Metrics are used for:
• Benchmarking
• Process improvement
• Prediction and trend analysis
• Business decisions
• …all of which require confidence analysis!
Is This a Meaningful Difference?
[Chart: relative performance (scale 0 to 2.0) by CMM Maturity Level (1 to 3)]
Pressure to Produce Results
“If you torture the data long enough, it will confess.” -- Ronald Coase
• Why doesn’t the data show improvement?
• “Take another sample!”
• Good inference on bad data is no help
Types of Studies
• Anecdote: “I heard it worked once”, cargo cult mentality
• Case Study: some internal validity
• Quasi-Experiment: can demonstrate external validity
• Experiment: can be repeated, need to be carefully designed and controlled
[Continuum: Anecdote → Case Study → Quasi-Experiment → Experiment]

Attributes of Experiments
• Random Assignment
• Blocked and Unblocked
• Single Factor and Multi Factor
• Census or Sample
• Double Blind
• When you really have to prove causation (can be expensive)
[Diagram: Subject → Treatment → Reaction]

Limitations of Retrospective Studies
• No pretest; we use previous data from similar past projects
• No random assignment possible
• No control group
• Cannot custom design metrics (have to use what you have)
Quasi-Experimental Designs
• There are many variations
• Common theme is to increase internal validity through reasonable comparisons between groups
• Useful when formal experiment is not possible
• Can address some limitations of retrospective studies
Causation in Absence of Experiment
• Strength and consistency of the association
• Temporal relationship
• Non-spuriousness
• Theoretical adequacy
What Should We Look For?
Some information to accompany claims:
• measure of variation
• sample size
• confidence intervals
• data collection methods used
• sources
• analysis methods
Are the Conclusions Warranted?
Decision Without Analysis
• Conclusions may be wrong or misleading
• Observed effects tend to be unexplainable
• Statistics allows us to make honest, verifiable conclusions from data
Types of Confidence Analysis
[Diagram: quantitative variables → correlation; categorical variables → two-way tables]
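As a small, hedged sketch of the quantitative branch (the variable names and values below are hypothetical illustrations, not data from this talk), a Pearson correlation with its significance test in Python:

```python
# Sketch: confidence analysis for two quantitative variables using a
# Pearson correlation and its p-value. The data is purely illustrative.
from scipy import stats

size_afp = [120, 340, 560, 800, 1100, 1500]  # hypothetical project sizes (AFP)
effort_pm = [4, 10, 15, 26, 33, 48]          # hypothetical effort (person-months)

r, p = stats.pearsonr(size_afp, effort_pm)
print(f"r = {r:.3f}, p = {p:.4f}")  # small p: correlation unlikely to be chance
```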
Two Techniques We Use Frequently
• Inference for difference between two means (see the sketch below)
– Works for quantitative variables
– Compute confidence interval for the difference between the means
• Inference for two-way tables
– Works for categorical variables
– Compare actual and expected counts
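A minimal sketch of the first technique, assuming hypothetical productivity samples (the Welch unequal-variance interval is one common choice; the talk does not specify a variant):

```python
# Sketch: 95% confidence interval for the difference between two means
# (Welch method). The two groups are hypothetical AFP-per-hour samples.
import numpy as np
from scipy import stats

group_a = np.array([0.31, 0.42, 0.28, 0.50, 0.39, 0.45])
group_b = np.array([0.22, 0.30, 0.26, 0.35, 0.24, 0.29])

va = group_a.var(ddof=1) / len(group_a)
vb = group_b.var(ddof=1) / len(group_b)
diff = group_a.mean() - group_b.mean()
se = np.sqrt(va + vb)  # standard error of the difference
df = (va + vb) ** 2 / (va**2 / (len(group_a) - 1) + vb**2 / (len(group_b) - 1))
t = stats.t.ppf(0.975, df)  # two-sided 95% critical value
print(f"difference = {diff:.3f}, 95% CI: ({diff - t*se:.3f}, {diff + t*se:.3f})")
```

If the resulting interval excludes zero, the difference is significant at the 95% level.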
Quantitative Variables
[Box plots: project productivity (AFP per hour, scale 0.0 to 1.0) by quartile of project size (quartiles 1 to 4); N = 119, 120, 120, 119. Data: ISBSG release 6.]
Comparison of the means of quartiles 2 and 4 yields a p value of 88.2%, not a significant difference at the 95% level.
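A hedged sketch of how such a comparison can be run (the samples below are invented stand-ins, not the ISBSG data):

```python
# Sketch: two-sample t-test between two productivity quartiles.
from scipy import stats

quartile_2 = [0.21, 0.35, 0.28, 0.40, 0.33, 0.25, 0.38, 0.30]  # hypothetical AFP/hour
quartile_4 = [0.24, 0.37, 0.26, 0.43, 0.31, 0.27, 0.36, 0.29]  # hypothetical AFP/hour

t, p = stats.ttest_ind(quartile_2, quartile_4, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")  # near-identical means give a large p: not significant
```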
Categorical Variables

Effort Variance    Low PM    Medium PM    High PM
Met                   3          6           7
Not Met               9         10           9

The p value is approximately 50%.
Categorical Variables

Date Variance    Low PM    Medium PM    High PM
Met                 2         10          13
Not Met            10          6           3

The confidence that this association is real is greater than 99.9%: a significant difference.
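As a sketch, the second technique applied to the Date Variance counts above, using a chi-square test of independence (scipy's chi2_contingency; the talk does not state which test variant produced its figure):

```python
# Sketch: chi-square test of independence on the Date Variance table above.
from scipy.stats import chi2_contingency

observed = [[2, 10, 13],   # Met:     Low PM, Medium PM, High PM
            [10, 6, 3]]    # Not Met: Low PM, Medium PM, High PM

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
# A p value well below 0.05: the association is very unlikely to be chance.
```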
Expressing the Results “in English”
• “We are 95% certain that the difference in average productivity for these two project types is between 11 and 21 FP/PM.”
• “Some project types have a greater likelihood of cancellation than other types; we would be unlikely to see these results by chance.”
What if...
• Current data is insufficient
• An experiment cannot be done
• Direct observation or 100% collection cannot be done
• or lower-level information is needed?
Surveys and Samples
In a scientific survey, every person in the population has some known positive probability of being selected.
Agenda

What is a Survey?
• A way to gather information about a population from a sample of that population
• Varying purposes
• Different ways:
– telephone
– internet
– in person
What is a Sample?
• Representative fraction of the population
• Random selection
• Can reliably project to the larger population
What is a Margin of Error?
• An estimate from a survey is unlikely to exactly equal the quantity of interest
• Sampling error means results differ from a target population due to “luck of the draw”
• Margin of error depends on sample size and sample design
What Makes a Sample Unrepresentative?
• Subjective or arbitrary selection
• Respondents are volunteers
• Questionable intent
How Large Should the Sample Be?
• What do you want to learn?
• How reliable must the result be?
– Size of population is not important
– 1,500 people is reliable enough for the entire U.S. (see the sketch below)
• How large CAN it be?
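A back-of-the-envelope sketch of why roughly 1,500 respondents suffice for a national poll: for a simple random sample, the worst-case 95% margin of error depends on sample size, not population size.

```python
# Sketch: worst-case (p = 0.5) 95% margin of error for a simple random sample.
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for an estimated proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n = 1500: +/- {margin_of_error(1500):.1%}")  # about +/- 2.5%
```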
“Dewey Defeats Truman”
• Prominent example of a poorly conceived survey
• 1948 pre-election poll
• Main flaw: non-representative sample
• 2000 election: methods were not adapted to the new situation
Is a Flawed Sample the Only Type of Problem That Happens?
• Non-response
• Measurement difficulties
• Design problems, leading questions
• Analysis problems
Some Remedies
• Stratify the sample (see the sketch after this list)
• Adjust for incomplete coverage
• Maximize response rate
• Test questions for
– clarity
– objectivity
• Train interviewers
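A minimal sketch of the first remedy, assuming hypothetical strata (the project categories and counts are invented for illustration):

```python
# Sketch: proportional stratified sampling so each subgroup is represented.
import random

population = {                      # hypothetical strata -> member ids
    "new_development": list(range(0, 600)),
    "enhancement":     list(range(600, 900)),
    "maintenance":     list(range(900, 1000)),
}

def stratified_sample(strata, total_n, seed=42):
    """Draw a proportional random sample from each stratum."""
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        n = round(total_n * len(members) / pop_size)  # proportional allocation
        sample.extend(rng.sample(members, n))
    return sample

print(len(stratified_sample(population, 100)))  # 100, split 60/30/10 by stratum
```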
Organizational Measures
“Whether measurement is intended to motivate or to provide information, or both, turns out to be very important.” -- Robert Austin
Agenda

Dysfunctional Measures
• Disconnect between measure and goal
– Can one get worse while the other gets better?
• Is one measure used for two incompatible goals?
• The two general types of measurement are...
Measurement in Organizations
• Motivational Measurements
– intended to affect the people being measured, provoking greater expenditure of effort in pursuit of the organization's goals
• Informational Measurements
– logistical, status, or research information, providing insight for short-term management and long-term improvement
Informational Measurements
• Process Refinement Measurements
– reveal the detailed structure of processes
• Coordination Measurements
– logistical purpose
Mixed Measurements
• “Dashboard” concept is incomplete
• We have Gremlins
The desire to be viewed favorably provides an incentive for people being measured to tailor, supplement, repackage, or censor information that flows upward.
The Right Kind of Culture
• Ask yourself what is driving the people around you to do a good job:
– Do they identify with the organization and fellow team members? (Work hard to avoid letting coworkers down)
– Are they only focused on the next performance review and getting a big raise?
Internal or external motivation?
Why is this important?
• Each of us makes dozens of small decisions each day
– Motivational measures influence us
– These small decisions add up to large impacts
• Are these decisions aligned with the organization’s goals?
Conclusion: It Has Been Done
• There are organizations in which people have given themselves completely to pursuit of organizational goals
• These people want measurements as a tool that helps get the job done
• If this is your organization, fight hard to keep it
A Few Selected Resources:
• Measuring and Managing Performance in Organizations, Robert D. Austin, 1996.
• Schaum’s Outlines: Business Statistics, Leonard J. Kazmier, 1996.
• International Software Benchmarking Standards Group, http://www.isbsg.org.au
• American Statistical Association, http://www.amstat.org/education/Curriculum_Guidelines.html
• Graphical techniques books by E. Tufte
• Contact a statistician for help