FORECASTING ACCURACY AND COGNITIVE BIAS IN THE ANALYSIS OF COMPETING HYPOTHESES
ANDREW D. BRASFIELD
A Thesis
Submitted to the Faculty of Mercyhurst College
In Partial Fulfillment of the Requirements for
The Degree of
MASTER OF SCIENCE
IN
APPLIED INTELLIGENCE
DEPARTMENT OF INTELLIGENCE STUDIES
MERCYHURST COLLEGE
ERIE, PA
MAY 2009
DEPARTMENT OF INTELLIGENCE STUDIES
MERCYHURST COLLEGE
ERIE, PENNSYLVANIA
FORECASTING ACCURACY AND COGNITIVE BIAS IN THE ANALYSIS OF COMPETING HYPOTHESES
A Thesis
Submitted to the Faculty of Mercyhurst College
In Partial Fulfillment of the Requirements for
The Degree of
MASTER OF SCIENCE
IN
APPLIED INTELLIGENCE
Submitted By:
ANDREW D. BRASFIELD
Certificate of Approval:
_______________________________________
Kristan J. Wheaton
Assistant Professor
Department of Intelligence Studies
_______________________________________
James G. Breckenridge
Chair/Assistant Professor
Department of Intelligence Studies
________________________________________
Phillip J. Belfiore
Vice President
Office of Academic Affairs
May 2009
Copyright © 2009 by Andrew D. Brasfield
All rights reserved.
DEDICATION
This work is dedicated to Melody and Dharma
for being patient with my busy schedule during the last two years.
ACKNOWLEDGEMENTS
First, I would like to thank Professor Kris Wheaton for his guidance and advice during
this process over the last year.
I also would like to thank Professor James Breckenridge for taking the role of my
secondary reader.
I also owe thanks to Professor Stephen Marrin for helping me obtain various documents
pertinent to my literature review.
I would also like to thank Kristine Pollard for her technical assistance during this process;
without her, I would not have been able to begin this process last summer.
I would also like to thank Hemangini Deshmukh for assisting in applying statistical
testing to the results of this thesis.
Lastly, I would like to thank Travis Senor for his assistance while conducting the
experiment.
ABSTRACT OF THE THESIS
FORECASTING ACCURACY AND COGNITIVE BIAS IN THE ANALYSIS OF COMPETING HYPOTHESES
By
Andrew D. Brasfield
Master of Science in Applied Intelligence
Mercyhurst College, 2009
Assistant Professor Kristan J. Wheaton, Chair
The Analysis of Competing Hypotheses (ACH) is an analytic methodology used
in the United States Intelligence Community to aid qualitative analysis. Taking into
consideration what previous studies found, an experiment was conducted testing the
methodology’s estimative accuracy as well as its ability to mitigate cognitive phenomena
which hinder the analytical process. The findings of the experiment suggest ACH can
improve estimative accuracy, is highly effective at mitigating some cognitive phenomena
such as confirmation bias, and is almost certain to encourage analysts to use more
information and apply it more appropriately. However, the results suggest that ACH may
be less effective for an analytical problem where the objective probabilities of each
hypothesis are nearly equal. Given these findings, future studies should focus less on the
question of ACH’s general efficacy, but instead should aim to expand our understanding
of when the methodology is most appropriate to use.
TABLE OF CONTENTS
COPYRIGHT PAGE
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
2 LITERATURE REVIEW
Key Terms
The Debate: Structured V. Unstructured Methods
Structured Methods in Intelligence
Analysis of Competing Hypotheses
Strengths & Weaknesses
Previous Studies
Hypotheses
3 METHODOLOGY
Research Design
Participants
Procedures
Control Group
Experimental Group
Data Analysis
4 RESULTS
Accuracy
Mindsets
Confirmation Bias
Other Findings of Interest
Summary of Results
5 CONCLUSION
BIBLIOGRAPHY
APPENDICES
Appendix A: Experiment Sign-Up Forms
Appendix B: Experiment Consent Forms
Appendix C: Control & Experiment Group Tasking/Answer Sheets
Appendix D: Participant Debriefing Statement
Appendix E: Post-Experiment Questionnaires
Appendix F: SPSS Testing
LIST OF TABLES
Table 4.1 Comparative Use of Evidence Between Groups
LIST OF FIGURES
Figure 2.1 Example ACH Matrix
Figure 3.1 Participant Education Level
Figure 3.2 Group Comparison by Class Year
Figure 3.3 Participant Political Affiliation by Group
Figure 3.4 National Intelligence Council Words of Estimative Probability
Figure 3.5 Experiment Words of Estimative Probability
Figure 3.6 Continuum-like Scale
Figure 4.1 Results for Accuracy
Figure 4.2 Results for Mindsets
Figure 4.3 Findings on Confirmation Bias
Figure 4.4 SPSS Testing on Confirmation Bias
Figure 4.5 Words of Estimative Probability by Group
Figure 5.1 Graph of ACH’s Utility with Varying Objective Probabilities
INTRODUCTION
In light of recent intelligence failures, such as the flawed assessment of Iraq’s alleged
possession of weapons of mass destruction (WMD), it is clear that the United States
Intelligence Community could improve the process it uses to reach analytic judgments. Traditionally,
such judgments are reached through intuitive thinking. However, one of the
recommendations of the Commission on the Intelligence Capabilities of the United States
Regarding Weapons of Mass Destruction was that “the [intelligence] community must
develop and integrate into regular use new tools to assist analysts in filtering and
correlating the vast quantities of information that threaten to overwhelm the analytic
process.”1 This statement represents the growing belief that structured methods can help
the United States Intelligence Community’s analytic capabilities reach the quality and
accuracy required by US policy makers.
One structured analytic method, the Analysis of Competing Hypotheses (ACH),
can potentially assist in the improvement of analysis in the US Intelligence Community.
In this structured technique, the scientific method is incorporated into the analytic process
by weighing multiple hypotheses in a matrix, evaluating all evidence for and against
each, and determining the likelihood of all possibilities by trying to disprove hypotheses.2
Researchers have found that this methodology can help “analysts overcome cognitive
1 United States Government - Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction. Report to the President of the United States, (Washington D.C., 2005), 402. <http://www.wmd.gov/report/> (Accessed 22 January 2009)
2 Richards J. Heuer, Jr., “Limits of Intelligence Analysis.” Orbis (Winter 2005): 92.
biases, limitations, mindsets, and perceptions...”3 In general, structured methods such as
ACH can offer a variety of potential benefits to intelligence analysis.
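The matrix-based weighing of hypotheses described above can be illustrated with a minimal sketch. It assumes a simplified scoring rule in which each piece of evidence is rated consistent ("C"), inconsistent ("I"), or neutral ("N") against each hypothesis, and hypotheses are ranked by how little evidence contradicts them. The hypotheses, evidence labels, ratings, and function names are hypothetical and chosen only for illustration; they are not taken from this study's experiment.

```python
# Hypothetical ACH-style matrix: rows are evidence items, columns are hypotheses.
# Ratings: "C" = consistent, "I" = inconsistent, "N" = neutral.
# All labels and ratings below are illustrative assumptions.
HYPOTHESES = ["H1", "H2", "H3"]
MATRIX = {
    "E1": {"H1": "C", "H2": "I", "H3": "C"},
    "E2": {"H1": "C", "H2": "I", "H3": "N"},
    "E3": {"H1": "I", "H2": "C", "H3": "C"},
    "E4": {"H1": "C", "H2": "I", "H3": "C"},
}


def inconsistency_counts(matrix, hypotheses):
    """Count, for each hypothesis, the evidence items rated inconsistent with it."""
    counts = {h: 0 for h in hypotheses}
    for ratings in matrix.values():
        for h in hypotheses:
            if ratings[h] == "I":
                counts[h] += 1
    return counts


def rank_hypotheses(matrix, hypotheses):
    """Order hypotheses from least to most contradicted by the evidence."""
    counts = inconsistency_counts(matrix, hypotheses)
    return sorted(hypotheses, key=lambda h: counts[h])


counts = inconsistency_counts(MATRIX, HYPOTHESES)
ranking = rank_hypotheses(MATRIX, HYPOTHESES)
print(counts)
print(ranking)
```

The sketch ranks by inconsistent evidence rather than by supporting evidence because the method, as described, works by trying to disprove hypotheses; evidence that is consistent with several hypotheses at once does little to separate them.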
The primary benefit is the added element of the scientific method. This, in theory,
improves the quality and accuracy of analysis by imposing structure onto our limited, and
often flawed, cognitive processes. A secondary potential benefit to the intelligence
community is increased transparency and accountability. That is, structured methods
make the analytic process and end product easier to critique and evaluate. This is
important for both analysts and their supervisors so that mistakes and successes can more
easily be identified and understood for the improvement of future efforts. Likewise, in the
aftermath of intelligence analysis failures and successes, accountability is more certain.
Despite these potential benefits, there are some obstacles to the use of structured
methods in the US Intelligence Community. First, although there are over 200 analytic
methods available to intelligence analysts, exposure to these methods has been minimal.4
Because of this, it is likely most analysts in the US Intelligence Community are unaware
of the existence of methods that could aid their work, let alone have received training that
would enable them to use such methods.
The most difficult hurdle is an analytic culture predisposed to intuitive thinking
and skeptical of, if not hostile to, the notion of structured methods. One researcher notes
that this attitude is partly justified by the lack of empirical evidence suggesting
structured methods can improve intelligence analysis.5
3 Diane Chido and Richard M. Seward, Jr., eds. Structured Analysis of Competing Hypotheses: Theory and Application (Mercyhurst College Institute of Intelligence Studies Press, 2006), 48.
4 Rob Johnston, “Integrating Methodologists into Teams of Substantive Experts,” Studies in Intelligence, Vol. 47, No. 1: 65.
5 Stephen Marrin, “Intelligence Analysis: Structured Methods or Intuition?” American Intelligence Journal 25, no. 1 (Summer 2007): 10.
In his ethnographic study of the US Intelligence Community’s analytic culture, Dr. Rob
Johnston concludes that empirical evidence is exactly what is needed:
The principal difficulty lies not in developing the methods themselves, but in articulating those methods for the purpose of testing and validating them and then testing their effectiveness throughout the community. In the long view, developing the science of intelligence analysis is easy; what is difficult is changing the perception of the analytic practitioners and managers and, in turn, modifying the culture of tradecraft.6
Hopefully, the quantitative data derived from this experimental study will offer insights
into the utility of structured methods and ACH specifically and challenge commonly held
assumptions within the US Intelligence Community.
Taking into account that previous studies on ACH have yielded mixed and
inconclusive results, the purpose of this study is to add to the small number of such
studies and shed further light on ACH’s utility and efficacy with intelligence analysis
problems in varying circumstances. Specifically, the primary goal of this study is to
evaluate the estimative accuracy of the methodology compared to intuitive analysis. A
secondary purpose, if possible, is to ascertain whether ACH can mitigate cognitive
phenomena that hinder our ability to think clearly and accurately. From the quantitative
data I collect, I hope to gain insight regarding the methodology’s usefulness for analysts
in the US Intelligence Community.
Unfortunately, there are some limitations to this study. These limitations pertain
to the number of relevant research questions that can be addressed, as well as
experimental conditions that are not ideal but impossible to avoid with the given
resources. While ACH offers numerous potential benefits to analysis, such as those
6 Rob Johnston, Analytic Culture in the US Intelligence Community (Washington D.C.: Center for the Study of Intelligence, 2005), 20-21.
related to hypothesis generation and its use in a team environment, the primary goals of
this experiment are to test the methodology’s accuracy and its ability to mitigate
cognitive biases. Designing experiment conditions to maximize the capacity to measure
these particular factors of interest at the expense of secondary research questions is a
necessary sacrifice.
Another limitation is available resources. The ideal participants for an
experimental study such as this one would be US Intelligence Community analysts who
are specifically experienced with ACH. Participants with these qualifications would
likely provide higher quality and more valid results. Although all participants using ACH
will have had some experience with the methodology, this study did not have access to a
participant pool with the ideal qualifications.
This study is organized as follows. First, the researcher will review
the existing body of literature pertinent to the topic, including important terms of
reference, the debate on the use of structured methods, as well as current and past use of
such methods in the US Intelligence Community. Next, the researcher will explain the
methodology for the experiment and the subsequent results. Finally, the researcher will
offer his final interpretation of the experiment results and postulate their implications for
the use of structured methods in the US Intelligence Community.
LITERATURE REVIEW
To fully understand the purpose and place of this study and its experiment, it is
necessary to review important concepts and debates relevant to the use of structured
analytical techniques in the US Intelligence Community. First, this chapter will define
and discuss key terms such as intelligence, structured methods, and intuition. Next, this
chapter will attempt to summarize the debate on the use of structured and unstructured
analytical methods from a variety of perspectives. These will include views from
cognitive psychology, experts from within the US Intelligence Community, and empirical
studies on the topic. Furthermore, a general description of the use of structured methods
in the US Intelligence Community will follow. This will include subsections on current
use, explanations for the non-use of structured methods, and finally an in-depth discourse
on ACH itself. This study’s hypotheses will emerge from the intersection of all these
elements.
Key Terms
While the definition of intelligence has been debated for some time, several key
characteristics are clear. Mark Lowenthal, in his book, Intelligence: From Secrets to
Policy, partly describes intelligence as a process where relevant information is
“requested, collected, analyzed, and provided to policy makers…”7 While this common
definition is accurate, it is missing a very important element that is integral to the purpose
of intelligence analysis. Robert M. Clark points this out in Intelligence Analysis: A
Target-Centric Approach by simply stating, “Intelligence is about reducing uncertainty in
7 Mark M. Lowenthal, Intelligence: From Secrets to Policy (Washington D.C.: CQ Press, 2006), 9.
conflict.”8 Therefore, the ultimate purpose of intelligence analysis is estimating the nature
of current and future events. That is, using information to clarify the likelihood or nature
of these events for a policy maker.
Building on these concepts, the Mercyhurst College Institute for Intelligence
Studies (MCIIS) incorporates them into a comprehensive, accurate definition:
“[intelligence is] a process
focused externally, designed to reduce the level of uncertainty for a decision maker using
information derived from all sources.”9 While the debate continues and this definition is
not definitive, it will suffice in laying the intellectual groundwork for this research.
According to Robert D. Folker, “Quantitative intelligence analysis separates the
relevant variables of a problem for credible numerical measurement. Qualitative
intelligence analysis breaks down topics and ideas that are difficult to quantify into
smaller components for better understanding.”10 Within the US Intelligence Community,
quantitative and qualitative intelligence analysis is most commonly conducted with
unstructured methods.
One former CIA analyst, Stephen Marrin, defines structured analytic methods as
“those techniques which have a formal or structured methodology that is visible to
external observers.”11 From this, it is apparent that the key features of a structured
analytic method are that it is systematic in nature and is externalized from the human
8 Robert M. Clark, Intelligence Analysis: A Target-Centric Approach (Washington D.C.: CQ Press, 2007), 8.
9 Diane Chido, et al., 9.
10 Robert D. Folker Jr. Intelligence Analysis in Theater Joint Intelligence Centers: An Experiment in Applying Structured Methods (Washington D.C.: Joint Military Intelligence College, Occasional Paper #7, 2000), 5; citing Robert M. Clark, Intelligence Analysis: Estimation and Prediction (Baltimore: American Literary Press, Inc., 1996), 30.
11 Marrin, 7.
mind - typically in some visual format. This suggests that inherent in any systematic
method of analysis is the spirit of the scientific method, defined as “principles and
procedures for the systematic pursuit of knowledge involving the recognition and
formulation of a problem, the collection of data through observation and experiment, and
the formulation and testing of hypotheses.”12 In contrast, unstructured methods, which
lack such elements, are commonly referred to in intelligence as “intuitive analysis.”
Developing our understanding of these concepts is important because analysis is a
critical component of intelligence. Although much reform within the US national security
and intelligence infrastructure has focused on collection and dissemination of
intelligence, Folker states that “the root cause of many critical intelligence failures has
been analytical failure,” citing examples such as the North Korean invasion of South
Korea in 1950, the Tet Offensive in Vietnam, the fall of the Shah of Iran, and the
development of India’s nuclear program.13
However, the need to improve the analytic process is not unknown within the US
Government. As early as the 1940s and through the Cold War, numerous government
reports on intelligence, such as the Dulles-Jackson-Correa and Schlesinger reports,
recommended that government entities with an intelligence function take measures to
improve the analytic process and production of estimates.14 More recently, the US
Commission on the Roles and Capabilities of the United States Intelligence Community
specifically criticized the lack of resources allocated to “developing and maintaining
12 Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “scientific method.”
13 Folker, 3-4.
14 Congressional Research Service Report for Congress, Proposals for Intelligence Reorganization, 1949-2004. 2004, 6; United States Government, A Review of the Intelligence Community, (The Schlesinger Report) (1971), 44.
expertise among the analytical pool.”15 Amidst these recommendations, there is much
debate within the US Intelligence Community on how to improve analysis and whether or
not structured methods should be a part of that solution.
The Debate: Structured V. Unstructured Methods
There has been a longstanding debate inside and outside of the US Intelligence
Community over the use of structured and unstructured methods for analysis and decision
making. On one side are those who believe intuitive thinking is sufficient for problem
solving and that scientific methods are inadequate when addressing the same problems.
On the other side of the debate are those who argue that structured and scientific methods
can supplement intuitive thinking and improve its quality. This debate begins with
cognitive psychology and understanding how the simplest and most basic human thought
processes affect efforts at critical thinking.
The research of various psychologists suggests that limitations in human
cognition are inherent and can be detrimental to critical thinking. Specifically, the
research of Daniel Kahneman and Amos Tversky suggests that intuitive thinking can be
thought of as the mind’s shortcut mechanism to aid quick decision making. That is,
intuition takes in large amounts of ambiguous and sometimes contradictory information in
quick succession and assimilates it into a succinct explanation of what is being
perceived. Despite its utility in situations requiring this ability, such as deciding whether
to run from a perceived threat or stand and fight, these simplified and more efficient
15 United States Government - U.S. Commission on the Roles and Capabilities of the United States Intelligence Community, Preparing for the 21st Century: An Appraisal of U.S. Intelligence (Washington, D.C., 1996), 83.
cognitive processes are also inherently subject to a higher number of judgmental errors.16
These judgmental errors are believed to be caused by cognitive biases, defined as “mental
errors caused by our simplified information processing strategies.”17 In Intuition: Its
Powers and Perils, David G. Myers elaborates on these specific advantages and
judgmental errors which result from intuitive thinking. The simple advantage offered by
intuition is the ability to quickly and efficiently process large quantities of information.18
In Blink: The Power of Thinking Without Thinking, Malcolm Gladwell argues for
our ability to use this capacity, which he calls “thin-slicing.”19 Gladwell not only advocates the use
of intuitive thinking, but also argues that it can be just as effective as, if not superior to,
scientific methods of analysis. To support his assertions, Gladwell provides handfuls of
real-life examples that seemingly demonstrate the efficacy of intuition, as well as the
findings of some scientific studies. However, his own discussion of intuition’s
vulnerability to cognitive biases undermines his argument.
While speed and efficiency are two advantages of intuitive thinking, inherent
limitations in human cognition are its Achilles’ heel. Summing up the research of Herbert
Simon, Richards J. Heuer, Jr. explains the use of mindsets in human cognition:
Because of limits in human mental capacity, he argued the mind cannot cope directly with the complexity of the world. Rather, we construct a simplified mental model of reality and then work with this model. We behave rationally within the confines of our mental model, but this model is not always well adapted to the requirements of the real world.20
16 Amos Tversky and Daniel Kahneman, “Judgment Under Uncertainty: Heuristics and Biases,” Science 185, no. 4157, pp. 1124-1131 (1974), JSTOR (accessed March 15, 2009), 1124.
17 Richards J. Heuer, Jr., Psychology of Intelligence Analysis (Washington D.C.: CIA Center for the Study of Intelligence, 1999), 111.
18 David G. Myers, Intuition: Its Powers and Perils (New Haven: Yale University Press, 2002), 3-5.
19 Malcolm Gladwell, Blink: The Power of Thinking Without Thinking (New York: Back Bay Books/Little, Brown and Company, 2007), 23.
20 Heuer, “Limits,” 78; citing Herbert Simon, Models of Man (New York: John Wiley & Sons, 1957).
According to Heuer, these mindsets, which Webster’s defines as “a mental attitude or
inclination,” and as “a fixed state of mind,” serve a good purpose for the most part.21
When information is incomplete, ambiguous, or contradictory, mindsets help assimilate
new information quickly and efficiently by using an existing mental framework based on
previous experience, education, and preconceptions to interpret that information.
However, these rigid mindsets sometimes betray our judgment because they do not adapt
well when new information challenges strongly held beliefs and preconceptions.22 One
former CIA analyst, Stanley Feder, specifically identifies mindsets as being “a major
cause of intelligence and policy failures for decades.”23
Myers’ Intuition further discusses two other biases particularly relevant to intelligence
analysis: overconfidence and confirmation bias. While overconfidence is self-
explanatory, confirmation bias is defined as the tendency “for people to seek information
and cues that confirm the tentatively held hypothesis or belief, and not seek (or discount),
those that support an opposite conclusion or belief.”24 A relevant example of this was the
tendency of some in the US Intelligence Community leading up to the invasion of Iraq in
2003 to seek evidence confirming the established belief that Saddam Hussein had
weapons of mass destruction while discounting or neglecting dissonant evidence.25
21 Heuer, “Limits,” 86; Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “mindset.”
22 Heuer, “Limits,” 76, 81, 83, 86.
23 Stanley A. Feder. “Forecasting for Policy Making in the Post-Cold War Period,” Annual Review of Political Science Vol. 5. (2002): 113.
24 Christopher D. Wickens and Justin G. Hollands, Engineering Psychology and Human Performance, 3rd ed. (Upper Saddle River, NJ: Prentice Hall, 2000), 312.
25 United States Government. Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction. Report to the President of the United States, 31 March 2005, (Washington D.C.), p 162. <http://www.wmd.gov/report/>
There is a plethora of other cognitive biases that also plague intuitive thinking in
intelligence. These biases can manifest themselves in research strategy, perception, and
memory. One of the major criticisms of intuitive thinking is that it has the tendency to
identify the first plausible or reasonable hypothesis and seek evidence that supports this
hypothesis, known as “satisficing.”26 The problem with this method is that often the same
evidence is also consistent with any number of alternative hypotheses. Given this, an
analyst risks fooling himself into thinking he has identified the most likely hypothesis
while remaining unaware that he is overlooking other valid, and possibly more likely, alternatives. Also
among these is vividness bias, which is the tendency for vivid evidence to have greater
influence on our thinking than less vivid evidence, regardless of its true value.27 Another
common cognitive bias found in intuitive thinking is availability bias, which is the
tendency for people to estimate the likelihood of an event largely based on how many
relevant past instances they can recall and how easily they come to mind.28 These are
only a few among many cognitive biases that can hinder human cognition.
Acknowledging its weaknesses, Gladwell states that intuition’s effectiveness is
dependent on the absence of these biases.29 This opens an important question regarding
the utility of intuition in intelligence analysis. That is, if the efficacy of intuitive thinking
is dependent on the absence of such biases, then how prominent are these in human
cognition? Specifically, if these biases are prominent and difficult to willfully bypass, this
would suggest that intuition alone is ineffective when dealing with high-risk analytic
decision making. This is where Gladwell’s argument unravels because these biases are
26 Heuer, “Psychology,” 44.
27 Ibid, 116.
28 Amos Tversky and Daniel Kahneman, “Availability: A Heuristic for Judging Frequency and Probability,” Cognitive Psychology, 5 (1973): 207-232.
29 Gladwell, 72-76.
pervasive and difficult to avoid in intuitive thinking. Heuer likens these biases to “optical
illusions in that the error remains compelling even when one is fully
aware of its nature. Awareness of the bias, by itself, does not produce
a more accurate perception.”30
Michael LeGault contributes to the list of flaws in Gladwell’s argument for
intuition with his book, Think: Why Crucial Decisions Can’t Be Made in the Blink of an
Eye, pointing out that many of the examples Gladwell gives are misleading or out of context.
Among these is the case of a museum which purchased what was assumed to be an
authentic Greek statue for its collection. From the start, various experts felt something
was wrong with the statue and these intuitive impressions subsequently led to the
discovery that it was a forgery. LeGault correctly points out that these initial impressions
were not really the work of pure intuition, but resulted from observers’ expertise and
scientific inquiry, albeit at the unconscious level at first.31
Although intuitive thinking is the predominant style of analysis in the United
States national security and intelligence infrastructure,32 the use of structured methods has
been “debated in analytic circles for decades.”33 According to Folker, “At the heart of this
controversy is the question of whether intelligence analysis should be accepted as an art
(depending largely on subjective, intuitive judgment) or a science (depending largely on
structured, systematic analytic methods).”34
30 Heuer, “Psychology,” 112.
31 Michael R. LeGault, Think: Why Crucial Decisions Can’t Be Made in the Blink of an Eye (New York: Threshold Editions, 2006), 8-10.
32 Marrin, 9.
33 Ibid, 8.
34 Folker, 6.
Of these two ideological camps, advocates of intelligence analysis as an art
believe that many factors in a given analytic problem are too complex and abstract to be
incorporated into methods that are rigid and scientific.35 Hence, as Folker sums up, this side
argues that the most effective qualitative analysis “is an intuitive process based on
instinct, education, and experience.”36 Even those who acknowledge structured methods
can improve analysis contend such improvements would be so minute that resources
would be better allocated to improving some other aspect of intelligence.37
Advocates of intelligence analysis as a science argue that structured methodology
improves analysts’ ability to evaluate evidence and form conclusions.38 Additionally,
Folker states, “there is also a concern that the artist [analyst] will fall in love with his art
and be reluctant to change it even in the face of new evidence. The more scientific and
objective approach encourages the analyst to be an honest broker and not an advocate.”39
These proponents argue that while subject-matter expertise has its utility, this also
predisposes an analyst to be stuck within the confines of their own subject-area’s
heuristics, which can manifest themselves as cognitive biases.40 Heuer further makes the
case for the use of structured methods when he points out that “the circumstances
under which accurate perception is most difficult are exactly the circumstances under
which intelligence analysis is generally conducted—making judgments about evolving
35 Folker, 6-7, citing Richard K. Betts, “Surprise, Scholasticism, and Strategy: A Review of Ariel Levite’s Intelligence and Strategic Surprises (New York: Columbia University Press, 1987),” International Studies Quarterly 33, no. 3 (September 1989): 338.
36 Folker, 7; citing Tom Czerwinski, ed. Coping with the Bounds: Speculations in Nonlinearity in Military Affairs (Washington: National Defense University, 1998), 139.
37 Folker, 9.
38 Ibid, 10.
39 Ibid, 10.
40 Johnston, “Integrating Methodologists Into Teams of Substantive Experts,” 65.
situations on the basis of incomplete, ambiguous, and often conflicting information that is
processed incrementally under pressure for early judgment.”41
Drawing on his experience in the US Intelligence Community, Feder offers
empirical insight that supports the utility of structured methods in some circumstances.
While serving as a political analyst at the CIA, Feder used one particular structured
quantitative method to forecast more than 1200 international events.42 During this time,
he found that the structured method, when “compared with conventional intelligence
analyses…had more precise forecasts without sacrificing accuracy.”43 Feder also claims
that another specific structured method used at the CIA “helped avoid analytic traps and
improved the quality of analyses by making it possible to forecast specific outcomes and
the political dynamics leading to them.”44 Also, while this method did not increase
forecasting accuracy over intuitive analysis, it did provide more nuanced results.45
The research and experimentation of Philip Tetlock suggests that in general,
intuition is lacking as an analytic method. However, cognitive styles similar to structured
methods of thinking were found to be correlated to better judgment. In his book, Expert
Political Judgment, the author aims to define indicators of good judgment, concluding,
“What experts think matters far less than how they think.”46 Tetlock uses a concept first
illustrated by Isaiah Berlin in “The Hedgehog and the Fox” from The Proper Study of
Mankind:
41 Heuer, “Limits,” 78-79.
42 Feder, “Forecasting,” 118-119.
43 Ibid, 119.
44 Stanley A. Feder, “FACTIONS and Policons: New Ways to Analyze Politics.” Inside the CIA’s Private World: Declassified Articles from the Agency’s Internal Journal, 1955-1992, ed. H. Bradford Westerfield (New Haven: Yale University Press, 1995), 275.
45 Feder, “FACTIONS,” 275.
46 Philip E. Tetlock, Expert Political Judgment (Princeton: Princeton University Press, 2005), 2.
If we want realistic odds on what will happen next…we are better off turning to experts who embody the intellectual traits of Isaiah Berlin’s prototypical fox – those who ‘know many little things,’ draw from an eclectic array of traditions, and accept ambiguity and contradiction as inevitable features of life – than we are turning to Berlin’s hedgehogs – those who ‘know one big thing,’ toil devotedly within one tradition, and reach for formulaic solutions to ill-defined problems.47
In his research, Tetlock analyzed and compared the forecasts of human participants and
“mindless” statistical strategies.48 Among the human participants were subject-matter
experts and amateurs, all of whom used intuitive thinking.49 These groups made predictions
on the short and long-term futures of economic, political, and national security policies of
numerous countries.50 Examining the quantitative results, Tetlock discovered that human
participants, even when advantaged with subject-matter expertise, always performed
worse than various statistical strategies of assigning likelihoods. However, Tetlock
noticed a level of consistency in some forecasters that clearly was not the result of
chance.51
To explain this, he searched the results for correlations between good judgment and
participants’ backgrounds, belief systems, and cognitive style - how they think. The data
showed that level of education and professional experience had no correlation to better
judgment.52 To measure cognitive style, all participants answered a questionnaire, from
which Tetlock discovered a significant correlation between participants’ cognitive styles
and their forecasting accuracy. The questionnaire revealed two dominant cognitive styles:
47 Tetlock, 2.
48 Ibid, 49-51.
49 Ibid, 54.
50 Ibid, 49.
51 Ibid, 7.
52 Ibid, 68.
Berlin’s fox and hedgehog.53 Statistical analysis revealed that having a fox-type
personality correlated to higher accuracy in forecasting.54
When the participants first created their forecasts, they included commentaries
explaining their thought process.55 From this information, Tetlock made numerous
generalizations about why foxes were able to forecast more accurately. Among these
are that foxes are more reluctant to view problems through an established, rigid
framework; more cautious about explaining current and future events through overly simplistic
historical analogies; less inclined to make overly confident forecasts supported by
looping evidence; more emotionally neutral; and more likely to integrate
dissonant viewpoints into their analyses.56 Interestingly, these traits are also common
benefits derived from structured analytical techniques. Tetlock’s research demonstrates
that intuitive thinking, whether used by a subject-matter expert or an amateur, is less
effective than cognitive styles that bear resemblance to structured methods because these
are less susceptible to the errors of cognitive bias.57
Proponents of intuitive analysis make valid points about the power of intuition
and the inherent limitations of structured methods in intelligence. That is, intuitive
thinking is naturally the basis of all analysis. Also, information used in intelligence
analysis problems will sometimes not fit easily into the rigid framework of a structured
method. On the other hand, proponents of structured methods make valid points about the
potential benefits of using such methods to aid intuitive analysis. That is, structured
53 Tetlock, 72-75.
54 Ibid, 78-80.
55 Ibid, 88.
56 Ibid, 88-92, 100-107.
57 Ibid, 117-118.
thinking can improve both accuracy and nuance by mitigating the effects of cognitive
bias and other judgmental errors. The research and experimentation of Tetlock and others
supports this assertion.
Folker is probably correct when concluding that intelligence analysis is not
exclusively one or the other, but instead “a combination of both intuition and scientific
methods.”58 Both styles of thinking have their strengths and weaknesses, and nothing
suggests they could not supplement each other. While this question still deserves future
research and debate, the “either/or proposition”59 may not be the most progressive
question to ask. Instead, the more appropriate question might be when are structured
methods appropriate? Hopefully, experiments such as this one will advance our
understanding of the utility of structured methods with various analytic problems.
Structured Methods in Intelligence
According to Dr. Rob Johnston from the CIA Center for the Study of Intelligence,
intelligence analysts currently have access to over 200 structured analytic methods.60
Despite this, intuition appears to be the predominant style of analysis within the IC and
most experts agree that structured methods are generally unused. Specifically, one expert,
Stephen Marrin, suggests the use of structured methods is mostly limited to analysts who
are required to use a very specific methodology for a very specific purpose, such as
social network analysis for terrorism or counter-narcotics.61 Folker’s survey of 40
intelligence analysts from across the US Intelligence Community supported these
58 Folker, 13.
59 Ibid, 13.
60 Johnston, “Integrating Methodologists Into Teams of Substantive Experts,” 65.
61 Marrin, 9.
assertions, revealing only one analyst who claimed to routinely use a structured analytic
method.62
There are several reasons why structured methods are not widely used in the US
Intelligence Community. The primary reason for the non-use of structured methods is an
analytic culture predisposed to intuitive thinking. Specifically, Feder states that this
culture views analysts primarily as writers and summarizers of information, rather than
“methodologists” who tinker with scientific tools.63 Whether or not organizational culture
is a key factor, Folker states that in general, “most people instinctively prefer intuitive,
non-structured approaches over structured methodologies.”64 Folker further explains:
Structured thinking is radically at variance with the way in which the human mind is in the habit of working. Most people are used to solving problems intuitively by trial and error. Breaking this habit and establishing a new habit of thinking is an extremely difficult task and probably the primary reason why attempts to reform intelligence analysis have failed in the past, and why intelligence budgets for analytical methodology have remained extremely small when compared to other intelligence functions.65
Furthermore, according to Heuer, given the purpose and nature of their work, intelligence
analysts, “[tend] to be skeptical of any form of simplification such as is inherent in the
application of probabilistic models.”66 While attempting to introduce new structured
methods to political analysts at the CIA in the 1970s, Heuer recalls that responses to the
notion of structured methods “typically ranged from skepticism to hostility.”67 The
underpinning of this skepticism, as discussed earlier, is the belief that structured methods
62 Folker, 11.
63 Feder, “Forecasting,” 119.
64 Folker, 2.
65 Folker, 14; partly citing Morgan D. Jones, The Thinker’s Toolkit, 8.
66 Heuer, Adapting Academic Methods and Models to Government Needs: The CIA Experience (Carlisle Barracks: Strategic Studies Institute, 1978), 7.
67 Ibid, 5.
cannot effectively be applied to qualitative problems. Likely augmenting this skepticism
is the lack of empirical data demonstrating structured methods’ efficacy. While
proponents have argued the case for structured methods, few experiments have been
conducted which demonstrate their efficacy.68
Inadequate education regarding the use of structured methods is also to blame for
their non-use. Unlike many professions that have established cadres of specialists in
methodology, this is not the case with the US Intelligence Community. That is, exposure
to structured methods is typically dependent on self-education by individual analysts who
are heavily preoccupied with their own area of expertise.69 This work environment,
understandably, does not encourage busy analysts to spend time experimenting with new
analytical techniques. This is even more the case with more complex methods, such as
Bayesian analysis.70
Analysis of Competing Hypotheses
Analysis of Competing Hypotheses (ACH) is one methodology that arguably can
improve intelligence analysis. According to the creator of the method, Richards J. Heuer,
Jr., ACH “requires an analyst to explicitly identify all the reasonable alternatives and
have them compete against each other for the analyst’s favor, rather than evaluating their
plausibility one at a time.”71 Heuer’s ACH is an eight-step process, each step with a specific
purpose in avoiding the flaws of unstructured thinking:72
68 Marrin, 10.
69 Johnston, “Integrating Methodologists Into Teams of Substantive Experts,” 64-65.
70 Folker, 8; citing Captain David Lawrence Graves, ISAF, Bayesian Analysis Methods for Threat Prediction, MSSI Thesis (Washington: Defense Intelligence College, July 1993), second page of Abstract.
71 Heuer, “Psychology,” 95.
72 These are taken directly from Heuer’s eight-step ACH process as cited. Heuer, 97. A more detailed discussion of these eight steps can be found in Chapter Eight of “Psychology.”
Figure 2.1 - Example ACH matrix from Psychology of Intelligence Analysis
1. Identify all possible hypotheses.
2. Make a list of significant evidence and arguments for and against each hypothesis, including assumptions.
3. Prepare a matrix with hypotheses across the top and evidence down the side. Analyze the “diagnosticity” of the evidence and arguments.
4. Refine the matrix. Reconsider the hypotheses and delete evidence and arguments that have no diagnostic value.
5. Draw tentative conclusions about the relative likelihood of each hypothesis. Proceed by working down the matrix, trying to disprove the hypotheses rather than prove them.
6. Analyze how sensitive your conclusion is to a few critical items of evidence. Consider the consequences for your analysis if that evidence were wrong, misleading, or subject to a different interpretation.
7. Report conclusions. Discuss the relative likelihood of all the hypotheses, not just the most likely one.
8. Identify milestones for future observation that may indicate events are taking a different course than expected.
The first step of ACH is simply to identify all possible hypotheses, which Heuer
defines as, “a potential explanation or conclusion that is to be tested by collecting and
presenting evidence.”73 It is preferable to generate hypotheses in group discussion in
order to benefit from different perspectives and to reduce the likelihood that a plausible
hypothesis will not be identified.74 According to Heuer, there is no ideal number of
hypotheses for any given problem, but the number should increase relative to
the level of uncertainty.75 While identifying hypotheses, an emphasis is
73 Heuer, “Psychology,” 95.
74 Ibid, 97-98.
75 Heuer, “Psychology,” 98.
placed on distinguishing between unproven and disproved hypotheses. That is, an
unproven hypothesis has no supporting evidence, whereas a disproved
hypothesis has specific evidence against it. Heuer warns against discarding an
unproven hypothesis simply because it lacks supporting evidence. Doing so can result in
prematurely rejecting a valid hypothesis. This precaution is essential because it is
possible supporting evidence exists but has not been found yet.76
The next step requires listing all pertinent evidence and arguments for and against
each hypothesis. This list is not limited to hard evidence but also includes assumptions
and logical deductions about the topic. These are incorporated into the structured process
because they will often have a strong influence on an analyst’s final thoughts. After
creating the list, an analyst asks himself several questions which will help identify
additional evidence that might be needed. For each hypothesis, what evidence should an
analyst expect to be seeing or not seeing if it were true? Also, the analyst considers how
the absence of evidence could be an indicator itself.77 For example, in the case of possible
military attack, “the steps the adversary has not taken to ready his forces for attack may
be more significant than the observable steps that have been taken.”78
After the analyst is confident that all relevant evidence has been collected, step
three in the process requires constructing a matrix with the hypotheses listed across the top
and all evidence listed down the side. From this point, the analyst works across the matrix
one piece of evidence at a time, evaluating whether it is consistent with, inconsistent with, or
irrelevant to each hypothesis and making an appropriate notation for future reference. This
76 Ibid.
77 Heuer, “Psychology,” 99; Diane Chido, et al., 39-40.
78 Heuer, “Psychology,” 99.
process is repeated for each piece of evidence until all cells in the matrix are filled. A
second objective in step three is to evaluate the diagnosticity of each piece of evidence.
That is, to evaluate its usefulness as an indicator for each hypothesis. Heuer uses a
medical analogy to demonstrate this principle. In trying to determine what illness a
patient is stricken with, a high temperature does not have a high diagnosticity because
that symptom would apply to any number of illnesses. In the case of an ACH matrix,
evidence consistent with all hypotheses can be effectively useless in predicting an
outcome, and therefore, has a low diagnosticity.79
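The matrix and the notion of diagnosticity described above can be illustrated with a minimal sketch in Python. The hypotheses, evidence items, and ratings are hypothetical placeholders, not data from Heuer or from this experiment. Treating an item as non-diagnostic when it receives the same rating under every hypothesis is a simplifying assumption made for illustration.

```python
# Hypothetical ACH matrix: one entry per evidence item, one rating per hypothesis.
# Ratings: "C" = consistent, "I" = inconsistent, "N" = not applicable.
HYPOTHESES = ["H1", "H2", "H3"]

MATRIX = {
    "E1": {"H1": "C", "H2": "C", "H3": "C"},  # like a fever: fits every hypothesis
    "E2": {"H1": "C", "H2": "I", "H3": "I"},
    "E3": {"H1": "I", "H2": "C", "H3": "N"},
}


def is_diagnostic(ratings):
    """Return True if the item's rating differs across at least two hypotheses."""
    return len({ratings[h] for h in HYPOTHESES}) > 1


def diagnostic_items(matrix):
    """Return the evidence items that help discriminate between hypotheses."""
    return [item for item, ratings in matrix.items() if is_diagnostic(ratings)]


print(diagnostic_items(MATRIX))
```

In this sketch, E1 plays the role of the high temperature in Heuer's medical analogy: it is consistent with every hypothesis and therefore does not help choose among them.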
In the next step of the process, Heuer advises that the set of hypotheses should be
reevaluated for potential changes. After examining the evidence as it relates to each
hypothesis, it might be necessary to add, combine, or split hypotheses. According to
Heuer, this is essential because the nuances of each hypothesis will greatly affect how it
is analyzed. Additionally, evidence from step three found to have no diagnostic value is
removed from the matrix.80
After preparing and evaluating the matrix, each hypothesis is examined as a
whole and tentative conclusions are formed about the likelihood of each. The analyst
works down the matrix one hypothesis at a time, trying to disprove each with the
evidence. While no amount of consistent evidence can absolutely prove a hypothesis, a
single piece of evidence is enough to disprove it. Additionally, by disproving hypotheses,
an analyst is systematically narrowing down the possibilities until the most likely ones
are clear. The hypothesis with the least inconsistent evidence against it is viewed as the
79 Heuer, “Psychology,” 100-102.
80 Heuer, “Psychology,” 103.
most likely possibility.81 However, Heuer warns, ACH is not meant to be the absolute
analytic solution to any problem, “the matrix serves only as an aid to thinking and
analysis, to ensure consideration of all the possible interrelationships between evidence
and hypotheses and identification of those few items that really swing your judgment on
the issue.”82 In the end, the analyst must make the final call.
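The rule that the hypothesis with the least inconsistent evidence is viewed as most likely can be sketched as a simple count. This is an illustrative simplification with hypothetical ratings. It weights every evidence item equally, which the source does not prescribe, and it is not a substitute for the analyst's final judgment.

```python
# Hypothetical ratings: "C" = consistent, "I" = inconsistent, "N" = not applicable.
MATRIX = {
    "E1": {"H1": "C", "H2": "I", "H3": "I"},
    "E2": {"H1": "I", "H2": "C", "H3": "I"},
    "E3": {"H1": "C", "H2": "C", "H3": "I"},
    "E4": {"H1": "C", "H2": "I", "H3": "N"},
}


def inconsistency_counts(matrix):
    """Count, for each hypothesis, how many evidence items are inconsistent with it."""
    counts = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            counts[hypothesis] = counts.get(hypothesis, 0) + (1 if rating == "I" else 0)
    return counts


def rank_hypotheses(matrix):
    """Order hypotheses from fewest to most inconsistent items (equal weights assumed)."""
    counts = inconsistency_counts(matrix)
    return sorted(counts, key=lambda hypothesis: counts[hypothesis])


print(inconsistency_counts(MATRIX))
print(rank_hypotheses(MATRIX))
```

Consistent ratings are deliberately ignored in the count, reflecting the emphasis on disproving hypotheses rather than accumulating support for them.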
Before finalizing the conclusion, the analyst questions the integrity of key pieces
of evidence and the repercussions if those linchpins turned out to be false, deceptive, or
misunderstood. Finally, when reporting conclusions, the analyst discusses the likelihood
of alternative possibilities and identifies circumstances which may indicate events are
unfolding differently than estimated.83
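The sensitivity check described above can be sketched as a leave-one-out test: remove each evidence item in turn and see whether the leading hypothesis changes. The matrix and the function names are hypothetical, and equating "linchpin" with "removal changes the set of leading hypotheses" is an assumption made for illustration.

```python
# Hypothetical ratings: "C" = consistent, "I" = inconsistent.
MATRIX = {
    "E1": {"H1": "C", "H2": "I"},
    "E2": {"H1": "I", "H2": "C"},
    "E3": {"H1": "C", "H2": "I"},
}


def leading_hypotheses(matrix):
    """Return the set of hypotheses with the fewest inconsistent ratings."""
    counts = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            counts[hypothesis] = counts.get(hypothesis, 0) + (1 if rating == "I" else 0)
    if not counts:
        return set()  # no evidence left to rank on
    fewest = min(counts.values())
    return {h for h, n in counts.items() if n == fewest}


def linchpins(matrix):
    """Return evidence items whose removal changes the set of leading hypotheses."""
    baseline = leading_hypotheses(matrix)
    critical = []
    for item in matrix:
        reduced = {e: r for e, r in matrix.items() if e != item}
        if leading_hypotheses(reduced) != baseline:
            critical.append(item)
    return critical


print(linchpins(MATRIX))
```

In this example, H1 leads only because both E1 and E3 count against H2. If either item were wrong or deceptive, the two hypotheses would be tied, so both items deserve closer scrutiny before conclusions are reported.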
Strengths and Weaknesses
The methodology’s primary apparent strength is its ability to mitigate cognitive
biases such as satisficing. The ACH process is a structured, systematic methodology for
identifying all the possibilities and evidence, and determining the relation between all
information as a whole. By structuring the cognitive process, estimation and forecasting
will be less susceptible to flaws inherent in human cognition.84
81 Heuer, “Psychology,” 103-104.
82 Ibid, 105.
83 Ibid, 105-107.
84 Kristan J. Wheaton, D.E. Chido, and McManis and Monsalve Associates, “Structured Analysis of Competing Hypotheses: Improving a Tested Intelligence Methodology” Competitive Intelligence Magazine, November-December 2006, http://www.mcmanis-monsalve.com/assets/publications/intelligence-methodology-1-07-chido.pdf (accessed 14 June 2008).
Another apparent strength of ACH is its usefulness as a management tool. The
design of the ACH matrix illuminates evidence and hypotheses side by side, acting as an
analytic “audit trail,” for any supervisory analyst or decision maker to take advantage of.
This benefits an analyst by being able to visually explain one’s thought process, and also
a manager, by aiding reviews of analytical judgments.85
While ACH is widely assumed to be a useful methodology, it has its weaknesses
as well as its strengths. The main weakness of ACH is that it can be time consuming.
Analysts are often under time constraints, and filling out an ACH matrix can be
tedious.86 However, several computer software companies, such as the Palo Alto
Research Center (PARC), have developed programs which automate the ACH
process.87 While ACH can still be a lengthy process, these computer programs have
helped make applying the methodology less time consuming.
Another weakness of ACH is difficulty incorporating information from ongoing
events, making it limited to being “only a snapshot in time.”88 As analysts are under time
constraints, they must force themselves to stop adding evidence into the matrix and begin
creation of their final analytic product, even if new information is available.89
Previous Studies on ACH
Quantitative studies on ACH have produced mixed findings regarding its
effectiveness as an analytic methodology, both for accuracy and mitigating cognitive
85 Marrin, 7.
86 Kristan Wheaton, et al., 13.
87 Palo Alto Research Center, “ACH2.0 Download Page,” http://www2.parc.com/istl/projects/ach/ach.html (accessed August 19, 2008).
88 Diane Chido, et al., 50.
89 Ibid.
biases. More studies are necessary because only a limited number have been conducted
so far. Additionally, testing ACH under varying conditions will help shed light on how
these conditions affect its performance.
In 2000, Robert D. Folker concluded in his paper, Intelligence Analysis In Joint
Intelligence Centers: An Experiment in Applying Structured Methods, that “…
exploitation of a structured methodology will improve qualitative intelligence analysis.”90
In his study, conducted in conjunction with the Joint Military Intelligence College
(JMIC), Folker tested the accuracy of hypothesis testing, a structured method nearly
synonymous with Heuer’s ACH. The researcher measured this by comparing the
accuracy of two groups: one using hypothesis testing and one using an unstructured,
intuitive approach to the same two intelligence scenarios.91 The experimental group
performed slightly better in the first scenario using hypothesis testing, but the difference
was not statistically significant. However, the difference between control and
experimental groups was statistically significant in the second scenario. Overall,
participants using hypothesis testing performed better than those using intuitive
analysis.92
Folker also notes that many experimental group participants “had difficulty
identifying all of the possible hypotheses and determining the consistency of each piece
of evidence with each hypothesis.”93 Because of this observation, Folker acknowledges
that the effectiveness of structured methods depends heavily on the type of problem and
90 Folker, 29.
91 Ibid, 15.
92 Ibid, 29.
93 Ibid, 30.
the training of each analyst. However, he concludes that an adequately trained analyst
and a structured methodology can improve intelligence analysis:
Analysis involves critical thinking. Structured methodologies do not perform the analysis for the analyst; the analyst still must do his own thinking. But by structuring a problem the analyst is better able to identify relevant factors and assumptions, formulate and consider different outcomes, weigh different pieces of evidence, and make decisions based on the available information. While exploiting a structured methodology cannot guarantee a correct answer, using a structured methodology ensures that analysis is performed and not overlooked. 94
The MITRE Corporation conducted a study in 2004 on how ACH affects
confirmation bias and the anchoring effect. They define the anchoring effect as the
“tendency to resist change after an initial hypothesis is formed.”95 The study compared
groups working on the same intelligence problem: one group with ACH and one group
without. They found ACH users were just as susceptible to confirmation biases as non-
ACH users, except in special circumstances. ACH did not help mitigate an anchoring
effect, but the researchers admit this result is unreliable due to testing conditions.96 A
pattern of evidence distortion was present in both ACH and non-ACH groups, but this
finding carries little weight because the data did not conclusively link it to actual confirmation bias.97 Lastly, a
weighting effect was present in the study and ACH helped mitigate this, but only with
users less experienced in intelligence analysis.98 The researchers’ final conclusion is that
although “ACH is intended to mitigate confirmation bias in intelligence analysts…there
is no evidence that ACH reliably achieves this intended effect.”99
94 Folker, 33.
95 B. Cheikes et al., Confirmation Bias in Complex Analyses. (Bedford, MA: MITRE, 2004), 9.
96 Ibid, 9.
97 Ibid, 12.
98 Ibid, iii.
99 B.A. Cheikes, et al., 16.
In 2004, Jean Scholtz conducted an evaluation of ACH with six Naval Reservists,
who used both intuitive analysis and ACH to solve different intelligence problems. All
participants were tasked with two intelligence problems, using intuitive analysis for the
first and ACH for the second. After completing both problems, Scholtz administered a
questionnaire to all participants regarding their experience with ACH. The answers from
these questionnaires were overwhelmingly positive toward ACH. Among the answers
provided by participants were that they felt ACH improved their analysis, it was easy to
use, and they would be inclined to use it in the future.100 The quantitative data suggested
that ACH helps users consider more hypotheses and incorporate more evidence.101
In 2006, Peter Pirolli conducted an experiment on ACH in an intelligence
classroom at the Naval Postgraduate School (NPS). Pirolli split students at the NPS into
two groups: those analyzing a problem using ACH on paper, and those using computer-
assisted ACH. In his final paper, Assisting People to Become Independent Learners in
the Analysis of Intelligence, Pirolli concluded there was little difference in ACH used on
paper and computer-assisted ACH.102 Also, post-experiment reviews from participants
were positive about the application of ACH.103
Hypotheses
Taking into consideration the purpose and purported benefits of ACH, as well as
previous literature and studies pertinent to the subject, I developed a series of testable
100 Jean Scholtz, Analysis of Competing Hypotheses Evaluation (Gaithersburg, MD: National Institute of Standards and Technology, 2004), 1.
101 Ibid, 12.
102 Peter Pirolli, Assisting People to Become Independent Learners in the Analysis of Intelligence (Palo Alto Research Center, Calif.: Office of Naval Research, 2006), 63.
103 Ibid.
hypotheses. My first hypothesis is that participants using ACH will, as a group, produce
more accurate forecasts regarding the assigned task than those using intuitive analysis.
The second hypothesis is that evidence of cognitive biases and mindsets will be more
prevalent among those using intuitive analysis, but less so among those using ACH
because of its ability to mitigate such phenomena.
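As a rough illustration of how the first hypothesis could be examined, the sketch below compares the number of correct and incorrect forecasts in two groups using a two-sided Fisher's exact test. The choice of test is an assumption made for this example, not a procedure stated in the text. The function name and the counts in the usage line are hypothetical placeholders, not results from this experiment.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. ACH, intuitive); columns are outcomes
    (e.g. correct forecast, incorrect forecast).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x first-column outcomes in row one.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-9)
    )


# Hypothetical counts only: 3 of 4 correct in one group, 1 of 4 in the other.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

An exact test is used here because it makes no large-sample assumption, which suits small groups. Other tests of a difference between two proportions would serve the same illustrative purpose.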
METHODOLOGY
Research Design
This experiment was designed with a control and experimental group and
conducted over the course of two weeks in October 2008. Both groups were tasked to
forecast the result of the 2008 Washington State gubernatorial election, which occurred
on November 4, 2008. However, participants in the experimental group were instructed
to use ACH to structure their analysis. Also, participants were organized into control and
experimental groups by political affiliation so that the effects of mindsets, if present,
Figure 3.1
could be measured between groups. Furthermore, the use of evidence among all
participants would be used to ascertain the presence and effects of confirmation bias.
Unlike many experiments where participants’ commitment involved a single, sit-
down session to complete a task, this experiment gave participants a full week to
complete the assignment at their own convenience and they were given freedom to
collect any open source information which they viewed as relevant to the tasking. I
structured the experiment in this way to create a less artificial environment for
participants and one more similar to that in which most intelligence analysts work.
Participants
The participants in the experiment were undergraduate and graduate students from the Mercyhurst
College Institute for Intelligence Studies (MCIIS). There were a total of 70 students who
participated in the experiment, with 38 in the control group and 32 in the experimental
group. All class years were well represented in the experiment as a whole, including a
markedly higher number of juniors and first year graduate students (See Figure 3.1). The
distribution of class years within each group was nearly even, except for a higher number
of first year graduate students in the control group and a higher number of second year
graduate students in the experimental group (See Figure 3.2). I placed nearly all first year
Figure 3.2
Figure 3.3
graduate students in the control group because they lacked experience in ACH at the
time. I placed most second year graduate students in the experimental group in order to
even out the distribution of graduate students among both groups.
Although I did not require all participants to use ACH in their tasking, I did
require that all participants had used the methodology at least once before participating in
this experiment (first year graduate students being an exception). This was done mostly
for ease in assigning participants to control and experimental groups. This is also why
freshmen students were not permitted as participants, because they had not yet used the
methodology in any of their
academic coursework. The exclusion of freshmen students also likely ensured an overall
more mature and experienced pool of participants.
In total, noticeably more students identified as
Republican than as Democrat (See Figure 3.3). In the control group, the ratio of
Republicans to Democrats was around 1.5:1. In the experimental group, this ratio
was nearly 2:1. Although an even number of Republicans and Democrats in both groups
would have been ideal, the circumstances surrounding participant recruitment did not
allow me to be overly selective.
Procedures
I spent two weeks prior to conducting the experiment visiting classes to recruit
intelligence students as participants. While recruiting, I briefly explained what my
research was on, the time and work required, and the benefits for those who participated.
The primary benefit offered was that some professors were willing to assign extra credit
to those students who volunteered to participate. After giving my brief presentation on
the experiment, I handed out and collected signup sheets from those who were interested
(See Appendix A). The sign-up sheets requested contact information, class year, political
affiliation, and preference for four different time slots to participate in the experiment.
After collecting signup sheets and finishing recruitment, I e-mailed all students with their
assigned time slot for the experiment. I assigned time slots, rather than letting
participants choose them, so I could ensure a fairly even distribution of Republicans and
Democrats among the control and experimental groups.
While recruiting, I told students my thesis topic was “structured analytical
methods,” rather than ACH. All students who participated had used ACH at least once
through coursework in the Intelligence Studies program and were familiar with the
methodology’s purpose of mitigating cognitive bias. If I had emphasized the use of the
methodology while recruiting, it might have ruined the integrity of the experiment’s
results by giving students insight into the purpose of the experiment.
At the beginning of each tasking session, I handed out the Consent Form for each
participant to sign and return to me (See Appendix B). This Consent Form explained the
purpose of the experiment, what participation entailed, that there were no anticipated
dangers or harmful effects associated with participating, and that participants could discontinue
participation at any time without penalty. After collecting Consent Forms, I handed out
experiment packets containing their tasking, answer sheet, and other relevant information
(See Appendix C). I reviewed the packet with them, explained their tasking, what was
expected during their participation, and discussed other issues related to successful
completion of the experiment. Specifically, I reviewed concepts relevant to the tasking
such as words of estimative probability (WEP), analytic confidence, and source
reliability.
At the end of the tasking session, participants were instructed on procedures for
returning their answer sheets for the experiment. Over the course of the next week and a
half, I, along with a colleague who offered his assistance, collected answer sheets from
participants who finished the experiment. Upon returning their answer sheet, participants
received a debriefing statement and a post-experiment survey. The debriefing statement
thanked students for participating, explained the purpose of the experiment in further
detail, as well as how this research would contribute to the body of academic work in
their field (See Appendix D). There were two different post-experiment surveys given to
participants, one for the control group and one for the experimental group (See Appendix E). The
surveys asked questions related to how much time and work was spent on the experiment,
estimated difficulty, as well as their understanding of the assigned task. The survey for
the experimental group also included questions about their understanding of ACH. The
purpose of these surveys was to give me feedback for structuring a future attempt if the
experiment was not successful.
Control Group
After attending the tasking session, control group participants had a full week
from that date to complete their assigned task. This task was to assume the role of a
political analyst working for a fictional news company and forecast the result of the
upcoming 2008 Washington State gubernatorial election. The two hypotheses implicitly
provided in the tasking were:
● The incumbent governor, Christine Gregoire (D), will win the election.
● The challenger, Dino Rossi (R), will win the election.
Participants received some basic background information about the election and its
candidates, and were encouraged to use all available open source information, but were
specifically instructed to use intuitive analysis. On the provided answer sheet,
participants were tasked to include an estimative statement summarizing their analysis.
The answer sheet also included a place to further explain their analytical findings, but this
was not required. The words of estimative probability (WEP) used in the experiment
were primarily based on those used by the National Intelligence Council (See Figure 3.4).
However, there were some slight modifications to accommodate the needs of the
experiment. First, the most central expression of likelihood, “even chance,” was removed.
The research design of this experiment required an analytical problem where the
likelihood of both hypotheses was so similar that, in this case, politically oriented
mindsets could tip participants’ forecast. Because the result of the election would be
difficult to call, I knew that a high number of participants would be tempted to select a
centrist/neutral expression of likelihood. Although this selection may be legitimate, it
would have likely skewed the results because a high number of participants would have
supplied an answer useless to the research question. The second modification was adding
a level of likelihood between “likely” and “almost certain,” as well as its negative
equivalent on the opposite end of the scale. This is more similar to the scale of WEP used
by the students at Mercyhurst and I also felt this was more appropriate for the topic being
analyzed (See Figure 3.5). Although the Washington State gubernatorial election was
expected to be very close, I felt some participants still might desire to indicate a level of
likelihood greater than “likely,” but not “almost certain.”
Figure 3.5 – NIC Words of Estimative Probability
Figure 3.4 – Experiment Words of Estimative Probability
Participants’ tasking also included assigning low, medium, or high for an
indication of overall source reliability. Although participants were already familiar with
the concept of source reliability, their tasking sheet included a short explanation. For analytic
confidence, I required participants to use a continuum-like scale rather than a numeric
scale (See Figure 3.6).
Figure 3.6 – Continuum-like Scale
Lastly, I provided control group participants with suggestions for beginning their
research. This included a non-partisan website containing basic information about
Washington State politics and links to related resources. Additionally, since MCIIS
students are not familiar with forecasting domestic elections, I provided a list of types of
evidence that could be useful indicators for the result of a gubernatorial election (See
Appendix C).
Experimental Group
Tasking for the experimental group was identical to the control group except that
participants were required to use the Palo Alto Research Center (PARC) ACH 2.0
software to create an ACH matrix for their analyses. They were instructed to print out this
matrix and return it along with their answer sheet. During their tasking session, I
reviewed and discussed ACH to ensure everyone’s understanding of the methodology
was fresh and accurate.
Data Analysis
The primary question of this research is whether or not ACH increases forecasting
accuracy. I sought to answer this question simply by comparing the control and
experimental groups to see if there was a significant difference between the accuracy of
their forecasts. The secondary question is whether or not ACH helps mitigate the effects
of cognitive bias and mindsets in users. If the results yield discernible patterns in
participants’ forecasts as related to their political affiliations, this would likely be an
indicator of a politically oriented mindset. Also, if participants overwhelmingly supplied
evidence only in favor of their forecasted candidate, this would suggest the presence of
confirmation bias, specifically. If such patterns existed in the control group but were less
pronounced or non-existent in the experimental group, this would suggest ACH helps
mitigate confirmation bias.
All data pertaining to the above research questions was tested for statistical
significance using the Statistical Package for the Social Sciences (SPSS).
Statistical significance is assessed with a P-value: the probability of observing a
difference between control and experimental group data at least as large as the one found
if that difference were in fact the result of mere coincidence. The threshold for statistical
significance in all SPSS tests was set at 5 percent (.05). That is, to achieve statistical
significance, the P-value must be .05 or less.
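The comparison described above can be illustrated with a minimal sketch. It assumes a Pearson chi-square test on a 2x2 table of correct and incorrect forecasts, which is one common choice for comparing two proportions; this section does not state which SPSS procedure was used. The counts are hypothetical, chosen only to give roughly 61 and 70 percent accuracy, and the function name is illustrative.

```python
import math

ALPHA = 0.05  # significance threshold stated in the text


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# HYPOTHETICAL counts (correct, incorrect); not the thesis data.
control = (22, 14)        # about 61 percent correct
experimental = (23, 10)   # about 70 percent correct

# Same proportions at larger sample sizes, to show the effect of sample size.
for scale in (1, 5, 10):
    counts = [x * scale for x in control + experimental]
    stat, p = chi_square_2x2(*counts)
    verdict = "significant" if p <= ALPHA else "not significant"
    print(f"n={sum(counts):4d}  chi2={stat:6.3f}  p={p:.3f}  {verdict}")
```

Under these assumed counts, the same 9-point difference fails the .05 threshold at the smallest sample size and passes it only when the sample is made several times larger.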
Figure 4.1
RESULTS
Accuracy
At the end of the 2008 Washington State Gubernatorial Election, the incumbent
Democrat, Christine Gregoire (D), defeated the Republican challenger, Dino Rossi (R),
by a margin of 6.4 percentage points.104 After compiling and analyzing the results,105 I
found that accuracy improved from the control to experimental group by 9 percentage
points. In the control group, 61 percent of participants forecasted accurately in favor of
the eventual winner, Gregoire (See Figure 4.1). Accuracy in the experimental group
improved slightly with 70 percent of participants forecasting Gregoire (D) as the winner.
Statistical testing found that the difference in accuracy is not statistically significant,
having a P-value of .421 (See Appendix F). While this testing does not invalidate
the experiment results, it means the observed improvement could plausibly be the product
of chance. Factors that likely prevented statistical significance are the small sample size
and the small difference between the control and experimental group data.
Furthermore, there is good reason to believe that the difference in accuracy
between the control and experimental groups in such an experiment should not be that
great. Although many criticisms of the human thought process are valid, intuitive analysis
is not obsolete. For an experiment like this one, a structured method should only improve
overall forecasting accuracy incrementally since intuitive analysis is, for the most part, an
effective method itself. Structured methods such as ACH serve as a countermeasure if and
when cognitive bias affects an analyst’s intuitive thought process.
In other words, a structured method will not improve the analysis of all users. In sum, the
improvement of the group using ACH should not be discounted because it is modest.
This difference is expected and still supports the notion that ACH can improve analysis.
104 Washington Secretary of State. November 4, 2008 General Election. http://vote.wa.gov/elections/wei/Results.aspx?RaceTypeCode=O&JurisdictionTypeID=2&ElectionID=26&ViewMode=Results.
105 These results exclude two outliers and contain one data correction in the experimental group.
Mindsets
As discussed in the previous section, if a politically oriented mindset is present, it
should manifest itself in the results by a strong tendency of participants to forecast in
favor of the candidate associated with their own political affiliation. However, if ACH
helps mitigate this, this tendency should be less prominent. For example, if forecasts
among Republicans are significantly more in favor of Rossi (R) in the control group, but
more in sync with the actual winner of the election in the experimental group, this would
suggest that ACH helped mitigate the effect in that group. The same should hold true for
Democratic participants. However, interpreting the results will be subject to the winner of
the election. In this case, such a mindset among Democrats will be more difficult to
identify and evaluate because the Democratic candidate won. Data comparing forecasts
between Democrats and Republicans in the control and experimental groups is depicted
in Figure 4.2.
Figure 4.2
Among Democrats, the percentage of participants who forecasted in favor of
Gregoire (D) compared to Rossi (R) was strongly in favor of Gregoire and remained
nearly identical from the control to experimental group. While this might suggest the
effects of a mindset were prevalent in both groups, it is more likely this appears to be the
case not because of the influence of an actual mindset, but because Democrats
overwhelmingly forecasted correctly in both groups. Unfortunately, this muddles the
ability to estimate the number of Democrats whose forecasts were subject to a mindset.
This hypothetical number of Democrats is likely hiding somewhere among the total
number of Democrats who forecasted accurately in favor of Gregoire (D).
Analyzing Republican forecasts in the control and experimental groups yields
more discernible results. In the control group, the proportion of forecasts between
candidates was nearly equal, with only a 4 percentage point margin favoring Gregoire (D).
However, this proportion changed dramatically in the experimental group with the
margin expanding to 36 percentage points. This suggests it is likely that ACH helped
mitigate a politically oriented mindset among Republicans in the experimental group. It is
likely that Republicans’ thought process in the control group was heavily influenced by
their political leanings and preference for the Republican candidate, while ACH mitigated
these effects among some users in the experimental group.
Additionally, although 32 percent of experimental group Republicans forecasted
incorrectly in favor of Rossi, they displayed better calibration than their counterparts in
the control group. That is, they were arguably less wrong. Tetlock defines calibration as
“the degree to which subjective probabilities [analytic estimate] are aligned with
objective probabilities.”106 Although their estimate was wrong, their matrices generally
indicated a lower level of likelihood than that of the control group analyses. Of the 32
percent of Republicans who still got it wrong with ACH, the methodology arguably
brought them closer to forecasting correctly than those in the control group.
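The idea of being "less wrong" can be made concrete with a squared-error (Brier-style) score, in which a confident wrong forecast is penalized more heavily than a cautious wrong forecast. The sketch below is illustrative only: the numeric values attached to each word of estimative probability and the label "highly likely" are assumptions, not values taken from the experiment, and this score is a simple accuracy measure rather than Tetlock's full calibration index.

```python
# HYPOTHETICAL mapping from words of estimative probability to probabilities.
WEP_TO_PROB = {
    "almost certain": 0.95,
    "highly likely": 0.85,   # assumed label for the level added between the two
    "likely": 0.70,
}


def squared_error(wep, forecast_correct):
    """Squared difference between the stated probability and the outcome
    (1.0 if the forecasted candidate won, 0.0 otherwise). Lower is better."""
    stated = WEP_TO_PROB[wep]
    outcome = 1.0 if forecast_correct else 0.0
    return (stated - outcome) ** 2


# Two analysts who both forecasted the losing candidate:
for wep in ("likely", "almost certain"):
    print(f"{wep:15s} wrong forecast -> score {squared_error(wep, False):.4f}")
```

Under these assumed values, the analyst who said "likely" scores better than the one who said "almost certain," even though both forecasts were incorrect.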
Like the dataset on accuracy, this data did not meet the standard for statistical
significance, having P-values of .973 and .291 for Democrats and Republicans,
respectively (See Appendix F). However, also like the dataset on accuracy, this is likely
attributable to the even smaller sample size. Breaking down participants into Democrats
and Republicans in the control and experimental groups essentially cut the sample size of
each dataset in half, making it difficult to extract statistically significant results.
Furthermore, for the statistical testing on accuracy and mindsets, it is important to
consider appropriate standards for significance with different types of research. Although
the threshold for statistical significance was set at the general standard (p=.05), it is
acceptable to interpret statistical results less stringently in exploratory research. The
statistical results for mindsets among Republicans, with a P-value of .291, do not satisfy
even the more lenient standard for exploratory research (.10).107 A P-value of .291 means
that, if ACH had no effect, a difference at least this large would still be expected in
roughly 29 percent of samples of this size. It does not mean there is a 70 percent chance
that the effect is real. The result is therefore inconclusive rather than negative,
suggesting that further research, with larger data sets, is warranted.
106 Tetlock, 47.
Figure 4.3
Confirmation Bias
Comparing the levels of consistent and inconsistent evidence between groups
clearly reveals confirmation bias among participants in the control group. As discussed
earlier, confirmation bias is the tendency “for people to seek information and cues that
confirm the tentatively held hypothesis or belief, and not seek (or discount), those that
support an opposite conclusion or belief.”108 Regardless of political affiliation or forecast,
80 percent of all participants in the control group provided evidence in their answer
sheets that entirely supported their forecasted candidate.109 On the other hand, only 9
percent of experimental group participants exhibited this behavior. The ACH matrices of
the other experimental group participants show that both hypotheses were considered with
varying proportions of consistent and inconsistent evidence. Furthermore, SPSS testing on
confirmation bias revealed a statistically significant difference between control and
experimental group data, with the P-value reported as .000 (see Figure 4.4). SPSS rounds
to three decimal places, so this means the P-value is below .0005, not that it is exactly
zero. In other words, it is very unlikely that a difference this large would arise from
coincidence alone. This data suggests ACH substantially helped mitigate confirmation bias.
107 David G. Garson, Guide to Writing Empirical Papers, Theses, Dissertations (New York: Marcel Dekker, Inc., 2002), 199.
108 Wickens and Hollands, 312.
Independent Samples Test (Confirmation Bias)
                               Levene's Test for          t-test for Equality of Means
                               Equality of Variances
                               F        Sig.              t         df        Sig. (2-tailed)
Equal variances assumed        5.940    .018              -7.851    60        .000
Equal variances not assumed                               -7.772    52.783    .000
Figure 4.4 – SPSS Testing Results for Confirmation Bias
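The SPSS output in Figure 4.4 reports two versions of the t-test. Because Levene's test is significant (.018, below .05), the assumption of equal variances is doubtful, and the "equal variances not assumed" (Welch) row is the one normally read. The sketch below shows how the two rows differ in principle. The group sizes, means, and standard deviations are hypothetical, chosen only so that the pooled degrees of freedom equal 60 as in the figure, and the function names are illustrative.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t-test statistic and degrees of freedom (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-test statistic and Welch-Satterthwaite degrees of freedom
    (equal variances not assumed)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# HYPOTHETICAL summary statistics: share of each participant's evidence that
# was inconsistent with his or her own forecast. Not the thesis data.
control = dict(m1=0.10, s1=0.15, n1=30)
experimental = dict(m2=0.45, s2=0.20, n2=32)

t_pooled, df_pooled = pooled_t(**control, **experimental)
t_welch, df_welch = welch_t(**control, **experimental)
print(f"equal variances assumed:     t={t_pooled:.3f}, df={df_pooled}")
print(f"equal variances not assumed: t={t_welch:.3f}, df={df_welch:.3f}")
```

As in Figure 4.4, the Welch version yields a non-integer and smaller number of degrees of freedom, while the t statistic changes only slightly.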
Other Findings of Interest
Comparing the average number of pieces of evidence used by each group in
creating their estimate reveals a staggering difference and suggests something about the
ability of ACH to encourage users to seek out and use more information (see Table
4.1). In the control group, participants used on average fewer than 3 pieces of evidence for
their analysis. On the other hand, participants in the experimental group used on average
about 10 pieces of evidence. This is almost certainly attributable to one of the weaknesses of
intuitive analysis and one of the strengths of ACH. One flaw of intuitive analysis is that
the human thought process is constrained by the inability to process more than a handful
of individual pieces of information at a time.110 Given this, analysts will often make a
judgment unaware that they are using an inadequate amount of information. On the other
hand, a structured method such as ACH allows a user to visualize all the information at
the same time. This will not only increase accuracy by allowing the user to better
understand the relationship of all the evidence, but also makes it easier for an analyst to
identify information gaps. As the concept applies to this experiment, I believe
participants using intuitive analysis included fewer pieces of evidence in their analysis
because, using cognition alone, they were far less likely to identify information gaps
and also maintained a false sense of confidence in their collection before making a
forecast. For those using ACH, on the other hand, the matrix aided in both identifying
information gaps and dispelling any false sense of confidence regarding the amount of
evidence used.
Table 4.1
Group           Avg. # of pieces of evidence used
Control         2.9
Experimental    10.1
109 This data excludes eight outliers. These outliers were participants who did not provide any evidence whatsoever along with their estimative statement.
110 George A. Miller, “The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information,” The Psychological Review, Vol. 63, No. 2 (March 1956): 81-97.
Figure 4.5
There were no discernible patterns in the words used to describe the estimative
probability assigned to the results (the WEPs) among the control and experimental groups
related to cognitive bias. As can be seen in Figure 4.5, participants in both groups
overwhelmingly used “likely” as the WEP in their estimative statement. I expected this
result because of the close nature of the election. Average analytic confidence among
both groups was very close, with the control group averaging 6.1 on a scale of 10 and the
experimental group averaging 5.9. Analyst assessments of source reliability were very
similar among both groups and sub-groups within, with an overwhelming number of
participants rating their overall source reliability as “medium,” on a low-medium-high
scale. This consistency likely has less to do with the method and more to do with the
analysts’ incomplete understanding of these concepts.
Summary of Results
The findings discussed in this section suggest that ACH is modestly effective for
improving accuracy and very effective at reducing the effects of mindsets and cognitive
bias in intelligence analysis. ACH slightly improved accuracy among users in the
experimental group. Among Republicans, ACH appeared to mitigate the effects of a
politically oriented mindset regarding the Republican candidate. This was not the case
with Democrats, but this was likely because the Democratic candidate won the election,
hindering the ability to discern any difference between the control and experimental
groups. Regarding the use of evidence, ACH users incorporated substantially more
evidence into their analysis and applied it more appropriately. Specifically, a tendency
among nearly all control group participants to only incorporate evidence in favor of their
forecasted candidate strongly suggests confirmation bias. This, however, appeared to be
substantially mitigated by ACH.
CONCLUSION
The main purpose of this study was to ascertain whether or not ACH is effective
for estimation and forecasting in intelligence analysis. The secondary purpose was to
determine whether or not the methodology is effective for mitigating cognitive bias and
other phenomena detrimental to intelligence analysis. While most of these results are not
definitive, they all support the notion that ACH can improve intelligence analysis.
The results of this experiment revealed that ACH improved forecasting accuracy,
but only modestly. With the exception of one component of Folker’s experiment, where
ACH/hypothesis testing performed drastically better than intuition, the minute difference
in accuracy between the control and experimental groups in this study is consistent with
all other testing on the methodology’s accuracy.
A common variable in both these experiments was that the objective likelihoods
of the given hypotheses were very close. On the other hand, in the component of Folker’s
experiment where ACH/hypothesis testing performed drastically better, it was clear that
one of the given hypotheses was much more likely than the others.111 This suggests,
perhaps, that ACH is less effective with those problems where the objective probabilities
of each hypothesis are roughly equal and more so when they are slightly more uneven.
This inference helps us identify when ACH is most appropriate to use. In this
case, the results on accuracy have shed light on the utility of the methodology with
problems subject to varying objective probabilities among the given hypotheses. This
experiment and previous ones already suggest that ACH is less useful where those
probabilities are roughly equal. On the other end of the spectrum, when those
probabilities are very clear, a structured methodology is obviously unnecessary. To be
specific, the accumulated data suggests ACH may only be effective where the objective
probability of the most likely hypothesis is at least 10-15 percentage points above the
next most likely hypothesis. Such a probabilistic “distance” should allow the rough tool
that ACH is (compared to more refined statistical measurements) to distinguish the more
likely hypothesis from the less likely ones. On the other hand, as the objective probability
of the most likely hypothesis rises more than 30-45 percentage points above the next most likely
hypothesis, ACH or, indeed, any structured method will become increasingly
unnecessary. The differences between the two hypotheses will be “visible to the naked
eye,” in a manner of speaking. The graph in Figure 5.1 demonstrates this concept for a
two hypothesis scenario.
111 These facts are derived from observing Folker’s a priori evaluation of the intelligence scenarios and given evidence.
Figure 5.1
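The suggested range can be sketched as a simple classification of the gap between the two most likely hypotheses. This is only an illustration of the idea, not a tested decision rule: the text gives the boundaries as ranges (10-15 and 30-45 percentage points), so the sketch treats each range as a borderline zone, which is an assumption, as are the function name and category labels.

```python
def ach_fit(probabilities, lower_band=(10, 15), upper_band=(30, 45)):
    """Classify a problem by the gap, in percentage points, between the two
    most likely hypotheses. `probabilities` holds objective probabilities in
    percent. Band edges come from the ranges in the text; treating each range
    as a borderline zone is an assumption of this sketch."""
    if len(probabilities) < 2:
        raise ValueError("need at least two hypotheses")
    ranked = sorted(probabilities, reverse=True)
    gap = ranked[0] - ranked[1]
    if gap < lower_band[0]:
        return "too close"            # ACH may not separate the hypotheses
    if gap < lower_band[1]:
        return "borderline (close)"
    if gap <= upper_band[0]:
        return "ACH most useful"
    if gap <= upper_band[1]:
        return "borderline (clear)"   # ACH increasingly unnecessary
    return "obvious"                  # difference visible without a method


for scenario in ([52, 48], [56, 44], [60, 40], [70, 30], [90, 10]):
    print(scenario, "->", ach_fit(scenario))
```

Because only the top two probabilities are compared, the same function can be applied to problems with more than two hypotheses.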
Practically, implementing this suggestion is difficult if not impossible. Assigning
objective probabilities to realistic intelligence scenarios is fraught with difficulty. That
said, this suggestion may well provide avenues for future research into the utility of
ACH. Given this idea, a number of future experiments could be designed to shed further
light on ACH’s utility in varying circumstances. A subsequent experiment could test the
methodology’s utility with two hypotheses when the objective probabilities are more
uneven, such as 70 – 30 percent. Another varying condition could be the number of
hypotheses. The analytic problem in this experiment contained only two hypotheses;
however, future experiments could test ACH against a problem with more than two
hypotheses that has any set of objective probabilities.
ACH also appeared to mitigate the effects of politically oriented mindsets among
some participants; however, this is uncertain because of the conditions for measuring
such an effect. Overall, I was surprised that the difference was not more
pronounced. Given the nature of the analytic problem, with close objective probabilities
for each hypothesis, I confidently expected that politically oriented mindsets
would be present and would tip the balance in many participants’ forecasts. This
appeared to be the case with Republican participants, but at a far smaller magnitude than
expected. Anecdotally, I feel that the disparity in evidence used by participants was partly
responsible for this result.
For future tests like this one, an overall larger sample size would also be
beneficial since these tests required breaking down participants further into subsets
within each group, creating even smaller data sets and decreasing their reliability. This
suggestion is not meant to cast doubt on the interpretation that ACH helped mitigate the
influence of politically oriented mindsets, but instead is meant as an explanation as to
why this tendency was less evident than expected. The influence of mindsets was present,
but I believe a similar test with a larger sample size would have likely
helped create a result more commensurate with my original expectation.
Confirmation bias was clearly evident among those using intuitive analysis in the
control group. On the other hand, its near non-existence in the experimental group
suggests ACH substantially reduced this bias. This finding is
unlike previous studies in several ways. First, the method of measuring and
discerning such an effect is vastly different from that of Cheikes et al. Rather than
focusing on evidence distortion for discerning the presence of confirmation bias, I
derived my conclusion solely from the comparative use of evidence and how it
related to analysts’ forecasts. This is more in line with Wickens and Hollands’s
definition of confirmation bias, which emphasizes the idea of seeking and incorporating
information that supports a preferred hypothesis and ignoring or discrediting evidence
unfavorable to a preferred hypothesis. Lastly, the substantial difference between the two
groups is also unlike any other finding on ACH and confirmation bias. This difference
demonstrates that ACH is excellent for encouraging analysts to incorporate and weigh a
variety of discordant evidence against multiple hypotheses.
Overall, the differences in evidence among those using intuitive analysis and
those using ACH were staggering, not just in how the evidence was used, but even
simply in the amount of evidence used. ACH users incorporated a substantially higher
average number of pieces of evidence. This suggests that their analyses were overall
more thorough and comprehensive than those of analysts using intuition.
These findings also demonstrate the benefit of transparency and added
accountability derived from the use of structured methods. For every participant using
ACH, I can easily check every piece of evidence they used as well as how that evidence
contributed to their final conclusion. This was somewhat the case with the intuitive
thinkers, most of whom listed the evidence they used. However, their lists are nowhere near as
organized and clear as the ACH matrices.
One possible flaw in this study which might have prevented more definitive
results was the varying evidence used among participants. While allowing participants to
collect their own information led to its own insights such as the finding on confirmation
bias, this created a less than ideal environment for comparing some results among users.
For example, did some of the experimental group participants forecast incorrectly
because using ACH was ineffective or because their research led to incorrect or
inadequate information? As Heuer explains, an ACH matrix is only as good as the
evidence it contains.112 While this aspect of the methodology created some interesting and
valid results, it unfortunately creates some level of uncertainty about other results.
Given this, another suggestion for future experiments would be to provide
participants with a base set of evidence, but like this experiment, allow them within their
given period of participation to seek out additional information. Providing a base set of
evidence would help control for the varying evidence used among participants but still
maintain conditions conducive to testing for mindsets and confirmation bias. Also, this
base set of evidence would act as a benchmark to compare to any additional information
participants collect – improving the ability to measure confirmation bias. However future
studies on ACH are structured, it will benefit our understanding of the methodology for it
to be tested in conditions varying from past studies.
112 Heuer, “Psychology,” 109.
The results of this experiment support my hypotheses that ACH can improve
forecasting accuracy and that it aids in mitigating biases and other cognitive phenomena.
However, these are far from definitive and more research is needed that validates these
findings and tests ACH in varying conditions. Doing so will continue to expand our
understanding of the methodology and support efforts to improve the United States’
intelligence analysis capability via use of structured methods.
As suggested by various Congressional committees on intelligence, analysts in the
US Intelligence Community should begin taking advantage of effective tools and
methods which can improve their analysis. These analysts already have access to over
200 analytic methods – ACH being one of them. Taking into consideration both the need
for the use of such methods and the demonstrated ability of ACH to improve analysis,
there is no reason that structured methods should not be taken advantage of when
appropriate. Hence, the last step to improving intelligence analysis with structured
methods is innovative analysts willing to incorporate these tested methods into their daily
work. In answering the research question, I hope these findings promote the use of
structured methods that can improve the overall quality of intelligence analysis in the US
Intelligence Community.
BIBLIOGRAPHY
Cheikes, B.A., et al., Confirmation Bias in Complex Analyses. Technical Report No. MTR 04B0000017. (Bedford, MA: MITRE, 2004).
Chido, Diane and Richard M. Seward, Jr., eds. Structured Analysis of Competing Hypotheses: Theory and Application. Mercyhurst College Institute of Intelligence Studies Press, 2006.
Clark, Robert M. Intelligence Analysis: A Target-Centric Approach. Washington D.C.: CQ Press, 2007.
Clark, Robert M. Intelligence Analysis: Estimation and Prediction. Baltimore: American Literary Press, Inc., 1996.
Congressional Research Service Report for Congress. Proposals for Intelligence Reorganization, 1949-2004. 2004
Feder, Stanley A. “FACTIONS and Policons: New Ways to Analyze Politics.” Inside the CIA’s Private World: Declassified Articles from the Agency’s Internal Journal 1955-1992, ed. H. Bradford Westerfield. New Haven: Yale University Press, 1995.
Feder, Stanley A. “Forecasting for Policy Making in the Post-Cold War Period.” Annual Review of Political Science Vol. 5. (2002): 113-119.
Folker, Robert D., Jr. Intelligence Analysis in Theater Joint Intelligence Centers: An Experiment in Applying Structured Methods. Washington D.C.: Joint Military Intelligence College, Occasional Paper #7, 2000.
Garson, David G. Guide to Writing Empirical Papers, Theses, Dissertations. New York: Marcel Dekker, Inc., 2002.
Gladwell, Malcolm. Blink: The Power of Thinking Without Thinking. New York: Back Bay Books/Little, Brown and Company, 2007.
Heuer, Richards J., Jr. Adapting Academic Methods and Models to Governmental Needs: The CIA Experience. Carlisle Barracks: Strategic Studies Institute, 1978.
Heuer, Richards J., Jr. “Limits of Intelligence Analysis,” Orbis, Winter 2005, 75-94.
Heuer, Richards J., Jr. Psychology of Intelligence Analysis. Washington D.C.: CIA Center for the Study of Intelligence, 1999.
Johnston, Rob. Analytic Culture in the US Intelligence Community: An Ethnographic Study. Washington D.C.: Center for the Study of Intelligence, 2005.
Johnston, Rob. “Integrating Methodologists into Teams of Substantive Experts.” Studies in Intelligence. Vol. 47. No. 1: 65.
LeGault, Michael R. Think: Why Crucial Decisions Can’t Be Made in the Blink of an Eye. New York: Threshold Editions, 2006.
Lowenthal, Mark M. Intelligence: From Secrets to Policy. Washington D.C.: CQ Press, 2006.
Marrin, Stephen. “Intelligence Analysis: Structured Methods or Intuition?” American Intelligence Journal 25, no. 1 (Summer 2007): 7-10.
Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “intuition.”
Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “mindset.”
Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “scientific method.”
Miller, George A. “The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information.” The Psychological Review, Vol. 63, No. 2 (March 1956): 81-97.
Myers, David G. Intuition: Its Powers and Perils. New Haven: Yale University Press, 2002.
Palo Alto Research Center. “ACH2.0 Download Page.” http://www2.parc.com/istl/projects/ach/ach.html (accessed August 19, 2008).
Pirolli, P. Assisting People to Become Independent Learners in the Analysis of Intelligence (Tech. No. CDRL A002). Palo Alto Research Center, Calif.: Office of Naval Research, 2006.
Scholtz, Jean. Analysis of Competing Hypotheses Evaluation (PARC) (No. Unpublished Report). Gaithersburg, MD: National Institute of Standards and Technology, 2004.
Tetlock, Philip E. Expert Political Judgment. Princeton: Princeton University Press, 2005.
Tversky, Amos and Daniel Kahneman. “Availability: A Heuristic for Judging Frequency and Probability.” Cognitive Psychology 5 (1973), 207-232.
Tversky, Amos and Daniel Kahneman. “Judgment Under Uncertainty: Heuristics and Biases.” Science 185, no. 4157 (1974). JSTOR (accessed March 15, 2009).
United States Government. A Review of the Intelligence Community (The Schlesinger Report). 1971.
United States Government - Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction. Report to the President of the United States. Washington D.C., 2005. <http://www.wmd.gov/report/> (Accessed 22 January 2009).
United States Government - U.S. Commission on the Roles and Capabilities of the United States Intelligence Community, Preparing for the 21st Century: An Appraisal of U.S. Intelligence. Washington, D.C., 1996.
Washington Secretary of State. “November 4, 2008 General Election.” <http://vote.wa.gov/elections/wei/Results.aspx?RaceTypeCode=O&JurisdictionTypeID=2&ElectionID=26&ViewMode=Results> (Accessed December 14, 2008).
Wheaton, Kristan J., D.E. Chido, and McManis and Monsalve Associates. “Structured Analysis of Competing Hypotheses: Improving a Tested Intelligence Methodology.” Competitive Intelligence Magazine, November-December 2006. http://www.mcmanis-monsalve.com/assets/publications/intelligence-methodology-1-07-chido.pdf (accessed 14 June 2008).
Wickens, C.D., and Justin G. Hollands. Engineering Psychology and Human Performance. 3rd ed. Upper Saddle River, NJ: Prentice Hall, 2000.
APPENDICES
Appendix A: Experiment Sign-Up Forms
Structured Methods Experiment Sign-Up Form
Name:
Class Year:
Phone Number:
E-mail Address:
Political Affiliation: (circle one) Republican Democrat
Instruction Session Dates/Times: (Rank preferences 1-4, 1=highest, 4=lowest)
Monday, 13 October 2008 – 5:00pm ____
Tuesday, 14 October 2008 – 6:00pm ____
Wednesday, 15 October 2008 – 5:00pm ____
Thursday, 16 October 2008 – 6:00pm ____
Upon completion, please return this form to Drew Brasfield or Travis Senor in CIRAT.
Contact Info: [email protected]
(205)542-8892
Appendix B: Experiment Consent Forms
Structured Methods Thesis Experiment
Participation Consent Form
The purpose of this research is to gauge factors of interest in various analytic methodologies.
Your participation involves a short instruction period, evaluating an intelligence scenario, and returning it to the administrator of the experiment. The instruction session should last no longer than 60 minutes and the evaluation can be completed at your convenience within the period of a week. Your name WILL NOT appear in any information disseminated by the researcher. Your name will only be used to notify professors of your participation in order for them to assign extra credit.
There are no foreseeable risks or discomforts associated with your participation in this study. Participation is voluntary and you have the right to opt out of the study at any time for any reason without penalty.
I, ____________________________, acknowledge that my involvement in this research is voluntary and agree to submit my data for the purpose of this research.
_________________________________ __________________
Signature Date
_________________________________ __________________
Printed Name Class
Name(s) of professors offering extra credit: ____________________________________
Researcher’s Signature: ___________________________________________________
If you have any further questions about analytic methodology or this research, you can contact me at [email protected].
Research at Mercyhurst College which involves human participants is overseen by the Institutional Review Board. Questions or problems regarding your rights as a participant should be addressed to Tim Harvey; Institutional Review Board Chair; Mercyhurst College; 501 East 38th Street; Erie, Pennsylvania 16546-0001; Telephone (814) 824-3372. [email protected]
Andrew Brasfield, Applied Intelligence Master’s Student, Mercyhurst College 205-542-8892
Kristan Wheaton, Research Advisor, Mercyhurst College 814-824-3021
Appendix C: Control & Experimental Group Tasking/Answer Sheets
Structured Methods Thesis Experiment
GROUP 1 & 3 INSTRUCTIONS
You are a high-profile political analyst working for News Corporation X. You have been tasked to
forecast the winner of the 2008 Washington State Gubernatorial election, which will be decided
on November 4, 2008. To complete your task, use all available open source information. The
main candidates in this race are Christine Gregoire (D) and Dino Rossi (R). This will be a rematch
from the previous Washington State Gubernatorial election, which was hotly contested and
controversial. Your supervisor gave you a full week to prepare your forecast.
Use the National Intelligence Council (NIC) Words of Estimative Probability (WEP) as an indicator
of your forecast:
Remote Very Unlikely Unlikely Likely Very Likely Almost Certain
Example Forecast:
It is [WEP] that [Candidate Name] Will Win the 2008 Washington State Gubernatorial Election.
Record your final answers on the provided answer sheet. This answer sheet includes spaces for
your final estimate (WEP), Source Reliability, Analytic Confidence, and a short explanation of how
the evidence and subsequent analysis led to your final forecast. Please return all of the described
materials to the experiment administrator by the due date in order to receive extra credit from
your professor.
Task Due: 10/xx/2008
Experiment Administrator: Drew Brasfield, [email protected]
Important Information:
Source Reliability:
Source Reliability reflects the accuracy and reliability of a particular source over time.
Sources with high reliability have been proven to be accurate and consistently reliable.
Sources with low reliability lack the accuracy and proven track record commensurate with
more reliable sources.
o Rate source reliability as low, medium, or high.
Analytic Confidence:
Analytic Confidence reflects the level of confidence an analyst has in his or her estimates
and analyses. It is not the same as using words of estimative probability, which indicate
likelihood. It is possible for an analyst to suggest an event is virtually certain based on
the available evidence, yet have a low amount of confidence in that forecast due to a
variety of factors or vice versa.
o To assess analytic confidence, mark your rating on the line given on the answer sheet. The
far left represents the lowest level of confidence while the far right represents absolute
confidence in your analytic judgment.
Structured Methods Thesis Experiment
GROUP 2 & 4 INSTRUCTIONS
You are a high-profile political analyst working for News Corporation Y. You have been tasked to
forecast the winner of the 2008 Washington State Gubernatorial election, which will be decided
on November 4, 2008. To complete your task, use all available open source information. Also,
use ACH to structure your analysis. The main candidates in this race are Christine Gregoire
(D) and Dino Rossi (R). This will be a rematch from the previous Washington State Gubernatorial
election, which was hotly contested and controversial. Your supervisor gave you a full week to prepare your forecast.
Use the National Intelligence Council (NIC) Words of Estimative Probability (WEP) as an indicator
of your forecast:
Remote Very Unlikely Unlikely Likely Very Likely Almost Certain
Example Forecast:
It is [WEP] that [Candidate Name] Will Win the 2008 Washington State Gubernatorial Election.
Record your final answers on the provided answer sheet. This answer sheet includes spaces for
your final estimate (WEP), Source Reliability, Analytic Confidence, and a short explanation of how
the evidence and subsequent analysis led to your final forecast. Also include a print out of your
ACH matrix when returning the above materials. Please return all of the described materials to
the experiment administrator by the due date in order to receive extra credit from your professor.
Task Due: 10/xx/2008
Experiment Administrator: Drew Brasfield, [email protected]
Important Information:
Source Reliability:
Source Reliability reflects the accuracy and reliability of a particular source over time.
Sources with high reliability have been proven to be accurate and consistently reliable.
Sources with low reliability lack the accuracy and proven track record commensurate with
more reliable sources.
o Rate source reliability as low, medium, or high.
Analytic Confidence:
Analytic Confidence reflects the level of confidence an analyst has in his or her estimates
and analyses. It is not the same as using words of estimative probability, which indicate
likelihood. It is possible for an analyst to suggest an event is virtually certain based on
the available evidence, yet have a low amount of confidence in that forecast due to a
variety of factors or vice versa.
o To assess analytic confidence, mark your rating on the line given on the answer sheet. The
far left represents the lowest level of confidence while the far right represents absolute
confidence in your analytic judgment.
Structured Methods Thesis Experiment
Answer Sheet
NAME:
FORECAST:
SHORT EXPLANATION (not required):
SOURCE RELIABILITY (circle one):
LOW MEDIUM HIGH
ANALYTIC CONFIDENCE:
Lowest Level of Confidence ------------------------------------------------------------ Highest Level of Confidence
Other Important Information
Starting point:
http://www.politics1.com/wa.htm
Google/Google News
Types of relevant evidence:
● Incumbent/challenger popularity
● Election Polls
● Campaign spending
● Local issues relevant to the election
● Party issues
● National party support of incumbent/challenger
● Local economy
● State voting trends
● Voter registration
● Past elections
● Candidate debates
*This is not a list of required evidence to collect, but types of evidence that could be an indicator
for an election.
Appendix D: Participant Debriefing Statement
Analysis of Competing Hypotheses
Participation Debriefing
Thank you for participating in this research process. I appreciate your contribution and willingness to
support the student research process.
The purpose of this study was to determine how well ACH mitigates cognitive bias and how accurate the
methodology is for forecasting in intelligence analysis, compared to unstructured methods. Only a handful
of experimental studies have been conducted on ACH, and this research hopes to contribute to the growing
body of literature on structured analytical methods. The experiment you participated in was designed to
test ACH’s capabilities against an unstructured method. Specifically, participants were organized into
experimental and control groups by political affiliation so that factors of interest could be measured.
As the US Intelligence Community faces recent intelligence failures, the use of advanced analytical
techniques will enhance the community’s quality of analysis and benefit US national security.
If you have any further questions about the Analysis of Competing Hypotheses or this research you can
contact me at [email protected].
Appendix E: Post Experiment Questionnaires
Follow-Up Questionnaire
Control Group
Thanks for your participation! Please take a few moments to answer the following questions. Your feedback is greatly appreciated. Your response to these questions will NOT affect whether or not you receive extra credit.
1. How much time did you spend working on the assigned task (hours)?
2. Why did you agree to participate in the experiment? (extra credit, other, etc.)
3. Do you feel you understood the assigned task as explained at the instruction session?
4. Were you able to find adequate open source information about the topic?
5. Please rate the level of difficulty in finding open source information related to the
topic:
1=Very difficult 5=Very Easy
1 2 3 4 5
6. Please provide any additional comments you may have about the Analysis of
Competing Hypotheses, the assigned task, or any other part of this experiment.
Follow-Up Questionnaire
ACH Group
Thanks for your participation! Please take a few moments to answer the following questions. Your feedback is greatly appreciated. Your response to these questions will NOT affect whether or not you receive extra credit.
1. How much time did you spend working on the assigned task (hours)?
2. Why did you agree to participate in the experiment? (extra credit, other)
3. Do you feel you understood the assigned task as explained at the instruction session?
4. Were you able to find adequate open source information about the topic?
5. Please rate the level of difficulty in finding open source information related to the
topic:
1=Very difficult 5=Very Easy
1 2 3 4 5
6. How helpful was ACH in creating your final estimate?
7. Please rate your understanding of ACH before participating in this experiment:
1=No understanding of ACH 5=Very thorough understanding of ACH
1 2 3 4 5
8. Please rate your understanding of ACH after participating in this experiment:
1=No understanding of ACH 5=Very thorough understanding of ACH
1 2 3 4 5
9. Please provide any additional comments you may have about the Analysis of
Competing Hypotheses, the assigned task, or any other part of this experiment.
Appendix F: SPSS Testing
Accuracy
Group Statistics (Forecast)
Group           N     Mean      Std. Deviation    Std. Error Mean
Control         38    1.3947    .49536            .08036
Experimental    30    1.3000    .46609            .08510
Independent Samples Test (Forecast)
Levene's Test for Equality of Variances: F, Sig. t-test for Equality of Means: t, df, Sig. (2-tailed).
                               F        Sig.     t        df        Sig. (2-tailed)
Equal variances assumed        2.625    .110     .804     66        .425
Equal variances not assumed                      .809     63.934    .421
Mindsets – Democrats
Ranks (Forecast)
Group           N     Mean Rank    Sum of Ranks
Control         15    13.47        202.00
Experimental    11    13.55        149.00
Total           26
Test Statistics (b) (Forecast)
Mann-Whitney U                     82.000
Wilcoxon W                         202.000
Z                                  -.034
Asymp. Sig. (2-tailed)             .973
Exact Sig. [2*(1-tailed Sig.)]     1.000 (a)
a. Not corrected for ties.
b. Grouping Variable: Group
Wilcoxon Rank Sum test value = -0.034; the P-value of 0.973 is larger than α = 0.05.
Mindsets – Republicans
Ranks (Forecast)
Group           N     Mean Rank    Sum of Ranks
Control         23    23.04        530.00
Experimental    19    19.63        373.00
Total           42
Test Statistics (a) (Forecast)
Mann-Whitney U                     183.000
Wilcoxon W                         373.000
Z                                  -1.055
Asymp. Sig. (2-tailed)             .291
a. Grouping Variable: Group
Wilcoxon Rank Sum test value = -1.055; the P-value of 0.291 is larger than α = 0.05.
Confirmation Bias
Group Statistics (Confirmation Bias)
Group           N     Mean      Std. Deviation    Std. Error Mean
Control         30    1.2000    .40684            .07428
Experimental    32    1.9063    .29614            .05235
Independent Samples Test (Confirmation Bias)
Levene's Test for Equality of Variances: F, Sig. t-test for Equality of Means: t, df, Sig. (2-tailed).
                               F        Sig.     t        df        Sig. (2-tailed)
Equal variances assumed        5.940    .018     -7.851   60        .000
Equal variances not assumed                      -7.772   52.783    .000
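The test statistics reported in this appendix can be cross-checked from the printed summary values alone. The following is a minimal Python sketch, not the SPSS procedure used for the thesis: the function names `t_from_summary` and `u_from_rank_sums` are hypothetical, the inputs are the group sizes, means, standard deviations, and rank sums printed in the tables, and the normal approximation for Z is computed without a tie correction (an assumption made here for simplicity).

```python
import math


def t_from_summary(n1, m1, s1, n2, m2, s2):
    """Return (pooled t, pooled df, Welch t, Welch df) from group summaries."""
    diff = m1 - m2
    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch statistic and Satterthwaite df.
    v1, v2 = s1**2 / n1, s2**2 / n2
    t_welch = diff / math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_pooled, n1 + n2 - 2, t_welch, df_welch


def u_from_rank_sums(n1, r1, n2, r2):
    """Return (Mann-Whitney U, Z without tie correction) from rank sums."""
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumes no tied ranks
    return u, (u - mean_u) / sd_u


# Values copied from the Group Statistics and Ranks tables (control first).
accuracy = t_from_summary(38, 1.3947, 0.49536, 30, 1.3000, 0.46609)
confirmation = t_from_summary(30, 1.2000, 0.40684, 32, 1.9063, 0.29614)
democrats = u_from_rank_sums(15, 202.0, 11, 149.0)
republicans = u_from_rank_sums(23, 530.0, 19, 373.0)

print("Accuracy t, df (pooled / Welch):", accuracy)
print("Confirmation bias t, df (pooled / Welch):", confirmation)
print("Democrats U, uncorrected Z:", democrats)
print("Republicans U, uncorrected Z:", republicans)
```

Under these assumptions the t statistics, degrees of freedom, and U values agree with the tables to rounding. The uncorrected Z values come out smaller in magnitude than the reported -.034 and -1.055, presumably because the SPSS asymptotic Z adjusts the variance for tied ranks, which are common when the forecast variable takes only a few distinct values.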