FORECASTING ACCURACY AND COGNITIVE BIAS IN THE ANALYSIS OF COMPETING HYPOTHESES

ANDREW D. BRASFIELD

A Thesis

Submitted to the Faculty of Mercyhurst College

In Partial Fulfillment of the Requirements for

The Degree of

MASTER OF SCIENCE IN

APPLIED INTELLIGENCE

DEPARTMENT OF INTELLIGENCE STUDIES

MERCYHURST COLLEGE

ERIE, PA

MAY 2009

DEPARTMENT OF INTELLIGENCE STUDIES

MERCYHURST COLLEGE

ERIE, PENNSYLVANIA

FORECASTING ACCURACY AND COGNITIVE BIAS IN THE ANALYSIS OF COMPETING HYPOTHESES

A Thesis

Submitted to the Faculty of Mercyhurst College

In Partial Fulfillment of the Requirements for

The Degree of

MASTER OF SCIENCE IN

APPLIED INTELLIGENCE

Submitted By:

ANDREW D. BRASFIELD

Certificate of Approval:

_______________________________________
Kristan J. Wheaton
Assistant Professor
Department of Intelligence Studies

_______________________________________
James G. Breckenridge
Chair/Assistant Professor
Department of Intelligence Studies

_______________________________________
Phillip J. Belfiore
Vice President
Office of Academic Affairs

May 2009

Copyright © 2009 by Andrew D. Brasfield

All rights reserved.

DEDICATION

This work is dedicated to Melody and Dharma

for being patient with my busy schedule during the last two years.

ACKNOWLEDGEMENTS

First, I would like to thank Professor Kris Wheaton for his guidance and advice during

this process over the last year.

I also would like to thank Professor James Breckenridge for taking the role of my

secondary reader.

I also owe thanks to Professor Stephen Marrin for helping me obtain various documents

pertinent to my literature review.

I would also like to thank Kristine Pollard for her technical assistance during this process;

without her, I would not have been able to begin this process last summer.

I would also like to thank Hemangini Deshmukh for assisting in applying statistical

testing to the results of this thesis.

Lastly, I would like to thank Travis Senor for his assistance while conducting the

experiment.

ABSTRACT OF THE THESIS

FORECASTING ACCURACY AND COGNITIVE BIAS IN THE ANALYSIS OF COMPETING HYPOTHESES

By

Andrew D. Brasfield

Master of Science in Applied Intelligence

Mercyhurst College, 2009

Assistant Professor Kristan J. Wheaton, Chair

[The Analysis of Competing Hypotheses (ACH) is an analytic methodology used

in the United States Intelligence Community to aid qualitative analysis. Taking into

consideration what previous studies found, an experiment was conducted testing the

methodology’s estimative accuracy as well as its ability to mitigate cognitive phenomena

which hinder the analytical process. The findings of the experiment suggest ACH can

improve estimative accuracy, is highly effective at mitigating some cognitive phenomena

such as confirmation bias, and is almost certain to encourage analysts to use more

information and apply it more appropriately. However, the results suggest that ACH may

be less effective for an analytical problem where the objective probabilities of each

hypothesis are nearly equal. Given these findings, future studies should focus less on the

question of ACH’s general efficacy and instead aim to expand our understanding

of when the methodology is most appropriate to use.]

TABLE OF CONTENTS

COPYRIGHT PAGE

DEDICATION

ACKNOWLEDGEMENTS

ABSTRACT

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER

1 INTRODUCTION

2 LITERATURE REVIEW
    Key Terms
    The Debate: Structured V. Unstructured Methods
    Structured Methods in Intelligence
    Analysis of Competing Hypotheses
        Strengths & Weaknesses
        Previous Studies
    Hypotheses

3 METHODOLOGY
    Research Design
        Participants
        Procedures
        Control Group
        Experimental Group
    Data Analysis

4 RESULTS
    Accuracy
    Mindsets
    Confirmation Bias
    Other Findings of Interest
    Summary of Results

5 CONCLUSION

BIBLIOGRAPHY

APPENDICES
    Appendix A: Experiment Sign-Up Forms
    Appendix B: Experiment Consent Forms
    Appendix C: Control & Experiment Group Tasking/Answer Sheets
    Appendix D: Participant Debriefing Statement
    Appendix E: Post-Experiment Questionnaires
    Appendix F: SPSS Testing

LIST OF TABLES

Table 4.1 Comparative Use of Evidence Between Groups

LIST OF FIGURES

Figure 2.1 Example ACH Matrix

Figure 3.1 Participant Education Level

Figure 3.2 Group Comparison by Class Year

Figure 3.3 Participant Political Affiliation by Group

Figure 3.4 National Intelligence Council Words of Estimative Probability

Figure 3.5 Experiment Words of Estimative Probability

Figure 3.6 Continuum-like Scale

Figure 4.1 Results for Accuracy

Figure 4.2 Results for Mindsets

Figure 4.3 Findings on Confirmation Bias

Figure 4.4 SPSS Testing on Confirmation Bias

Figure 4.5 Words of Estimative Probability by Group

Figure 5.1 Graph of ACH’s Utility with Varying Objective Probabilities

INTRODUCTION

In light of recent intelligence failures, such as the misjudgment of Iraq’s alleged possession of

weapons of mass destruction (WMD), it is clear that the United States Intelligence

Community could improve the process it uses to reach analytic judgments. Traditionally,

such judgments are reached through intuitive thinking. However, one of the

recommendations of the Commission on the Intelligence Capabilities of the United States

Regarding Weapons of Mass Destruction was that “the [intelligence] community must

develop and integrate into regular use new tools to assist analysts in filtering and

correlating the vast quantities of information that threaten to overwhelm the analytic

process.”1 This statement represents the growing belief that structured methods can help

the United States Intelligence Community’s analytic capabilities reach the quality and

accuracy required by US policy makers.

One structured analytic method, the Analysis of Competing Hypotheses (ACH),

can potentially assist in the improvement of analysis in the US Intelligence Community.

In this structured technique, the scientific method is incorporated into the analytic process

by weighing multiple hypotheses in a matrix, evaluating all evidence for and against

each, and determining the likelihood of all possibilities by trying to disprove hypotheses.2

Researchers have found that this methodology can help "analysts overcome cognitive

1 United States Government - Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction. Report to the President of the United States, (Washington D.C., 2005), 402. <http://www.wmd.gov/report/> (Accessed 22 January 2009)
2 Richards J. Heuer, Jr., “Limits of Intelligence Analysis.” Orbis (Winter 2005): 92.

biases, limitations, mindsets, and perceptions...”3 In general, structured methods such as

ACH can offer a variety of potential benefits to intelligence analysis.
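
The matrix logic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used in this study: the hypothesis labels (H1 to H3), evidence labels (E1 to E4), and consistency ratings are invented, and ranking hypotheses by a plain count of inconsistent evidence is only one possible way to express the idea of trying to disprove hypotheses rather than confirm them.

```python
# Hypothetical ACH-style matrix. The hypothesis labels (H1-H3), evidence
# labels (E1-E4), and ratings are invented for illustration only.
# Ratings: "C" = consistent, "I" = inconsistent, "N" = neutral.
MATRIX = {
    "E1": {"H1": "C", "H2": "C", "H3": "I"},
    "E2": {"H1": "I", "H2": "C", "H3": "I"},
    "E3": {"H1": "C", "H2": "C", "H3": "C"},
    "E4": {"H1": "I", "H2": "N", "H3": "C"},
}


def inconsistency_counts(matrix):
    """Count, for each hypothesis, the evidence items rated inconsistent with it."""
    counts = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            counts[hypothesis] = counts.get(hypothesis, 0) + (1 if rating == "I" else 0)
    return counts


def diagnostic_evidence(matrix):
    """Return the evidence items whose ratings differ across hypotheses."""
    return [item for item, ratings in matrix.items() if len(set(ratings.values())) > 1]


def rank_hypotheses(matrix):
    """Order hypotheses from least to most contradicted (ties broken by label)."""
    counts = inconsistency_counts(matrix)
    return sorted(counts, key=lambda h: (counts[h], h))


if __name__ == "__main__":
    print(inconsistency_counts(MATRIX))
    print(diagnostic_evidence(MATRIX))
    print(rank_hypotheses(MATRIX))
```

In this invented example, E3 is consistent with every hypothesis and therefore does nothing to separate them, while the hypothesis with the fewest inconsistent items is ranked first. A real analysis would also need to weigh the credibility and relevance of each evidence item, which this sketch deliberately leaves out.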

The primary benefit is the added element of the scientific method. This, in theory,

improves the quality and accuracy of analysis by imposing structure onto our limited, and

often flawed, cognitive processes. A secondary potential benefit to the intelligence

community is increased transparency and accountability. That is, structured methods

make the analytic process and end product easier to critique and evaluate. This is

important for both analysts and their supervisors so that mistakes and successes can more

easily be identified and understood for the improvement of future efforts. Likewise, in the

aftermath of intelligence analysis failures and successes, accountability is more certain.

Despite these potential benefits, there are some obstacles to the use of structured

methods in the US Intelligence Community. First, although there are over 200 analytic

methods available to intelligence analysts, exposure to these methods has been minimal.4

Because of this, it is likely most analysts in the US Intelligence Community are unaware

of the existence of methods that could aid their work, let alone have received training that

would enable them to use such methods.

The most difficult hurdle is an analytic culture predisposed to intuitive thinking

and skeptical of, if not hostile to, the notion of structured methods. One researcher notes

that this attitude is partly justified by the lack of empirical evidence suggesting

structured methods can improve intelligence analysis.5 In his ethnographic study of

3 Diane Chido and Richard M. Seward, Jr., eds. Structured Analysis of Competing Hypotheses: Theory and Application (Mercyhurst College Institute of Intelligence Studies Press, 2006), 48.
4 Rob Johnston, “Integrating Methodologists into Teams of Substantive Experts,” Studies in Intelligence, Vol. 47, No. 1: 65.
5 Stephen Marrin, “Intelligence Analysis: Structured Methods or Intuition?” American Intelligence Journal 25, no. 1 (Summer 2007): 10.

the US Intelligence Community’s analytic culture, Dr. Rob Johnston concludes

that empirical evidence is exactly what is needed:

The principal difficulty lies not in developing the methods themselves, but in articulating those methods for the purpose of testing and validating them and then testing their effectiveness throughout the community. In the long view, developing the science of intelligence analysis is easy; what is difficult is changing the perception of the analytic practitioners and managers and, in turn, modifying the culture of tradecraft.6

Hopefully, the quantitative data derived from this experimental study will offer insights

into the utility of structured methods and ACH specifically and challenge commonly held

assumptions within the US Intelligence Community.

Taking into account that previous studies on ACH have yielded mixed and

inconclusive results, the purpose of this study is to add to the small number of such

studies and shed further light on ACH’s utility and efficacy with intelligence analysis

problems in varying circumstances. Specifically, the primary goal of this study is to

evaluate the estimative accuracy of the methodology compared to intuitive analysis. A

secondary purpose, if possible, is to ascertain whether ACH can mitigate cognitive

phenomena that hinder our ability to think clearly and accurately. From the quantitative

data I collect, I hope to gain insight regarding the methodology’s usefulness for analysts

in the US Intelligence Community.

Unfortunately, there are some limitations to this study. These limitations pertain

to the number of relevant research questions that can be addressed, as well as

experimental conditions that are not ideal but impossible to avoid with the given

resources. While ACH offers numerous potential benefits to analysis, such as those

6 Rob Johnston, Analytic Culture in the US Intelligence Community (Washington D.C.: Center for the Study of Intelligence, 2005), 20-21.

related to hypothesis generation and its use in a team environment, the primary goals of

this experiment are to test the methodology’s accuracy and its ability to mitigate

cognitive biases. Designing experiment conditions to maximize the capacity to measure

these particular factors of interest at the expense of secondary research questions is a

necessary sacrifice.

Another limitation is available resources. The ideal participants for an

experimental study such as this one would be US Intelligence Community analysts who

are specifically experienced with ACH. Participants with these qualifications would

likely provide higher quality and more valid results. Although all participants using ACH

will have had some experience with the methodology, this study did not have access to a

participant pool with the ideal qualifications.

This study is organized as follows. First, the researcher will review

the existing body of literature pertinent to the topic, including important terms of

reference, the debate on the use of structured methods, as well as current and past use of

such methods in the US Intelligence Community. Next, the researcher will explain the

methodology for the experiment and the subsequent results. Finally, the researcher will

offer his final interpretation of the experiment results and postulate their implications for

the use of structured methods in the US Intelligence Community.

LITERATURE REVIEW

To fully understand the purpose and place of this study and its experiment, it is

necessary to review important concepts and debates relevant to the use of structured

analytical techniques in the US Intelligence Community. First, this chapter will define

and discuss key terms such as intelligence, structured methods, and intuition. Next, this

chapter will attempt to summarize the debate on the use of structured and unstructured

analytical methods from a variety of perspectives. These will include views from

cognitive psychology, experts from within the US Intelligence Community, and empirical

studies on the topic. Furthermore, a general description of the use of structured methods

in the US Intelligence Community will follow. This will include subsections on current

use, explanations for the non-use of structured methods, and finally an in-depth discourse

on ACH itself. This study’s hypotheses will emerge from the intersection of all these

elements.

Key Terms

While the definition of intelligence has been debated for some time, several key

characteristics are clear. Mark Lowenthal, in his book, Intelligence: From Secrets to

Policy, partly describes intelligence as a process where relevant information is

“requested, collected, analyzed, and provided to policy makers…”7 While this common

definition is accurate, it is missing a very important element that is integral to the purpose

of intelligence analysis. Robert M. Clark points this out in Intelligence Analysis: A

Target-Centric Approach by simply stating, “Intelligence is about reducing uncertainty in

7 Mark M. Lowenthal, Intelligence: From Secrets to Policy (Washington D.C.: CQ Press, 2006), 9.

conflict.”8 Therefore, the ultimate purpose of intelligence analysis is estimating the nature

of current and future events. That is, using information to clarify the likelihood or nature

of these events for a policy maker.

From these concepts comes the Mercyhurst College Institute for Intelligence

Studies (MCIIS) definition of intelligence, which incorporates all of the above concepts

into a single comprehensive definition: “[intelligence is] a process

focused externally, designed to reduce the level of uncertainty for a decision maker using

information derived from all sources.”9 While the debate continues and this definition is

not definitive, it will suffice in laying the intellectual groundwork for this research.

According to Robert D. Folker, “Quantitative intelligence analysis separates the

relevant variables of a problem for credible numerical measurement. Qualitative

intelligence analysis breaks down topics and ideas that are difficult to quantify into

smaller components for better understanding.”10 Within the US Intelligence Community,

quantitative and qualitative intelligence analysis is most commonly conducted with

unstructured methods.

One former CIA analyst, Stephen Marrin, defines structured analytic methods as

“those techniques which have a formal or structured methodology that is visible to

external observers.”11 From this, it is apparent that the key features of a structured

analytic method are that it is systematic in nature and is externalized from the human

8 Robert M. Clark, Intelligence Analysis: A Target-Centric Approach (Washington D.C.: CQ Press, 2007), 8.
9 Diane Chido, et al., 9.
10 Robert D. Folker Jr. Intelligence Analysis in Theater Joint Intelligence Centers: An Experiment in Applying Structured Methods (Washington D.C.: Joint Military Intelligence College, Occasional Paper #7, 2000), 5; citing Robert M. Clark, Intelligence Analysis: Estimation and Prediction (Baltimore: American Literary Press, Inc., 1996), 30.
11 Marrin, 7.

mind - typically in some visual format. This suggests that inherent in any systematic

method of analysis is the spirit of the scientific method, defined as “principles and

procedures for the systematic pursuit of knowledge involving the recognition and

formulation of a problem, the collection of data through observation and experiment, and

the formulation and testing of hypotheses.”12 In contrast, unstructured methods, which

lack such elements, are commonly referred to in intelligence as “intuitive analysis.”

Developing our understanding of these concepts is important because analysis is a

critical component of intelligence. Although much reform within the US national security

and intelligence infrastructure has focused on collection and dissemination of

intelligence, Folker states that “the root cause of many critical intelligence failures has

been analytical failure,” citing examples such as the North Korean invasion of South

Korea in 1950, the Tet Offensive in Vietnam, the fall of the Shah of Iran, and the

development of India’s nuclear program.13

However, the need to improve the analytic process is not unknown within the US

Government. As early as the 1940s and through the Cold War, numerous government

reports on intelligence, such as the Dulles-Jackson-Correa and Schlesinger reports,

recommended that government entities with an intelligence function take measures to

improve the analytic process and production of estimates.14 More recently, the US

Commission on the Roles and Capabilities of the United States Intelligence Community

specifically criticized the lack of resources allocated to “developing and maintaining

12 Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “scientific method.”
13 Folker, 3-4.
14 Congressional Research Service Report for Congress, Proposals for Intelligence Reorganization, 1949-2004. 2004, 6; United States Government, A Review of the Intelligence Community, (The Schlesinger Report) (1971), 44.

expertise among the analytical pool.”15 Amidst these recommendations, there is much

debate within the US Intelligence Community on how to improve analysis and whether or

not structured methods should be a part of that solution.

The Debate: Structured V. Unstructured Methods

There has been a longstanding debate inside and outside of the US Intelligence

Community over the use of structured and unstructured methods for analysis and decision

making. On one side are those who believe intuitive thinking is sufficient for problem

solving and that scientific methods are inadequate when addressing the same problems.

On the other side of the debate are those who argue that structured and scientific methods

can supplement intuitive thinking and improve its quality. This debate begins with

cognitive psychology and understanding how the simplest and most basic human thought

processes affect efforts at critical thinking.

The research of various psychologists suggests that limitations in human

cognition are inherent and can be detrimental to critical thinking. Specifically, the

research of Daniel Kahneman and Amos Tversky suggests that intuitive thinking can be

thought of as the mind’s shortcut mechanism to aid quick decision making. That is,

intuition takes large amounts of ambiguous and sometimes contradictory information in

quick succession and assimilates it into a succinct explanation of the information being

perceived. Despite its utility in situations requiring this ability, such as deciding whether

to run from a perceived threat or stand and fight, these simplified and more efficient

15 United States Government - U.S. Commission on the Roles and Capabilities of the United States Intelligence Community, Preparing for the 21st Century: An Appraisal of U.S. Intelligence (Washington, D.C., 1996), 83.

cognitive processes are also inherently subject to a higher number of judgmental errors.16

These judgmental errors are believed to be caused by cognitive biases, defined as “mental

errors caused by our simplified information processing strategies.”17 In Intuition: Its

Powers and Perils, David G. Myers elaborates on these specific advantages and

judgmental errors which result from intuitive thinking. The simple advantage offered by

intuition is the ability to quickly and efficiently process large quantities of information.18

In Blink: The Power of Thinking Without Thinking, Malcolm Gladwell argues for

our ability to use this, which he calls “thin-slicing.”19 Gladwell not only advocates the use

of intuitive thinking, but also argues that it can be just as effective as, if not superior to,

scientific methods of analysis. To support his assertions, Gladwell provides handfuls of

real-life examples that seemingly demonstrate the efficacy of intuition, as well as the

findings of some scientific studies. However, his own discussion of intuition’s

vulnerability to cognitive biases undermines his argument.

While speed and efficiency are two advantages of intuitive thinking, inherent

limitations in human cognition are its Achilles’ heel. Summing up the research of Herbert

Simon, Richards J. Heuer, Jr. explains the use of mindsets in human cognition:

Because of limits in human mental capacity, he argued the mind cannot cope directly with the complexity of the world. Rather, we construct a simplified mental model of reality and then work with this model. We behave rationally within the confines of our mental model, but this model is not always well adapted to the requirements of the real world.20

16 Amos Tversky and Daniel Kahneman, “Judgment Under Uncertainty: Heuristics and Biases,” Science 185, no. 4157, pp. 1124-1131 (1974), JSTOR (accessed March 15, 2009), 1124.
17 Richards J. Heuer, Jr., Psychology of Intelligence Analysis (Washington D.C.: CIA Center for the Study of Intelligence, 1999), 111.
18 David G. Myers, Intuition: Its Powers and Perils (New Haven: Yale University Press, 2002), 3-5.
19 Malcolm Gladwell, Blink: The Power of Thinking Without Thinking (New York: Back Bay Books/Little, Brown and Company, 2007), 23.
20 Heuer, “Limits,” 78; citing Herbert Simon, Models of Man (New York: John Wiley & Sons, 1957).

According to Heuer, these mindsets, which Webster’s defines as “a mental attitude or

inclination,” and as “a fixed state of mind,” serve a good purpose for the most part.21

When information is incomplete, ambiguous, or contradictory, mindsets help assimilate

new information quickly and efficiently by using an existing mental framework based on

previous experience, education, and preconceptions to interpret that information.

However, these rigid mindsets sometimes betray our judgment because they do not adapt

well when new information challenges strongly held beliefs and preconceptions.22 One

former CIA analyst, Stanley Feder, specifically identifies mindsets as being “a major

cause of intelligence and policy failures for decades.”23

Myers’s Intuition further discusses two other biases particularly relevant to intelligence

analysis: overconfidence and confirmation bias. While overconfidence is self-

explanatory, confirmation bias is defined as the tendency “for people to seek information

and cues that confirm the tentatively held hypothesis or belief, and not seek (or discount),

those that support an opposite conclusion or belief.”24 A relevant example of this was the

tendency of some in the US Intelligence Community leading up to the invasion of Iraq in

2003 to seek evidence confirming the established belief that Saddam Hussein had

weapons of mass destruction while discounting or neglecting dissonant evidence.25

21 Heuer, “Limits,” 86; Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “mindset.”
22 Heuer, “Limits,” 76, 81, 83, 86.
23 Stanley A. Feder. “Forecasting for Policy Making in the Post-Cold War Period,” Annual Review of Political Science Vol. 5. (2002): 113.
24 Christopher D. Wickens and Justin G. Hollands, Engineering Psychology and Human Performance, 3rd ed. (Upper Saddle River, NJ: Prentice Hall, 2000), 312.
25 United States Government. Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction. Report to the President of the United States, 31 March 2005, (Washington D.C.), p 162. <http://www.wmd.gov/report/>

There is a plethora of other cognitive biases that also plague intuitive thinking in

intelligence. These biases can manifest themselves in research strategy, perception, and

memory. One of the major criticisms of intuitive thinking is that it has the tendency to

identify the first plausible or reasonable hypothesis and seek evidence that supports this

hypothesis, known as “satisficing.”26 The problem with this method is that often the same

evidence is also consistent with any number of alternative hypotheses. Given this, an

analyst risks fooling himself into thinking he has identified the most likely hypothesis

while remaining unaware that he is overlooking other valid, and possibly more likely, alternatives. Also

among these biases is vividness bias, which is the tendency for vivid evidence to have greater

influence on our thinking than less vivid evidence, regardless of its true value.27 Another

common cognitive bias found in intuitive thinking is availability bias, which is the

tendency for people to estimate the likelihood of an event largely based on how many

relevant past instances they can recall and how easily they come to mind.28 These are

only a few among many cognitive biases that can hinder human cognition.

Acknowledging its weaknesses, Gladwell states that intuition’s effectiveness is

dependent on the absence of these biases.29 This opens an important question regarding

the utility of intuition in intelligence analysis. That is, if the efficacy of intuitive thinking

is dependent on the absence of such biases, then how prominent are these in human

cognition? Specifically, if these biases are prominent and difficult to willfully bypass, this

would suggest that intuition alone is ineffective when dealing with high-risk analytic

decision making. This is where Gladwell’s argument unravels because these biases are

26 Heuer, “Psychology,” 44.
27 Ibid, 116.
28 Amos Tversky and Daniel Kahneman, “Availability: A Heuristic for Judging Frequency and Probability,” Cognitive Psychology, 5 (1973): 207-232.
29 Gladwell, 72-76.

pervasive and difficult to avoid in intuitive thinking. Heuer likens these biases to “optical

illusions in that the error remains compelling even when one is fully

aware of its nature. Awareness of the bias, by itself, does not produce

a more accurate perception.”30

Michael LeGault contributes to the list of flaws in Gladwell’s argument for

intuition with his book, Think: Why Crucial Decisions Can’t Be Made in the Blink of an

Eye, pointing out that many of the examples Gladwell gives are misleading or out of context.

Among these is the case of a museum which purchased what was assumed to be an

authentic Greek statue for its collection. From the start, various experts felt something

was wrong with the statue and these intuitive impressions subsequently led to the

discovery that it was a forgery. LeGault correctly points out that these initial impressions

were not really the work of pure intuition, but resulted from observers’ expertise and

scientific inquiry, albeit at the unconscious level at first.31

Although intuitive thinking is the predominant style of analysis in the United

States national security and intelligence infrastructure,32 the use of structured methods has

been “debated in analytic circles for decades.”33 According to Folker, “At the heart of this

controversy is the question of whether intelligence analysis should be accepted as an art

(depending largely on subjective, intuitive judgment) or a science (depending largely on

structured, systematic analytic methods).”34

30 Heuer, “Psychology,” 112.
31 Michael R. LeGault, Think: Why Crucial Decisions Can’t Be Made in the Blink of an Eye (New York: Threshold Editions, 2006), 8-10.
32 Marrin, 9.
33 Ibid, 8.
34 Folker, 6.

Of these two ideological camps, advocates of intelligence analysis as an art

believe that many factors in a given analytic problem are too complex and abstract to be

incorporated into methods that are rigid and scientific.35 Hence, as Folker sums up, this side

argues that the most effective qualitative analysis “is an intuitive process based on

instinct, education, and experience.”36 Even those who acknowledge structured methods

can improve analysis contend such improvements would be so minute that resources

would be better allocated to improving some other aspect of intelligence.37

Advocates of intelligence analysis as a science argue that structured methodology

improves analysts’ ability to evaluate evidence and form conclusions.38 Additionally,

Folker states, “there is also a concern that the artist [analyst] will fall in love with his art

and be reluctant to change it even in the face of new evidence. The more scientific and

objective approach encourages the analyst to be an honest broker and not an advocate.”39

These proponents argue that while subject-matter expertise has its utility, this also

predisposes an analyst to be stuck within the confines of their own subject-area’s

heuristics, which can manifest themselves as cognitive biases.40 Heuer further makes the

case for the use of structured methods when he points out that “the circumstances

under which accurate perception is most difficult are exactly the circumstances under

which intelligence analysis is generally conducted—making judgments about evolving

35 Folker, 6-7, citing Richard K. Betts, “Surprise, Scholasticism, and Strategy: A Review of Ariel Levite’s Intelligence and Strategic Surprises (New York: Columbia University Press, 1987),” International Studies Quarterly 33, no. 3 (September 1989): 338.
36 Folker, 7; citing Tom Czerwinski, ed. Coping with the Bounds: Speculations in Nonlinearity in Military Affairs (Washington: National Defense University, 1998), 139.
37 Folker, 9.
38 Ibid, 10.
39 Ibid, 10.
40 Johnston, “Integrating Methodologists Into Teams of Substantive Experts,” 65.

situations on the basis of incomplete, ambiguous, and often conflicting information that is

processed incrementally under pressure for early judgment.”41

Drawing on his experience in the US Intelligence Community, Feder offers

empirical insight that argues for the utility of structured methods in some circumstances.

While serving as a political analyst at the CIA, Feder used one particular structured

quantitative method to forecast more than 1200 international events.42 During this time,

he found that the structured method, when “compared with conventional intelligence

analyses…had more precise forecasts without sacrificing accuracy.”43 Feder also claims

that another specific structured method used at the CIA “helped avoid analytic traps and

improved the quality of analyses by making it possible to forecast specific outcomes and

the political dynamics leading to them.”44 Also, while this method did not increase

forecasting accuracy over intuitive analysis, it did provide more nuanced results.45

The research and experimentation of Philip Tetlock suggest that, in general,

intuition is lacking as an analytic method. However, cognitive styles similar to structured

methods of thinking were found to be correlated with better judgment. In his book, Expert

Political Judgment, the author aims to define indicators of good judgment, concluding,

“What experts think matters far less than how they think.”46 Tetlock uses a concept first

illustrated by Isaiah Berlin in “The Hedgehog and the Fox” from The Proper Study of

Mankind:

41 Heuer, “Limits,” 78-79.
42 Feder, “Forecasting,” 118-119.
43 Ibid, 119.
44 Stanley A. Feder, “FACTIONS and Policons: New Ways to Analyze Politics,” in Inside the CIA’s Private World: Declassified Articles from the Agency’s Internal Journal, 1955-1992, ed. H. Bradford Westerfield (New Haven: Yale University Press, 1995), 275.
45 Feder, “FACTIONS,” 275.
46 Philip E. Tetlock, Expert Political Judgment (Princeton: Princeton University Press, 2005), 2.


If we want realistic odds on what will happen next…we are better off turning to experts who embody the intellectual traits of Isaiah Berlin’s prototypical fox – those who ‘know many little things,’ draw from an eclectic array of traditions, and accept ambiguity and contradiction as inevitable features of life – than we are turning to Berlin’s hedgehogs – those who ‘know one big thing,’ toil devotedly within one tradition, and reach for formulaic solutions to ill-defined problems.47

In his research, Tetlock analyzed and compared the forecasts of human participants and

“mindless” statistical strategies.48 Among the human participants were subject-matter

experts and amateurs, all of whom used intuitive thinking.49 These groups made predictions

on the short- and long-term futures of economic, political, and national security policies of

numerous countries.50 Examining the quantitative results, Tetlock discovered that human

participants, even when advantaged with subject-matter expertise, always performed

worse than various statistical strategies of assigning likelihoods. However, Tetlock

noticed a level of consistency in some forecasters that clearly was not the result of

chance.51

To explain this, he searched the results for correlations between good judgment and

participants’ backgrounds, belief systems, and cognitive style (how they think). The data

showed that level of education and professional experience had no correlation with better

judgment.52 To measure cognitive style, all participants answered a questionnaire, from

which Tetlock discovered a significant correlation between participants’ cognitive styles

and their forecasting accuracy. The questionnaire revealed two dominant cognitive styles:

47 Tetlock, 2.
48 Ibid, 49-51.
49 Ibid, 54.
50 Ibid, 49.
51 Ibid, 7.
52 Ibid, 68.


Berlin’s fox and hedgehog.53 Statistical analysis revealed that having a fox-type

personality correlated with higher accuracy in forecasting.54

When the participants first created their forecasts, they included commentaries

explaining their thought process.55 From this information, Tetlock made numerous

generalizations about why foxes were able to forecast more accurately. Among these

are that foxes are reluctant to view problems through an established, rigid

framework; more cautious about explaining current and future events through overly simplistic

historical analogies; less inclined to make overly confident forecasts supported by

looping evidence; more emotionally neutral; and more likely to integrate

dissonant viewpoints into their analyses.56 Interestingly, these traits are also common

benefits derived from structured analytical techniques. Tetlock’s research demonstrates

that intuitive thinking, whether used by a subject-matter expert or an amateur, is less

effective than cognitive styles that bear resemblance to structured methods because these

are less susceptible to the errors of cognitive bias.57

Proponents of intuitive analysis make valid points about the power of intuition

and the inherent limitations of structured methods in intelligence. That is, intuitive

thinking is naturally the basis of all analysis. Also, information used in intelligence

analysis problems will sometimes not fit easily into the rigid framework of a structured

method. On the other hand, proponents of structured methods make valid points about the

potential benefits of using such methods to aid intuitive analysis. That is, structured

53 Tetlock, 72-75.
54 Ibid, 78-80.
55 Ibid, 88.
56 Ibid, 88-92, 100-107.
57 Ibid, 117-118.


thinking can improve both accuracy and nuance by mitigating the effects of cognitive

bias and other judgmental errors. The research and experimentation of Tetlock and others

support this assertion.

Folker is probably correct when concluding that intelligence analysis is not

exclusively one or the other, but instead “a combination of both intuition and scientific

methods.”58 Both styles of thinking have their strengths and weaknesses, and nothing

suggests they could not supplement each other. While this question still deserves future

research and debate, the “either/or proposition”59 may not be the most productive

question to ask. Instead, the more appropriate question might be: when are structured

methods appropriate? Hopefully, experiments such as this one will advance our

understanding of the utility of structured methods with various analytic problems.

Structured Methods in Intelligence

According to Dr. Rob Johnston from the CIA Center for the Study of Intelligence,

intelligence analysts currently have access to over 200 structured analytic methods.60

Despite this, intuition appears to be the predominant style of analysis within the IC and

most experts agree that structured methods are generally unused. Specifically, one expert,

Stephen Marrin, suggests the use of structured methods is mostly limited to analysts who

are required to use a very specific methodology for a very specific purpose, such as

social network analysis for terrorism or counter-narcotics.61 Folker’s survey of 40

intelligence analysts from across the US Intelligence Community supported these

58 Folker, 13.
59 Ibid, 13.
60 Johnston, “Integrating Methodologists Into Teams of Substantive Experts,” 65.
61 Marrin, 9.


assertions, revealing only one analyst who claimed to routinely use a structured analytic

method.62

There are several reasons why structured methods are not widely used in the US

Intelligence Community. The primary reason for the non-use of structured methods is an

analytic culture predisposed to intuitive thinking. Specifically, Feder states that this

culture views analysts primarily as writers and summarizers of information, rather than

“methodologists” who tinker with scientific tools.63 Whether or not organizational culture

is a key factor, Folker states that in general, “most people instinctively prefer intuitive,

non-structured approaches over structured methodologies.”64 Folker further explains:

Structured thinking is radically at variance with the way in which the human mind is in the habit of working. Most people are used to solving problems intuitively by trial and error. Breaking this habit and establishing a new habit of thinking is an extremely difficult task and probably the primary reason why attempts to reform intelligence analysis have failed in the past, and why intelligence budgets for analytical methodology have remained extremely small when compared to other intelligence functions.65

Furthermore, according to Heuer, given the purpose and nature of their work, intelligence

analysts, “[tend] to be skeptical of any form of simplification such as is inherent in the

application of probabilistic models.”66 While attempting to introduce new structured

methods to political analysts at the CIA in the 1970s, Heuer recalls that responses to the

notion of structured methods “typically ranged from skepticism to hostility.”67 The

underpinning of this skepticism, as discussed earlier, is the belief that structured methods

62 Folker, 11.
63 Feder, “Forecasting,” 119.
64 Folker, 2.
65 Folker, 14; partly citing Morgan D. Jones, The Thinker’s Toolkit, 8.
66 Heuer, Adapting Academic Methods and Models to Government Needs: The CIA Experience (Carlisle Barracks: Strategic Studies Institute, 1978), 7.
67 Ibid, 5.


cannot effectively be applied to qualitative problems. Likely augmenting this skepticism

is the lack of empirical data demonstrating structured methods’ efficacy. While

proponents have argued the case for structured methods, few experiments have been

conducted which demonstrate their efficacy.68

Inadequate education regarding the use of structured methods is also to blame for

their non-use. Unlike many professions, the US Intelligence Community has not

established cadres of specialists in methodology. That is, exposure

to structured methods is typically dependent on self-education by individual analysts who

are heavily preoccupied with their own area of expertise.69 This work environment,

understandably, does not encourage busy analysts to spend time experimenting with new

analytical techniques. This is even more the case with more complex methods, such as

Bayesian analysis.70

Analysis of Competing Hypotheses

Analysis of Competing Hypotheses (ACH) is one methodology that arguably can

improve intelligence analysis. According to the creator of the method, Richards J. Heuer,

Jr., ACH “requires an analyst to explicitly identify all the reasonable alternatives and

have them compete against each other for the analyst’s favor, rather than evaluating their

plausibility one at a time.”71 Heuer’s ACH is an eight-step process, each step with a specific

purpose in avoiding the flaws of unstructured thinking:72

68 Marrin, 10.
69 Johnston, “Integrating Methodologists Into Teams of Substantive Experts,” 64-65.
70 Folker, 8; citing Captain David Lawrence Graves, ISAF, Bayesian Analysis Methods for Threat Prediction, MSSI Thesis (Washington: Defense Intelligence College, July 1993), second page of Abstract.
71 Heuer, “Psychology,” 95.
72 These are taken directly from Heuer’s eight-step ACH process as cited. Heuer, 97. A more detailed discussion of these eight steps can be found in Chapter Eight of “Psychology.”

Figure 2.1 - Example ACH matrix from Psychology of Intelligence Analysis


1. Identify all possible hypotheses.
2. Make a list of significant evidence and arguments for and against each hypothesis, including assumptions.
3. Prepare a matrix with hypotheses across the top and evidence down the side. Analyze the “diagnosticity” of the evidence and arguments.
4. Refine the matrix. Reconsider the hypotheses and delete evidence and arguments that have no diagnostic value.
5. Draw tentative conclusions about the relative likelihood of each hypothesis. Proceed by working down the matrix, trying to disprove the hypotheses rather than prove them.
6. Analyze how sensitive your conclusion is to a few critical items of evidence. Consider the consequences for your analysis if that evidence were wrong, misleading, or subject to a different interpretation.
7. Report conclusions. Discuss the relative likelihood of all the hypotheses, not just the most likely one.
8. Identify milestones for future observation that may indicate events are taking a different course than expected.

The first step of ACH is simply to identify all possible hypotheses, which Heuer

defines as, “a potential explanation or conclusion that is to be tested by collecting and

presenting evidence.”73 It is preferable to generate hypotheses in group discussion in

order to benefit from different perspectives and to reduce the likelihood that a plausible

hypothesis will not be identified.74 According to Heuer, there is no ideal number of

hypotheses for any given problem, but the number should increase relative to the level of

uncertainty.75 While identifying hypotheses, an emphasis is

73 Heuer, “Psychology,” 95.
74 Ibid, 97-98.
75 Heuer, “Psychology,” 98.


placed on distinguishing between unproven and disproved hypotheses. That is, an

unproven hypothesis has no supporting evidence, whereas a disproved

hypothesis has specific evidence against it. Heuer warns against discarding an

unproven hypothesis simply because it lacks supporting evidence. Doing so can result in

prematurely rejecting a valid hypothesis. This precaution is essential because it is

possible supporting evidence exists but has not been found yet.76

The next step requires listing all pertinent evidence and arguments for and against

each hypothesis. This list is not limited to hard evidence but also includes assumptions

and logical deductions about the topic. These are incorporated into the structured process

because they will often have a strong influence on an analyst’s final thoughts. After

creating the list, an analyst asks himself several questions which will help identify

additional evidence that might be needed. For each hypothesis, what evidence should an

analyst expect to be seeing or not seeing if it were true? Also, the analyst considers how

the absence of evidence could be an indicator itself.77 For example, in the case of possible

military attack, “the steps the adversary has not taken to ready his forces for attack may

be more significant than the observable steps that have been taken.”78

After the analyst is confident that all relevant evidence has been collected, step

three in the process requires constructing a matrix with the hypotheses listed across the top

and all evidence listed down the side. From this point, the analyst works across the matrix

one piece of evidence at a time, evaluating whether it is consistent with, inconsistent with, or

irrelevant to each hypothesis and making an appropriate notation for future reference. This

76 Ibid.
77 Heuer, “Psychology,” 99; Diane Chido, et al., 39-40.
78 Heuer, “Psychology,” 99.


process is repeated for each piece of evidence until all cells in the matrix are filled. A

second objective in step three is to evaluate the diagnosticity of each piece of evidence.

That is, the analyst evaluates its usefulness as an indicator for each hypothesis. Heuer uses a

medical analogy to demonstrate this principle. In trying to determine what illness a

patient is stricken with, a high temperature does not have high diagnosticity because

that symptom would apply to any number of illnesses. In the case of an ACH matrix,

evidence consistent with all hypotheses can be effectively useless in predicting an

outcome and therefore has low diagnosticity.79
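The matrix and the idea of diagnosticity can be illustrated with a minimal Python sketch. The illnesses, evidence items, and ratings below are hypothetical placeholders built on Heuer's medical analogy, and the rule that evidence is diagnostic only when its rating differs across hypotheses is a simplifying assumption made for illustration, not Heuer's own scoring procedure.

```python
# Hypothetical ACH matrix: one row per evidence item, one rating per hypothesis.
# Ratings: "C" = consistent, "I" = inconsistent, "N" = not applicable.
HYPOTHESES = ["Illness A", "Illness B", "Illness C"]

MATRIX = {
    "High temperature": {"Illness A": "C", "Illness B": "C", "Illness C": "C"},
    "Rash on arms": {"Illness A": "C", "Illness B": "I", "Illness C": "I"},
    "Recent travel abroad": {"Illness A": "N", "Illness B": "C", "Illness C": "I"},
}


def is_diagnostic(ratings):
    """Assumed rule: evidence is diagnostic only if its rating differs across hypotheses."""
    return len(set(ratings.values())) > 1


def refine(matrix):
    """Keep only the evidence items that have some diagnostic value."""
    return {item: ratings for item, ratings in matrix.items() if is_diagnostic(ratings)}


if __name__ == "__main__":
    # The high temperature row is consistent with every illness, so it is dropped.
    print(list(refine(MATRIX)))
```

In this sketch the high temperature row is rated the same for every hypothesis, so it would be flagged as having low diagnosticity and would be a candidate for removal when the matrix is refined.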

In the next step of the process, Heuer advises that the set of hypotheses should be

reevaluated for potential changes. After examining the evidence as it relates to each

hypothesis, it might be necessary to add, combine, or split hypotheses. According to

Heuer, this is essential because the nuances of each hypothesis will greatly affect how it

is analyzed. Additionally, evidence from step three found to have no diagnostic value is

removed from the matrix.80

After preparing and evaluating the matrix, each hypothesis is examined as a

whole and tentative conclusions are formed about the likelihood of each. The analyst

works down the matrix one hypothesis at a time, trying to disprove each with the

evidence. While no amount of consistent evidence can absolutely prove a hypothesis, a

single piece of evidence is enough to disprove it. Additionally, by disproving hypotheses,

an analyst is systematically narrowing down the possibilities until the most likely ones

are clear. The hypothesis with the least inconsistent evidence against it is viewed as the

79 Heuer, “Psychology,” 100-102.
80 Heuer, “Psychology,” 103.


most likely possibility.81 However, Heuer warns that ACH is not meant to be the absolute

analytic solution to any problem: “the matrix serves only as an aid to thinking and

analysis, to ensure consideration of all the possible interrelationships between evidence

and hypotheses and identification of those few items that really swing your judgment on

the issue.”82 In the end, the analyst must make the final call.
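The elimination logic described above can be sketched as a simple count of inconsistent ratings per hypothesis. The matrix below is hypothetical, and an unweighted count is an assumption made for illustration; it does not capture the credibility or relevance of individual evidence items, and it is no substitute for the analyst's final judgment.

```python
# Hypothetical ratings: "C" = consistent, "I" = inconsistent.
HYPOTHESES = ["H1", "H2", "H3"]

MATRIX = {
    "E1": {"H1": "C", "H2": "I", "H3": "C"},
    "E2": {"H1": "C", "H2": "I", "H3": "I"},
    "E3": {"H1": "I", "H2": "C", "H3": "I"},
    "E4": {"H1": "C", "H2": "C", "H3": "I"},
}


def inconsistency_counts(matrix, hypotheses):
    """Count how many evidence items are rated inconsistent with each hypothesis."""
    counts = {h: 0 for h in hypotheses}
    for ratings in matrix.values():
        for h in hypotheses:
            if ratings[h] == "I":
                counts[h] += 1
    return counts


def rank_hypotheses(matrix, hypotheses):
    """Order hypotheses from least to most inconsistent evidence (ties keep input order)."""
    counts = inconsistency_counts(matrix, hypotheses)
    return sorted(hypotheses, key=lambda h: counts[h])


if __name__ == "__main__":
    print(inconsistency_counts(MATRIX, HYPOTHESES))
    print(rank_hypotheses(MATRIX, HYPOTHESES))
```

Ranking by inconsistent evidence rather than by consistent evidence mirrors the emphasis on disproving hypotheses: a hypothesis is favored because little speaks against it, not because much appears to support it.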

Before finalizing the conclusion, the analyst questions the integrity of key pieces

of evidence and the repercussions if those linchpins turned out to be false, deceptive, or

misunderstood. Finally, when reporting conclusions, the analyst discusses the likelihood

of alternative possibilities and identifies circumstances which may indicate events are

unfolding differently than estimated.83
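One way to picture this sensitivity check is to remove each evidence item in turn and see whether the leading hypothesis changes. The following Python sketch does this on a hypothetical two-hypothesis matrix; the function names and the "drop one item" rule are assumptions chosen for illustration, not a procedure taken from Heuer.

```python
# Hypothetical ratings: "C" = consistent, "I" = inconsistent.
HYPOTHESES = ["H1", "H2"]

MATRIX = {
    "E1": {"H1": "C", "H2": "I"},
    "E2": {"H1": "I", "H2": "C"},
    "E3": {"H1": "C", "H2": "I"},
    "E4": {"H1": "C", "H2": "C"},
}


def least_inconsistent(matrix, hypotheses):
    """Return the hypotheses with the fewest inconsistent ratings (possibly a tie)."""
    counts = {h: sum(1 for r in matrix.values() if r[h] == "I") for h in hypotheses}
    best = min(counts.values())
    return [h for h in hypotheses if counts[h] == best]


def linchpins(matrix, hypotheses):
    """List evidence items whose removal changes the set of leading hypotheses."""
    baseline = least_inconsistent(matrix, hypotheses)
    critical = []
    for item in matrix:
        reduced = {e: r for e, r in matrix.items() if e != item}
        if least_inconsistent(reduced, hypotheses) != baseline:
            critical.append(item)
    return critical


if __name__ == "__main__":
    print(least_inconsistent(MATRIX, HYPOTHESES))
    print(linchpins(MATRIX, HYPOTHESES))
```

Items flagged this way are the ones whose reliability deserves a second look before conclusions are reported, since the leading hypothesis depends on them.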

Strengths and Weaknesses

The methodology’s primary apparent strength is its ability to mitigate cognitive

biases such as satisficing. The ACH process is a structured, systematic methodology for

identifying all the possibilities and evidence, and determining the relation between all

information as a whole. By structuring the cognitive process, estimation and forecasting

will be less susceptible to flaws inherent in human cognition.84

81 Heuer, “Psychology,” 103-104.
82 Ibid, 105.
83 Ibid, 105-107.
84 Kristan J. Wheaton, D.E. Chido, and McManis and Monsalve Associates, “Structured Analysis of Competing Hypotheses: Improving a Tested Intelligence Methodology,” Competitive Intelligence Magazine, November-December 2006, http://www.mcmanis-monsalve.com/assets/publications/intelligence-methodology-1-07-chido.pdf (accessed 14 June 2008).


Another apparent strength of ACH is its usefulness as a management tool. The

design of the ACH matrix illuminates evidence and hypotheses side by side, acting as an

analytic “audit trail,” for any supervisory analyst or decision maker to take advantage of.

This benefits the analyst, who can visually explain his or her thought process, and also

the manager, by aiding reviews of analytical judgments.85

While ACH is widely assumed to be a useful methodology, it has its weaknesses

as well as its strengths. The main weakness of ACH is that it can be time consuming.

While an analyst is often under time constraints, filling out an ACH matrix can be

tedious.86 However, several computer software companies, such as the Palo Alto

Research Center (PARC), have developed programs which automate the ACH

process.87 While ACH can still be a lengthy process, these computer programs have

helped make applying the methodology less time consuming.

Another weakness of ACH is the difficulty of incorporating information from ongoing

events, which limits it to being “only a snapshot in time.”88 As analysts are under time

constraints, they must force themselves to stop adding evidence into the matrix and begin

creation of their final analytic product, even if new information is available.89

Previous Studies on ACH

Quantitative studies on ACH have produced mixed findings regarding its

effectiveness as an analytic methodology, both for accuracy and mitigating cognitive

85 Marrin, 7.
86 Kristan Wheaton, et al., 13.
87 Palo Alto Research Center, “ACH2.0 Download Page,” http://www2.parc.com/istl/projects/ach/ach.html (accessed August 19, 2008).
88 Diane Chido, et al., 50.
89 Ibid.


biases. More studies are necessary because only a limited number have been conducted

so far. Additionally, testing ACH under varying conditions will help shed light on how

these conditions affect its performance.

In 2000, Robert D. Folker concluded in his paper, Intelligence Analysis In Joint

Intelligence Centers: An Experiment in Applying Structured Methods, that “…

exploitation of a structured methodology will improve qualitative intelligence analysis.”90

In his study, conducted in conjunction with the Joint Military Intelligence College

(JMIC), Folker tested the accuracy of hypothesis testing, a structured method nearly

synonymous with Heuer’s ACH. The researcher measured this by comparing the

accuracy of two groups, one using hypothesis testing and one using an unstructured,

intuitive approach to the same two intelligence scenarios.91 The experimental group

performed slightly better in the first scenario using hypothesis testing, but the difference

was not statistically significant. However, the difference between control and

experimental groups was statistically significant in the second scenario. Overall,

participants using hypothesis testing performed better than those using intuitive

analysis.92

Folker also notes that many experimental group participants “had difficulty

identifying all of the possible hypotheses and determining the consistency of each piece

of evidence with each hypothesis.”93 Because of this observation, Folker acknowledges

that the effectiveness of structured methods depends heavily on the type of problem and

90 Folker, 29.
91 Ibid, 15.
92 Ibid, 29.
93 Ibid, 30.


the training of each analyst. However, he concludes that an adequately trained analyst

and a structured methodology can improve intelligence analysis:

Analysis involves critical thinking. Structured methodologies do not perform the analysis for the analyst; the analyst still must do his own thinking. But by structuring a problem the analyst is better able to identify relevant factors and assumptions, formulate and consider different outcomes, weigh different pieces of evidence, and make decisions based on the available information. While exploiting a structured methodology cannot guarantee a correct answer, using a structured methodology ensures that analysis is performed and not overlooked. 94

The MITRE Corporation conducted a study in 2004 on how ACH affects

confirmation bias and the anchoring effect. They define the anchoring effect as the

“tendency to resist change after an initial hypothesis is formed.”95 The study compared

groups working on the same intelligence problem, one group with ACH and one group

without. They found ACH users were just as susceptible to confirmation biases as non-

ACH users, except in special circumstances. ACH did not help mitigate an anchoring

effect, but the researchers admit this result is unreliable due to testing conditions.96 A

pattern of evidence distortion was present in both ACH and non-ACH groups, but this is

negligible because the data did not conclusively link it to actual confirmation bias.97 Lastly, a

weighting effect was present in the study and ACH helped mitigate this, but only with

users less experienced in intelligence analysis.98 The researchers’ final conclusion is that

although “ACH is intended to mitigate confirmation bias in intelligence analysts…there

is no evidence that ACH reliably achieves this intended effect.”99

94 Folker, 33.
95 B. Cheikes et al., Confirmation Bias in Complex Analyses (Bedford, MA: MITRE, 2004), 9.
96 Ibid, 9.
97 Ibid, 12.
98 Ibid, iii.
99 B.A. Cheikes, et al., 16.


In 2004, Jean Scholtz conducted an evaluation of ACH with six Naval Reservists,

who used both intuitive analysis and ACH to solve different intelligence problems. All

participants were tasked with two intelligence problems, using intuitive analysis for the

first and ACH for the second. After completing both problems, Scholtz administered a

questionnaire to all participants regarding their experience with ACH. The answers from

these questionnaires were overwhelmingly positive toward ACH. Among the answers

provided by participants were that they felt ACH improved their analysis, it was easy to

use, and they would be inclined to use it in the future.100 The quantitative data suggested

that ACH helps users consider more hypotheses and incorporate more evidence.101

In 2006, Peter Pirolli conducted an experiment on ACH in an intelligence

classroom at the Naval Postgraduate School (NPS). Pirolli split students at the NPS into

two groups: those analyzing a problem using ACH on paper, and those using computer-

assisted ACH. In his final paper, Assisting People to Become Independent Learners in

the Analysis of Intelligence, Pirolli concluded there was little difference between ACH used on

paper and computer-assisted ACH.102 Also, post-experiment reviews from participants

were positive about the application of ACH.103

Hypotheses

Taking into consideration the purpose and purported benefits of ACH, as well as

previous literature and studies pertinent to the subject, I developed a series of testable

100 Jean Scholtz, Analysis of Competing Hypotheses Evaluation (Gaithersburg, MD: National Institute of Standards and Technology, 2004), 1.
101 Ibid, 12.
102 Peter Pirolli, Assisting People to Become Independent Learners in the Analysis of Intelligence (Palo Alto Research Center, Calif.: Office of Naval Research, 2006), 63.
103 Ibid.


hypotheses. My first hypothesis is that participants using ACH will, as a group, produce

more accurate forecasts regarding the assigned task than those using intuitive analysis.

The second hypothesis is that evidence of cognitive biases and mindsets will be more

prevalent among those using intuitive analysis, but less so among those using ACH

because of its ability to mitigate such phenomena.

METHODOLOGY

Research Design

This experiment was designed with a control and experimental group and

conducted over the course of two weeks in October 2008. Both groups were tasked to

forecast the result of the 2008 Washington State gubernatorial election, which occurred

on November 4, 2008. However, participants in the experimental group were instructed

to use ACH to structure their analysis. Also, participants were organized into control and

experimental groups by political affiliation so that the effects of mindsets, if present,

Figure 3.1


could be measured between groups. Furthermore, the use of evidence among all

participants would be used to ascertain the presence and effects of confirmation bias.

Unlike many experiments where participants’ commitment involved a single sit-down

session to complete a task, this experiment gave participants a full week to

complete the assignment at their own convenience and the freedom to

collect any open source information they viewed as relevant to the tasking. I

structured the experiment in this way to create a less artificial environment for

participants and one more similar to that in which most intelligence analysts work.

Participants

Participants in the experiment were undergraduate and graduate students from the

Mercyhurst College Institute for Intelligence Studies (MCIIS). A total of 70 students

participated in the experiment, with 38 in the control group and 32 in the experimental

group. All class years were well represented in the experiment as a whole, including a

markedly higher number of juniors and first year graduate students (See Figure 3.1). The

distribution of class years within each group was nearly even, except for a higher number

of first year graduate students in the control group and a higher number of second year

graduate students in the experimental group (See Figure 3.2). I placed nearly all first year

Figure 3.2

Figure 3.3


graduate students in the control group because they lacked experience in ACH at the

time. I placed most second year graduate students in the experimental group in order to

even out the distribution of graduate students among both groups.

Although I did not require all participants to use ACH in their tasking, I did

require that all participants had used the methodology at least once before participating in

this experiment (first year graduate students being an exception). This was done mostly

for ease in assigning participants to control and experimental groups. This is also why

freshmen students were not permitted as participants, because they had not yet used the

methodology in any of their academic coursework. The exclusion of freshmen students also likely ensured an overall

more mature and experienced pool of participants.

In total, noticeably more students identified as Republicans than as Democrats

(See Figure 3.3). In the control group, the ratio of Republicans to Democrats was around

1.5:1. In the experimental group, this ratio was nearly 2:1. Although an even number of

Republicans and Democrats in both groups would have been ideal, the circumstances

surrounding participant recruitment did not allow me to be overly selective.


Procedures

I spent two weeks prior to conducting the experiment visiting classes to recruit

intelligence students as participants. While recruiting, I briefly explained what my

research was on, the time and work required, and the benefits for those who participated.

The primary benefit offered was that some professors were willing to assign extra credit

to those students who volunteered to participate. After giving my brief presentation on

the experiment, I handed out and collected sign-up sheets from those who were interested

(See Appendix A). The sign-up sheets requested contact information, class year, political

affiliation, and preference for four different time slots to participate in the experiment.

After collecting sign-up sheets and finishing recruitment, I e-mailed all students with their

assigned time slot for the experiment. I assigned time slots myself rather than letting

participants choose them so I could ensure a fairly even distribution of Republicans and

Democrats among the control and experimental groups.
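For readers who want a concrete picture of balancing affiliation across groups, the following Python sketch alternates assignments within each affiliation. It is only an illustration under stated assumptions: the participant labels are hypothetical, and the alternating rule is a stand-in, since assignment in this experiment was done by hand through time slots and also took class year and prior ACH experience into account.

```python
from collections import defaultdict

# Hypothetical sign-up data: (participant label, stated political affiliation).
SIGNUPS = [
    ("P1", "Republican"),
    ("P2", "Republican"),
    ("P3", "Democrat"),
    ("P4", "Republican"),
    ("P5", "Democrat"),
    ("P6", "Republican"),
]


def assign_groups(signups):
    """Alternate control/experimental within each affiliation to keep the mix similar."""
    by_affiliation = defaultdict(list)
    for name, affiliation in signups:
        by_affiliation[affiliation].append(name)

    assignment = {}
    for names in by_affiliation.values():
        for index, name in enumerate(names):
            assignment[name] = "control" if index % 2 == 0 else "experimental"
    return assignment


if __name__ == "__main__":
    for name, group in assign_groups(SIGNUPS).items():
        print(name, group)
```

With these six hypothetical sign-ups, each group receives two Republicans and one Democrat, so the ratio of affiliations is the same in both groups.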

While recruiting, I told students my thesis topic was “structured analytical

methods,” rather than ACH. All students who participated had used ACH at least once

through coursework in the Intelligence Studies program and were familiar with the

methodology’s purpose of mitigating cognitive bias. If I had emphasized the use of the


methodology while recruiting, it might have ruined the integrity of the experiment’s

results by giving students insight into the purpose of the experiment.

At the beginning of each tasking session, I handed out the Consent Form for each

participant to sign and return to me (See Appendix B). This Consent Form explained the

purpose of the experiment, what participation entailed, that there were no anticipated

dangers or harmful effects associated with participating, and that participants could discontinue

participation at any time without penalty. After collecting Consent Forms, I handed out

experiment packets containing their tasking, answer sheet, and other relevant information

(See Appendix C). I reviewed the packet with them, explained their tasking, what was

expected during their participation, and discussed other issues related to successful

completion of the experiment. Specifically, I reviewed concepts relevant to the tasking

such as words of estimative probability (WEP), analytic confidence, and source

reliability.

At the end of the tasking session, participants were instructed on procedures for

returning their answer sheets for the experiment. Over the course of the next week and a

half, I, along with a colleague who offered his assistance, collected answer sheets from

participants who finished the experiment. Upon returning their answer sheet, participants

received a debriefing statement and a post-experiment survey. The debriefing statement

thanked students for participating, explained the purpose of the experiment in further

detail, as well as how this research would contribute to the body of academic work in

their field (See Appendix D). There were two different post-experiment surveys given to

participants, one for the control group and one for the experimental group (See Appendix E). The

surveys asked questions related to how much time and work were spent on the experiment,


estimated difficulty, as well as their understanding of the assigned task. The survey for

the experimental group also included questions about their understanding of ACH. The

purpose of these surveys was to give me feedback for structuring a future attempt if the

experiment was not successful.

Control Group

After attending the tasking session, control group participants had a full week

from that date to complete their assigned task. This task was to assume the role of a

political analyst working for a fictional news company and forecast the result of the

upcoming 2008 Washington State gubernatorial election. The two hypotheses implicitly

provided in the tasking were:

● The incumbent governor, Christine Gregoire (D), will win the election.

● The challenger, Dino Rossi (R), will win the election.

Participants received some basic background information about the election and its

candidates, and were encouraged to use all available open source information, but were

specifically instructed to use intuitive analysis. On the provided answer sheet,

participants were tasked to include an estimative statement summarizing their analysis.

The answer sheet also included a place to further explain their analytical findings, but this

was not required. The words of estimative probability (WEP) used in the experiment

were primarily based on those used by the National Intelligence Council (See Figure 3.4).

However, there were some slight modifications to accommodate the needs of the

experiment. First, the most central expression of likelihood, “even chance,” was removed.

Figure 3.5 – NIC Words of Estimative Probability

Figure 3.4 – Experiment Words of Estimative Probability

The research design of this experiment required an analytical problem where the

likelihood of both hypotheses was so similar that, in this case, politically oriented

mindsets could tip participants’ forecast. Because the result of the election would be

difficult to call, I knew that a high number of participants would be tempted to select a

centrist/neutral expression of likelihood. Although this selection may be legitimate, it

would have likely skewed the results because a high number of participants would have

supplied an answer useless to the research question. The second modification was adding

a level of likelihood between “likely” and “almost certain,” as well as its negative

equivalent on the opposite end of the scale. This scale more closely resembles the WEP scale used

by the students at Mercyhurst, and I also felt it was more appropriate for the topic being

analyzed (See Figure 3.5). Although the Washington State gubernatorial election was

expected to be very close, I felt some participants still might desire to indicate a level of

likelihood greater than “likely,” but not “almost certain.”

Participants’ tasking also included assigning low, medium, or high for an

indication of overall source reliability. Although participants were already familiar with the concept of

source reliability, their tasking sheet included a short explanation. For analytic

confidence, I required participants to use a continuum-like scale rather than a numeric

scale (See Figure 3.6).

Figure 3.6 – Continuum-like Scale

Lastly, I provided control group participants with suggestions for beginning their

research. This included a non-partisan website containing basic information about

Washington State politics and links to related resources. Additionally, since MCIIS

students are not familiar with forecasting domestic elections, I provided a list of types of

evidence that could be useful indicators for the result of a gubernatorial election (See

Appendix C).

Experimental Group

Tasking for the experimental group was identical to the control group except that

participants were required to use the Palo Alto Research Center (PARC) ACH 2.0

software to create an ACH matrix for their analyses. They were instructed to print out this

matrix and return it along with their answer sheet. During their tasking session, I

reviewed and discussed ACH to ensure everyone’s understanding of the methodology

was fresh and accurate.
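
The kind of matrix participants produced can be sketched in a few lines of Python. This is a minimal illustration only, not the PARC ACH 2.0 software or its scoring algorithm. The evidence labels and ratings are hypothetical placeholders, and the scoring rule (count the evidence rated inconsistent with each hypothesis and prefer the hypothesis with the fewest inconsistencies) is a simplified assumption drawn from the general ACH principle of weighing disconfirming evidence.

```python
# Hypothetical ACH-style matrix: each evidence item is rated against each
# hypothesis as "C" (consistent), "I" (inconsistent), or "N" (neutral).
# Evidence labels and ratings are placeholders, not data from the experiment.
HYPOTHESES = ["Gregoire (D) wins", "Rossi (R) wins"]

MATRIX = {
    "Evidence item 1": ["C", "I"],
    "Evidence item 2": ["I", "C"],
    "Evidence item 3": ["C", "I"],
    "Evidence item 4": ["N", "C"],
}


def inconsistency_counts(matrix, hypotheses):
    """Count how many evidence items are rated inconsistent with each hypothesis."""
    counts = {h: 0 for h in hypotheses}
    for ratings in matrix.values():
        for hypothesis, rating in zip(hypotheses, ratings):
            if rating == "I":
                counts[hypothesis] += 1
    return counts


def least_inconsistent(matrix, hypotheses):
    """Return the hypothesis with the fewest inconsistent ratings (ties: first listed)."""
    counts = inconsistency_counts(matrix, hypotheses)
    return min(hypotheses, key=lambda h: counts[h])


if __name__ == "__main__":
    print(inconsistency_counts(MATRIX, HYPOTHESES))
    print(least_inconsistent(MATRIX, HYPOTHESES))
```

Because every item must be rated against both hypotheses, a matrix of this shape makes one-sided evidence collection visible at a glance.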

Data Analysis

The primary question of this research is whether or not ACH increases forecasting

accuracy. I sought to answer this question simply by comparing the control and

experimental groups to see if there was a significant difference between the accuracy of

their forecasts. The secondary question is whether or not ACH helps mitigate the effects

of cognitive bias and mindsets in users. If the results yield discernible patterns in

participants’ forecasts as related to their political affiliations, this would likely be an

indicator of a politically oriented mindset. Also, if participants overwhelmingly supplied

evidence only in favor of their forecasted candidate, this would suggest the presence of

confirmation bias, specifically. If such patterns existed in the control group but were less

pronounced or non-existent in the experimental group, this would suggest ACH helps

mitigate confirmation bias.

All data pertaining to the above research questions was tested for statistical

significance using a program called Statistical Package for the Social Sciences (SPSS).

Derived from a series of mathematical formulas and tests, statistical significance is assessed with a P-value: the

likelihood of observing a difference between control and experimental group data at least as large as the one found if that difference were the result of

mere coincidence. The SPSS tests for all data sets were placed at a 5 percent (.05)

threshold for statistical significance. That is, to achieve statistical significance, the P-value

must be 5 percent or less.
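
As a rough illustration of how such a test works, the sketch below computes a two-sided P-value for a 2x2 table of correct and incorrect forecasts. It uses Fisher's exact test, which is swapped in here for illustration only; the specific SPSS procedure used for this data is documented in Appendix F, not here. The counts are hypothetical and are not the experiment's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g., control, experimental); columns are outcomes
    (e.g., correct, incorrect). Returns the probability, with row and column
    totals held fixed, of any table at least as unlikely as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    ALPHA = 0.05  # the threshold used in this study
    # Hypothetical counts: 12 of 20 correct in one group, 15 of 20 in the other.
    p_value = fisher_exact_two_sided(12, 8, 15, 5)
    print(f"P = {p_value:.3f}; significant at .05: {p_value <= ALPHA}")
```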

Figure 4.1

RESULTS

Accuracy

At the end of the 2008 Washington State Gubernatorial Election, the incumbent

Democrat, Christine Gregoire (D), defeated the Republican challenger, Dino Rossi (R),

by a margin of 6.4 percentage points.104 After compiling and analyzing the results,105 I

found that accuracy improved from the control to experimental group by 9 percentage

points. In the control group, 61 percent of participants forecasted accurately in favor of

the eventual winner, Gregoire (See Figure 4.1). Accuracy in the experimental group

improved slightly with 70 percent of participants forecasting Gregoire (D) as the winner.

Statistical testing found that the data on accuracy is not statistically significant,

having a P-value of .421 (See Appendix F). While this testing does not definitively

invalidate these experiment results, it does raise some doubt about their validity. Two

factors that could have prevented statistical significance are the small sample size and the

small difference between the control and experimental group data.
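
The effect of sample size on significance can be illustrated with a short calculation. The sketch below applies a pooled two-proportion z-test (an assumption made for illustration, not necessarily the SPSS procedure used in this study) to the reported accuracy rates of 61 and 70 percent, using hypothetical group sizes. It suggests that the same 9-point difference would cross the .05 threshold only with far larger groups.

```python
import math


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided P-value for a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Group sizes are hypothetical; only the 61% and 70% rates come from the text.
    for n in (30, 200, 500):
        p = two_proportion_p_value(0.61, n, 0.70, n)
        print(f"n = {n} per group: P = {p:.3f}, significant at .05: {p <= 0.05}")
```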

Furthermore, there is good reason to believe that the difference in accuracy

between the control and experimental groups in such an experiment should not be that

great. Although many criticisms of the human thought process are valid, intuitive analysis

104 Washington Secretary of State. November 4, 2008 General Election. http://vote.wa.gov/elections/wei/Results.aspx?RaceTypeCode=O&JurisdictionTypeID=2&ElectionID=26&ViewMode=Results.

105 These results exclude two outliers and contain one data correction in the experimental group.

is not obsolete. For an experiment like this one, a structured method should only improve

overall forecasting accuracy incrementally since intuitive analysis is, for the most part, an

effective method itself. Additionally, if and when cognitive bias affects an analyst’s

intuitive thought process, structured methods such as ACH can serve as a countermeasure.

In other words, a structured method will not improve the analysis of all users. In sum, the

improvement of the group using ACH should not be discounted because it is modest.

This difference is expected and still supports the notion that ACH can improve analysis.

Mindsets

As discussed in the previous section, if a politically oriented mindset is present, it

should manifest itself in the results by a strong tendency of participants to forecast in

favor of the candidate associated with their own political affiliation. However, if ACH

helps mitigate this, this tendency should be less prominent. For example, if forecasts

among Republicans are significantly more in favor of Rossi (R) in the control group, but

more in sync with the actual winner of the election in the experimental group, this would

suggest that ACH helped mitigate the effect in that group. The same should hold true for

Democratic participants. However, interpreting the results will be subject to the winner of

the election. In this case, such a mindset among Democrats will be more difficult to

identify and evaluate because the Democratic candidate won. Data comparing forecasts

between Democrats and Republicans in the control and experimental groups is depicted

in Figure 4.2.

Figure 4.2

Among Democrats, the percentage of participants who forecasted in favor of

Gregoire (D) compared to Rossi (R) was strongly in favor of Gregoire and remained

nearly identical from the control to experimental group. While this might suggest the

effects of a mindset were prevalent in both groups, the more likely explanation is simply that Democrats

overwhelmingly forecasted correctly in both groups. Unfortunately, this makes it difficult

to estimate the number of Democrats whose forecasts were subject to a mindset.

Any such Democrats are hidden among the total

number of Democrats who forecasted accurately in favor of Gregoire (D).

Analyzing Republican forecasts in the control and experimental groups yields

more discernible results. In the control group, the proportion of forecasts between

candidates was nearly equal, with only a 4 percentage point margin favoring Gregoire (D).

However, this proportion changed dramatically in the experimental group with the

margin expanding to 36 percentage points. This suggests it is likely that ACH helped

mitigate a politically oriented mindset among Republicans in the experimental group. It is

likely that Republicans’ thought process in the control group was heavily influenced by

their political leanings and preference for the Republican candidate, while ACH mitigated

these effects among some users in the experimental group.

Additionally, although 32 percent of experimental group Republicans forecasted

incorrectly in favor of Rossi, they displayed better calibration than their counterparts in

the control group. That is, they were arguably less wrong. Tetlock defines calibration as

“the degree to which subjective probabilities [analytic estimate] are aligned with

objective probabilities.”106 Although their estimates were wrong, their matrices generally

indicated a lower level of likelihood than that of the control group analyses. Of the 32

percent of Republicans who still got it wrong with ACH, the methodology arguably

brought them closer to forecasting correctly than those in the control group.
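
One way to make "less wrong" concrete is a scoring rule. The sketch below uses the Brier score, a standard squared-error measure for probability forecasts; it is swapped in here for illustration and is not Tetlock's calibration index. The numeric values attached to each WEP are assumptions chosen only for the example, not values taken from the experiment's scale.

```python
def brier_score(forecast_probability, outcome):
    """Squared error of a probability forecast for a binary event (0 is best, 1 is worst)."""
    return (forecast_probability - outcome) ** 2


# Assumed, illustrative mapping from WEPs to probabilities (not the thesis's scale).
ASSUMED_WEP_PROBABILITY = {
    "likely": 0.70,
    "almost certain": 0.95,
}

# Rossi (R) lost the election, so the event "Rossi wins" did not occur.
ROSSI_WON = 0

if __name__ == "__main__":
    for wep, probability in ASSUMED_WEP_PROBABILITY.items():
        score = brier_score(probability, ROSSI_WON)
        print(f"Forecast 'Rossi {wep} to win': Brier score {score:.4f}")
```

Under these assumptions, an incorrect forecast stated as "likely" is penalized less than the same incorrect forecast stated as "almost certain."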

Like the dataset on accuracy, this data did not meet the standard for statistical

significance, having P-values of .973 and .291 for Democrats and Republicans,

respectively (See Appendix F). However, also like the dataset on accuracy, this is likely

attributable to the even smaller sample size. Breaking down participants into Democrats

and Republicans in the control and experimental groups essentially cut the sample size of

each dataset in half, making it difficult to extract statistically significant results.

Furthermore, for the statistical testing on accuracy and mindsets, it is important to

consider appropriate standards for significance with different types of research. Although

the threshold for statistical significance was set at the general standard (p=.05), it is

acceptable to interpret statistical results less stringently in exploratory research. Although

the statistical results for mindsets among Republicans would not satisfy even the more lenient

standard acceptable for exploratory research (.10), the P-value of .291 is still

notable for its proximity.107 This P-value means that a difference at least this large would arise about 29 percent

of the time by chance alone, which is weak evidence but suggests that further research, with

larger data sets, is warranted.

106 Tetlock, 47.

Figure 4.3

Confirmation Bias

Comparing the levels of consistent and inconsistent evidence between groups

clearly reveals confirmation bias among participants in the control group. As discussed

earlier, confirmation bias is the tendency “for people to seek information and cues that

confirm the tentatively held hypothesis or belief, and not seek (or discount), those that

support an opposite conclusion or belief.”108 Regardless of political affiliation or forecast,

107 David G. Garson, Guide to Writing Empirical Papers, Theses, Dissertations (New York: Marcel Dekker, Inc., 2002), 199.

108 Wickens and Hollands, 312.

Independent Samples Test: Confirmation Bias

                               Levene's Test for
                               Equality of Variances    t-test for Equality of Means
                               F        Sig.            t        df       Sig. (2-tailed)
Equal variances assumed        5.940    .018            -7.851   60       .000
Equal variances not assumed                             -7.772   52.783   .000

Figure 4.4 – SPSS Testing Results for Confirmation Bias

80 percent of all participants in the control group provided evidence in their answer

sheets that entirely supported their forecasted candidate.109 On the other hand, only 9

percent of experimental group participants exhibited this behavior. The ACH matrices of

these participants show that both hypotheses were considered with varying proportions of

consistent and inconsistent evidence. Furthermore, SPSS testing on confirmation bias

revealed a statistically significant difference between control and experimental group

data, with the P-value reported as .000 (see Figure 4.4). In other words, according to the

calculations of the SPSS program, the chance that results like these would arise from

coincidence alone rounds to zero at three decimal places (less than .0005). This data suggests ACH tremendously

helped mitigate confirmation bias.
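
A P-value printed as .000 is a rounded figure, not a literal zero. The sketch below is an independent check rather than a reproduction of SPSS's internal routine: it numerically integrates the tail of Student's t distribution for the reported statistic (t = -7.851, df = 60). The integration limits and step count are assumptions chosen for adequate accuracy.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t, df, upper=200.0, steps=200_000):
    """Approximate P(|T| >= |t|) by integrating the upper tail with Simpson's rule.

    The tail beyond `upper` is ignored (negligible for moderate df), and
    `steps` must be even for Simpson's rule.
    """
    lower = abs(t)
    width = (upper - lower) / steps
    total = t_pdf(lower, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lower + i * width, df)
    return 2 * (total * width / 3)


if __name__ == "__main__":
    p = two_sided_p_value(-7.851, 60)
    print(f"two-sided P = {p:.3e}; prints as {p:.3f} at three decimal places")
```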

Other Findings of Interest

109 This data excludes eight outliers. These outliers were participants who did not provide any evidence whatsoever along with their estimative statement.

Table 4.1

Comparing the average number of pieces of evidence used by each group in

creating their estimate reveals a staggering difference and suggests something about the

ability of ACH to encourage users to seek out and use more information (see Table 4.1).

In the control group, participants used on average fewer than 3 pieces of evidence for

their analysis. On the other hand, participants in the experimental group used on average

10 pieces of evidence. This is almost certainly attributable to one of the weaknesses of

intuitive analysis and one of the strengths of ACH. One flaw of intuitive analysis is that

the human thought process is constrained by the inability to process more than a handful

of individual pieces of information at a time.110 Given this, analysts will often make a

judgment unaware that they are using an inadequate amount of information. On the other

hand, a structured method such as ACH allows a user to visualize all the information at

the same time. This not only can increase accuracy by allowing the user to better

understand the relationship of all the evidence, but also makes it easier for an analyst to

identify information gaps. As the concept applies to this experiment, I believe

participants using intuitive analysis included fewer pieces of evidence in their analysis

because, using cognition alone, they were far less likely to identify information gaps

and also maintained a false sense of confidence in their collection before making a

forecast. For those using ACH, on the other hand, the matrix aided in both identifying

information gaps and dispelling any false sense of confidence regarding the amount of

evidence used.

110 George A. Miller, “The Magical Number Seven—Plus or Minus Two: Some Limits on our Capacity for Processing Information,” The Psychological Review, Vol. 63, No. 2 (March 1956): 1-12.

Group           Avg. # of pieces of evidence used
Control         2.9
Experimental    10.1

Figure 4.5

There were no discernible patterns in the words used to describe the estimative

probability assigned to the results (the WEPs) among the control and experimental groups

related to cognitive bias. As can be seen in Figure 4.5, participants in both groups

overwhelmingly used “likely” as the WEP in their estimative statement. I expected this

result because of the close nature of the election. Average analytic confidence among

both groups was very close, with the control group averaging 6.1 on a scale of 10 and the

experimental group averaging 5.9. Analyst assessments of source reliability were very

similar among both groups and sub-groups within, with an overwhelming number of

participants rating their overall source reliability as “medium,” on a low-medium-high

scale. This consistency likely has less to do with the method and more to do with the

analysts’ incomplete understanding of these concepts.

Summary of Results

The findings discussed in this section suggest that ACH is modestly effective for

improving accuracy and very effective at reducing the effects of mindsets and cognitive

bias in intelligence analysis. ACH slightly improved accuracy among users in the

experimental group. Among Republicans, ACH appeared to mitigate the effects of a

politically oriented mindset regarding the Republican candidate. This was not the case

with Democrats, but this was likely because the Democratic candidate won the election,

hindering the ability to discern any difference between the control and experimental

groups. Regarding the use of evidence, ACH users incorporated substantially more

evidence into their analysis and applied it more appropriately. Specifically, a tendency

among nearly all control group participants to only incorporate evidence in favor of their

forecasted candidate strongly suggests confirmation bias. This, however, appeared to be

substantially mitigated by ACH.

CONCLUSION

The main purpose of this study was to ascertain whether or not ACH is effective

for estimation and forecasting in intelligence analysis. The secondary purpose was to

determine whether or not the methodology is effective for mitigating cognitive bias and

other phenomena detrimental to intelligence analysis. While most of these results are not

definitive, they all support the notion that ACH can improve intelligence analysis.

The results of this experiment revealed that ACH improved forecasting accuracy,

but only modestly. With the exception of one component of Folker’s experiment, where

ACH/hypothesis testing performed drastically better than intuition, the minute difference

in accuracy between the control and experimental groups in this study is consistent with

all other testing on the methodology’s accuracy.

A common variable in these experiments was that the objective likelihoods

of the given hypotheses were very close. On the other hand, in the component of Folker’s

experiment where ACH/hypothesis testing performed drastically better, it was clear that

one of the given hypotheses was much more likely than the others.111 This suggests,

perhaps, that ACH is less effective with those problems where the objective probabilities

of each hypothesis are roughly equal and more effective when they are somewhat more uneven.

This inference helps us identify when ACH is most appropriate to use. In this

case, the results on accuracy have shed light on the utility of the methodology with

problems subject to varying objective probabilities among the given hypotheses. This

experiment and previous ones already suggest that ACH is less useful where those

probabilities are roughly equal. On the other end of the spectrum, when those

111 These facts are derived from observing Folker’s a priori evaluation of the intelligence scenarios and given evidence.

Figure 5.1

probabilities are very clear, a structured methodology is obviously unnecessary. To be

specific, the accumulated data suggests ACH may only be effective where the objective

probability of the most likely hypothesis is at least 10-15 percentage points above the

next most likely hypothesis. Such a probabilistic “distance” should allow the rough tool

that ACH is (compared to more refined statistical measurements) to distinguish the more

likely hypothesis from the less likely ones. On the other hand, as the objective probability

of the most likely hypothesis rises more than 30-45 percentage points above the next most likely

hypothesis, ACH or, indeed, any structured method will become increasingly

unnecessary. The differences between the two hypotheses will be “visible to the naked

eye,” in a manner of speaking. The graph in Figure 5.1 demonstrates this concept for a

two-hypothesis scenario.
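
The suggested ranges can be expressed as a simple rule of thumb. The sketch below is an illustrative reading of the paragraph above, not a validated model: the function name and band labels are hypothetical, and the cut-offs (10, 15, 30, and 45 percentage points) are taken from the ranges stated in the text, with the spans between them treated as transitional.

```python
def ach_utility_band(p_most_likely, p_next_most_likely):
    """Classify a scenario by the gap, in percentage points, between the two
    most likely hypotheses. Probabilities are given in percent (0-100)."""
    gap = p_most_likely - p_next_most_likely
    if gap < 0:
        raise ValueError("p_most_likely must be at least p_next_most_likely")
    if gap < 10:
        return "too close: ACH unlikely to separate the hypotheses"
    if gap < 15:
        return "lower transition: ACH may begin to help"
    if gap <= 30:
        return "ACH most likely to be useful"
    if gap <= 45:
        return "upper transition: ACH increasingly unnecessary"
    return "clear: structured method likely unnecessary"


if __name__ == "__main__":
    # Two-hypothesis scenarios, so each pair of probabilities sums to 100.
    for pair in [(52, 48), (60, 40), (70, 30), (80, 20)]:
        print(pair, "->", ach_utility_band(*pair))
```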

Practically, implementing this suggestion is difficult if not impossible. Assigning

objective probabilities to realistic intelligence scenarios is fraught with difficulty. That

said, this suggestion may well provide avenues for future research into the utility of

ACH. Given this idea, a number of future experiments could be designed to shed further

light on ACH’s utility in varying circumstances. A subsequent experiment could test the

methodology’s utility with two hypotheses when the objective probabilities are more

uneven, such as 70-30 percent. Another varying condition could be the number of

hypotheses. The analytic problem in this experiment contained only two hypotheses;

however, future experiments could test ACH against a problem with more than two

hypotheses that has any set of objective probabilities.

ACH also appeared to mitigate the effects of politically oriented mindsets among

some participants; however, this is uncertain because of the conditions for measuring

such an effect. Overall, I was surprised that the difference was not more

pronounced. Given the nature of the analytic problem and the

close objective probabilities of each hypothesis, I confidently expected that politically oriented mindsets

would be present and would tip the balance in many participants’ forecasts. This

appeared to be the case with Republican participants, but at a far smaller magnitude than

expected. Anecdotally, I feel that the disparity in evidence used by participants was partly

responsible for this result.

For future tests like this one, an overall larger sample size would also be

beneficial since these tests required breaking down participants further into subsets

within each group, creating even smaller data sets and decreasing their reliability. This

suggestion is not meant to cast doubt on the interpretation that ACH helped mitigate the

influence of politically oriented mindsets, but instead is meant as an explanation as to

why this tendency was less evident than expected. The influence of mindsets was present,

but I believe a similar test with a larger sample size would likely have

produced a result more commensurate with my original expectation.

Confirmation bias was clearly evident among those using intuitive analysis in the

control group. On the other hand, the near non-existence of this in the experimental group

suggests ACH substantially reduced this bias in the experimental group. This finding is

unique and unlike previous studies in several ways. First, the method of measuring and

discerning such an effect is vastly different from that of Cheikes et al. Rather than

focusing on evidence distortion for discerning the presence of confirmation bias, I

derived my conclusion solely from the comparative use of evidence and how it

related to analysts’ forecasts. This is more in line with Wickens and Hollands’

definition of confirmation bias, which emphasizes the idea of seeking and incorporating

information that supports a preferred hypothesis and ignoring or discrediting evidence

unfavorable to a preferred hypothesis. Lastly, the substantial difference between the two

groups is also unlike any other finding on ACH and confirmation bias. This difference

demonstrates that ACH is excellent for encouraging analysts to incorporate and weigh a

variety of discordant evidence against multiple hypotheses.

Overall, the differences in evidence between those using intuitive analysis and

those using ACH were staggering, not just in how the evidence was used but even

simply in the amount of evidence used. ACH users incorporated a substantially higher

average number of pieces of evidence. This suggests that their analyses were overall

more thorough and comprehensive than those of analysts using intuition.

These findings also demonstrate the benefit of transparency and added

accountability derived from the use of structured methods. For every participant using

ACH, I can easily check every piece of evidence they used as well as how that evidence

contributed to their final conclusion. This was somewhat the case with the intuitive

thinkers, most of whom listed the evidence they used. However, their lists are nowhere near as

organized and clear as the ACH matrices.

One possible flaw in this study which might have prevented more definitive

results was the varying evidence used among participants. While allowing participants to

collect their own information led to its own insights such as the finding on confirmation

bias, this created a less than ideal environment for comparing some results among users.

For example, did some of the experimental group participants forecast incorrectly

because using ACH was ineffective or because their research led to incorrect or

inadequate information? As Heuer explains, an ACH matrix is only as good as the

evidence it contains.112 While this aspect of the methodology created some interesting and

valid results, it unfortunately creates some level of uncertainty about other results.

Given this, another suggestion for future experiments would be to provide

participants with a base set of evidence, but like this experiment, allow them within their

given period of participation to seek out additional information. Providing a base set of

evidence would help control for the varying evidence used among participants but still

maintain conditions conducive to testing for mindsets and confirmation bias. Also, this

base set of evidence would act as a benchmark to compare to any additional information

participants collect, improving the ability to measure confirmation bias. Regardless of how future

studies on ACH are structured, testing the methodology in conditions that vary from

past studies will benefit our understanding of it.

112 Heuer, “Psychology,” 109.

The results of this experiment support my hypotheses that ACH can improve

forecasting accuracy and that it aids in mitigating biases and other cognitive phenomena.

However, these results are far from definitive, and more research is needed that validates these

findings and tests ACH in varying conditions. Doing so will continue to expand our

understanding of the methodology and support efforts to improve the United States’

intelligence analysis capability via use of structured methods.

As suggested by various Congressional committees on intelligence, analysts in the

US Intelligence Community should begin taking advantage of effective tools and

methods which can improve their analysis. These analysts already have access to over

200 analytic methods – ACH being one of them. Taking into consideration both the need

for the use of such methods and the demonstrated ability of ACH to improve analysis,

there is no reason not to take advantage of structured methods when

appropriate. Hence, the last step to improving intelligence analysis with structured

methods is innovative analysts willing to incorporate these tested methods into their daily

work. In answering the research question, I hope these findings promote the use of

structured methods that can improve the overall quality of intelligence analysis in the US

Intelligence Community.

BIBLIOGRAPHY

Cheikes, B.A., et al., Confirmation Bias in Complex Analyses. Technical Report No. MTR 04B0000017. (Bedford, MA: MITRE, 2004).

Chido, Diane and Richard M. Seward, Jr., eds. Structured Analysis of Competing Hypotheses: Theory and Application. Mercyhurst College Institute of Intelligence Studies Press, 2006.

Clark, Robert M. Intelligence Analysis: A Target-Centric Approach. Washington D.C.: CQ Press, 2007.

Clark, Robert M. Intelligence Analysis: Estimation and Prediction. Baltimore: American Literary Press, Inc., 1996.

Congressional Research Service Report for Congress. Proposals for Intelligence Reorganization, 1949-2004. 2004.

Feder, Stanley A. “FACTIONS and Policons: New Ways to Analyze Politics.” Inside the CIA’s Private World: Declassified Articles from the Agency’s Internal Journal 1955-1992, ed. H. Bradford Westerfield. New Haven: Yale University Press, 1995.

Feder, Stanley A. “Forecasting for Policy Making in the Post-Cold War Period.” Annual Review of Political Science Vol. 5. (2002): 113-119.

Folker, Jr., Robert D. Intelligence Analysis in Theater Joint Intelligence Centers: An Experiment in Applying Structured Methods. Washington D.C.: Joint Military Intelligence College, Occasional Paper #7, 2000.

Garson, David G. Guide to Writing Empirical Papers, Theses, Dissertations. New York: Marcel Dekker, Inc., 2002.

Gladwell, Malcolm. Blink: The Power of Thinking Without Thinking. New York: Back Bay Books/Little, Brown and Company, 2007.

Heuer, Jr., Richards J. Adapting Academic Methods and Models to Governmental Needs: The CIA Experience. Carlisle Barracks: Strategic Studies Institute, 1978.

Heuer, Jr., Richards J. “Limits of Intelligence Analysis,” Orbis, Winter 2005, 75-94.

Heuer, Jr., Richards J. Psychology of Intelligence Analysis. Washington D.C.: CIA Center for the Study of Intelligence, 1999.

Johnston, Rob. Analytic Culture in the US Intelligence Community: An Ethnographic Study. Washington D.C.: Center for the Study of Intelligence, 2005.

Johnston, Rob. “Integrating Methodologists into Teams of Substantive Experts.” Studies in Intelligence. Vol. 47. No. 1: 65.

LeGault, Michael R. Think: Why Crucial Decisions Can’t Be Made in the Blink of an Eye. New York: Threshold Editions, 2006.

Lowenthal, Mark M. Intelligence: From Secrets to Policy. Washington D.C.: CQ Press, 2006.

Marrin, Stephen. “Intelligence Analysis: Structured Methods or Intuition?” American Intelligence Journal 25, no. 1 (Summer 2007): 7-10.

Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “intuition.”

Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “mindset.”

Merriam-Webster’s Collegiate Dictionary, 11th ed., s.v. “scientific method.”

Miller, George A. “The Magical Number Seven—Plus or Minus Two: Some Limits on our Capacity for Processing Information.” The Psychological Review, Vol. 63, No. 2 (March 1956): 1-12.

Myers, David G. Intuition: Its Powers and Perils. New Haven: Yale University Press, 2002.

Palo Alto Research Center. “ACH2.0 Download Page.” http://www2.parc.com/istl/projects/ach/ach.html (accessed August 19, 2008).

Pirolli, P. Assisting People to Become Independent Learners in the Analysis of Intelligence (Tech. No. CDRL A002). Palo Alto Research Center, Calif.: Office of Naval Research, 2006.

Scholtz, Jean. Analysis of Competing Hypotheses Evaluation (PARC) (No. Unpublished Report). Gaithersburg, MD: National Institute of Standards and Technology, 2004.

Tetlock, Philip E. Expert Political Judgment. Princeton: Princeton University Press, 2005.

Tversky, Amos and Daniel Kahneman. “Availability: A Heuristic for Judging Frequency and Probability.” Cognitive Psychology 5 (1973), 207-232.

Tversky, Amos and Daniel Kahneman. “Judgment Under Uncertainty: Heuristics and Biases.” Science 185, no. 4157 (1974). JSTOR (accessed March 15, 2009).

United States Government. A Review of the Intelligence Community (The Schlesinger Report). 1971.

United States Government - Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction. Report to the President of the United States. Washington D.C., 2005. <http://www.wmd.gov/report/> (Accessed 22 January 2009).

United States Government - U.S. Commission on the Roles and Capabilities of the United States Intelligence Community, Preparing for the 21st Century: An Appraisal of U.S. Intelligence. Washington, D.C., 1996.

Washington Secretary of State. “November 4, 2008 General Election.” <http://vote.wa.gov/elections/wei/Results.aspx?RaceTypeCode=O&JurisdictionTypeID=2&ElectionID=26&ViewMode=Results> (Accessed December 14, 2008).

Wheaton, Kristan J., D.E. Chido, and McManis and Monsalve Associates. “Structured Analysis of Competing Hypotheses: Improving a Tested Intelligence Methodology.” Competitive Intelligence Magazine, November-December 2006. http://www.mcmanis-monsalve.com/assets/publications/intelligence-methodology-1-07-chido.pdf (accessed 14 June 2008).

Wickens, C.D., and Justin G. Hollands. Engineering Psychology and Human Performance. 3rd Ed. Upper Saddle River, NJ: Prentice Hall, 2000.


APPENDICES


Appendix A: Experiment Sign-Up Forms

Structured Methods Experiment Sign-Up Form

Name:

Class Year:

Phone Number:

E-mail Address:

Political Affiliation: (circle one) Republican Democrat

Instruction Session Dates/Times: (Rank preferences 1-4, 1=highest, 4=lowest)

Monday, 13 October 2008 – 5:00pm ____

Tuesday, 14 October 2008 – 6:00pm ____

Wednesday, 15 October 2008 – 5:00pm ____

Thursday, 16 October 2008 – 6:00pm ____


Upon completion, please return this form to Drew Brasfield or Travis Senor in CIRAT.

Contact Info: [email protected]

(205)542-8892

Appendix B: Experiment Consent Forms

Structured Methods Thesis Experiment

Participation Consent Form

The purpose of this research is to gauge factors of interest in various analytic methodologies.

Your participation involves a short instruction period, evaluating an intelligence scenario, and returning it to the administrator of the experiment. The instruction session should last no longer than 60 minutes and the evaluation can be completed at your convenience within the period of a week. Your name WILL NOT appear in any information disseminated by the researcher. Your name will only be used to notify professors of your participation in order for them to assign extra credit.

There are no foreseeable risks or discomforts associated with your participation in this study. Participation is voluntary and you have the right to opt out of the study at any time for any reason without penalty.

I, ____________________________, acknowledge that my involvement in this research is voluntary and agree to submit my data for the purpose of this research.

_________________________________ __________________

Signature                         Date

_________________________________ __________________

Printed Name                      Class

Name(s) of professors offering extra credit: ____________________________________

Researcher’s Signature: ___________________________________________________

If you have any further question about analytic methodology or this research you can contact me at [email protected].

Research at Mercyhurst College which involves human participants is overseen by the Institutional Review Board. Questions or problems regarding your rights as a participant should be addressed to Tim Harvey; Institutional Review Board Chair; Mercyhurst College; 501 East 38th Street; Erie, Pennsylvania 16546-0001; Telephone (814) 824-3372. [email protected]

Andrew Brasfield, Applied Intelligence Master’s Student, Mercyhurst College 205-542-8892

Kristan Wheaton, Research Advisor, Mercyhurst College 814-824-3021




Appendix C: Control & Experimental Group Tasking/Answer Sheets

Structured Methods Thesis Experiment

GROUP 1 & 3 INSTRUCTIONS

You are a high-profile political analyst working for News Corporation X. You have been tasked to forecast the winner of the 2008 Washington State Gubernatorial election, which will be decided on November 4, 2008. To complete your task, use all available open source information. The main candidates in this race are Christine Gregoire (D) and Dino Rossi (R). This will be a rematch from the previous Washington State Gubernatorial election, which was hotly contested and controversial. Your supervisor gave you a full week to prepare your forecast.

Use the National Intelligence Council (NIC) Words of Estimative Probability (WEP) as an indicator of your forecast:

Remote, Very Unlikely, Unlikely, Likely, Very Likely, Almost Certain

Example Forecast:

It is [WEP] that [Candidate Name] Will Win the 2008 Washington State Gubernatorial Election.

Record your final answers on the provided answer sheet. This answer sheet includes spaces for your final estimate (WEP), Source Reliability, Analytic Confidence, and a short explanation of how the evidence and subsequent analysis led to your final forecast. Please return all of the described materials to the experiment administrator by the due date in order to receive extra credit from your professor.

Task Due: 10/xx/2008

Experiment Administrator: Drew Brasfield, [email protected]

Important Information:

Source Reliability:

Source Reliability reflects the accuracy and reliability of a particular source over time. Sources with high reliability have been proven to be accurate and consistently reliable. Sources with low reliability lack the accuracy and proven track record commensurate with more reliable sources.

o Rate source reliability as low, medium, or high.

Analytic Confidence:

Analytic Confidence reflects the level of confidence an analyst has in his or her estimates and analyses. It is not the same as using words of estimative probability, which indicate likelihood. It is possible for an analyst to suggest an event is virtually certain based on the available evidence, yet have a low amount of confidence in that forecast due to a variety of factors or vice versa.

o To assess analytic confidence, mark your rating on the line given on the answer sheet. The far left represents the lowest level of confidence while the far right represents absolute confidence in your analytic judgment.


GROUP 2 & 4 INSTRUCTIONS

You are a high-profile political analyst working for News Corporation Y. You have been tasked to forecast the winner of the 2008 Washington State Gubernatorial election, which will be decided on November 4, 2008. To complete your task, use all available open source information. Also, use ACH to structure your analysis. The main candidates in this race are Christine Gregoire (D) and Dino Rossi (R). This will be a rematch from the previous Washington State Gubernatorial election, which was hotly contested and controversial. Your supervisor gave you a full week to prepare your forecast.

Use the National Intelligence Council (NIC) Words of Estimative Probability (WEP) as an indicator of your forecast:

Remote, Very Unlikely, Unlikely, Likely, Very Likely, Almost Certain

Example Forecast:

It is [WEP] that [Candidate Name] Will Win the 2008 Washington State Gubernatorial Election.

Record your final answers on the provided answer sheet. This answer sheet includes spaces for your final estimate (WEP), Source Reliability, Analytic Confidence, and a short explanation of how the evidence and subsequent analysis led to your final forecast. Also include a print out of your ACH matrix when returning the above materials. Please return all of the described materials to the experiment administrator by the due date in order to receive extra credit from your professor.

Task Due: 10/xx/2008

Experiment Administrator: Drew Brasfield, [email protected]

Important Information:

Source Reliability:

Source Reliability reflects the accuracy and reliability of a particular source over time. Sources with high reliability have been proven to be accurate and consistently reliable. Sources with low reliability lack the accuracy and proven track record commensurate with more reliable sources.

o Rate source reliability as low, medium, or high.

Analytic Confidence:

Analytic Confidence reflects the level of confidence an analyst has in his or her estimates and analyses. It is not the same as using words of estimative probability, which indicate likelihood. It is possible for an analyst to suggest an event is virtually certain based on the available evidence, yet have a low amount of confidence in that forecast due to a variety of factors or vice versa.

o To assess analytic confidence, mark your rating on the line given on the answer sheet. The far left represents the lowest level of confidence while the far right represents absolute confidence in your analytic judgment.


Answer Sheet

NAME:

FORECAST:

SHORT EXPLANATION (not required):

SOURCE RELIABILITY (circle one):

LOW     MEDIUM     HIGH

ANALYTIC CONFIDENCE:

Lowest Level of Confidence ------------------------------------------------------------ Highest Level of Confidence

Other Important Information

Starting point:

http://www.politics1.com/wa.htm

Google/Google News

Types of relevant evidence:

● Incumbent/challenger popularity
● Election Polls
● Campaign spending
● Local issues relevant to the election
● Party issues
● National party support of incumbent/challenger
● Local economy
● State voting trends
● Voter registration
● Past elections
● Candidate debates

*This is not a list of required evidence to collect, but types of evidence that could be an indicator for an election.


Appendix D: Participant Debriefing Statement

Analysis of Competing Hypotheses

Participation Debriefing

Thank you for participating in this research process. I appreciate your contribution and willingness to support the student research process.

The purpose of this study was to determine how well ACH mitigates cognitive bias and how accurate the methodology is for forecasting in intelligence analysis, compared to unstructured methods. Only a handful of experimental studies have been conducted on ACH, and this research hopes to contribute to the growing body of literature on structured analytical methods. The experiment you participated in was designed to test ACH’s capabilities against an unstructured method. Specifically, participants were organized into experimental and control groups by political affiliation so that factors of interest could be measured.

As the US Intelligence Community faces recent intelligence failures, the use of advanced analytical techniques will enhance the community’s quality of analysis and benefit US national security.

If you have any further questions about the Analysis of Competing Hypotheses or this research you can contact me at [email protected].



Appendix E: Post Experiment Questionnaires

Follow-Up Questionnaire

Control Group

Thanks for your participation! Please take a few moments to answer the following questions. Your feedback is greatly appreciated. Your response to these questions will NOT affect whether or not you receive extra credit.

1. How much time did you spend working on the assigned task (hours)?

2. Why did you agree to participate in the experiment? (extra credit, other, etc.)

3. Do you feel you understood the assigned task as explained at the instruction session?

4. Were you able to find adequate open source information about the topic?

5. Please rate the level of difficulty in finding open source information related to the topic:

1=Very difficult 5=Very Easy

1 2 3 4 5

6. Please provide any additional comments you may have about the Analysis of Competing Hypotheses, the assigned task, or any other part of this experiment.


Follow-Up Questionnaire

ACH Group

Thanks for your participation! Please take a few moments to answer the following questions. Your feedback is greatly appreciated. Your response to these questions will NOT affect whether or not you receive extra credit.

1. How much time did you spend working on the assigned task (hours)?

2. Why did you agree to participate in the experiment? (extra credit, other)

3. Do you feel you understood the assigned task as explained at the instruction session?

4. Were you able to find adequate open source information about the topic?

5. Please rate the level of difficulty in finding open source information related to the topic:

1=Very difficult 5=Very Easy

1 2 3 4 5

6. How helpful was ACH in creating your final estimate?

7. Please rate your understanding of ACH before participating in this experiment:

1=No understanding of ACH 5=Very thorough understanding of ACH

1 2 3 4 5

8. Please rate your understanding of ACH after participating in this experiment:

1=No understanding of ACH 5=Very thorough understanding of ACH

1 2 3 4 5

9. Please provide any additional comments you may have about the Analysis of Competing Hypotheses, the assigned task, or any other part of this experiment.


Appendix F: SPSS Testing

Accuracy

Group Statistics

Forecast
Group           N     Mean      Std. Deviation    Std. Error Mean
Control         38    1.3947    .49536            .08036
Experimental    30    1.3000    .46609            .08510

Independent Samples Test

Forecast                        Levene's Test        t-test for Equality of Means
                                F        Sig.        t        df        Sig. (2-tailed)
Equal variances assumed         2.625    .110        .804     66        .425
Equal variances not assumed                          .809     63.934    .421

Mindsets – Democrats

Ranks

Forecast
Group           N     Mean Rank    Sum of Ranks
Control         15    13.47        202.00
Experimental    11    13.55        149.00
Total           26

Test Statistics (b)

                                    Forecast
Mann-Whitney U                      82.000
Wilcoxon W                          202.000
Z                                   -.034
Asymp. Sig. (2-tailed)              .973
Exact Sig. [2*(1-tailed Sig.)]      1.000 (a)

a. Not corrected for ties.
b. Grouping Variable: Group

Wilcoxon Rank Sum test value = -0.034, P-value = 0.973 is larger than the significance level (α = 0.05).

Mindsets – Republicans

Ranks

Forecast
Group           N     Mean Rank    Sum of Ranks
Control         23    23.04        530.00
Experimental    19    19.63        373.00
Total           42

Test Statistics (a)

                                    Forecast
Mann-Whitney U                      183.000
Wilcoxon W                          373.000
Z                                   -1.055
Asymp. Sig. (2-tailed)              .291

a. Grouping Variable: Group

Wilcoxon Rank Sum test value = -1.055, P-value = 0.291 is larger than the significance level (α = 0.05).

Confirmation Bias

Group Statistics

Confirmation Bias
Group           N     Mean      Std. Deviation    Std. Error Mean
Control         30    1.2000    .40684            .07428
Experimental    32    1.9063    .29614            .05235

Independent Samples Test

Confirmation Bias               Levene's Test        t-test for Equality of Means
                                F        Sig.        t        df        Sig. (2-tailed)
Equal variances assumed         5.940    .018        -7.851   60        .000
Equal variances not assumed                          -7.772   52.783    .000
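The test statistics reported in this appendix can be cross-checked from the summary figures alone. The following is a minimal sketch and not part of the original SPSS output; the function names are hypothetical, and it assumes the standard textbook formulas (U = R - n(n+1)/2 for the Mann-Whitney statistic, and the pooled-variance and Welch forms of the two-sample t statistic). Because the inputs are the rounded means and standard deviations printed in the tables, the t values should agree with the reported ones only to within rounding. The Z values are not recomputed, since they depend on a tie correction that cannot be recovered from the rank sums.

```python
import math


def mann_whitney_u(n1, rank_sum1, n2, rank_sum2):
    """Return (U, U1, U2) from group sizes and rank sums; U is the smaller of U1 and U2."""
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = rank_sum2 - n2 * (n2 + 1) / 2
    return min(u1, u2), u1, u2


def t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t_pooled, df_pooled, t_welch, df_welch) from group summary statistics."""
    diff = mean1 - mean2
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2.
    df_pooled = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_pooled
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch statistic with Welch-Satterthwaite df.
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_welch = diff / math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch


# Mindsets: rank sums taken from the Ranks tables.
u_democrats, _, _ = mann_whitney_u(15, 202.0, 11, 149.0)
u_republicans, _, _ = mann_whitney_u(23, 530.0, 19, 373.0)

# Accuracy and Confirmation Bias: figures taken from the Group Statistics tables.
accuracy = t_from_summary(38, 1.3947, 0.49536, 30, 1.3000, 0.46609)
confirmation = t_from_summary(30, 1.2000, 0.40684, 32, 1.9063, 0.29614)

print("Mann-Whitney U (Democrats):", u_democrats)
print("Mann-Whitney U (Republicans):", u_republicans)
print("Accuracy t (pooled, Welch):", round(accuracy[0], 3), round(accuracy[2], 3))
print("Confirmation Bias t (pooled, Welch):", round(confirmation[0], 3), round(confirmation[2], 3))
```

Computing from summary statistics rather than raw responses is a deliberate choice here, since the appendix reports only the summaries; with the raw data, a statistics package would be the more natural route.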
