


Means–Ends Epistemology
Oliver Schulte

ABSTRACT

This paper describes the corner-stones of a means–ends approach to the philosophy of inductive inference. I begin with a fallibilist ideal of convergence to the truth in the long run, or in the ‘limit of inquiry’. I determine which methods are optimal for attaining additional epistemic aims (notably fast and steady convergence to the truth). Means–ends vindications of (a version of) Occam’s Razor and the natural generalizations in a Goodmanian Riddle of Induction illustrate the power of this approach. The paper establishes a hierarchy of means–ends notions of empirical success, and discusses a number of issues, results and applications of means–ends epistemology.

1 The long run in the short run
2 Discovery problems
3 Two examples of discovery problems
  3.1 The Occam problem
  3.2 The Riddle of Induction
4 Standards of inductive success
5 Convergence to the truth and nothing but the truth: logical reliability
6 Fast convergence to the truth: data-minimality
7 Steady convergence to the truth: retraction-minimality
8 Steady convergence to the truth: minimaxing retractions
9 The hierarchy of hypothetical imperatives for inductive inference
10 Conclusion: a map of means–ends methodology
11 Appendix: Proofs

1 The long run in the short run

Inquiry begins with uncertainty. Empirical inquiry uses evidence to find the answers to the questions that prompted investigation. If these questions are of a general nature, empirical inquiry immediately faces the problem of induction: how to generalize beyond the evidence. David Hume observed that all such generalizations are uncertain and hence involve a risk of error, and concluded that none has a normative justification over and above our customs or inferential habits (Hume [1984]). A traditional response to Hume’s scepticism seeks pure norms of inductive inference, enshrined in an ‘inductive logic’ or a ‘theory of confirmation’. This project motivated the work of Keynes ([1921]) and Carnap ([1962]), and counts modern-day Bayesians among its heirs (cf. Earman [1992]; Howson and Urbach [1989]; Howson [1997]).

Brit. J. Phil. Sci. 50 (1999), 1–31
© Oxford University Press 1999


Hans Reichenbach proposed an original alternative approach to the problem of induction: he supplied a ‘pragmatic vindication’ for a certain natural rule for estimating limiting frequencies of certain kinds of events, which he dubbed ‘the straight rule’ (Reichenbach [1949], Juhl [1994], Salmon [1991]). The straight rule has the virtue that, if for some type of repeatable experiment the relative frequency of an outcome approaches a fixed limit, then the straight rule produces a sequence of conjectures that converges to that limit. Essentially, Reichenbach proposed a means–ends justification for an inference rule: his rule is guaranteed to achieve the end of converging to the right answer, if there is a right answer (limiting frequency) to be found. Reichenbach limited his pragmatic vindication to but one kind of rule for estimating probabilities.[1] His student Hilary Putnam extended the idea to a more general setting and investigated inference rules that are guaranteed to settle on the right answer in the long run, or in the ‘limit of inquiry’. Putnam used his theory of long-run convergence to criticize Carnap’s ‘confirmation functions’ on the grounds that they were not the best means for achieving convergence to the truth (Putnam [1963]). The ideal of guaranteed convergence in the limit of inquiry is a precise formulation of a fallibilist, Peircean vision of empirical success in which inquiry may never yield certainty but nonetheless settles on the truth as more and more evidence is gathered. This idea is the philosophical core of formal learning theory, a mathematical framework for studying questions of inductive inference with contributions from philosophers, logicians, and computer scientists (for a survey, see Kelly [1996]).
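The straight rule is simple enough to sketch in a few lines. The following is an illustrative encoding only (the representation of trial outcomes as 0/1 values is an assumption of the example, not part of Reichenbach’s formulation):

```python
def straight_rule(outcomes):
    """Reichenbach's straight rule: after a finite sequence of trials
    (1 = the outcome occurred, 0 = it did not), conjecture that the
    limiting relative frequency equals the observed relative frequency."""
    return sum(outcomes) / len(outcomes)

# If the relative frequency of the outcome approaches a fixed limit,
# the straight rule's sequence of conjectures converges to that limit.
print(straight_rule([1, 0, 1, 1]))  # 0.75
```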

A long-standing criticism of Reichenbach’s pragmatic vindication has been that convergence in the limit is consistent with any counterintuitive behaviour in the short run. Salmon discussed this issue in depth, and concluded that, to yield short-run constraints on inductive inferences, Reichenbach’s means–ends analysis had to be augmented by pure norms of inductive rationality (for example, along Bayesian lines) (Salmon [1967], [1991]).

But there is no need to abandon the spirit of Reichenbach’s pragmatic, means–ends justification so quickly. Instead of appealing to ‘pure norms of confirmation’, we may consider epistemic goals in addition to finding the truth. In this paper, I show that taking other epistemic aims into consideration yields strong constraints on what inference rules may conjecture in the short run.

I begin with a number of natural cognitive values: convergence to the truth, fast convergence to the truth, convergence to the truth with as few vacillations (retractions) as possible, and avoiding error. Corresponding to these objectives, I define a set of standards of success for inductive methods that meet them. The three most interesting of these standards derive from the aims of settling on the truth, and doing so quickly and with few vacillations. We may think of the latter two considerations as criteria of efficient convergence to the right answer. I illustrate the import of these efficiency criteria in two well-known inductive problems: the Occamian problem of determining whether a certain entity exists or not, and a Goodmanian ‘Riddle of Induction’ involving a number of alternative colour predicates for emeralds. The theory of efficient long-run convergence provides a means–ends vindication of a version of Occam’s Razor (namely, ‘do not posit the existence of entities that are observable in principle but that you haven’t observed yet’) as well as the natural projection rule in the Goodmanian Riddle (namely, project that ‘all emeralds are green’ as long as only green emeralds have been observed).

[1] Reichenbach held that all inductive inference could be reduced to estimating probabilities.

The main result of this paper shows that the various standards of inductive success that stem from the cognitive goals mentioned above fall into a systematic hierarchy of feasibility: some cognitive objectives are feasible whenever certain others are, but not vice versa. In this sense some epistemic objectives place more stringent requirements on inductive inquiry than others. These results establish a measure of the inductive complexity of a given empirical problem: an inductive problem P is easier than another problem P′ if it is possible to attain a higher standard of success in P than in P′. This notion of complexity allows us to weigh on the same scale such diverse problems as the Occam problem, Goodman’s Riddle of Induction, language learning and determining whether matter is finitely or infinitely divisible.

I address a number of questions that arise naturally for means–ends epistemology:

1. Consider a given standard of inductive success. What is the general structure of inductive problems in which that standard is feasible?

2. When a given standard of inductive success is feasible, what are the features of the inductive methods that attain that standard?

3. Are there tensions and trade-offs among various cognitive goals? If so, how great is the conflict?

4. What intuitively plausible norms for scientific reasoning have means–ends justifications?

5. Do the methodological recommendations from means–ends analysis depend on the language in which evidence and hypotheses are framed?

Let us begin with some fundamental concepts from learning theory.

2 Discovery problems

Learning theory studies several classes of inductive problems, such as making predictions, testing hypotheses, inferring general theories, and others. This paper examines the following type of problem: consider a collection H of mutually exclusive alternative hypotheses under investigation. Given some piece of evidence, which of the alternative hypotheses should the agent conjecture? Following Popper’s and Kelly’s usage, I refer to such problems as discovery problems (Popper [1968], Kelly [1996]).[2] The general definition of a discovery problem is as follows. Let a set of evidence items be given (e.g. observations of the colours of emeralds, sightings of particles, positions of planets, and so on). A data stream is an infinite discrete sequence of evidence items. For example, if the evidence statements are either ‘this emerald is green’ or ‘this emerald is blue’, then one possible data stream is the infinite sequence of observations of green emeralds. If the evidence statements are either ‘the Q particle has appeared’ or ‘the Q particle has not appeared’, a possible data stream is the infinite sequence of never observing the particle Q. If ε is a data stream, then ε|n denotes the first n observations in the data stream.

For my present purposes, I shall refer to a set of data streams as an empirical proposition; this is the set of data streams on which the hypothesis is correct, or true.[3] For example, since the empirical content of the hypothesis ‘all emeralds are green’ is just the data stream featuring only green emeralds, I identify ‘all emeralds are green’ with the singleton containing that data stream. An empirical proposition K represents the inquirer’s background knowledge about what observation sequences are possible. Now we are ready to define:

Definition 1: A discovery problem is a pair (H, K), where K is an empirical proposition representing background knowledge, and H is a collection of mutually exclusive empirical hypotheses—that is, exactly one of the alternative hypotheses is correct on a given data stream.
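To make the definitions concrete, here is a minimal sketch (an illustrative encoding, not the paper’s formalism) that models a data stream as an infinite Python iterator and the initial segment ε|n as its first n elements:

```python
from itertools import islice, repeat

def initial_segment(stream, n):
    """epsilon|n: the first n observations of a data stream."""
    return list(islice(stream, n))

# The empirical content of 'all emeralds are green' is the single data
# stream on which every examined emerald is green.
all_green = repeat('green')
print(initial_segment(all_green, 3))  # ['green', 'green', 'green']
```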

An inference rule, or inductive method, d produces an empirical proposition d(e) as its current theory in response to a finite evidence sequence e. There are of course other kinds of inductive methods, for example ones that revise ‘degrees of belief’, as Bayesian methods do. In principle, means–ends rationality can guide agents in revising any epistemic state.[4]

Many important inductive problems from a variety of settings take the form of discovery problems. These include language learning (Osherson et al. [1986]); parameter estimation and ‘model selection’ in statistics; and inferring theories in scientific disciplines such as particle physics (Schulte [1997]) and cognitive neuropsychology (Glymour [1994], Bub [1994]). In this paper, I will consider two fairly simple but well-known and instructive inductive tasks: (1) Does a certain kind of entity exist?—I call this the Occam problem—and (2) a version of Goodman’s celebrated Riddle of Induction involving various alternative colour predicates for emeralds.

[2] Kelly does not require the alternative hypotheses to be mutually exclusive (see Kelly [1996]).
[3] The operative notion of correctness may embody virtues of theories other than truth, for example empirical adequacy or problem-solving ability (Laudan [1977], Kitcher [1993]). The results in this paper presuppose only that correctness is some relation between hypotheses and data streams.
[4] Putnam showed how means–ends analysis yields a critique of ‘confirmation functions’ that produce ‘degrees of confirmation’ in light of new evidence (Putnam [1963]). For learning-theoretic treatments of Bayesian updating, see, for example, Earman ([1992], Ch. 9), Osherson and Weinstein ([1988]), Kelly et al. ([1997]), Kelly and Schulte ([1995]), Juhl ([1997]). Gärdenfors ([1988]) and Spohn ([1988]) are among the writers who study the revision of full belief.

3 Two examples of discovery problems

3.1 The Occam problem

Suppose a scientist wants to investigate by empirical means whether a certain type of entity, say a Q particle, exists. Imagine that the physicist undertakes a series of experiments (bigger accelerators, more sophisticated detection devices, pure neutrino beams). As inquiry continues indefinitely, scientists obtain increasing segments of an infinite sequence of experimental outcomes—a data stream. There are two hypotheses under investigation: (1) ‘the Q particle exists’ and its complement, (2) ‘the Q particle does not exist’. To keep matters simple, let’s grant the physicist generous assumptions about the powers of her experimental apparatus: I assume that if an experiment detects the particle in question, then the particle exists, and that if all experiments fail to detect a particle, then the particle does not exist (in other words, the particle is not hidden for ever). Thus the hypothesis ‘Q exists’ is true just in case one of the scientists’ experiments registers a Q particle.[5] A natural inference rule for this problem is to conjecture that Q does not exist until it has been detected in an experiment. I call this rule the Occam rule because it follows Occam’s Razor (‘do not needlessly multiply entities’). From the point of view of means–ends methodology, whether or not we should follow Occam’s Razor depends on the extent to which that maxim furthers epistemic aims. In Section 8, I will show that the Occam rule is indeed the optimal inference procedure for the Occam problem, with respect to certain cognitive values specified in Section 4.
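The Occam rule is a one-line procedure. In this sketch (an illustrative encoding in which experiments are booleans, True meaning a detection), the rule conjectures non-existence until the first detection:

```python
def occam_rule(evidence):
    """Occam rule: conjecture that the Q particle does not exist until
    some experiment in the finite evidence sequence has detected it."""
    return 'Q exists' if any(evidence) else 'Q does not exist'

print(occam_rule([False, False, False]))        # Q does not exist
print(occam_rule([False, False, True, False]))  # Q exists
```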

3.2 The Riddle of Induction

In his ‘New Riddle of Induction’, Nelson Goodman introduces an unusual colour predicate for emeralds.

Suppose that all emeralds examined before a certain time t are green [ . . . ] Our evidence statements assert that emerald a is green, that emerald b is green, and so on . . . Now let me introduce another predicate less familiar than ‘green’. It is the predicate ‘grue’ and it applies to all things examined before t just in case they are green but to other things just in case they are blue. Then at time t we have, for each evidence statement asserting that a given emerald is green, a parallel evidence statement asserting that emerald is grue. ([1983])

[5] Instead of assuming that particles are observable, we could treat the scientists as aiming for empirically adequate theories rather than true ones, in the manner of van Fraassen ([1980]). The hypothesis ‘Q does not exist’ is empirically adequate just in case scientists never observe Q (yet false if there is a hidden Q particle). Section 8 below has some discussion of the methodological role of unobservable particles.

The question is whether we should conjecture that all emeralds are green rather than that all emeralds are grue when we obtain a sample of green emeralds examined before time t, and if so, why. I shall treat this as a question about optimal inference in a discovery problem, in which the set of alternative hypotheses comprises the universal generalizations of the various colour predicates under consideration. To see what the empirical content of these hypotheses is, notice that they determine, for each ‘examination time’, a unique colour for the emerald examined at that time. Thus the empirical content of the claim ‘all emeralds are green’ is that at each time, the emerald examined at that time is green; that is, the empirical content comprises the one data stream on which only green emeralds are observed. The empirical content of ‘all emeralds are grue’ comprises the single data stream on which all emeralds examined before the critical time t are green, and those examined at time t and later times are blue. If not all emeralds are examined, then ‘all emeralds are green (grue)’ may be empirically adequate yet false, namely if all emeralds that have been or will be examined are green (grue) but the unexamined ones are not. In what follows, I shall not be concerned with this possibility.[6] In particular, for the sake of more natural expression, I will use the term ‘all emeralds’ to mean implicitly the same as ‘all examined emeralds’; for example, I will say that the conjecture ‘all emeralds are green’ is correct, or true, on the data stream along which only green emeralds are found.

A discovery problem is defined by the range of alternative hypotheses under consideration and given background knowledge. In the Riddle of Induction, what shall we take as the hypotheses under serious investigation, or, as Goodman might say, as candidates for projection? Let’s write grue(t) for the grue predicate with critical time t. In this paper, I examine the infinitely iterated Riddle of Induction, which includes all grue(t) predicates as alternative hypotheses—candidates for projection—for any natural number t.[7] The background knowledge in the infinitely iterated Riddle of Induction is that one of these alternatives is true. Figure 1 illustrates the infinitely iterated Riddle of Induction. To keep the example simple, I don’t include the bleen(t) predicates among the candidates for projection. It is straightforward to extend the methodological analysis to Riddles of Induction that include ‘bleen’ predicates, but that would not yield more philosophical insight.

[6] As in the Occam problem, we might assume that in the long run, all existing emeralds will be examined. Or we might just not care about emeralds that are forever hidden from sight; in other words, we may concern ourselves with empirical adequacy only.
[7] There are other versions of the Riddle of Induction: in the one-shot Riddle, we consider only two hypotheses, ‘all emeralds are green’ and ‘all emeralds are grue’ for some fixed critical time t; see also Section 8. In a finitely iterated Riddle, the alternatives include ‘all emeralds are green’ and ‘all emeralds are grue(1)’, ‘all emeralds are grue(2)’, . . . , ‘all emeralds are grue(m)’ up to some last critical time m. Other formulations permit more than one blue-green colour change to occur (e.g. colour predicates such as ‘gruegr’ and ‘bleenbl’). Schulte ([forthcoming]) examines all of these versions.

Now for inference rules or, as Goodman calls them, projection rules. The natural projection rule in the infinitely iterated Riddle of Induction conjectures that all emeralds are green as long as all emeralds observed so far are green. If a blue emerald is found at time t for the first time, the natural projection rule concludes that all emeralds are grue(t).
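As a sketch (with the illustrative conventions that examination times are indexed from 0 and colours are the strings 'green' and 'blue'), the natural projection rule can be written as:

```python
def natural_projection_rule(evidence):
    """Natural projection rule: project 'all emeralds are green' as long as
    only green emeralds have been observed; if the first blue emerald is
    found at time t, conclude 'all emeralds are grue(t)'.
    `evidence` is a finite list of colours indexed by examination time."""
    for t, colour in enumerate(evidence):
        if colour == 'blue':
            return f'all emeralds are grue({t})'
    return 'all emeralds are green'

print(natural_projection_rule(['green', 'green']))          # all emeralds are green
print(natural_projection_rule(['green', 'green', 'blue']))  # all emeralds are grue(2)
```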

From the point of view of means–ends methodology, whether an agent should follow the natural projection rule depends on whether it furthers epistemic aims. In Section 8, I will show that the natural projection rule is indeed the optimal inference procedure for the infinitely iterated Riddle of Induction, with respect to certain cognitive values—the same ones that single out the Occam rule as the best one in the Occam problem. I now turn to specifying these cognitive values.

Fig. 1. The infinitely iterated Riddle of Induction.

4 Standards of inductive success

Epistemologists have proposed a number of desiderata for empirical inquiry, including: convergence to a correct theory, fast convergence to a correct theory, convergence with few retractions and vacillations, providing theories with interesting content, avoiding error, and defining scientific theories parsimoniously with few postulates, or ‘laws of nature’. An inductive method may perform well with respect to one or more of these aims in some circumstances but not in others. To compare the performance of methods with regard to a range of possible ways the world might be—more precisely, with regard to all the data streams consistent with background knowledge—I rely on two familiar principles from decision theory: admissibility and minimax.[8] A method is admissible iff it is not dominated. In general, an act A dominates another act A′ if A necessarily yields results at least as good as those of A′, and possibly better ones, where a given collection of ‘possible states of the world’ determines the relevant sense of necessity and possibility. An act A minimaxes if the worst possible outcome from A is as good as the worst possible outcome from any other act. (I give precise definitions of these notions for the inductive context below.) Combining the 6 epistemic desiderata mentioned with the 2 evaluation criteria we arrive at 12 performance standards for inductive methods (for example, one of these would be ‘choose admissible methods for convergence to a correct theory’).

I refer to these standards of empirical success as ‘means–ends recommendations’ or as ‘hypothetical imperatives’ for inductive inference. The three following sections define and discuss the three most interesting performance standards: admissibility with respect to convergence (reliability), admissibility with respect to convergence time (time efficiency), and minimaxing retractions (avoiding vacillations).

5 Convergence to the truth and nothing but the truth: logical reliability

Through the ages, sceptical arguments dating back at least to Sextus Empiricus have aimed at showing that we cannot establish generalizations from a finite sample with certainty—for the very next observation might refute our general conclusions (Sextus Empiricus [1985]). This observation is also the basis of Hume’s celebrated problem of induction (Hume [1984]). A fallibilist response is to give up the quest for certainty and require only that science eventually settle on the right answer in the ‘limit of inquiry’, without ever producing a signal as to what the right answer is. As William James put it, ‘no bell tolls’ when science has found the right answer (James [1982]). This conception of empirical success runs through the work of Peirce, James, Reichenbach, Putnam, and others. For discovery problems, we may render it in a precise manner as follows.

[8] Another prominent choice principle is to maximize (subjective) expected utility. I shall say more about expected utility at the end of Section 9.

An empirical proposition P is consistent if it is correct on some data stream (i.e. P is not empty). An empirical proposition P entails another empirical proposition P′ just in case P′ is correct whenever P is (i.e. P is a subset of P′). If H is a collection of alternative hypotheses, then say that an inductive method d converges to the correct hypothesis on a data stream by time n just in case, forever after, the method’s output is consistent and entails the hypothesis correct for that data stream. If the method converges to the correct hypothesis by some time (on a given data stream), then we say simply that it converges to the correct hypothesis (on that data stream).

The Occam rule has the virtue of converging to the right hypothesis on every data stream. If the Q particle exists, then (by our assumptions) it will eventually be observed; at that time, the Occam rule concludes with certainty that the Q particle exists. If the Q particle does not exist, the Occam rule always entails this fact, right from the beginning—although never with certainty.

Similarly, in the infinitely iterated Riddle of Induction, the natural projection rule is guaranteed to converge to the right hypothesis. If a blue emerald is ever found, the natural projection rule conclusively identifies the correct generalization about emerald colours. If all emeralds are green, the natural projection rule always entails this fact, right from the beginning (without certainty). Following Kelly ([1996]), I refer to inductive methods that are guaranteed to eventually entail the right answer no matter what the right answer is as (logically) reliable. Logical reliability is the core notion of formal learning theory (Gold [1967], Putnam [1965]).

Definition 2: Let H be a collection of alternative hypotheses, and let K be given background knowledge. An inductive method is reliable for the discovery problem (H, K) ⇔ the method converges to the correct hypothesis on every data stream consistent with background knowledge K.
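Reliability quantifies over infinite data streams, so it cannot be verified by running code; but for a finite initial segment we can at least check that a method has settled on the correct hypothesis from some stage onward. A minimal sketch (the boolean encoding of experiments and the finite ‘horizon’ are assumptions of the example):

```python
def occam_rule(evidence):
    """Conjecture non-existence of Q until it is detected (True = detected)."""
    return 'exists' if any(evidence) else 'does not exist'

def converged_by(method, stream, correct, n):
    """Check, over the finite illustrative stream only, that from stage n
    onward the method's conjecture is the correct hypothesis. Genuine
    convergence is a limit notion over an infinite data stream."""
    return all(method(stream[:k]) == correct for k in range(n, len(stream) + 1))

# Q first shows up in the fourth experiment; the Occam rule has settled
# on the correct hypothesis by stage 4 and never abandons it afterwards.
stream = [False, False, False, True, False, True]
print(converged_by(occam_rule, stream, 'exists', 4))          # True
print(converged_by(occam_rule, stream, 'does not exist', 1))  # False
```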

An unreliable method in the Occam problem would be that of a theorist who maintains his faith in the existence of the Q particle no matter how many experiments have failed to discover it; on the data stream of infinitely many failures, this theorist converges to a wrong theory. Similarly, a ‘grue’ aficionado would be unreliable in the infinitely iterated Riddle of Induction: if he kept conjecturing that some future emerald is blue, he would never arrive at the generalization that all emeralds are green if in fact they are.

Reliable methods succeed in finding the correct hypothesis where unreliable methods fail. Those whose aim in inquiry is to find a correct theory prefer methods that converge to the truth on a wider range of possibilities. For example, the thrust of Putnam’s critique of Carnap’s confirmation functions was that Carnap’s confirmation functions are not the best for detecting regularities among the data, in the sense that other methods succeed in doing so over a wider range of possibilities (Putnam [1963]). For an evaluation of Putnam’s argument, see Kelly et al. ([1994]). This is just the decision-theoretic principle of admissibility applied to inductive methods. The admissibility principle yields the following criterion for comparing the performance of two inductive methods with respect to convergence.

Definition 3: Let H be a collection of alternative hypotheses, and let K be given background knowledge. In the discovery problem (H, K), an inductive method d dominates another inductive method d′ with respect to convergence ⇔
1. background knowledge K entails that d converges to the correct hypothesis whenever d′ does, and
2. there is some possible data stream, consistent with background knowledge K, on which d converges to the correct hypothesis and d′ does not.
An inductive method is convergence-admissible for a discovery problem ⇔ the method is not dominated in that problem with respect to convergence.

It is clear that reliable methods are convergence-admissible because they eventually arrive at the truth on every data stream (consistent with given background knowledge). The converse holds as well: less than fully reliable methods are dominated with respect to convergence. Thus applying the admissibility principle to the aim of converging to the truth leads to logical reliability.

Proposition 4: Let H be a collection of alternative hypotheses, and let K be given background knowledge. An inductive method is convergence-admissible for the discovery problem (H, K) ⇔ the method is reliable for that problem.

The proof is in Schulte ([forthcoming]). Learning theorists have studied the structure of discovery problems with reliable solutions extensively, as well as the properties of reliable methods for given discovery problems. Kelly ([1996], especially Chapter 9) and Osherson et al. ([1986]) introduce many of the key ideas.

I now turn to other epistemic aims in addition to convergence to the truth.

6 Fast convergence to the truth: data-minimality

Other things being equal, we would like our methods to arrive at a correct theory sooner rather than later. In isolation from other epistemic concerns, minimizing convergence time is a trivial objective: an inquirer who never changes her initial conjecture converges immediately. The interesting question is which reliable methods converge as fast as possible. We can use the admissibility principle to evaluate the speed of a reliable method as follows.

Definition 5: Let H be a collection of alternative hypotheses, and let K be background knowledge. In the discovery problem (H, K), an inductive method d dominates another inductive method d′ with respect to convergence time ⇔
1. background knowledge K entails that d converges at least as soon as d′ does, and
2. there is some data stream, consistent with background knowledge K, on which d converges before d′ does.
An inductive method d is data-minimal for a discovery problem (H, K) ⇔ d is not dominated in (H, K) with respect to convergence time by another method d′ that is reliable for that problem.

The term ‘data-minimal’ expresses the idea that methods that converge as soon as possible make efficient use of the data (cf. Gold [1967], Kelly [1996]). Data-minimal methods are the ones that satisfy a simple, intuitive criterion. Let’s say that a method does not take its conjecture seriously at a given stage of inquiry if background knowledge entails that the method will abandon its conjecture later no matter what evidence it receives. Then the reliable data-minimal methods are exactly those reliable methods that take their conjectures seriously at each stage of inquiry.

If a method does take its conjecture seriously at a given stage of inquiry, then there is a data stream, consistent with background knowledge and the evidence obtained so far, on which the method converges to its current hypothesis by the current stage of inquiry. In this case, I say that the method projects its current hypothesis. Thus the reliable data-minimal methods are exactly those reliable methods that project their conjectures at each stage of inquiry.

For example, on the data stream that never turns up the Q particle, the Occam rule projects its conjecture ‘the Q particle does not exist’ at each stage of inquiry along that data stream. By contrast, consider a cautious inference procedure for the Occam problem that waits until more experiments are performed before making a conjecture about the existence of the Q particle. While the cautious procedure is waiting, it (trivially) does not project its conjecture along any data stream, because it is not conjecturing any alternative hypothesis.
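The contrast can be sketched in code (an illustrative toy: the waiting period of three experiments is arbitrary, and None stands for suspending judgement):

```python
def occam_rule(evidence):
    """Conjecture non-existence of Q until it is detected (True = detected)."""
    return 'exists' if any(evidence) else 'does not exist'

def cautious_rule(evidence, wait=3):
    """A cautious variant: suspend judgement for the first `wait`
    experiments, then behave like the Occam rule. While waiting it
    conjectures no alternative hypothesis, so it projects nothing."""
    if len(evidence) < wait:
        return None  # no conjecture yet
    return occam_rule(evidence)

# On the stream that never turns up Q, the Occam rule conjectures the
# correct hypothesis from the very first stage; the cautious rule is
# strictly slower to converge, so it is not data-minimal.
no_particle = [False] * 5
print(occam_rule(no_particle[:1]))     # does not exist
print(cautious_rule(no_particle[:1]))  # None
```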

The natural projection rule in the infinitary Riddle of Induction projects ‘all emeralds are green’ at each stage along the data stream featuring only green emeralds. By contrast, another inference rule might wait until one or more of the ‘critical times’ have passed before generalizing. Such a cautious inference rule fails to project a universal generalization about emerald colours while it is waiting.
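The natural projection rule admits an equally small sketch. I assume the encoding that grue(t) predicts green emeralds at stages before t and blue ones from stage t on; evidence is a list of observed colours:

```python
def natural_projection(evidence):
    """Project 'all emeralds are green' until a blue emerald appears;
    a blue emerald at stage i leaves 'all emeralds are grue(i)' as the
    only alternative consistent with the data."""
    for i, colour in enumerate(evidence, start=1):
        if colour == 'blue':
            return f"all emeralds are grue({i})"
    return "all emeralds are green"

print(natural_projection(['green', 'green']))          # -> all emeralds are green
print(natural_projection(['green', 'green', 'blue']))  # -> all emeralds are grue(3)
```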

Means–Ends Epistemology 11


Here’s why a data-minimal method must always take its current conjecture seriously: if a method fails to take its conjecture seriously on some finite data sequence e consistent with the given background knowledge K, then we can speed it up on some data stream ε consistent with that background knowledge and evidence e without slowing it down on other data streams, because the method is not converging on other data streams anyway. Conversely, if a reliable method δ always takes its conjectures seriously given the background knowledge K, any other method δ′ that seeks to find the true hypothesis before δ on some data stream ε must disagree with the conjecture of δ at some point along ε, say at stage n. But since δ always takes its conjectures seriously, it does so after receiving evidence ε|n, and locks onto its conjecture δ(ε|n) on some data stream τ extending ε|n. Since we supposed that δ′ disagrees with δ on ε|n = τ|n, the theory of δ′ on ε|n = τ|n must be false on the data stream τ. So the method δ′ converges by time n on data stream ε, but only after stage n on data stream τ (if at all). Hence δ is faster than δ′ on τ. This shows that methods that always take their conjectures seriously (that is, methods that always project their conjectures) are data-minimal.

Theorem 6: Let H be a collection of alternative empirical hypotheses, and let K be given background knowledge. A reliable method is data-minimal for the discovery problem (H, K) ⇔ on each finite data sequence consistent with background knowledge K, the method projects its conjecture on that data sequence given background knowledge K.

The formal proof is in Schulte ([forthcoming]).

7 Steady convergence to the truth: retraction-minimality

Thomas Kuhn argued that one reason for sticking with a scientific paradigm in trouble is the cost of re-training and re-tooling the scientific community (Kuhn [1970]). The literature concerning ‘minimal change’ belief revision starts with the idea that minimizing the extent of retractions is a plausible desideratum for theory change (Gärdenfors [1988]). Similarly, learning theorists have investigated methods that avoid ‘mind changes’ (Putnam [1965], Case and Smith [1983], Sharma et al. [1997]). For discovery problems, this motivates a different criterion for evaluating the performance of a method on a given data stream: we want methods whose conjectures vacillate as little as possible. Consider an inductive method for a collection of alternatives. I say that a method retracts its conjecture on a data sequence e1, . . ., en, en+1, or changes its mind,9 if


9 Of course, rules do not have minds to change; only rule-followers do. But for my present purposes, there is no equally brief and vivid alternative to speaking of a rule or a method changing its mind.


1. the method’s theory on the previous data e1, . . ., en is consistent and entails one of the alternative hypotheses, and

2. the method’s theory on the current data e1, . . ., en, en+1 either entails a different hypothesis, or fails to entail any of the alternative hypotheses.
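In a simplified setting where a method’s conjecture at each stage is a single hypothesis (or `None` when it withholds judgement), retractions along a finite data sequence can be tallied as follows. This sketch ignores the general case of theories that merely entail hypotheses:

```python
def count_retractions(method, data):
    """Number of mind changes `method` performs along the finite
    data sequence `data`. `method` maps an evidence list to a
    hypothesis string, or None when it withholds judgement."""
    retractions = 0
    previous = None
    for n in range(1, len(data) + 1):
        current = method(data[:n])
        # A retraction: the method had committed to a hypothesis
        # and now conjectures something else (or nothing at all).
        if previous is not None and current != previous:
            retractions += 1
        previous = current
    return retractions

# The sceptic who never goes beyond the evidence never retracts;
# a rule that commits and then switches retracts once.
sceptic = lambda evidence: None
flip = lambda evidence: "A" if len(evidence) < 2 else "B"
print(count_retractions(sceptic, [0, 0, 0]))  # -> 0
print(count_retractions(flip, [0, 0, 0]))     # -> 1
```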

The aim of avoiding retractions by itself is trivial, because the sceptic who always conjectures exactly the evidence never retracts anything. But can we use this aim to select among the reliable methods the ones that avoid retractions, as we did with convergence time? In this section, I consider the consequences of using the admissibility principle to do so. We may define admissibility with respect to avoiding retractions as we did with respect to convergence time in Definition 5, replacing time-to-convergence with number of mind changes.

Definition 7: Let H be a collection of alternative hypotheses, and let K be background knowledge. In the discovery problem (H, K), an inductive method δ dominates another inductive method δ′ with respect to mind changes ⇔

1. background knowledge K entails that δ changes its mind no more often on any data stream than δ′ does, and

2. there is some data stream, consistent with background knowledge K, on which δ changes its mind less often than δ′ does.

A method δ is retraction-minimal for a discovery problem (H, K) ⇔ δ is not dominated in (H, K) with respect to mind changes by another method δ′ that is reliable for that problem.

What does retraction-minimality amount to? Avoiding mind changes pulls an inquirer away from generalizations that may have to be taken back later, and towards the ‘sceptic’s’ caution about going beyond the available evidence. But long-run reliability forces a sceptic to take a chance eventually and conjecture a hypothesis that goes beyond the evidence (if the discovery problem is a genuine problem of induction, in which the evidence does not eventually entail the correct hypothesis conclusively). But because long-run reliability is consistent with any behaviour in the short run, the sceptic may delay this moment, and the risk of having to retract her theories, for as long as she pleases. For any amount of time that the sceptic may choose to delay taking an inductive risk, she could have delayed more and been just as reliable in the long run. It follows that a reliable method is retraction-minimal just in case it never changes its mind; that is, just in case it never goes beyond the available evidence. Hence if there is a reliable retraction-minimal method, it must be possible to ‘wait and see’ until the evidence settles which theory is true. The next proposition formulates this phenomenon precisely.

Proposition 8: Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable retraction-minimal method for the discovery problem (H, K) ⇔ the background knowledge K entails that eventually the evidence will conclusively establish one of the alternative hypotheses as the correct one.

The proof is in Section 11.

Proposition 8 points to an interesting methodological phenomenon: in discovery problems that are genuinely inductive, myopically avoiding retractions at each stage of inquiry leads to inferences that are not reliable in the long run. An example will help to illustrate this.

In their famous Turing address, Newell and Simon formulated the physical symbol system hypothesis, according to which there is some computer program whose intelligence matches that of humans (Newell and Simon [1976]).10 If cognitive scientists experience a series of failures in building an intelligent system, they must eventually abandon the physical symbol system hypothesis, or else risk the possibility that they go on for eternity searching for the path to machine intelligence when there is none (cf. Kelly [1996], ch. 7). But just when should they become pessimistic about the prospects of cognitive science? Suppose that the researchers try programs m1, m2, . . ., mn in vain, and now consider whether to give up the physical symbol system hypothesis, or else to try one more program mn+1 before they conjecture that machine intelligence is impossible. As far as long-run reliability is concerned, it makes no difference whether they give up the belief in machine intelligence after seeing the failures on the first n machines, or after trying another one. But with respect to retractions, maintaining faith in cognitive science until the system mn+1 has been tried dominates giving up the belief in cognitive science beforehand. For if mn+1 is successful, the researchers need not have retracted their belief in the physical symbol system hypothesis.

But if mn+1 fails too, AI researchers may reason in the same way again: trying one more system mn+2 before recanting their faith in artificial intelligence might save them a retraction, without risking any additional ones. If the researchers continue to avoid changing their belief in cognitive science in this way, they will never recognize that machine intelligence is impossible if it actually is.

The cognitive scientists’ dilemma is an instance of the general problem of when exactly an inquirer or a group of inquirers should abandon their current paradigm. The scientists must eventually jump ship if they want to avoid following the wrong paradigm forever. But as Thomas Kuhn observed, there is typically no particular point at which the revolution must occur (Kuhn [1957]). Indeed, short-run considerations such as avoiding the embarrassment of dismissing their prior work (that is, avoiding retractions) pull scientists in the direction of conservatism. Kelly et al. ([1997], section 4) discuss the problem of deciding among scientific paradigms from a reliabilist perspective.

10 The other part of the physical symbol system hypothesis is the converse: that any intelligent system must have the capacities of a computer.

We saw that applying admissibility to the aim of avoiding retractions yields a standard of performance that is too high for the interesting inductive problems in which we cannot simply rely on the evidence and background knowledge to eliminate all but the true alternative. Learning theorists have examined another decision criterion by which we may evaluate the performance of a method with respect to retractions: the classic minimax criterion. Unlike retraction-minimality, minimaxing retractions is feasible even when there is a genuine problem of induction. Indeed, this criterion turns out to be a very fruitful principle for deriving constraints on the short-run inferences of reliable methods.

8 Steady convergence to the truth: minimaxing retractions

The minimax principle directs an agent to consider the worst-case results of her options and to choose the act whose worst-case outcome is the best. So to minimax retractions with respect to given background assumptions, we consider the maximum number of times that a method might change its mind if the background assumptions are correct. Suppose background knowledge entails that an inductive method δ changes its mind no more than n times on any data stream, whereas background knowledge does not rule out that another method δ′ may change its mind more than n times on some data stream. Then the principle of minimaxing retractions directs us to prefer the method δ to the method δ′. The principle of minimaxing retractions by itself is trivial, because the sceptic who always conjectures exactly the evidence never retracts anything. But using the minimax criterion to select among the reliable methods the ones that minimax retractions yields interesting results, as we shall see shortly. The following definition makes precise how we may use the minimax criterion in this way.

Definition 9: Suppose that δ is a reliable discovery method for a collection of alternative hypotheses H given background knowledge K. Then δ minimaxes retractions ⇔ there is no other reliable method δ′ for the discovery problem (H, K) such that the maximum number of times that δ might change its mind, given background knowledge K, is greater than the same maximum for δ′.

If there is no bound on the number of times that a reliable method may require a change of mind to arrive at the truth (for examples of such discovery problems see Kelly ([1996], ch. 4), or the Hypergrue problem described in Schulte ([forthcoming])), there is no maximum number of mind changes for reliable methods, and the minimax criterion has no interesting consequences.11

But if we know that a reliable method can succeed in identifying the correct hypothesis without ever using more than n mind changes, the principle selects the methods with the best such bound on vacillations. I say that a method identifies a true hypothesis from a collection of alternatives H given background knowledge K with at most n retractions if background knowledge entails that (1) the method will identify the correct alternative hypothesis, and (2) the method changes its mind at most n times on any given data stream. The goal of minimaxing retractions leads us to seek methods that succeed with as few mind changes as possible; learning theorists refer to this paradigm as discovery with bounded mind changes (Kelly [1996], ch. 9, Sharma et al. [1997]).

To get a feel for what minimaxing mind changes requires, consider a theorist whose initial conjecture in the Occam problem is that the Q particle does exist. Reliability demands that she eventually change her mind if the Q particle fails to appear. But after the point at which she finally concludes that the particle will not be found, a more powerful experimental design could turn it up, forcing her to change her mind for the second time. By contrast, the Occam rule never retracts its conjecture if the Q particle does not exist, and changes its mind exactly once if it does appear, to conclude (with certainty) that it exists. Hence the Occam rule changes its mind at most once, whereas the theorist


11 Sharma et al. ([1997]) present some interesting generalizations that apply even in this case. Briefly, the idea is that methods announce mind-change bounds together with a conjecture, and may dynamically revise their current mind-change bound as they receive further evidence.

Fig 2. In the Occam Problem, violating Occam’s Razor may lead to two retractions.


beginning with faith in the existence of the Q particle may have to change her mind twice in the worst case; see Figure 2.
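The worst case in Figure 2 can be replayed concretely in a self-contained toy reconstruction; the stream encoding and the `patience` parameter of the credulous theorist are my assumptions:

```python
def mind_changes(method, stream):
    """Count how often `method`'s conjecture changes along `stream`."""
    conjectures = [method(stream[:n]) for n in range(1, len(stream) + 1)]
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

def occam(evidence):
    """Posit the Q particle only once it has actually been observed."""
    return "exists" if any(evidence) else "does not exist"

def credulous(evidence, patience=3):
    """Starts with 'exists'; gives up after `patience` failures, but
    must re-adopt 'exists' if the particle then appears after all."""
    if any(evidence):
        return "exists"
    return "exists" if len(evidence) < patience else "does not exist"

# Worst case: the particle appears only after the credulous theorist
# has already given up on it.
worst = [False, False, False, False, True]
print(mind_changes(credulous, worst))  # -> 2 retractions
print(mind_changes(occam, worst))      # -> 1 retraction
```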

So reliability and the principle of minimaxing retractions select the Occam rule over the latter inference procedure in this particular problem. If we combine this with data-minimality, minimizing convergence time requires a theorist to immediately make a conjecture about whether the Q particle exists or not, and reliability together with minimaxing retractions dictate that this conjecture must be ‘the Q particle does not exist’. Hence we have the following result.

Proposition 10: The only inductive method that is reliable, data-minimal and minimaxes retractions in the Occam problem is the Occam rule.

Thus reliability and efficiency considerations underwrite a variant of Occam’s rule, namely: do not posit the existence of entities that are observable in principle, but that have not yet been observed. Note that this means–ends recommendation agrees with Occam’s Razor only about entities that are observable in principle: the very same epistemic aims may require a particle theorist to posit the existence of hidden particles! Briefly, it can be shown that reliability and minimaxing retractions mandate a theorist to choose a ‘closest fit’ to the available evidence of observed particle reactions; that is, these principles select the theory that conjectures that the reactions seen so far are the only possible ones. If we follow current practice in particle physics and seek conservation principles (selection rules or quantum properties) that describe which particle reactions are possible, we find that the logic of conservation principles entails that sometimes the theory that provides the closest fit to the available evidence must introduce hidden particles into its ontology. Schulte ([1997]) develops a model of theory discovery in particle physics and shows in detail why this is so; there is no room to pursue this application of reliabilist methodology here.

The same means–ends vindication for Occam’s Rule holds good for the natural projection rule in Goodman’s Riddle of Induction. First, data-minimality requires projecting one of the alternative colour predicates at each stage of inquiry. Second, consider an unnatural projection rule that conjectures that ‘all emeralds are grue(n)’ when a sample of k green emeralds has been obtained, for some k < n. If green emeralds continue to be found past the critical time n, this projection rule must eventually change its mind to project ‘all emeralds are green’; otherwise it would converge to a mistaken generalization about the colour of emeralds (namely ‘all emeralds are grue(n)’) and would be unreliable. But after the point at which the unnatural rule projects that all emeralds are green, a blue emerald may be found, forcing the rule to change its mind for the second time.

By contrast, the natural projection rule changes its mind at most once: not at all if all emeralds are green, and once to conclude (with certainty) that all


emeralds are grue(k) if the k-th emerald is found to be blue. Since the unnatural projection rule retracts its conjecture twice in the worst case, the principle of minimaxing retractions rejects it in favour of the natural projection rule;12 as we see in Figure 3.
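Figure 3’s worst case can also be replayed in a small self-contained sketch; my assumptions are that grue(t) predicts green emeralds before stage t and blue ones from stage t on, and that the unnatural rule initially projects grue(3):

```python
def changes(rule, stream):
    """Count how often `rule`'s conjecture changes along `stream`."""
    conjs = [rule(stream[:n]) for n in range(1, len(stream) + 1)]
    return sum(1 for a, b in zip(conjs, conjs[1:]) if a != b)

def natural(evidence):
    """Project 'green' until a blue emerald appears at some stage i,
    then conclude 'grue(i)'."""
    for i, colour in enumerate(evidence, start=1):
        if colour == 'blue':
            return f"grue({i})"
    return "green"

def unnatural(evidence, n=3):
    """Projects 'grue(n)' first; forced over to 'green' once n green
    emeralds have falsified grue(n)."""
    for i, colour in enumerate(evidence, start=1):
        if colour == 'blue':
            return f"grue({i})"
    return f"grue({n})" if len(evidence) < n else "green"

worst = ['green', 'green', 'green', 'blue']
print(changes(unnatural, worst))  # -> 2 retractions
print(changes(natural, worst))    # -> 1 retraction
```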

The next proposition asserts that the only reliable, data-minimal projection rule that minimaxes retractions is the natural one. The argument for this is essentially the same as for Proposition 10; the formal proof is in Schulte ([forthcoming]).

Proposition 11 [with Kevin Kelly]: In the Riddle of Induction, the only


12 Kevin Kelly was the first to observe this fact.

Fig. 3. An unnatural projection rule may have to retract its conjecture twice; in this case, if all emeralds are grue(3).


reliable and data-minimal projection rule that minimaxes retractions is the natural one.

The similarity of Figures 2 and 3 suggests that the Occam problem and the infinitely repeated Riddle of Induction share a common structure, and that it is this structure which makes the natural inferences optimal by our means–ends criteria. The structure in question is the topology of the possible data streams in relation to the alternative hypotheses under investigation. Schulte ([forthcoming]) gives an explicit characterization of this structure for arbitrary discovery problems. More precisely, Schulte ([forthcoming]) specifies necessary and sufficient conditions on the topology of the data streams that must hold in a given discovery problem if it is possible to reliably identify the correct hypothesis with a bounded number of retractions in that problem. This characterization theorem shows that two facts are crucial in singling out the natural projection rule as the optimal one in the infinitely repeated Riddle of Induction. First, if all emeralds are green, no finite sample of emeralds will be observed that entails this fact. Second, if all emeralds are grue(t), for some critical time t > 0, then by contrast we will observe a finite sequence of emeralds that falsifies all alternatives to ‘all emeralds are grue(t)’ under consideration, namely t-1 green emeralds followed by a blue one. Similarly, the characterization shows that the two crucial facts in the Occam problem are, first, that no finite number of experiments entails that the Q particle will never be found, whereas second, if the Q particle is discovered, this conclusively establishes its existence.

We saw that in the Occam problem and in the infinitely iterated Riddle of Induction, the results of our means–ends analysis depend only on logical relations of verification or falsification between evidence and hypothesis. The characterization theorem cited establishes that this is the case in any discovery problem. Since acceptable translations from one language into another preserve logical entailment, it follows that the methodological recommendations that means–ends analysis issues do not depend on the language in which we choose to express evidence and hypotheses. In particular, they do not change if we describe samples of emeralds and universal generalizations about their colour in a grue–bleen vocabulary rather than the familiar green–blue one. The result is that the optimal projection rule projects the translation of ‘all emeralds are green’ (that is, the optimal rule projects ‘all emeralds are grue(t) until time t-1 and bleen(t) thereafter’ as long as that generalization is consistent with the evidence).

There is an important feature of our means–ends criteria that the two examples from this paper do not illustrate: in general, the goal of avoiding retractions (minimaxing retractions) conflicts with the goal of minimizing time-to-truth. The one-shot Riddle of Induction illustrates this tension (in fact, it is one of the simplest possible discovery problems featuring this


conflict). In the one-shot Riddle, we are considering only a single grue predicate with critical time t, and the standard green predicate; as we see in Figure 4.

In the one-shot Riddle, it is easy to identify the correct universal generalization with no mind changes simply by waiting until the ‘critical time’ t. On the other hand, no data-minimal method can do so: by Theorem 6, a data-minimal method must project a universal generalization before time t. But whichever generalization a method elects to project, it might be falsified at the critical time t, forcing the method to retract its conjecture. In general, data-minimality requires an inquirer to make ‘bold’ quick generalizations, whereas avoiding retractions leads to more ‘cautious’ inferences.
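The two methods of Figure 4 can be sketched in a toy one-shot Riddle; the critical time `T = 3` and the colour encoding are my assumptions. The bold rule always projects a generalization (it is data-minimal) but may retract once; the wait-and-see rule never retracts but does not minimize convergence time:

```python
T = 3  # assumed critical time of the single grue predicate

def bold(evidence):
    """Data-minimal: projects 'green' from the first observation on,
    switching to 'grue(T)' only if falsified at the critical time."""
    return f"grue({T})" if 'blue' in evidence else "green"

def wait_and_see(evidence):
    """Withholds judgement until the critical time has passed; its
    first conjecture is already conclusively verified."""
    if len(evidence) < T:
        return None
    return f"grue({T})" if evidence[T - 1] == 'blue' else "green"

print(bold(['green']))                           # -> green (converged already, if true)
print(wait_and_see(['green', 'green']))          # -> None (still waiting)
print(wait_and_see(['green', 'green', 'blue']))  # -> grue(3)
```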

One way to describe the conflict between minimizing convergence time and avoiding retractions is to ask how many mind changes a data-minimal reliable method might require, and contrast this with the number of mind changes required by a method that does not minimize convergence time. Schulte ([forthcoming]) characterizes the structure of discovery problems that permit a data-minimal method to reliably find the correct hypothesis with a bounded


Fig. 4. Conflicting aims in the one-shot Riddle of Induction: Method 1 minimizes convergence time (is data-minimal) and may have to retract its conjecture once. Method 2 does not minimize convergence time but never retracts its conjecture.


number of mind changes. Together with the analogous result for methods that are not data-minimal, the two characterizations provide a precise measure of the extent to which the aim of avoiding retractions conflicts with minimizing time-to-truth in a given discovery problem. When this tension obtains, an inquirer must strike a subjective balance, as in any case of conflicting aims. But when there is a data-minimal method that minimaxes retractions, an inquirer can epistemically ‘have it all’. In such cases, the methods that avoid retractions and minimize convergence time seem to have special intuitive appeal: witness the natural projection rule in the infinitely iterated Riddle of Induction and the Occam Rule in the Occam problem.

Data-minimal methods that minimax retractions are not only intuitively appealing, they also avoid a standard objection to the minimax principle: the minimax principle tells us to take the bird in the hand rather than the two in the bush, as the saying goes, but we may well be willing to take a chance on winning big. With regard to retractions, the ideal case is to converge to the truth with no retractions. But in the best case, data-minimal methods converge to the truth with no more retractions at any stage of inquiry, because they always project their current conjecture. In terms of the old saying, data-minimal methods that minimax retractions take the bird in the hand, and then go for the two in the bush.

9 The hierarchy of hypothetical imperatives for inductive inference

As we have seen, the standards of efficiency defined in Section 4 can differ greatly in the strength of the demands that they impose on inductive inquiry. Data-minimality imposes only the fairly mild constraint that a method must take its conjectures seriously at each stage of inquiry (Section 6). Retraction-minimality, by contrast, does not allow an inquirer to generalize beyond the data, a demand which is impossible to meet if inquiry is to find informative truths. Between these two extremes lies the criterion of minimaxing retractions, which does allow for genuine generalizations, but selects some over others, as we saw in the Occam problem and the Riddle of Induction. Is this an accident of the examples we have chosen, or are there general principles that relate the strength of one set of epistemic aims to another? In this section I will show that, indeed, the subject of means–ends epistemology has a tidy structure: the various performance standards for inductive methods fall into an exact hierarchy of feasibility.

I say that a performance standard is feasible in a discovery problem posed by a collection of alternative hypotheses and given background knowledge if there is a reliable method for solving the problem that attains the standard in question. For example, minimaxing convergence time requires a reliable


method to deliver the correct hypothesis by an a priori fixed deadline. So if minimaxing convergence time is feasible, there must be a deadline by which the evidence conclusively verifies one of the alternative hypotheses. I record this observation in the following proposition; the proof is in Section 11.

Proposition 12: Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable method for the discovery problem (H, K) that minimaxes convergence time ⇔ background knowledge K entails that there is a time n by which the evidence will conclusively establish one of the alternative hypotheses as the correct one.

It is not hard to show that whenever this is possible, the other performance standards are feasible. In this sense minimaxing convergence time is the most stringent demand on inductive methods. It turns out that many performance standards are equally stringent, in the sense that one is feasible just in case the other is. Moreover, inductive methods can attain any one of the performance standards from this class just in case background assumptions guarantee that eventually the evidence will entail the correct hypothesis; in other words, just in case there is no problem of induction.

I say that an inductive method makes an error on a data stream ε at stage n iff on receiving the evidence from data stream ε up to stage n, the method produces a conjecture that is false on ε. Just as we can count the steps to convergence and the number of retractions, we can count the number of errors that a method commits on a given data stream. Then we may define the concepts of minimaxing error and admissibility with respect to error in the manner of Definitions 5 and 9. (The explicit definition is in Section 11.) This adds two more members to the class of performance standards that are feasible under exactly the same stringent conditions, according to our next theorem.
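Errors can be tallied in the same toy setting as retractions. This is a sketch under the simplifying assumption that a method's conjecture is a single hypothesis string and that `true_hypothesis` is the one correct on the underlying data stream:

```python
def count_errors(method, stream, true_hypothesis):
    """Number of stages at which `method`'s conjecture differs from
    the hypothesis that is actually correct on the data stream."""
    return sum(
        1
        for n in range(1, len(stream) + 1)
        if method(stream[:n]) != true_hypothesis
    )

# Example: a rule that conjectures H0 until stage 3 errs at the first
# two stages of a stream on which H1 is true, then converges.
rule = lambda evidence: "H0" if len(evidence) < 3 else "H1"
print(count_errors(rule, [0, 0, 0, 0], "H1"))  # -> 2
```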

Theorem 13: Let H be a collection of alternative hypotheses, and let K be given background knowledge. The following conditions are equivalent:

1. The background knowledge K entails that eventually the evidence will conclusively establish one of the alternative hypotheses as the correct one.

2. There is a reliable discovery method for the discovery problem (H, K) that succeeds with no retractions.

3. There is a retraction-admissible reliable discovery method for the discovery problem (H, K).

4. There is an error-admissible reliable discovery method for the discovery problem (H, K).

5. There is a reliable discovery method δ for the discovery problem (H, K) that succeeds with a bounded number of errors.

We can order problems that permit reliable methods to minimax retractions


by whether they are solvable with at most 0 mind changes, at most 1 mind change, . . ., at most n mind changes, etc., which yields an infinite sub-hierarchy of feasibility. Finally, whenever there is a reliable method for a discovery problem, there is a data-minimal one (see Schulte [forthcoming]); thus data-minimality is the least demanding of our performance standards. Figure 5 displays the full feasibility hierarchy of standards of inductive success.

The hierarchy provides a scale of inductive complexity by which we can measure the difficulty of a discovery problem: a problem P is easier than a


Fig 5. The Hierarchy of Hypothetical Imperatives for Inductive Inference.


problem P′ if a more stringent standard of inductive success is feasible in P than in P′. Thus the Occam problem and the Riddle of Induction are equally difficult in the sense that it is possible to reliably identify the correct hypothesis with at most one mind change in both of these problems. Problems such as determining whether human cognition is computable or whether matter is only finitely divisible go beyond the hierarchy from Figure 5: in the natural models of these problems, it is impossible to reliably find the right answer with a bounded number of mind changes (see Kelly [1996], Kelly et al. [1997]). A problem of intermediate complexity would be to investigate a hypothesis of the form ‘there are exactly n different types of x’ (n different elementary particles, n different forms of intelligent life, etc.), which requires no more and no less than two mind changes at worst. Thus we see again that means–ends analysis uncovers common structure among inductive problems that superficially appear quite different, revealing them to be equally complex.

Finally, the hierarchy explains why minimaxing retractions is a powerful principle for deriving short-run constraints on inductive method: unlike many of the other, more demanding standards, the principle applies even when there is a genuine problem of induction. But unlike data-minimality, minimaxing retractions is stringent enough to yield strong constraints on how inductive inference ought to proceed.

The hierarchy of performance standards does not cover the principle of maximizing expected utility. A full discussion of the differences between an analysis based on expected utility and the approach of this paper would lead into deep and complex issues in both decision theory and epistemology. I will leave this for another occasion, and just give some indication of the main differences between the two approaches.

One difference is that my treatment distinguishes different cognitive aims, rather than subsuming them under a single utility function. This allows us to study explicitly the objective relationships and trade-offs between different cognitive values.

Another reason, perhaps more fundamental, why expected utility does not appear in the feasibility hierarchy is that the hierarchy measures the structural (topological) complexity of a discovery problem, establishing a scale of ever more complex problems.13 Probability measures, on the other hand, tend to collapse these differences in structural complexity.14

Lastly, decision theorists typically take probabilities to be subjective ‘degrees of belief’. Thus whether a method maximizes the subjective expectation of a cognitive value will depend on a subjective ranking of possibilities by personal probability. In contrast, the project of means–ends methodology is


13 The characterization theorems in Kelly ([1996]) and Schulte ([forthcoming]) provide explicit characterizations of this structural complexity. It is worth noting that the structural complexity of inductive problems is of a kind familiar from analysis and recursion theory.


to establish objective comparisons between the performance of inductive methods, with the design of optimal methods tied to the intrinsic structure of a given inductive problem. As we have seen, examining the admissibility and worst-case performance of inductive methods achieves this goal (see also Kelly [1996]; Schulte [forthcoming], section 7).

10 Conclusion: a map of means–ends methodology

I will end my survey of means–ends epistemology with an outline of the subject, to put the results that I have covered in place, and to draw the reader's attention to some issues and applications that I have not had the space to explore in detail.

Means–ends analysis begins with the notion of an inductive problem. This paper considered discovery problems, which are defined by the inquirer's background assumptions about what the (epistemically) possible observation sequences are, and a set of mutually exclusive alternative hypotheses under investigation. The next step is to specify a standard of success for inductive methods, a combination of an epistemic value and an evaluation criterion. We then ask: when is a given standard of success feasible? The answer to this question is a set of results that characterize what the structure of an inductive problem must be like in order for a given performance standard to be feasible. These results establish a tidy hierarchy of feasibility that ranks inductive problems by the demands they place on empirical inquiry: a scale of inductive complexity. Problems of the same complexity share the same structure; more precisely, they share the same topology of possible data sequences and alternative hypotheses. We find these structural similarities across problems that on the surface appear quite different: for example, Goodman's Riddle of Induction, the Occamian question whether a given entity exists or not, and the task of determining whether a given reaction among elementary particles is possible or not, all have the same topology of data streams.
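The notion of a discovery problem can be made concrete with a small sketch. The following Python fragment is my own illustrative encoding (not the paper's formalism, and the name `entails` is hypothetical) of the Occamian existence question: data are 0/1 observations, and the two alternative hypotheses are that the entity never appears and that it appears at some time.

```python
# Illustrative sketch of a discovery problem (not the paper's formalism).
# Evidence is a finite tuple of 0/1 observations; a hypothesis is named
# by which infinite data streams would make it true.

def entails(evidence, hypothesis):
    """Does the finite evidence entail the hypothesis?

    In the Occam problem, 'exists' is conclusively verified by any
    observation of the entity (a 1), while 'does not exist' is never
    entailed by a finite evidence sequence.
    """
    if hypothesis == "exists":
        return 1 in evidence
    if hypothesis == "does not exist":
        return False  # no finite run of 0s rules out a later 1
    raise ValueError(hypothesis)

# The structural point: 'exists' is verifiable with certainty once a 1
# appears but never refutable with certainty -- the same topology of data
# streams as Goodman's Riddle with the natural vs. 'grue' hypotheses.
assert entails((0, 0, 1), "exists")
assert not entails((0, 0, 0), "does not exist")
```

The asymmetry between the two hypotheses is exactly the kind of structural feature the feasibility hierarchy classifies.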

The next question is what methods attain a given performance standard when the standard is feasible. The answer is another set of characterization theorems that describe exactly what the successful methods are like. These results allow us to reinterpret traditional proposals, such as axioms for belief revision and Popper's conjectures-and-refutations scheme, as means–ends recommendations: they turn out to be the means of choice for certain epistemic ends (Schulte [1997]). This clarifies the normative status of such methodological principles: they are good advice for agents with certain epistemic values, but agents with other aims need not, and sometimes should not, follow them.

14 Kelly ([1996], ch. 13) shows exactly why this is so for the question of whether there is a reliable method for a given discovery problem: it turns out that for any discovery problem and any countably additive probability measure, there is an inductive method that has a probability of 1 in that measure of converging to the correct hypothesis. For the auxiliary cognitive values considered in this paper, there remains an interesting open question: let a discovery problem and a probability measure be given. (Countably additive measures are an important special case.) Is it the case that if there is a reliable method that maximizes the expectation of one cognitive value considered in this paper, then there is a reliable method that maximizes the expectation of the others? For example, if a reliable method minimizes the expected number of mind changes, is there also a reliable method that minimizes the expected convergence time? If the answer is yes, we would have another example of how probabilities level differences in the feasibility of performance standards and the corresponding structural complexity of inductive problems.

When we apply the characterizations of successful methods to derive what the optimal inferences for a particular problem are, we find new means–ends justifications for interesting and plausible methodological recommendations. These applications include the natural projection rule in Goodman's Riddle of Induction, a version of Occam's Razor, and an instrumentalist interpretation of the role of conservation laws and hidden particles in particle physics (Schulte [1997]).

This paper has examined some of the most natural and interesting epistemic aims. We can equally well ask what means would be optimal for achieving other cognitive objectives. Schulte ([1997]) carries out this investigation for content, avoiding error, and producing succinct (finitely axiomatizable) theories. In principle, all we need to provide a means–ends analysis for further theoretical goals is a sufficiently clear description of the goals in question.

11 Appendix: Proofs

Let [e] be the empirical proposition that the evidence e has been observed; that is, [e] is the set of all data streams extending e.

Proposition 8: Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable retraction-minimal method for the discovery problem (H, K) ⇔ on every data stream ε, there is a time n such that [ε|n] and K entail the correct hypothesis H from H.

Proof. (⇐) Given the right-hand side, the 'sceptical' method δ whose theory is always exactly the evidence conjoined with background knowledge K reliably identifies a true hypothesis from H and never retracts its beliefs.
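This direction can be illustrated with a toy simulation. The Python sketch below is my own (all names are hypothetical), under the assumption, standing in for the right-hand side, that the very first observation reveals the true hypothesis; the sceptical method conjectures nothing beyond the evidence until the data entail an alternative, and so never retracts.

```python
# Toy simulation of the 'sceptical' method in Proposition 8 (illustrative
# sketch only). Background knowledge is assumed to guarantee that the
# first observation names the true hypothesis, so the data entail the
# answer from time 1 onward.

def sceptical_method(evidence, hypotheses, entails):
    """Conjecture an entailed hypothesis if there is one, else just the evidence."""
    for h in hypotheses:
        if entails(evidence, h):
            return h
    return evidence  # believe exactly what has been observed

def entails_first_datum(evidence, h):
    # Assumption standing in for K: the first datum names the true hypothesis.
    return len(evidence) > 0 and evidence[0] == h

stream = ("H1", "x", "y", "z")  # a data stream on which H1 is true
conjectures = [sceptical_method(stream[:n], ("H1", "H2"), entails_first_datum)
               for n in range(len(stream) + 1)]
# Converges to H1 at time 1 and never changes its mind thereafter.
assert conjectures[1:] == ["H1"] * 4
```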

(⇒) Let δ be a reliable retraction-minimal discovery method for H. Because beginning with the trivial conjecture δ(∅) = K does not increase the number of mind changes of a method on any data stream, we may without loss of generality assume that δ(∅) = K. Now suppose that on some data stream ε consistent with K, δ changes its mind at least once. Then there is a first time m0 > 0 at which δ makes a non-vacuous conjecture, that is, δ(ε|m0) is consistent and entails some hypothesis H from H, and a first time m1 > m0 at which δ makes a mind change, that is, δ(ε|m1) does not entail H or δ(ε|m1) = ∅. Now the following method δ′ dominates δ with respect to mind changes given K: if ε|m0 is an initial segment of e and e is an initial segment of ε|m1, let δ′(e) = K. Otherwise δ′(e) = δ(e). Then δ′ conjectures K along ε until ε|m1, so δ′ changes its mind along ε exactly one less time than δ. And clearly δ′ never uses more mind changes than δ does. So δ is dominated with respect to mind changes given K. This shows that if there is a reliable retraction-minimal method for H given K, then there is a reliable method δ′ that never changes its mind along any data stream in K. Such a method δ′ never entails more than the evidence. For suppose that [e] ∩ K does not entail δ(e). Then there is a data stream ε consistent with e and K on which δ(e) is false. So to succeed on ε, δ must change its mind at least once (after e), contrary to supposition. Since δ never entails more than the evidence and is reliable, eventually the evidence and background knowledge must conclusively establish which alternative in H is the true one, no matter what it is.

Proposition 12: Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable method for the discovery problem (H, K) that minimaxes convergence time ⇔ there is a time n such that on every data stream ε, [ε|n] and K entail the true hypothesis H in H.

Proof. (⇐) Suppose that there is a deadline n by which background knowledge and the data entail which hypothesis is correct. Without loss of generality, assume that n is the earliest such time. Then a reliable method δ can simply conjecture the evidence until time n. In the worst case (and in the best case), δ converges to the correct hypothesis by time n. To show that δ minimaxes convergence time, I establish that no other method δ′ can achieve a better guarantee on the time that it requires to find the truth. This is immediate if n = 0. Otherwise, let ε be any data stream such that K, ε|n−1 do not entail which hypothesis from H is correct. Since we chose n to be the earliest time by which background knowledge and the evidence always entail the correct hypothesis, there is such a data stream. If δ′(ε|n−1) is inconsistent or does not entail a hypothesis H from H, then clearly δ′ does not converge on ε before time n. Otherwise suppose that δ′(ε|n−1) entails some hypothesis H. Then by the choice of ε and n, there is another hypothesis H′ in H such that H′ is true on some extension τ of ε|n−1, consistent with K, and δ′(ε|n−1) does not entail H′. Hence δ′ converges to the correct hypothesis on τ only after time n. Therefore every method requires, on some data stream consistent with background knowledge K, at least n pieces of evidence before settling on a correct hypothesis, and our 'wait-and-see' method minimaxes convergence time.
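The 'wait-and-see' method of this proof is easy to simulate. In the Python sketch below (my own, with hypothetical names), I assume a deadline of n = 2 and, purely for illustration, let the entailed hypothesis be the parity of the first two observations; the method conjectures only the evidence until the deadline and then outputs the entailed hypothesis.

```python
# Sketch of the 'wait-and-see' method from Proposition 12 (illustrative
# only). Background knowledge is assumed to guarantee a deadline n by
# which the data entail the correct hypothesis; here n = 2 and the
# hypothesis is the parity of the first two observations.

DEADLINE = 2

def entailed_hypothesis(evidence):
    """Return the hypothesis entailed by the evidence, if any."""
    if len(evidence) >= DEADLINE:
        return "even" if sum(evidence[:DEADLINE]) % 2 == 0 else "odd"
    return None  # the data do not yet settle the question

def wait_and_see(evidence):
    h = entailed_hypothesis(evidence)
    return h if h is not None else evidence  # just the evidence until the deadline

stream = (1, 1, 0, 1)
conjectures = [wait_and_see(stream[:n]) for n in range(len(stream) + 1)]
# Worst-case (indeed every-case) convergence time is exactly the deadline.
assert conjectures[2:] == ["even", "even", "even"]
```

No method can beat this guarantee: as the proof shows, any method that commits itself before the deadline can be wrong on some stream consistent with K.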

(⇒) I show the contrapositive. Suppose that for every bound n, there is a data stream ε such that K, ε|n do not entail which hypothesis in H is correct. By the same argument as for the converse implication, this means that for every reliable method δ, and every bound n, there is some data stream ε consistent with K on which δ converges to the correct hypothesis only after time n. Hence there is no global bound on the convergence time of any reliable method δ.

Definition 16: Let H be a collection of alternative hypotheses, and let K be background knowledge. In the discovery problem (H, K), an inductive method δ dominates another inductive method δ′ with respect to error ⇔

1. on every data stream ε consistent with K, δ makes no more errors than δ′, and
2. for some data stream ε consistent with K, δ makes fewer errors than δ′ does.

An inductive method δ is error-minimal for a discovery problem (H, K) ⇔ δ is not dominated in (H, K) with respect to error by another method δ′ that is reliable for (H, K).

For given background knowledge K, let MaxErr(δ, K) = max{the number of errors that δ commits on ε : ε in K} be the maximum number of errors that δ makes on any data stream consistent with the given background knowledge. I say that δ succeeds with a bounded number of errors given K iff there is a finite number n such that MaxErr(δ, K) ≤ n. As in the case of discovery with bounded mind changes, if a method δ minimaxes errors, it succeeds with a bounded number of errors (namely with the lowest possible such bound).

Theorem 13: Let H be a collection of alternative hypotheses, and let K be background knowledge. The following conditions are equivalent:

1. The background knowledge K entails that eventually the evidence will conclusively establish one of the alternative hypotheses as the correct one.

2. There is a reliable discovery method for the discovery problem (H, K) that succeeds with no retractions.

3. There is a retraction-admissible reliable discovery method for the discovery problem (H, K).

4. There is an error-admissible reliable discovery method for the discovery problem (H, K).

5. There is a reliable discovery method δ for the discovery problem (H, K) that succeeds with a bounded number of errors.

Proof. The equivalence of claims 1 and 2 follows from the arguments in the proof of Proposition 8. By the same Proposition, claim 3 is equivalent to claim 1. I prove the equivalence of claims 4 and 5 with claim 1.

(1 ⇒ 4, 5) If background knowledge K guarantees that eventually the evidence will conclusively entail the correct hypothesis, the method δ that conjectures nothing but the evidence reliably identifies the correct hypothesis from H and never makes an error. Hence δ is both error-admissible and minimaxes error.

(1 ⇐ 4) Let δ be an error-admissible method. Suppose that δ makes an error on some data stream ε consistent with K, say at time k. Then the following method δ′ dominates δ in error: δ′ agrees with δ everywhere but at ε|k, and δ′(ε|k) = K ∩ [ε|k]. Then δ′ makes strictly fewer errors than δ on ε, and never makes more. So the only error-admissible methods are those that never make an error on any data stream consistent with K. Hence no error-admissible method makes a conjecture that goes beyond the evidence. As we saw in the proof of Proposition 8, such a method is reliable only if background knowledge K guarantees that the evidence eventually entails the true hypothesis.
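The dominating method δ′ in this step is a one-point patch of δ: it retreats to exactly the evidence at the single evidence segment where δ erred. A minimal Python sketch of this construction (my own, with hypothetical names):

```python
# Illustrative sketch of the one-point patch used in the (1 <= 4) step.
# delta_prime agrees with delta everywhere except on the evidence segment
# where delta erred; there it conjectures K conjoined with the evidence,
# which cannot be an error.

def patch(delta, bad_evidence):
    """Return a method agreeing with delta except at bad_evidence."""
    def delta_prime(evidence):
        if evidence == bad_evidence:
            return ("evidence", evidence)  # stands in for K intersected with [e]
        return delta(evidence)
    return delta_prime

def delta(evidence):
    return "risky guess"  # a conjecture that is false on some stream

delta_prime = patch(delta, (0, 1))
assert delta_prime((0, 1)) == ("evidence", (0, 1))
assert delta_prime((0,)) == "risky guess"
```

The patched method makes strictly fewer errors on the offending stream and never more elsewhere, which is exactly the dominance relation of Definition 16.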

(1 ⇐ 5) I show the contrapositive. Suppose that background knowledge K does not guarantee that eventually the evidence will conclusively establish one of the alternative hypotheses as the correct one. Then there is some hypothesis H in H true on some data stream ε consistent with H and K such that H is never entailed along ε given K. Now consider some possible bound n on the number of errors that a reliable discovery method δ may make on any data stream in K. Since δ is reliable, δ eventually converges to H on ε, say at time k. Thus δ conjectures H on ε|k, ε|k+1, …, ε|k+n. Since H is never entailed along ε with respect to background knowledge K, there is a data stream τ extending ε|k+n, consistent with K, on which H is false. Thus on τ, δ makes at least n+1 errors. Since this argument applies to any bound n, there is no bound on the number of errors that δ might make on a data stream. Thus there is no reliable discovery method that minimaxes errors unless background knowledge K guarantees that the evidence conjoined with K will eventually entail one of the alternative hypotheses.

Acknowledgements

Kevin Kelly, John King-Farlow, and Bernard Linsky provided much helpful discussion and comments on drafts of this paper. Kevin Kelly made extensive suggestions for improving the exposition.

Department of Philosophy
4-89 Humanities Centre
University of Alberta
Edmonton, AB
T6G 2E5 Canada

References

Bub, J. [1994]: 'Testing Models of Cognition Through the Analysis of Brain-Damaged Performance', British Journal for the Philosophy of Science, 45, pp. 837–55.

Carnap, R. [1962]: 'The Aim of Inductive Logic', in E. Nagel, P. Suppes, and A. Tarski (eds), Logic, Methodology and Philosophy of Science, Stanford: Stanford University Press.


Case, J. and Smith, C. [1983]: 'Comparison of Identification Criteria for Machine Inductive Inference', Theoretical Computer Science, 25, pp. 193–220.

Earman, J. [1992]: Bayes or Bust?, Cambridge, MA: MIT Press.

Gärdenfors, P. [1988]: Knowledge in Flux: Modeling the Dynamics of Epistemic States, Cambridge, MA: MIT Press.

Glymour, C. [1994]: 'On the Methods of Cognitive Neuropsychology', British Journal for the Philosophy of Science, 45, pp. 815–35.

Gold, E. [1967]: 'Language Identification in the Limit', Information and Control, 10, pp. 447–74.

Goodman, N. [1983]: Fact, Fiction and Forecast, Cambridge, MA: Harvard University Press.

Howson, C. [1997]: 'A Logic of Induction', Philosophy of Science, 64, pp. 268–90.

Howson, C. and Urbach, P. [1989]: Scientific Reasoning: The Bayesian Approach, La Salle, IL: Open Court.

Hume, D. [1984]: An Inquiry Concerning Human Understanding, C. Hendell (ed.), New York: Collier.

James, W. [1982]: 'The Will to Believe', in H. S. Thayer (ed.), Pragmatism, Indianapolis: Hackett.

Juhl, C. [1994]: 'The Speed-Optimality of Reichenbach's Straight Rule of Induction', British Journal for the Philosophy of Science, 45, pp. 857–63.

Juhl, C. [1997]: 'Objectively Reliable Subjective Probabilities', Synthese, 109, pp. 293–309.

Kelly, K. [1996]: The Logic of Reliable Inquiry, Oxford: Oxford University Press.

Kelly, K. and Schulte, O. [1995]: 'The Computable Testability of Theories Making Uncomputable Predictions', Erkenntnis, 43, pp. 29–66.

Kelly, K., Juhl, C., and Glymour, C. [1994]: 'Reliability, Realism, and Relativism', in P. Clark (ed.), Reading Putnam, London: Blackwell.

Kelly, K., Schulte, O., and Juhl, C. [1997]: 'Learning Theory and the Philosophy of Science', Philosophy of Science, 64, pp. 245–67.

Keynes, J. M. [1921]: A Treatise on Probability, London: Macmillan.

Kitcher, P. [1993]: The Advancement of Science, Oxford: Oxford University Press.

Kuhn, T. [1957]: The Copernican Revolution, Cambridge, MA: Harvard University Press.

Kuhn, T. [1970]: The Structure of Scientific Revolutions, Chicago: University of Chicago Press.

Laudan, L. [1977]: Progress and Its Problems, Berkeley: University of California Press.

Newell, A. and Simon, H. [1976]: 'Computer Science as Empirical Inquiry: Symbols and Search', Communications of the ACM, 19, pp. 113–26.

Osherson, D., Stob, M. and Weinstein, S. [1986]: Systems That Learn, Cambridge, MA: MIT Press.

Osherson, D. and Weinstein, S. [1988]: 'Mechanical Learners Pay a Price for Bayesianism', Journal of Symbolic Logic, 53, pp. 1245–52.


Popper, K. [1968]: The Logic of Scientific Discovery, New York: Harper.

Putnam, H. [1963]: '"Degree of Confirmation" and Inductive Logic', in A. Schilpp (ed.), The Philosophy of Rudolf Carnap, La Salle, IL: Open Court.

Putnam, H. [1965]: 'Trial and Error Predicates and a Solution to a Problem of Mostowski', Journal of Symbolic Logic, 30, pp. 49–57.

Reichenbach, H. [1949]: The Theory of Probability, London: Cambridge University Press.

Salmon, W. [1967]: The Foundations of Scientific Inference, Pittsburgh: University of Pittsburgh Press.

Salmon, W. [1991]: 'Hans Reichenbach's Vindication of Induction', Erkenntnis, 35, pp. 99–122.

Schulte, O. and Juhl, C. [1997]: 'Topology as Epistemology', The Monist, 79, pp. 141–8.

Schulte, O. [1997]: 'Hard Choices in Scientific Inquiry', doctoral dissertation, Department of Philosophy, Carnegie Mellon University.

Schulte, O. [forthcoming]: 'The Logic of Reliable and Efficient Inquiry', Journal of Philosophical Logic.

Sextus Empiricus [1985]: Selections from the Major Writings on Skepticism, Man and God, P. Halie (ed.), trans. S. Etheridge, Indianapolis: Hackett.

Sharma, A., Stephan, F., and Ventsov, Y. [1997]: 'Generalized Notions of Mind Change Complexity', Proceedings of the Conference on Learning Theory (COLT) 1997.

Spohn, W. [1988]: 'Ordinal Conditional Functions: A Dynamic Theory of Epistemic States', in B. Skyrms and W. Harper (eds), Causation in Decision, Belief Change and Statistics II, Dordrecht: Kluwer.

van Fraassen, B. [1980]: The Scientific Image, Oxford: Clarendon Press.
