Journal of Automated Software Engineering (submitted), WP:alp2/ase(May
2, 2000).
Some Empirical Automated Software Engineering
Tim Menzies (1), Sam Waugh (2)
(1) NASA/WVU Software Research Lab, 100 University Drive, Fairmont, USA; [email protected]
(2) Defence Science and Technology Organisation, Air Operations Division, Melbourne, Australia; sam.waugh@dsto.defence.gov.au
May 2, 2000
Abstract
We experiment with core processes within software development tasks (reuse and conflict management in requirements engineering). After defining those processes, we automatically perform 100,000s of manipulations in accordance with that process. The results of the manipulations are summarized and discussed. In two case studies with this technique we find that building widely reusable software components will be harder than we had thought, while resolving conflicts amongst competing stakeholders is easier than we had thought.
1 Introduction
Can software engineering be an exact science? Software engineering is a human-intensive process. Such a sociological-intensive exercise can forbid exact conclusions. Nevertheless, in some circumstances, exact conclusions are possible; e.g.
• Suppose human idiosyncrasies are the main controlling factor in software development. If so, then we might predict that exact software cost estimation requires complex equations about human dynamics.
• Yet in at least one case study, this was not true. Chulani, Boehm, and Steece used a simple model of software costs and merely two dozen input parameters to produce reasonably accurate estimates of software construction time (their estimates were within 25% of actual, 69% of the time) [Chulani, Boehm & Steece 1999].
This article seeks other conclusions that are not affected by the idiosyncrasies of human software engineers. We assume that when people write software, they are manipulating theories. As we hope to show, certain theory properties are stable across a wide range of theory manipulations. That is, fortunately, some of the properties of our theories may not be as idiosyncratic as the people who write them.
To make this case, we will perform manipulations on a general class of theories (and-or graphs with ground nodes). Since there are many possible manipulations on a theory, we will use an automatic rig to perform millions of manipulations such as injecting errors into theories; changing theory connections; altering the data sets given to a theory; and altering the processing of a theory, including totally random selection amongst alternatives. In the experiments reported here, we found that certain properties emerged as stable, despite widespread manipulations. In particular, some tasks were much harder than expected, while others were much easier:
Much harder: Software reuse.
Much easier: Handling inconsistencies in requirements.
The rest of this article is structured as follows. §2 offers an example of our type of experimentation. As we define our experiments, we will encounter certain restrictive suppositions that threaten the generality of our results. §3 debates those suppositions and concludes that the rig used in those experiments is general to a large class of software problems.
1.1 Caveats
Before proceeding, we pause for a methodological digression. The conclusions of this paper are based on experiments with manipulating theories. The experimental basis for our conclusions is this paper's greatest strength. Opponents of our views can quickly refute our conclusions by designing and executing experiments that generate contrary conclusions. We would encourage such experiments. This paper is an attempt to add more rigor to software engineering research. If this paper inspires further careful experimentation, then it is a success even if those further experiments refute our conclusions.
On the other hand, the experimental basis for our conclusions is this paper's greatest weakness. Our conclusions stand only for the sample of problems studied here. If a different class of problems resulted in different behaviour, the conclusions would have to be restricted to the class of problems seen in our experiments. Further, all our conclusions fail if the restrictions we observe disappear due to some new method for theory manipulation. For example, optimizations are always being discovered for theoretically intractable tasks.
Lest we prematurely convince you that this paper is fatally flawed, we hasten to add that none of our conclusions will be based on runtimes. All our experiments will run to termination. That is, the effects we report are not some function of processor speed or data structures. Instead, our conclusions come from giving our system unlimited time to perform its manipulations. Further, when we designed our experiments, we were nervous of our sampling bias. Hence, we always generated many, many variants of each sample problem. Nevertheless, our generators cannot possibly generate problems across the entire space of problem types. At this time, we are unaware of a class of problems that behaves differently and leave this issue for future research.
2 A Study into Theory Reusability
This section describes an example of the kind of analysis that is possible via large-scale theory manipulation. We will describe an experimental rig and the behaviours seen in that rig. In summary, we will explore a suite of theories 378,000 times to conclude that the chances of building truly reusable components are a function of the amount of data used to test those components.
The intent of this section is to offer the reader a flavor of the kind of analysis we seek to encourage. Hence, for the moment, we will not discuss the external validity of these observations. In the sequel, however, we will argue that the observations seen in this rig are applicable to a wide class of software theories.
2.1 Motivation
Our reading of the literature is that design via reuse is the dominant paradigm in contemporary knowledge and software engineering. This paradigm dates back at least to 1964 with Alexander's work on architecture [Alexander 1964, Alexander, Ishikawa, Silverstein, Jacobsen, Fiksdahl-King & Angel 1977]. In summary, when we design via reuse, we re-shuffle components developed previously, then abstracted into a reusable form. Modern expressions of design via reuse include object-oriented design patterns [Buschmann, Meunier, Rohnert, Sommerlad & Stal 1996, Menzies 1997a], and the knowledge engineering research into ontologies [Gruber 1993] and problem solving methods [Schreiber, Wielinga, Akkermans, Velde & de Hoog 1994]. A tacit assumption in this paradigm is that a community shares a definition of some concept. This assumption would be violated if, in the usual case, different developers facing the same problems develop very different definitions of software concepts.
When won't reuse work? To answer that question, we assume that when we build reuse libraries, we are creating those libraries from an implicit space of alternative designs. In our experience, the design process stops when we have rejected an adequate number of clearly inferior proposals. When exploring the design space, there exist situations where software reuse is likely, and other situations where it is less likely. Suppose one good design was clearly superior to all alternatives. In this case, we could depend on different designers working at different times and different locations to arrive at the same final design. Reuse is likely in this case since the problem domain would force designers to make the same decisions. Designers would hence understand reusable artifacts since it would be clear why a particular reusable artifact was built this way and not that way.
[Plot: relative cost (y-axis, 0 to 1) against amount modified (x-axis, 0 to 1).]
Figure 1: Relative cost of reuse with X% changes; i.e. cost of adaptation divided by the cost of rebuilding from scratch. Data from the COCOMO-II software cost estimation project [Abts et al. 1998, p21].
Alternatively, suppose a clearly superior design was hard to detect. In this case, it is possible that different designers will arrive at different final systems. If the problem domain does not force a similar set of decisions, designers are less likely to understand supposedly reusable artifacts. Our designers may always be puzzling over why a particular reusable artifact was built this way and not that way. Worse still, they may be tempted to tinker with the reusable artifact in order to contort it into their own view of the problem domain. Such tinkering can remove the economic justification for reuse. A learning curve must be traversed before any module can be adapted. By the time you know enough to change a little of that module, you may as well have re-written 60% of it from scratch; see Figure 1.
2.2 From Motivation to Manipulations
The above motivation can now be mapped into a series of theory manipulations. As mentioned above, during this mapping process, we will encounter certain restrictive suppositions that threaten the generality of our results. These suppositions will be noted, then debated later.
We begin by commenting that reuse is less likely if clearly superior designs are not recognizable. That is, some oracle cannot declare that theory $T_1$ is "much better than" theory $T_2$ (which we denote $T_1 \gg T_2$). Let us assume that this oracle can access a list of output states that the program should reach from some input states. We define the cover of a theory $T_x$ to be the largest number of outputs it can reach from the inputs. As $cover(T_x)$ approaches 100%, the theory can reach all its desired outputs. Using this definition of $cover(T_x)$, we can implement the oracle as follows:
$T_1 \gg T_2 \;\text{if}\; cover(T_1) \gg cover(T_2)$ (1)
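The cover-based oracle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `margin` threshold that stands in for "much better than" are hypothetical, since the text does not say how large a gap in cover counts as "much".

```python
def cover(reached_outputs, desired_outputs):
    """Percentage of the desired outputs that a theory was able to reach."""
    desired = set(desired_outputs)
    if not desired:
        raise ValueError("need at least one desired output")
    return 100.0 * len(set(reached_outputs) & desired) / len(desired)


def much_better(cover_i, cover_j, margin=20.0):
    """Hypothetical reading of 'much better than': theory i wins when its
    cover exceeds that of theory j by at least `margin` percentage points.
    The value of `margin` is an assumption, not taken from the paper."""
    return cover_i - cover_j >= margin


good = cover({"a", "b", "c"}, {"a", "b", "c", "d"})   # reaches 3 of 4 outputs
poor = cover({"a"}, {"a", "b", "c", "d"})             # reaches 1 of 4 outputs
print(good, poor, much_better(good, poor))
```

With a fast decay curve, most pairs of theories would differ by more than the margin; with a slow decay curve, few would.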
[Plot: cover (y-axis, 0% to 100%) against number of errors in the theory (x-axis: none, some, many), showing a slow decay curve and a fast decay curve.]
Figure 2: Decay curves for cover.
The first supposition of this experiment is:
Supposition 1 Our oracle of better designs ($\gg$) is widely applicable.
Assuming that Equation 1 is valid, we note that as we introduce errors into our theories, we expect cover to decrease. Suppose the decay in cover was very rapid; e.g. as shown in the fast decay curve of Figure 2. In the case of fast decaying cover, we can hope that different designs will converge since, if ever we stumbled over a superior design, we would be sure to realize that it was indeed superior (since its cover would be very much greater than that of its opposing theories). Alternatively, if the decay in cover was very shallow (e.g. the slow decay curve of Figure 2) then designs may not converge since competing theories would be indistinguishable.
The discussion in the above motivation section can now be expressed precisely using Equation 1 and the notion of fast decay:
Reusable reuse libraries are likely in the case of fast decay since there will only be a small number (perhaps one) of best designs on top of the decay curve.
Conversely:
Reusable reuse libraries are unlikely in the case of slow decay since many designs exist on top of the decay curve. In this case, different designers working at different times and different locations may generate different solutions to the same problem.
Hence, to experimentally explore the likelihood of software reuse, we need an experimental rig that tracks the decay curves of a theory as we manipulate the theory to introduce errors.
[Diagram: a qualitative influence network over twelve variables (fish growth rate, fish population change, fish density, fish catch, catch potential, catch proceeds, net income, boat investment fraction, boat purchases, boat maintenance, boat decommissions, and change in boat numbers), connected by edges labelled ++ or --.]
Figure 3: The fisheries model from [Bossel 1994] (pp135-141).
2.3 Experiment #1: Exploring Reuse
In this experiment, we generate cover decay curves of the form of Figure 2. To do this, we need (A) a sample theory to explore, (B) a method of introducing errors, (C) a method of generating input-output pairs that the correct theory should be able to cover, and (D) a simulator which can search from inputs to find the maximum coverage of the outputs. This section describes our methods for implementing <A, B, C, D>.
2.3.1 A: Sample Theory
Figure 3 shows our sample theory. This figure is a qualitative form of some quantitative equations describing fish growing in a fishery. For the moment, we do not discuss the generality of theories of the form of Figure 3. This issue will be discussed later.
In Figure 3, each theory variable has three states: up, down or steady. These values model the sign of the first derivative of these variables (i.e. the rate of change in each value). $X \xrightarrow{++} Y$ denotes that Y being up or down could be explained by X being up or down respectively. That is:
$V_1 \xrightarrow{++} V_2 \equiv \{\, V_1{\uparrow} \vdash V_2{\uparrow},\; V_1{\downarrow} \vdash V_2{\downarrow} \,\}$
(where $\uparrow$ and $\downarrow$ denote up and down respectively). $X \xrightarrow{--} Y$ denotes that Y being up or down could be explained by X being down or up respectively. That is:
$V_1 \xrightarrow{--} V_2 \equiv \{\, V_1{\uparrow} \vdash V_2{\downarrow},\; V_1{\downarrow} \vdash V_2{\uparrow} \,\}$
Tacit in Figure 3 are conjunctions of influences. We should view this theory as influences splashing around pipes connecting tubs. Pairs of competing influences can cancel out. That is, we can explain the level of water in a tub remaining steady via a conjunction of competing upstream influences; e.g.
$((V_1{\uparrow} \vdash V_2{\uparrow}) \wedge (V_3{\downarrow} \vdash V_2{\downarrow})) \vdash ((V_1{\uparrow} \wedge V_3{\downarrow}) \vdash (V_2 = steady))$
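The edge semantics just described can be illustrated with a small Python sketch. The function names are hypothetical, and treating a steady upstream variable as exerting no influence is an assumption of this sketch rather than something the text states.

```python
UP, DOWN, STEADY = "up", "down", "steady"
OPPOSITE = {UP: DOWN, DOWN: UP}


def explained_value(edge_type, x_value):
    """Value of Y that an influence X -edge_type-> Y could explain.
    Returns None when X is steady (assumed to exert no influence)."""
    if x_value not in OPPOSITE:
        return None
    if edge_type == "++":
        return x_value               # Y moves in the same direction as X
    if edge_type == "--":
        return OPPOSITE[x_value]     # Y moves in the opposite direction
    raise ValueError("unknown edge type: %r" % edge_type)


def possible_values(influences):
    """All values of Y consistent with a list of (edge_type, x_value)
    upstream influences. Competing influences may cancel, so steady is
    added whenever both up and down are explained."""
    values = {explained_value(e, v) for e, v in influences} - {None}
    if values == {UP, DOWN}:
        values.add(STEADY)
    return values


# One influence pushing up and one pushing down: three outcomes are possible.
print(sorted(possible_values([("++", UP), ("++", DOWN)])))
```

Each value returned by `possible_values` corresponds to one branch that a contradiction-tolerant simulator would have to explore separately.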
Figure 3 is a very small theory: 12 three-state variables copied 6 times (one for each time tick). Real world software systems can be much, much larger than 12 variables. Hence, our next supposition must be:
Supposition 2 We can scale up conclusions from a small theory such as Figure 3 to larger theories.
Apart from size, Figure 3 also has a fixed degree of connectivity, also known as fanout. The fanout of Figure 3 is the number of edges divided by the number of variables, $17/12 \approx 1.4$. We will need to check that changing this fanout does not affect our conclusions. Hence:
Supposition 3 We can scale conclusions from a theory of fanout=1.4 to theories with other fanouts.
Apart from the size and fanout issues, experimenting with Figure 3 implies:
Supposition 4 We can represent a useful range of software systems as networks like Figure 3.
2.3.2 B: Introducing Errors
Figure 3 contains edges of two types: $X \xrightarrow{++} Y$ and $X \xrightarrow{--} Y$. We can corrupt the theory by randomly selecting $N$ edges, then flipping their sign (++ to -- and vice versa). For a theory with $|E|$ edges ($N \le |E|$), we can now generate $2^{|E|}$ variants. Note that at $N = 0$ we have the original theory and, as $N$ increases, we move to progressively worse theories.
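A minimal sketch of this corruption step follows. The edge representation (a list of `(source, sign, target)` triples), the function name `corrupt`, and the three example edges are all illustrative assumptions; they are not the actual edges of Figure 3.

```python
import random


def corrupt(edges, n, rng=random):
    """Return a copy of `edges` in which the signs of `n` randomly
    selected edges are flipped ("++" becomes "--" and vice versa).
    `edges` is a list of (source, sign, target) triples."""
    if not 0 <= n <= len(edges):
        raise ValueError("n must lie between 0 and the number of edges")
    chosen = set(rng.sample(range(len(edges)), n))
    corrupted = []
    for i, (src, sign, dst) in enumerate(edges):
        if i in chosen:
            sign = "--" if sign == "++" else "++"
        corrupted.append((src, sign, dst))
    return corrupted


# Illustrative three-edge theory (hypothetical edges).
theory = [("a", "++", "b"), ("b", "--", "c"), ("c", "++", "a")]
print(corrupt(theory, 0) == theory)            # zero flips: original theory
print(corrupt(theory, 2, random.Random(1)))    # two signs flipped
```

Passing a seeded `random.Random` instance makes a corruption run repeatable, which is convenient when the same corrupted theory must be reused across many test sets.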
2.3.3 C: Generating Input/Output Pairs
To generate input-output pairs that should generate 100% cover, we use the $N = 0$ theory (i.e. the correct theory) to generate data. To apply this method, the quantitative fisheries theory was run 15 times. Inputs and goals were generated by comparing all pairs of the measurements in the 15 runs (105 such pairs exist). That is, by comparing pairs of runs we can convert raw numbers to statements like (e.g.) "when we compare run 3 to 9, we saw that catchPotential went up and catchProceeds went down".
Next, the comparisons were converted into input-output pairs. The theory of Figure 3 was copied $C$ times to represent the state of the system at time $C = 0, C = 1, \ldots$. Inputs are comparisons that appear at time copy $C = 0$ and outputs are comparisons that appear at time copy $C > 0$.
If we copy the theory $C$ times, then we must link time copy $C = i$ to time copy $C = i + 1$ using a temporal linking policy. Note the variables changeInBoatNumbers and fishPopulationChange. These variables are the explicit time variables. In the XNODE temporal linking policy used in this first study, explicit time variables connect to themselves in the immediate future. That is, for each explicit time variable $X$, XNODE adds a link
$X@C_i \xrightarrow{++} X@C_{i+1}$
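The copy-and-link step can be sketched as follows. This is an illustration under assumptions: vertices are modelled as `(variable, time_copy)` pairs, edges as `(source, sign, target)` triples, and the function name `xnode_copies` and the example edge are hypothetical.

```python
def xnode_copies(edges, time_vars, copies):
    """Copy a theory `copies` times and join consecutive copies with an
    XNODE-style policy: each explicit time variable at copy c is linked
    by a "++" edge to itself at copy c + 1."""
    linked = []
    for c in range(copies):
        for src, sign, dst in edges:
            linked.append(((src, c), sign, (dst, c)))
    for c in range(copies - 1):
        for x in time_vars:
            linked.append(((x, c), "++", (x, c + 1)))
    return linked


# Hypothetical one-edge theory with one explicit time variable, three copies.
for edge in xnode_copies([("a", "++", "b")], ["b"], 3):
    print(edge)
```

Other temporal linking policies would change only the second loop, which is why the choice of policy can be varied without touching the rest of the rig.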
We acknowledge that the choice of XNODE as the temporal linking policy is somewhat arbitrary and could limit the generality of our conclusions; i.e.
Supposition 5 Conclusions drawn from a study of XNODE-based systems apply to non-XNODE systems.
This test data library now contained measurements for all variables during the lifetime of the simulation (5 years; i.e. $C = 0, 1, 2, 3, 4, 5$). In practice, it is unlikely that all such data will be available to a testing team. Hence, we generate input-output pairs that refer to less than 100% of the variables in Figure 3. To generate test sets where U percent of the system was unmeasured, U percent of the goals were discarded to produce 10 variants of the data with U at 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90 percent unmeasured.
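The data generation described in this section can be sketched as below. The function names, the dictionary representation of a run, and the direction of comparison (second run relative to first) are assumptions made for illustration; the synthetic runs are not fisheries data.

```python
from itertools import combinations
import random


def compare_runs(runs):
    """Compare every pair of runs. Each run maps a variable name to a
    number; each comparison maps a variable name to up, down or steady,
    describing the second run relative to the first (an assumption)."""
    comparisons = []
    for a, b in combinations(runs, 2):
        comparisons.append({
            v: "up" if b[v] > a[v] else "down" if b[v] < a[v] else "steady"
            for v in a
        })
    return comparisons


def discard_goals(goals, u_percent, rng=random):
    """Keep a random (100 - u_percent)% of the goals, discarding the rest."""
    keep = round(len(goals) * (100 - u_percent) / 100)
    return rng.sample(list(goals), keep)


# Fifteen synthetic runs over two hypothetical variables.
runs = [{"x": i, "y": -i} for i in range(15)]
pairs = compare_runs(runs)
print(len(pairs))
print(len(discard_goals(range(10), 30, random.Random(0))))
```

Fifteen runs give 15 * 14 / 2 = 105 pairwise comparisons, matching the count quoted in the text.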
2.3.4 D: Simulating the Theory
A simulator for the fisheries model must be contradiction-tolerant. To see why, note that Figure 3 is a qualitative theory [Iwasaki 1989]; i.e. it is under-specified and hence can generate inconsistencies. For example, Figure 3 says that increasing catchPotential can increase fishCatch. It also says that decreasing fishDensity can decrease fishCatch. It is undetermined what happens to fishCatch in the case of increasing catchPotential AND decreasing fishDensity. Since Figure 3 does not specify the relative sizes of these influences on fishCatch, all we can say is that fishCatch can rise, fall, or remain steady (in the case where the two competing influences cancel each other out). Hence, if a simulator arrives at fishCatch from increasing catchPotential and decreasing fishDensity, then it must fork the reasoning one way for each possibility; i.e. once for fishCatch=up, once for fishCatch=down, and once for fishCatch=steady.
In this experiment, our contradiction-tolerant simulator will be the HT4 abductive inference engine [Menzies & Waugh 1998, Menzies & Michael 1999, Menzies 1995, Menzies 1996a, Menzies 1996b, Menzies & Compton 1997]. Abduction is informally defined as inference to the best explanation (e.g. [O'Rourke 1990]). More formally, abduction is the search for assumptions which, when combined with some theory, explain some set of goals [Eshghi 1993]. Mutually exclusive assumptions imply that many alternative abductive explanations may be generated. Such mutually exclusive assumptions must be managed in separate worlds of beliefs. Returning to the above example, in the case of increasing catchPotential and decreasing fishDensity, we generate three worlds $W_1, W_2, W_3$. $W_1$ includes fishCatch=up, $W_2$ includes fishCatch=down, and $W_3$ includes fishCatch=steady.
In essence, HT4 is like a search engine exploring some network of ideas looking for useful and consistent portions (each such portion is a world). The generality of our conclusions depends on this style of search being applicable to a range of software engineering problems; i.e.
Supposition 6 We can model software construction as the exploration of networks like Figure 3.
Supposition 7 We can model network exploration as abduction.
One trap with using HT4 is that our conclusions may only hold when processing indeterminate theories. That is:
Supposition 8 Our conclusions from the study of indeterminate theories apply to determinate theories.
2.3.5 Details
Summarizing the above discussion, our experimental rig contains a theory $T$ generated via some time linking policy (e.g. XNODE). The theory is a directed graph containing vertices and edges $\langle V, E \rangle$. Each theory vertex $V_x$ is a pair of a variable and a time copy index $C_i$; e.g. $fishCatch@C_i$ is the value of fishCatch in time copy $C = i$. The edges $E$ are the connectors between variables and are one of a set of pre-defined types; e.g. ++ or --. That is:
$V_x = variable@timeCopyIndex$
$E_x = V_i \xrightarrow{++} V_j \;\text{or}\; V_i \xrightarrow{--} V_j$
$T = \langle V, E \rangle$
The goals we are trying to explain are the outputs. In the general case, only some of the outputs are reachable. The reachable outputs are denoted $outputs'$. Recalling the above discussion on XNODE, all the outputs come from time copy $C > 0$. That is:
$outputs' \subseteq outputs \subseteq V@C_{i>0}$
The assumptions $A$ are those variables used to connect used inputs to used outputs. The used inputs are denoted $inputs'$. Recalling the above discussion on XNODE, all the inputs come from time copy $C = 0$. That is:
$inputs' \subseteq inputs \subseteq V@C_0$
$A \subseteq V - inputs' - outputs'$
$U$ is the percentage of the theory variables that do not appear in the inputs and outputs. That is:
$U = \left(1 - \frac{|inputs \cup outputs|}{|V|}\right) \times 100$
An abductive explanation is a world of mutually compatible beliefs $W_x$. A world connects some inputs to some outputs without causing contradictions. In HT4, worlds are extracted from the edges in our theories. That is:
$W_x \subseteq E$ (2)
$W_x \cup inputs' \cup A \vdash outputs'$ (3)
$W_x \cup inputs' \cup A \nvdash \bot$ (4)
HT4 seeks the abductive explanation $W_x$ with maximal $cover$; i.e. the explanation that maximizes the size of $outputs'$. That is:
$cover(T_x) = max(|W_x \cap outputs'|)$ (5)
With these definitions in hand, we can now specify our experimental rig. 20 times we flipped the edge type of between none and all of the edges in Figure 3 to create a corrupted theory $T'$. For each such corrupted theory $T'$, we:
• Copied $T'$ 6 times (once each for $C = 0, 1, 2, 3, 4, 5$).
• Connected the copies using the XNODE linking policy to form $T''$.
• For all 105 comparisons of the data, we:
  - For $U \in \{0, 10, 20, \ldots, 90\}$, we:
    * Built the outputs and inputs using $(100 - U)$% of the comparison data.
    * Called HT4 to find maximum cover using the inputs, the outputs and $T''$.
There are 17 edges in Figure 3. Hence, this procedure calls HT4 378,000 times (20 repeats * 0..17 edges flipped * 105 comparisons * 10 $U$ values).
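The size of this experiment can be checked by enumerating the nested loops of the rig. The sketch below only counts iterations; it does not call HT4, and the constant names are hypothetical.

```python
from itertools import product

REPEATS = range(20)             # 20 repeats
EDGES_FLIPPED = range(0, 18)    # 0..17 edges flipped: 18 settings
COMPARISONS = range(105)        # all pairs drawn from the 15 runs
U_VALUES = range(0, 100, 10)    # U = 0, 10, ..., 90


def count_calls():
    """Count one simulator call per combination of loop settings."""
    return sum(1 for _ in product(REPEATS, EDGES_FLIPPED,
                                  COMPARISONS, U_VALUES))


print(count_calls())
```

The product 20 * 18 * 105 * 10 gives the 378,000 calls quoted in the text.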
2.3.6 Results
The behaviour of this experimental rig is shown in Figure 4. To read the results, note that the x-axis shows a progression from a correct fisheries theory (at 0 edges flipped) to a very incorrect fisheries theory (at 17 edges flipped). For the top plot, the y-axis shows the maximum cover found by HT4. As we would expect, as the number of errors in a theory increases, the coverage of desired goals decreases. Recalling our above discussion, reuse is unlikely in the case of slow decay of the cover curve. Our reading of Figure 4 is that after $U = 70\%$, the decay curve becomes very flat; i.e. it becomes difficult to distinguish a theory with no errors (left hand side of the top plot in Figure 4) from a theory with multiple errors (right hand side of the top plot in Figure 4).
The runtime results (Figure 4, bottom) show two interesting trends. Firstly, in nearly all cases, as the theories grow more corrupted, it becomes faster to determine what goals are explicable (exception: for the very unmeasured theories, the runtimes for good theories are the same as the runtimes of bad theories). This is a nice result: nonsense theories can be rejected faster than sensible theories.
Secondly, the above experiment tried to find an explanation for every member of the outputs. That is, as the percentage of unmeasured variables U was increased, the size of the output goal set decreased. A pre-experimental intuition was that as we tried to explain more and more goals, the runtimes would increase. For theories that were mostly correct (corruptions less than 3 out of 17), this intuition proved correct. However, for theories that are moderately to wildly incorrect (more than 3 of the 17 edges corrupted), the runtimes decreased as the number of goals was increased. We explain this surprising result as follows. For theories that are mostly correct, many successful proofs of goals can be generated. The overheads of building these proofs can slow down the inference procedure. On the other hand, as we supply more and more data for moderately to wildly incorrect theories, we constrain the search space more and more. For example, in fisheries, an unmeasured variable could take the value up, down, or steady. Once that variable is measured, two-thirds of the possible search space disappears. Hence, for these not-very-correct theories, runtimes decrease as we increase the number of goals.
[Plots: percentage explicable (top, y-axis 0 to 100) and runtime in seconds (bottom, y-axis 0 to 0.8) against number of corrupted edges (x-axis, 0 to 17; max=17), under XNODE linking, with one curve per U% unmeasured for U = 0, 10, 20, ..., 90.]
Figure 4: Experiment #1. XNODE linking: Corrupting 0 to 17 edges, running validation with less and less data. Percentage explicable (cover) on top; runtimes on bottom.
2.3.7 Experiment #1: Conclusions
At least for the experimental rig shown above, the likelihood of reusable components is a function of the amount of data available to test the components. As less and less of the theory is measured in the inputs/outputs set, the harder it becomes to recognize a clearly superior theory. The threshold appears to be around 70% unmeasured theory variables.
If this is a general result, then we must doubt the utility of building reuse libraries. For nearly a decade we have been researching software testing. In our experience, it is very rare that software test suites refer to more than 30% of the variables in a component. If we test our components with insufficient data, then we will not detect clearly superior designs. In this case, designers working in different locations may arrive at different final designs for their software components. That is, we expect that:
• Supposedly reusable software components contain design decisions that are only acceptable to the community that made them.
• A broader audience might find those decisions confusing or unacceptable.
• Hence, software components would not be widely reused.
There are at least two cases in which we could demonstrate that this conclusion is not general:
1. We could show that our suppositions are overly restrictive.
2. We could show that reuse is a common and successful technique.
We will discuss the suppositions in the next section. As to the second point, it has yet to be conclusively proved that reuse is a common and successful technique. Menzies [Menzies 1998] criticizes overly-enthusiastic reports of reuse that lack some measure of quality of the constructed system. Without such quality assessments, it is hard to assess the overall impact of reuse (see footnote 1). Proponents of reuse rarely track the on-going costs of maintaining with those components [Menzies, Cukic, Singh & Powell 2000] (exception: [Lim 1994]). Most reuse reports do not clearly distinguish between verbatim reuse and reuse with some tinkering. Recalling Figure 1, such tinkering to customize a reusable component can significantly increase the cost and decrease the benefits of using reusable components. Further, even enthusiastic proponents of reuse testify to the lack of widespread reuse. For example:
I do not think we have yet succeeded in software reuse. In the State of the Practice, I do not see a lot being applied. Yes, OO class frameworks, Java, Visual Basic, etc. have facilitated the code reuse, but less is done, in general, at the higher level such as in the design and requirements phases of a project. In the State of the Art, I have not perceived any incremental progress although many issues have been addressed [Guerrieri 1999].
Reuse continues to be a problem whose potential remains elusive. Each new solution remains full of promise but riddled with what look like insurmountable problems. [Perry 1999]
Even reuse enthusiasts such as Frakes [Frakes 1999] caution that there exist significant practical problems with the widespread proliferation of reuse libraries; e.g. problems with searching the reuse library.
In summary, our reading of the literature is that we can't reject the generality of the above conclusions based on successful reports of reuse.
1 One of the few reuse reports that includes quality measures is [Lim 1994]. However, that report refers to intra-institutional reuse, not widespread inter-institutional reuse.
3 External Validity
We believe that the above results are general to a wide class of software. The rest of this article attempts to justify that belief. To make this case, we explore the suppositions described above. In the process of making that case, we will propose and perform other experiments that comment positively on the practicality of processing indeterminate theories (including early life cycle requirements theories).
3.1 Choice of Oracle
Supposition 1 claimed that theories should be assessed via how well they can reproduce a set of desired goals (recall Equation 1). Clearly, there are many other ways to assess a theory; e.g. parsimony, runtime speed of its inferences, etc. However, we regard these other measures as secondary. Theories only become interesting when they can perform their primary tasks. Only after demonstrating theory competency should we move on to assess (e.g.) its speed or economy of expression. Partial support for our view on theory assessment can be found elsewhere in the literature. For example, the model checking community assesses theories via their coverage of temporal logic constraints; e.g. [Holzmann 1997, Clarke, Emerson & Sistla 1986].
3.2 Bigger Theories
Supposition 2 claimed that a conclusion drawn from a theory with 12 variables can be extrapolated to larger theories. Recall that our key observation was that our test oracle failed to distinguish good theories from bad theories after $U = 70\%$. What would happen to this observation if the theory gets larger?
In answering this question, we first note that testing gets harder as the system gets larger. Some support for this claim can be found in NASA's experience with model checkers. The informal rule-of-thumb at NASA is that model checking techniques such as SPIN [Schneider, Easterbrook, Callahan, Holzmann, Reinholtz, Ko & Shahabuddin 1998, Holzmann 1997] can't process large systems. The invariants from merely 30 classes in an OO system can require one gigabyte of memory to model check (and one gigabyte of RAM is our current practical limit of what we can expect on a workstation).
Note that if we can't tell good from bad even for very small theories then, as the theory gets larger than Figure 3, it must become even harder to distinguish good larger theories from bad larger theories. Hence, we believe that Supposition 2 holds since our pessimistic conclusions concerning reuse must get even more pessimistic as system size grows.
[Plot: percentage explicable (y-axis, 25 to 100) against average fanout (x-axis, 1 to 10), with one curve per theory size V = 449, 480, 487, 494, 511, 535.]
Figure 5: Percentage explainable. From [Menzies 1996b].
3.3 More Intricate Theories
Supposition 3 was a variant of Supposition 2. In this supposition, we claimed that conclusions from a theory with fixed fanout (1.4) apply to theories with different fanouts. In earlier work, Menzies studied how different fanouts affect the conclusions of HT4 [Menzies 1996b]. In that experiment, edges were added at random to variants of a real-world theory of human glucose regulation taken from [Smythe 1989]. This theory was written in the same format as Figure 3 and is described in detail in [Menzies & Compton 1997]. That theory had 1246 edges and 554 variables (fanout=1246/554=2.25). Six "clones" of that theory were generated using random variables whose distributions were taken from the glucose theory (e.g. the mean number of parents in the clones was the same as the mean number of parents in the glucose theory). Each clone was approximately the same size as the glucose theory and contained 449, 480, 487, 494, 511, or 535 nodes. These were executed using HT4 with randomly chosen inputs and goals. Edges were then added randomly and the executions repeated. Figure 5 shows the results. At low fanouts, many behaviours were inexplicable. However, after a fanout of 4.4, most behaviours were explicable. Further, after a fanout of 6.8, nearly all the behaviours were explicable.
In summary, Menzies found that as fanout increases, the probability of explaining any randomly selected output also increases. As this probability increases, the cover decay curve would flatten out since our ability to distinguish a theory with $N$ errors from a theory with $N + 1$ errors would decrease. Consequently, we argue that Supposition 3 holds: our pessimism about detecting a superior theory would increase if the fanout increases from the value of 1.4 seen in Figure 3.
3.4 Software = Network of Connections
Supposition 4 claimed that a wide range of software can be characterized as a network of connections between variables. This is hardly a controversial position. Many researchers model software as a network of connecting influences:
• Software coverage tools often view software as a network. Good test suites try to maximize the coverage of network components (e.g. [Bieman & Schultz 1992, Harrold, Jones & Rothermel 1998, Weyuker 1993]).
• Any optimizing compiler builds a network (control/data flow graph) from the code. Optimization is then a matter of reorganizing the network to speed up the program.
• Model checkers access such a network when they build a space of program variable states.
• Expert systems and logic programs can be reduced to such structures (using a technique called partial evaluation [Sahlin 1991]).
3.5 Beyond XNODE
Our experiment assumed Supposition 5, i.e. for all explicit time variables $X$, we connect $X$ at time copy index $C_i$ to $X$ at time copy index $C_{i+1}$. What happens to our conclusions if we vary the time linking policy? This question is explored in experiment #2.
3.5.1 Experiment #2: Other Temporal Linking Policies
Experiment #2 repeats the experimental rig of Experiment #1, but with a different temporal linking policy. In the XEDGE (explicit edge) linking policy, time changes on the out-edges from the explicit time variables. That is, for each variable $Y$ downstream from an explicit time variable $X$, XEDGE adds a link $X@C_i \rightarrow Y@C_{i+1}$. For example, recall from Figure 3 that fishGrowthRate is connected to fishPopulationChange as follows:
$fishPopulationChange \xrightarrow{--} fishGrowthRate$
XEDGE would add the edge:
$fishPopulationChange@C_i \xrightarrow{--} fishGrowthRate@C_{i+1}$
Figure 6 shows the behaviour seen in Experiment #2. The curves have numerous similarities to the XNODE behaviour seen in Figure 4:
• As before, nonsense theories can be rejected faster than sensible theories since the runtimes decrease as we add errors.
• As before, there exists a lower threshold in test suite size, below which we cannot distinguish good theories from bad theories. That is, once again, we see that as we offer less and less data in the test suite, the cover decay curves flatten out.
[Figure 6 plots: XEDGE (U% unmeasured), one curve per U = 0, 10, ..., 90. Top panel: % explicable (0 to 100) against number of corrupted edges (0 to 17; max=17). Bottom panel: runtime (s) (0 to 0.18) against number of corrupted edges (0 to 17; max=17).]
Figure 6: Experiment #2. XEDGE linking: Corrupting 0 to 17 edges, running validation with less and less data. Percentage explicable on top; runtimes below.
Elsewhere we have explored six other temporal linking policies [Waugh, Menzies & Goss 1997]. The above two features are noted in all those experiments. That is, Supposition 5 holds for at least seven other temporal linking policies than XNODE.
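The two linking policies discussed so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HT4 rig: edge annotations are omitted, the function names are hypothetical, and the sketch only produces the cross-time links (whether XEDGE also keeps the original within-copy edge is left open here).

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]                   # (source variable, target variable)
TimedEdge = Tuple[str, int, str, int]    # (source, source copy, target, target copy)


def copy_theory(edges: List[Edge], copies: int) -> List[TimedEdge]:
    """Repeat every edge of the theory inside each time copy."""
    return [(a, t, b, t) for t in range(copies) for (a, b) in edges]


def xnode_links(explicit: Set[str], copies: int) -> List[TimedEdge]:
    """XNODE: link each explicit time variable to itself in the next copy."""
    return [(x, t, x, t + 1)
            for t in range(copies - 1)
            for x in sorted(explicit)]


def xedge_links(edges: List[Edge], explicit: Set[str], copies: int) -> List[TimedEdge]:
    """XEDGE: out-edges of explicit time variables cross into the next copy."""
    return [(a, t, b, t + 1)
            for t in range(copies - 1)
            for (a, b) in edges
            if a in explicit]
```

For instance, with the single edge from fishPopulationChange to fishGrowthRate, fishPopulationChange as the only explicit time variable, and three copies, `xnode_links` yields two self-links across time while `xedge_links` yields two links into fishGrowthRate at the next copy.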
3.6 Software Construction = Network Exploration
Supposition 6 argues that when we search Figure 3, we are doing something that is important to a large class of software systems. We take this view from Herbert Simon. Simon describes design as the "quintessential human activity" [Simon 1969]. Everything we do as agents in the world requires design solutions to the problems that face us. Hence, to Simon, studying the design problem was a method of studying the general problem of artificial intelligence (AI).
Simon's research characterized design and artificial intelligence as a search problem. In this framework, design is viewed as the traversal of a space of possibilities, looking for pathways to goals. This space can be explicit or implicit:
Explicit: e.g. taken from a library of software components.
Implicit: e.g. extracted from the designer's mind as they imagine different possible, but as yet unbuilt, software systems.
Modern expressions of this approach of design-as-search include most of the AI search literature [Pearl & Korf 1987] and the SOAR knowledge-level (KL) search [Newell 1982, Newell 1993]. In a KL search, intelligence is modeled as a search for appropriate operators that convert some current state to a goal state. Domain-specific knowledge is used to select the operators according to the principle of rationality; i.e. an intelligent agent will select an operator which its knowledge tells it will lead to the achievement of some of its goals.
This view of design-as-search through a network is not just an AI concept. OO designers perform such a search as they search through their class libraries looking for reusable code. More generally, any time a software designer pauses to consider this option vs that option, they are mentally exploring a network of options.
3.7 Network Exploration = Abduction
Supposition 7 noted that we used abduction as our network exploration tool. We could reject our conclusions if it could be shown that abductive inference is relevant only for a very small class of software tasks.
Our view of abduction can be summarized as follows: seek islands of consistency within a space of possibly contradictory ideas. More precisely, make what inferences you can that are relevant to some goal (Equation 3), without causing any contradictions (Equation 4). When more than one solution exists to Equation 3 and Equation 4, then use some BEST operator to select a preferred set of solutions. Clearly, this procedure is more general than just a description of "inference to the best explanation". Many tasks have an abductive mapping, as shown in Figure 7. Hence, with one caveat, we argue that the use of an abductive inference engine increases the generality of our conclusions. The one caveat is as follows. Abduction is an indeterminate inference procedure. Standard software strives to be deterministic. The issue of indeterminacy is discussed below.
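The three ingredients of this view (relevance to a goal, absence of contradictions, and a BEST operator) can be illustrated with a minimal Python sketch. It is not HT4: worlds are assumed to be given as sets of (variable, value) literals, the relevance and consistency tests are simple stand-ins for Equations 3 and 4, and the BEST shown is the validation-style one that prefers the largest overlap with the goals. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Literal = Tuple[str, str]        # (variable, value), e.g. ("a", "up")
World = FrozenSet[Literal]


def consistent(world: World) -> bool:
    """Stand-in for the no-contradiction test: one value per variable."""
    seen: Dict[str, str] = {}
    for var, val in world:
        if seen.setdefault(var, val) != val:
            return False
    return True


def cover(world: World, goals: World) -> float:
    """Fraction of the goal literals that the world explains."""
    return len(world & goals) / len(goals)


def best_worlds(candidates: Iterable[World], goals: World) -> List[World]:
    """Keep consistent, goal-relevant worlds, then prefer maximum cover."""
    usable = [w for w in candidates if consistent(w) and w & goals]
    if not usable:
        return []
    top = max(cover(w, goals) for w in usable)
    return [w for w in usable if cover(w, goals) == top]
```

Swapping in a different preference inside `best_worlds` (fewest inputs, lowest cost, and so on) would give other members of the family of tasks that share this abductive mapping.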
3.8 Implications of Indeterminacy
The above discussion argued that Suppositions 1 through 7 did not overly restrict the generality of our conclusions. Only Supposition 8 remains: that conclusions drawn from HT4's indeterminate inference are generally applicable. Reviewing the tasks implemented by abduction (see Figure 7), it is clear that most of those tasks come from the knowledge engineering community. Knowledge-based systems often use heuristics to simplify complex tasks. Heuristics can introduce indeterminacy into a theory. Heuristics are just guesses and different guesses can drive a program to very different conclusions.
Supposition 8 threatens the generality of our conclusions. HT4 is a multiple-world reasoner and conventional software is deterministic. That is, software constructed from standard software engineering techniques only ever generates a single assignment to variables. Conclusions drawn from HT4 will not apply to standard software systems if its multiple-world reasoning confuses the task of testing software.
To explore this issue, we need more data. The average number of worlds generated by HT4 in Experiment #1 is shown in Figure 8. Note the low total average number of worlds (max=1.28). While HT4 can support multiple worlds, these experiments generated very few. That is, the conclusions made above regarding reuse were not made in a wildly indeterminate device. This absence of wild indeterminacy persists, even if we make major changes to the HT4 rig. For example, fisheries is a sparsely connected theory: 12 variables connected by 17 edges (fanout=17/12=1.4). Perhaps theories with higher fanouts generate more explanations? This hypothesis was tested in Experiment #3. That experiment added in 0, 5, 10, 15, 20, 25 or 30 new edges at random, checking all the time that the added edges did not exist already in the theory. This generated theories with fanouts ranging from 1.4 to 3.9 ((17+30)/12=3.9). The theory with the new edges was then copied over 5 time steps and connected via XNODE and executed in the same manner as Experiment #1. The results are shown in Figure 9. Note that the number of worlds generated is not much more than before (1.58 maximum). Hence, we could not blame the low number of worlds seen in HT4 output on restrictive fanouts.
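The edge-adding step of Experiment #3 and the fanout arithmetic can be sketched as follows. This is a hypothetical illustration: the variable names and the 17 placeholder edges are invented (the real fisheries edges are those of Figure 3), edge annotations are ignored, and excluding self-loops is an assumption of this sketch.

```python
import random
from typing import List, Tuple

Edge = Tuple[str, str]


def fanout(n_variables: int, edges: List[Edge]) -> float:
    """Average number of edges per variable."""
    return len(edges) / n_variables


def add_random_edges(variables: List[str], edges: List[Edge],
                     n_new: int, seed: int = 0) -> List[Edge]:
    """Add n_new random edges that do not already exist in the theory."""
    rng = random.Random(seed)
    existing = set(edges)
    # Assumption: self-loops are not considered as candidate edges.
    candidates = [(a, b) for a in variables for b in variables
                  if a != b and (a, b) not in existing]
    return list(edges) + rng.sample(candidates, n_new)


# Placeholder theory with the same size as fisheries: 12 variables, 17 edges.
variables = [f"v{i}" for i in range(12)]
base = ([(f"v{i}", f"v{(i + 1) % 12}") for i in range(12)]
        + [(f"v{i}", f"v{(i + 2) % 12}") for i in range(5)])
denser = add_random_edges(variables, base, 30)
print(round(fanout(12, base), 1), round(fanout(12, denser), 1))
```

The two printed fanouts correspond to the 17/12 and (17+30)/12 endpoints quoted in the text.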
• Abductive-validation asks the question "What is the maximum percentage of known/desired behaviour that can be explained by some theory?". This can be implemented as Equation 5; i.e. use a BEST that favors explanations with the largest intersection to the output goal set [Menzies 1996b, Menzies 1995, Menzies & Compton 1997].
• Minimal-fault diagnosis means using a BEST that favors explanations with the fewest possible inputs and the most known goals [Reggia, Nau & Wang 1983] (this is only one of a range of abductive approaches to diagnosis: other approaches are described in [Console & Torasso 1991]).
• Abductive-classification uses a BEST that favors explanations with the largest number of most-specific concepts [Poole 1985].
• Abductive-explanation uses a BEST that favors explanations with the largest number of concepts that are familiar to the user [Leake 1993].
• Abductive-planning uses a BEST that favors explanations with the most goals and the least total cost.
• Abductive-monitoring means caching the explanations generated by abductive-planning, then deleting any explanations whose assumptions contradict any new data.
• Abductive-prediction uses a BEST that favors explanations with the largest number of future events that interest you.
• Abductive requirements engineering is discussed in [Zowghi, Ghose & Peppas 1996, Menzies, Easterbrook, Nuseibeh & Waugh 1999]. Suppose we know who wrote each portion of a theory. Conflicts between users can be detected when different people's ideas end up in different explanations. These conflicts can then be negotiated by focusing on the key conflicting guesses that split the explanations. Disputes between feuding users can be adjudicated as follows: the winning user offers theories which build explanations that can explain most of the desired behaviour.
• Abduction is also applicable to numerous other areas including legal reasoning [Kowalski & Toni 1996]; visual pattern recognition [Poole 1990]; financial reasoning [Hamscher 1990]; diagrammatic reasoning [Menzies & Compton 1994]; machine learning [Hirata 1994]; case-based reasoning [Leake 1993]; natural-language processing [Ng & Mooney 1990]; and analogical reasoning [Falkenhainer 1990]. For more on the applications of abduction, see [Menzies 1996a, Kakas, Kowalski & Toni 1998].
Figure 7: Tasks with an abductive mapping. Adapted from [Menzies 1996a].
[Figure 8 plot: worlds (when edges corrupted) (0.95 to 1.3) against percentage unmeasured (0 to 100), one curve per 0, 3, 6, 9, 12, 15 corrupted edges.]
Figure 8: Experiment #1: Average worlds seen as we corrupt 0,3,...,15 edges from Figure 3 and change the percentage of theory variables unmeasured in the test set (changing U%).
[Figure 9 plot: worlds (when edges added) (0.9 to 1.6) against percentage unmeasured (0 to 100), one curve per 0, 5, 10, 15, 20, 25, 30 added edges.]
Figure 9: Experiment #3. Worlds generated with XNODE after adding 0,5,10,15,20,25,30 edges.
Nor can we blame the missing worlds on the temporal linking policies used in these experiments. Recall that of the 12 variables in the fisheries theory, only 2 of them made cross-time connections. Perhaps more worlds would be observed if more connections were made across time. Experiment #4 tested this hypothesis. A new temporal linking policy was defined. In the INODE (implicit node) linking policy, any variable $X$ at time copy $i$ can affect that variable at time copy $i+1$; i.e. for every variable we add the rule:
$$X_{t=i} \xrightarrow{++} X_{t=i+1}$$
Experiment #4 repeats Experiments #1 and #3 using INODE instead of XNODE. The results are shown in Figure 10. The extra time links did lead to more explanations, but not many more (maximum=5).
[Figure 10 plots. Top panel: worlds (when edges corrupted) (1 to 4) against percentage unmeasured (0 to 100), one curve per 0, 3, 6, 9, 12, 15 edges. Bottom panel: worlds (when edges added) (1 to 5) against percentage unmeasured (0 to 100), one curve per 0, 5, 10, 15, 20, 25, 30 edges.]
Figure 10: Experiment #4. Effects of using INODE. Top: flipping the annotation on 0,3,6,9,12,15 edges. Bottom: adding 0,5,10,15,20,25,30 edges.
Since HT4 and HT4-like systems do not generate wild indeterminacy, we can endorse Supposition 8. Still, we must explain these repeated failures to generate wild indeterminacy. We have two hypotheses:
• [Menzies 1996a] noted that HT4 uses set-covering abduction; i.e. it builds worlds only from variables that connect inputs to goals. An alternative approach is consistency-based abduction used in (e.g.) the ATMS [DeKleer 1986] which builds worlds from all variables in the theory (for more on these two types of abduction, see [Console & Torasso 1991]). Set-covering can use fewer variables than consistency-based approaches. Hence, set-covering systems like HT4 may generate fewer worlds than expected.
• Suppose our software contains a small number of master variables that control the larger number of slave variables in the rest of the system. In such master-variable systems, worlds need only be generated for conflicts in the master variables; i.e. conflicts in the larger number of slave variables do not matter. Note that if there are very few master variables, then we need only generate very few worlds. Further, we can find those worlds very easily. Any randomly generated pathway from goals back to inputs has a high chance of using the master variables. After generating a small number of such random paths, an inference procedure would find the key assumptions within the master variables. Elsewhere, with Cukic and Singh, Menzies has offered evidence that such master-variable systems are common. This evidence comes from theory [Menzies et al. 2000], experimentation [Menzies et al. 1999], and an extensive literature review [Menzies & Cukic 2000].
The absence of wild indeterminacy in HT4-like systems has significant implications for the construction of environments that support debates between (e.g.) multiple stakeholders during requirements engineering. Requirements engineering researchers such as Easterbrook [Easterbrook 1991], Finkelstein [Finkelstein, Gabbay, Hunter, Kramer & Nuseibeh 1994], and Nuseibeh [Nuseibeh 1997] argue that we should routinely expect specifications to reflect different and inconsistent viewpoints. A key process within deliberation and negotiation is supporting arguments. Argumentation support environments can be swamped by a surplus of choices. 20 yes-no choices means $2^{20} > 1{,}000{,}000$ design choices. Elaborate mechanisms have been built for implementing such arguments; e.g. [Menzies 1997b, Mylopoulos, Chung & Nixon 1992, Nuseibeh & Russo 1999, Rich & Feldman 1992, Boehm, Egyed, Port, Shah, Kwan & Madachy 1998]. The results of our experiments suggest that these mechanisms may be overly elaborate; e.g. when we search across that space of $2^{20}$ options from goals to inputs, we find that our search paths need only a few worlds. HT0 is a very simple variant of HT4 that exploits this "a few worlds are enough" property [Menzies & Michael 1999]. HT0 assumes master-variable systems. The algorithm generates a small number of randomly selected worlds in sequence; i.e. $W_0$, then $W_1$, then $W_2$ and so on. HT0 terminates when the cover of $W_i$ is not much greater than the maximum cover found by worlds $W_0, W_1, \ldots, W_{i-1}$. Experiments with HT0 show that:
• Maximum cover plateaus very quickly. That is, HT0's partial world search can terminate much faster than HT4's rigorous exploration of all worlds. Hence, HT0 is much faster than HT4. HT4 explores all worlds. In experiments with one implementation of that algorithm, HT4 was observed to run in exponential time (HT4 crashed when running theories bigger than 1,000 Horn clauses). HT0, running on the same platform, executed in polynomial time and has terminated on very large theories (up to 18,000 Horn clauses).
• For problems where HT4 terminates (i.e. theories of up to 1,000 Horn clauses), HT0 finds 98% of the maximum cover found by HT4. That is, HT0's partial world search covered nearly as much of the theory as HT4's rigorous exploration of all worlds.
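The stopping rule described above can be sketched in Python. This is a hedged illustration rather than the published HT0: `sample_cover` is a hypothetical callback that builds one random world and returns its cover, and the `epsilon` threshold standing in for "not much greater" and the `max_worlds` safety cap are assumptions of this sketch.

```python
import random
from typing import Callable, Tuple


def ht0_style_search(sample_cover: Callable[[random.Random], float],
                     epsilon: float = 0.01,
                     max_worlds: int = 1000,
                     seed: int = 0) -> Tuple[float, int]:
    """Generate random worlds until cover stops improving noticeably.

    Returns (maximum cover seen, number of worlds generated).
    """
    rng = random.Random(seed)
    best = sample_cover(rng)          # cover of W_0
    generated = 1
    while generated < max_worlds:
        cover = sample_cover(rng)     # cover of W_i
        generated += 1
        if cover <= best + epsilon:
            # W_i is not much better than W_0 .. W_{i-1}: stop here.
            return max(best, cover), generated
        best = cover
    return best, generated
```

If successive worlds had covers 0.2, 0.5, 0.9 and 0.905, this sketch would stop at the fourth world, since 0.905 is within `epsilon` of the previous maximum. Such a search never enumerates all worlds, which is the design choice that separates it from an exhaustive exploration.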
The above experiments, plus the HT0 experience, suggest that exploring conflicts within requirements is easier than we might have thought. This possibility is explored extensively elsewhere [Menzies et al. 1999].
4 Conclusion
Based on four experiments that performed 100,000s of manipulations on a theory, we made two counter-intuitive findings:
• Reuse is harder than we had thought. Our supposedly reusable components will not converge to similar designs unless we can explore those components with test suites that are larger than those seen in current practice.
• Exploring the space of contradictory ideas seen in (e.g.) requirements engineering is easier than we had thought. A small number of random probes across the space of possible conflicts will find most of the worlds.
We conjecture that these results are counter-intuitive since our intuitions were based on experience with dozens, not 100,000s of systems. Our expectations of software development are based on our experience. Over the course of a single professional's career, an individual may see several dozen systems (at most). From those few dozen examples, that individual forms perceptions of the core issues and the general rules of software engineering. In this paper we have explored methods of finding some general rules of software engineering based not on dozens, but 100,000s of manipulations of a theory.
A side-effect of the above analysis is the evolution of a general method for exploring issues within software development:
1. Collect representative theories.
2. Define representative manipulations.
3. Execute many manipulations over the many theories.
4. Summarize the results.
5. Consider the implications of those summaries on standard practice.
We think that this method is a powerful technique. For example, our conclusions can be demonstrated incorrect if, after applying these five steps, other researchers generate very different summaries to those described above.
Acknowledgements
This paper has benefited enormously from the comments of several anonymous referees. Simon Goss provided invaluable support in developing this material. Aditya Ghose and Steve Easterbrook offered insightful comments into earlier drafts of this paper. During this research, the first author was supported by the AI Department, University of NSW, and then a NASA cooperative agreement (#NCC2-979).
References
Abts, C., Clark, B., Devnani-Chulani, S., Horowitz, E., Madachy, R., Reifer, D., Selby, R. & Steece, B. [1998], Cocomo II model definition manual, Technical report, Center for Software Engineering, USC. http://sunset.usc.edu/COCOMOII/cocomox.html#downloads.
Alexander, C. [1964], Notes on Synthesis of Form, Harvard University Press.
Alexander, C., Ishikawa, S., Silverstein, S., Jacobsen, I., Fiksdahl-King, I. & Angel, S. [1977], A Pattern Language, Oxford University Press.
Bieman, J. & Schultz, J. [1992], 'An empirical evaluation (and specification) of the all-du-paths testing criterion', Software Engineering Journal 7(1), 43–51.
Boehm, B., Egyed, A., Port, D., Shah, A., Kwan, J. & Madachy, R. [1998], 'A stakeholder win-win approach to software engineering education', Annals of Software Engineering 6, 295–321.
Bossel, H. [1994], Modeling and Simulations, A.K. Peters Ltd. ISBN 1-56881-033-4.
Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P. & Stal, M. [1996], A System of Patterns: Pattern-Oriented Software Architecture, John Wiley & Sons.
Chulani, S., Boehm, B. & Steece, B. [1999], 'Bayesian analysis of empirical software engineering cost models', IEEE Transactions on Software Engineering 25(4).
Clarke, E., Emerson, E. & Sistla, A. [1986], 'Automatic verification of finite-state concurrent systems using temporal logic specifications', ACM Transactions on Programming Languages and Systems 8(2), 244–263.
Console, L. & Torasso, P. [1991], 'A Spectrum of Definitions of Model-Based Diagnosis', Computational Intelligence 7, 133–141.
DeKleer, J. [1986], 'An Assumption-Based TMS', Artificial Intelligence 28, 163–196.
Easterbrook, S. [1991], Elicitation of Requirements from Multiple Perspectives, PhD thesis, Imperial College of Science Technology and Medicine, University of London. Available from http://research.ivv.nasa.gov/~steve/papers/index.html.
Eshghi, K. [1993], A Tractable Class of Abductive Problems, in 'IJCAI '93', Vol. 1, pp. 3–8.
Falkenhainer, B. [1990], Abduction as Similarity-Driven Explanation, in P. O'Rourke, ed., 'Working Notes of the 1990 Spring Symposium on Automated Abduction', pp. 135–139.
Finkelstein, A., Gabbay, D., Hunter, A., Kramer, J. & Nuseibeh, B. [1994], 'Inconsistency handling in multi-perspective specification', IEEE Transactions on Software Engineering 20(8), 569–578.
Frakes, W. [1999], Domain engineering education, in 'Proceedings of WISR9: The 9th annual workshop on Institutionalizing Software Reuse'. Available from http://www/umcs.maine.edu/~ftp/wisr/wisr9/final-papers/Frakes.html.
Gruber, T. [1993], 'A translation approach to portable ontology specifications', Knowledge Acquisition 5(2), 199–220.
Guerrieri, E. [1999], Reuse success - when and how?, in 'Proceedings of WISR9: The 9th annual workshop on Institutionalizing Software Reuse'. Available from http://www/umcs.maine.edu/~ftp/wisr/wisr9/final-papers/Guerrieri.html.
Hamscher, W. [1990], Explaining Unexpected Financial Results, in P. O'Rourke, ed., 'AAAI Spring Symposium on Automated Abduction', pp. 96–100.
Harrold, M., Jones, J. & Rothermel, G. [1998], 'Empirical studies of control dependence graph size for C programs', Empirical Software Engineering 3, 203–211.
Hirata, K. [1994], A Classification of Abduction: Abduction for Logic Programming, in 'Proceedings of the Fourteenth International Machine Learning Workshop, ML-14', p. 16. Also in Machine Intelligence 14 (to appear).
Holzmann, G. [1997], 'The model checker SPIN', IEEE Transactions on Software Engineering 23(5), 279–295.
Iwasaki, Y. [1989], Qualitative physics, in P. C. A. Barr & E. Feigenbaum, eds, 'The Handbook of Artificial Intelligence', Vol. 4, Addison Wesley, pp. 323–413.
Kakas, A., Kowalski, R. & Toni, F. [1998], The role of abduction in logic programming, in C. H. D.M. Gabbay & J. Robinson, eds, 'Handbook of Logic in Artificial Intelligence and Logic Programming 5', Oxford University Press, pp. 235–324.
Kowalski, R. & Toni, F. [1996], 'Abstract argumentation', Artificial Intelligence and Law Journal 4(3-4), 275–296.
Leake, D. [1993], Focusing Construction and Selection of Abductive Hypotheses, in 'IJCAI '93', pp. 24–29.
Lim, W. [1994], 'Effects of reuse on quality, productivity, and economics', IEEE Software pp. 23–30.
Menzies, T. [1995], Principles for Generalised Testing of Knowledge Bases, PhD thesis, University of New South Wales. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/95thesis.ps.gz.
Menzies, T. [1996a], 'Applications of abduction: Knowledge level modeling', International Journal of Human Computer Studies 45, 305–355. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/96abkl1.ps.gz.
Menzies, T. [1996b], On the practicality of abductive validation, in 'ECAI '96'. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/96abvalid.ps.gz.
Menzies, T. [1997a], 'OO patterns: Lessons from expert systems', Software Practice & Experience 27(12), 1457–1478. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/97probspatt.ps.gz.
Menzies, T. [1997b], Qualitative causal diagrams for requirements engineering, in 'The Second Australian Workshop on Requirements Engineering (AWRE'97)'.
Menzies, T. [1998], 'Towards situated knowledge acquisition', International Journal of Human-Computer Studies 49, 867–893. Available from http://www.cse.unsw.EDU.AU/~timm/pub/docs/98ijhcs.
Menzies, T. & Compton, P. [1994], A Precise Semantics for Vague Diagrams, World Scientific, pp. 149–156. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/ai94.ps.gz.
Menzies, T. & Compton, P. [1997], 'Applications of abduction: Hypothesis testing of neuroendocrinological qualitative compartmental models', Artificial Intelligence in Medicine 10, 145–175. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/96aim.ps.gz.
Menzies, T. & Cukic, B. [2000], 'Adequacy of limited testing for knowledge based systems', International Journal on Artificial Intelligence Tools (IJAIT). (to appear).
Menzies, T., Cukic, B., Singh, H. & Powell, J. [2000], 'Testing indeterminate systems'. ISSRE 2000 (to appear).
Menzies, T., Easterbrook, S., Nuseibeh, B. & Waugh, S. [1999], An empirical investigation of multiple viewpoint reasoning in requirements engineering, in 'RE '99'. Available from http://research.ivv.nasa.gov/docs/techreports/1999/NASA-IVV-99-009.pdf.
Menzies, T. & Michael, C. [1999], Fewer slices of pie: Optimising mutation testing via abduction, in 'SEKE '99, June 17-19, Kaiserslautern, Germany'. Available from http://research.ivv.nasa.gov/docs/techreports/1999/NASA-IVV-99-007.pdf.
Menzies, T. & Waugh, S. [1998], On the practicality of viewpoint-based requirements engineering, in 'Proceedings, Pacific Rim Conference on Artificial Intelligence, Singapore', Springer-Verlag.
Mylopoulos, J., Chung, L. & Nixon, B. [1992], 'Representing and using nonfunctional requirements: A process-oriented approach', IEEE Transactions on Software Engineering 18(6), 483–497.
Newell, A. [1982], 'The Knowledge Level', Artificial Intelligence 18, 87–127.
Newell, A. [1993], 'Reflections on the Knowledge Level', Artificial Intelligence 59, 31–38.
Ng, H. & Mooney, R. [1990], The Role of Coherence in Constructing and Evaluating Abductive Explanations, in 'Working Notes of the 1990 Spring Symposium on Automated Abduction', Vol. TR 90-32, pp. 13–17.
Nuseibeh, B. [1997], To be and not to be: On managing inconsistency in software development, in 'Proceedings of 8th International Workshop on Software Specification and Design (IWSSD-8)', IEEE CS Press, pp. 164–169.
Nuseibeh, B. & Russo, A. [1999], Using abduction to evolve inconsistent requirements specifications, in 'Proceedings of ICSE-99 Workshop on Software Change and Evolution (SCE'99), LA, California, USA'. Available from ftp://dse.doc.ic.ac.uk/dse-papers/viewpoints/nuseibeh.sce99.pdf.
O'Rourke, P. [1990], Working notes of the 1990 spring symposium on automated abduction, Technical Report 90-32, University of California, Irvine, CA. September 27, 1990.
Pearl, J. & Korf, R. [1987], 'Search techniques', Ann. Rev. Comput. Sci. 1987 2, 451–67.
Perry, D. [1999], Some holes in the emperor's reused clothes, in 'Proceedings of WISR9: The 9th annual workshop on Institutionalizing Software Reuse'. Available from http://www/umcs.maine.edu/~ftp/wisr/wisr9/final-papers/Perry.html.
Poole, D. [1985], On the Comparison of Theories: Preferring the Most Specific Explanation, in 'IJCAI '85', pp. 144–147.
Poole, D. [1990], 'A Methodology for Using a Default and Abductive Reasoning System', International Journal of Intelligent Systems 5, 521–548.
Reggia, J., Nau, D. & Wang, P. [1983], 'Diagnostic Expert Systems Based on a Set Covering Model', Int. J. of Man-Machine Studies 19(5), 437–460.
Rich, C. & Feldman, Y. [1992], 'Seven layers of knowledge representation and reasoning in support of software development', IEEE Transactions on Software Engineering 18(6), 451–469.
Sahlin, D. [1991], An Automatic Partial Evaluator for Full Prolog, PhD thesis, The Royal Institute of Technology (KTH), Stockholm, Sweden. Available from file://sics.se/pub/isl/papers/dan-sahlin-thesis.ps.gz.
Schneider, F., Easterbrook, S., Callahan, J., Holzmann, G., Reinholtz, W., Ko, A. & Shahabuddin, M. [1998], Validating requirements for fault tolerant systems using model checking, in '3rd IEEE International Conference On Requirements Engineering'.
Schreiber, A. T., Wielinga, B., Akkermans, J. M., Velde, W. V. D. & de Hoog, R. [1994], 'CommonKADS. A comprehensive methodology for KBS development', IEEE Expert 9(6), 28–37.
Simon, H. [1969], The Science of the Artificial, MIT Press.
Smythe, G. [1989], 'Brain-hypothalmus, Pituitary and the Endocrine Pancreas', The Endocrine Pancreas.
Waugh, S., Menzies, T. & Goss, S. [1997], Evaluating a qualitative reasoner, in A. Sattar, ed., 'Advanced Topics in Artificial Intelligence: 10th Australian Joint Conference on AI', Springer-Verlag.
Weyuker, E. [1993], 'More experience with data flow testing', IEEE Transactions on Software Engineering 19(9), 912–919.
Zowghi, D., Ghose, A. & Peppas, P. [1996], A framework for reasoning about requirements evolution, in 'Proceedings of the 4th Pacific Rim International Conference on Artificial Intelligence (PRICAI96), Cairns, Australia, August'.