Journal of Automated Software Engineering (submitted), WP:alp2/ase (May 2, 2000).

Some Empirical Automated Software Engineering

Tim Menzies, Sam Waugh

NASA/WVU Software Research Lab, 100 University Drive, Fairmont, USA; [email protected]
Defence Science and Technology Organisation, Air Operations Division, Melbourne, Australia; sam.waugh@dsto.defence.gov.au

May 2, 2000

Abstract

We experiment with core processes within software development tasks (reuse and conflict management in requirements engineering). After defining those processes, we automatically perform 100,000s of manipulations in accordance with that process. The results of the manipulations are summarized and discussed. In two case studies with this technique we find that building widely reusable software components will be harder than we had thought while resolving conflicts amongst competing stakeholders is easier than we had thought.

1 Introduction

Can software engineering be an exact science? Software engineering is a human-intensive process. Such a sociological-intensive exercise can forbid exact conclusions. Nevertheless, in some circumstances, exact conclusions are possible; e.g.

• Suppose human idiosyncrasies are the main controlling factor in software development. If so, then we might predict that exact software cost estimation requires complex equations about human dynamics.
• Yet in at least one case study, this was not true. Chulani, Boehm, and Steece used a simple model of software costs and merely two dozen input parameters to produce reasonably accurate estimates of software construction time (their estimates were within 25% of actual, 69% of the time) [Chulani, Boehm & Steece 1999].




This article seeks other conclusions that are not affected by the idiosyncrasies of human software engineers. We assume that when people write software, they are manipulating theories. As we hope to show, certain theory properties are stable across a wide range of theory manipulations. That is, fortunately, some of the properties of our theories may not be as idiosyncratic as the people who write them.

To make this case, we will perform manipulations on a general class of theories (and-or graphs with ground nodes). Since there are many possible manipulations on a theory, we will use an automatic rig to perform millions of manipulations such as injecting errors into theories; changing theory connections; altering the data sets given to a theory; and altering the processing of a theory, including totally random selection amongst alternatives. In the experiments reported here, we found that certain properties emerged as stable, despite widespread manipulations. In particular, some tasks were much harder than expected, while others were much easier:

Much harder: Software reuse.

Much easier: Handling inconsistencies in requirements.

The rest of this article is structured as follows. §2 offers an example of our type of experimentation. As we define our experiments, we will encounter certain restrictive suppositions that threaten the generality of our results. §3 debates those suppositions and concludes that the rig used in those experiments is general to a large class of software problems.

1.1 Caveats

Before proceeding, we pause for a methodological digression. The conclusions of this paper are based on experiments with manipulating theories. The experimental basis for our conclusions is this paper's greatest strength. Opponents of our views can quickly refute our conclusions by designing and executing experiments that generate contrary conclusions. We would encourage such experiments. This paper is an attempt to add more rigor to software engineering research. If this paper inspires further careful experimentation, then it is a success even if those further experiments refute our conclusions.

On the other hand, the experimental basis for our conclusions is this paper's greatest weakness. Our conclusions stand only for the sample of problems studied here. If a different class of problems resulted in a different behaviour, the conclusion would have to be restricted to the class of problems seen in our experiments. Further, all our conclusions fail if the restrictions we observe disappear due to some new method for theory manipulation. For example, optimizations are always being discovered for theoretically intractable tasks.

Lest we prematurely convince you that this paper is fatally flawed, we hasten to add that none of our conclusions will be based on runtimes. All our experiments will run to termination. That is, the effects we report are not some function of processor speed or data structures. Instead, our conclusions come from giving our system unlimited time to perform its manipulations. Further, when we designed our experiments, we were nervous of our sampling bias. Hence, we always generated many, many variants of each sample problem. Nevertheless, our generators cannot possibly generate problems across the entire space of problem types. At this time, we are unaware of a class of problems that behaves differently, and we leave this issue for future research.

2 A Study into Theory Reusability

This section describes an example of the kind of analysis that is possible via large-scale theory manipulation. We will describe an experimental rig and the behaviours seen in that rig. In summary, we will explore a suite of theories 378,000 times to conclude that the chance of building truly reusable components is a function of the amount of data used to test those components.

The intent of this section is to offer the reader a flavor of the kind of analysis we seek to encourage. Hence, for the moment, we will not discuss the external validity of these observations. In the sequel, however, we will argue that the observations seen in this rig are applicable to a wide class of software theories.

2.1 Motivation

Our reading of the literature is that design via reuse is the dominant paradigm in contemporary knowledge and software engineering. This paradigm dates back at least to 1964 with Alexander's work on architecture [Alexander 1964, Alexander, Ishikawa, Silverstein, Jacobsen, Fiksdahl-King & Angel 1977]. In summary, when we design via reuse, we re-shuffle components developed previously, then abstracted into a reusable form. Modern expressions of design via reuse include object-oriented design patterns [Buschmann, Meunier, Rohnert, Sommerlad & Stal 1996, Menzies 1997a], and the knowledge engineering research into ontologies [Gruber 1993] and problem solving methods [Schreiber, Wielinga, Akkermans, Velde & de Hoog 1994]. A tacit assumption in this paradigm is that a community shares a definition of some concept. This assumption would be violated if, in the usual case, different developers facing the same problems develop very different definitions of software concepts.

When won't reuse work? To answer that question, we assume that when we build reuse libraries, we are creating those libraries from an implicit space of alternative designs. In our experience, the design process stops when we have rejected an adequate number of clearly inferior proposals. When exploring the design space, there exist situations where software reuse is likely, and other situations where it is less likely. Suppose one good design was clearly superior to all alternatives. In this case, we could depend on different designers working at different times and different locations to arrive at the same final design. Reuse is likely in this case since the problem domain would force designers to make the same decisions. Designers would hence understand reusable artifacts since it would be clear why a particular reusable artifact was built this way and not that way.

Alternatively, suppose a clearly superior design was hard to detect. In this case, it is possible that different designers will arrive at different final systems. If the problem domain does not force a similar set of decisions, designers are less likely to understand supposedly reusable artifacts. Our designers may always be puzzling why a particular reusable artifact was built this way and not that way. Worse still, they may be tempted to tinker with the reusable artifact in order to contort it into their own view of the problem domain. Such tinkering can remove the economic justification for reuse. A learning curve must be traversed before any module can be adapted. By the time you know enough to change a little of that module, you may as well have re-written 60% of it from scratch; see Figure 1.

[Figure 1 plot: x-axis "Amount modified" (0 to 1); y-axis "Relative costs" (0 to 1).]

Figure 1: Relative cost of reuse with X% changes; i.e. cost of adaptation divided by the cost of rebuilding from scratch. Data from the COCOMO-II software cost estimation project [Abts et al. 1998, p21].

2.2 From Motivation to Manipulations

The above motivation can now be mapped into a series of theory manipulations. As mentioned above, during this mapping process, we will encounter certain restrictive suppositions that threaten the generality of our results. These suppositions will be noted, then debated later.

We begin by commenting that reuse is less likely if clearly superior designs are not recognizable. That is, some oracle cannot declare that theory $T_1$ is "much better than" theory $T_2$ (which we denote $T_1 \gg T_2$). Let us assume that this oracle can access a list of output states that the program should reach from some input states. We define the cover of a theory $T_x$ to be the largest number of outputs it can reach from the inputs. As $cover(T_x)$ approaches 100%, the theory can reach all its desired outputs. Using this definition of $cover(T_x)$, we can implement the oracle as follows:

$$T_i \gg T_j \ \text{ if } \ cover(T_i) \gg cover(T_j) \qquad (1)$$

[Figure 2 plot: y-axis "Cover" (0% to 100%); x-axis "# errors in theory" (none, some, many); two curves labelled "slow decay" and "fast decay".]

Figure 2: Decay curves for cover.

The first supposition of this experiment is:

Supposition 1 Our oracle of better designs ($\gg$) is widely applicable.

Assuming that Equation 1 is valid, we note that as we introduce errors into our theories, we expect cover to decrease. Suppose the decay in cover was very rapid; e.g. as shown in the fast decay curve of Figure 2. In the case of fast decaying cover, we can hope that different designs will converge since, if ever we stumbled over a superior design, we would be sure to realize that it was indeed superior (since its cover would be very much greater than that of its opposing theories). Alternatively, if the decay in cover was very shallow (e.g. the slow decay curve of Figure 2) then designs may not converge since competing theories would be indistinguishable.

The discussion in the above motivation section can now be expressed precisely using Equation 1 and the notion of fast decay:

Reusable reuse libraries are likely in the case of fast decay since there will only be a small number (possibly one) of best designs on top of the decay curve.

Conversely:

Reusable reuse libraries are unlikely in the case of slow decay since many designs exist on top of the decay curve. In this case, different designers working at different times and different locations may generate different solutions to the same problem.

Hence, to experimentally explore the likelihood of software reuse, we need an experimental rig that tracks the decay curves of a theory as we manipulate the theory to introduce errors.
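The link between decay shape and recognizability can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names, the 20-point margin that stands in for "much greater", and both decay curves are hypothetical and are not taken from the experiments reported here.

```python
def much_better(cover_a, cover_b, margin=20.0):
    """Hypothetical oracle: A is 'much better' than B when its cover exceeds
    B's cover by at least `margin` percentage points (an assumed threshold)."""
    return cover_a - cover_b >= margin


def distinguishable_steps(curve, margin=20.0):
    """Count neighbouring points on a decay curve that the oracle can separate."""
    return sum(much_better(a, b, margin) for a, b in zip(curve, curve[1:]))


# Made-up cover values (percent) for theories with 0, 1, 2, 3 and 4 errors.
fast_decay = [100.0, 60.0, 30.0, 10.0, 0.0]
slow_decay = [100.0, 97.0, 95.0, 92.0, 90.0]

print(distinguishable_steps(fast_decay))
print(distinguishable_steps(slow_decay))
```

Under the assumed margin, the oracle separates three neighbouring pairs on the fast curve and none on the slow curve, which is the sense in which theories on a slowly decaying curve become indistinguishable.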

[Figure 3 diagram: a directed graph whose edges are labelled ++ or --, connecting twelve variables: fish growth rate, boat investment fraction, boat purchases, catch potential, fish catch, boat maintenance, net income, catch proceeds, fish density, change in boat numbers, boat decommissions, and fish population change.]

Figure 3: The fisheries model from [Bossel 1994] (pp135-141).

2.3 Experiment #1: Exploring Reuse

In this experiment, we generate cover decay curves of the form of Figure 2. To do this, we need (A) a sample theory to explore, (B) a method of introducing errors, (C) a method of generating input-output pairs that the correct theory should be able to cover, and (D) a simulator which can search from inputs to find the maximum coverage of the outputs. This section describes our methods for implementing $\langle A, B, C, D \rangle$.

2.3.1 A: Sample Theory

Figure 3 shows our sample theory. This figure is a qualitative form of some quantitative equations describing fish growing in a fishery. For the moment, we do not discuss the generality of theories of the form of Figure 3. This issue will be discussed later.

In Figure 3, each theory variable has three states: up, down or steady. These values model the sign of the first derivative of these variables (i.e. the rate of change in each value). $X \xrightarrow{++} Y$ denotes that Y being up or down could be explained by X being up or down respectively. That is:

$$V_1 \xrightarrow{++} V_2 \equiv \begin{cases} V_1{\uparrow} \vdash V_2{\uparrow} \\ V_1{\downarrow} \vdash V_2{\downarrow} \end{cases}$$

(where $\uparrow$ and $\downarrow$ denote up and down respectively.) $X \xrightarrow{--} Y$ denotes that Y being up or down could be explained by X being down or up respectively. That is:

$$V_1 \xrightarrow{--} V_2 \equiv \begin{cases} V_1{\uparrow} \vdash V_2{\downarrow} \\ V_1{\downarrow} \vdash V_2{\uparrow} \end{cases}$$

Tacit in Figure 3 are conjunctions of influences. We should view this theory as influences splashing around pipes connecting tubs. Pairs of competing influences can cancel out. That is, we can explain the level of water in a tub remaining steady via a conjunction of competing upstream influences; e.g.

$$((V_1{\uparrow} \vdash V_2{\uparrow}) \wedge (V_3{\downarrow} \vdash V_2{\downarrow})) \vdash ((V_1{\uparrow} \wedge V_3{\downarrow}) \vdash (V_2 = steady))$$

Figure 3 is a very small theory: 12 three-state variables copied 6 times (one for each time tick). Real world software systems can be much, much larger than 12 variables. Hence, our next supposition must be:

Supposition 2 We can scale up conclusions from a small theory such as Figure 3 to larger theories.

Apart from size, Figure 3 also has a fixed degree of connectivity, also known as fanout. The fanout of Figure 3 is $\frac{\text{number of edges}}{\text{number of variables}} = 1.4$ (17 edges over 12 variables). We will need to check that changing this fanout does not affect our conclusions. Hence:

Supposition 3 We can scale conclusions from a theory of fanout=1.4 to theories with other fanouts.

Apart from the size and fanout issues, experimenting with Figure 3 implies:

Supposition 4 We can represent a useful range of software systems as networks like Figure 3.

2.3.2 B: Introducing Errors

Figure 3 contains edges of two types: $X \xrightarrow{++} Y$ and $X \xrightarrow{--} Y$. We can corrupt the theory by randomly selecting $N$ edges, then flipping their sign ($++$ to $--$ and vice versa). For a theory with $|E|$ edges ($N \le |E|$), we can now generate $\binom{|E|}{N}$ variants. Note that at $N = 0$ we have the original theory and, as $N$ increases, we move to progressively worse theories.
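The corruption step can be sketched in Python as follows. The edge representation (source, sign, target), the function name `corrupt` and the three-edge toy theory are assumptions made for illustration; this is not the rig used in the experiment, and the toy edges are not the edges of Figure 3.

```python
import random

FLIP = {"++": "--", "--": "++"}


def corrupt(edges, n, rng=None):
    """Return a copy of `edges` in which the sign of `n` randomly chosen
    edges has been flipped; n = 0 returns the original theory unchanged."""
    if rng is None:
        rng = random.Random()
    if not 0 <= n <= len(edges):
        raise ValueError("n must lie between 0 and the number of edges")
    chosen = set(rng.sample(range(len(edges)), n))
    return [
        (src, FLIP[sign] if i in chosen else sign, dst)
        for i, (src, sign, dst) in enumerate(edges)
    ]


# Toy theory with illustrative edges only.
toy = [("a", "++", "b"), ("b", "--", "c"), ("c", "++", "a")]
variant = corrupt(toy, 2, random.Random(0))
```

Only the signs change; the connections between variables are left untouched, so every variant has the same shape as the original theory.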


2.3.3 C: Generating Input/Output Pairs

To generate input-output pairs that should generate 100% cover, we use the $N = 0$ theory (i.e. the correct theory) to generate data. To apply this method, the quantitative fisheries theory was run 15 times. Inputs and goals were generated by comparing all pairs of the measurements in the 15 runs (105 such pairs exist). That is, by comparing pairs of runs we can convert raw numbers to statements like (e.g.) "when we compare run 3 to run 9, we saw that catchPotential went up and catchProceeds went down".
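The pairing step can be sketched as below. The sketch assumes each run is summarized by one number per variable and uses made-up measurements; `direction`, `compare_runs` and the tolerance parameter are hypothetical names, since the text does not say how individual measurements were compared.

```python
from itertools import combinations


def direction(before, after, tol=0.0):
    """Qualitative change between two measurements: 'up', 'down' or 'steady'."""
    if after - before > tol:
        return "up"
    if before - after > tol:
        return "down"
    return "steady"


def compare_runs(runs):
    """Compare every unordered pair of runs, variable by variable."""
    return {
        (i, j): {name: direction(runs[i][name], runs[j][name]) for name in runs[i]}
        for i, j in combinations(range(len(runs)), 2)
    }


# Made-up measurements for 15 runs (the real data came from the fisheries model).
runs = [{"catchPotential": float(i), "catchProceeds": float(-i)} for i in range(15)]
comparisons = compare_runs(runs)
print(len(comparisons))
```

Fifteen runs give 15 * 14 / 2 = 105 unordered pairs, which matches the count quoted above.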

Next, the comparisons were converted into input-output pairs. The theory of Figure 3 was copied $C$ times to represent the state of the system at time $C = 0, C = 1, \ldots$. Inputs are comparisons that appear at time copy $C = 0$ and outputs are comparisons that appear at time copy $C > 0$.

If we copy the theory $C$ times, then we must link time copy $C = i$ to time copy $C = i + 1$ using a temporal linking policy. Note the variables change in boatNumbers and fishPopulationChange. These variables are the explicit time variables. In the XNODE temporal linking policy used in this first study, explicit time variables connect to themselves in the immediate future. That is, for each explicit time variable $X$, XNODE adds a link from its copy at time $i$ to its copy at time $i + 1$:

$$X_{i} \xrightarrow{++} X_{i+1}$$
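A minimal sketch of this copy-and-link step follows, assuming vertices are represented as (variable, time copy) pairs. The function name `unroll` and the two-edge toy theory are hypothetical; the toy does not reproduce the topology of Figure 3.

```python
def unroll(edges, time_vars, copies):
    """Copy a theory `copies` times and join the copies with XNODE-style links.

    Every original edge is repeated inside each time copy, and each explicit
    time variable receives a '++' link to itself in the next time copy.
    """
    unrolled = [
        ((src, c), sign, (dst, c))
        for c in range(copies)
        for src, sign, dst in edges
    ]
    unrolled += [
        ((var, c), "++", (var, c + 1))
        for c in range(copies - 1)
        for var in time_vars
    ]
    return unrolled


# Toy theory: 't' plays the role of an explicit time variable.
toy_edges = [("t", "++", "x"), ("x", "--", "t")]
linked = unroll(toy_edges, ["t"], copies=6)
print(len(linked))
```

With two edges, one time variable and six copies, the unrolled toy theory has 2 * 6 edges inside the copies plus 5 links between consecutive copies.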

We acknowledge that the choice of XNODE as the temporal linking policy is somewhat arbitrary and could limit the generality of our conclusions; i.e.

Supposition 5 Conclusions drawn from a study of XNODE-based systems apply to non-XNODE systems.

This test data library now contained measurements for all variables during the lifetime of the simulation (5 years; i.e. $C = 0, 1, 2, 3, 4, 5$). In practice, it is unlikely that all such data will be available to a testing team. Hence, we generate input-output pairs that refer to less than 100% of the variables in Figure 3. To generate test sets where U percent of the system was unmeasured, U percent of the goals were then discarded to produce 10 variants of the data with U at 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90 percent unmeasured.

2.3.4 D: Simulating the Theory

A simulator for the fisheries model must be contradiction-tolerant. To see why, note that Figure 3 is a qualitative theory [Iwasaki 1989]; i.e. it is under-specified and hence can generate inconsistencies. For example, Figure 3 says that increasing catchPotential can increase fishCatch. It also says that decreasing fishDensity can decrease fishCatch. It is undetermined what happens to fishCatch in the case of increasing catchPotential AND decreasing fishDensity. Since Figure 3 does not specify the relative sizes of these influences on fishCatch, all we can say is that fishCatch can rise, fall, or remain steady (in the case where the two competing influences cancel each other out). Hence, if a simulator arrives at fishCatch from increasing catchPotential and decreasing fishDensity, then it must fork the reasoning, one way for each possibility; i.e. once for fishCatch=up, once for fishCatch=down, and once for fishCatch=steady.

In this experiment, our contradiction-tolerant simulator will be the HT4 abductive inference engine [Menzies & Waugh 1998, Menzies & Michael 1999, Menzies 1995, Menzies 1996a, Menzies 1996b, Menzies & Compton 1997]. Abduction is informally defined as inference to the best explanation (e.g. [O'Rourke 1990]). More formally, abduction is the search for assumptions which, when combined with some theory, explain some set of goals [Eshghi 1993]. Mutually exclusive assumptions imply that many alternative abductive-explanations may be generated. Such mutually exclusive assumptions must be managed in separate worlds of beliefs. Returning to the above example, in the case of increasing $catchPotential$ and decreasing $fishDensity$, we generate three worlds $W_1, W_2, W_3$. $W_1$ includes $fishCatch = up$, $W_2$ includes $fishCatch = down$, and $W_3$ includes $fishCatch = steady$.
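The forking of worlds under competing influences can be sketched as follows. This is an illustrative toy, not HT4: the function names are hypothetical and the sketch only covers the single-variable case discussed above.

```python
def possible_values(influences):
    """Values a variable may take given the 'up'/'down' pushes of its parents.

    Opposing pushes leave the outcome undetermined, so all three states
    remain possible; otherwise the variable follows the pushes it receives.
    """
    pushes = set(influences)
    if pushes == {"up", "down"}:
        return {"up", "down", "steady"}
    return pushes


def fork_worlds(variable, influences):
    """One world (a dict of beliefs) per possible value of `variable`."""
    return [{variable: value} for value in sorted(possible_values(influences))]


# Increasing catchPotential pushes fishCatch up; decreasing fishDensity
# pushes it down, so the reasoning forks three ways.
worlds = fork_worlds("fishCatch", ["up", "down"])
print(worlds)
```

A single uncontested influence yields one world, while the competing pair yields three, one for each of up, down and steady.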

In essence, HT4 is like a search engine exploring some network of ideas looking for useful and consistent portions (each such portion is a world). The generality of our conclusions depends on this style of search being applicable to a range of software engineering problems; i.e.

Supposition 6 We can model software construction as the exploration of networks like Figure 3.

Supposition 7 We can model network exploration as abduction.

One trap with using HT4 is that our conclusions may only hold when processing indeterminate theories. That is:

Supposition 8 Our conclusions from the study of indeterminate theories apply to determinate theories.

2.3.5 Details

Summarizing the above discussion, our experimental rig contains a theory $T$ generated via some time linking policy (e.g. XNODE). The theory is a directed graph containing vertices and edges $\langle V, E \rangle$. Each theory vertex $V_i$ is a pair of a variable and a time copy index $C_i$; e.g. $fishCatch@C_i$ is the value of $fishCatch$ in time copy $C = i$. The edges $E$ are the connectors between variables and are one of a set of pre-defined types; e.g. $\xrightarrow{++}$ or $\xrightarrow{--}$. That is:

$$V_i = variable@timeCopyIndex$$
$$E_i = V_j \xrightarrow{++} V_k \ \text{ or } \ V_j \xrightarrow{--} V_k$$
$$T = \langle V, E \rangle$$

The goals we are trying to explain are the outputs. In the general case, only some of the outputs are reachable. The reachable outputs are denoted $outputs'$. Recalling the above discussion on XNODE, all the outputs come from time copy $C > 0$. That is:

$$outputs' \subseteq outputs \subseteq V@C_{>0}$$

The assumptions $A$ are those variables used to connect used inputs to used outputs. The used inputs are denoted $inputs'$. Recalling the above discussion on XNODE, all the inputs come from time copy $C = 0$. That is:

$$inputs' \subseteq inputs \subseteq V@C_{0}$$
$$A \subseteq V - inputs' - outputs'$$

$U$ is the percentage of the theory variables that do not appear in the inputs and outputs. That is:

$$U = \left(1 - \frac{|inputs \cup outputs|}{|V|}\right) \times 100$$

An abductive-explanation is a world of mutually compatible beliefs $W_i$. A world connects some inputs to some outputs without causing contradictions. In HT4, worlds are extracted from the edges in our theories. That is:

$$W_i \subseteq E \qquad (2)$$
$$W_i \cup inputs' \cup A \vdash outputs' \qquad (3)$$
$$W_i \cup inputs' \cup A \nvdash \bot \qquad (4)$$

HT4 seeks the abductive explanation $W_i$ with maximal $cover$; i.e. the explanation that maximizes the size of $outputs'$. That is:

$$cover(T_x) = \max(|W_i \cup outputs'|) \qquad (5)$$
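As a small worked example of the percentage of unmeasured variables, the Python sketch below evaluates the definition of $U$ on a made-up set of ten vertices. The function name and the data are hypothetical, and the sketch assumes every input and output is one of the listed vertices.

```python
def unmeasured_percent(inputs, outputs, vertices):
    """U: percentage of vertices that appear in neither the inputs nor the outputs.

    Algebraically the same as (1 - |inputs U outputs| / |V|) * 100, written
    with a single division so that round percentages stay exact.
    """
    measured = set(inputs) | set(outputs)
    total = len(set(vertices))
    return 100 * (total - len(measured)) / total


# Made-up theory with ten vertices, two measured inputs and one measured output.
vertices = [f"v{i}" for i in range(10)]
u = unmeasured_percent(["v0", "v1"], ["v9"], vertices)
print(u)
```

Here three of the ten vertices are measured, so seven in ten are unmeasured.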

With these definitions in hand, we can now specify our experimental rig. 20 times we flipped the edge type of between none and all of the edges in Figure 3 to create a corrupted theory $T'$. For each such corrupted theory $T'$, we:

• Copied $T'$ 6 times (once each for $C = 0, 1, 2, 3, 4, 5$).
• Connected the copies using the XNODE linking policy to form $T''$.
• For all 105 comparisons of the data, we:
  - For $U \in \{0, 10, 20, \ldots, 90\}$, we:
    * Built the outputs and inputs using $100 - U$% of the comparison data.
    * Called HT4 to find maximum cover using the inputs, the outputs and $T''$.

There are 17 edges in Figure 3. Hence, this procedure calls HT4 378,000 times (20 repeats * 0..17 edges flipped * 105 comparisons * 10 $U$ values).
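The call count can be checked by walking the same nested loops. The Python sketch below only counts iterations and does not call HT4; the constant and function names are placeholders chosen for this illustration.

```python
from itertools import product

REPEATS = range(20)               # 20 repeats
FLIPPED = range(0, 18)            # 0..17 corrupted edges
COMPARISONS = range(105)          # all pairs of the 15 runs
UNMEASURED = range(0, 100, 10)    # U = 0, 10, ..., 90


def count_ht4_calls():
    """Count one simulated HT4 call per innermost step of the rig."""
    return sum(1 for _ in product(REPEATS, FLIPPED, COMPARISONS, UNMEASURED))


print(count_ht4_calls())
```

The count is 20 * 18 * 105 * 10, since flipping between 0 and 17 edges gives 18 settings.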

2.3.6 Results

The behaviour of this experimental rig is shown in Figure 4. To read the results, note that the x-axis shows a progression from a correct fisheries theory (at 0 edges flipped) to a very incorrect fisheries theory (at 17 edges flipped). For the top plot, the y-axis shows the maximum cover found by HT4. As we would expect, as the number of errors in a theory increases, the coverage of desired goals decreases. Recalling our above discussion, reuse is unlikely in the case of slow decay of the cover curve. Our reading of Figure 4 is that after $U = 70$%, the decay curve becomes very flat; i.e. it becomes difficult to distinguish a theory with no errors (left hand side of the top plot in Figure 4) from a theory with multiple errors (right hand side of the top plot in Figure 4).

The runtime results (Figure 4, bottom) show two interesting trends. Firstly, in nearly all cases, as the theories grow more corrupted, it becomes faster to determine what goals are explicable (exception: for the very unmeasured theories, the runtimes for good theories are the same as the runtimes of bad theories). This is a nice result: nonsense theories can be rejected faster than sensible theories.

Secondly, the above experiment tried to find an explanation for every member of the outputs. That is, as the percentage of unmeasured variables U was increased, the size of the output goal set decreased. A pre-experimental intuition was that as we tried to explain more and more goals, the runtimes would increase. For theories that were mostly correct (corruptions less than 3 out of 17), this intuition proved correct. However, for theories that are moderately to wildly incorrect (more than 3 of the 17 edges corrupted), the runtimes decreased as the number of goals was increased. We explain this surprising result as follows. For theories that are mostly correct, many successful proofs of goals can be generated. The overheads of building these proofs can slow down the inference procedure. On the other hand, as we supply more and more data for moderately to wildly incorrect theories, we constrain the search space more and more. For example, in fisheries, an unmeasured variable could take the value up, down, or steady. Once that variable is measured, two-thirds of the possible search space disappears. Hence, for these not-very correct theories, runtimes decrease as we increase the number of goals.
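The two-thirds figure follows from simple counting, as the sketch below shows. It assumes, purely for illustration, that the unmeasured three-state variables can be assigned independently, so the count is an upper bound on the space a search would face; the function names are hypothetical.

```python
from fractions import Fraction


def search_space(unmeasured_vars, states=3):
    """Upper bound on joint assignments of the unmeasured variables."""
    return states ** unmeasured_vars


def fraction_removed(unmeasured_vars, states=3):
    """Share of the assignment space that disappears when one more variable is measured."""
    before = search_space(unmeasured_vars, states)
    after = search_space(unmeasured_vars - 1, states)
    return Fraction(before - after, before)


print(search_space(12))
print(fraction_removed(5))
```

Whatever the number of unmeasured variables, fixing one of them removes the same two-thirds share of the remaining assignments, so each extra measurement shrinks the space geometrically.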

[Figure 4, top plot: y-axis "% explicable" (0 to 100); x-axis "Number of corrupted edges; max=17"; one XNODE curve for each U% unmeasured, U = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90.]

[Figure 4, bottom plot: y-axis "runtime (s)" (0 to 0.8); x-axis "Number of corrupted edges; max=17"; one XNODE curve for each U% unmeasured, U = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90.]

Figure 4: Experiment #1. XNODE linking: Corrupting 0 to 17 edges, running validation with less and less data. Percentage explicable (cover) on top; runtimes on bottom.

2.3.7 Experiment #1: Conclusions

At least for the experimental rig shown above, the likelihood of reusable components is a function of the amount of data available to test the components. The less of the theory that is measured in the inputs/outputs set, the harder it becomes to recognize a clearly superior theory. The threshold appears to be around 70% unmeasured theory variables.

If this is a general result, then we must doubt the utility of building reuse libraries. For nearly a decade we have been researching software testing. In our experience, it is very rare that software test suites refer to more than 30% of the variables in a component. If we test our components with insufficient data, then we will not detect clearly superior designs. In this case, designers working in different locations may arrive at different final designs for their software components. That is, we expect that:

• Supposedly reusable software components contain design decisions that are only acceptable to the community that made them.
• A broader audience might find those decisions confusing or unacceptable.
• Hence, software components would not be widely reused.

There are at least two cases in which we could demonstrate that this conclusion is not general:

1. We could show that our suppositions are overly restrictive.

2. We could show that reuse is a common and successful technique.

We will discuss the suppositions in the next section. As to the second point, it has yet to be conclusively proved that reuse is a common and successful technique. Menzies [Menzies 1998] criticizes overly-enthusiastic reports of reuse that lack some measure of quality of the constructed system. Without such quality assessments, it is hard to assess the overall impact of reuse¹. Proponents of reuse rarely track the on-going costs of maintaining systems with those components [Menzies, Cukic, Singh & Powell 2000] (exception: [Lim 1994]). Most reuse reports do not clearly distinguish between verbatim reuse and reuse with some tinkering. Recalling Figure 1, such tinkering to customize a reusable component can significantly increase the cost and decrease the benefits of using reusable components. Further, even enthusiastic proponents of reuse testify to the lack of widespread reuse. For example:

I do not think we have yet succeeded in software reuse. In the State of the Practice, I do not see a lot being applied. Yes, OO class frameworks, Java, Visual Basic, etc. have facilitated the code reuse, but less is done, in general, at the higher level such as in the design and requirements phases of a project. In the State of the Art, I have not perceived any incremental progress although many issues have been addressed [Guerrieri 1999].

Reuse continues to be a problem whose potential remains elusive. Each new solution remains full of promise but riddled with what look like insurmountable problems. [Perry 1999]

Even reuse enthusiasts such as Frakes [Frakes 1999] caution that there exist significant practical problems with the widespread proliferation of reuse libraries; e.g. problems with searching the reuse library.

In summary, our reading of the literature is that we can't reject the generality of the above conclusions based on successful reports of reuse.

¹ One of the few reuse reports that includes quality measures is [Lim 1994]. However, that report refers to intra-institutional reuse, not widespread inter-institutional reuse.


3 External Validity

We believe that the above results are general to a wide class of software. The rest of this article attempts to justify that belief. To make this case, we explore the suppositions described above. In the process of making that case, we will propose and perform other experiments that comment positively on the practicality of processing indeterminate theories (including early life cycle requirements theories).

3.1 Choice of Oracle

Supposition 1 claimed that theories should be assessed via how well they can reproduce a set of desired goals (recall Equation 1). Clearly, there are many other ways to assess a theory; e.g. parsimony, runtime speed of its inferences, etc. However, we regard these other measures as secondary. Theories only become interesting when they can perform their primary tasks. Only after demonstrating theory competency should we move on to assess (e.g.) its speed or economy of expression. Partial support for our view on theory assessment can be found elsewhere in the literature. For example, the model checking community assesses theories via their coverage of temporal logic constraints; e.g. [Holzmann 1997, Clarke, Emerson & Sistla 1986].

3.2 Bigger Theories

Supposition 2 claimed that a conclusion drawn from a theory with 12 variables can be extrapolated to larger theories. Recall that our key observation was that our test oracle failed to distinguish good theories from bad theories after $U = 70$%. What would happen to this observation if the theory gets larger?

In answering this question, we first note that testing gets harder as the system gets larger. Some support for this claim can be found in NASA's experience with model checkers. The informal rule-of-thumb at NASA is that model checking techniques such as SPIN [Schneider, Easterbrook, Callahan, Holzmann, Reinholtz, Ko & Shahabuddin 1998, Holzmann 1997] can't process large systems. The invariants from merely 30 classes in an OO system can require one gigabyte of memory to model check (and one gigabyte of RAM is our current practical limit of what we can expect on a workstation).

Note that if we can't tell good from bad even for very small theories then, as the theory gets larger than Figure 3, it must become even harder to distinguish good larger theories from bad larger theories. Hence, we believe that Supposition 2 holds since our pessimistic conclusions concerning reuse must get even more pessimistic as system size grows.


Figure 5: Percentage explainable (y-axis, %) against average fanout (x-axis, 1 to 10) for clones with V=449, 480, 487, 494, 511 and 535 variables. From [Menzies 1996b].

3.3 More Intricate Theories

Supposition 3 was a variant of Supposition 2. In this supposition, we claimed that conclusions from a theory with fixed fanout (1.7) apply to theories with different fanouts. In earlier work, Menzies studied how different fanouts affect the conclusions of HT4 [Menzies 1996b]. In that experiment, edges were added at random to variants of a real-world theory of human glucose regulation taken from [Smythe 1989]. This theory was written in the same format as Figure 3 and is described in detail in [Menzies & Compton 1997]. That theory had 1246 edges and 554 variables (fanout=1246/554=2.25). Six "clones" of that theory were generated using random variables whose distributions were taken from the glucose theory (e.g. the mean number of parents in the clones was the same as the mean number of parents in the glucose theory). Each clone was approximately the same size as the glucose theory and contained 449, 480, 487, 494, 511, or 535 nodes. These were executed using HT4 with randomly chosen inputs and goals. Edges were then added randomly and the executions repeated. Figure 5 shows the results. At low fanouts, many behaviours were inexplicable. However, after a fanout of 4.4, most behaviours were explicable. Further, after a fanout of 6.8, nearly all the behaviours were explicable.

In summary, Menzies found that as fanout increases, the probability of explaining any randomly selected output also increases. As this probability increases, the cover decay curve would flatten out since our ability to distinguish a theory with some number of errors from a theory with more errors would decrease. Consequently, we argue that Supposition 3 holds: our pessimism about detecting a superior theory would increase if the fanout increases from the value of 1.7 seen in Figure 3.
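The link between fanout and explicability can be illustrated with a much cruder proxy than HT4. The following Python sketch is not the experiment reported above: it assumes that "explicable" simply means "the goal is reachable from the input" in a random directed graph, and it ignores edge annotations and consistency checks. The names `random_theory`, `reachable` and `fraction_reachable` are hypothetical.

```python
import random
from collections import deque

def random_theory(n_vars, fanout, rng):
    """Random directed graph with about n_vars * fanout distinct edges (no self-loops)."""
    n_edges = round(n_vars * fanout)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.randrange(n_vars), rng.randrange(n_vars)
        if a != b:
            edges.add((a, b))
    children = {v: [] for v in range(n_vars)}
    for a, b in edges:
        children[a].append(b)
    return children

def reachable(children, source, goal):
    """Breadth-first search: can `goal` be reached from `source`?"""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        if v == goal:
            return True
        for w in children[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

def fraction_reachable(n_vars, fanout, trials, seed):
    """Fraction of random (input, goal) pairs that are connected; a crude
    stand-in (assumption) for the percentage of explicable behaviours."""
    rng = random.Random(seed)
    children = random_theory(n_vars, fanout, rng)
    hits = 0
    for _ in range(trials):
        source, goal = rng.sample(range(n_vars), 2)
        hits += reachable(children, source, goal)
    return hits / trials

if __name__ == "__main__":
    for fanout in (1, 2, 4, 6):
        print(fanout, fraction_reachable(100, fanout, 300, seed=1))
```

Under these simplifying assumptions, the reachable fraction rises with fanout, which is the qualitative trend described for Figure 5.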

3.4 Software = Network of Connections

Supposition 4 claimed that a wide range of software can be characterized as a network of connections between variables. This is hardly a controversial position. Many researchers model software as a network of connecting influences:


• Software coverage tools often view software as a network. Good test suites try to maximize the coverage of network components (e.g. [Bieman & Schultz 1992, Harrold, Jones & Rothermel 1998, Weyuker 1993]).
• Any optimizing compiler builds a network (control/data flow graph) from the code. Optimization is then a matter of reorganizing the network to speed up the program.
• Model checkers access such a network when they build a space of program variable states.
• Expert systems and logic programs can be reduced to such structures (using a technique called partial evaluation [Sahlin 1991]).

3.5 Beyond XNODE

Our experiment assumed Supposition 5, i.e. for all explicit time variables $X$, we connect $X$ at time copy index $i$ to $X$ at time copy index $i+1$. What happens to our conclusions if we vary the time linking policy? This question is explored in Experiment #2.

3.5.1 Experiment #2: Other Temporal Linking Policies

Experiment #2 repeats the experimental rig of Experiment #1, but with a different temporal linking policy. In the XEDGE (explicit edge) linking policy, time changes on the out-edges from the explicit time variables. That is, for each variable $Y$ downstream from an explicit time variable $X$, XEDGE adds a link $X_{t=i} \rightarrow Y_{t=i+1}$. For example, recall from Figure 3 that fishGrowthRate is connected to fish population change. XEDGE would add an edge from fishGrowthRate at time copy $i$ to fish population change at time copy $i+1$.

Figure 6 shows the behaviour seen in Experiment #2. The curves have numerous similarities to the XNODE behaviour seen in Figure 4:

• As before, nonsense theories can be rejected faster than sensible theories since the runtimes decrease as we add errors.
• As before, there exists a lower threshold in test suite size, below which we cannot distinguish good theories from bad theories. That is, once again, we see that as we offer less and less data in the test suite, the cover decay curves flatten out.
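The difference between the XNODE and XEDGE policies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HT4 rig: a theory is taken to be a list of (source, target) edges, edge annotations are omitted, the in-copy edges are assumed to be kept under both policies, and `fishPopulationChange` is a hypothetical identifier standing in for "fish population change". The function names are also hypothetical.

```python
def copy_over_time(edges, steps):
    """Replicate every edge of the theory inside each of `steps` time copies."""
    return [((a, t), (b, t)) for t in range(steps) for a, b in edges]

def xnode_links(time_vars, steps):
    """XNODE: each explicit time variable X at copy i feeds X at copy i+1."""
    return [((x, t), (x, t + 1))
            for t in range(steps - 1)
            for x in sorted(time_vars)]

def xedge_links(edges, time_vars, steps):
    """XEDGE: for each edge X -> Y leaving an explicit time variable X,
    link X at copy i to Y at copy i+1."""
    return [((a, t), (b, t + 1))
            for t in range(steps - 1)
            for a, b in edges
            if a in time_vars]

# Hypothetical one-edge fragment; which variables are explicit time
# variables is an assumption made for this example.
edges = [("fishGrowthRate", "fishPopulationChange")]
time_vars = {"fishGrowthRate"}

if __name__ == "__main__":
    print(xnode_links(time_vars, 2))
    print(xedge_links(edges, time_vars, 2))
```

With two time copies, XNODE yields a single link from `fishGrowthRate` at copy 0 to itself at copy 1, while XEDGE yields a single link from `fishGrowthRate` at copy 0 to `fishPopulationChange` at copy 1.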


Figure 6: Experiment #2. XEDGE linking: corrupting 0 to 17 edges, running validation with less and less data (U=0 to 90% of variables unmeasured, in steps of 10). Percentage explicable on top; runtimes (seconds) below. In both plots the x-axis is the number of corrupted edges (max=17).

Elsewhere we have explored six other temporal linking policies [Waugh, Menzies & Goss 1997]. The above two features are noted in all those experiments. That is, Supposition 5 holds for at least seven temporal linking policies other than XNODE.

3.6 Software Construction = Network Exploration

Supposition 6 argues that when we search Figure 3, we are doing something that is important to a large class of software systems. We take this view from Herbert Simon. Simon describes design as the "quintessential human activity" [Simon 1969]. Everything we do as agents in the world requires design solutions to the problems that face us. Hence, to Simon, studying the design problem was a method of studying the general problem of artificial intelligence (AI).

Simon's research characterized design and artificial intelligence as a search problem. In this framework, design is viewed as the traversal of a space of possibilities, looking for pathways to goals. This space can be explicit or implicit:

Explicit: e.g. taken from a library of software components.

Implicit: e.g. extracted from the designer's mind as they imagine different possible, but as yet unbuilt, software systems.

Modern expressions of this approach of design-as-search include most of the AI search literature [Pearl & Korf 1987] and the SOAR knowledge-level (KL) search [Newell 1982, Newell 1993]. In a KL search, intelligence is modeled as a search for appropriate operators that convert some current state to a goal state. Domain-specific knowledge is used to select the operators according to the principle of rationality; i.e. an intelligent agent will select an operator which its knowledge tells it will lead to the achievement of some of its goals.

This view of design-as-search through a network is not just an AI concept. OO designers perform such a search as they search through their class libraries looking for reusable code. More generally, any time a software designer pauses to consider this option vs that option, they are mentally exploring a network of options.

3.7 Network Exploration = Abduction

Supposition 7 noted that we used abduction as our network exploration tool. We could reject our conclusions if it could be shown that abductive inference is relevant only for a very small class of software tasks.

Our view of abduction can be summarized as follows: seek islands of consistency within a space of possibly contradictory ideas. More precisely, make what inferences you can that are relevant to some goal (Equation 3), without causing any contradictions (Equation 4). When more than one solution exists to Equation 3 and Equation 4, then use some BEST operator to select a preferred set of solutions. Clearly, this procedure is more general than just a description of "inference to the best explanation". Many tasks have an abductive mapping, as shown in Figure 7. Hence, with one caveat, we argue that the use of an abductive inference engine increases the generality of our conclusions. The one caveat is as follows. Abduction is an indeterminate inference procedure. Standard software strives to be deterministic. The issue of indeterminacy is discussed below.
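The "filter for consistency, then apply BEST" procedure can be sketched as follows. This is a minimal Python illustration, not HT4's implementation: it assumes candidate explanations have already been generated, represents each one by the inputs it uses, the goals it covers and the (variable, value) assumptions it makes, and picks a BEST that prefers maximum goal coverage. The class `Explanation` and the functions `consistent` and `best_validation` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Explanation:
    inputs: frozenset       # inputs the explanation relies on
    covered: frozenset      # goals the explanation reaches
    assumptions: frozenset  # (variable, value) guesses made along the way

def consistent(explanation):
    """True if no variable is assumed to hold two different values."""
    values = {}
    for var, val in explanation.assumptions:
        if values.setdefault(var, val) != val:
            return False
    return True

def best_validation(explanations):
    """One possible BEST: among consistent explanations, keep those
    that cover the largest number of goals."""
    ok = [e for e in explanations if consistent(e)]
    if not ok:
        return []
    top = max(len(e.covered) for e in ok)
    return [e for e in ok if len(e.covered) == top]

# Toy data (hypothetical): e1 covers the most goals but contradicts itself.
e1 = Explanation(frozenset({"a"}), frozenset({"g1", "g2", "g3"}),
                 frozenset({("x", "up"), ("x", "down")}))
e2 = Explanation(frozenset({"a", "b"}), frozenset({"g1", "g2"}),
                 frozenset({("x", "up")}))
e3 = Explanation(frozenset({"b"}), frozenset({"g1"}),
                 frozenset({("y", "down")}))

if __name__ == "__main__":
    print(best_validation([e1, e2, e3]) == [e2])
```

Swapping in a different preference (fewest inputs, lowest cost, most familiar concepts) changes only the body of the BEST function, which is why one inference engine can serve many tasks.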


3.8 Implications of Indeterminacy

The above discussion argued that Suppositions 1 through 7 did not overly restrict the generality of our conclusions. Only Supposition 8 remains: that conclusions drawn from HT4's indeterminate inference are generally applicable. Reviewing the tasks implemented by abduction (see Figure 7), it is clear that most of those tasks come from the knowledge engineering community. Knowledge-based systems often use heuristics to simplify complex tasks. Heuristics can introduce indeterminacy into a theory. Heuristics are just guesses and different guesses can drive a program to very different conclusions.

Supposition 8 threatens the generality of our conclusions. HT4 is a multiple world reasoner and conventional software is deterministic. That is, software constructed from standard software engineering techniques only ever generates a single assignment to variables. Conclusions drawn from HT4 will not apply to standard software systems if its multiple-world reasoning confuses the task of testing software.

To explore this issue, we need more data. The average number of worlds generated by HT4 in Experiment #1 is shown in Figure 8. Note the low total average number of worlds (max=1.28). While HT4 can support multiple worlds, these experiments generated very few. That is, the conclusions made above regarding reuse were not made in a wildly indeterminate device. This absence of wild indeterminacy persists, even if we make major changes to the HT4 rig. For example, fisheries is a sparsely connected theory: 12 variables connected by 17 edges (fanout=17/12=1.4). Perhaps theories with higher fanouts generate more explanations? This hypothesis was tested in Experiment #3. That experiment added in 0, 5, 10, 15, 20, 25 or 30 new edges at random, checking all the time that the added edges did not exist already in the theory. This generated theories with fanouts ranging from 1.4 to 3.9 ((17+30)/12=3.9). The theory with the new edges was then copied over 5 time steps and connected via XNODE and executed in the same manner as Experiment #1. The results are shown in Figure 9. Note that the number of worlds generated is not much more than before (1.58 maximum). Hence, we could not blame the low number of worlds seen in HT4 output on restrictive fanouts.
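The edge-adding step of Experiment #3 and the resulting fanouts can be sketched as below. This Python fragment is a sketch under stated assumptions: the 17-edge, 12-variable base theory is a placeholder (a ring plus five chords), not the fisheries model of Figure 3; self-loops are excluded by assumption; and `add_random_edges` is a hypothetical helper.

```python
import random

def add_random_edges(edges, variables, k, rng):
    """Return the theory plus k new directed edges that were not already present."""
    present = set(edges)
    grown = list(edges)
    while len(grown) < len(edges) + k:
        edge = (rng.choice(variables), rng.choice(variables))
        if edge[0] != edge[1] and edge not in present:
            present.add(edge)
            grown.append(edge)
    return grown

# Placeholder theory with the same size as the fisheries model:
# 12 variables, 17 distinct edges.
variables = [f"v{i}" for i in range(12)]
base = [(variables[i], variables[(i + 1) % 12]) for i in range(12)]
base += [(variables[i], variables[i + 2]) for i in range(5)]

rng = random.Random(0)
fanouts = {
    k: len(add_random_edges(base, variables, k, rng)) / len(variables)
    for k in (0, 5, 10, 15, 20, 25, 30)
}

if __name__ == "__main__":
    for k, fanout in fanouts.items():
        print(k, round(fanout, 2))
```

The fanout depends only on the edge count, so the seven settings give (17+k)/12 for k = 0, 5, ..., 30, i.e. a range from about 1.4 to about 3.9.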


• Abductive-validation asks the question "What is the maximum percentage of known/desired behaviour that can be explained by some theory?". This can be implemented as Equation 5; i.e. use a BEST that favors explanations with the largest intersection to the output goal set [Menzies 1996b, Menzies 1995, Menzies & Compton 1997].
• Minimal-fault diagnosis means using a BEST that favors explanations with the fewest possible inputs and the most known goals [Reggia, Nau & Wang 1983] (this is only one of a range of abductive approaches to diagnosis: other approaches are described in [Console & Torasso 1991]).
• Abductive-classification uses a BEST that favors explanations with the largest number of most-specific concepts [Poole 1985].
• Abductive-explanation uses a BEST that favors explanations with the largest number of concepts that are familiar to the user [Leake 1993].
• Abductive-planning uses a BEST that favors explanations with the most goals and the least total cost.
• Abductive-monitoring means caching the explanations generated by abductive-planning, then deleting any explanations whose assumptions contradict any new data.
• Abductive-prediction uses a BEST that favors explanations with the largest number of future events that interest you.
• Abductive requirements engineering is discussed in [Zowghi, Ghose & Peppas 1996, Menzies, Easterbrook, Nuseibeh & Waugh 1999]. Suppose we know who wrote each portion of a theory. Conflicts between users can be detected when different people's ideas end up in different explanations. These conflicts can then be negotiated by focusing on the key conflicting guesses that split the explanations. Disputes between feuding users can be adjudicated as follows: the winning user offers theories which build explanations that can explain most of the desired behaviour.
• Abduction is also applicable to numerous other areas including legal reasoning [Kowalski & Toni 1996]; visual pattern recognition [Poole 1990]; financial reasoning [Hamscher 1990]; diagrammatic reasoning [Menzies & Compton 1994]; machine learning [Hirata 1994]; case-based reasoning [Leake 1993]; natural-language processing [Ng & Mooney 1990]; and analogical reasoning [Falkenhainer 1990]. For more on the applications of abduction, see [Menzies 1996a, Kakas, Kowalski & Toni 1998].

Figure 7: Tasks with an abductive mapping. Adapted from [Menzies 1996a].


Figure 8: Experiment #1: Average worlds seen (y-axis, 0.95 to 1.3) as we corrupt 0, 3, ..., 15 edges from Figure 3 and change the percentage of theory variables unmeasured in the test set (x-axis, 0 to 100; changing U).

Figure 9: Experiment #3. Worlds generated with XNODE (y-axis, 0.9 to 1.6) against percentage unmeasured (x-axis, 0 to 100) after adding 0, 5, 10, 15, 20, 25, 30 edges.


Nor can we blame the missing worlds on the temporal linking policies used in these experiments. Recall that of the 12 variables in the fisheries theory, only 2 of them made cross-time connections. Perhaps more worlds would be observed if more connections were made across time. Experiment #4 tested this hypothesis. A new temporal linking policy was defined. In the INODE (implicit node) linking policy, any variable $X$ at time copy $i$ can affect that variable at time copy $i+1$; i.e. for every variable we add the rule $X_{t=i} \rightarrow X_{t=i+1}$.

Experiment #4 repeats Experiments #1 and #3 using INODE instead of XNODE. The results are shown in Figure 10. The extra time links did lead to more explanations, but not many more (maximum=5).
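To see how many more cross-time connections INODE introduces, the following Python sketch counts the links for a theory of 12 variables copied over 5 time steps. The variable names are placeholders, the 5-step horizon is assumed to match the earlier experiments, and `inode_links` is a hypothetical helper; restricting it to two variables mimics XNODE with two explicit time variables.

```python
def inode_links(variables, steps):
    """Link every listed variable at time copy i to itself at copy i+1."""
    return [((v, t), (v, t + 1)) for t in range(steps - 1) for v in variables]

variables = [f"v{i}" for i in range(12)]  # placeholder names
steps = 5

inode = inode_links(variables, steps)      # every variable crosses time
xnode = inode_links(variables[:2], steps)  # only 2 explicit time variables

if __name__ == "__main__":
    print(len(xnode), len(inode))  # 8 48
```

Under these assumptions the number of cross-time links grows sixfold (from 8 to 48), yet the number of worlds reported above stays small.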


Figure 10: Experiment #4. Effects of using INODE: worlds generated (y-axis) against percentage unmeasured (x-axis, 0 to 100). Top: flipping the annotation on 0, 3, 6, 9, 12, 15 edges. Bottom: adding 0, 5, 10, 15, 20, 25, 30 edges.


Since HT4 and HT4-like systems do not generate wild indeterminacy, we can endorse Supposition 8. Still, we must explain these repeated failures to generate wild indeterminacy. We have two hypotheses:

• [Menzies 1996a] noted that HT4 uses set-covering abduction; i.e. it builds worlds only from variables that connect inputs to goals. An alternative approach is consistency-based abduction, used in (e.g.) the ATMS [DeKleer 1986], which builds worlds from all variables in the theory (for more on these two types of abduction, see [Console & Torasso 1991]). Set-covering can use fewer variables than consistency-based approaches. Hence, set-covering systems like HT4 may generate fewer worlds than expected.
• Suppose our software contains a small number of master variables that control the larger number of slave variables in the rest of the system. In such master-variable systems, worlds need only be generated for conflicts in the master variables; i.e. conflicts in the larger number of slave variables do not matter. Note that if there are very few master variables, then we need only generate very few worlds. Further, we can find those worlds very easily. Any randomly generated pathway from goals back to inputs has a high chance of using the master variables. After generating a small number of such random paths, an inference procedure would find the key assumptions within the master variables. Elsewhere, with Cukic and Singh, Menzies has offered evidence that such master-variable systems are common. This evidence comes from theory [Menzies et al. 2000], experimentation [Menzies et al. 1999], and an extensive literature review [Menzies & Cukic 2000].

The absence of wild indeterminacy in HT4-like systems has significant implications for the construction of environments that support debates between (e.g.) multiple stakeholders during requirements engineering. Requirements engineering researchers such as Easterbrook [Easterbrook 1991], Finkelstein [Finkelstein, Gabbay, Hunter, Kramer & Nuseibeh 1994], and Nuseibeh [Nuseibeh 1997] argue that we should routinely expect specifications to reflect different and inconsistent viewpoints. A key process within deliberation and negotiation is supporting arguments. Argumentation support environments can be swamped by a surplus of choices: 20 yes-no choices mean $2^{20} > 1{,}000{,}000$ design choices. Elaborate mechanisms have been built for implementing such arguments; e.g. [Menzies 1997b, Mylopoulos, Chung & Nixon 1992, Nuseibeh & Russo 1999, Rich & Feldman 1992, Boehm, Egyed, Port, Shah, Kwan & Madachy 1998]. The results of our experiments suggest that these mechanisms may be overly elaborate; e.g. when we search across that space of $2^{20}$ options from goals to inputs, we find that our search paths need only a few worlds. HT0 is a very simple variant of HT4 that exploits this "a few worlds are enough" property [Menzies & Michael 1999]. HT0 assumes master-variable systems. The algorithm generates a small number of randomly selected worlds in sequence; i.e. $W_0$, then $W_1$, then $W_2$ and so on. HT0 terminates when the cover of $W_i$ is not much greater than the maximum cover found by worlds $W_0, W_1, \ldots, W_{i-1}$. Experiments with HT0 show that:

• Maximum cover plateaus very quickly. That is, HT0's partial world search can terminate much faster than HT4's rigorous exploration of all worlds. Hence, HT0 is much faster than HT4. HT4 explores all worlds. In experiments with one implementation of that algorithm, HT4 was observed to run in exponential time (HT4 crashed when running theories bigger than 1,000 Horn clauses). HT0, running on the same platform, executed in polynomial time and has terminated on very large theories (up to 18,000 Horn clauses).
• For problems where HT4 terminates (i.e. theories with $< 1000$ Horn clauses), HT0 finds 98% of the maximum cover found by HT4. That is, HT0's partial world search covered nearly as much of the theory as HT4's rigorous exploration of all worlds.
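The stopping rule described for HT0 can be sketched as a short loop. This Python fragment is only an illustration of the "stop when cover plateaus" idea, not the HT0 algorithm of [Menzies & Michael 1999]: world generation is abstracted into a caller-supplied function, and the names `plateau_search`, `sample_world_cover`, `epsilon` and `max_worlds` are assumptions made for this example.

```python
def plateau_search(sample_world_cover, epsilon=0.01, max_worlds=100):
    """Sample worlds one at a time; stop when a new world's cover is not
    more than `epsilon` above the best cover seen so far.

    sample_world_cover: zero-argument callable returning the cover
    (a fraction between 0 and 1) of one randomly generated world.
    Returns (best cover found, number of worlds generated).
    """
    best = sample_world_cover()  # cover of the first world, W_0
    tried = 1
    while tried < max_worlds:
        cover = sample_world_cover()
        tried += 1
        improved = cover > best + epsilon
        best = max(best, cover)
        if not improved:
            break  # cover has plateaued
    return best, tried

if __name__ == "__main__":
    # Hypothetical covers for worlds W_0 .. W_4; the last is never requested.
    covers = iter([0.5, 0.7, 0.9, 0.905, 0.99])
    print(plateau_search(lambda: next(covers)))  # (0.905, 4)
```

In the usage above the search stops at the fourth world because 0.905 is within `epsilon` of the previous best. The example also shows the price of stopping early: the unvisited fifth world had a higher cover, which is acceptable only if, as the text argues, cover plateaus quickly in practice.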

The above experiments, plus the HT0 experience, suggest that exploring conflicts within requirements is easier than we might have thought. This possibility is explored extensively elsewhere [Menzies et al. 1999].

4 Conclusion

Based on four experiments that performed 100,000s of manipulations on a theory, we made two counter-intuitive findings:

• Reuse is harder than we had thought. Our supposedly reusable components will not converge to similar designs unless we can explore those components with test suites that are larger than those seen in current practice.
• Exploring the space of contradictory ideas seen in (e.g.) requirements engineering is easier than we had thought. A small number of random probes across the space of possible conflicts will find most of the worlds.

We conjecture that these results are counter-intuitive since our intuitions were based on experience with dozens, not 100,000s, of systems. Our expectations of software development are based on our experience. Over the course of a single professional's career, an individual may see several dozen systems (at most). From those few dozen examples, that individual forms perceptions of the core issues and the general rules of software engineering. In this paper we have explored methods of finding some general rules of software engineering based not on dozens, but 100,000s of manipulations of a theory.

A side-effect of the above analysis is the evolution of a general method for exploring issues within software development:

1. Collect representative theories.

2. Define representative manipulations.

3. Execute many manipulations over the many theories.

4. Summarize the results.

5. Consider the implications of those summaries on standard practice.
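Steps 1 to 4 above can be sketched as a small experimental harness. This Python fragment is a generic illustration and not the rig used in this paper: theories, manipulations and the evaluation measure are all supplied by the caller, and `run_study` is a hypothetical name.

```python
from statistics import mean

def run_study(theories, manipulations, evaluate, repeats=1):
    """Apply every manipulation to every theory and summarize by the mean score.

    theories:      iterable of theory objects (step 1)
    manipulations: dict mapping a name to a function theory -> theory (step 2)
    evaluate:      function theory -> number, e.g. percentage of goals covered
    """
    scores = {name: [] for name in manipulations}
    for theory in theories:                           # step 3
        for name, manipulate in manipulations.items():
            for _ in range(repeats):
                scores[name].append(evaluate(manipulate(theory)))
    return {name: mean(vals) for name, vals in scores.items()}  # step 4

if __name__ == "__main__":
    # Toy stand-ins: a "theory" is a list of edges and the score is its size.
    theories = [[1, 2, 3], [4, 5]]
    manipulations = {
        "unchanged": lambda t: t,
        "drop_last": lambda t: t[:-1],
    }
    print(run_study(theories, manipulations, evaluate=len))
```

Step 5, interpreting the summaries against standard practice, remains a human judgement and is deliberately left outside the code.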

We think that this method is a powerful technique. For example, our conclusions can be demonstrated incorrect if, after applying these five steps, other researchers generate very different summaries to those described above.

Acknowledgements

This paper has benefited enormously from the comments of several anonymous referees. Simon Goss provided invaluable support in developing this material. Aditya Ghose and Steve Easterbrook offered insightful comments on earlier drafts of this paper. During this research, the first author was supported by the AI Department, University of NSW, and then a NASA cooperative agreement (#NCC2-979).

References

Abts, C., Clark, B., Devnani-Chulani, S., Horowitz, E., Madachy, R., Reifer, D., Selby, R. & Steece, B. [1998], Cocomo ii model definition manual, Technical report, Center for Software Engineering, USC. http://sunset.usc.edu/COCOMOII/cocomox.html#downloads.

Alexander, C. [1964], Notes on Synthesis of Form, Harvard University Press.

Alexander, C., Ishikawa, S., Silverstein, S., Jacobsen, I., Fiksdahl-King, I. & Angel, S. [1977], A Pattern Language, Oxford University Press.

Bieman, J. & Schultz, J. [1992], ‘An empirical evaluation (and specification) of the all-du-paths testing criterion’, Software Engineering Journal 7(1), 43–51.

Boehm, B., Egyed, A., Port, D., Shah, A., Kwan, J. & Madachy, R. [1998], ‘A stakeholder win-win approach to software engineering education’, Annals of Software Engineering 6, 295–321.

Bossel, H. [1994], Modeling and Simulations, A.K. Peters Ltd. ISBN 1-56881-033-4.

Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P. & Stal, M. [1996], A System of Patterns: Pattern-Oriented Software Architecture, John Wiley & Sons.


Chulani, S., Boehm, B. & Steece, B. [1999], ‘Bayesian analysis of empirical software engineering cost models’, IEEE Transactions on Software Engineering 25(4).

Clarke, E., Emerson, E. & Sistla, A. [1986], ‘Automatic verification of finite-state concurrent systems using temporal logic specifications’, ACM Transactions on Programming Languages and Systems 8(2), 244–263.

Console, L. & Torasso, P. [1991], ‘A Spectrum of Definitions of Model-Based Diagnosis’, Computational Intelligence 7, 133–141.

DeKleer, J. [1986], ‘An Assumption-Based TMS’, Artificial Intelligence 28, 163–196.

Easterbrook, S. [1991], Elicitation of Requirements from Multiple Perspectives, PhD thesis, Imperial College of Science Technology and Medicine, University of London. Available from http://research.ivv.nasa.gov/~steve/papers/index.html.

Eshghi, K. [1993], A Tractable Class of Abductive Problems, in ‘IJCAI ’93’, Vol. 1, pp. 3–8.

Falkenhainer, B. [1990], Abduction as Similarity-Driven Explanation, in P. O’Rourke, ed., ‘Working Notes of the 1990 Spring Symposium on Automated Abduction’, pp. 135–139.

Finkelstein, A., Gabbay, D., Hunter, A., Kramer, J. & Nuseibeh, B. [1994], ‘Inconsistency handling in multi-perspective specification’, IEEE Transactions on Software Engineering 20(8), 569–578.

Frakes, W. [1999], Domain engineering education, in ‘Proceedings of WISR9: The 9th annual workshop on Institutionalizing Software Reuse’. Available from http://www/umcs.maine.edu/~ftp/wisr/wisr9/final-papers/Frakes.html.

Gruber, T. [1993], ‘A translation approach to portable ontology specifications’, Knowledge Acquisition 5(2), 199–220.

Guerrieri, E. [1999], Reuse success - when and how?, in ‘Proceedings of WISR9: The 9th annual workshop on Institutionalizing Software Reuse’. Available from http://www/umcs.maine.edu/~ftp/wisr/wisr9/final-papers/Guerrieri.html.

Hamscher, W. [1990], Explaining Unexpected Financial Results, in P. O’Rourke, ed., ‘AAAI Spring Symposium on Automated Abduction’, pp. 96–100.

Harrold, M., Jones, J. & Rothermel, G. [1998], ‘Empirical studies of control dependence graph size for c programs’, Empirical Software Engineering 3, 203–211.

Hirata, K. [1994], A Classification of Abduction: Abduction for Logic Programming, in ‘Proceedings of the Fourteenth International Machine Learning Workshop, ML-14’, p. 16. Also in Machine Intelligence 14 (to appear).

Holzmann, G. [1997], ‘The model checker SPIN’, IEEE Transactions on Software Engineering 23(5), 279–295.

Iwasaki, Y. [1989], Qualitative physics, in P. C. A. Barr & E. Feigenbaum, eds, ‘The Handbook of Artificial Intelligence’, Vol. 4, Addison Wesley, pp. 323–413.

Kakas, A., Kowalski, R. & Toni, F. [1998], The role of abduction in logic programming, in C. H. D.M. Gabbay & J. Robinson, eds, ‘Handbook of Logic in Artificial Intelligence and Logic Programming 5’, Oxford University Press, pp. 235–324.

Kowalski, R. & Toni, F. [1996], ‘Abstract argumentation’, Artificial Intelligence and Law Journal 4(3-4), 275–296.

Leake, D. [1993], Focusing Construction and Selection of Abductive Hypotheses, in ‘IJCAI ’93’, pp. 24–29.


Lim, W. [1994], ‘Effects of reuse on quality, productivity, and economics’, IEEE Software pp. 23–30.

Menzies, T. [1995], Principles for Generalised Testing of Knowledge Bases, PhD thesis, University of New South Wales. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/95thesis.ps.gz.

Menzies, T. [1996a], ‘Applications of abduction: Knowledge level modeling’, International Journal of Human Computer Studies 45, 305–355. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/96abkl1.ps.gz.

Menzies, T. [1996b], On the practicality of abductive validation, in ‘ECAI ’96’. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/96abvalid.ps.gz.

Menzies, T. [1997a], ‘OO patterns: Lessons from expert systems’, Software Practice & Experience 27(12), 1457–1478. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/97probspatt.ps.gz.

Menzies, T. [1997b], Qualitative causal diagrams for requirements engineering, in ‘The Second Australian Workshop on Requirements Engineering (AWRE’97)’.

Menzies, T. [1998], ‘Towards situated knowledge acquisition’, International Journal of Human-Computer Studies 49, 867–893. Available from http://www.cse.unsw.EDU.AU/~timm/pub/docs/98ijhcs.

Menzies, T. & Compton, P. [1994], A Precise Semantics for Vague Diagrams, World Scientific, pp. 149–156. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/ai94.ps.gz.

Menzies, T. & Compton, P. [1997], ‘Applications of abduction: Hypothesis testing of neuroendocrinological qualitative compartmental models’, Artificial Intelligence in Medicine 10, 145–175. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/96aim.ps.gz.

Menzies, T. & Cukic, B. [2000], ‘Adequacy of limited testing for knowledge based systems’, International Journal on Artificial Intelligence Tools (IJAIT). (to appear).

Menzies, T., Cukic, B., Singh, H. & Powell, J. [2000], ‘Testing indeterminate systems’. ISSRE 2000 (to appear).

Menzies, T., Easterbrook, S., Nuseibeh, B. & Waugh, S. [1999], An empirical investigation of multiple viewpoint reasoning in requirements engineering, in ‘RE ’99’. Available from http://research.ivv.nasa.gov/docs/techreports/1999/NASA-IVV-99-009.pdf.

Menzies, T. & Michael, C. [1999], Fewer slices of pie: Optimising mutation testing via abduction, in ‘SEKE ’99, June 17-19, Kaiserslautern, Germany’. Available from http://research.ivv.nasa.gov/docs/techreports/1999/NASA-IVV-99-007.pdf.

Menzies, T. & Waugh, S. [1998], On the practicality of viewpoint-based requirements engineering, in ‘Proceedings, Pacific Rim Conference on Artificial Intelligence, Singapore’, Springer-Verlag.

Mylopoulos, J., Chung, L. & Nixon, B. [1992], ‘Representing and using nonfunctional requirements: A process-oriented approach’, IEEE Transactions of Software Engineering 18(6), 483–497.

Newell, A. [1982], ‘The Knowledge Level’, Artificial Intelligence 18, 87–127.

Newell, A. [1993], ‘Reflections on the Knowledge Level’, Artificial Intelligence 59, 31–38.

Ng, H. & Mooney, R. [1990], The Role of Coherence in Constructing and Evaluating Abductive Explanations, in ‘Working Notes of the 1990 Spring Symposium on Automated Abduction’, Vol. TR 90-32, pp. 13–17.


Nuseibeh, B. [1997], To be and not to be: On managing inconsistency in software development, in ‘Proceedings of 8th International Workshop on Software Specification and Design (IWSSD-8)’, IEEE CS Press, pp. 164–169.

Nuseibeh, B. & Russo, A. [1999], Using abduction to evolve inconsistent requirements specifications, in ‘Proceedings of ICSE-99 Workshop on Software Change and Evolution (SCE’99), LA, California, USA’. Available from ftp://dse.doc.ic.ac.uk/dse-papers/viewpoints/nuseibeh.sce99.pdf.

O’Rourke, P. [1990], Working notes of the 1990 spring symposium on automated abduction, Technical Report 90-32, University of California, Irvine, CA. September 27, 1990.

Pearl, J. & Korf, R. [1987], ‘Search techniques’, Ann. Rev. Comput. Sci. 1987 2, 451–67.

Perry, D. [1999], Some holes in the emperor’s reused clothes, in ‘Proceedings of WISR9: The 9th annual workshop on Institutionalizing Software Reuse’. Available from http://www/umcs.maine.edu/~ftp/wisr/wisr9/final-papers/Perry.html.

Poole, D. [1985], On the Comparison of Theories: Preferring the Most Specific Explanation, in ‘IJCAI ’85’, pp. 144–147.

Poole, D. [1990], ‘A Methodology for Using a Default and Abductive Reasoning System’, International Journal of Intelligent Systems 5, 521–548.

Reggia, J., Nau, D. & Wang, P. [1983], ‘Diagnostic Expert Systems Based on a Set Covering Model’, Int. J. of Man-Machine Studies 19(5), 437–460.

Rich, C. & Feldman, Y. [1992], ‘Seven layers of knowledge representation and reasoning in support of software development’, IEEE Transactions on Software Engineering 18(6), 451–469.

Sahlin, D. [1991], An Automatic Partial Evaluator for Full Prolog, PhD thesis, The Royal Institute of Technology (KTH), Stockholm, Sweden. Available from file://sics.se/pub/isl/papers/dan-sahlin-thesis.ps.gz.

Schneider, F., Easterbrook, S., Callahan, J., Holzmann, G., Reinholtz, W., Ko, A. & Shahabuddin, M. [1998], Validating requirements for fault tolerant systems using model checking, in ‘3rd IEEE International Conference On Requirements Engineering’.

Schreiber, A. T., Wielinga, B., Akkermans, J. M., Velde, W. V. D. & de Hoog, R. [1994], ‘Commonkads. a comprehensive methodology for kbs development’, IEEE Expert 9(6), 28–37.

Simon, H. [1969], The Sciences of the Artificial, MIT Press.

Smythe, G. [1989], ‘Brain-hypothalmus, Pituitary and the Endocrine Pancreas’, The Endocrine Pancreas.

Waugh, S., Menzies, T. & Goss, S. [1997], Evaluating a qualitative reasoner, in A. Sattar, ed., ‘Advanced Topics in Artificial Intelligence: 10th Australian Joint Conference on AI’, Springer-Verlag.

Weyuker, E. [1993], ‘More experience with data flow testing’, IEEE Transactions on Software Engineering 19(9), 912–919.

Zowghi, D., Ghose, A. & Peppas, P. [1996], A framework for reasoning about requirements evolution, in ‘Proceedings of the 4th Pacific Rim International Conference on Artificial Intelligence (PRICAI96), Cairns, Australia, August’.
