working set analytics 6/15/20 -- draft v2...

50
WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate School, Monterey, CA Abstract The working set model for program behavior was invented in 1965. It has stood the test of time in virtual memory management for over fifty years. It is considered the ideal for managing memory in operating systems and caches. Its superior performance was based on the principle of locality, which was discovered at the same time; locality is the observed tendency of programs to use distinct subsets of their pages over extended periods of time. This tutorial traces the development of working set theory from its origins to the present day. We will discuss the principle of locality and its experimental verification. We will show why working set memory management resists thrashing and generates near-optimal system throughput. We will present the powerful, linear-time algorithms for computing working set statistics and applying them to the design of memory systems. We will debunk several myths about locality and the performance of memory systems. We will conclude with a discussion of the application of the working set model in parallel systems, modern shared CPU caches, network edge caches, and inventory and logistics management. Keywords: Working set, working set model, program behavior, virtual memory, cache, paging policy, locality, locality principle, thrashing, multiprogramming, memory management, optimal paging

Upload: others

Post on 16-Aug-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

WORKINGSETANALYTICS6/15/20--DRAFTv2PeterJ.DenningNavalPostgraduateSchool,Monterey,CA

Abstract

Theworkingsetmodelforprogrambehaviorwasinventedin1965.Ithasstoodthetestoftimeinvirtualmemorymanagementforoverfiftyyears.Itisconsideredtheidealformanagingmemoryinoperatingsystemsandcaches.Itssuperiorperformancewasbasedontheprincipleoflocality,whichwasdiscoveredatthesametime;localityistheobservedtendencyofprogramstousedistinctsubsetsoftheirpagesoverextendedperiodsoftime.Thistutorialtracesthedevelopmentofworkingsettheoryfromitsoriginstothepresentday.Wewilldiscusstheprincipleoflocalityanditsexperimentalverification.Wewillshowwhyworkingsetmemorymanagementresiststhrashingandgeneratesnear-optimalsystemthroughput.Wewillpresentthepowerful,linear-timealgorithmsforcomputingworkingsetstatisticsandapplyingthemtothedesignofmemorysystems.Wewilldebunkseveralmythsaboutlocalityandtheperformanceofmemorysystems.Wewillconcludewithadiscussionoftheapplicationoftheworkingsetmodelinparallelsystems,modernsharedCPUcaches,networkedgecaches,andinventoryandlogisticsmanagement.

Keywords:Workingset,workingsetmodel,programbehavior,virtualmemory,cache,pagingpolicy,locality,localityprinciple,thrashing,multiprogramming,memorymanagement,optimalpaging

Page 2: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 2

VirtualmemorymadeitspublicdebutattheUniversityofManchesterin1962.Itwashailedasabreakthroughinautomaticmemorymanagementandenjoyedanenthusiasticreception.Bythemid-1960s,however,operatingsystemengineershadbecomeskepticalofvirtualmemory.Itsperformancewasuntrustworthyanditwaspronetounpredictablethrashing.In1965,IjoinedProjectMACatMIT,whichwasdevelopingMultics.ThedesignersofMulticsdidnotwanttheirvirtualmemorytosuccumbtotheseproblems.MyPhDresearchprojectwastofindanewapproachtomanagingvirtualmemorythatwouldmakeitsperformancetrustworthy,reliable,andnear-optimal.Thenewapproachhadtwoparts.ThefirstwastheWorkingSetModel,whichwasbasedontheideaofmeasuringtheintrinsicmemorydemandsofindividualprograms;workingsetsthendeterminedhowmanymainmemoryslotswereneededandwhatpagesshouldbeloadedintothem.ThesecondpartwasthePrincipleofLocality,whichwasthestrongtendencyofexecutingprogramstoconfinetheirreferencestolimitedlocalitysetsoverextendedphases.Afterextensiveexperimentalverification,thelocalityprinciplewasacceptedasauniversallawofcomputing.Thelocalityprinciplemadeitpossibletoprovethatworking-setmemorymanagementisnear-optimalandimmunetothrashing.ThesediscoveriesenabledthesuccessofvirtualmemoryonMulticsandoncommercialoperatingsystems.

Sincethe1970s,everyoperatingsystemtextbookhasdiscussedtheworkingsetmodel.Further,everymodernoperatingsystemusestheworkingsetmodelasanidealtounderpinitsapproachtomemorymanagement[den16].Thiscanbeseen,forexample,bylookingattheprocessactivitycontrolpanelsinWindows,whereyouwillsee“workingset”mentionedinthememoryusecolumn.AsearchoftheUSpatentdatabaseshowsover12,000patentsthatbasetheirclaimsontheworkingsetorlocalitytheories.

Inthistutorial,Iwilltracehowallthiscametobeandwillshowyouthepowerfulalgorithmswedevelopedtocomputeworkingsetstatisticsandusethemtodesignmemorysystems.

TheGrowingPainsofVirtualMemoryIn1959TomKilburnandhisteam,whowerebuildingtheAtlascomputerand

operatingsystemattheUniversityofManchester,UK,inventedthefirstvirtualmemory[fot61].Theyputitintooperationin1962[kil62].Virtualmemorymanagedthecontentofthesmallmainmemorybyautomaticallytransferringpagesbetweenitandthesecondarymemory.Kilburnargueditwouldimproveprogrammerproductivitybyautomatingthelabor-intensiveworkofplanningmemorytransfers.Itwasimmediatelyseenbymanyasaningenioussolutiontothecomplexproblemsofmanaginginformationflowsinthememoryhierarchy.

TheManchesterdesignersintroducedfourinnovationsthatsoonwereadoptedasstandardsincomputerarchitectureandoperatingsystemsfromthentothepresentday.Onewasthepage,afixedsizedunitofstorageandtransfer.Programs

Page 3: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 3

anddataweredividedintopagesandstoredinmainmemoryslotscalledpageframeswithcopiesonthesecondarymemory.Asecondinnovationwasthedistinctionbetweenaddresses(namesofvalues)andlocations(memoryslotsholdingvalues).Theaddressspacewasalargelinearsequenceofaddressesandthemainmemory(RAM)apoolofpageframes,eachofwhichcouldholdanypagefromtheaddressspace.AstheCPUgeneratedaddressesintoitsaddressspace,ahardwarememorymappingunittranslatedaddressestolocations,usingapagetablethatassociatedpageswithpageframes.Athirdinnovationwasthepagefault,aninterrupttriggeredintheoperatingsystemwhenanexecutingprogrampresentedthemappingunitwithanaddresswhosepagewasnotinmainmemory;theoperatingsystemlocatedthemissingpageinsecondarymemory,chosealoadedpagetoevict,andtransferredthemissingpageintothevacatedpageframe.Afourthinnovationwasthepagereplacementpolicy,thealgorithmthatchoseswhichpagemustbeevictedfrommainmemoryandreturnedtosecondarymemorytomakewayforanincomingpage.Themissratefunction–fractionofreferencesthatproduceapagefault–wasthekeyperformancemeasureforvirtualmemories.

Performanceofoperatingsystemshasalwaysbeenabigdeal.Tobeacceptedintotheoperatingsystem,avirtualmemorysystemhadtobeefficient.Thetwopotentialsourcesofinefficiencywereinaddressmappingandpagetransfers.TheycanbecalledtheAddressingProblemandtheReplacementProblem.

TheAddressingProblemyieldedquicklytoefficientsolutions.Atranslationlookasidebuffer,whichwasasmallcacheinthememorymappingunit,limitedtheslowdownfromaddressmappingto3%ofthenormalRAMaccesstime.Thiswasanacceptablecostforallthebenefits.1Virtualmemorydidindeeddoubleortripleprogrammerproductivity.Italsoeliminatedtheannoyingproblemthathand-craftedtransferscheduleshadtoberedoneforeachdifferentsizeofmainmemory.Evenmore,virtualmemoryenabledthepartitioningofmainmemoryintodisjointsets,oneforeachaddressspace,supportingaprimalobjectivetopreventexecutingjobsfrominterferingwithoneanother.Thiswasallgoodnewstothedesignersandusersofearlyoperatingsystems.

TheReplacementProblemwasmuchmoredifficultandcontroversial.Pagetransferswereverycostlybecausethespeedgapbetweenmainandsecondarymemorywas10,000ormore.Itwascriticaltofindreplacementpoliciesthatminimizedpagetransfers.Earlytestsbroughtgoodnews:thepagetransferschedulesgeneratedautomaticallybythevirtualmemoryusuallygeneratedfewerpagemovesthanhand-craftedschedules[say69].Becausethemainmemorywasveryexpensiveandsmall,2eventhecleverestprogramswerelikelytogeneratealotofpagefaults.Despitethegreatpressureonthemtofindgoodreplacementpolicies,

1Theterm“job”isusedthroughoutthistutorialtomeananyofprocess,thread,orexecutingprogram.Itdenotesacomputationaltaskdoingworkforauserorthesystem.2Mainmemorywasveryexpensive,atleast$0.25abyte;todayagigabyte(GB)costs$5.00,abouttwomillionthsofthat.Eventhoughmemoryaccesstimeshavebecomemuchsmaller,thespeedgapbetweenthemainmemory(RAM)andsecondarymemory(usuallyDISK)hasrisenfrom104in1960to106today.Pagefaultshaveneverbeencheap.

Page 4: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 4

designersofvirtualmemoriesfoundnoclearwinners;by1965therewasconsiderableheateddebateandconflictingdataaboutwhichreplacementpolicywouldbethebest.Manyengineersbegantoharbordoubtsaboutwhethervirtualmemorycouldbecountedontogivegoodperformancetoeveryoneusingacomputersystem[ran68].

TheAtlasdesigners,keenlyawareofthehighcostofpaging,inventedaningeniousreplacementpolicytheycalledthe“learningalgorithm”.Itassumedthattypicalpagesspentthemajorityoftheirtimebeingvisitedinwell-definedloops.Itmeasuredtheperiodofeachpage’sloopandchoseforreplacementthepagenotexpectedtobeaccessedagainforthelongesttimeintothefuture.ThelearningalgorithmemployedtheoptimalityprinciplethatIBM’sLesBeladymadepreciseinhisMINpagingalgorithmin1966[bel66].Unfortunately,thelearningalgorithmdidnotworkwellforprogramsthatdidnothavetight,well-definedloops;itdidnotfindfavoramongtheengineerssearchingforrobustpagingalgorithms.

Theinnovationofmultiprogramminginthemid1960saddedtotheconsternationaboutvirtualmemoryperformance.Withmultiprogramming,severaljobscouldbeloadedsimultaneouslyintoseparatepartitionsofthemainmemory,yieldingsignificantimprovementsofCPUefficiencyandsystemthroughput.Butmultiprogrammedvirtualmemoriesbroughtahostofnewproblems.Howmanyprogramsshouldbeloaded?Howmuchmemoryspaceshouldeachprogramget?Shouldspaceallocationsbeallowedtovary?Intheprocessoftryingtoanswerthesequestions,engineersdiscoveredthatvirtualmemorysystemswerepronetoanew,unexpected,andveryseriousproblem:thrashing.

ThrashingisthesuddencollapseofCPUefficiencyandsystemthroughputwhentoomanyprogramsareloadedintomainmemoryatonce.Itisacatastrophicinstabilitytriggeredwhenthepagingpolicystealspagesfromotherprogramstosatisfypagefaults.Thiscondition,originallycalled“pagingtodeath”,couldbetriggeredbyaddingjustonemoreprogramtothemainmemory.Thetriggerthresholdwasunpredictable.Whowouldpurchaseamillion-dollarcomputersystemwhoseperformancecouldsuddenlycollapseatrandomandunpredictabletimes?

Thus,by1965,operatingsystemsdesignerswerefacinganenormouschallenge.Couldtheydesignpoliciesformanagingvirtualmemorythatminimizedpagefaultsanddidnotthrash?

Thegoodnewsisaffirmative:theworkingsetpolicybecameaclassicidealformanagingmemory[den16].Allthemajormodernoperatingsystems–todayincludingWindows,MacOS,andLinux–usedmemorymanagementpoliciesinspiredbytheworkingsetmodel.

IntotheStormIjoinedMITProjectMACatthestartoftheMulticsprojectin1965.Multics

plannedtohaveamultiprogrammedvirtualmemory.Thedesignersweredeeply

Page 5: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 5

worriedthattheycouldwinduplikesomeofthecommercialoperatingsystemsthatwereadoptingmultiprogrammedvirtualmemory–hobbledbyexcessivepagingandsusceptibletothrashing.IbecamefascinatedwiththesequestionsandtookthemonformyPhDwork.JerrySaltzerposedtheresearchquestioninaniceway:consideringMulticsasablackbox,canyoudesignanautomaticcontrolmechanismforthevirtualmemorywithasingle,tunable,optimizingparameter?[den80]

Isetouttofindanswersfortheskepticswhoseriouslydoubtedthatvirtualmemorycouldbestabilized.Theirprimaryconcernwasthatnooneofavarietyofpagereplacementpoliciesworkedconsistentlywell.Theyalsoknewthatnoreplacementpolicyworkswellforsomecommonproblems.Forexample,matrixmultiplication,amainstayofgraphicsandneuralnetworks,isveryfastifthematricesareallinmemory.Butiftheyarestoredwithrowsorcolumnsonseparatepagesandmemorycannotholdthemall,matrixmultiplicationgeneratesenormousamountsofpagingandgrindstoanearhalt.

In1966,LesBeladyofIBMWatsonResearchLabpublishedafamousexperimentalstudycomparingalargenumberofreplacementalgorithms[bel66].Oneofhisfindingswasthatreplacementpoliciesrelyingonusebitstodecidewhichpagestokeepinmainmemoryperformedbetterthanothers.3Hetookthisasevidenceofa“localityproperty”–processestendtoreusepagesmostrecentlyusedinthepast.Inlate1965Ihadindependentlyreachedasimilarconclusion,whichIproposedtoharnesswitha“workingset”,definedasthepagesusedduringabackward-lookingsamplingwindowofvirtualtime.4Theworkingsetwouldseethemostrecentlyusedpagesandtheoperatingsystemwouldprotectthemfrombeingpagedout[den68a,den72].Theoperatingsystemcouldpreventthrashingbyneverallowingapagefaulttostealapagefromanotherworkingset[den68b].BeladyandIbeganacollaborationin1967,inwhichwepostulatedthatallprogramsobeya“localityprinciple”thatcouldbeusefullyexploitedbypagingalgorithms.

PageReferenceMapsPagereferencemapswereausefultoolinearlyvirtualmemoryresearch.Amap

showstimeonthex-axisandpagesonthey-axis.Eachpointofthetimeaxisrepresentsasampleintervalandaboveitisacolumnofdarkenedpixelsmarkingthepagesareusedinthatsampleinterval.(AnexampleappearsinFigure1.)ThesamplingintervalisTtimeunits,whereonetimeunitisasinglepagereference.Thepagesusedinasamplingintervalarethelocalitysetofthatinterval.Aphaseisasequenceofsamplingintervalsoverwhichthelocalitysetisunchanged.Thesemapsclearlyshowedthatjobsaccessedsmallsubsetsoftheirtotaladdressspaces

3Ausebitisahardwarebitassociatedwitheachpageframe.Whenthepageisaccessed(readorwritten)theusebitissetto1bytheaddressinghardware.Theoperatingsystemcanscanforusedpagesandresetthebitsto0.Unusedpagesareconsideredinactiveandarefirsttoberemovedfrommainmemorywhenspacemustbefreedup.4Virtualtimeisdiscretetimemeasuredasnumberofmemoryaccesses;itisnotinterruptedbyexternaleventssuchaspagefaultsorotherinput-output.

Page 6: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 6

forextendedperiods.Noprogramwithrandomreferencemapwaseverobserved.ThesemapswerestrongevidenceofauniversalLocalityPrincipleexhibitedbyallexecutingprograms.

WhenheundertookhisresearchonpaginginLinuxaround2009,ahalfcenturyaftertheinventionofvirtualmemory,AdrianMcMenaminencounteredalotofskepticismamongLinuxsystemprogrammersaboutlocality[mcm11].Theyregardeditasanobsoleteideafromtheearlydaysofvirtualmemory,nolongerrelevantbecausemoderncomputershavesomuchmorememory.Tothecontrary,McMenaminfoundoutthatLinuxprogramsdisplaylocality–anditisevenmorepronouncedthaninearlyvirtualmemories.Whywouldtheremovalofmemoryconstraintsleadtomorepronouncedlocality?Thereasonappearstobethatunconstrainedprogrammersbuiltmoremodularprograms:thephasesareintervalsofaparticularmodule’suse.

McMenamindemonstratedlocalitybyrecordingthepagereferencemapsfromasignificantsampleofLinuxprograms.Everyprogramhadclearlyidentifiablelocalitysetsandphases–auniquelocality-phase“signature”.5Heconcludedthattheskepticismwasmisplacedandthatconsiderableperformancebenefitswillcometosystemsthatexploitthelocalitybehavior.Pagereferencemapsremainapowerfultoolforvisualizinglocalityandtheoperationofmemorymanagementpolicies.

5Theonlyknownexceptionisadatascanner,aprogramthatexamineseachiteminadatasequencejustonceandthendiscardsit.Thebestreplacementpolicyinthiscaseevictsapagejustafteritisused.

Page 7: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 7

Figure1.ThisisapagereferencemapoftheFirefoxWebbrowserinaLinuxsystem.Thehorizontalaxisrepresentsvirtualtime,dividedintoequalsamplewindowsofabout380Kreferences,andtheverticalrepresentsvirtualaddressesofpages.Acoloredpixelindicatesthatthepagewasreferencedduringtheassociatedsamplewindow;awhitepixelindicatesnoreference.Theverticalgridlinesarespaced200sampleintervalsapartandthehorizontalgridlinesarespaced50pagesapart.Themaprevealsthelocalitysetsoftheprogramandshowsdramaticallythatlocalitysetsarestableoverextendedperiods(phases),punctuatedbysharpshiftstootherlocalitysets.Forover99percentofthetimeinthismap,thepagesseeninasampleintervalareanearperfectpredictorforthepagesusedinthenextsampleinterval.Mostexecutingprogramshavestrikinglocalitymapslikethisone.Eachhasitsownuniquelocalitybehavior,likeadigitalsignature.Thereisnorandomnessinthewayprogramsusetheircodeanddata.(Source:AdrianMcMenamin[mcm11],CreativeCommonslicense.)

Page 8: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 8

OriginsofLocalityandWorkingSetsTheideasoflocalityandworkingsets(WS)areseparateanddistinctbutare

intimatelyconnected.Localityisaboutthepatternsofprogramsreferencingtheirpages.Workingsetisaboutdetectingthosepatternsinrealtimeandusingthemtomakememorymanagementdecisions.

LesBelady’sfamousstudyofpagingalgorithmsin1966demonstratedthatLRU(leastrecentlyused)replacementconsistentlyproducedfewerpagefaultsthanFIFO(firstinfirstout)replacement[bel66].Beladyreasonedthat,ifprogramslocalizetheirreferencesintosubsetsoftheirpages,themostrecentlyusedpagesweremostlikelytobereusedintheimmediatefuture–andthustheleastrecentlyusedpageswerethebestchoicesforreplacement.Beladyalsonotedthataweakerformoflocality–nonuniformuseofpages–explainedwhythefaultratefunctionsofLRUandFIFOwerenotlinear.

In1966therewasaconsensusamongoperatingsystemengineersthatlocalitywasanobservedtendencyforprogramstoreusetheirpagesintheimmediatefuture.Isawaconnectionbetweenthisinformalideaoflocalityandanoldprogrammingconcept.Programmersusedtheterm“workingset”tomeanthepagesthatneededtobeloadedinmainmemorysothataprogramwouldexecuteefficiently.Itwasuptotheprogrammertodeclaretheworkingsetsanddesignascheduleofpagetransferstoensurethatworkingsetswereloadedinmainmemory.Itseemedtomethattheoperatingsystemcoulddetectworkingsetsbymonitoringusebits,enablingittoautomaticallyloadworkingsetsinmemorywithouthavingtoaskprogrammerstodeclarethem.Pageswhoseusebitsweresetduringawindowoffixedsizewouldestimatetheprogram’sworkingset[den68a].6Thusthetwoideas,localityandworkingset,becameapowerfulpartnershipforallocatingmemory.

Let’sexaminelocalitymoreclosely.Frompagereferencemaps,earlyvirtualmemoryresearcherssawthatprogramsusedonlyasmallsubsetoftheirpagesatanygiventime.Thepagesthatwereusedtogetherwereseenas“spatiallyclustered”becausetheuseofoneimpliedthatanotherwouldbeusedsoon.Forexample,thepagesofalooporthepagesofacodemodulearespatiallyclustered.Spatialclusteringimpliedtemporalclustering.Thesetwotermsbecamefavoritewaysofexplainingwhylocalitywasacharacteristicofprogramexecution.

Inthedecadeafter1968,mystudentsandI,alongwithotherresearchers,studiedhowlocalityismanifestedinactualprograms.Westudiedandmeasuredmanyprograms,leadingustointroducetheterms“localitysets”,“phases”,and“transitions”.Thesetermscapturedtherecurringstructureoflocality–periodsofstabilitypunctuatedbyabrupttransitions.

6AlthoughWSandLRUfavormost-recently-usedpages,theyarenotthesame.LRUoperateswithinafixedmemoryspacebutdoesnotadvisetheoperatingsystemwhatsizeitshouldbe.WSmeasuresthelocalitysetandadvisestheoperatingsystemtoallocatejusttherightamountofmemoryrequiredtoholdit.TheWScontentsarethemostrecentlyusedpagesinthewindow,buttherethesimilaritywithLRUends.

Page 9: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 9

Weconcludedthattemporalclusteringisaratherpoordescriptionofthepunctuatedstabilityobservedinexecutingprograms.Temporalclusteringexemplifies“slowdrift”butnot“suddenchange”.Toconfirmthis,webuiltandstudiedmathematicalmodelsoflocality.MystudentJeffSpirntestedthe“slowdrift”hypothesisbyformulatingaseriesofmathematicalmodelsofslowdriftbehaviorandthenexamininghowwelleachmodelpredictedtheobservedLRUmiss-ratefunctionofaprogram[spi72].Spirnfoundthatsimplemodelssuchasindependentreferencemodel(eachpageisreferencedwithafixedprobability)andindependentstackdistancemodel(eachLRUstackdistanceoccurswithafixedprobability)ledtopoorpredictionsofLRUmissrate.

MystudentKevinKahnmodeledthepunctuatedstabilitybehaviordirectly[kah76].Inhismodels,stateswerelocalitysetsandphaseswereholdingtimesinthestates.Phase-transitionmodelsparameterizedfrommeasurementsofpagereferencemapsyieldedexcellentagreementbetweenpredictedandactualLRUmissrates.Moreover,becauseitadjustedtolocalitysets,aWSpolicygeneratedlesspagingthanLRUwithoutusingmorememory.

WayneMadisonandAlanBatsonconfirmedthatthesekeyaspectsoflocality–localitysets,phases,andtransitions–existatthesourcecodelevel[mad76].Theyconcludedthatlocalityisnotanartifactofthewaythatcompilerslayoutdataandcodeblocksonpagesofaddressspace.Designtechniquessuchasloopiteration,divide-and-conquer,andmodularityleadtosubsetsofpagesbeingusedforextendedperiods.Thelocalityseeninpagereferencesistheimageofthehigher-levellocalitycreatedasprogrammersdesigntheiralgorithms.

Programmerswhounderstoodthatvirtualmemoryperformsbetterwithprogramsofgoodlocalityeasilydesignedprogramsthatranwellinvirtualmemory[Say69].Everyprogramweeversawexhibitedlocality.Noprogramuseditspagesrandomly.Bythemid1970s,wehadsettledonthedefinitionoflocalityintheaccompanyingbox[denn80].

TheLocalityPrincipleExecutingprocessesreferencetheirmemoryobjectswithpunctuatedstabilitydescribedby:

(𝐿!, 𝐻!), (𝐿", 𝐻"), … , (𝐿# , 𝐻#), …

whereLiisalocalitysetandHiistheholdingtimeofitsphase.Theshortestsamplingintervalrequiredtoseethefulllocalitysetislikelytobeasmallfractionofholdingtime.Successivelocalitysetsarelikelytobemostlydifferent,withfewoverlaps.

Page 10: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 10

RecentresearchbyChenDingandhisstudentswithlargesharedcacheshasdemonstratedthatlocalityisobservedincachereferencepatterns.Consequently,workingsetmemorymanagementalsoappliestosharedcaches,justasinmultiprogrammedoperatingsystems[xia11,xia13,xia18].

Ihavefoundthatstudentstodayoftenhavetroubleunderstandingtheprincipleoflocality.ManyarelikeMcMenaminbeforehisstudy:itseemscounterintuitivethatprogramshavesuchpronouncedlocalitybehavior,orthattrackinglocalityoptimizesperformance.Thismisunderstandingmayberootedintheirownexperienceofprogramming:theywerenotconcernedaboutmemoryconstraintsanddidnotconsciouslydesigntheirprogramstohavelongphasesofstablereferencestoobjects.Yettheexperimentalevidencerepeatedlyshowsthatwhentheirprogramisembeddedasamoduleintoalargersoftwaresystem,thewholesystemdisplaysthephase-transitionbehavioroflocality.

Thismisunderstandingissupportedbythelimiteddefinitionsoflocalityinoperatingsystemstextbooks.Thebooksusuallydefinelocalityasacombinationoftemporalandspatialclosenessofreferences.Thesedefinitionsignoretheempiricalfactofabruptchangesatphasetransitions.Noticethatrandomassignmentofdatatopagesmightremovespatiallocalitybutitwillnotremovetemporallocality.Thesamephasesandtransitionswillbeobservedonthepagereferencemap.

LetusillustratewithFigure1howwecanmeasuresomepropertiesoflocalitybehaviorfromapagereferencemap.ThefigureshowsfivelocalitysetsandtheirassociatedphasesforasamplingwindowTofapproximately380Ktimeunits.AmeasurementofthegraphyieldsTable1,wheresizeisthenumberofpagesinthelocalityset,lengthisthenumberofsampleintervalsinaphase,andfractionisthepercentageofthefulladdressspacecoveredbythelocalityset.CachepoliciesthatdetectlocalitysetsinFigure1wouldneedonlymemorysufficienttohold15-33%oftheaddressspacetoachieve100%oftheperformance.

Table1

set size length fraction

1 50 180 25%

2 65 220 33%

3 30 220 15%

4 55 50 28%

5 35 180 18%

Page 11: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 11

Noticethatthepunctuated-stabilityideaoflocalityeasilytranslatesintomeasurementsoflocalitysetsizes,phaselengths,andtransitionprobabilitiesonpagereferencemaps,whereasthevaguetermstemporalandspatiallocalitydonotsuggestmeasurementprotocols.

Pagereferencemaps,whichwereausefultoolintheearlystudiesofvirtualmemoryperformance,continuetobeusefultodayinstudiesofcacheperformance.Theyarestrikingevidenceoflocalityandvisuallyconveyconsiderableinsightintothedynamicsofexecutingprograms.

Likeotherscientificdiscoveries,theLocalityPrinciplebeganasahypothesisandwasacceptedasscientificfactonlyaftermanyvalidations.Theoriginalnotionof“slowdrift”temporallocalitygavewaytoamoresophisticatednotionof“punctuatedstability”characterizedbyphasesandtransitions.Workingsetisareal-timedetectorofthisbehavior.Itallowsavirtualmemorymanagertodynamicallyadjustmemoryallocationsandmaximizesystemthroughput.

MemorySpace-TimeLawOperatingsystemdesignershavealwaysbeenconcernedwithperformance.

Theywouldliketostateperformanceguaranteesforthroughputandresponsetimethatwillholdoverawiderangeofworkloadsandnumbersofsimultaneoususers.Oneofthemostchallengingquestionswashowtoestablishaconnectionbetweenmemorymanagementandthekeyperformancemeasureofthroughput.Whencanwesaythatoptimizingmemorymanagementoptimizessystemthroughput?

Whenjobsuselessspace,wecangetmoreofthemintomemoryandincreasethroughputbecauseoftheparallelism.Andwhentheycompleteinlesstime,wecanprocessmoreofthemovertime.Thisiswhymanyofushaveanintuitionthatthesmallerthespace-timefootprintofprograms,thelargeristhesystemthroughput.

Thespacetimefootprintisthenumberofpage-secondsofmemoryusageofanexecutingjob.Apage-secondisaunitofrentforusingmemory.Itisanalogoustotheideaofchargingforofficespacebythenumberofsquare-feetrented,orcharginglaboronaprojectbyperson-hours.Whenaprocessloads1pageintomainmemoryforSseconds,theprocessaddsSpage-secondstoitsmemorybill.LoadingSpagesfor1secondalsoaddsSpage-secondstothebill.

JeffBuzenin1976discoveredthe“memoryspace-timelaw”(MSTLaw),anexactformulathatlinksspace-timeandsystemthroughput[buz76].Itsays“averagetotalmemoryusedbyalljobs=systemthroughput´averagespace-timefootprintperjob”.Insymbols,

𝑀 = 𝑋𝑌

Theproofissimple.MeasurethesysteminanintervaloflengthT.LetZdenotethetotalspace-timeusedbyallthejobsinthatinterval.(Forfixedmemory,Z=MT.)Thethroughputisthenumberofjobscompletedinthatinterval(C)dividedbythe

Page 12: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 12

lengthoftheinterval:X=C/T.Themeanspace-timefootprintofajobisY=Z/C.TheproductofXandYisobviouslyM=Z/T.7

TheMSTLawisgeneralandappliestoanysystemornetworkthathasmemoryusageandthroughput.Somepeoplefindithardtobelievethattherelationbetweenmemoryusageandthroughputisthissimple.Itreallyis.

Theobviousconclusionisthatapolicythatminimizesspace-timewillmaximizethroughput.Tomakeiteasytotellwhenthisishappening,wewill,intheanalyticstofollow,usespace-timetomeasurethememoryusageofworkingsetsandothermemorypolicies.Ifneeded,wecaneasilyconvertspace-timemeasurestotimeaveragesbydividingspace-timebythelengthoftimeofthemeasurement.

Thereisacomplication.Thespace-timemeasuresofmemorypoliciesaredefinedinvirtualtime.Thespace-timeneededfortheMSTLawisdefinedinrealtime.Weneedawaytoconvertvirtualspace-timetorealspace-time.

Themostaccuratewaytodothisiswiththehelpofaqueueingnetworkmodel[den78b].Themodelwouldaccountforalldelaysbeyondvirtualtime,suchasinput-outputandpagetransfers.Settingupsuchamodelisbeyondthescopeofthistutorial.

However,agoodapproximationcanbemadesimplybyaugmentingvirtualspace-timewiththespace-timeaccumulatedwhileservicingpagefaults.Todothis,werequirethesequantities:

D=pagefaultdelay,typically106memoryaccesses

N=lengthofvirtualtimeaprocessisexecutedS=virtualspace-timeaccumulatedbytheprocessduringexecution

C=numberofpagefaultsaccumulatedduringexecutionm=missrate,C/N

Thespace-timecostofonepagefaultisthemeanspaceS/NtimesthedelayD.Forallpagefaultsitis(S/N)(D)(C)=(S)(D)(C/N)=SDm.Thereforethetotalrealspace-timeisestimatedasS+SDm,orinthenotationoftheMSRLaw,

𝑌 = 𝑆(1 + 𝐷𝑚)

Thustherealspace-timeisapproximatelythevirtualspacetimedilatedbythefactor1+Dm.Forexample,ifthemissrateis10-4(1faultin104memoryaccesses),thedilationfactoris101.

PoliciessuchasLRUorFIFOworkwithafixednumberkofpages;theirtotalvirtualspace-timeiskN.Thus,Yissmalleronlywhenmissmallerandthepolicywithlowestmissratewillhavehighestsystemthroughput.Butthereismoretothe

7Thememoryspace-timelawcanbeseenasaninstanceofLittle’slaw.Little’slawsaysthatthemeannumberinsystemistheproductofthethroughputandthesystemresponsetime.WecaninterpretMasthemeannumberofpagesinthesystem;Xasthethroughput;andYastheaggregateholdingtimeaccumulatedbyajobforallitspages.

Page 13: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 13

story.Oncethemissrateasafunctionofkisknown,thereisavalueofkthatminimizestheexpressionforY[den80].Evenfortheoptimalpolicy,thereisabestmemorysizethatminimizesthespace-time.

Whenwediscussthevariable-spacepolicyworkingset,wewillseethatminimizingvirtualspace-timemaximizessystemthroughput.

TwoContrastingViewsofMemoryManagementThefamiliarwayoflookingatthememoryseesasingleCPU(representingajob

inexecution)accessingpagesinafixed-sizememory.NootherCPUormemoryregionisvisible.ThisiscalledtheFixedMemoryView(FMV).(SeeFigure2.)Inthisview,thejobgetsafixedspaceinmainmemoryandisunaffectedbythepresenceofotherjobsinthememory.Theperformanceofareplacementpolicyisthencompletelydeterminedbyitstotalfaultcountfunction.

Figure2.TheFixedMemoryView(FMV)seesthesystemasasingleCPUaccessingasingleRAM(mainmemory)offixedsizekpages.AddressmappinghitsaretranslateddirectlytoRAMaddresses.AddressmappingmissestriggerpagefaultsthatcauseupanddownpagemovesbetweenRAMandDISK(secondarymemory).TheobjectiveistoefficientlycomputethefaultfunctionF(k),whichcountsthenumberofpagefaultswhenmemorysizeiskpages,andthenselectreplacementpoliciesthatminimizeF(k).Inasystemwithmultiprogramming,theRAMvisibleinthisviewisafixedregionofthefullRAM.

Page 14: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 14

Extensiveexperimentalstudiesandexperiencewithoperatingsystemsleddesignerstofavortwobasicreplacementpolicies.FIFO(firstinfirstout)treatsthememoryspaceofkpagesasaFIFOqueueandignorespageusage.Itsprimaryattractionissimplicityandalmostnegligibleoverhead.LRU(leastrecentlyused)treatsthememoryasacontainerofthekmostrecentlyusedpages;atapagefault,itreplacesthepageinmemorythathasnotbeenusedforthelongesttime.AlthoughLRUgeneratesfewerpagefaultsthanFIFO,LRUhasahighimplementationoverhead.ManyanalystsbelievedthatthesavingsofLRUarecancelledbyitsoverhead.

AnelegantcompromisebetweenFIFOandLRUiscalledCLOCK.IttreatstheFIFOqueueasacircularlistofsizekwithascanningpointer;thepagenamesareanalogoustothenumeralsonrimofaclockandthepointertotheclock’shand.Atapagefault,theoperatingsystemmovesthehandalongthelist,skippingoverthosewithusebitson(andresettingthem).Whenitfindsapagewithusebitoff,itselectsthatpageforreplacement.CLOCKhasoverheadcomparabletoFIFOandperformancecomparabletoLRU.CLOCKiscommonlyusedinoperatingsystems.(Inearlyvirtualmemorysystems,CLOCKwascalledFINUFO,forfirst-in-not-used-first-out[den68a].)

In1970,RichardMattsonandhisIBMcolleaguesdiscoveredahighlyefficientwaytorepresentalargeclassofpagingalgorithmsandcomputetheirfaultfunctions.Theycalledtheirtheory“stackalgorithms”[mat70].Astackalgorithmisapagingpolicywhosememorycontentscanberepresentedwithasinglelistofallthejob’spagescalledthestack,suchthatthecontentsofk-pagememoryarethefirstkelementsofthestack.LRU’sstacklistsallthejob’spagesfrommosttoleastrecentlyused;thecontentsofthek-pageLRUmemoryarethefirstkpagesinthestack.Ateachpagereference,LRU’sstackisupdatedbymovingthereferencedpagetothetopandpushingtheinterveningpagesdownoneposition.Forageneralstackalgorithm,muchthesamehappens:thereferencedpagemovestothetopandtheinterveningpagesarerearrangeddownwardaccordingtotheirrelativeprioritiesassignedbythepagingpolicy.Thepositionofthenextreferenceinthestackiscalledstackdistance.ThefaultfunctionF(k)canbecomputedsimplyasthenumberofstackdistanceslargerthank.Thiseleganttheoryisfrequentlydiscussedinoperatingsystemstextbooks.

Thestacktheorydidnotanswertwoimportantquestions:Whatistheoptimalamountofmemorytoallocate?HowissystemthroughputrelatedtothefaultfunctionF(k)?Thesequestionswereansweredbyothermodelingtechniques;detailsarein[den80].

Unfortunately,theFMVanditsanalytictheoryisnotveryhelpfulforrealoperatingsystems,whichallowmultiplejobstoresideinRAMatthesametime.TheFMVgivesnoinsightintoimportantquestionsincluding:

• HowtopartitiontheRAMamongthejobs?

Page 15: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 15

• Howmuchspacetogiveeachjob?

• Howtomanagevariationsinthespaceallocations?

• Howtohandlepagereplacementtomaximizesystemthroughput?

• Howtomanageinteractionsamongjobssuchasapagefaultinonestealingapagefromanother?

• Howtopreventthrashing,acollapseofsystemthroughputwhentoomanyjobsareloadedintoRAM?

Weobviouslyneedadifferentwayofthinkingaboutthecommonsituationofmultiprogramming.Itiscalledthesharedmemoryview(SMV).(SeeFigure3.)

Figure3.Thesharedmemoryview(SMV)seesthesystemasasetofCPUs(oneforeachjob)sharingtheM-pageRAM.ThememoryispartitionedamongNjobs,eachgettingitsownsetofpageframesdisjointfromalltheothers.Apagefaultmaytriggerthepagereplacementalgorithmtostealapagefromanotherjob,therebyincreasingthespaceoccupiedbythefaulteranddecreasingthespaceoccupiedbythevictim.

Page 16: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 16

Becauseofallthefactorsinvolved,theoptimalmanagementofsharedmemorycannotbeinferredfromfixedmemory.Obviousextensionsofthefixedmemoryviewleadtopoorperformanceandinstability.Forexample,manyoperatingsystemssimplyextendedthefixedmemoryviewtoincludeallofRAM.TheresultingglobalLRUwouldreplacetheoldestunusedpageinRAMregardlessofwhichjobitbelongsto.Unfortunately,thegloballistofpagesdoesnotreflectactualrecencyofusewithinjobs.Itisorderedmainlybytheround-robinschedulerofthereadylist:thepagesatoptheglobalLRUstackbelongtothejobthatmostrecentlyreceivedatimeslice.Ifpagingactivityishigh,bythetimeajobcyclesbacktothefrontofthereadylistsomeofitspageshavebeenremoved.Thecascadingeffectcauseshighpagingineveryjob,pushingwholesystemintoastateof“pagingtodeath”,inwhicheveryjobspendsmostofitstimequeuedattheDISKandCPUthroughputcollapses.Thisconditioniscalledthrashing.(SeeFigure4.)

Figure4.ThrashingisthecollapseofsystemthroughputwhentoomanyjobsareloadedintoRAMatonce.Itisachaoticconditionwhoseonsetcannotbepredictedaccurately–thetriggerthresholdN*isunpredictableandverysensitive.Loadingoneadditionaljobcantriggerthrashing.Thiscanhappeninanysharedmemory,suchascache,notjustRAM.

Page 17: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 17

Weneedadifferentwayofthinkingaboutmanagingsharedmemory.Theworkingsetmodelprovidesthebasis.Weneedtoabandonthefixedmemoryviewandinsteadusetheworkingsetinterpretationofsharedmemoryviewtoseewhatisgoingon.Theremainderofthispaperwillapproachthisintwostages.First,wewilldefineanalyticmethodsforcomputingworkingsetstatistics,suchasmissrate,fromagivenaddresstrace.Thesemethodsdonotdependonlocalityoranyotherassumptionsaboutprogrambehavior.Second,wewillshowthattheworkingsetpolicyisclosetooptimalforprogramswhoseaddresstracesconformtotheprincipleoflocality.Inthatcase,theworkingsetspace-timeisveryclosetotheoptimalspace-time.

WorkingSetsOriginallytheterm“workingset”wasaninformaltermmeaningthesmallest

setofajob’spagesthatneededtobeloadedinmainmemoryforefficientexecution.Theworkingsetmodelgaveaprecisedefinitionintermsofthepage-referencebehaviorofanexecutingjob.Theexecutingjobitselfinformsusofwhatmemoryitneeds,withoutregardforexternalfactorssuchasinterrupts.Thus,workingsetsareameasureoftheintrinsic,dynamically-varyingdemandofacomputationformemory[den68a,den72].

Theformaldefinitionisthattheworkingsetataparticulartimetisthepagesreferencedinabackward-lookingwindowofsizeTincludingtimet.ItisdenotedW(t,T).Thesize,w(t,T),isthenumberofdistinctpagesinW(t,T);thesizeisalwaysatleast1andneverlargerthanT.SizemaybeconsiderablysmallerthanTduetorepeatedreferencestothesamepageswiththewindow.Figure5illustrates.

Figure5.Thisexampleshowstheworkingsetofasimplejobattwodistincttimes.Weassumetimeisdiscreteandeachclocktickrepresentsasinglepagereference.Theseriesofnumbersabovethetimeline,calledanaddresstrace,isthesequenceofpagenumbersaccessedbyajob.Inthiscase,thereareaccessesattimest=1,2,…,15.ThewindowsizeisT=4.Thebackwardwindowatt=8contains4referencestothreedistinctpages;itssizeis3.Thebackwardwindowatt=15contains4referencestofourdistinctpages;itssizeis4.Weimaginethewindowslidingalongthetimeline,givingusadynamicallyvaryingseriesofworkingsets.

Page 18: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 18

Theworkingsetwindowcanbethoughtofasalease.AleaseisaguaranteetoholdapageinRAMforminimalperiodoftimeT.Apage’sleasecanbestoredinatimerregisterassociatedwithapageframe.WhenapageisloadedintoRAMorreusedwhileinRAM,itsleaseisresettoT.Itticksdownto0afterTtimeunitsofnon-use.Whentheleaserunsout,thepageisevictedfromtheworkingset.Theleasedefinitionisequivalenttothewindowdefinition[li19].

TheWorkingSetPolicyWorkingsetmemorymanagementpartitionsthemainmemoryinto

dynamicallychangingregions,onefortheworkingsetofeachjobusingthememory.Thebasicideaistomaintaineachjob’sworkingsetandnotallowanyotherprocesstostealpagesfromit[den68a].(SeeFigure6.)Thisisimplementedbymaintainingafreespaceinmemory,usuallysmallerthananyoftheworkingsets.Whenajobreferencesapagenotinitsworkingset(apagemissorfault),theoperatingsystemtransfersapagefromfreespacetothejob,increasingitsworkingsetsizebyone.Inparallel,whenapagetimesoutfromitsjob’swindowT,theoperatingsystemtransfersitbacktothefreespace,decreasingtheworkingsetbyone.Inthisregimepagefaultsandevictionsneednotcoincideastheydowhenmemoryallocationisfixed.

Figure6.Theboxrepresentsmemory,aRAMorcache,inusebyNexecutingjobs.Eachjob’sworkingsetisloadedinmemory.ThememorynotoccupiedbyworkingsetsiscalledFREE.Whenajobencountersapagemiss,thememoryslottoholdthenewpageistransferredfromFREEtothejob’sworkingset,thusenlargingthatworkingsetbyonepage.(IfFREEisempty,theincomingpagereplacestheLRUpageoftheworkingset.)WhenaworkingsetpageleaseTtimesout,thepageisevictedfromtheworkingsetandthe

Page 19: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 19

emptymemoryslotreturnedtoFREE,thusdecreasingthatworkingsetbyonepage.Whenajobquits,allitspagesareerasedandtheirmemoryslotsreturnedtoFREE.Aschedulermaintainsaqueueofjobswantingtousememory,admittingthenextoneonlyifFREEislargeenoughtoholditsworkingset.Thispolicypreventsanyjobfromstealingapagefromanotherduringapagefault,therebyprotectingthesystemfromthrashing.

AnimportantquestionishowtochoosethewindowsizeT?Whenwereturnto

thisquestionlater,wewillseethatwecanselectasingleglobalvalueofthewindowTforwhichworkingsetmanagementdeliversnear-optimalsystemthroughput.

Moreover,becausetheworkingsetpolicypreventsprocessesfromstealingpagesfromeachother,itisimmunetothrashing.

Theseaspects–simplicityofworkingsetmemorymanagement,optimalityofthroughput,andresistancetothrashing–makeworkingsetmemorymanagementtheidealofoperatingsystemsandcaches[den80].

Workingsetanalytics,discussednext,showsushowtoefficientlymeasureworkingsetstatisticstodeterminethememorycapacityofasystemandthethroughputlikelytobeobservedunderaworkingsetpolicy.Allthedataneededfortheworkingsetstatisticscanbemeasuredinasinglepassofanaddresstraceandcapturedasahistogramofreuseintervals.Eachstatisticisasimplelinearcomputationfromthosedata.Theanalyticsdonotdependonanylocalityorstochasticassumptionsabouthowjobsrefertotheirpages.

Traces,Reuses,ColdandWarmStartsInthissectionwewilldefinethebasicterminologyandnotationused

throughoutworkingsetanalytics.

Allthestatisticsaremeasuredindiscretevirtualtime,whichisthetimeofprogramexecution,onetickpermemoryaccess.Delaysforinput,output,ortime-slicingareignored.Measuringinvirtualtimeallowsustoseetheinherentmemorydemandofaprogramwithoutdistortionbyrandominterrupts.Whenneeded,wecanconvertthesemeasuresbackintorealtimebyinsertingdelayswheninterruptsoccur.

Anaddresstraceisarecordingofthesequenceofpagenumbersreferenced(accessed)byajobattimest=1,2,…,N;r(t)=imeansthattheprocessaccessed(used)pageiattimet.OSperformanceanalystsuseaddresstracesasinputstosimulatorsofmemorymanagementpoliciesintheOSorthecache.TracescanalsobeusedtobuildpagereferencemapssuchasFigure1.Theintervalsbetweensuccessivereferencestoapagearecalledreuseintervals.Figure7illustrates.

Page 20: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 20

Figure7.TheaddresstraceofFigure5isshownagainintoprow.Itspans15timeunits(N=15)and5pages(M=5).Thereuseintervalsappearjustbeloweachreference;areuseintervalisthetimesincepriorreference.Themark“x”indicatesfirstreferences,whichhavenoprioruse.Reuseintervalsareimportantbecausetheyindicatewhetherapageisintheworkingset.Forexample,attimet=6,page2isreusedwithinterval4;aworkingsetwithwindows3orlesswillgenerateapagemissatthattime.

Themisscount,mc(T),isthenumberofmisses–referencesnotintheworkingset.Missescausepagefaults.Weoccasionallyspeakofthe“missrate”,whichissimplythemisscountdividedbyN.Itissometimessuggestedthat,topreventthelargespeedgapof106betweenmainandsecondarymemoryfromruiningperformance,weshouldchooseTlargeenough,ifpossible,tokeepmissratesnohigherthan10-6.Aswewillsee,however,thisisnotthebestwayofchoosingT.

Thefirstreferencestopagearedifferentfromtheothersbecausetheyhavenoreuseintervals.Whethertheycauseinitialpagefaultsdependsonhowthememoryisinitialized.Therearetwopossibleinitializations.Thecoldstart(ornormal)initializationissimplyanemptymemory;thefirstreferencescausepagefaults.ThewarmstartinitializationcontainsallMpages;thefirstreferencescausenopagefaults.Theworkingsetcontentsthroughouttheaddresstracearethesameforcoldandwarmstart.Thedifferenceisthatwithcoldstarteveryfirstreferenceisamiss;withwarmstarteveryfirstreferenceisahit.Themisscountmc(T)isusedforcoldstartandanewcountmw(T)isusedforwarmstart;thedifferenceis

𝑚𝑐(𝑇) − 𝑚𝑤(𝑇) = 𝑀Thedistinctionbetweencoldandwarmstartsisusedinoperatingsystemsfor

thegeneralnotionofwhetherresumptionofasuspendedjoboccurswithemptymemoryorwiththepreviouscontentsofmemory.Coldstartsareslowbecauseallthedatamustbereloadedfromthedisksorothersources;warmstartsarefastbutrequirealotofinitialmemory.Theaddresstracemodelimplicitlyassumeswarmre-startsofprocessessuspendedbyinterrupts–thememorycontentsattimet+1arejustwhattheywereaftertimet,whetherornotaninterruptionoccurredbetweentandt+1.

SomeanalystsspeculatedthatvirtualmemoryperformancewouldimproveiftheOSachieveswarmstartbypreloadingpages.Ourformulas,however,showthat

Page 21: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 21

inthelongrunthereisverylittledifferencebetweeninitialcoldandwarmstartsofaddresstraces.

MeanWorkingSetSizeThedefinitionofmeanworkingsetsizeoveranaddresstraceis

𝑠(𝑇) = 1𝑁7𝑤(𝑡, 𝑇)

$

%&!

=𝑠𝑡(𝑇)𝑁

wherest(T)isthespace-timeoccupiedbyworkingsetsinthetrace;oneunitofspace-timemeansthatonepage(space)wasinmemoryforonetimeunit(time).Weseethat[den68a,den72,den78a]

• Whent<T,theT-windowextendsbackwardbeforethestartofthetrace;onlytheportionofthewindowcontainedinthetracecountstowardst(T).Thus,for1≤t<T,theworkingsetsizecanbewrittenw(t,t).

• Fort≥T,thereareN-T+1workingsetswithwindowTfortheremainderoftheaddresstrace.

Insomeoftheanalyticsdiscussedbelow,thisdistinctionisimportant,andwewillseparatetheoriginalsumintotwopartsfort<Tandt≥T.Wewillworkwithfourspace-timeandcountingmeasures:

st(T)=space-timeaccumulatedbyworkingsetsofwindowT mc(T)=cold-startmisscount,thenumberofmisseswithwindowT

mw(T)=warm-startmisscount,sameasmc(T)excludingMfirstreferences

mwh(T)=warm-start-hot-finishmisscount(definedbelow)EachmeasurecanbeconvertedtoatimeaveragebydividingbyN,thetrace

length.Forexample,theclassicalmeanworkingsetsizeandmissrateare:

𝑠(𝑇) = 𝑠𝑡(𝑇)𝑁

𝑚(𝑇) =𝑚𝑐(𝑇)𝑁

ColumnSums,RowSums,andRunsWecandepictdynamicmemoryusewithabit-matrixwithonecolumnforeach

referencefort=1,…,Nandonerowforeachpagei=1,…,M.Position(i,t)is1ifpageiisinworkingsetW(t,T)and0ifiisnotinW(t,T).Letuscallthismatrixtheworking

Page 22: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 22

setresidencymap.Inthemap,thespace-timeoccupiedbyworkingsets,st(T),isthetotalnumberofpositionscontaining1.Figure8illustrates.

Figure8.Theaddresstracegivenearlierisshownacrossthetop,labelingthecolumns,andthepagesalongtheside,labelingtherows.Eachmapposition(i,t)ismarkedwith1ifthepageiisintheworkingset(T=4)atthattimet,or0otherwise.Thecolumnsrepresentthedifferentworkingsetsandthenumberof1sinacolumnistheworkingsetsize.Thecolumnatt=0indicatesthattheworkingsetisinitiallyempty(coldstart).(Warmstartwouldberepresentedbyacolumnof1s.)Notethatamissoccurswheneverthereisa1inarowimmediatelyprecededbya0.Notealsothatthisisaminiaturepagereferencemapwithsamplinginterval1.

Letcol(t)bethenumberof1’sincolumntofthematrix;noticecol(t)istheworkingsetsizew(t,T).

Letrow(i)bethenumberof1’sinrowi.

Arunisaseriesofconsecutive1sstartingwithamissandendingwitheithera0orendoftrace.Figure9illustrates.ThefinalrunmayterminateattimeNwithnomore0’s.

Page 23: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 23

Figure9.Arunisatimeintervalduringwhichaparticularpageiscontinuouslyintheworkingset.Inthisillustration,eachverticalmarkisareferencetotheparticularpageandthedistancesbetweenverticalmarksarereuseintervals.Arunbeginswithamiss,containszeroormorereuseintervals≤T,andendsinareuseinterval>T.Inthereuseinterval>T,thepageremainsintheworkingsetforT-1timeunitsbeyondthepreviousreference–theoverhang.Aruncanendwithasmalleroverhangafterthelastreferencetothepage;afinalintervaloflengthk≤Thousesanoverhangofk-1,notT-1.Eachnewmissstartsanewrun.

Little’sLawforWorkingSetsLittle’slawisveryusefulinqueueinganalysesbecauseitgivesarelation

betweenthreemeanvalues:themeannumberinasystemistheproductofthemeanholdingtimeinthesystemandthethroughputofthesystem.Foraworkingsetconsideredasanevolvingsystemcontainingpages,thenumberisthemeanworkingsetsize,theholdingtimeisthemeanlengthofarun,andthethroughputisthemissrate:

𝑠(𝑇) = 𝑅(𝑇)𝑚(𝑇)

Thislawiseasytoverifyfromourdefinitions.Becauseeveryrunbeginswithamiss,thetotalnumberofrunsismc(T).Themeanrunlengthis

𝑅(𝑇) =1

𝑚𝑐(𝑇)7𝑟𝑜𝑤(𝑖)'

#&!

Then:

𝑠(𝑇) = 1𝑁7𝑐𝑜𝑙(𝑡) =

1𝑁

$

%&!

7𝑟𝑜𝑤(𝑖) = 7𝑟𝑜𝑤(𝑖)𝑚𝑐(𝑇)

'

#&!

'

#&!

𝑚𝑐(𝑇)𝑁 = 𝑅(𝑇)𝑚(𝑇)

IntheexampleofFigure8,thetotalspace-timeis48andnumberofrunsis7;thuss(4)=48/15,m(4)=7/15,andR(4)=48/7.

Little’slawalsoquantifiestheamountofoverhang(seeFigure9).Foraverylongaddresstrace,everyrunterminateswithareuseinterval>TwithoverhangT-1.

Page 24: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 24

Takethe“system”tobeadelaylineofT-1timeunitsrepresentingtheoverhang.Thethroughputism(T).Theresponse(holding)timeofapageinoverhangisT-1.Thereforethemeannumberofpagesinoverhangistheproduct(T-1)m(T).(ThisargumentisnotexactbecausesomeoverhangsattheendofthetraceareshorterthanT-1;wewillshowshortlyhowtocorrectforthis.)ThemeanoverhangisusefultoassessthedistanceWSisfromoptimalbehavior:ifwecouldeliminatealloverhangs,theworkingsetwouldbeoptimal.Wewillprovethisshortly.

WorkingSetRecursionsInworkingsetanalytics,recursionsaresimpleformulasthatrelateameasure

atwindowsizeT-1tothemeasureatwindowsizeT.Wewillderiverecursionsformisscountandworkingsetspace-time.Theserecursionsenableustocalculatemissrateandworkingsetsizeiterativelyinlineartime–thatis,O(N)forNtheaddresstracelength.

ThealgorithmformissrateismuchfasterthanitscounterpartintheFixedMemoryView.ThebestalgorithmintheFixedMemoryViewsimulatesthestackinordertogetthestackdistances,requiringtimeO(MN).Thisdifferencecanbesignificant.Forexample,a32-bitaddressspace(232)composedof4096-bytepages(212)hasM=220(106)distinctpages;computingfixed-memoryfaultcountwouldthereforetakeamilliontimeslongerthantheworkingsetmisscount.

Inasinglepassofthetrace,wecanaccumulateahistogramofreusecounts,𝑐(𝑘) = 𝑛𝑢𝑚𝑏𝑒𝑟𝑜𝑓𝑟𝑒𝑢𝑠𝑒𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙𝑠𝑜𝑓𝑙𝑒𝑛𝑔𝑡ℎ𝑘

fork=1,…,N.Wewillalsoincludeoneadditionalcounterc(x)thatcountsthenumberoffirstreferences.Thesecounterscanallbeupdatedinlineartime.8

Bydefinition,thereisnoreuseintervalprecedingthefirstreferencetoapage.ThenextreferenceisamissifthereuseintervallengthisgreaterthanT.Inotherwords,forwarmstart,

𝑚𝑤(𝑇) = 7 𝑐(𝑘)()*

andthusforcoldstart𝑚𝑐(𝑇) = 𝑀 +𝑚𝑤(𝑇)

Themisscountsatisfiesthesimplerecursion

𝑚𝑐(𝑇 + 1) = 𝑚𝑐(𝑇) − 𝑐(𝑇)withtheinitialconditionmc(0)=N.

8Thiscanbedonebymaintainingtimestamps,last(i),forthelastreferencetopagei.Atareferencetopageiattimet,thereuseintervalisk=t-last(i).Afterthelastreference,theendintervalisk=N+1-last(i).Adding1toc(k)foreachoftheseeventsleavescorrectedvaluesinthecounters.Thisisthesameprocedureasinthe1978paperwithSlutz[den78a].

Page 25: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 25

Nowweturntoarecursionfortheworkingsetsize.Todothis,weexploitaninclusionproperty:therunsforwindowT+1includethoseforwindowT.

WhathappenswhenweincreaseTtoT+1?Atfirstapproximation,everyrunisextendedbyjust1unit:allthe“reuse≤T”intervalsarealso“reuse≤T+1”intervalsandonlythefinal“reuse>T”hasroomforexpansion.Itsexpansionisoneunitofspace-time.Becausethetotalnumberofrunsismc(T),thetotalnumberof1’saddedtothereferencemapismc(T).Thus

𝑠𝑡(𝑇 + 1) = 𝑠𝑡(𝑇) + 𝑚𝑐(𝑇)

Thisrecursionisexactforinfiniteaddresstraces[den68a,den72],butcontainsanerrorforfinitetraces[den78a].Theerrorsoccurintheendintervalsofpagesasfollows.ThereareMendintervals,oneforeachpage.Anendintervalisthetimefromthelastreferencetoapageuntiltheendofthetrace.Noticethatthisisthesameasifwepretendthereisanother,phantomreuseofpageiattimeN+1.Noticealsothatanendintervaloflength≤TwillnotexpandwhenTincreasestoT+1.Tocorrectforthis,weneedtodeductfrommc(T)thenumberofend-intervalsoflength≤T.Wecandefineanendfactor

𝑒(𝑇) = 𝑛𝑢𝑚𝑏𝑒𝑟𝑜𝑓𝑒𝑛𝑑𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙𝑠 ≤ 𝑇andthentheaccuraterecursionis

𝑠𝑡(𝑇 + 1) = 𝑠𝑡(𝑇) + 𝑚𝑐(𝑇) − 𝑒(𝑇)Wecanexpressthiswiththemeans

𝑠(𝑇 + 1) = 𝑠(𝑇) + 𝑚(𝑇) −𝑒(𝑇)𝑁

Becausee(T)≤M,theendfactorvanishesasNbecomeslarge.9Wecansimplifythisbylookingcloselyatthewaytheendintervalscontribute

toe(T).Thereisonlyoneendintervalforeachpagei.Definetheend-counterec(k)=1ifapagehasend-intervaloflengthkand0otherwise.Thenthesumofallthee(k)isM.Thelasttwotermsinthest-equationabovecanbereduced:

𝑚𝑐(𝑇) − 𝑒(𝑇) = 𝑀 +7𝑐(𝑘) − 7 𝑒𝑐(𝑘)!+(+*()*

= 𝑀 +7𝑐(𝑘) −()*

J𝑀 −7𝑒𝑐(𝑘)()*

K

= 7(𝑐(𝑘) + 𝑒𝑐(𝑘))()*

≜ 𝑚𝑤ℎ(𝑇)

9Itisinterestingtonotethate(T)=w(N+1,T),thenumberofpagesinaworkingsetmeasuredattheendofthetrace.ThereasonisthatapageisinthatworkingsetifandonlyifitsfinalreferenceiswithinthelastTwindowofthetrace.

Page 26: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 26

Inotherwords,thetermmc(T)-e(T)=mwh(T)iscomputedbyaddingtheend-correctioncountstothereusecountsc(k).Wecallthemisscountwithwarmstartandcorrectedendintervalsthe“warmstarthotfinish”misscount.Itisawarmstartbecauseinitialpagefaultsarenotcounted.Itis“hotfinish”becausetheendcorrectionsareappliedattheveryendbypretendingthatallMpagesaresimultaneouslyreferencedattimeN+1.Thiscanbesummarizedastheworkingsetsizerecursion:

WorkingSetSizeRecursion

𝑠𝑡(𝑇 + 1) = 𝑠𝑡(𝑇) + 𝑚𝑤ℎ(𝑇)wherest(T)isthespace-timeaccumulatedbyworkingsetsofwindowTandmwh(T)isthewarmstarthotfinishmisscount.Theinitialconditionsarest(0)=0andmwh(0)=N.

Noticethatforverylongaddresstraces(largeN),theMendcorrectionsareinsignificantcomparedtothetotalofallthecounters.ForlargeN,mwh(T)convergestomc(T).

Therecursionwereported1972wasmathematicallythesame,butitdependedontheassumptionthatthereferencestringisarandom(stochastic)processthatentersalong-termsteadystate[den72].In1978weremovedthestochasticassumptionandfoundtherecursionworksforfiniteaddresstracesifwemakeendcorrectionstothecounters.Theworkingsetrecursionstatedherehasamuchsimplerderivation.

Theworkingsetrecursioncanalsoberun“backwards”todeducethemissrategiventhemeanworkingsetsize.Ifwehadadirectmeasurementofthemeanworkingsetsize,wecouldfindthemisscountsbytakingthedifferencesmc(T)=st(T+1)-st(T).[xia13]

Page 27: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 27

Forcompleteness,wesummarizethemissraterecursions:

SummaryofMissRateRecursionsMissratesetrecursionsenablethelinear-timecalculationofmisscountsatwindowsizeT+1intermsofthoseforwindowsizeT.Formisscounts:

𝑚𝑐(𝑇 + 1) = 𝑚𝑐(𝑇) − 𝑐(𝑇)Wherec(T)isthecountofreusedintervalsoflengthT,withinitialconditionsc(0)=0andmc(0)=N.Forwarm-start-hot-finish

𝑚𝑤ℎ(𝑇 + 1) = 𝑚𝑤ℎ(𝑇) − 𝑐𝑐(𝑇)

wherecc(T)=c(T)+ec(T)isthecountcorrectedforendintervals.Theinitialconditionsarecc(0)=0andmwh(0)=N.

ThemisscountmeasuresmwhandmcbothcontainNcounts;mcincludesMfirstreferencesbutnoendcorrections,andmwhincludesnofirstreferencesbuthasMendcorrections.AsNgetslarge,bothratesmc(T)/Nandmwh(T)/Nconvergetom(T).

ExampleFigure10showsthesequenceofreuseintervalsandendintervalsforthe

exampleaddresstrace.Wetallythesedataalongwiththemisscountsandworkingsetspace-timeinthetablebelow.Therowfork=0givestheinitialconditions.

Figure10.ThetoprowistheaddresstracefromFigure7.Thereuseintervalsareshowninthemiddlerow,whereeach“x”marksafirstreference.Theendintervalsareshowninthebottomrow,whereeach“x”marksanon-finalreference.

Page 28: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 28

Table2

k c(k) ec(k) mc(k) mw(k) mwh(k) st(k)

0 0 0 15 10 15 0

1 1 1 14 9 13 15

2 1 1 13 8 11 28

3 1 1 12 7 9 39

4 5 1 7 2 3 48

5 1 0 6 1 2 51

6 0 1 6 1 1 53

7 0 0 6 1 1 54

8 0 0 6 1 1 55

9 0 0 6 1 1 56

10 0 0 6 1 1 57

11 1 0 5 0 0 58

12 0 0 5 0 0 58

13 0 0 5 0 0 58

14 0 0 5 0 0 58

x 5

ThesedataareplottedinFigure11andcomparedwithLRUandMIN.ThegraphshowsaregionatthesmallermemoryallocationswhereLRUisbetterthanWS.Asimilarpatternissometimesobservedinactualprograms[wan15,wir14].

Page 29: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 29

Figure11.Thisgraphcomparesthreefaultcurves.Afaultcurveplotsthenumberoffaults(verticalaxis)versusmemorydemand(horizontalaxis).ThesolidcurveistheWSexampleofFigure8.LRUandMINarealsoshown.Forafixed-spacepolicy,thememorydemandinspace-timeistheproductofthememorysizeandaddresstracelength;heretheallowablememorysizes1,2,3,4,5forLRUandMINcorrespondtospace-times15,30,45,60,75.Incontrast,WSisvariablespaceandisdefinedatpointscorrespondingtonon-integeraveragememorysizes.Forthisaddresstrace,WSismostlybetterthanLRUandmostlyworsethanMIN,althoughatthelargestwindowsize,WSisslightlybetterthanMIN.Thisisnotananomaly,butrathertheconsequenceofworkingsetsbeingvariableinsize.

Page 30: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 30

OptimalMemoryPolicyNowwewillexaminetheoptimalpoliciesformanagingmemory.Theoptimal

policyforFixedMemoryisMIN.ItwasdefinedbyLesBeladyin1966[bel66].MIN’sprinciple,invokedatapagefault,is“replacethepagethatwillnotbereusedforthelongesttimeinthefuture.”Thispolicyisunrealizablebecausetheoperatingsystemcannotknowthefuture.However,itsfaultcountcanbecomputedinasinglepassofanaddresstracewithaboutthesameoverheadasforLRU:orderO(NM)[matt70].

TheoptimalpolicyforSharedMemory,inwhichthespaceallocatedtoajobcanvary,isVMIN.ItwasdefinedbyBartonPrieveandRobertFabryin1976[pre76].VMIN’sprinciple,invokedaftereachreference,is“retainthecurrentreferenceinmemoryifitsforwardreuseintervalis≤T,otherwisedeleteitimmediately.”Thispolicyisalsounrealizable.However,itsfaultfunctionandmeansizecanbecalculatedrapidlyasforWS:orderO(N).

Itisinterestingthatdatabasedesignersdiscoveredthesameoptimizingprincipleinthe1970s[gra85].Hereistheirargument.Supposewewanttodecidewhentokeepapageofdatainmainmemoryversusthemuchslowerharddisk.Justafterthepageisused,welookintothefuturetoseeexactlywhenitwillbeusedagain.Thenwedoasimplecalculationtocomparetherentalcostofkeepingitinmemoryuntilnextusewiththecostofremovingitimmediatelyfrommemoryandpayingtheswappingcostforitsretrievallater.Thedatabasedesignersfoundthatwithtypicalparametersformemorycostanddiskdelay,adatapageshouldbedeletedifithasnotbeenreusedafterabout5minutes.

VMINusesthewindowofsizeTasthethresholdpointatwhichretaininguntilnextreusecoststhesameasapagefault.Tostatethisprecisely,supposetimet+xisthenextreuseofthepagereferencedattimet;VMINdecideswhethertoretainthatpageornotasfollows:

Ifx>Tthenimmediatelyevictthepagefrommemory;

Ifx≤Tthenkeepthepagecontinuouslyinmemoryuntilnextreuse.VMINexercisesthischoiceateverytimet.Becauseeachandeveryreferenceisevaluatedforminimumcost,thetotalcostmustbeminimumtoo.

VMINisjustlikeWS,butwithaforwardlookingratherthanbackwardlookingwindow[den80].Ifthepageisusedagainintheforwardwindow,VMINretainsitinmemoryuntilnextuse.Ifthepageisnotusedagainintheforwardwindow,VMINimmediatelyremovesitfrommemory.VMIN’spagefaultsequenceisidenticaltoWS.

Whenthereuseintervalxis≤T,WSandVMINretainthepagecontinuouslybetweenthetwosuccessivereferences.Ifx>T,WSretainsthepageforadditionaltimeuntilevictingit,whereasVMevictsitimmediately.Thismeansthatthespace-timedifferencebetweenWSandVMINisdueentirelytotheoverhangsinthereuseintervalslongerthanT.ThisobservationallowsustocalculatetheVMINspace-timevt(T)directlyfromreusecounters.Startwithvt(1)=Nbecausewithwindowsize1

Page 31: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 31

onlythecurrentreferencesareintheVMINmemory.Areuseintervaloflengthk≤Tcontributesanadditionalk-1tothespace-time.Thus,

𝑣𝑡(𝑇) = 𝑁 +7(𝑘 − 1)𝑐(𝑘)*

(&!

Wegetarecursionstraightway,

𝑣𝑡(𝑇 + 1) = 𝑁 +7(𝑘 − 1)𝑐(𝑘) = 𝑁 +7(𝑘 − 1)𝑐(𝑘) + 𝑇𝑐(𝑇 + 1)*

(&!

*,!

(&!

orsimply,𝑣𝑡(𝑇 + 1) = 𝑣𝑡(𝑇) + 𝑇𝑐(𝑇 + 1)

Therefore,aswithWS,theVMINspace-timeandmissratecanbecalculateddirectlyfromthereuseintervalcountersc(k),withoutasimulationofVMIN.10

WecancalculatethedifferenceofWSandVMINspace-timesimplyasthetotaloverhanginreuseintervals>T.Specifically,theoverhangisT-1inanyreuseorendinterval>T;byourpreviouscalculationstherearemwh(T)ofthese.Forallreuseintervalsoflengthk≤T,thereisnooverhang,butendintervalsoflengthk≤Thaveanoverhangofk-1.Thisleadstotherelation

𝑠𝑡(𝑇) = 𝑣𝑡(𝑇) + (𝑇 − 1)𝑚𝑤ℎ(𝑇) +7(𝑘 − 1)𝑒𝑐(𝑘)*

(&!

Figure12illustratesthisforthedataoftheprevioustable.

Thelessonfromallthismathisthatwecanquicklycomputethespace-timeofaVMINpolicyfromthesamereuseintervalstatisticsasforWS.

10Bycomparison,theWSspace-timerecursionhasamiss-ratetermmwh(T),thesumofcounters,insteadofasinglecounterc(T+1).ThatisbecausewhenweincreaseTtoT+1,theWSoverhanggrowsinallthereuseintervals>T,thetotalnumberofwhichisthesumofallthecountersc(k)fork>T.VMINisparsimonious:increasingTtoT+1onlyaddsthespace-timeofreuseintervalsofexactlylengthT+1.

Page 32: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 32

Figure12.ThisgraphplotstheVMINspace-timeandtheWSspace-time.Ifwepickanyvalueofmisscount,wecancomparetheVMINandWSmemorydemands.Forexample,atmisscount8,VMINspace-timeis30page-unitsandWSspace-timeisabout47;theVMINmemorydemandisabout1/3smallerthanWSforthesamepagingrate.NoticethattheVMINcurveisalwaysconvex,whereasWSisnot.Becausetheexampleaddresstraceistooshorttoexhibitlocality,WSandVMINarenotclose.

GoodLocalityMakesWorkingSetNearOptimalWewouldliketoclarifyapointabouttheassumptionsbehindthemathematical

formulasgivenabove.Theonlyassumptionisthatanaddresstracecanberecordedfromacomputation.Therecursionformulasaresimplymathematicalrelationships

Page 33: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 33

formeasurescomputedonaddresstraces.Theworkingsetandfaultmeasuressimplycounteventsintheaddresstrace.Optimalitydefinesthebestthatcanbedoneonagivenaddresstrace.Therearenoassumptionsaboutlocalitybuiltintotheworkingsetandoptimalitymeasures.Workingsetsareunbiasedmeanstomeasurewhetherlocalityispresentornot.

Thepagereferencemapisconstructedfromanaddresstrace.Localityappearsinthedistinctivephase-transitionpatternsofthemap.Theworkingsetwasdevisedtomeasurethelocalitysetsseeninthemap.Itissimplyameasurementtool.Itmakesnoassumptionsaboutlocality.

LocalitycomesinwhenwewanttoclaimthatWSisclosetotheoptimalVMIN.Let’sseehowthatworkswiththereferencemapinFigure1.

ConsideralocalitysetofPpagesanditsassociatedphaseconsistingofLsamplewindowsoflengthT.EachsamplewindowwithinthephasecontributesPTspace-time.VMINandWShavetheidenticalmemorycontentseverywhereinthephaseexceptforthefirstandlastsamplingwindows.Inthefirstwindowofthephase,theybothacquirethePpagesofthenewlocalityatthesamemomentsofpagefaults;butWShassomeadditionalspace-timebecauseitretainspagesfromthepreviousphase.LetfandgbethefractionofPTusedrespectivelybyWSandVMINinthefirstsamplewindowandhbethefractionofPTusedbyVMINinthelastwindow.Thenthespace-timeforWSacrossthewholephaseisfPT+(L-1)PT.ForVMINitisgPT+(L-2)PT+hPT.TheratioofWStoVMINis

𝑠𝑡(𝑇)𝑣𝑡(𝑇) =

𝑓𝑃𝑇 + (𝐿 − 1)𝑃𝑇𝑔𝑃𝑇 + (𝐿 − 2)𝑃𝑇 + ℎ𝑃𝑇 <

𝐿𝐿 − 2

Theinequalityresultsfromreplacingfwithitsupperbound(1)inthenumeratorandgandhwiththeirlowerbounds(0)inthedenominator.

Figure1shows5localitysetsandtheirassociatedphasesforasamplingwindowTofapproximately380Kunits.AmeasurementofthegraphtofindthelocalitysetsizesanddurationsyieldstheTable3,wheresizeisthenumberofpagesinthelocalitysetandlengthisthenumberofsampleintervalsinaphase.

Table3

set size length L/(L-2)

1 50 180 1.01

2 65 220 1.01

3 30 220 1.01

4 55 50 1.04

5 35 180 1.01

Page 34: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 34

ThefinalcolumninthetableaboveshowstheboundingratioforeachofthephasesofFigure1.Forthelongerphases,VMINandWSareabout1%apart.Fortheshortphase,theyareupto4%apart.Amorepreciseanalysiswouldhavenarrowerseparationsthanthisapproximateanalysis.

VMIN’svirtualspacetimeislessthanWS’s.Howdoesthistranslatetorealspace-timeasneededbythememoryspace-timelaw?Inournotationthelawbecomesst(T)(1+Dmc(T)).SinceWSandVMINhaveidenticalpaging,theexpressionforVMINrealspace-timeisvt(T)(1+Dmc(T)).ThusVMIN’sminimumvirtualspacetimetranslatesdirectlytominimumrealtimespace-timeandmaximumthroughput.ThisiswhywhenWSisclosetoVMIN,theirrespectivesystemthroughputsareclose.

Insummary,workingsetanalyticsprovidestoolsformeasuringlocality.Whenlocalityispresent,aworkingsetmemorymanagementpolicywillsetsystemthroughputtowithinafewpercentofoptimal.

ComparisonsMemorypoliciescanbeclassifiedintwodimensions:fixedorvariablespace,

optimalornot.Wehavechosenrepresentativesofeachcombination:

Table4

Fixed space Variable space

Not optimal LRU WS

Optimal MIN VMIN

Theoptimalpoliciesaregenerallynotrealizableinrealtimebecausethey

requireknowledgeofthefuturereferencepatterns.Althoughtheoptimalpoliciesarenotrealizable,theirperformanceiseasytocomputefromarecordedaddresstrace.Therefore,itiseasytomeasurehowfararealizablepolicy(LRUorWS)isfromoptimal.

Fixedspacepoliciesareusedformemoriesoffixedsizesuchascachesorhardpartitionsofmemoryinanoperatingsystem.Fixedspacepoliciesaresubjecttothrashingwhenextendedtosharedmemoryorcache,becausethespaceallocatedtoeveryjobdecreasesasthenumberofjobs.TheWSpolicy(Figure6)isresistanttothrashingbecauseitcannotstealpagesfromotherprocessesandbecausetheschedulerwillnotloadnewprocessesiftheirworkingsetsdonotfitintotheavailablefreespaceofmemory.

Page 35: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 35

Whathappenswhenthesepoliciesareappliedforjobsdisplayingastrongdegreeoflocality?Whenthememoryallocatedbythepolicyistoosmalltoholdthejob’slocalitysets,LRUandWSwouldbecomparable–butwithpoorperformance.Whenmemoryissufficienttoholdlocalitysets,LRUandMINwouldbecomparable,andWSbetterthanbothbecausethefixed-spacepoliciesretainpagesnotbeingusedinalocalityset.

Moreover,LRUisalsosusceptibletothrashingwhenextendedtosharedmemory.WSwillnotthrashanditisclosetooptimal.

ChoosingtheWindowSizeWhatisagoodwindowsize?Isthereabestwindowsizeforeachprocess?

Whatwouldhappenifallprocessesweremeasuredbythesamewindowsize?

Thisquestionhasbeeninvestigatedbyresearchersdatingbacktothe1970s.OnebasicfindingisthatgraphsofWSspace-timeversusTshowthattheminimumofst(T)isinawide,near-flatplateauinmostjobs.ThereisusuallyasinglevalueofTthatintersectsalltheplateausofthejobs.Inotherwords,itdoesnotmakemuchdifferencewhatvalueofTyouchoose;Tcanbeasingle,globalvaluechosenoverawiderangewithoutsignificantlychangingthesystemthroughput.

Fromthepagereferencemaps,weseethatanidealTisjustlargeenoughtoseeallthelocalitypagesinthewindowthroughoutthephase.InFigure1,forexample,T=380Kissufficienttoseeallthepagesofthelocalityset.ThatvalueofTisasmallfractionofthephaselength–lessthan0.5%forthefourlongphasesand2%fortheoneshortphase.

Thisgivesapracticalanswertotheperformancetuningproblemmentionedatthebeginning.AsystemadministratorcanadjusttheparameterTinaWS-managedsystemtofindavaluethatmaximizessystemthroughput.Afterthat,theparameterTdoesnotneedtobechanged.ThatmaximumwillbeclosetothetheoreticaloptimumofVMIN.

ImplementationsThereareseveralwaystoimplementtheworkingsetcheaply.Theoriginalideasforimplementation(1968)wereoftwokinds[den68a].The

firstwastohavetheoperatingsystemscanandresettheusebitsinpagetableseveryTtimeunitsofvirtualtime,removinganyunusedpagesfromworkingsets.ThisimplementationwasnotattractivebecausescanningallthepagestableseveryTtimeunitsinalargesystemwouldbeveryexpensive.

ThesecondideawastoassociateahardwaretimerofdurationTwitheachpageframeofmemory.ThetimerissettoTateachaccesstothatframe.Itticksdownto0afterTtimeunitsofnon-use.Theoperatingsystemcandetecttheseunused

Page 36: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 36

framesfromtheirexpiredtimersandremovethemfromtheirworkingsets.Thisimplementationwasnotattractiveatthetimebecausetherequiredhardwarewouldbetooexpensiveand,moreover,becauseofCPUmultiplexing,thetimerswouldneedtobeshutoffwhenthejobowningthepagewasnotrunning.

ChenDingandhisstudentsatUniversityofRochesternotethatinmoderncaches,multipleCPUsexecuteinparallelandshareon-chipL1andL2cache.Thehardwareiseasilydesignedtoincludetimersoncachepages.Becausetherearemultiplecores,thereisnoneedtoshutanytimersofftoaccommodateround-robinCPUscheduling.Theirproposalofa“leasecache”,inwhicheachcachepagehasitsowntimer,isnowfeasible[li19].AleasecachewithleaseTisexactlytheworkingsetpolicywithwindowT.

AsimplemodificationofthepopularCLOCKalgorithmenablestheoperatingsystemtodetecttheworkingsetpagesofajob.Asbefore,theprocess’spagesarearrangedinacircularlist,butnoweachpagehasatimestampinsteadofausebit.Thetimestamprecordsthetimeofthelastaccesstothepage.ThescanningclockhandskipsoverallpageswhosetimestampiswithinTofthecurrenttime,andstopsatthefirstpagewithanoldtimestamp.Thismethod,calledWSCLOCK,wasinventedandvalidatedbyRichardCarr[car81].

NotethatWSCLOCKhasnobuilt-inloadcontrol.Byitself,itcannotimplementthefullWSmultiprogrammingpolicy(Figure6),whichdoesnotloadanotherjobuntilthereissufficientfreespaceforitsworkingset.However,theWSCLOCKsearchtimeforapagewithanoldtimestampcanbeaproxyforthemeasureoffreespace:thelongerthesearchtime,thelessfreespaceisavailable.Thememoryallocatorcanloadanewjobwhenthesearchtimeisbelowasetthreshold.(NotethatthememoryallocatorisnotthesameastheCPUcorescheduler.Thememoryallocatordecideswhentoloadajob’sworkingsetintomemory.TheschedulerassignsfreeCPUcorestojobswhoseworkingsetsareloaded.)

ApplicationinModernCachesModerncomputerchipsrelyoncomplexhierarchiesofcachestoachieveand

maintainhighperformance.TypicalcachelevelsL1,L2,andL3bufferthegapbetweentheRAMandtheCPU.LevelL1isclosesttotheCPUandisthesmallestandfastestofthecaches.Therearetwokindsofcacheconfigurations.Inclusivecachemeansthatcachepages(alsocalledslotsorblocks)ofalevelaresubsetsoflargerpagesatthenextleveldown.Exclusivecachedoesnotrequirethissubsetting[ye17].ThesecachesusuallyusesomeformofLRUreplacementtodeterminewhenacachepageispusheddowntothenextlowerlevelcache.Thehighest-levelcache(L1andL2)isdedicatedtoindividualCPUcores,whileL3cacheissharedamongallthecoressimilartomultiprogramming[ye17].ThesharedL3cachemaybepartitionedunequally,withsomecoresgettingmorethanothersdependingontheirlocality[bro15].ResearchersareinvestigatingwhetheravariablepartitionbasedonCPUcoreworkingsetswouldbemoreefficient.

Page 37: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 37

ChenDing’sresearchgrouplooksdeeplyintopolicyandperformancequestionsforsharedcaches[lia19,xia11,xia13,xia18].Manyoftheiranalyticmethodsforcacheswereinspiredbytheworkingsettheory.Theydefinedtwomeasuresofloadonthecache,calledfootprintandfootmark.I’lldiscussbothofthemandcomparewithWS.

Thefootprintmeasurewasmotivatedbyadesireforaccuracy.ForthefirstT-1referencesofatrace,theworkingsetsarethecontentsofatruncatedwindowbecausetherearenopagereferencespriortotimet=1.Infact,theinitialworkingsetsforthefirstT-1referencesareofsizesw(t,t).Thosetruncatedwindowspresentapotentialunderestimateinthecalculationsofmeanworkingsetsize.ChenDing’sfootprintavoidsthesecomplicationsbyaveragingtogetheralltheworkingsetswhoseTwindowsfitcompletelyinsidetheaddresstrace:

𝑓𝑝(𝑇) = 1

𝑁 − 𝑇 + 17𝑤(𝑡, 𝑇)$

%&*

Dingarguedthatthederivativeofthefootprintfunctionisthemissrate,justasindicatedbytheworkingsetrecursion.Thus,themissratecanbecomputedoncethefootprintiscomputed.Histeamfoundarecursionforcomputingfp(T)fromfp(T-1)andshowedthatthemeanfootprintiswithinonepageofthemeanworkingsetifthewindowisnottoolarge:

𝑠(𝑇) − 𝑓𝑝(𝑇) < 1,𝑖𝑓𝑇 ≤ √2𝑁Thiswillbeeasilytrueforprogramswithgoodlocality.DetailsareintheAppendix.

ChenDingwasnothappywiththemessinessofalgorithmsforcomputingfootprint.HesearchedforarecursionbasedonfootprintthatwasidenticaltotheonereportedbyDenningandSchwartzforlongaddresstracesinstochasticsteadystate.Heandhisstudentsfoundanew,closelyrelatedmeasuretheycalledfootmark[yu18].FootmarksatisfiestheDenning-Schwartzrecursion

𝑓𝑚(𝑇 + 1) = 𝑓𝑚(𝑇) + 𝑚(𝑇)

withinitialconditionsfm(0)=0andm(0)=1.ThedetailsofitsderivationareintheAppendix.

Wenowhavetworecursions–workingsetsizeandfootmark–thatsatisfythesamemathematicalform,

𝐹(𝑇 + 1) = 𝐹(𝑇) + 𝑀(𝑇)

whereFisaspacemeasureandMamissesmeasure.TheworkingsetusesforMthewarm-start-hot-finishmissratemwh(T)andfootmarktheactualworkingsetmissratemc(T).Forfiniteaddresstraces,thetwomissratesdifferbyafewendcorrections.Butforverylongtraces(largeN),theybecomeidenticalandthetworecursionsarethesame.Thatreplicatesthe1972findingforthesteady-statevaluesofworkingsetsizes(T)[den72].Likefootprint,thefootmarkmeasureiswithinonepageofthemeanworkingsetsizewhenT≤√2𝑁.Figure13illustratesthesemeasuresfortheexampleaddresstraceusedpreviously.WhenthereisariskthatT

Page 38: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 38

>√2𝑁,thebeststrategyistousetheworkingsetrecursion,whichisfullyaccurateandiseasilycomputedfrommwh(T),whichinturndiffersfrommc(T)byafewendcorrections.

Figure13.Thisgraphcomparesthefootprint(FP),meanworkingset(WS),andfootmark(FM)measuresforthe5-pageexampleaddresstrace.Theworkingsetrisesasymptoticallytoward4.0pages.Boththefootprintandfootmarkcontinuegrowing.WhenT≤√𝟐𝑵=5(verticalline)footprintandfootmarkarewithin1pageofWSaspredictedbythebound.Forexample,inthepagereferencemapofFigure1,thereareover1000sampleintervals;Tcouldbeaslargeas45(= √𝟐𝟎𝟎𝟎)sampleintervalswithoutcausingsignificanterrorbetweenfootmarkandmeanworkingset.WhenNisverylarge(notshowninthisdiagram),thetwomeasuresFMandWSconvergetothesame.

Page 39: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 39

WorkingSetsinParallelEnvironmentsThecomputingenvironmentsinwhichvirtualmemorywasformulatedfeatured

asingleCPUaccessingamemorysharedbymanyjobs.Itwasbasicallyaserialenvironment.TheCPUofoldhasevolvedintomodernmulticorechips,graphicsprocessors(GPUs),andseveralkindsofcache.Doesallthisparallelisminvalidateanyassumptionsoftheanalytics?Dographicsandneuralnetworks,theprimaryapplicationsofGPUs,exhibitlocality?

Asforthefirstquestion,theassumptionsofworkingsetanalyticsdoindeedapplyinaparallelenvironment.Theanalyticsdependonlyontheassumptionthatanaddresstracecanbemeasured.Aparticularrunofaparallelsystemyieldsanaddresstraceforthatrun.Thenextrunofthesamesystemmayyieldadifferentaddresstracebecausetheorderofeventsisdifferentortheinputdataaredifferent.Butthisdoesnotcreateanyproblemsfortheanalytics,becausetheanalyticsaredefinedforagivenaddresstrace.WSwillrevealthelocalitysetsofthattraceandadapttothem.VMINwillrevealthebestpossibleperformanceforthattrace.Whathappensinothertracesdoesnotaffecttheanalytics.11

Asforthesecondquestion,considerthatasystemisasetofCPUsaccessingasetofpagesinmemory.WheneveranyCPU,runninganyjob,accessesapage,thepage’sleaseissettoTandcountsdowntowardzerowitheachclocktick.Theprotocolfordecidingwhethertokeepapageinmemoryisthesamewhetherornotjobssharepagesorruninparallel.

Whethertheleaseruleoptimizessystemthroughputdependsonthemixbetweenpartsofjobswithgoodlocalityandpartswithout.Thereislotsofdebatearoundthisissue.AGPUworkingwithstreamingdata(forexample,runningagraphicsdisplay)orwithsimulationofaneuralnetwork(doingthelinearalgebracalculationsforeachlayer)maynotexhibitphase-transitionbehavior.Buttherestofthecomputation,suchasfeedbackforreinforcementlearninginaneuralnetworkandtheuserinterface,islikelytoexhibitgoodlocalitybehavior.Thismeansthatlocalityisimportantforoptimizingtheperformanceofthenon-GPUcomponentsandmanagingcommunicationbetweenjob-componentsrunningonconventionalCPUsandthoserunningonGPUs.Otheroptimizations,suchasstreamingcaches,canbeappliedtotheGPUparts.

WiderApplicationofWorkingSetPrinciplesTheprinciplesofmemorymanagementintheworkingsetmodelhavebeenused

outsidethetraditionaloperatingsystemsthathatchedthem.Thetwomost

11InhisbookRethinkingRandomness,JeffBuzendiscusseshowmanyaspectsofcomputersystemscanberepresentedaseventtraces[buz15].Thetraditionalalgorithmsofqueueingtheoryforutilization,throughput,andresponsetimeworkforthesetracesbasedpurelyonwhatcanbeobservedinthetrace.Theaddresstraceandworkingsetanalyticsareofthiskind.

Page 40: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 40

prominentarecontentdeliveryintheInternetandmanagementofinventoriesandlogisticsnetworks.

Startwithcontentdelivery.TheInternethostsmajorservicesthatdistributedatathroughouttheworld.Examplesaredistributionofmusic,video,books,andcloudservices.Eventhoughtheseserviceslooklikeasingleentity,theyarebuiltasdistributednetworksbecausecongestionatasingleserverwouldbetoogreatforacceptableservice.Thenetworksincludecacheslocatedinbusyzones,suchasmajorcities,sothatusersinthoseareascanaccessthecontentviaphysicallyshorthighbandwidthpaths.Contentisautomaticallydownloadedtoalocalcachewhenauserofrequestsnewdata.Datathathavebeenresidentforalongtimeareautomaticallyrefreshedatintervals.

Thesecachesareoftencalled“edgecaches”becausetheyrepresentmovingdataawayfromcentralizedlocationsthatareeasilycongested,totheedgeofthenetworkwheretheusersconnect.Numerouscompaniesemployedgecachesaspartoftheircontentdeliveryservices.

Anedgecachefunctionsinthesamewayasacacheinsidetheoperatingsystem.Whenablockofdataisaccessedforthefirsttime,itisloadedintotheedgecachefromacentralserver.AnLRUreplacementpolicyoraleasepolicyisusedtoremoveolddatawhenthecacheisfull.Thesecachesaccumulateworkingsetsandachievehighperformancebykeepingtheworkingsetsphysicallyclosetotheirusers.Allcachesperformwellbyexploitingtheprincipleoflocalityinthepatternsofhowusersaccessdata.

Turnnowtoinventorymanagement.Aninventoryisasetofitemsorparts.Alogisticsnetworkisanetworkofdepotsthatholdinventoriesofpartsusedbycompaniesormilitaries.Whenappliedtoinventories,aworkingsetisasetofpartsthathavebeenusedrecently.Depotmanagersaimtokeeplocaldepotsfilledwiththemostneededworkingpartssothattheycansupplytheirlocalusersrapidly.Whensomeoneasksforapartnotinthelocaldepot,themanagermustsendarequesttoanotherdepottoobtainthepart.

Giventheprevalenceoflocalitybehavior,itisreasonabletoexpectuseraccesspatternsforinventoriestoexhibitlocalitysets(subsetsofpossibleparts)andphases(intervalsofheavyuseofaparticularlocalityset).Workingsetanalyticscouldbeusedtodeterminethecapacityneededofaninventorydepot.

Thereisadifferencebetweena“logisticsworkingset”anda“memoryworkingset”.Thelogisticsworkingsetincludesmultipleinstancesofapart,whereasthememoryworkingsethasjustoneinstanceofapage.Thenumberofrequestsforaparticularitemintheworkingsetwindowforecaststhequantityofthatitemtokeeponhand.Itwouldbeworthwhiletostudythisideaoflogisticsworkingsetsandtheirphasetransitionmapstoseewhatinventorymanagementstrategiesgiveoptimalperformanceofthelogisticsnetwork.

Page 41: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 41

MemoryMisconceptionsBeforeclosingthistutorial,Iwouldliketocommentonfourcommon

misconceptionsaboutcomputermemory.Theyresultfromunfoundedassumptionsthatleadtopessimisticanswerstofourbigquestions:

1. Ismemoryflattening?2. Isvirtualmemoryobsolete?3. IscomputermemoryirrelevantintheCloud?4. IsthePrincipleofLocalityobsolete?

IsMemoryFlattening?

Memoryisanessentialcomponentofcomputers.MostpeoplethinkofmemoryasacompaniontechnologythattheCPUinteractswithtofetchinstructionsandaccessdata.PerformanceofthecomputerthenseemstobegovernedbytheCPUspeed.

Butthisisnotso.MemoryisnotasinglecomponentasisaCPUchip.MemoryisasystemthatincludescachesintheCPU,RAM,disks,networkservices,andCloudstorage.WhereasaCPUcorerunsonlyonejobatatime,memoryissharedamongmanyjobs.Memorycontentionbirthsqueueingathigh-demandservers.Diskandnetworkbottlenecksatthesequeueswillkillperformanceofmemorysystems.Asdiscussedinthistutorial,wehavelearnedhowtoorganizetheallocationofmemoryanddatatransferssothatweavoidthesebottlenecks.

Athoughtexperimenthighlightsthisbasicrealityaboutmemory.Alotofpeoplebelievetheclaimthatwithinafewyearswewillknowhowtomakememoriesthatareessentiallyinfiniteandflat.Flatmeansthattheaccesstimeofanyiteminthememoryisapproximatelythesame.Flatmemorywouldnotneedlocalityoptimizationbecauseeverypagewouldhavethesameaccesstime.Butflatmemoryisnotathingofthefuture.Itisalreadyhere–theInternet.Ifyouissuea“ping”commandfromanywhereintheInternetforanyIPaddressintheInternet,youwillseethatthepacketroundtriptimesaremostly30-90milliseconds.Thatisnotperfectlyflatbutisclose.Doesthatmeanyourcomputerwillexperienceamaximumdelayof90millisecondswhenitreachesouttoawebserver?Ofcoursenot.Iftheserverisverypopular,manycomputersallovertheInternetwillbetryingtoaccessit.Theserverwillqueueuptherequestsbecauseitsdisksareabottleneck.Thegreaterthedemandforthepopularwebserver,thelongerthewaittimeinthequeue–andthelongertheresponsetimetogetawebpage.Inotherwords,flatmemorysystemsdonotguaranteegoodperformance.ThisiswhytheInternetcontainssomanyedgecaches–theyspreadtheloadandavoidthequeueing.

ThesameproblemappearsatsmallerscalewhenmanyjobsshareRAMtogetherandcontendforuseofthepagingdisks.Workingsetmemorymanagementpreventsdiskoverloadandprotectsthesystemfromthrashing.

Page 42: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 42

Thecontrolsystemsthatpreventexcessdelayinaccessingsharedmemoryresourcesareanimportantpartofthememorystory.Mostusersandprogrammersarenotawareofthosecontrolsystems.

IsVirtualmemoryObsolete?

Virtualmemorywasinventedtoovercomethehighprogrammingcostofsolvingthe“overlayproblem”--manuallyplanningpagetransfersthatoverwritepreviouslyloadedpages.Today’smemoriesareusuallylargeenoughtocontainanentireprogramanditsdata.Thus,itwouldseemthatvirtualmemoryisnotusefulanymoreforsolvingtheoverlayproblem.

Theoverlayproblemismuchlessimportantthaninearlyvirtualmemories.Virtualmemory’srealvaluecomesfromitsabilitytopartitionmemory.Itallocatesnonoverlappingsubsetsofpagestoeachjob’smemoryandpreventsanyjobfromaccessingdatainanotherjob’smemory.Virtualmemoryprovidesthebasistoencapsulateuntrustedsoftwaresothatitcannotdamageanythingoutsideitsallocatedmemoryspace.Virtualmemoryprovidesbasicaccesscontrolbydistinguishingread,write,andexecutepermissionsforindividualpages.Inotherwords,virtualmemoryimplementsthebasicguaranteesofanoperatingsystemfordataprotection.

Virtualmemorydoesmorethanpartitionmemoryandprovidebasicaccesscontrol.ItalsomanagesmultiprogrammedRAMtoavoidthrashingandtomaximizesystemthroughput.ThesameconcernsforstabilityandthroughputnowoccurinthecacheontheCPUchip,whichissharedamongthemultipleCPUcoresonthechip.Theprinciplesofvirtualmemoryareimportantincachedesign.

IsComputermemoryirrelevantintheCloud?

TheCloudprovidesverylargestorageinthenetworkoutsideyourcomputer.Itdoesnotincludethestorageinsideyourcomputer.YourcomputerstillneedsL1,L2,L3cache,localRAM,andlocalstorage.Theoperatingsystemonyourcomputerneedstomanagetheselocalmemoryresources.TheCloudenlargesthestorageaccessibletoyourcomputerbutdoesnotreplacestoragemanagementwithinyourcomputer.

TheCloudisacomplex,distributedstoragesystem.Itconsistsofmanydatacentersaroundtheworld.Eachdatacenterincludestensofthousandsofcomputersanddisks.Sophisticatedredundancycontrolsmanagemultiplecopiesoffilesdistributedacrossmultipledatacenters,providinghighreliabilityandeaseofrecoveryfromfailuresofcomputersanddisks.TheoperatingsystemsrunningtheCloudmustalsomanagememory,avoidthrashing,andmaximizeCloudthroughput.

TheCloudisnotthefinalanswertostoragemanagement.OneoftheprimarylimitationsonperformanceofcomputersistheinterfacesbetweenCPUsandmemorysystem.MovingdataacrossthoseinterfacessignificantlyslowsCPU

Page 43: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 43

speeds.TheCloudisanotherinterface.ThoseinterfacedelaysareoftencalledvonNeumannbottlenecksbecausetheseparationofCPUandmemorywasacentralprincipleofthestored-programcomputerarchitecturethatbecameubiquitousafter1945.ManyresearchprojectstodayareexaminingnewarchitecturesthanhavenovonNeumannbottleneck.Theideaistobuildthecomputersothatcomputationsareperformedbyhugenumbersofprocessingelementsthatrequireonlylocalaccesstolimitedstorage.Theseareoftencalled“processing-in-memoryarchitectures”.Neuralnetworksareaprominentexample.

IsThePrincipleofLocalityobsolete?

TheheartofallthemethodstocontrolmemoryandoptimizeitsperformanceisthePrincipleofLocality.Eachprogramgeneratesauniquefootprintoflocalitysetsandphases.Highperformancecaches,RAM-DISKinterfaces,andnetworksallowetheirsuccesstothisprinciple.

Thelocalityprinciplerunsdeepincomputing.Algorithmtheoristshaveshownthataprocedurecannotbeanalgorithmunlesseachofitsoperationscanapplyonlytoabounded,localsetofdata.Computingmachinesthemselvesarebuiltfrommanycomponentsandmodulesthatuseonlylocalinputs.

Yetthelocalityprincipleformemoryisoftenmisunderstood.Somepeoplebelieveitsimplymeansunequalfrequenciesofuseofeachpage.Othersbelieveitmeansaslowdriftamongasetoffavoredpages.Fewseeitaslongphrasesofnearconstantlocalitysetspunctuatedbysharptransitionstonewlocalitysets.Yetthephase-transitionbehavioriscommonandisthereasonwhyworkingsetisabletogivenearoptimalsystemthroughput.

Anothermisunderstandingisthebeliefthatincentiveshavechanged.Intheearlydays,programmersofearlyvirtualmemorysystemspurposelyinducedlocalitybehaviorinordertogetthebestperformancefromthepagingalgorithms.Weareinadifferentagenow:memoryisnowherenearasscarceandprogrammersfeelnopressuretoinduceworkingsetsintotheirprograms.Thus,itwouldseemthatthemotivationforlocalityisgone.Thismisunderstandingisrefutedbytwofacts.Oneistheexperimentalstudiesshowingthatlocalitybehaviorispresentinthesourcecodeofprograms–thephase-transitionbehaviorappearinginpagereferencemapsisanimageofsourcelocality.Localityistheconsequenceofourproblem-solvingstrategiessuchasiterativeloopsanddivide-and-conquer.Theotherrefutationisrecentinstrumentationsshowingpagereferencemapswithevenmorepronouncedphase-transitionbehaviorthanwesawintheearlydays.Theincreaseduseofmodularprogramsisthemostlikelyexplanation.(SeeMcMenamin’sstudyofLinuxprograms[mcm11].)

Page 44: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 44

ConclusionTheWorkingSetModelforprogrambehavior,firstarticulatedin1968,has

stoodthetestoftime.Itstimulatedtheresearchthatrevealedthatalmostallexecutingprocessesexhibitlocality,establishingthePrincipleofLocalityasoneofthefundamentalprinciplesofcomputing.Itledtoasimple,precisewaytomeasurealltheworkingsetstatisticsinrealtimebyrecordingthereuseintervalstatisticsofanaddresstrace.ItledtotheconclusionthatWSisnear-optimalforprocessesexhibitinglocality.Itprovidedanexplanationforthemysteryofthrashingandshowedhowtobuildacontrolsystemtoavoidthrashing.

Since1968,computerCPUshavebecomeprogressivelyfaster,wideningthecostgapofretrievingapagefromalowerlevelofmemory.Atthesametimememoryhasbecomefarcheapersothatmostroutineapplicationsarefullyloadedinmemoryanddonotnormallycausepagefaults.Somepeoplehaveaskedwhetherthetheoryappliestorealsystemstoday.

Yes,itdoes.Realsystemstodayhavememoriesconsistingofmultiplelevelsofcache,RAM,anddisk.ThecachesnearCPU(L1andL2)aresharedamongmultiplecores(CPUs).Mostofthecurrentcachemanagementstrategiesdonotvarythepartitionofthecacheamongthecores,butresearchershavebeendemonstratingthatvariablepartitioncachesaremuchmoreefficient.Unfortunately,variablepartitionLRUcachesaresusceptibletothrashing.TheWStheorycanbehelpfultodesigncachemanagementstrategies,suchasleasecache,thatdonotthrashandmaintaincachemissestoclosetotheoptimallevels.

Despitethetremendousadvancesinmemorytechnologyoverthepasthalfcentury,thebasicassumptionsbehindmemorymanagementhavenotchanged.Virtualmemoryremainsusefulbecauseofitsabilitytoconfinejobstotheirlimitedregionsofmemory,toencapsulateuntrustedsoftware,andtomanageloadtoavoidthrashing.Flatmemorydoesnoteliminatetheneedforvirtualmemory;itintroducesitsownproblemsduetoqueueingandcongestionasmanyjobsaccesssharedmemoryresources.TheCloudaugmentsbutdoesnotreplacememorymanagementonlocalcomputers.Moreover,theCloudexacerbatesthevonNeumannbottleneckbetweenCPUandmemory.Thelocalityprincipleisfarfromobsolete–itcontinuestounderpinhighperformancememorysystems.Theworkingsettheorycanbeextendedandcombinedwithnetworkcachingtheoryforpossibleapplicationinlogisticsnetworksandinventorymanagement.

Workingsetsandvirtualmemorywillbepartsofcomputingforalongtimetocome.

Bibliography[aho71] Aho,A.,P.J.Denning,J.Ullman.1971.Principlesofoptimalpage

replacement.J.ACM18,1(January),80-93.

Page 45: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 45

[bel66] Belady,L.A.1966.Astudyofreplacementalgorithmsforvirtualstoragecomputers.IBMSystemsJ.5,2,78-101.

[bro15] Brock,J.,C.Ye,C.Ding,Y.Li,X.Wang,andY.Luo.2015.Optimalcachepartitionsharing.In44thIEEEInt’lConf.onParallelProcessing(September),749-758.

[buz76] Buzen,J.P.1976.Fundamentaloperationallawsofcomputersystemperformance.ActaInformatica7,2,167-182.

[buz15] Buzen,J.P.2015.RethinkingRandomness.Amazon.comPlatform.SeealsoACMUbiquityinterviewsoftheauthor,https://ubiquity.acm.org/article.cfm?id=2986329andhttps://ubiquity.acm.org/article.cfm?id=2986331

[car81] Carr,R.W.andJ.Hennessy.1981.WSCLOCK—asimpleandeffectivealgorithmforvirtualmemorymanagement.InProceedingsoftheeighthACMsymposiumonOperatingsystemsprinciples(SOSP‘81).

[den68a] Denning,P.J.1968.Theworkingsetmodelforprogrambehavior.CommunicationsofACM11,5(May),323-333.

[den68b] Denning,P.J.1968.Thrashing:itscausesandprevention.InProceedingsofthe1968,FallJointComputerConference(FJCC),partI(AFIPS‘68(Fall,partI)).

[den72] Denning,P.J.andS.C.Schwartz.1972.Propertiesoftheworkingsetmodel.CommunicationsofACM15,3(March),191-198.

[den78a] Denning,P.J.andD.L.Slutz.1978.Generalizedworkingsetsforsegmentreferencestrings.CommunicationsofACM21,9(September),750-759.

[den78b] Denning,P.J.,andJ.P.Buzen.1978.Theoperationalanalysisofqueueingnetworkmodels.ACMComputingSurveys10,3(September),225-261.

[den80] Denning,P.J.1980.Workingsetspastandpresent.IEEETrans.SoftwareEngineeringSE-6,1(January),64-84.

[den16] Denning,P.J.2016.Fiftyyearsofoperatingsystems.ACMCommunications59,3(March),30-32.

[fot61] Fotheringham,J.1961.DynamicstorageallocationintheAtlascomputer,includinganautomaticuseofabackingstore.CommunicationsofACM4,10(October1961),435-436.

[gra76] Graham,G.S.1976.Astudyofprogramandmemorypolicybehavior.PhDdissertation,PurdueUniversityDeptofComputerScience.

[gra85] Gray,Jim,andFrancoPutzolu.1985.The5minuterulefortradingmemoryfordiskaccesses.TandemCorporationTechnicalReport86.1.Availablehttps://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf

Page 46: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 46

[kah76] Kahn,K.C.1976.Programbehaviorandloaddependentsystemperformance.PhDdissertation,PurdueUniversityDeptofComputerScience.

[kil62] T.Kilburn,D.B.G.Edwards,M.J.Lanigan,F.H.Sumner.1962.One-levelstoragesystem.IRETransEC-11(April),223-235.

[li19] Li,P.,C.Pronovost,W.Wilson,B.Tait,J.Zhou,C.Ding,C.,andJ.Criswell.2019.BeatingOPTwithstatisticalclairvoyanceandvariablesizecaching.InProc.24thInt’lConf.onArchitecturalSupportforProgrammingLanguagesandOperatingSystems(April),243-256.

[lia19] LiangYuan,ChenDing,WesleySmith,PeterDenning,andYunquanZhang.2019.Arelationaltheoryoflocality.ACMTransactionsonArchitectureandCodeOptimization(TACO)16,3(August),1-26.

[mad76] Madison,A.W.andA.P.Batson.1976.Characteristicsofprogramlocalities.CommunicationsofACM19,5(May),285-294.

[mat70] Mattson,R.,J.Gecsei,D.R.Slutz,andI.L.Traiger.1970.Evaluationtechniquesforstoragehierarchies.IBMSystemsJournal9,78-117.

[mcm11] McMenamin,A.2011.ApplyingWorkingSetHeuristicstotheLinuxKernel.MastersThesis,BirkbeckCollege,UniversityofLondon.Availableathttp://cartesianproduct.files.wordpress.com/2011/12/main.pdf

[pri76] Prieve,B.G.andR.S.Fabry.1976.VMIN--anoptimalvariable-spacepagereplacementalgorithm.CommunicationsofACM19,5(May1976),295-297.

[ran68] Randell,B.andC.J.Kuehner.1968.Dynamicstorageallocationsystems.CommunicationsofACM11,5(May),297-306.

[say69] Sayre,D.1969.Isautomaticfoldingofprogramsefficientenoughtodisplacemanual?ACMCommunications13,12(December),656-660.

[spi72] Spirn,J.R.andP.J.Denning.1972.Experimentswithprogramlocality.ProcAFIPSConf41,SJCC.AFIPSPress.

[wan15] Wang,X.,etal.,"OptimalFootprintSymbiosisinSharedCache,"201515thIEEE/ACMInternationalSymposiumonCluster,CloudandGridComputing,Shenzhen,2015,pp.412-422

[wir14] Wires,J.etal.2014.Characterizingstorageworkloadswithcounterstacks.USENIXProc.11thConf.onOperatingSystemsDesignandImplementation,335-349.

[xia11] XiaoyaXiang,BinBao,ChenDing,YaoqingGao.2011.Linear-timeModelingofProgramWorkingSetinSharedCache.PACT2011:350-360

[xia13] XiaoyaXiang,ChenDing,HaoLuo,BinBao:2013.HOTL:ahigherordertheoryoflocality.ASPLOS2013,343-356.

Page 47: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 47

[xia18] XiamengHu,XiaolinWang,LanZhou,YingweiLuo,ZhenlinWang,ChenDing,andChenchengYe.2018.Fastmissratiocurvemodelingforstoragecache.ACMTransactionsonStorage14,2(April),Article12,34pp.

[ye17] Ye,C.,C.Ding,H.Luo,J.Brock,D.Chen,andH.Jin.2017.Cacheexclusivityandsharing:Theoryandoptimization.ACMTrans.onArchitectureandCodeOptimization(TACO)14,4,1-26.

[yua18] Yuan,L.,W.Smith,S.Fan,Z.Chen,C.Ding,andY.Zhang.2018.Footmark:Anewformulationforworkingsetstatistics.InInt’lWorkshoponLanguagesandCompilersforParallelComputing(October),61-69.Springer.

AcknowledgementsIamdeeplygratefultoJeffBuzen,ChenDing,ErolGelenbe,RolandIbbett,and

AdrianMcMenaminformanyconversationsthatsharpenedtheideaswrittenhere.AndIfondlyremembermyearlyteachersandcollaboratorsinthiswork,LesBelady,FernandoCorbato,JackDennis,RogerNeedham,BrianRandell,JerrySaltzer,StuartSchwartz,DonaldSlutz,andMauriceWilkes.

Page 48: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 48

APPENDIX:FOOTPRINTANDFOOTMARKChenDingattheUniversityofRochesterdefinedtwomeasuresofloadonthe

cache,calledfootprintandfootmark[lia19,xia11,xia13,xia18].Thefootmarkmeasurewasmotivatedbyadesireforaccuracy:forthefirstT-1

referencesofatrace,theworkingsetscontentsaretruncatedwindows,potentiallyunderestimatingmeanworkingsetsize.ThefootprintavoidsthisbyaveragingtogetheronlytheworkingsetswhoseTwindowsfitcompletelyinsidetheaddresstrace:

𝑓𝑝(𝑇) = 1

𝑁 − 𝑇 + 17𝑤(𝑡, 𝑇)$

%&*

Thefootprintandmeanworkingsetmeasuresdonotdiffersignificantlyunderpracticalconditions.Thedefinitionofmeanworkingsetsizeincludesthedefinitionoffootprint:

𝑠(𝑇) = 1𝑁7𝑤(𝑡, 𝑇) =

1𝑁

$

%&!

7𝑤(𝑡, 𝑡) +𝑁 − 𝑇 + 1

𝑁

*-!

%&!

𝑓𝑝(𝑇)

Sincewindowsizeisanupperboundforanyworkingsetsize,w(t,t)≤t,andthefirstsumhasanupperboundof(1+2+3+…+T-1)/N=T(T-1)/2N<T2/2N.Thesecondtermhasanupperboundoffp(T).Therefore,

𝑠(𝑇) < 𝑇"

2𝑁 + 𝑓𝑝(𝑇)

Thisreducesto

𝑠(𝑇) − 𝑓𝑝(𝑇) < 1,𝑖𝑓𝑇 ≤ √2𝑁

Inotherwords,themeanworkingsetsizeandfootprintarewithin1pageofeachotheraslongas𝑇 ≤ √2𝑁.Thiswillbeeasilytrueforprogramswithgoodlocality,asinFigure1.

Dingandhiscolleaguesalsodevelopedarecursionforfp(T)thatwouldenablecalculatingfootprintforallTinlineartime.Hereisaderivationofarecursion.StartbywritingthefootprintforwindowT+1:

𝑓𝑝(𝑇 + 1) = 1

𝑁 − 𝑇 7 𝑤(𝑡, 𝑇 + 1) = 1

𝑁 − 𝑇

$

%&*,!

J7𝑤(𝑡, 𝑇 + 1) − 𝑤(𝑇, 𝑇 + 1)$

%&*

K

Thetermsinvolvingw(t,T+1)canbereducedtotermsinvolvingw(t,T)usingtheworkingsetsizerecursiondevelopedearlier.Recallthat,whenTisincreasedtoT+1,alltherunsendinginareuseinterval>Tarelengthenedby1.Becauseeveryrunbeginswithapagemiss,andallthefirstreferencesintheinterval[1,…,T-1]startrunsthatextendinto[T,…,N],thenumberof1saddedtothepagereferencemapintheinterval[T,…,N]ismc(T).Inotherwords,thesamerecursionappliestotheextendedsum,withthesameendcorrectionsasbefore:

Page 49: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 49

7𝑤(𝑡, 𝑇 + 1) = 7𝑤(𝑡, 𝑇) + 𝑚𝑐(𝑇) − 𝑒(𝑇)$

%&*

$

%&*

Asaboveearlier,e(T)isthenumberoflastreferencesoccurringthelastTtimeunits.Definef(T)asthenumberoffirstreferencesinthefirstTtimeunits,andnoticethatf(T)=w(T,T+1).Thefootprintformulabecomes

𝑓𝑝(𝑇 + 1) = 1

𝑁 − 𝑇 J7𝑤(𝑡, 𝑇) + 𝑚𝑐(𝑇) − 𝑒(𝑇) − 𝑓(𝑇)$

%&*

K

Applyingthedefinitionoffp(T)andrecallingthatm(T)=mc(T)/N,thisbecomesthedesiredrecursion,

𝑓𝑝(𝑇 + 1) = 𝑁 − 𝑇 + 1𝑁 − 𝑇 𝑓𝑝(𝑇) +

𝑁𝑁 − 𝑇𝑚

(𝑇) −𝑒(𝑇) + 𝑓(𝑇)𝑁 − 𝑇

Thismessyexpressioncanbecomputedinlineartimebecausemc(T),e(T),andf(T)areallcomputableinlineartime.

ChenDingwasnothappywiththemessinessofalgorithmsforcomputingfootprint.Heandhisstudentsdefinedanewmeasuretheycalledfootmark.FootmarkwouldbelikeWSbutwouldcontainadditionaltermsthatcompensatedfortheshortworkingsetwindowsduringtheinitialsegmentofthetrace.Theydividedanaddresstraceintothreeregions:

• TheinitialsegmentoflengthT-1.Inthisintervaltheworkingsetsizeshavetheformw(t,t),effectivelyawindowsmallerthanT.

• ThesecondsegmenthaslengthN-T+1.Inthissegmenttheworkingsetsizesareoftheformw(t,T).ThissegmentincludesallthewindowsoflengthTasinfootmark.

• ThethirdsegmentisthelastT-1referencesofthetrace.Inthissegmentaseriesofphantomworkingsetsofsizesw(N,k)aredefinedwithprogressivelyshorterwindowsklookingbackfromtheendofthetrace.Theyare“phantoms”becausetheyarenotactuallyobservedinarealcache.Theideaisw(N,k)canbepairedwithw(T-k,T-k)intheinitialsegment;everypairspansafullwindowlengthT.Afterthepairing,theshort-windowworkingsetsoftheinitialsegmentarereplacedbyfull-windowworkingsets.TheneteffectisthatNworkingsetswithwindowTdefinefootmark.

Theseideasproducedthefollowingdefinitionoffootmarkspace-timeFM(T):

𝐹𝑀(𝑇) = 7𝑤(𝑡, 𝑡) +7𝑤(𝑡, 𝑇) +7𝑤(𝑁, 𝑘)*-!

(&!

$

%&*

*-!

%&!

Page 50: WORKING SET ANALYTICS 6/15/20 -- DRAFT v2 …denninginstitute.com/pjd/PUBS/working-set-analytics-2020.pdfWORKING SET ANALYTICS 6/15/20 -- DRAFT v2 Peter J. Denning Naval Postgraduate

Denning WorkingSetAnalytics 50

ThepairingargumentsaysthatthefirstandthirdsumscanbereplacedwithasinglesumofT-1termsw(t,t)+w(N,t),sothateveryworkingsetinFM(T)haswindowT.Therefore,thefootmarkisfm(T)=FM(T)/N.

Becausethefirsttwosumsarethedefinitionofworkingsetspace-time,

𝐹𝑀(𝑇) = 𝑠𝑡(𝑇) +7𝑤(𝑁, 𝑘)*-!

(&!

Wecannowapplytheworkingsetrecursiontodefineafootmarkrecursion:

𝐹𝑀(𝑇 + 1) = 𝑠𝑡(𝑇 + 1) +7𝑤(𝑁, 𝑘)*

(&!

= 𝑠(𝑇) + 𝑚𝑐(𝑇) − 𝑒(𝑇) +7𝑤(𝑁, 𝑘) + 𝑤(𝑁, 𝑇)*-!

(&!

wheree(T)isthenumberofendintervals≤T.Nowconsidertheworkingsetw(N,T).ApageisvisibleinthatworkingsetifandonlyifitsfinalreferenceoccursbeforetimeN-T+1.Thusthecontentsofthatworkingarepreciselythepageswhoseendintervalsare≤T,whichisthedefinitionofe(T).Thiscancelsthee(T)term.Thes(T)andsumtermscombineintothedefinitionofFM(T).Thus

𝐹𝑀(𝑇 + 1) = 𝐹𝑀(𝑇) + 𝑚𝑐(𝑇)WhenalltermsaredividedbyN,wegetthefootmarkrecursion:

𝑓𝑚(𝑇 + 1) = 𝑓𝑚(𝑇) + 𝑚(𝑇)

wherefm(T)isthefootmarkforwindowsizeTandm(T)istheworkingsetmissrate.Theinitialconditionsarefm(0)=0andm(0)=1.