intermittent computation without hardware support or programmer

17
This paper is included in the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16). November 2–4, 2016 • Savannah, GA, USA ISBN 978-1-931971-33-1 Open access to the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX. Intermittent Computation without Hardware Support or Programmer Intervention Joel Van Der Woude, Sandia National Laboratories; Matthew Hicks, University of Michigan https://www.usenix.org/conference/osdi16/technical-sessions/presentation/vanderwoude

Upload: votruc

Post on 01-Jan-2017

225 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Intermittent Computation without Hardware Support or Programmer

This paper is included in the Proceedings of the 12th USENIX Symposium on Operating Systems Design

and Implementation (OSDI ’16).November 2–4, 2016 • Savannah, GA, USA

ISBN 978-1-931971-33-1

Open access to the Proceedings of the 12th USENIX Symposium on Operating Systems

Design and Implementation is sponsored by USENIX.

Intermittent Computation without Hardware Support or Programmer Intervention

Joel Van Der Woude, Sandia National Laboratories; Matthew Hicks, University of Michigan

https://www.usenix.org/conference/osdi16/technical-sessions/presentation/vanderwoude

Page 2: Intermittent Computation without Hardware Support or Programmer

Intermittent Computation without Hardware Supportor Programmer Intervention

Joel Van Der WoudeSandia National Laboratories∗

Matthew HicksUniversity of Michigan

AbstractAs computation scales downward in area, the limi-

tations imposed by the batteries required to power thatcomputation become more pronounced. Thus, many fu-ture devices will forgo batteries and harvest energy fromtheir environment. Harvested energy, with its frequentpower cycles, is at odds with current models of long-running computation.

To enable the correct execution of long-running appli-cations on harvested energy—without requiring special-purpose hardware or programmer intervention—we pro-pose Ratchet. Ratchet is a compiler that adds lightweightcheckpoints to unmodified programs that allow exist-ing programs to execute across power cycles correctly.Ratchet leverages the idea of idempotency, decompos-ing programs into a continuous stream of re-executablesections connected by lightweight checkpoints, storedin non-volatile memory. We implement Ratchet on topof LLVM, targeted at embedded systems with high-performance non-volatile main memory. Using eightembedded systems benchmarks, we show that Ratchetcorrectly stretches program execution across frequent,random power cycles. Experimental results show thatRatchet enables a range of existing programs to run onintermittent power, with total run-time overhead averag-ing below 60%—comparable to approaches that requirehardware support or programmer intervention.

1 IntroductionImprovements in the design and development of com-

puting hardware have driven hardware size and cost torapidly shrink as performance improves. While earlycomputers took up entire rooms, emerging computers aremillimeter-scale devices with the hopes of widespreaddeployment for sensor network applications [23]. These

∗Work completed while at the University of Michigan

rapid changes drive us closer to the realization of smartdust [20], enabling applications where the cost and sizeof computation had previously been prohibitive. We arerapidly approaching a world where computers are notjust your laptop or smart phone, but are integral partsyour clothing [47], home [9], or even groceries [4].

Unfortunately, while the smaller size and lower cost ofmicrocontrollers enables new applications, their ubiqui-tous adoption is limited by the form factor and expense ofbatteries. Batteries take up an increasing amount of spaceand weight in an embedded system and require specialthought to placement in order to facilitate replacing bat-teries when they die [20]. In addition, while Moore’sLaw drives the development of more powerful comput-ers, batteries have not kept pace with the scaling ofcomputation [18]. Embedded systems designers attemptto address this growing gap by leveraging increasinglypower-efficient processors and design practices [49]. Un-fortunately, these advances have hit a wall—the batterywall; enabling a dramatic change in computing necessi-tates moving to batteryless devices.

Batteryless devices, instead of getting energy from thepower grid or a battery, harvest their energy from theirenvironment (e.g., sunlight or radio waves). In fact,the first wave of energy harvesting devices is availabletoday [4, 50, 9]. These first generation devices provethat it is possible to compute on harvested energy. Thisaffords system designers the novel opportunity to removea major cost and limitation to system scaling.

Unfortunately, while harvested energy represents anopportunity for system designers, it represents a chal-lenge for software developers. The challenge comesfrom the nature of harvested energy: energy harvest-ing provides insufficient power to perform long-running,continuous, computation [37]. This results in frequentpower losses, forcing a program to restart from the be-

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 17

Page 3: Intermittent Computation without Hardware Support or Programmer

Figure 1: Energy harvesting devices replace batteries with a smallenergy storage capacitor. This is the ideal charge and decay of thatcapacitor when given a square wave input power source. The voltageacross the capacitor must be above v trig for program computationto take place as that is the minimum energy required to store thelargest possible checkpoint in the worst-case environmental and deviceconditions. The energy expended going from v on to v trig andfrom v trig to v off is wasted in what is called the guard band.The guard band is essential for correctness in one-time checkpointingsystems [39, 17, 3]. In real systems, the charge and discharge rateis chaotic, depending on many variables, including temperature anddevice orientation.

ginning, in hopes of more abundant power next time.While leveraging faster non-volatile memory technolo-gies might seem like an easy way to avoid the problemsassociated with these frequent power cycles, previouswork exposes the inconsistent states that can result frompower cycles when using these technologies [26, 38].

Previous research attempts to address the problem ofintermittent computation using two broad approaches:rely on specialized hardware [39, 29, 3, 17, 27] or requirethe programmer to reason about the effects of commoncase power failures on mixed-volatility systems [26].Hardware-based solutions, rely on a single checkpointthat gets saved just before power runs out. It is criticalfor correctness that every bit of work gets saved by thecheckpoint and that there is no (non-checkpointed) workafter a checkpoint [38]. On the other hand, taking acheckpoint too early wastes energy after the checkpointthat could be used to perform meaningful work. Thistradeoff mandates specialized hardware to measure avail-able power and predict when power is likely to fail. Dueto the intermittent nature of harvested energy, predict-ing power failure is a risky venture that requires largeguard bands to accommodate a range of environmentalconditions and hardware variances. Figure 1 depicts howguard-bands waste energy that could otherwise be usedto make forward progress. In addition to the guard-bands

wasting energy, the power monitoring hardware itselfconsumes power. Even simple power monitoring circuits(think 1-bit voltage level detector) consume power up to33% of the power of modern ultra-low-power microcon-trollers [5, 43].

An alternative approach, as taken by DINO [26], isto forgo specialized hardware, instead, placing the bur-den on the programmer to reason about the possibleoutcomes of frequent, random, power failures. DINOrequires that programmers divide programs into a seriesof checkpoint-connected tasks. These tasks then use dataversioning to ensure that power cycles do not violatememory consistency. Smaller tasks increase the like-lihood of eventual completion, at the cost of increasedoverhead. Larger tasks result in fewer checkpoints, butrisk never completing execution. Thus, the burden is onthe programmer to implement—for all control flows—correct-sized tasks given the program and the expectedoperating environment. Note that even small changesin the program or operating environment can changedramatically the optimal task structure for a program.

Our goal is to answer the question: What can bedone without requiring hardware modifications or bur-dening the programmer? To answer this question, wepropose leveraging information available to the compilerto preserve memory consistency without input from theprogrammer or specialized hardware. We draw upon thewealth of research in fault tolerance [31, 24, 21, 51, 25,13] and static analysis [7, 22] and construct Ratchet,a compiler that is able to decompose unmodified pro-grams into a series of re-executable sections, as shownin Figure 2. Using static analysis, the compiler canseparate code into idempotent sections—i.e., sequencesof code that can be re-executed without entering a stateinconsistent with program semantics. The compileridentifies idempotent sections by looking for loads andstores to non-volatile memory and then enforcing that nosection contains a write after read (WAR) to the sameaddress. By decomposing programs down to a seriesof re-executable sections and gluing them together withcheckpoints of volatile state, Ratchet supports existing,arbitrary-length programs, no matter the power source.Ratchet shifts the burden of reasoning about the effectsof intermittent computation and mixed-volatility awayfrom the programmer to the compiler—without relyingon hardware support.

We implement Ratchet as a set of modifications to theLLVM compiler [22], targeting energy harvesting plat-forms that use the ARM architecture [2] and have whollynon-volatile main memory. We also implement an ARM-based energy-harvesting simulator that simulates power

18 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Page 4: Intermittent Computation without Hardware Support or Programmer

failures at frequencies experienced by existing energyharvesting devices. To benchmark Ratchet, we use it toinstrument the newlib C-library [45], libgcc, and eightembedded system benchmarks [14]. Finally, we ver-ify the correctness of Ratchet by executing all instru-mented benchmarks, over a range of power cycle rates,on our simulator, which includes a set of memory con-sistency invariants dynamically check for idempotenceviolations. Our experimental results show that Ratchetcorrectly stretches program execution across frequent,random power cycles, while adding 60% total run-timeoverhead1. This is comparable to approaches that requirehardware support or programmer intervention and muchbetter than the alternative of most benchmarks nevercompleting execution with harvested energy.

This paper makes several key contributions:• We design and implement the first software-only

system that automatically and correctly stretchesprogram execution across random, frequent powercycles. As a part of our design, we extend the notionof idempotency to memory.

• We evaluate Ratchet on a wider range of bench-marks than previously explored and show that itstotal run-time overhead is competitive with existingsolutions that require hardware support or program-mer intervention.

• We open source Ratchet, including our energy har-vesting simulator [16], our benchmarks [15], andmodifications to LLVM to implement Ratchet [44].

2 BackgroundThe emergence of energy harvesting devices poses the

question: How can we perform long-running computa-tion with unreliable power? Answering this questionforces us to go beyond a direct application of existingwork from the fault tolerance community due to fourproperties of energy harvesting devices:• Power availability and duration are unknowable for

most use cases.• Added energy drain by hardware is just as important

as added cycles by software.• Small variations in the device and the environment

have large effects on up time.• Faults (i.e., power cycles) are the common case.

1We say total run-time overhead to cover all sources of run-timeoverhead, including hardware and software. Keep in mind that addinghardware indirectly increases the run time of programs by decreasingthe amount of energy available for executing instructions. BecauseRatchet is software only, the total run-time overhead is equal to theoverhead due to saving checkpoints and re-execution.

Figure 2: Ratchet-compiled program in operation. The checkmarksrepresent completed checkpoints, the x’s represent power failures, andthe dashed lines to the backward rotating arrows represent the systemrestarting execution at the latest checkpoint. This figure shows howprograms execute as normal with added overhead from checkpoints andre-execution.

These four properties dictate how system builders con-struct energy harvesting devices and how researchersmake programs amenable to intermittent computation.

2.1 Prediction vs. ResilienceGiven the properties of harvested energy, there are

two checkpointing methods for enabling long-runningcomputation: one-time and continuous. One-time check-pointing approaches attempt to predict when energy isabout to run out and checkpoint all volatile state rightbefore it does. Doing this requires measuring the volt-age across the energy storage capacitor as depicted inFigure 1. Measuring the voltage requires an Analog-to-Digital Converter (ADC) configured to measure the ca-pacitor’s voltage. Hibernus [3], the lowest overhead one-time checkpointing approach, utilizes an advanced ADCwith interrupt functionality and a configurable voltagethreshold that removes the need to periodically check thevoltage from software. As Table 1 shows, this producesvery low overheads, with the two main sources of over-head being the extra power consumed by the ADC andthe energy wasted waiting in the guard bands.

While many energy-harvesting systems have ADCs,the program may require use of the ADC, the ADC maynot support interrupts, or the ADC may not be config-ured (in hardware) to monitor the voltage of the energystorage capacitor. Without such an ADC, programs mustbe able to fail at any time and still complete executioncorrectly. Making programs resilient to spontaneouspower failures is the domain of continuous checkpointingsystems. Continuous checkpointing systems must main-tain the abstraction of re-execution memory consistency(i.e., a section of code is unable to determine if it is beingexecuted for the first time or being re-executed by exam-

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 19

Page 5: Intermittent Computation without Hardware Support or Programmer

PropertyCommodity

Checkpointing[32, 11, 31, 34]

Energy HarvestingHW-assisted

[17, 3, 29]DINO [26] Idempotence [7, 13] Ratchet

Failure Rate Days/Weeks 100 ms 100 ms Days/Weeks 100 msRequires Varies HW+Compiler Programmer+Compiler Compiler CompilerFailure Type Transient Fault Power Loss Power Loss Transient Fault Power LossMemory type DRAM+HDD FRAM SRAM+FRAM DRAM+HDD FRAMChkpt. Trigger Time Low Voltage Task Boundary Register WAR NV WARChkpt. Contents Varies VS VS+NV TV — VSOverhead Varies 0–145% [3] 80–170% 0–30% 0–115%Primary Factor Varies Measurement Task Size # Faults Section Size

Table 1: Requirements and behavior of different checkpointing/logging techniques. WAR represents a Write-After-Read dependence, VS representsVolatile State (e.g., SRAM and registers), NV represents Non-Volatile memory (e.g., FRAM), NV TV represents Task Variables stored in Non-Volatile memory, and Measurement represents the added time and energy consumed by using voltage-monitoring hardware.

ining memory)2. To maintain re-execution memory con-sistency, continuous checkpointing systems periodicallycheckpoint volatile state and guard against inconsistentupdates to non-volatile memory. DINO [26] does thisthrough data versioning, while Ratchet does this throughmaintaining idempotence. Table 1 shows that this classof approach also yields low total overheads, with theprimary source being time spent checkpointing.

2.2 Memory VolatilityAnother consideration for energy harvesting devices

is the type, placement, and use of non-volatile memory.While the initial exploration into support for energy har-vesting devices, Mementos [39], focuses on supportingFlash-based systems with mixed-volatility main mem-ory, all known follow-on work focuses on emergingsystems with Ferroelectric RAM (FRAM)-based, non-volatile, main memory [17, 3, 26, 27]. This transi-tion is necessary as several properties of Flash makeit antithetical to harvested energy. The primary reasonFlash is ill suited is its energy requirements. Flashworks by pushing charge across a dielectric. Doing sois an energy intense operation requiring high voltage thatmakes little sense when the system is about to run out ofpower. In fact, on MSP430 devices, Flash writes fail at amuch higher voltage than the processor itself fails [5]—increasing the energy wasted in the guard band. A secondlimitation of Flash is that most programs avoid placingvariables there, increasing the amount of volatile statethat requires checkpointing. Flash writes, beyond beingenergy expensive, are slow and complex. Updating a

2Note that this is a relaxation on the requirement for deterministicre-execution [48, 31, 30, 33], where it is required that each re-executionproduce the same exact result. Our problem only requires that re-executions produce a semantically correct execution.

variable stored in Flash requires erasing a much largerblock of memory and rewriting all data, along with theone updated value. This process adds complexity toapplications and increases write latency over FRAM bytwo-orders of magnitude.

In comparison with Flash memory, FRAM boasts ex-tremely low voltage writes, as low as a single volt [35].Writes to FRAM are also nearly as fast as writes toSRAM and are bit-wise programmable. The flexibilityand low overheads of FRAM allows for processor de-signers to create wholly non-volatile main memory, de-creasing the size and cost of checkpoints. This opens thedoor to continuous checkpointing systems as the cost ofcheckpointing is outweighed by the power requirementof the ADC. While, like previous approaches, we focuson FRAM due to its commercial availability, there arecompeting non-volatile memory technologies (e.g., Mag-netoresistive RAM [46] and Phase Change RAM [40])that we expect to work equally well with Ratchet. Mov-ing program data to non-volatile memory does comewith a cost: previous work reveals that mixing non-volatile and volatile memory is prone to error [38, 26].Ratchet deals with this by pushing such complexity intothe compiler, relieving the programmer from the burdenof reasoning about mixed volatility main memory and theeffects of power cycles.

3 DesignRatchet seeks to extend computation across common

case power cycles in order to enable programs on energyharvesting devices to complete long-running computa-tion with unreliable power sources. Ratchet enablesre-execution after a power failure by saving volatilestate to non-volatile memory during compiler-insertedcheckpoints. However, checkpointing alone is insuffi-

20 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Page 6: Intermittent Computation without Hardware Support or Programmer

(a) (b) (c)

Figure 3: The same code executed without failures, with failures, and with failures with Ratchet. This basic example illustrates one of the difficultieswith using non-volatile main memory on intermittently powered computers: failures can create states not possible given program semantics.

cient to ensure correct computation due to the problemswith maintaining consistency between volatile and non-volatile memory give unpredictable failures [38, 26]. Toensure correct re-execution, we use compiler analysisto determine sections of code that may be re-executedfrom the beginning, without producing different results;a property called idempotence. After decomposing pro-grams into idempotent sections, we connect the indepen-dent sections with checkpoints of volatile state. Thisensures that after a power failure the program will re-sume with a view of all memory identical to the first(attempted) execution3.

3.1 Idempotent SectionsIdempotent sections are useful because they are natu-

rally side effect free. Nothing needs to be changed aboutthem in order to protect against memory consistencyerrors that may arise from partial execution. By recog-nizing code sections with this property, Ratchet is able tofind potentially long sequences of instructions that canbe re-executed without any additional work required toensure memory consistency.

De Kruijf et al. identify idempotent sections by look-ing for instruction sequences that perform a Write-After-Read (WAR) to the same memory address [7]. Undernormal execution, overwriting a previously read memorylocation is inconsequential. However, on systems thatroll back to a previous state for recovery (due to potentialissues such as power failures), overwriting a value thatwas previously read will cause a different value to beread when the code section is re-executed. Figure 3shows an example of how re-executing a section of codewith a WAR dependency may introduce execution thatdiverges from program semantics.

In order to prevent these potential consistency prob-

3We are not saying the memory must be identical. We are sayingthat the values that a given idempotent section of code reads areidentical to the initial execution. Idempotency enables this relaxation.

lems Ratchet inserts a checkpoint between the write andthe read. This breaks the dependency, separating theread and the write in different idempotent sections. Thisensures that the read always gets the original value andthe write will be contained in a different idempotentsection, where it is free to update the value. Note thata sequence of instructions that contains a WAR may stillbe idempotent if there exists a write to the same memoryaddress before the first read. For example a WARAWdependency chain is idempotent since the first write ini-tializes the value stored at the memory address so thateven if it is changed by the last write, it will be restoredupon re-execution before the address is read again. Notethat this holds for a potentially infinite sequence of writesand reads, the sequence will be idempotent if there is awrite before the first read.

It is important to remember that in order for a loadfollowed by a store cause consistency problems, theymust read and modify the same memory. In order todetermine which instructions rely on the same memorylocations, Ratchet uses intraprocedural alias analysis dueto its availability, performance, and precision. Aliasanalysis conservatively identifies instructions that mayread or modify the same memory locations. Since thealias analysis is intraprocedural we conservatively as-sume all stores to addresses outside of the stack framemay alias with loads that occurred in the caller. Thisforces Ratchet to insert a checkpoint along any controlflow path that includes a store to non-local memory.

After finding all WARs we use a modified hitting setalgorithm to insert the minimum number of checkpointsbetween the loads and stores. The algorithm works byassigning weights to different points along the controlflow graph based upon metrics such as loop depth andthe number of other idempotency violations intersected.It uses these metrics to identify checkpointing locationsthat prevent all possible idempotency violations whiletrying to avoid locations that will be re-executed more

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 21

Page 7: Intermittent Computation without Hardware Support or Programmer

often than necessary. For example, do not checkpointwithin a loop if a checkpoint outside the loop will sep-arate all WARs. For a more in-depth discussion of thisalgorithm, we refer you to de Kruijf et al. [7].

3.2 Implicit Idempotency ViolationsWhile looking for WARs identifies the majority of

code sequences that violate idempotence, some instruc-tions may implicitly violate idempotence. A pop in-struction can be modeled by a read and subsequentupdate to the stack pointer. This update immediatelyinvalidates the memory locations just read by allowingfuture push instructions to overwrite the old values. Ona system with interrupts, this scenario occurs when aninterrupt fires after a pop instruction. In this case, thepop instruction will read from the stack and update thestack pointer. When the interrupt occurs it will performa push instruction to callee save registers, in order topreserve the state of the processor before the interruptfired. However, the state saved by the interrupt is writtento the stack addresses that were read by the initial pop. Ifthe system re-executes the pop instruction, it will read avalue from the interrupt handler—not the original value!This behavior forces Ratchet to treat all pop instructionsas implicit idempotency violations since interrupts arenot predictable at compile time.

In order to enable a checkpoint before the slots on thestack are freed Ratchet exchanges pop instructions fora series of instructions that perform the same function.The first duty of the pop instruction is to retrieve aset of values from the stack and insert them into reg-isters. Ratchet emulates this by inserting a series ofload instructions to retrieve these values from the stackand place them in their respective registers. Note thatthe load instructions do not update the stack pointer soany interrupt that fires will push the new values abovethese values. After the data is retrieved from the stack,Ratchet inserts a checkpoint. Finally, Ratchet insertsan update to the stack pointer to free the space previ-ously occupied by the values that we just loaded intoregisters. By emitting this sequence of instructions wehave deconstructed an atomic read write instruction thatis an implicit idempotency violation and replaced it by aseries of instructions that enable separation of potentialidempotency violations.

3.3 CheckpointsIn between each naturally occurring idempotent sec-

tion, we insert checkpoints in order to save all volatilememory necessary to restart from the failure. In emerg-ing systems we observe non-volatile memory movingcloser and closer to the CPU, so far that it has non-volatile RAM [42]. In such a system, all that is needed to

Figure 4: Shows the relationship between checkpoint overhead andlive registers. A checkpoint’s cost is the number of cycles it takes tocommit and its weight refers to how often they occur in our benchmarksrelative to the total number of checkpoints.

restore state are the values stored in the registers. In fact,not all registers are necessary, only registers that are usedas inputs to instructions that occur after the checkpointlocation. These registers are denoted live-in registers.

Traditionally, compilers keep a list of live-in and live-out registers to determine which registers are needed ina basic block to perform some computation and whichones are unused and can be reallocated to reduce registerspilling. This information is available to the compilerafter registers have been allocated. We are interested inwhich registers are live-in to a checkpoint because theydenote the volatile memory needed to correctly restartfrom a given location in a program. Figure 4 shows therelationship between checkpoint overhead and number oflive-in registers.

In order to prevent power failures during a check-point from causing an inconsistent state, we use a dou-ble buffering scheme. One buffer holds the previouscheckpoint, while the other is used for writing a newcheckpoint. A checkpoint is committed by updating thepointer to the valid checkpoint buffer as the last part ofwriting the checkpoint. We tolerate failures even whiletaking a checkpoint by never overwriting a checkpointuntil a more recent checkpoint is available in the otherbuffer. The atomicity of the store instruction for a singeword ensures that we always have a valid checkpointregardless of when we experience a power failure.

3.4 RecoveryIn order to recover from a power failure, we insert

code before the main function to check to see if there ex-ists a valid checkpoint. If so, we determine that we haveexperienced a power failure and need to restore state,

22 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Page 8: Intermittent Computation without Hardware Support or Programmer

otherwise we begin executing from main. Restoring stateconsists of moving all saved registers into the appropriatephysical registers. Once the program counter has beenrestored, execution will restart from the instruction afterthe most recently executed checkpoint.

In the case that some idempotent sections are too long,that is, power may repeatedly fail before a checkpoint isreached, we use a timer that triggers an interrupt in orderto ensure forward progress. Each interrupt checks to seeif a new checkpoint has been taken since the last timeit was called. It does this by zeroing-out the programcounter value in the unset checkpoint buffer each time itis called. It can then tell if a checkpoint has been takensince its last call and only checkpoint if the programcounter of the unused checkpoint buffer is still zero.

When checkpointing from the timer interrupt it isimpossible to foresee which registers are still live andwe must instead conservatively save all of the registersto non-volatile memory. There exists a trade off whenselecting the timer speed. A timer that is short increasesthe overhead due to checkpointing, while a timer that islong increases re-execution overhead. Without additionalhardware to measure environmental conditions, the timercan be set on the order of experts estimation of averagelifetime for transiently powered devices [39]. Note thatfor our benchmarks, a timer was not needed in order tomake forward progress.

3.5 ChallengesDuring implementation of Ratchet we encountered a

number of design challenges with actually implementingour ideal design. Most of these challenges related tobeing entrenched at different levels of abstraction withinthe compiler. While the code is in the compiler’s in-termediate representation (IR), powerful alias analysisand freedom from architecture level specifics made for alogical location to identify WAR dependencies and insertcheckpoints. However, these decisions are dependenton the choices made during the translation from thecompiler’s IR to machine code, such as which calls aretail calls and information about when register pressurecauses register spilling.

This semantic gap causes conservative decision mak-ing about where to place checkpoints, since the decisionsmade in the front end rely on assumptions about wherecheckpoints will be inserted in the back-end.

3.6 OptimizationsAs we began to implement our design, it became clear

that we were inserting checkpoints more frequently thanneeded. As a result of profiling our initial design weimplemented several optimizations to remove redundantcheckpoints.

r1 = mem[sp + 4]

checkpoint()

mem[sp + 4] = r2

...

r3 = mem[r2 + 8]

checkpoint()

mem[r2 + 8] = r4

r1 = mem[sp + 4]

r3 = mem[sp + 8]

checkpoint()

mem[sp + 4] = r2

...

checkpoint()

mem[sp + 8] = r4

a. b.

Figure 5: Two possible code sequences. In (a) each WAR dependencyis separated by a checkpoint, but the two checkpoints cannot be com-bined without violating idempotency. In (b), the second checkpointcould be moved to the same line as the first checkpoint since there areno potentially aliasing reads separating the two checkpoints.

3.6.1 Interprocedural Idempotency ViolationsBecause of the limits on interprocedural alias anal-

ysis, we initially conservatively inserted a checkpointon function entry. This protects against potential WARviolations between caller and callee code. We observedthat this was often conservative and could be relaxed. Acheckpoint is only necessary on function entry if thereexists a write that may alias with an address outside ofthe function’s local stack frame. In the presence of anoffending write, the function entry is modeled as thepotentially offending read (since an offending load couldhave occurred in the caller).

In addition, we noticed that some tail calls could beimplemented without any checkpoints. Since tail callsoperate on the stack frame of their caller, a checkpointon return is unnecessary, assuming that the tail call doesnot modify non-local memory. However, opportunitiesfor this optimization were observed to be limited due tothe difficulty of determining where to put checkpoints forintraprocedural WAR dependencies. This is a result ofthe semantic gap between compiler stages, when iden-tifying WAR dependencies, we do not yet have perfectinformation about which calls can be represented as tailcalls. We imagine a more extensive version of Ratchetthat takes information from each stage of the compilerpipeline and iteratively adjusts checkpoint locations tofind the near-minimal set of checkpoints that maintaincorrectness.

3.6.2 Redundant CheckpointsDue to the semantic gap between our alias analysis and

insertion of checkpoints in the front-end of the compiler(while the code is still in IR), and the instruction schedul-ing of the back-end, we observed cases where optimiza-tions or other scheduling decisions caused redundantcheckpoints. We consider redundant checkpoints to bea pair of checkpoints where any potential idempotency

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 23

Page 9: Intermittent Computation without Hardware Support or Programmer

violations could be protected with a single checkpoint.In general, a checkpoint can be relocated within its basicblock to any dominating location does not cross anyreads and any dominated location that does not crossany writes. This conservative rule follows even withoutknowing alias information, which allows us to reorderinstructions after machine code has been generated andwe no longer know which instructions generated theWAR dependency. Figure 5 shows an example of howcheckpoints can be combined safely by relocation.

3.6.3 Special Purpose RegistersSince all volatile state must be saved during a check-

point, all live special purpose registers must be savedalong with the live general-purpose registers. Some ofthe special purpose registers have higher costs to savethan the general purpose registers. In our experienceimplementing Ratchet, we found the cost of checkpoint-ing condition codes to be high. Instead of paying thisoverhead, we ensured checkpoints were placed after thecondition code information was consumed, while still en-suring all non-volatile memory idempotency violationswere cut. Ratchet does this by reordering instructionsto ensure a checkpoint is not placed between a condi-tion code generator and all possible consumers. Thisinstruction reordering is done with the same constraintsas combining checkpoints.

3.7 Architecture Specific TradeoffsThere are a number of architectural decisions that

influence the overhead of our design. Register-richarchitectures reduce the number of idempotent sectionbreaks required by reducing the frequency of registerspills, at the cost of increasing checkpoint size. Atomicread-modify-write instructions are incompatible with ourdesign since there is no way to checkpoint between theread and the write. On such an architecture, Ratchetcould separate the instruction into separate load and storeoperations by our compiler implementation.

4 ImplementationWe implement Ratchet using the LLVM compiler in-

frastructure [22]. Beyond verifying that Ratchet outputexecutes correctly on an ARM development board [41],we build an ARMv6-M [2] energy-harvesting simulator,with wholly non-volatile memory. The simulator allowsfor fine-grain control over the frequency, arrival time,and effects of power cycles, as well as allowing us toverify Ratchet’s correctness. The benchmarks we use forevaluation all depend on libc, so we also use Ratchet toinstrument newlib [45], and link our benchmarks againstour instrumented library.

4.1 CompilerWe build our compiler implementation on top of the

LLVM infrastructure [22]. We add a front end, IR levelpass that detects idempotent section breaks by trackingloads and stores to non-volatile memory that may alias(based on earlier work targeted at registers [7]). Thistop-level pass inserts checkpoint placeholders that areeventually passed to the back end where machine codeis eventually emitted. The back end replaces pop in-structions with non-destructive reads and a checkpointfollowed by an update to the stack pointer. Next, theback end relocates the inserted placeholders to minimizethe number of checkpoints required by combining themand avoiding bisecting condition code def-use chains.After register allocation, each placeholder is replaced bya function call to the checkpointing routine that savesthe fewest possible registers. We determine the minimalset of registers to save using the liveness informationavailable in the back end.

One compiler-inserted idempotency-violating con-struct that we modify the compiler to prevent is theuse of shared stack slots for virtual registers. If thecompiler were able to re-assign the same stack slot toa different virtual register, it would create an idem-potency violation as the original virtual register valueis overwritten by a different virtual register’s value.While it is possible to include some backend analy-sis to uncover such situations, we choose to sacrificesome stack space and prevent the sharing of stack slots.Achieving this in LLVM is as simple as adding the flag-no-stack-slot-sharing to the compile com-mand. In addition, we include the mem2reg optimiza-tion that causes some variables that would otherwise bestored in memory to be stored in registers. This reducesthe number of idempotency violations thereby reducingthe number of checkpoints and increasing idempotentsection length.

4.2 Energy Harvesting SimulatorWe also implement a cycle accurate ARMv6-M sim-

ulator. Many of the coming internet of things classdevices are choosing to use ARM devices as they are theperformance per Watt leader. As the price of new non-volatile memory technologies decreases and their speedincreases, we expect ARM to follow in the footsteps ofthe MSP430 [42] and move to a wholly non-volatile mainmemory. In fact, even Texas Instruments, the maker ofthe MSP430, recently moved to ARM for their newestMSP devices [43].

An energy-harvesting simulator is required because ofthe difficulties associated with developing and debuggingintermittently powered devices. Using a cycle accurate

24 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Page 10: Intermittent Computation without Hardware Support or Programmer

Figure 6: Runtime overhead for several versions of Ratchet.

simulator we are able to simulate failures with a proba-bility distribution that can model the true frequency andeffects of power failures experienced by devices in thereal world. Using a simulator also allows us to takeprecise measurements of how much progress a programmakes per power cycle and the cycles consumed by re-execution. Lastly, our simulator allows for us to verifythe correctness of Ratchet for every benchmark run.

4.3 Idempotent LibrariesIn order to ensure that each section of code that runs

is idempotent, we instrument all of the libraries neededby the device. In order to facilitate real applicationswe compile newlib [45], a basic libc and libm libraryaimed at embedded systems, with Ratchet. This requiresa modifying three lines in newlib’s makefile to prevent itfrom building these optimized versions of libc calls. Wedid this because any uninstrumented code could causememory consistency to be violated (due to idempotencyviolations) if power fails after a write-after-read depen-dency and before a checkpoint.

Lastly, we produce an instrumented version of theminimum runtime functions expected by clang that areincluded in the compiler-rt project [1]. These functionsimplement the libgcc interfaces expected by most com-pilers. As with newlib, we use only the bare minimumoptimized assembly implementations. Those that we use,we insert checkpoints by hand between potential write-after-read dependencies. Thankfully, this only needs tobe done once by the library’s author.

Note that Ratchet supports assembly as long as theassembly is free of idempotence violations or all poten-tial idempotency violations are separated by checkpoints.With additional engineering effort, it is possible to cre-

ate a tool that inserts these checkpoints automaticallythrough static analysis of the assembly [8].

5 EvaluationIn order to provide a comparison against other check-

pointing solutions for energy harvesting devices, weevaluate Ratchet on benchmarks common to these ap-proaches, namely, RSA, CRC, and FFT. For a morecomplete analysis, we port4 several benchmarks fromMiBench, a set of embedded systems benchmarks cate-gorized by run-time behavior [14]. Expanding the bench-mark set used to evaluate energy harvesting systems iscrucial, because testing with a wide range of programbehaviors and truly long-running programs is more likelyto expose an approach’s tradeoffs.

Unless otherwise noted, we compile all benchmarkswith -O2 as the optimization level. We choose the-O2 level since it includes most optimizations whileavoiding code size versus speed tradeoffs. As Section 5.3illustrates, Ratchet supports all of LLVM’s optimizationlevels. We use an average lifetime of 100 ms to matchthe setup of previous works. With an average lifetimeof 100 ms running with a 24 MHz clock, this gives us amean lifetime of 2,400,000 cycles. Before each bout ofexecution, the simulator samples from a Gaussian whosemean is the desired mean lifetime (in cycles) and usesthat value as the number of clock cycles to execute forbefore inducing the next power cycle. To simulate apower cycle, we clear the register values.

Given this experimental setup, we set out to answerseveral key questions about Ratchet:

4Some of these applications require input that is read in from afile. Since many energy-harvesting systems do not include an operatingsystem or file system, we instead compile the input into the binary.

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 25

Page 11: Intermittent Computation without Hardware Support or Programmer

Figure 7: The average number of cycles per idempotent section break.

1. Does Ratchet stretch computation across frequent,unpredictable losses of power correctly?

2. What is the overhead of running Ratchet due tocheckpoints and re-execution?

3. Is Ratchet compatible with compiler optimizations?4. What impact does Ratchet have on code size?We use the results of this evaluation to compare

Ratchet against alternative approaches that require hard-ware support or programmer intervention. See the resultsof this comparison in Table 1 and Section 7.2.

5.1 PerformanceTo understand the effects of power cycles on long-

running programs and the overhead of Ratchet, we per-form 10 trials of each benchmark with power failures asdescribed earlier. Figure 6 displays the results of thisexperiment for each benchmark, averaged and normal-ized to the run time of the benchmark executed withoutpower failures. The first thing to note is that 6 out of8 of the benchmarks fail to complete execution on har-vested energy without Ratchet (w/o Ratchet). Thereare several other results for each benchmark that repre-sent successive Ratchet optimizations (all levels exceptIdeal maintain correctness): Ratchet shows the per-formance from our naive implementation; RatchetFEdenotes placing a checkpoint at function entry only whenthere is a store to a non-local variable that is not precededby a checkpoint; RatchetFE+RD is RatchetFE, butwith duplicate checkpoints removed in LLVM’s back-end; RatchetFE+RD+LR adds a further optimizationof only checkpointing the live-in registers; and finally,Ideal represents a lower bound on Ratchet’s overheadthat assumes perfect intraprocedural alias analysis andzero idempotence violations. Ideal bounds what is

Figure 8: Re-execution overhead decreases as failure frequency in-creases. Note that CRC and RSA have zero re-execution overheadthroughout since they are short enough to complete in a single powercycle.

possible with more compiler engineering.We observe an average run-time overhead of 58.9%

using Ratchet+FE+RD+LR—a 20.1% improvementover Ratchet. The Ideal result suggests that furthercompiler engineering can reduce this overhead by over60%. The total overhead includes run-time overhead dueto saving checkpoints and re-execution, but checkpointoverhead dominates total overhead, because re-executionoverhead approaches zero.

Looking at Figure 6, we can see that overhead variesdramatically between benchmarks. This shows that per-formance of our method is highly program dependent.Intuitively, this makes sense. If one program includes animplicit WAR dependence buried deep in the hot sectionsof code and another has very few WAR dependencies,we would expect their run times to vary dramatically. Inorder to determine the effect of idempotent section lengthof a benchmark on performance we measure the numberof cycles between each checkpoint commit. To measurethis, we instrument our simulator to measure the numberof cycles5 between each call to any of our checkpointingfunctions. We then run each benchmark to completion,without power cycles. The average number of cyclesper idempotent section for each benchmark is shown inFigure 7. By comparing Figures 6 and 7 we notice thatprograms with shorter idempotent sections have higheroverheads.

We also investigate the relationship between idempo-tent section length and re-execution overhead. To do this,we instrument our simulator to measure the number of

5Most instructions take a single clock cycle to complete in theARMv6-M instruction set.

26 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Page 12: Intermittent Computation without Hardware Support or Programmer

Figure 9: The runtime overhead of each benchmark compiled withRatchet at various LLVM optimization levels normalized to the runtimeof the uninstrumented benchmark compiled at -O0.

cycles from the last checkpoint to a checkpoint restore.This includes the cycles spent executing code that occursafter the last checkpoint and the cycles spent restartingand restoring the last checkpoint. Figure 8 shows thefraction of run-time overhead due to re-execution for arange of average power-on times. While short idempo-tent sections tend to cause higher overall overhead dueto checkpoint overhead dominating the total overhead,Figure 8 combined with Figure 7 shows that bench-marks with shorter idempotent sections have lower re-execution costs. This is reasonable considering that afailure halfway through an idempotent section requiresthe program to re-execute from the last checkpoint. Themore cycles since the last checkpoint, the higher there-execution overhead. This suggests that increases inidempotent section length eventually will expose re-execution time as a key component of overhead.

5.2 CorrectnessWe validate Ratchet’s correctness using both formal

and experimental approaches. First, to test that Ratchetenforces idempotency with respect to non-volatile mem-ory, we instrument the simulator to log reads and detectWAR dependencies that occur during execution. We con-sider a program to fail if there exists any load and sub-sequent store, to the same address, that are not separatedby a checkpoint6. Second, to test that Ratchet enableslong-running execution even with power-cycle-inducedvolatile state corruption, we simulate random power fail-ures like those experienced in energy-harvesting devices

6This was especially helpful in debugging, as it exposed missedWAR dependencies, such as ones caused by spilling registers onto thestack due to register pressure, in early prototypes.

Program Ratchet Uninstrumented Change

AVERAGE 563720 560824 1.79%rsa 41326 40694 1.55%crc 36037 34677 3.92%FFT 182362 183612 -0.68%sha 3286631 3284544 0.06%picojpeg 379134 373051 1.63%stringsearch 184656 177567 3.99%dijkstra 183554 178465 2.85%basicmath 216053 213978 0.96%

Table 2: Code size increase due to Ratchet (sizes are in bytes).

and verify the results. We check the validity by runningdifferent sequences of failures and hashing memory con-tents and registers at the completion of each benchmarkrun to compare the hash to the hash of the ground truthrun without power failure. Lastly, to ensure Ratchetworks even in the most energy starved environments werepeat this experiment with lifetimes as short as 1 ms.

5.3 Impact of Compiler OptimizationsIn order to understand the performance of Ratchet

under varying compiler optimizations, we benchmarkRatchet across each LLVM optimization level. Figure 9shows the performance of the benchmarks compiled,with Ratchet, at different optimization levels relative touninstrumented benchmarks compiled at -O0. We ob-serve that in general, traditional compiler optimizationsimprove the performance of Ratchet. However, it alsoshows that in some cases, aggressive optimization resultsin higher perceived overheads. This suggests that break-ing the program into idempotent sections can not onlyreduce the efficiency of optimizations, but also causethem to be detrimental (see Dijkstra).

5.4 Code size increase from RatchetCode size is a critical constraint for many energy-

harvesting devices. In order to evaluate Ratchet’s prac-ticality with respect to code size, we measure the effectRatchet has on code size. On average, Ratchet increasesthe size of the program by 1.79% or 2896 bytes. Thisincrease is caused by adding our checkpoint recoverycode, a number of optimized checkpointing functions,checkpoint calls throughout the program, and exchang-ing pop instructions for loads. Table 2 shows the changein code size for each benchmark. We notice that about1356 bytes are a result of the additional function callsand reserved areas in memory. The rest of the code sizeincrease can be attributed to inserted code or code thathas been rewritten to support Ratchet, namely the trans-lation of pop instructions. Note that the FFT program

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 27

Page 13: Intermittent Computation without Hardware Support or Programmer

actually sees a decrease in size. We suspect this mightbe a result of Ratchet limiting optimization opportunitiessuch as loop unrolling.

6 DiscussionThere are several issues that represent corner-cases to

Ratchet’s design, such as avoiding repeating outputs onre-execution, instrumenting hand-written assembly, anddealing with the effects of power cycles too short to makeforward progress. Instead of distracting from the coredesign, we discuss them here.

6.1 Output Commit ProblemIn any replay-based system, there is a dilemma called

the output commit problem, which states that a processshould not send data to the outside world until it knowsthat that data is error free. The output commit problemtakes on new meaning for energy harvesting devices;the problem is one of sending multiple outputs whenonly one should have been sent in a correct execution.This problem occurs during the re-execution of a codesection that created an output during an earlier execu-tion. Imagine an LCD interface on an embedded system,where there is re-execution while printing to the screen.We suggest placing checkpoints immediately before andafter these output instructions to minimize the chance ofre-execution due to the instruction being at the end ofa long idempotent section. We believe that there is awealth of future work on making protocols themselvesrobust against the effects of intermittent computation.One step in this direction is delay/disruption-tolerantnetworking [12].

6.2 Hand-Written AssemblyHand-written assembly poses another challenge for

Ratchet. Because it is never transformed into LLVMIR we cannot run our passes to determine if there areidempotency violations, and if so, where they occur. Onenaive approach is to simply checkpoint before and afterthe assembly. While that is a good start, there still re-mains the possibility of idempotency violations betweenloads and stores within a section of assembly. While wehand-process the assembly files required for newlib, andlibgcc it is possible to create a tool to perform this pro-cessing automatically [8]. We choose not to write sucha tool, since only three assembly files have idempotencyviolations. Instead we instrument those by hand.

6.3 Ensuring CompletionExtremely short power cycles pose a problem in that

they prevent programs from making forward progressand thus never complete execution. Ratchet handles this

problem both at compile time and at run time. At compiletime, the programmer informs Ratchet of the expectedon time for the device. Ratchet uses this information tomake sure that no idempotent section is longer than theexpected on time. Note that adding arbitrary checkpoints(i.e., artificial idempotent section breaks) only affectsoverhead, not correctness. The second approach is to usethe watchdog timer (commonly available on embeddedsystems) to insert checkpoints dynamically, when thesequick power cycles prevent forward progress. That beingsaid, the longest idempotent sections in our benchmarks(roughly 5000 cycles) require on-times of less than 0.2ms to expose this issue.

7 Related WorkRatchet builds upon previous work from three broad

categories of research: rollback recovery, intermittentlypowered computing, and idempotence. We start by ex-amining the history of general-purpose checkpointingand logging schemes, followed by how previous ap-proaches to stretching program execution across frequentpower cycles have so far adapted those general-purposeapproaches. Lastly, since Ratchet builds upon previouswork on idempotence, we cover that previous work.

7.1 Rollback RecoveryResearch from the fault tolerance community presents

the idea of backward error recovery, “backing up one ormore of the processes of a system to a previous statewhich is hoped is error-free, before attempting to con-tinue further operation” [36]. Backward error recoverysystems often choose checkpointing as the underlyingtechnique [11]. CATCH is an example of a check-pointing system that leverages the compiler to providetransparent rollback recovery [24].

The challenge of checkpointing is identifying andminimizing the state that needs to be protected by acheckpoint [36]. BugNet [31] eliminates the need forfull-state checkpoints, while maintaining correctness, bylogging the values loaded after an initial checkpoint.This represents a dramatic reduction in overhead as itis unlikely that a process reads the entire contents ofmemory. Revive [34] reduces overhead further (up to50%) by using undo logging to record stores instead ofloads—assuming a reliable memory.

Ratchet further reduces the bandwidth required by log-ging/checkpointing by extending idempotency to mainmemory. Idempotency tells us that only a subset of pro-gram stores actually require logging—those that earlierloads depend on. By checkpointing at stores that aliaswith earlier loads (since the last checkpoint), Ratchet is

28 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Page 14: Intermittent Computation without Hardware Support or Programmer

able to increase, by orders of magnitude, the number ofinstructions between log entries/checkpoints.

7.2 Intermittently Powered ComputingResearch into fault tolerant computing traditionally

targets persistently powered computers. However, newultra-low-power devices challenge traditional power re-quirements, making them ideal candidates for energyharvesting techniques. Unfortunately, these techniquesprovide unreliable power availability that results in frag-mented and incorrect execution—violating the assump-tions of continuous power and infrequent errors. Be-cause of this, previous work on intermittent computationadopts known rollback recovery techniques, but mustadapt them to the properties of this new domain.

7.2.1 Hardware-assisted CheckpointingMementos [39] is the first system that attempts to

tackle the intermittent computation problem through theuse of a one-time (ideally) checkpoint. Mementos usesperiodic voltage measurements of the system’s energystorage capacitor to estimate how much energy remains.The goal is to checkpoint as late as possible. Me-mentos provides three ways to control the periodicityof voltage measurement: (1) function triggered: volt-age measurement after every function return; (2) looptriggered: voltage measurement after every loop itera-tion; and (3) timer assisted: a countdown timer addedto either of the first two variants that gates whether avoltage measurement is needed. Unfortunately, whileMementos works well when programs write to volatilestate only, recent research shows that Mementos is in-correct in the general case [26, 38]. The problem isthat Mementos allows for uncheckpointed work to oc-cur. If uncheckpointed work updates both volatile andnon-volatile memory, only the non-volatile memory willpersist, creating an inconsistent software state. Applyingthe idea of Mementos to wholly non-volatile memory, asdone by QUICKRECALL [17], actually makes the prob-lem worse: in contrast to Flash-based systems, in sys-tems with wholly non-volatile main memory, all programdata is stored in non-volatile memory. This makes it anear certainty that a program will write to non-volatilememory after a checkpoint, leading to an inconsistentsoftware state.

Hibernus [3] addresses QUICKRECALL’s correctnessissues through the introduction of guard bands. A guardband is a voltage threshold that represents the amount ofenergy required to store the largest possible checkpointto non-volatile memory in the worst-case device andenvironmental conditions. Execution occurs while thevoltage is above the threshold, but hibernates when the

voltage is below the threshold. Doing this ensures thatall work gets checkpointed, at the cost of wasting energywaiting for voltage to build up to the threshold and in thetime after a non-worst-case checkpoint—there is no safeway to scavenge unused energy.

Hibernus also improves upon Mementos in termsof performance. Hibernus employs a more advancedanalog-to-digital converter (ADC) (for voltage measure-ment) that allows software to set a threshold value thattriggers an interrupt when the voltage goes below thethreshold. This removes all software overhead causedby periodically checking the voltage (similar to pollingversus interrupts in software).

Comparing voltage-triggered checkpointing systemsto Ratchet is difficult. Fortunately, Hibernus provides adetailed performance comparison between itself and Me-mentos by implementing Mementos on their wholly non-volatile development platform. Their experimental re-sults at 100ms on-time show that Mementos’s total over-head varies between 117% and 145%, approximately,depending on the variant of Mementos. Hibernus’spolling reduces total overhead to 38%, approximately. Incomparison, Ratchet’s overhead for the same benchmarkprogram (potentially different inputs and configuration)is a comparable 32%.

In deciding between a one-time, voltage-triggeredcheckpointing scheme and Ratchet, the biggest factoris the ADC. Mandating an ADC is a non-starter formany existing systems that either do not have one al-ready or systems that have one, but not connected in away that supports power monitoring. This is a problemeven for Mementos, “Voltage supervisors are commoncircuit components, but most—crucially, including ex-isting prototype RFID-scale devices—-do not feature anadjustable threshold voltage.” For future systems andthose existing systems that have an ADC capable ofvoltage monitoring, even the most power efficient ADCsconsume as much as 1/3 of the power of today’s ultra-low-power microcontrollers [5]. The future is direr asperformance/Watt tends to scale at 2x every 1.57 yearsfor processors (known as Dennard scaling [10]), whileADC’s performance/Watt tends to scale at 2x every 2.6years [19].

7.2.2 Software-only CheckpointingDINO [26], is a software-only, programmer-driven

checkpointing scheme for tolerating an unreliable powersource. Programmers leverage the DINO compiler todecompose their programs into a series of independenttransactions backed by data versioning to achieve mem-ory consistency in the face of intermittent execution.To accomplish this, the authors rely on programmer

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 29

Page 15: Intermittent Computation without Hardware Support or Programmer

annotated logical tasks to define the transactions. Con-trast this with Ratchet, which automatically decomposesprograms using idempotency information inherent to aprogram’s structure—allowing programmers to ignorethe effects of power cycles and mixed volatility memory.

DINO enforces a higher-level property than doesRatchet, namely task atomicity, i.e., tasks either com-plete or re-execute from the beginning, as viewed byother tasks on the system. This contrasts Ratchet, whichallows intermediate writes as long as they maintain idem-potency. Enforcing task atomicity happens to also solvethe problem of intermittent computation (assuming theprogrammer defines short tasks), but enforcing a morerestrictive property incurs more run-time overhead, 80%to 170%. Note that even though DINO is evaluated on amixed-volatility system, these numbers are comparableto Ratchet’s because DINO’s overhead is more depen-dent on the amount of state a task may update than thevolatility of that state.

7.3 IdempotenceMahlke et al. develops the idea of idempotent code

sections in creating a speculative processor [28]. Theauthors construct restartable code sections that are bro-ken by irreversible instructions, which, “modifies anelement of the processor state which causes intolerableside effects, or the instruction may not be executed morethan one time.” They use this notion to define how tohandle a speculatively executing instruction that throwsan exception and show that they can use the idempotenceproperty to begin execution from the start of the inter-rupted section and still follow a valid control-flow path.

Kim et al. applies the idea of idempotency to datastorage, showing that idempotency is useful for reduc-ing the amount of data stored in speculative storageon speculatively multithreaded architectures [21]. Theauthors note that there are idempotent references that areindependent of the memory dependencies that result inerrors in non-parallizable code.

Encore is a software-only system that leverages idem-potency for probabilistic rollback-recovery [13]. Tar-geted at systems using probabilistic fault detectionschemes, Encore provides rollback recovery withoutdedicated hardware. The key insight of Encore is proba-bilistic idempotence: ignoring infrequently executed in-structions that break idempotence can increase the lengthof idempotent sections. While this violates correctness—something Ratchet must maintain—the authors realizeda performance improvement at the cost of not recovering3% of the time.

De Kruijf et al. [7] presents an algorithm for iden-tifying idempotent regions of code and show that it is

possible to segment a program into entirely idempotentregions with minimal overhead. In this initial work,the authors focus their attention on soft faults that donot mangle the register state, noting that registers areusually protected by other means. While soft faults maynot corrupt register state, power failures cause the entireregister file to be lost.

In follow-on work, de Kruijf et al. [6] presents al-gorithms that utilize the idempotence information gen-erated by the idempotent compiler to better inform theregister allocation step of compilation. This allows thecompiler to extend the live range of registers. Extendingthe live range of register values that are live-in to a givenidempotent section all the way to the end of that sectioncreates a free checkpoint that enables recovery from side-effect free faults. In contrast, faults on energy harvestingcome with significant side effects.

8 ConclusionRatchet is a compiler that automatically enables the

correct execution of long-running applications on de-vices that run on harvested energy, without hardwaresupport. Ratchet leverages the notion of idempotenceto decompose programs into a series of checkpoint-connected, re-executable sections. Experiments showthat Ratchet stretches program execution across random,frequent, power cycles for a wide range of programs—that would not be able to run completely otherwise—at a cost of less than 60% total run-time overhead.Ratchet’s performance is similar to existing approachesthat require hardware support or programmer reasoning.Experiments also show that, with more engineering, it ispossible to reduce run-time overheads to around 20%.

Ratchet shows that it is possible for compilers toreason about frequent failures and volatile versus non-volatile memory in languages not designed with eitherin mind. Pushing these burdens on the compiler opensthe door for non-expert programmers to code for energyharvesting devices.

AcknowledgmentWe thank our shepherd Y. Charlie Hu for his guidance

and the anonymous reviewers for their feedback andsuggestions. This work was supported in part by C-FAR, one of the six SRC STARnet Centers, sponsoredby MARCO and DARPA.

30 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association

Page 16: Intermittent Computation without Hardware Support or Programmer

References[1] compiler-rt runtime [computer software]. Retrieved from

http://compiler-rt.llvm.org/, Mar 2015.

[2] ARM. ARMv6-M Architecture Reference Manual, Sept 2010.

[3] BALSAMO, D., WEDDELL, A., MERRETT, G., AL-HASHIMI,B., BRUNELLI, D., AND BENINI, L. Hibernus: Sustainingcomputation during intermittent supply for energy-harvestingsystems. IEEE Embedded Systems Letters 7, 1 (2014), 15–18.

[4] BUETTNER, M., PRASAD, R., SAMPLE, A., YEAGER, D.,GREENSTEIN, B., SMITH, J. R., AND WETHERALL, D. RFIDsensor networks with the Intel WISP. In Conference on Embed-ded Networked Sensor Systems (2008), SenSys, pp. 393–394.

[5] DAVIES, J. H. MSP430 Microcontroller Basics, 1 ed. ElsevierLtd., 2008.

[6] DE KRUIJF, M., AND SANKARALINGAM, K. Idempotent codegeneration: Implementation, analysis, and evaluation. In Interna-tional Symposium on Code Generation and Optimization (2013),CGO, pp. 1–12.

[7] DE KRUIJF, M. A., SANKARALINGAM, K., AND JHA, S. Staticanalysis and compiler design for idempotent processing. In Con-ference on Programming Language Design and Implementation(2012), PLDI, pp. 475–486.

[8] DEBRAY, S., MUTH, R., AND WEIPPERT, M. Alias analysis ofexecutable code. In Symposium on Principles of ProgrammingLanguages (1998), POPL, pp. 12–24.

[9] DEBRUIN, S., CAMPBELL, B., AND DUTTA, P. Monjolo: Anenergy-harvesting energy meter architecture. In Conference onEmbedded Networked Sensor Systems (2013), SenSys, pp. 18:1–18:14.

[10] DENNARD, R. H., GAENSSLEN, F. H., RIDEOUT, V. L., BAS-SOUS, E., AND LEBLANC, A. R. Design of ion-implantedMOSFET’s with very small physical dimensions. IEEE Journalof Solid-State Circuits 9, 5 (Oct 1974), 256–268.

[11] ELNOZAHY, E. N., AND ZWAENEPOEL, W. Manetho: Trans-parent roll back-recovery with low overhead, limited rollback,and fast output commit. IEEE Transactions on Computers 41,5 (May 1992), 526–531.

[12] FARRELL, S., CAHILL, V., GERAGHTY, D., HUMPHREYS, I.,AND MCDONALD, P. When TCP breaks: Delay and disruptiontolerant networking. IEEE Internet Computing 10, 4 (July 2006),72–78.

[13] FENG, S., GUPTA, S., ANSARI, A., MAHLKE, S. A., ANDAUGUST, D. I. Encore: Low-cost, fine-grained transient fault re-covery. In International Symposium on Microarchitecture (2011),MICRO, pp. 398–409.

[14] GUTHAUS, M., RINGENBERG, J., ERNST, D., AUSTIN, T.,MUDGE, T., AND BROWN, R. MiBench: A free, commer-cially representative embedded benchmark suite. In InternationalWorkshop on Workload Characterization (2001), pp. 3–14.

[15] HICKS, M. Mibench port targeted at IoT devices. https://github.com/impedimentToProgress/MiBench2,2016.

[16] HICKS, M. Thumbulator: Cycle accurate ARMv6-m instruction set simulator. https://github.com/impedimentToProgress/thumbulator, 2016.

[17] JAYAKUMAR, H., RAHA, A., AND RAGHUNATHAN, V.QUICKRECALL: A low overhead hw/sw approach for enablingcomputations across power cycles in transiently powered com-puters. In International Conference on Embedded Systems andInternational Conference on VLSI Design (2014), pp. 330–335.

[18] JOGALEKAR, A. Moore’s law and battery technology: No dice.Scientific American (Apr 2013).

[19] JONSSON, B. E. A survey of A/D-converter performance evo-lution. In International Conference on Electronics, Circuits, andSystems (Dec 2010), ICECS, pp. 766–769.

[20] KAHN, J. M., KATZ, R. H., AND PISTER, K. S. J. Next centurychallenges: Mobile networking for smart dust. In InternationalConference on Mobile Computing and Networking (1999), Mo-biCom, pp. 271–278.

[21] KIM, S. W., OOI, C.-L., EIGENMANN, R., FALSAFI, B., ANDVIJAYKUMAR, T. N. Exploiting reference idempotency to re-duce speculative storage overflow. ACM Trans. Program. Lang.Syst. 28, 5 (2006), 942–965.

[22] LATTNER, C., AND ADVE, V. LLVM: A compilation frameworkfor lifelong program analysis & transformation. In InternationalSymposium on Code Generation and Optimization (2004), CGO,pp. 75–86.

[23] LEE, Y., KIM, G., BANG, S., KIM, Y., LEE, I., DUTTA, P.,SYLVESTER, D., AND BLAAUW, D. A modular 1mm3 die-stacked sensing platform with optical communication and multi-modal energy harvesting. In International Solid-State CircuitsConference Digest of Technical Papers (2012), pp. 402–404.

[24] LI, C.-C., AND FUCHS, W. Catch-compiler-assisted techniquesfor checkpointing. In International Symposium on Fault-TolerantComputing (1990), pp. 74–81.

[25] LOWELL, D. E., CHANDRA, S., AND CHEN, P. M. Exploringfailure transparency and the limits of generic recovery. In Sym-posium on Operating Systems Design & Implementation (2000),OSDI.

[26] LUCIA, B., AND RANSFORD, B. A simpler, safer programmingand execution model for intermittent systems. In Conferenceon Programming Language Design and Implementation (2015),PLDI, pp. 575–585.

[27] MA, K., ZHENG, Y., LI, S., SWAMINATHAN, K., LI, X., LIU,Y., SAMPSON, J., XIE, Y., AND NARAYANAN, V. Architectureexploration for ambient energy harvesting nonvolatile processors.In International Symposium on High Performance Computer Ar-chitecture (2015), HPCA, pp. 526–537.

[28] MAHLKE, S. A., CHEN, W. Y., BRINGMANN, R. A., HANK,R. E., MEI W. HWU, W., RAMAKRISHNA, B., MICHAEL,R., AND SCHLANSKER, S. Sentinel scheduling: a model forcompiler-controlled speculative execution. ACM Transactions onComputer Systems 11 (1993), 376–408.

[29] MIRHOSEINI, A., SONGHORI, E., AND KOUSHANFAR, F. Ide-tic: A high-level synthesis approach for enabling long com-putations on transiently-powered asics. In International Con-ference on Pervasive Computing and Communications (2013),PerComm, pp. 216–224.

[30] MONTESINOS, P., HICKS, M., KING, S. T., AND TORRELLAS,J. Capo: A software-hardware interface for practical determin-istic multiprocessor replay. In International Conference on Ar-chitectural Support for Programming Languages and OperatingSystems (2009), ASPLOS, pp. 73–84.

[31] NARAYANASAMY, S., POKAM, G., AND CALDER, B. BugNet:Continuously recording program execution for deterministic re-play debugging. In International Symposium on Computer Ar-chitecture (2005), ISCA, pp. 284–295.

USENIX Association 12th USENIX Symposium on Operating Systems Design and Implementation 31

Page 17: Intermittent Computation without Hardware Support or Programmer

[32] PLANK, J. S., BECK, M., KINGSLEY, G., AND LI, K. Libckpt:Transparent checkpointing under Unix. In USENIX TechnicalConference (1995), TCON, pp. 18–29.

[33] POKAM, G., DANNE, K., PEREIRA, C., KASSA, R., KRANICH,T., HU, S., GOTTSCHLICH, J., HONARMAND, N., DAUTEN-HAHN, N., KING, S. T., AND TORRELLAS, J. QuickRec:Prototyping an Intel architecture extension for record and replayof multithreaded programs. In International Symposium on Com-puter Architecture (2013), ISCA, pp. 643–654.

[34] PRVULOVIC, M., ZHANG, Z., AND TORRELLAS, J. Re-Vive: cost-effective architectural support for rollback recoveryin shared-memory multiprocessors. In International Symposiumon Computer Architecture (2002), ISCA, pp. 111–122.

[35] QAZI, M., CLINTON, M., BARTLING, S., AND CHAN-DRAKASAN, A. A low-voltage 1 Mb FRAM in 0.13 mu mCMOS featuring time-to-digital sensing for expanded operatingmargin. IEEE Journal of Solid-State Circuits 47, 1 (Jan 2012),141–150.

[36] RANDELL, B., LEE, P., AND TRELEAVEN, P. C. Reliabilityissues in computing system design. ACM Computer Surveys 10,2 (1978), 123–165.

[37] RANSFORD, B., CLARK, S., SALAJEGHEH, M., AND FU, K.Getting things done on computational RFIDs with energy-awarecheckpointing and voltage-aware scheduling. In Conference onPower Aware Computing and Systems (2008), HotPower, pp. 5–10.

[38] RANSFORD, B., AND LUCIA, B. Nonvolatile memory is a bro-ken time machine. In Workshop on Memory Systems Performanceand Correctness (2014), MSPC, pp. 5:1–5:3.

[39] RANSFORD, B., SORBER, J., AND FU, K. Mementos: Systemsupport for long-running computation on RFID-scale devices.In International Conference on Architectural Support for Pro-gramming Languages and Operating Systems (2011), ASPLOS,pp. 159–170.

[40] RAOUX, S., BURR, G. W., BREITWISCH, M. J., RETTNER,C. T., CHEN, Y.-C., SHELBY, R. M., SALINGA, M., KREBS,D., CHEN, S.-H., LUNG, H.-L., AND LAM, C. H. Phase-change random access memory: A scalable technology. IBM J.Res. Dev. 52, 4 (July 2008), 465–479.

[41] SILICON LABS. EFM32ZG-STK3200 zero gecko starter kit.

[42] TEXAS INSTRUMENTS. MSP430FR59xx Datasheet, 2014.

[43] TEXAS INSTRUMENTS. MSP432P401x Mixed-Signal Microcon-trollers, 2015.

[44] VAN DER WOUDE, J., AND HICKS, M. Ratchetsource code repository. https://github.com/impedimentToProgress/Ratchet, 2016.

[45] VINSCHEN, C., AND JOHNSTON, J. Newlib [computer soft-ware]. Retrieved from https://sourceware.org/newlib/, Mar 2015.

[46] WANG, J., DONG, X., AND XIE, Y. Enabling high-performancelpddrx-compatible mram. In Proceedings of the 2014 Interna-tional Symposium on Low Power Electronics and Design (NewYork, NY, USA, 2014), ISLPED ’14, ACM, pp. 339–344.

[47] WU, Y.-C., CHEN, P.-F., HU, Z.-H., CHANG, C.-H., LEE, G.-C., AND YU, W.-C. A mobile health monitoring system usingRFID ring-type pulse sensor. In International Conference onDependable, Autonomic and Secure Computing (2009), pp. 317–322.

[48] XU, M., BODIK, R., AND HILL, M. A ”flight data recorder” forenabling full-system multiprocessor deterministic replay. In In-ternational Symposium on Computer Architecture (2003), ISCA,pp. 122–135.

[49] ZHAI, B., PANT, S., NAZHANDALI, L., HANSON, S., OLSON,J., REEVES, A., MINUTH, M., HELFAND, R., AUSTIN, T.,SYLVESTER, D., AND BLAAUW, D. Energy-efficient subthresh-old processor design. Transactions on Very Large Scale Integra-tion Systems 17, 8 (Aug 2009), 1127–1137.

[50] ZHANG, H., GUMMESON, J., RANSFORD, B., AND FU, K.Moo: A batteryless computational RFID and sensing platform.Tech. Rep. UM-CS-2011-020, Department of Computer Science,University of Massachusetts Amherst, Amherst, MA, 2011.

[51] ZHANG, W., DE KRUIJF, M., LI, A., LU, S., AND SANKAR-ALINGAM, K. ConAir: Featherweight concurrency bug recoveryvia single-threaded idempotent execution. In International Con-ference on Architectural Support for Programming Languagesand Operating Systems (2013), ASPLOS, pp. 113–126.

32 12th USENIX Symposium on Operating Systems Design and Implementation USENIX Association