context-sensitive fencing: securing speculative execution ...av6ds/papers/asplos2019.pdf ·...

16
Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem Taram University of California San Diego [email protected] Ashish Venkat University of Virginia [email protected] Dean Tullsen University of California San Diego [email protected] Abstract This paper describes context-sensitive fencing (CSF), a micro- code-level defense against multiple variants of Spectre. CSF leverages the ability to dynamically alter the decoding of the instruction stream, to seamlessly inject new micro-ops, including fences, only when dynamic conditions indicate they are needed. This enables the processor to protect against the attack, but with minimal impact on the efficacy of key performance features such as speculative execution. This research also examines several alternative fence im- plementations, and introduces three new types of fences which allow most dynamic reorderings of loads and stores, but in a way that prevents speculative accesses from chang- ing visible cache state. These optimizations reduce the per- formance overhead of the defense mechanism, compared to state-of-the-art software-based fencing mechanisms by a factor of six. CCS Concepts Security and privacy Side-channel analysis and countermeasures; Systems security; Informa- tion flow control ; Computer systems organization Architectures. Keywords secure architectures; side-channel attacks; Spec- tre; speculative execution; microcode; taint tracking ACM Reference Format: Mohammadkazem Taram, Ashish Venkat, and Dean Tullsen. 2019. Context-Sensitive Fencing: Securing Speculative Execution via Mi- crocode Customization. In Proceedings of 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS’19). ACM, New York, NY, USA, 16 pages. hps://doi.org/10.1145/3297858. 3304060 1 Introduction Maximizing performance has been a major driving force in the economics of the microprocessor industry. Modern Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ASPLOS’19, April 13–17, 2019, Providence, RI, USA © 2019 Association for Computing Machinery. ACM ISBN ISBN 978-1-4503-6240-5/19/04. . . $15.00 hps://doi.org/10.1145/3297858.3304060 processor architectures feature highly complex and sophis- ticated performance optimizations. However, scaling per- formance without considering security implications could have serious negative consequences, as evidenced by the recent pile of lawsuits [96] concerning the Meltdown [61] and Spectre [57] class of microarchitectural attacks. These events have highlighted the need to architect systems that can not only run at high speed, but can also exhibit high resilience against security attacks, not just one or the other. The goal of this work is to secure a particular performance optimization integral to modern processor architectures – speculative execution – against the Spectre class of microar- chitectural attacks, while maintaining acceptably high levels of performance. Modern processors employ branch speculation and out-of- order execution to take advantage of the available instruction- level parallelism beyond control-flow boundaries, thereby improving the overall CPU resource utilization and sustain- ing high throughput. Spectre attacks exploit speculative ex- ecution by leaking secret information along misspeculated paths via cache-based and other timing side channels. Owing to their ability to mistrain the branch predictor to deliber- ately steer execution to an attacker-intended control-flow path [49, 58], these attacks have notably demonstrated the potential to break all confidentiality and completely bypass important hardware/software security mechanisms such as ASLR, even via remotely accessed covert channels [76]. Mitigating Spectre is a particularly hard problem since it could potentially cause highly intrusive changes to the existing out-of-order processor design, severely limiting per- formance. Although Intel has announced microcode update patches to mitigate certain variants of the attack, a major- ity of the high impact vulnerabilities still largely rely on software patching [45, 72]. State-of-the-art software counter- measures take advantage of fences that mute specific effects of speculative execution by constraining the order of certain memory operations, or in some cases by completely serial- izing a portion of the dynamic instruction stream. Liberal fence insertion (e.g., at every bounds check) can mitigate the attacks, but doing so severely hurts performance; however, spraying fences more strategically at appropriate locations in the code requires extensive patching of software via re- compilation or binary translation – resulting in significant engineering effort and long delays to deployment. We need

Upload: others

Post on 09-Mar-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

Context-Sensitive Fencing Securing SpeculativeExecution via Microcode Customization

Mohammadkazem TaramUniversity of California San Diego

mtaramcsucsdedu

Ashish VenkatUniversity of Virginiavenkatvirginiaedu

Dean TullsenUniversity of California San Diego

tullsencsucsdedu

AbstractThis paper describes context-sensitive fencing (CSF) a micro-code-level defense against multiple variants of Spectre CSFleverages the ability to dynamically alter the decoding ofthe instruction stream to seamlessly inject new micro-opsincluding fences only when dynamic conditions indicatethey are needed This enables the processor to protect againstthe attack but with minimal impact on the efficacy of keyperformance features such as speculative executionThis research also examines several alternative fence im-

plementations and introduces three new types of fenceswhich allow most dynamic reorderings of loads and storesbut in a way that prevents speculative accesses from chang-ing visible cache state These optimizations reduce the per-formance overhead of the defense mechanism comparedto state-of-the-art software-based fencing mechanisms by afactor of six

CCS Concepts bull Security and privacyrarr Side-channelanalysis and countermeasures Systems security Informa-tion flow control bull Computer systems organization rarrArchitectures

Keywords secure architectures side-channel attacks Spec-tre speculative execution microcode taint trackingACM Reference FormatMohammadkazem Taram Ashish Venkat and Dean Tullsen 2019Context-Sensitive Fencing Securing Speculative Execution via Mi-crocode Customization In Proceedings of 2019 Architectural Supportfor Programming Languages and Operating Systems (ASPLOSrsquo19)ACMNewYork NY USA 16 pages httpsdoiorg10114532978583304060

1 IntroductionMaximizing performance has been a major driving forcein the economics of the microprocessor industry Modern

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies are notmade or distributed for profit or commercial advantage and that copies bearthis notice and the full citation on the first page Copyrights for componentsof this work owned by others than ACMmust be honored Abstracting withcredit is permitted To copy otherwise or republish to post on servers or toredistribute to lists requires prior specific permission andor a fee Requestpermissions from permissionsacmorgASPLOSrsquo19 April 13ndash17 2019 Providence RI USAcopy 2019 Association for Computing MachineryACM ISBN ISBN 978-1-4503-6240-51904 $1500httpsdoiorg10114532978583304060

processor architectures feature highly complex and sophis-ticated performance optimizations However scaling per-formance without considering security implications couldhave serious negative consequences as evidenced by therecent pile of lawsuits [96] concerning the Meltdown [61]and Spectre [57] class of microarchitectural attacks Theseevents have highlighted the need to architect systems thatcan not only run at high speed but can also exhibit highresilience against security attacks not just one or the otherThe goal of this work is to secure a particular performanceoptimization integral to modern processor architectures ndashspeculative execution ndash against the Spectre class of microar-chitectural attacks while maintaining acceptably high levelsof performance

Modern processors employ branch speculation and out-of-order execution to take advantage of the available instruction-level parallelism beyond control-flow boundaries therebyimproving the overall CPU resource utilization and sustain-ing high throughput Spectre attacks exploit speculative ex-ecution by leaking secret information along misspeculatedpaths via cache-based and other timing side channels Owingto their ability to mistrain the branch predictor to deliber-ately steer execution to an attacker-intended control-flowpath [49 58] these attacks have notably demonstrated thepotential to break all confidentiality and completely bypassimportant hardwaresoftware security mechanisms such asASLR even via remotely accessed covert channels [76]

Mitigating Spectre is a particularly hard problem sinceit could potentially cause highly intrusive changes to theexisting out-of-order processor design severely limiting per-formance Although Intel has announced microcode updatepatches to mitigate certain variants of the attack a major-ity of the high impact vulnerabilities still largely rely onsoftware patching [45 72] State-of-the-art software counter-measures take advantage of fences that mute specific effectsof speculative execution by constraining the order of certainmemory operations or in some cases by completely serial-izing a portion of the dynamic instruction stream Liberalfence insertion (eg at every bounds check) can mitigate theattacks but doing so severely hurts performance howeverspraying fences more strategically at appropriate locationsin the code requires extensive patching of software via re-compilation or binary translation ndash resulting in significantengineering effort and long delays to deployment We need

hardware architectures that can more seamlessly react tosuch attacks via unobtrusive field updates

This work proposes context-sensitive fencing (CSF) a novelmicrocode-level defense against Spectre The key compo-nents of the defense strategy include (a) a microcode cus-tomization mechanism that allows processors to surgicallyinsert fences into the dynamic instruction stream to miti-gate undesirable side-effects of speculative execution (b) adecoder-level information flow tracking (DLIFT) frameworkthat identifies potentially unsafe execution patterns to trig-ger microcode customization and (c)mistraining mitigationsthat secure the branch predictor and the return address stackTo perform secure microcode customization with mini-

mal impact on performance this work leverages context-sensitive decoding (CSD) [80] a recently proposed extensionto Intelrsquos (and othersrsquo) micro-op translation mechanism thatenables on-demand and context-sensitive customization ofthe dynamic micro-op instruction stream In addition thiswork also takes advantage of the reconfiguration frameworkoffered by CSD allowing the operating system and othertrusted entities to dynamically control the frequency typeand behavior of fences that are surgically inserted into themicro-op stream to secure speculative execution

Thiswork analyzes a significantly expanded suite of fencesconsidering different possible enforcement stages and en-forcement strategies In particular we introduce a new fencethat prevents speculative updates to cache state with mini-mal interference in the dynamic scheduling of instructionsContext-sensitive fencing has the ability to automatically

identify fence insertion points via a novel decoder-level infor-mation flow tracking-based detection mechanism Since theprocessor front-end typically churns instructions at a muchhigher rate than the rest of the pipeline information-flowtracking at the decoder-level is prone to frequent overtaint-ing and undertainting scenarios While overtainting couldhurt performance due to the increased frequency of fenceinsertion undertainting could undermine the security ofthe system for a brief window By solving these challengesthrough an early and low-overhead mistaint detection andrecoverymechanism this paper further establishes the viabil-ity of decoder-level information flow tracking as an effectiveattack detection mechanismThis work further proposes novel micro-op flows that

protect the branch predictor the branch target buffer andthe return address stack against mistraining across differentprotection domains While similar in spirit to the proposedIndirect Branch Predictor Barrier (IBPB) instruction by Intelthese micro-op flows apply more generally to a broader classof branch predictors and return address stacks and offermore fine-grained control

This research makes the following major contributions

bull It introduces context-sensitive fencing a mechanismthat leverages a dynamic decoding architecture to in-ject fences between control flow and loads withoutrecompilation or binary translation

bull It examines several variants of existing fences andintroduces three new fences which allow aggressivereordering of loads and stores without exposing anymicroarchitectural evidence in the cache of specula-tive accesses that cross the fence

bull It introduces a decoder-level dynamic informationflow tracker DLIFT which allows accurate trackingof taints early in the pipeline giving the pipeline theability to identify tainted accesses before the stagesthat enable speculative execution

bull It introduces a simple dynamic mechanism that elimi-nates redundant (per basic block) fences

bull The combination of optimizations introduced in thispaper reduce the cost of a fence-based Spectre mitiga-tion technique from 48 overhead (for a conservativescheme) to less than 8

bull It also introduces a decode-level branch predictor iso-lation technique that mitigates branch mis-trainingvariants of Spectre

2 Background and Related WorkSpeculative Execution Dynamic control speculation is awell-known instruction throughput optimization techniqueused especially in out-of-order processors to predict thebranch outcome and execute instructions along the predictedpath while waiting for the actual branch outcome to be eval-uated at a later pipeline stage In the event of a mispredic-tion the processor rolls back execution along the misspec-ulated path and redirects control to the right branch tar-get Although speculative execution is largely programmer-invisible in terms of committed architectural register andmemory state in all known instances it leaves some mi-croarchitectural side effects that can be observed throughwell-established side channels

Microarchitectural AttacksMicroarchitectural attacksleak secret information of a victim process by observing mi-croarchitectural effects of certain performancepower opti-mizations such as caches branch speculation memory disam-biguation and even dynamic voltage and frequency scaling(DVFS) through side-channels such as memory bus activ-ity [3] power consumption characteristics [7] branch accesspatterns [1 30] faults [42 51] acoustics [33 34] electromag-netic effects [31] functional unit timing characteristics [94]and most notably cache access patterns [2 32 52 62 68 97100 101]

Cache-based side-channel attacks have been shown toreveal secret information such as cryptographic keys [2738 100] keystrokes [37] and browsing activity [70] by co-locating a spy process alongside a victim in such a way that

Table 1 Speculative Attacks VariantsVariant Vulnerability NameSpectre v1 [49 57] Bounds Check Bypass (BCB)Spectre v2 [49 57] Branch Target Injection (BTI)Spectre v3 [49 61] Rogue Data Cache Load (RDCL)Spectre v3a [6 67] Rogue System Register Read (RSRD)Spectre v4 [67] Speculative Store Bypass (SSB)Spectre-NG [78] Lazy FP State RestoreSpectre v11 [54] Bounds Check Bypass Store (BCBS)Spectre v12 [54] Read-only Protection BypassSpectre v5 [59 66] Ret2Spec and SpecRSBNetSpectre [76] Remote Bounds Check BypassForeshadow [87 95] L1 Terminal Fault

1 i f ( x lt a r r a y 1 _ s i z e )2 y = a r r ay2 [ a r r ay1 [ x ] lowast 2 5 6 ] Figure 1 Example Spectre Variant-1 Gadget

they share cache memory The attack unfolds through a pre-attack step in which a spy process fillsflushes specific cachesets so that the victim leaves observable side effects in termsof its cache access patterns that can be later inferred by thespy process by timing the access to particular cache blocks

Exploiting Speculative Execution This work tacklesa highly evasive class of microarchitectural attacks calledSpectre [57] that leaks information by exploiting side effectsof speculative execution through cache-based side channelsThese attacks not only exploit unintended side effects dueto speculation but have the ability to deliberately misleadexecution into attacker-intended paths by mistraining thebranch predictor and other associated structures Severalvariants of the attack have been described (shown in Table 1)that can potentially bypass software securityintegrity pro-tection mechanisms such as bounds checking and ASLR [77]Figure 1 shows a vulnerable code target for the variant-

1 attack The code fragment is composed of a conditionalbranch that performs a bounds check and a Spectre gadgetthat results in an observable microarchitectural side effectupon execution Upon misspeculation the bounds check isbypassed and the Spectre gadget executes leaving an observ-able cache footprint to the attacker that remains even afterthe processor detects the misprediction and rolls back execu-tion In the simplest variant of the attack where the attackercontrols both x and array2 the attacker can potentially leakthe entire address space of the victimThe variant-2 Spectre attack further allows the attacker

to hijack speculative execution by mistraining the branchpredictor and associated structures enabling a ROP-styleattack [75] that stitches together Spectre gadgets To mountsuch an attack a co-located adversary process that sharesthe branch predictor (eg a browser process with severaluser threads) with the victim first forces an artificial BTB(Branch Target Buffer) entry collision using a carefully cho-sen indirect branch address The attacker next poisons thevalue of the colliding BTB entry by repeatedly executing its

1 mov eax a r r 1 _ s i z e2 cmp edi eax3 jge END_LBL4 mov eax edi

6 mov eax [ eax+ a r r 1 ]7 shl eax 0 x88 mov eax [ eax+ a r r 2 ]9 mov [ y ] eax10 END_LBL

(a) Vulnerable gadget (x86)

mov eax a r r 1 _ s i z ecmp edi eaxjge END_LBLmov eax edi____FENCE____mov eax [ eax+ a r r 1 ]shl eax 0 x8mov eax [ eax+ a r r 2 ]mov [ y ] eaxEND_LBL

(b) Fenced gadget (x86)

Figure 2 Mitigating Spectre-v1 using a Fence Instruction

own branch whose target is the address of a Spectre gadgetAfter successful mistraining of the branch predictor the pro-cessor now predicts the victimrsquos indirect branch that collidesin the BTB to be the attacker-chosen Spectre gadget andstarts speculatively executing itThe variant-3 attack (aka Meltdown) exploits the fact

that most out-of-order processors that employ dynamic spec-ulation supress loads with protection violation at instructionretirement (ie commit stage) rather than during instruc-tion execution Therefore a non-privileged memory accesscan find its way to the cache and leave a footprint allowingan attacker to now effectively read arbitrary memory con-tents of another process or even the kernel or the hypervisorthrough careful cache-based side channel analysisThe literature describes multiple attacks that conform to

the above variants but exploit different attack targets andordifferent side channels These include SgxPectre [13] that by-passes Intelrsquos SGX [44] security mechanisms to steal secretsfrom SGX enclaves the MeltdownPrimeSpectrePrime [82]that leverage a PRIME+PROBE [62] cache attack instead ofFLUSH+RELOAD [100] by exploiting the side effects of cacheline invalidation mechanisms in modern cache coherenceprotocols and the NetSpectre attack that leaks informationacross independent virtual machines on Google Cloud viaan AVX-based covert channel [76]The variant-4 Spectre attack exploits microarchitectural

side effects of the memory disambiguation feature employedby most out-of-order processors in order to allow loads to bespeculatively ordered and executed before any outstandingstore whose effective address has not been calculated yetUpon misspeculation ie if the load address has a conflictwith the outstanding store the processor flushes the pipelineand triggers the re-execution of the load and all subsequentinstructions [48] while the microarchitectural side effectsof the misspeculation linger resulting in memory disclo-sures similar to variant 1 and 3 Similarly the NG variant-3attack [78] leverages the lazy x87 floating-point restore func-tionality of Intel CPUs to read floating-point registers of avictim process

The Spectre variants 11 and 12 dubbed Speculative BufferOverflows [55] exploit the store-to-load forwarding opti-mization to stitch together Spectre gadgets More specificallythese attacks bypass a stack buffer overflow check similarto variant-1 and execute a Spectre gadget that speculativelystores malicious content (the address of the next Spectregadget) into the return address on the stack Store-to-loadforwarding combined with the fact that most modern pro-cessors break down the return instruction into micro-opsthat load the return address from the stack before transfer-ring control then result in a ROP-style execution of Spectregadgets that leave a trail of microarchitectural state behindThe variant-5 attack [58] achieves similar effect by insteadmistraining the return address stack employed by most pro-cessors to speculatively predict the target of a procedurereturn

Spectre Mitigations The current set of mitigations forSpectre range from simple coding guidelines[10] to propos-als that advocate exposing microarchitectural details in theISA [64] To mitigate Meltdown Kernel Page Table Isolation(KPTI) [17 36] has been proposed and recently patched to theLinux kernel incurring about 6 in performance To mitigateSpectre v1 multiple chip manufacturers including Intel [45]ARM [6] and AMD [4] have suggested instrumenting codewith serializing instructions or fences to inhibit speculationat specific points in execution

For example consider Figure 2a that shows the assemblycode for the Spectre variant-1 gadget in Figure 1 Figure 2bshows a software-patched version that employs a serializinginstruction or a fence to prevent the speculative execution ofthe Spectre gadget (ie lines 6-9) In most implementationsupon decoding the fence the processor stops fetching newinstructions until the fence gets committed or squashedthereby serializing execution In our example if the attackercalls the vulnerable gadget with out-of-bounds values in theedi register the processor front-end stalls until the fence iscommitted thereby disallowing the speculative execution ofthe Spectre gadget and completely mitigating the attackThe associated performance overhead with liberal fence

insertion however could be as high as 10x [69] While itis possible to perform compiler-directed code instrumen-tation with fences at a lower performance overhead [72]locating the potential Spectre targets using static analysis isnon-trivial and could therefore result in less than full cov-erage [56] Speculative Load Hardening [11 12] and YSNB(You Shall Not Bypass) [69] propose to use predicated execu-tion [50 65 73] to alleviate the high overheads for fences byinjecting an artificial data dependency between the condi-tional branch and the Spectre gadget These proposals incur36-60 overhead in performanceTo cope with variant-2 both Intel and AMD have an-

nounced microcode update patches that introduce new fenceinstructions [4 45] that prevent instructions preceding thefence from controlling the indirect branch prediction of

branches that follow the fence The fences also prevent soft-ware with lower privileges from influencing the indirectbranch prediction of software with higher privileges Fur-thermore these patches restrict indirect branch predictionfrom being controlled by co-located threads via simultane-ous multithreading [83 84] Furthermore the use of retpo-lines [45 46 85] have been advocated to replace indirectjumps and calls with an equivalent push+ret instruction se-quence in order to bypass the indirect branch predictorHowever they could further expand the attack surface inthe wake of v5 [58 66] v11 and v12 attacks [54]

On the hardware front SafeSpec [53] and InvisiSpec [99]propose mitigating side effects of speculative execution byadding new shadow user-invisible structures for caches andTLBs that store transient results from speculative instruc-tions and committing them to main cacheTLB only if thespeculation was deemed correct and the corresponding in-structions gracefully retire Although these techniques makedisruptive changes to the processormemory architectureand consistency models they make significant strides in se-cure hardware design with minimal performance impact Fi-nally Dong et al [28 29] propose leveraging Virtual Ghost [22]to protect applications running on a compromised OS kernelfrom Spectre

Secure Instruction StreamCustomization Prior workhas established that instruction customization is an effectivemeans to instrument the dynamic execution stream with se-curity checks and therebymitigate several attack vectors at arelatively low performance overhead Instruction stream cus-tomization has been proposed and evaluated in various formsand levels ranging from secure virtual architectures at theISA and compiler level [23 88 90] to full-blown processor bi-nary translation at the microcode level [9 25 71 89] Corlisset al [18ndash20] propose dynamic instruction stream editing(DISE) a macro-engine that customizes the dynamic instruc-tion stream at the decoder level by pattern-matching user-defined production rules pushed into the decoder Taram etal [80] propose context-sensitive decoding that leverages theCISC to RISC translation feature of modern instruction setdecoders to dynamically alter the behavior of programmer-visible instructions without recompilation or binary trans-lation [26 91] While we leverage these approaches in thisresearch this work targets a different class of emerging at-tacks and proposes several additional techniques beyondmicrocode customization

Information Flow Tracking Suh et al [79] first pro-posed the idea of dynamic information flow tracking (DIFT)that tags data from untrusted channels as spurious and fur-ther tracks the information flow of spurious data flaggingany violation of an enforced security policy Multiple hard-ware information flow tracking techniques have been de-scribed at various levels of the hardware [14ndash16 21 24 81

86 92] as a detection mechanism to circumvent code injec-tion [98] cross-site scripting attacks [24 93] buffer over-flows [24 98] and SQL injection [24 39 40 60 102] Thiswork proposes decoder-level information-flow tracking asa detection mechanism for Spectre and addresses severalassociated challenges

3 Assumptions and Threat ModelAmong the Spectre variants the Bounds Check Bypass (variant-1) is the one which has had the highest impact in terms ofaffected platforms and devices Therefore the main focus ofthis work is to mitigate the variant-1 attack at an acceptablelevel of performance However owing to the flexibility ad-vantages of microcode customization we also propose newmitigations against other Spectre variants documented inTable 1

We assume that the goal of the attacker is to read arbi-trary memory contents by exploiting a Spectre gadget Alsowithout loss of generality in this work we defend against aSpectre attacker that targets the most obvious and the mostimportant victim ie the OS kernel but our frameworkallows us to extend the proposed techniques to mitigate dif-ferent targets such as virtual machines and browsers Ourattack model assumes an adversary with the following capa-bilities

bull Information Leakage We assume that the adver-sary has the ability and privilege to probe flush orevict any cache line including that of the kernel at anyparticular time Also they can make precise timingmeasurements using instructions such as rdtsc andrdtscp

bull Co-location We assume that the adversary is colo-cated with the victim and can not only leak informa-tion through cache-based side channels but can alsomistrain shared branch predictor structures includingbranch target buffers and return address stacks andcan further influence the branch outcome of the victimas described in the variant-2 attack

bull User-Mode Access We assume that the adversaryhas standard user-mode access and can invoke anysystem call with arbitrary and carefully chosen ar-guments and can further access devices such as thekeyboard or network to feed the kerneldriver codewith carefully chosen data

Moreover we assume that the only communication chan-nel between the attacker and the kernel code is throughmicro-architectural side channels and the attacker does nothave any other channel direct or indirect to leak data Amongall the micro-architectural side-channels data caches are theones that are predominantly used and are the easiest to ex-ploit Thus the mitigation strategies described in this workprimarily focus on those variants of the attack that use thedata cache as their covert channel to leak information Again

Configuration Registers (MSRs)

Mac

ro-o

p qu

eue

microop-Queue

Scratchpad Regs

Runtime System

DLIFTMacro-op Stream

Macro-to-microop Translation Unit

CSD-Enabled Decoder

INC [0x802ac] FENCE LD t0[0x802ac]ADD t0 t0 1ST [0x802ac] t0 Di

spat

cher Mistaint

Detection and Recovery

Figure 3 Architectural Overview

due to the flexiblity and programmability of the proposedstrategy it is possible to easily extend the approach to miti-gate other side-channels such as functional unit contentionand AVX timing channels [76]

4 Architectural OverviewFigure 3 gives the architectural overview of our defensestrategy ndash context-sensitive fencing The central piece of theproposed architecture is an x86 microcode engine that hascontext-sensitive decoding (CSD) capabilities [80] allow-ing it to optionally translate a native x86 instruction into acustomizable alternate set of micro-ops In this work weleverage this capability to perform the surgical insertion ofspeculation fences (some existing and some newly proposed)at potentially vulnerable Spectre code targets To this end weintroduce new custommicro-op flows and new configurationmechanisms that trigger such micro-op flowsThe CSD-enabled microcode engine is provisioned with

fine-grained reconfiguration capabilities via a set of model-specific registers (MSRs) that can control the frequency typeand enforcement criteria of speculation fences inserted intothe dynamic instruction stream Such a fine-grained recon-figuration capability is especially important to this workbecause speculation fences are particularly expensive allow-ing us to surgically insert fences that impose varying degreesof restrictions on speculative execution depending upon theruntime conditions current level of threat and the nature ofthe code being executedMoreover the context-sensitive fencing framework also

benefits from a novel decoder-level information flow track-ing (DLIFT) engine that has the ability to identify untrustedinstructions that are potentially in Spectre gadgets and trig-ger alternate micro-op flows that insert speculation fencesOwing to its decoder-level and inherently speculative imple-mentation DLIFT relies on a mistaint detection and recoveryhardware implemented in the execute stage to avoid over-tainting and undertainting scenarios Finally the proposeddefense framework also includes hardware and microcode-level mechanisms to achieve branch predictor state isola-tion across protection domains to mitigate the variant-2 andvariant-4 attacks

Table 2 List of Intelrsquos Serializing InstructionsType Instr Desc

PrivilegedSeria

lizingInstructions INVD Invalidate Internal Caches

INVEPT Invalidate Translations from EPTINVLPG Invalidate TLB EntriesINVVPID Invalidate Translations Based on VPIDLIDT Load Interrupt Descriptor Table RegisterLGDT Load Global Descriptor Table RegisterLLDT Load Local Descriptor Table RegisterLTR Load Task RegisterMOV Move to Control RegisterMOV Move to Debug Register

WBINVD Write Back and Invalidate CacheWRMSR Write to Model Specific Register

Non

Priv CPUID CPU Identification

IRET Interrupt ReturnRSM Resume from System Management Mode

Mem

Ordering SFENCE Store Fence

LFENCE Load Fence

MFENCE Memory Fence

5 Design and ImplementationIn this section we describe in greater detail the architecturaltechniques and building blocks that together constitute theproposed defense strategy ndash context-sensitive fencing Firstwe describe a microcode customization framework that en-ables the surgical insertion of fences to secure speculativeexecution with minimal performance impact Second we ex-amine the full design space of fences to provide the isolationproperties we need without undue obstruction of existingpipeline performance features Third we propose a decoder-level information flow tracking (DLIFT) technique to followpotentially malicious execution patterns and trigger securemicrocode customization Finally we propose protectionmechanisms that circumvent the mistraining of the branchpredictor and associated structures

51 Microcode CustomizationSpeculation fences are a processorrsquos primary mechanism tooverride speculative execution For Spectre variant-1 context-sensitive fencing works with CSD by providing alternate de-codings of all load instructions with the alternate decodingalways including a fence micro-op that appears before theload micro-op The alternate decoding will then be triggeredor not based on runtime conditions The first considerationthen is what fence instruction to incorporate Most pro-cessors already provide a variety of fences and serializinginstructions We first study the performance and securityimpact of the existing suite of fences and serializing instruc-tions and then explore and propose new speculation fencesand context-sensitive fencing techniques that manage theimpact on performance For Spectre variant-2 CSF works

micro-op

Que

ue

Macro-op Stream Macro-to-microop

Translation Unit

CSD-Enabled Decoder

FENCE Injection

Point

micro-op

Dis

patc

her

Load Store-Queue

FU1 FU2

RSRS

RSRSRS

Cache

Cac

he

Con

trolle

r

Mem

Mem

C

ontro

ller

Fence Enforcement Point

Figure 4 Fence Enforcement Points

with CSD by providing alternate decodings of some con-trol flow and possibly insert fences andor branch predictorresetting micro-ops ndash more detail is in Section 53

511 Serializing Instructions and Memory FencesSerializing instructions are the strictest amongst all specula-tion fences and completely override speculative executionUpon decoding a serializing instruction the processor stallsfetching any subsequent instruction until all instructionspreceding the serializing instruction retire Due to the highpipeline depth and issue width of modern out-of-order su-perscalars the usage of serializing instructions could resultin long delays and considerable throughput loss Memoryfences on the other hand enforce a memory serializationpoint in the program instruction stream More specificallythese fences ensure that memory operations that appear inexecution after the fence are stalled until all the outstandingmemory requests (including the fence) complete executionTable 2 shows a complete list of Intelrsquos existing serializ-

ing instructions and fences [48] Most of the serializing in-structions need to be run in privileged mode which restrictstheir usage in defenses for victims that lack sufficient privi-leges (eg browsers) Moreover a majority of them modifythe state or contents of architectural registers the programcounter cache TLB etc and only serialize execution as aside effect thereby requiring an additional backuprestorestep when used for the sole purpose of serializing executionAn exception to this is the MOV to debug register instructionthat does not corrupt any architectural state if not actuallyin debug modeThe SFENCE instruction does not allow stores to pass

through it but does not affect loads That is it enforces thatall stores that precede the SFENCE are executed to comple-tion before any store that succeeds the SFENCE is fetchedThe MFENCE instruction restricts all memory operationsfrom passing through it Unlike SFENCE and MFENCE thatare memory ordering operations LFENCE performs a seri-alizing operation In particular LFENCE does not stop the

processor from fetching and decoding instructions that ap-pear after the LFENCE but it restricts the dispatch of suchinstructions until the instructions preceding the LFENCEcomplete execution Since LFENCE serializes execution andyet makes some progress in the front-end it has been recom-mended by Intel as a low-overhead fence that can be insertedat vulnerable points in execution to defend against Spec-tre [45] In other words while serializing instructions suchas CPUID are enforced at Fetch LFENCE is enforced at theinstruction queue level Therefore in this work we start withthe LFENCE and propose new fences that come with fewerrestrictions different enforcement policies andor sport ad-ditional optimizations

512 Fence Enforcement PoliciesIt is important to note that none of the existing instructionsthat provide fence support were actually created for the pur-pose we (or the whole industry) need for Spectre mitigationThus this work will examine existing fences variants ofexisting fences and also introduce a new fence primitive Tobetter understand the design landscape we examine severalpossible properties of fences in this section

Early vs Late Enforcement We first categorize fencesinto early-enforced and late-enforced fences based on thepipeline stage at which they are enforced In particular werefer to any serializing instruction such as an LFENCE thatis enforced at the instruction queue or earlier in the pipelineas early-enforced If the fence is enforced at a later stage suchas the reservation station the loadstore queue or the cachecontroller we refer to it as a late-enforced fence Late enforce-ment in essence shifts the fence enforcement point towardsthe leaking structure (eg the cache) reducing the impacton instructions that do not access that structure and allow-ing for the enforcement of more fine-grained serializationrules (eg allow cache hits but not misses) Figure 4 showspotential fence enforcement points at various stages in theprocessor pipelineIt is important to note that the later a fence is enforced

the fewer the side channels it protects against For exampleIntelrsquos LFENCE prevents information leakage through all mi-croarchitectural structures that appear after the instructionqueue However to prevent information leakage through theinstruction cache side channel we would have to resort toa regular serializing instruction or an MFENCE resultingin prohibitively high performance overheads Similarly afence enforced at the data cache controller level would onlymitigate data cache-based side channels and will not protectagainst an FPU-based side-channel Instead the FPU-basedside-channel may be mitigated using a different fence thatis appropriately configured and enforced at the reservationstation or the FPU

Strict vs Relaxed Enforcement Depending upon howprohibitive they are we next classify fences into strict andrelaxed Strict fences are highly prohibitive and do not allow

Table 3 Characteristics of Different Fence Types

Fence Name EnforcementPoint

StrictRelaxed

Instructionsnot Allowed

MitigatesVariants

ExistingNew

Intelrsquos SIs (CPUID) Fetch Strict All All ExistingLFENCE IQ Strict All All Existing

LSQ-LFENCE LSQ Relaxed Ld v1 NewLSQ-MFENCE LSQ Relaxed LdampSt v1v11v12 New

CFENCE CC Relaxed None v1 New CC Cache Controller RS Reservation Station

any instructions to pass through them until the fence retireswhereas relaxed fences allow certain types of instructionsto pass through them For example an SFENCE that is en-forced at the loadstore queue and allows all instructions topass except stores that have a greater sequence number thanitself is a late-enforced and relaxed fence On the other handall x86 serializing instructions including LFENCE are early-enforced and strict Customizing the micro-op stream withearly-enforced and strict fences typically results in slower ex-ecution when compared to customization with late-enforcedand relaxed fences However with carefully enforced con-straints the late-enforced and relaxed fences could offer sim-ilar if not better security guarantees

Early vs Late Commit A fence typically remains ef-fective until it gets committed or squashed Based on Intelrsquosmanual [47] a serializing instruction is only allowed to becommitted if there is no preceding outstanding store thatis waiting to be written back While this behavior might benecessary for device synchronization or memory orderingenforcement for the purpose of securing speculative exe-cution against Spectre attacks there is no need to wait forstores to be written back This is because write buffers arenrsquotcommitted to the cache until the store reaches retirementand therefore the fact that a store is waiting for an outstand-ing writeback request to complete is enough evidence that itdid not occur along a misspeculated path Allowing a fenceto commit early without waiting for preceding outstandingstores to write back can considerably improve performancebecause as soon as a fence gets committed a stream of in-structions can advance further in the pipeline Therefore inthis work we propose and study the effects of a late-commitversion of each fence that does not wait for stores to be writ-ten back However we note that these versions should onlybe limited to the security use case we describe and shouldnot be used for synchronization

Newly Proposed Fences Table 3 summarizes the char-acteristics of different existing and newly proposed fencesAlthough the existing set of fences defend against all vari-ants they incur prohibitively high costs on performance dueto their strict enforcement constraints To better understandthe potential for fences beyond those that already exist wepropose and evaluate three new types of fences In an attackscenario we typically insert one of these fences between abranch instruction and a load instruction that potentiallyleaks sensitive information via a cache side-channel Theproposed fences more effectively and efficiently mitigate the

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 2: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

hardware architectures that can more seamlessly react tosuch attacks via unobtrusive field updates

This work proposes context-sensitive fencing (CSF) a novelmicrocode-level defense against Spectre The key compo-nents of the defense strategy include (a) a microcode cus-tomization mechanism that allows processors to surgicallyinsert fences into the dynamic instruction stream to miti-gate undesirable side-effects of speculative execution (b) adecoder-level information flow tracking (DLIFT) frameworkthat identifies potentially unsafe execution patterns to trig-ger microcode customization and (c)mistraining mitigationsthat secure the branch predictor and the return address stackTo perform secure microcode customization with mini-

mal impact on performance this work leverages context-sensitive decoding (CSD) [80] a recently proposed extensionto Intelrsquos (and othersrsquo) micro-op translation mechanism thatenables on-demand and context-sensitive customization ofthe dynamic micro-op instruction stream In addition thiswork also takes advantage of the reconfiguration frameworkoffered by CSD allowing the operating system and othertrusted entities to dynamically control the frequency typeand behavior of fences that are surgically inserted into themicro-op stream to secure speculative execution

Thiswork analyzes a significantly expanded suite of fencesconsidering different possible enforcement stages and en-forcement strategies In particular we introduce a new fencethat prevents speculative updates to cache state with mini-mal interference in the dynamic scheduling of instructionsContext-sensitive fencing has the ability to automatically

identify fence insertion points via a novel decoder-level infor-mation flow tracking-based detection mechanism Since theprocessor front-end typically churns instructions at a muchhigher rate than the rest of the pipeline information-flowtracking at the decoder-level is prone to frequent overtaint-ing and undertainting scenarios While overtainting couldhurt performance due to the increased frequency of fenceinsertion undertainting could undermine the security ofthe system for a brief window By solving these challengesthrough an early and low-overhead mistaint detection andrecoverymechanism this paper further establishes the viabil-ity of decoder-level information flow tracking as an effectiveattack detection mechanismThis work further proposes novel micro-op flows that

protect the branch predictor the branch target buffer andthe return address stack against mistraining across differentprotection domains While similar in spirit to the proposedIndirect Branch Predictor Barrier (IBPB) instruction by Intelthese micro-op flows apply more generally to a broader classof branch predictors and return address stacks and offermore fine-grained control

This research makes the following major contributions

bull It introduces context-sensitive fencing a mechanismthat leverages a dynamic decoding architecture to in-ject fences between control flow and loads withoutrecompilation or binary translation

bull It examines several variants of existing fences andintroduces three new fences which allow aggressivereordering of loads and stores without exposing anymicroarchitectural evidence in the cache of specula-tive accesses that cross the fence

bull It introduces a decoder-level dynamic informationflow tracker DLIFT which allows accurate trackingof taints early in the pipeline giving the pipeline theability to identify tainted accesses before the stagesthat enable speculative execution

bull It introduces a simple dynamic mechanism that elimi-nates redundant (per basic block) fences

bull The combination of optimizations introduced in thispaper reduce the cost of a fence-based Spectre mitiga-tion technique from 48 overhead (for a conservativescheme) to less than 8

bull It also introduces a decode-level branch predictor iso-lation technique that mitigates branch mis-trainingvariants of Spectre

2 Background and Related WorkSpeculative Execution Dynamic control speculation is awell-known instruction throughput optimization techniqueused especially in out-of-order processors to predict thebranch outcome and execute instructions along the predictedpath while waiting for the actual branch outcome to be eval-uated at a later pipeline stage In the event of a mispredic-tion the processor rolls back execution along the misspec-ulated path and redirects control to the right branch tar-get Although speculative execution is largely programmer-invisible in terms of committed architectural register andmemory state in all known instances it leaves some mi-croarchitectural side effects that can be observed throughwell-established side channels

Microarchitectural AttacksMicroarchitectural attacksleak secret information of a victim process by observing mi-croarchitectural effects of certain performancepower opti-mizations such as caches branch speculation memory disam-biguation and even dynamic voltage and frequency scaling(DVFS) through side-channels such as memory bus activ-ity [3] power consumption characteristics [7] branch accesspatterns [1 30] faults [42 51] acoustics [33 34] electromag-netic effects [31] functional unit timing characteristics [94]and most notably cache access patterns [2 32 52 62 68 97100 101]

Cache-based side-channel attacks have been shown toreveal secret information such as cryptographic keys [2738 100] keystrokes [37] and browsing activity [70] by co-locating a spy process alongside a victim in such a way that

Table 1 Speculative Attacks VariantsVariant Vulnerability NameSpectre v1 [49 57] Bounds Check Bypass (BCB)Spectre v2 [49 57] Branch Target Injection (BTI)Spectre v3 [49 61] Rogue Data Cache Load (RDCL)Spectre v3a [6 67] Rogue System Register Read (RSRD)Spectre v4 [67] Speculative Store Bypass (SSB)Spectre-NG [78] Lazy FP State RestoreSpectre v11 [54] Bounds Check Bypass Store (BCBS)Spectre v12 [54] Read-only Protection BypassSpectre v5 [59 66] Ret2Spec and SpecRSBNetSpectre [76] Remote Bounds Check BypassForeshadow [87 95] L1 Terminal Fault

1 i f ( x lt a r r a y 1 _ s i z e )2 y = a r r ay2 [ a r r ay1 [ x ] lowast 2 5 6 ] Figure 1 Example Spectre Variant-1 Gadget

they share cache memory The attack unfolds through a pre-attack step in which a spy process fillsflushes specific cachesets so that the victim leaves observable side effects in termsof its cache access patterns that can be later inferred by thespy process by timing the access to particular cache blocks

Exploiting Speculative Execution This work tacklesa highly evasive class of microarchitectural attacks calledSpectre [57] that leaks information by exploiting side effectsof speculative execution through cache-based side channelsThese attacks not only exploit unintended side effects dueto speculation but have the ability to deliberately misleadexecution into attacker-intended paths by mistraining thebranch predictor and other associated structures Severalvariants of the attack have been described (shown in Table 1)that can potentially bypass software securityintegrity pro-tection mechanisms such as bounds checking and ASLR [77]Figure 1 shows a vulnerable code target for the variant-

1 attack The code fragment is composed of a conditionalbranch that performs a bounds check and a Spectre gadgetthat results in an observable microarchitectural side effectupon execution Upon misspeculation the bounds check isbypassed and the Spectre gadget executes leaving an observ-able cache footprint to the attacker that remains even afterthe processor detects the misprediction and rolls back execu-tion In the simplest variant of the attack where the attackercontrols both x and array2 the attacker can potentially leakthe entire address space of the victimThe variant-2 Spectre attack further allows the attacker

to hijack speculative execution by mistraining the branchpredictor and associated structures enabling a ROP-styleattack [75] that stitches together Spectre gadgets To mountsuch an attack a co-located adversary process that sharesthe branch predictor (eg a browser process with severaluser threads) with the victim first forces an artificial BTB(Branch Target Buffer) entry collision using a carefully cho-sen indirect branch address The attacker next poisons thevalue of the colliding BTB entry by repeatedly executing its

1 mov eax a r r 1 _ s i z e2 cmp edi eax3 jge END_LBL4 mov eax edi

6 mov eax [ eax+ a r r 1 ]7 shl eax 0 x88 mov eax [ eax+ a r r 2 ]9 mov [ y ] eax10 END_LBL

(a) Vulnerable gadget (x86)

mov eax a r r 1 _ s i z ecmp edi eaxjge END_LBLmov eax edi____FENCE____mov eax [ eax+ a r r 1 ]shl eax 0 x8mov eax [ eax+ a r r 2 ]mov [ y ] eaxEND_LBL

(b) Fenced gadget (x86)

Figure 2 Mitigating Spectre-v1 using a Fence Instruction

own branch whose target is the address of a Spectre gadgetAfter successful mistraining of the branch predictor the pro-cessor now predicts the victimrsquos indirect branch that collidesin the BTB to be the attacker-chosen Spectre gadget andstarts speculatively executing itThe variant-3 attack (aka Meltdown) exploits the fact

that most out-of-order processors that employ dynamic spec-ulation supress loads with protection violation at instructionretirement (ie commit stage) rather than during instruc-tion execution Therefore a non-privileged memory accesscan find its way to the cache and leave a footprint allowingan attacker to now effectively read arbitrary memory con-tents of another process or even the kernel or the hypervisorthrough careful cache-based side channel analysisThe literature describes multiple attacks that conform to

the above variants but exploit different attack targets andordifferent side channels These include SgxPectre [13] that by-passes Intelrsquos SGX [44] security mechanisms to steal secretsfrom SGX enclaves the MeltdownPrimeSpectrePrime [82]that leverage a PRIME+PROBE [62] cache attack instead ofFLUSH+RELOAD [100] by exploiting the side effects of cacheline invalidation mechanisms in modern cache coherenceprotocols and the NetSpectre attack that leaks informationacross independent virtual machines on Google Cloud viaan AVX-based covert channel [76]The variant-4 Spectre attack exploits microarchitectural

side effects of the memory disambiguation feature employedby most out-of-order processors in order to allow loads to bespeculatively ordered and executed before any outstandingstore whose effective address has not been calculated yetUpon misspeculation ie if the load address has a conflictwith the outstanding store the processor flushes the pipelineand triggers the re-execution of the load and all subsequentinstructions [48] while the microarchitectural side effectsof the misspeculation linger resulting in memory disclo-sures similar to variant 1 and 3 Similarly the NG variant-3attack [78] leverages the lazy x87 floating-point restore func-tionality of Intel CPUs to read floating-point registers of avictim process

The Spectre variants 11 and 12 dubbed Speculative BufferOverflows [55] exploit the store-to-load forwarding opti-mization to stitch together Spectre gadgets More specificallythese attacks bypass a stack buffer overflow check similarto variant-1 and execute a Spectre gadget that speculativelystores malicious content (the address of the next Spectregadget) into the return address on the stack Store-to-loadforwarding combined with the fact that most modern pro-cessors break down the return instruction into micro-opsthat load the return address from the stack before transfer-ring control then result in a ROP-style execution of Spectregadgets that leave a trail of microarchitectural state behindThe variant-5 attack [58] achieves similar effect by insteadmistraining the return address stack employed by most pro-cessors to speculatively predict the target of a procedurereturn

Spectre Mitigations The current set of mitigations forSpectre range from simple coding guidelines[10] to propos-als that advocate exposing microarchitectural details in theISA [64] To mitigate Meltdown Kernel Page Table Isolation(KPTI) [17 36] has been proposed and recently patched to theLinux kernel incurring about 6 in performance To mitigateSpectre v1 multiple chip manufacturers including Intel [45]ARM [6] and AMD [4] have suggested instrumenting codewith serializing instructions or fences to inhibit speculationat specific points in execution

For example consider Figure 2a that shows the assemblycode for the Spectre variant-1 gadget in Figure 1 Figure 2bshows a software-patched version that employs a serializinginstruction or a fence to prevent the speculative execution ofthe Spectre gadget (ie lines 6-9) In most implementationsupon decoding the fence the processor stops fetching newinstructions until the fence gets committed or squashedthereby serializing execution In our example if the attackercalls the vulnerable gadget with out-of-bounds values in theedi register the processor front-end stalls until the fence iscommitted thereby disallowing the speculative execution ofthe Spectre gadget and completely mitigating the attackThe associated performance overhead with liberal fence

insertion however could be as high as 10x [69] While itis possible to perform compiler-directed code instrumen-tation with fences at a lower performance overhead [72]locating the potential Spectre targets using static analysis isnon-trivial and could therefore result in less than full cov-erage [56] Speculative Load Hardening [11 12] and YSNB(You Shall Not Bypass) [69] propose to use predicated execu-tion [50 65 73] to alleviate the high overheads for fences byinjecting an artificial data dependency between the condi-tional branch and the Spectre gadget These proposals incur36-60 overhead in performanceTo cope with variant-2 both Intel and AMD have an-

nounced microcode update patches that introduce new fenceinstructions [4 45] that prevent instructions preceding thefence from controlling the indirect branch prediction of

branches that follow the fence The fences also prevent soft-ware with lower privileges from influencing the indirectbranch prediction of software with higher privileges Fur-thermore these patches restrict indirect branch predictionfrom being controlled by co-located threads via simultane-ous multithreading [83 84] Furthermore the use of retpo-lines [45 46 85] have been advocated to replace indirectjumps and calls with an equivalent push+ret instruction se-quence in order to bypass the indirect branch predictorHowever they could further expand the attack surface inthe wake of v5 [58 66] v11 and v12 attacks [54]

On the hardware front SafeSpec [53] and InvisiSpec [99]propose mitigating side effects of speculative execution byadding new shadow user-invisible structures for caches andTLBs that store transient results from speculative instruc-tions and committing them to main cacheTLB only if thespeculation was deemed correct and the corresponding in-structions gracefully retire Although these techniques makedisruptive changes to the processormemory architectureand consistency models they make significant strides in se-cure hardware design with minimal performance impact Fi-nally Dong et al [28 29] propose leveraging Virtual Ghost [22]to protect applications running on a compromised OS kernelfrom Spectre

Secure Instruction StreamCustomization Prior workhas established that instruction customization is an effectivemeans to instrument the dynamic execution stream with se-curity checks and therebymitigate several attack vectors at arelatively low performance overhead Instruction stream cus-tomization has been proposed and evaluated in various formsand levels ranging from secure virtual architectures at theISA and compiler level [23 88 90] to full-blown processor bi-nary translation at the microcode level [9 25 71 89] Corlisset al [18ndash20] propose dynamic instruction stream editing(DISE) a macro-engine that customizes the dynamic instruc-tion stream at the decoder level by pattern-matching user-defined production rules pushed into the decoder Taram etal [80] propose context-sensitive decoding that leverages theCISC to RISC translation feature of modern instruction setdecoders to dynamically alter the behavior of programmer-visible instructions without recompilation or binary trans-lation [26 91] While we leverage these approaches in thisresearch this work targets a different class of emerging at-tacks and proposes several additional techniques beyondmicrocode customization

Information Flow Tracking Suh et al [79] first pro-posed the idea of dynamic information flow tracking (DIFT)that tags data from untrusted channels as spurious and fur-ther tracks the information flow of spurious data flaggingany violation of an enforced security policy Multiple hard-ware information flow tracking techniques have been de-scribed at various levels of the hardware [14ndash16 21 24 81

86 92] as a detection mechanism to circumvent code injec-tion [98] cross-site scripting attacks [24 93] buffer over-flows [24 98] and SQL injection [24 39 40 60 102] Thiswork proposes decoder-level information-flow tracking asa detection mechanism for Spectre and addresses severalassociated challenges

3 Assumptions and Threat ModelAmong the Spectre variants the Bounds Check Bypass (variant-1) is the one which has had the highest impact in terms ofaffected platforms and devices Therefore the main focus ofthis work is to mitigate the variant-1 attack at an acceptablelevel of performance However owing to the flexibility ad-vantages of microcode customization we also propose newmitigations against other Spectre variants documented inTable 1

We assume that the goal of the attacker is to read arbi-trary memory contents by exploiting a Spectre gadget Alsowithout loss of generality in this work we defend against aSpectre attacker that targets the most obvious and the mostimportant victim ie the OS kernel but our frameworkallows us to extend the proposed techniques to mitigate dif-ferent targets such as virtual machines and browsers Ourattack model assumes an adversary with the following capa-bilities

bull Information Leakage We assume that the adver-sary has the ability and privilege to probe flush orevict any cache line including that of the kernel at anyparticular time Also they can make precise timingmeasurements using instructions such as rdtsc andrdtscp

bull Co-location We assume that the adversary is colo-cated with the victim and can not only leak informa-tion through cache-based side channels but can alsomistrain shared branch predictor structures includingbranch target buffers and return address stacks andcan further influence the branch outcome of the victimas described in the variant-2 attack

bull User-Mode Access We assume that the adversaryhas standard user-mode access and can invoke anysystem call with arbitrary and carefully chosen ar-guments and can further access devices such as thekeyboard or network to feed the kerneldriver codewith carefully chosen data

Moreover we assume that the only communication chan-nel between the attacker and the kernel code is throughmicro-architectural side channels and the attacker does nothave any other channel direct or indirect to leak data Amongall the micro-architectural side-channels data caches are theones that are predominantly used and are the easiest to ex-ploit Thus the mitigation strategies described in this workprimarily focus on those variants of the attack that use thedata cache as their covert channel to leak information Again

Configuration Registers (MSRs)

Mac

ro-o

p qu

eue

microop-Queue

Scratchpad Regs

Runtime System

DLIFTMacro-op Stream

Macro-to-microop Translation Unit

CSD-Enabled Decoder

INC [0x802ac] FENCE LD t0[0x802ac]ADD t0 t0 1ST [0x802ac] t0 Di

spat

cher Mistaint

Detection and Recovery

Figure 3 Architectural Overview

due to the flexiblity and programmability of the proposedstrategy it is possible to easily extend the approach to miti-gate other side-channels such as functional unit contentionand AVX timing channels [76]

4 Architectural OverviewFigure 3 gives the architectural overview of our defensestrategy ndash context-sensitive fencing The central piece of theproposed architecture is an x86 microcode engine that hascontext-sensitive decoding (CSD) capabilities [80] allow-ing it to optionally translate a native x86 instruction into acustomizable alternate set of micro-ops In this work weleverage this capability to perform the surgical insertion ofspeculation fences (some existing and some newly proposed)at potentially vulnerable Spectre code targets To this end weintroduce new custommicro-op flows and new configurationmechanisms that trigger such micro-op flowsThe CSD-enabled microcode engine is provisioned with

fine-grained reconfiguration capabilities via a set of model-specific registers (MSRs) that can control the frequency typeand enforcement criteria of speculation fences inserted intothe dynamic instruction stream Such a fine-grained recon-figuration capability is especially important to this workbecause speculation fences are particularly expensive allow-ing us to surgically insert fences that impose varying degreesof restrictions on speculative execution depending upon theruntime conditions current level of threat and the nature ofthe code being executedMoreover the context-sensitive fencing framework also

benefits from a novel decoder-level information flow track-ing (DLIFT) engine that has the ability to identify untrustedinstructions that are potentially in Spectre gadgets and trig-ger alternate micro-op flows that insert speculation fencesOwing to its decoder-level and inherently speculative imple-mentation DLIFT relies on a mistaint detection and recoveryhardware implemented in the execute stage to avoid over-tainting and undertainting scenarios Finally the proposeddefense framework also includes hardware and microcode-level mechanisms to achieve branch predictor state isola-tion across protection domains to mitigate the variant-2 andvariant-4 attacks

Table 2 List of Intelrsquos Serializing InstructionsType Instr Desc

PrivilegedSeria

lizingInstructions INVD Invalidate Internal Caches

INVEPT Invalidate Translations from EPTINVLPG Invalidate TLB EntriesINVVPID Invalidate Translations Based on VPIDLIDT Load Interrupt Descriptor Table RegisterLGDT Load Global Descriptor Table RegisterLLDT Load Local Descriptor Table RegisterLTR Load Task RegisterMOV Move to Control RegisterMOV Move to Debug Register

WBINVD Write Back and Invalidate CacheWRMSR Write to Model Specific Register

Non

Priv CPUID CPU Identification

IRET Interrupt ReturnRSM Resume from System Management Mode

Mem

Ordering SFENCE Store Fence

LFENCE Load Fence

MFENCE Memory Fence

5 Design and ImplementationIn this section we describe in greater detail the architecturaltechniques and building blocks that together constitute theproposed defense strategy ndash context-sensitive fencing Firstwe describe a microcode customization framework that en-ables the surgical insertion of fences to secure speculativeexecution with minimal performance impact Second we ex-amine the full design space of fences to provide the isolationproperties we need without undue obstruction of existingpipeline performance features Third we propose a decoder-level information flow tracking (DLIFT) technique to followpotentially malicious execution patterns and trigger securemicrocode customization Finally we propose protectionmechanisms that circumvent the mistraining of the branchpredictor and associated structures

51 Microcode CustomizationSpeculation fences are a processorrsquos primary mechanism tooverride speculative execution For Spectre variant-1 context-sensitive fencing works with CSD by providing alternate de-codings of all load instructions with the alternate decodingalways including a fence micro-op that appears before theload micro-op The alternate decoding will then be triggeredor not based on runtime conditions The first considerationthen is what fence instruction to incorporate Most pro-cessors already provide a variety of fences and serializinginstructions We first study the performance and securityimpact of the existing suite of fences and serializing instruc-tions and then explore and propose new speculation fencesand context-sensitive fencing techniques that manage theimpact on performance For Spectre variant-2 CSF works

micro-op

Que

ue

Macro-op Stream Macro-to-microop

Translation Unit

CSD-Enabled Decoder

FENCE Injection

Point

micro-op

Dis

patc

her

Load Store-Queue

FU1 FU2

RSRS

RSRSRS

Cache

Cac

he

Con

trolle

r

Mem

Mem

C

ontro

ller

Fence Enforcement Point

Figure 4 Fence Enforcement Points

with CSD by providing alternate decodings of some con-trol flow and possibly insert fences andor branch predictorresetting micro-ops ndash more detail is in Section 53

511 Serializing Instructions and Memory FencesSerializing instructions are the strictest amongst all specula-tion fences and completely override speculative executionUpon decoding a serializing instruction the processor stallsfetching any subsequent instruction until all instructionspreceding the serializing instruction retire Due to the highpipeline depth and issue width of modern out-of-order su-perscalars the usage of serializing instructions could resultin long delays and considerable throughput loss Memoryfences on the other hand enforce a memory serializationpoint in the program instruction stream More specificallythese fences ensure that memory operations that appear inexecution after the fence are stalled until all the outstandingmemory requests (including the fence) complete executionTable 2 shows a complete list of Intelrsquos existing serializ-

ing instructions and fences [48] Most of the serializing in-structions need to be run in privileged mode which restrictstheir usage in defenses for victims that lack sufficient privi-leges (eg browsers) Moreover a majority of them modifythe state or contents of architectural registers the programcounter cache TLB etc and only serialize execution as aside effect thereby requiring an additional backuprestorestep when used for the sole purpose of serializing executionAn exception to this is the MOV to debug register instructionthat does not corrupt any architectural state if not actuallyin debug modeThe SFENCE instruction does not allow stores to pass

through it but does not affect loads That is it enforces thatall stores that precede the SFENCE are executed to comple-tion before any store that succeeds the SFENCE is fetchedThe MFENCE instruction restricts all memory operationsfrom passing through it Unlike SFENCE and MFENCE thatare memory ordering operations LFENCE performs a seri-alizing operation In particular LFENCE does not stop the

processor from fetching and decoding instructions that ap-pear after the LFENCE but it restricts the dispatch of suchinstructions until the instructions preceding the LFENCEcomplete execution Since LFENCE serializes execution andyet makes some progress in the front-end it has been recom-mended by Intel as a low-overhead fence that can be insertedat vulnerable points in execution to defend against Spec-tre [45] In other words while serializing instructions suchas CPUID are enforced at Fetch LFENCE is enforced at theinstruction queue level Therefore in this work we start withthe LFENCE and propose new fences that come with fewerrestrictions different enforcement policies andor sport ad-ditional optimizations

512 Fence Enforcement PoliciesIt is important to note that none of the existing instructionsthat provide fence support were actually created for the pur-pose we (or the whole industry) need for Spectre mitigationThus this work will examine existing fences variants ofexisting fences and also introduce a new fence primitive Tobetter understand the design landscape we examine severalpossible properties of fences in this section

Early vs Late Enforcement We first categorize fencesinto early-enforced and late-enforced fences based on thepipeline stage at which they are enforced In particular werefer to any serializing instruction such as an LFENCE thatis enforced at the instruction queue or earlier in the pipelineas early-enforced If the fence is enforced at a later stage suchas the reservation station the loadstore queue or the cachecontroller we refer to it as a late-enforced fence Late enforce-ment in essence shifts the fence enforcement point towardsthe leaking structure (eg the cache) reducing the impacton instructions that do not access that structure and allow-ing for the enforcement of more fine-grained serializationrules (eg allow cache hits but not misses) Figure 4 showspotential fence enforcement points at various stages in theprocessor pipelineIt is important to note that the later a fence is enforced

the fewer the side channels it protects against For exampleIntelrsquos LFENCE prevents information leakage through all mi-croarchitectural structures that appear after the instructionqueue However to prevent information leakage through theinstruction cache side channel we would have to resort toa regular serializing instruction or an MFENCE resultingin prohibitively high performance overheads Similarly afence enforced at the data cache controller level would onlymitigate data cache-based side channels and will not protectagainst an FPU-based side-channel Instead the FPU-basedside-channel may be mitigated using a different fence thatis appropriately configured and enforced at the reservationstation or the FPU

Strict vs Relaxed Enforcement Depending upon howprohibitive they are we next classify fences into strict andrelaxed Strict fences are highly prohibitive and do not allow

Table 3 Characteristics of Different Fence Types

Fence Name EnforcementPoint

StrictRelaxed

Instructionsnot Allowed

MitigatesVariants

ExistingNew

Intelrsquos SIs (CPUID) Fetch Strict All All ExistingLFENCE IQ Strict All All Existing

LSQ-LFENCE LSQ Relaxed Ld v1 NewLSQ-MFENCE LSQ Relaxed LdampSt v1v11v12 New

CFENCE CC Relaxed None v1 New CC Cache Controller RS Reservation Station

any instructions to pass through them until the fence retireswhereas relaxed fences allow certain types of instructionsto pass through them For example an SFENCE that is en-forced at the loadstore queue and allows all instructions topass except stores that have a greater sequence number thanitself is a late-enforced and relaxed fence On the other handall x86 serializing instructions including LFENCE are early-enforced and strict Customizing the micro-op stream withearly-enforced and strict fences typically results in slower ex-ecution when compared to customization with late-enforcedand relaxed fences However with carefully enforced con-straints the late-enforced and relaxed fences could offer sim-ilar if not better security guarantees

Early vs Late Commit A fence typically remains ef-fective until it gets committed or squashed Based on Intelrsquosmanual [47] a serializing instruction is only allowed to becommitted if there is no preceding outstanding store thatis waiting to be written back While this behavior might benecessary for device synchronization or memory orderingenforcement for the purpose of securing speculative exe-cution against Spectre attacks there is no need to wait forstores to be written back This is because write buffers arenrsquotcommitted to the cache until the store reaches retirementand therefore the fact that a store is waiting for an outstand-ing writeback request to complete is enough evidence that itdid not occur along a misspeculated path Allowing a fenceto commit early without waiting for preceding outstandingstores to write back can considerably improve performancebecause as soon as a fence gets committed a stream of in-structions can advance further in the pipeline Therefore inthis work we propose and study the effects of a late-commitversion of each fence that does not wait for stores to be writ-ten back However we note that these versions should onlybe limited to the security use case we describe and shouldnot be used for synchronization

Newly Proposed Fences Table 3 summarizes the char-acteristics of different existing and newly proposed fencesAlthough the existing set of fences defend against all vari-ants they incur prohibitively high costs on performance dueto their strict enforcement constraints To better understandthe potential for fences beyond those that already exist wepropose and evaluate three new types of fences In an attackscenario we typically insert one of these fences between abranch instruction and a load instruction that potentiallyleaks sensitive information via a cache side-channel Theproposed fences more effectively and efficiently mitigate the

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 3: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

Table 1 Speculative Attacks VariantsVariant Vulnerability NameSpectre v1 [49 57] Bounds Check Bypass (BCB)Spectre v2 [49 57] Branch Target Injection (BTI)Spectre v3 [49 61] Rogue Data Cache Load (RDCL)Spectre v3a [6 67] Rogue System Register Read (RSRD)Spectre v4 [67] Speculative Store Bypass (SSB)Spectre-NG [78] Lazy FP State RestoreSpectre v11 [54] Bounds Check Bypass Store (BCBS)Spectre v12 [54] Read-only Protection BypassSpectre v5 [59 66] Ret2Spec and SpecRSBNetSpectre [76] Remote Bounds Check BypassForeshadow [87 95] L1 Terminal Fault

1 i f ( x lt a r r a y 1 _ s i z e )2 y = a r r ay2 [ a r r ay1 [ x ] lowast 2 5 6 ] Figure 1 Example Spectre Variant-1 Gadget

they share cache memory The attack unfolds through a pre-attack step in which a spy process fillsflushes specific cachesets so that the victim leaves observable side effects in termsof its cache access patterns that can be later inferred by thespy process by timing the access to particular cache blocks

Exploiting Speculative Execution This work tacklesa highly evasive class of microarchitectural attacks calledSpectre [57] that leaks information by exploiting side effectsof speculative execution through cache-based side channelsThese attacks not only exploit unintended side effects dueto speculation but have the ability to deliberately misleadexecution into attacker-intended paths by mistraining thebranch predictor and other associated structures Severalvariants of the attack have been described (shown in Table 1)that can potentially bypass software securityintegrity pro-tection mechanisms such as bounds checking and ASLR [77]Figure 1 shows a vulnerable code target for the variant-

1 attack The code fragment is composed of a conditionalbranch that performs a bounds check and a Spectre gadgetthat results in an observable microarchitectural side effectupon execution Upon misspeculation the bounds check isbypassed and the Spectre gadget executes leaving an observ-able cache footprint to the attacker that remains even afterthe processor detects the misprediction and rolls back execu-tion In the simplest variant of the attack where the attackercontrols both x and array2 the attacker can potentially leakthe entire address space of the victimThe variant-2 Spectre attack further allows the attacker

to hijack speculative execution by mistraining the branchpredictor and associated structures enabling a ROP-styleattack [75] that stitches together Spectre gadgets To mountsuch an attack a co-located adversary process that sharesthe branch predictor (eg a browser process with severaluser threads) with the victim first forces an artificial BTB(Branch Target Buffer) entry collision using a carefully cho-sen indirect branch address The attacker next poisons thevalue of the colliding BTB entry by repeatedly executing its

1 mov eax a r r 1 _ s i z e2 cmp edi eax3 jge END_LBL4 mov eax edi

6 mov eax [ eax+ a r r 1 ]7 shl eax 0 x88 mov eax [ eax+ a r r 2 ]9 mov [ y ] eax10 END_LBL

(a) Vulnerable gadget (x86)

mov eax a r r 1 _ s i z ecmp edi eaxjge END_LBLmov eax edi____FENCE____mov eax [ eax+ a r r 1 ]shl eax 0 x8mov eax [ eax+ a r r 2 ]mov [ y ] eaxEND_LBL

(b) Fenced gadget (x86)

Figure 2 Mitigating Spectre-v1 using a Fence Instruction

own branch whose target is the address of a Spectre gadgetAfter successful mistraining of the branch predictor the pro-cessor now predicts the victimrsquos indirect branch that collidesin the BTB to be the attacker-chosen Spectre gadget andstarts speculatively executing itThe variant-3 attack (aka Meltdown) exploits the fact

that most out-of-order processors that employ dynamic spec-ulation supress loads with protection violation at instructionretirement (ie commit stage) rather than during instruc-tion execution Therefore a non-privileged memory accesscan find its way to the cache and leave a footprint allowingan attacker to now effectively read arbitrary memory con-tents of another process or even the kernel or the hypervisorthrough careful cache-based side channel analysisThe literature describes multiple attacks that conform to

the above variants but exploit different attack targets andordifferent side channels These include SgxPectre [13] that by-passes Intelrsquos SGX [44] security mechanisms to steal secretsfrom SGX enclaves the MeltdownPrimeSpectrePrime [82]that leverage a PRIME+PROBE [62] cache attack instead ofFLUSH+RELOAD [100] by exploiting the side effects of cacheline invalidation mechanisms in modern cache coherenceprotocols and the NetSpectre attack that leaks informationacross independent virtual machines on Google Cloud viaan AVX-based covert channel [76]The variant-4 Spectre attack exploits microarchitectural

side effects of the memory disambiguation feature employedby most out-of-order processors in order to allow loads to bespeculatively ordered and executed before any outstandingstore whose effective address has not been calculated yetUpon misspeculation ie if the load address has a conflictwith the outstanding store the processor flushes the pipelineand triggers the re-execution of the load and all subsequentinstructions [48] while the microarchitectural side effectsof the misspeculation linger resulting in memory disclo-sures similar to variant 1 and 3 Similarly the NG variant-3attack [78] leverages the lazy x87 floating-point restore func-tionality of Intel CPUs to read floating-point registers of avictim process

The Spectre variants 11 and 12 dubbed Speculative BufferOverflows [55] exploit the store-to-load forwarding opti-mization to stitch together Spectre gadgets More specificallythese attacks bypass a stack buffer overflow check similarto variant-1 and execute a Spectre gadget that speculativelystores malicious content (the address of the next Spectregadget) into the return address on the stack Store-to-loadforwarding combined with the fact that most modern pro-cessors break down the return instruction into micro-opsthat load the return address from the stack before transfer-ring control then result in a ROP-style execution of Spectregadgets that leave a trail of microarchitectural state behindThe variant-5 attack [58] achieves similar effect by insteadmistraining the return address stack employed by most pro-cessors to speculatively predict the target of a procedurereturn

Spectre Mitigations The current set of mitigations forSpectre range from simple coding guidelines[10] to propos-als that advocate exposing microarchitectural details in theISA [64] To mitigate Meltdown Kernel Page Table Isolation(KPTI) [17 36] has been proposed and recently patched to theLinux kernel incurring about 6 in performance To mitigateSpectre v1 multiple chip manufacturers including Intel [45]ARM [6] and AMD [4] have suggested instrumenting codewith serializing instructions or fences to inhibit speculationat specific points in execution

For example consider Figure 2a that shows the assemblycode for the Spectre variant-1 gadget in Figure 1 Figure 2bshows a software-patched version that employs a serializinginstruction or a fence to prevent the speculative execution ofthe Spectre gadget (ie lines 6-9) In most implementationsupon decoding the fence the processor stops fetching newinstructions until the fence gets committed or squashedthereby serializing execution In our example if the attackercalls the vulnerable gadget with out-of-bounds values in theedi register the processor front-end stalls until the fence iscommitted thereby disallowing the speculative execution ofthe Spectre gadget and completely mitigating the attackThe associated performance overhead with liberal fence

insertion however could be as high as 10x [69] While itis possible to perform compiler-directed code instrumen-tation with fences at a lower performance overhead [72]locating the potential Spectre targets using static analysis isnon-trivial and could therefore result in less than full cov-erage [56] Speculative Load Hardening [11 12] and YSNB(You Shall Not Bypass) [69] propose to use predicated execu-tion [50 65 73] to alleviate the high overheads for fences byinjecting an artificial data dependency between the condi-tional branch and the Spectre gadget These proposals incur36-60 overhead in performanceTo cope with variant-2 both Intel and AMD have an-

nounced microcode update patches that introduce new fenceinstructions [4 45] that prevent instructions preceding thefence from controlling the indirect branch prediction of

branches that follow the fence The fences also prevent soft-ware with lower privileges from influencing the indirectbranch prediction of software with higher privileges Fur-thermore these patches restrict indirect branch predictionfrom being controlled by co-located threads via simultane-ous multithreading [83 84] Furthermore the use of retpo-lines [45 46 85] have been advocated to replace indirectjumps and calls with an equivalent push+ret instruction se-quence in order to bypass the indirect branch predictorHowever they could further expand the attack surface inthe wake of v5 [58 66] v11 and v12 attacks [54]

On the hardware front SafeSpec [53] and InvisiSpec [99]propose mitigating side effects of speculative execution byadding new shadow user-invisible structures for caches andTLBs that store transient results from speculative instruc-tions and committing them to main cacheTLB only if thespeculation was deemed correct and the corresponding in-structions gracefully retire Although these techniques makedisruptive changes to the processormemory architectureand consistency models they make significant strides in se-cure hardware design with minimal performance impact Fi-nally Dong et al [28 29] propose leveraging Virtual Ghost [22]to protect applications running on a compromised OS kernelfrom Spectre

Secure Instruction StreamCustomization Prior workhas established that instruction customization is an effectivemeans to instrument the dynamic execution stream with se-curity checks and therebymitigate several attack vectors at arelatively low performance overhead Instruction stream cus-tomization has been proposed and evaluated in various formsand levels ranging from secure virtual architectures at theISA and compiler level [23 88 90] to full-blown processor bi-nary translation at the microcode level [9 25 71 89] Corlisset al [18ndash20] propose dynamic instruction stream editing(DISE) a macro-engine that customizes the dynamic instruc-tion stream at the decoder level by pattern-matching user-defined production rules pushed into the decoder Taram etal [80] propose context-sensitive decoding that leverages theCISC to RISC translation feature of modern instruction setdecoders to dynamically alter the behavior of programmer-visible instructions without recompilation or binary trans-lation [26 91] While we leverage these approaches in thisresearch this work targets a different class of emerging at-tacks and proposes several additional techniques beyondmicrocode customization

Information Flow Tracking Suh et al [79] first pro-posed the idea of dynamic information flow tracking (DIFT)that tags data from untrusted channels as spurious and fur-ther tracks the information flow of spurious data flaggingany violation of an enforced security policy Multiple hard-ware information flow tracking techniques have been de-scribed at various levels of the hardware [14ndash16 21 24 81

86 92] as a detection mechanism to circumvent code injec-tion [98] cross-site scripting attacks [24 93] buffer over-flows [24 98] and SQL injection [24 39 40 60 102] Thiswork proposes decoder-level information-flow tracking asa detection mechanism for Spectre and addresses severalassociated challenges

3 Assumptions and Threat ModelAmong the Spectre variants the Bounds Check Bypass (variant-1) is the one which has had the highest impact in terms ofaffected platforms and devices Therefore the main focus ofthis work is to mitigate the variant-1 attack at an acceptablelevel of performance However owing to the flexibility ad-vantages of microcode customization we also propose newmitigations against other Spectre variants documented inTable 1

We assume that the goal of the attacker is to read arbi-trary memory contents by exploiting a Spectre gadget Alsowithout loss of generality in this work we defend against aSpectre attacker that targets the most obvious and the mostimportant victim ie the OS kernel but our frameworkallows us to extend the proposed techniques to mitigate dif-ferent targets such as virtual machines and browsers Ourattack model assumes an adversary with the following capa-bilities

bull Information Leakage We assume that the adver-sary has the ability and privilege to probe flush orevict any cache line including that of the kernel at anyparticular time Also they can make precise timingmeasurements using instructions such as rdtsc andrdtscp

bull Co-location We assume that the adversary is colo-cated with the victim and can not only leak informa-tion through cache-based side channels but can alsomistrain shared branch predictor structures includingbranch target buffers and return address stacks andcan further influence the branch outcome of the victimas described in the variant-2 attack

bull User-Mode Access We assume that the adversaryhas standard user-mode access and can invoke anysystem call with arbitrary and carefully chosen ar-guments and can further access devices such as thekeyboard or network to feed the kerneldriver codewith carefully chosen data

Moreover we assume that the only communication chan-nel between the attacker and the kernel code is throughmicro-architectural side channels and the attacker does nothave any other channel direct or indirect to leak data Amongall the micro-architectural side-channels data caches are theones that are predominantly used and are the easiest to ex-ploit Thus the mitigation strategies described in this workprimarily focus on those variants of the attack that use thedata cache as their covert channel to leak information Again

Configuration Registers (MSRs)

Mac

ro-o

p qu

eue

microop-Queue

Scratchpad Regs

Runtime System

DLIFTMacro-op Stream

Macro-to-microop Translation Unit

CSD-Enabled Decoder

INC [0x802ac] FENCE LD t0[0x802ac]ADD t0 t0 1ST [0x802ac] t0 Di

spat

cher Mistaint

Detection and Recovery

Figure 3 Architectural Overview

due to the flexiblity and programmability of the proposedstrategy it is possible to easily extend the approach to miti-gate other side-channels such as functional unit contentionand AVX timing channels [76]

4 Architectural OverviewFigure 3 gives the architectural overview of our defensestrategy ndash context-sensitive fencing The central piece of theproposed architecture is an x86 microcode engine that hascontext-sensitive decoding (CSD) capabilities [80] allow-ing it to optionally translate a native x86 instruction into acustomizable alternate set of micro-ops In this work weleverage this capability to perform the surgical insertion ofspeculation fences (some existing and some newly proposed)at potentially vulnerable Spectre code targets To this end weintroduce new custommicro-op flows and new configurationmechanisms that trigger such micro-op flowsThe CSD-enabled microcode engine is provisioned with

fine-grained reconfiguration capabilities via a set of model-specific registers (MSRs) that can control the frequency typeand enforcement criteria of speculation fences inserted intothe dynamic instruction stream Such a fine-grained recon-figuration capability is especially important to this workbecause speculation fences are particularly expensive allow-ing us to surgically insert fences that impose varying degreesof restrictions on speculative execution depending upon theruntime conditions current level of threat and the nature ofthe code being executedMoreover the context-sensitive fencing framework also

benefits from a novel decoder-level information flow track-ing (DLIFT) engine that has the ability to identify untrustedinstructions that are potentially in Spectre gadgets and trig-ger alternate micro-op flows that insert speculation fencesOwing to its decoder-level and inherently speculative imple-mentation DLIFT relies on a mistaint detection and recoveryhardware implemented in the execute stage to avoid over-tainting and undertainting scenarios Finally the proposeddefense framework also includes hardware and microcode-level mechanisms to achieve branch predictor state isola-tion across protection domains to mitigate the variant-2 andvariant-4 attacks

Table 2 List of Intelrsquos Serializing InstructionsType Instr Desc

PrivilegedSeria

lizingInstructions INVD Invalidate Internal Caches

INVEPT Invalidate Translations from EPTINVLPG Invalidate TLB EntriesINVVPID Invalidate Translations Based on VPIDLIDT Load Interrupt Descriptor Table RegisterLGDT Load Global Descriptor Table RegisterLLDT Load Local Descriptor Table RegisterLTR Load Task RegisterMOV Move to Control RegisterMOV Move to Debug Register

WBINVD Write Back and Invalidate CacheWRMSR Write to Model Specific Register

Non

Priv CPUID CPU Identification

IRET Interrupt ReturnRSM Resume from System Management Mode

Mem

Ordering SFENCE Store Fence

LFENCE Load Fence

MFENCE Memory Fence

5 Design and ImplementationIn this section we describe in greater detail the architecturaltechniques and building blocks that together constitute theproposed defense strategy ndash context-sensitive fencing Firstwe describe a microcode customization framework that en-ables the surgical insertion of fences to secure speculativeexecution with minimal performance impact Second we ex-amine the full design space of fences to provide the isolationproperties we need without undue obstruction of existingpipeline performance features Third we propose a decoder-level information flow tracking (DLIFT) technique to followpotentially malicious execution patterns and trigger securemicrocode customization Finally we propose protectionmechanisms that circumvent the mistraining of the branchpredictor and associated structures

51 Microcode CustomizationSpeculation fences are a processorrsquos primary mechanism tooverride speculative execution For Spectre variant-1 context-sensitive fencing works with CSD by providing alternate de-codings of all load instructions with the alternate decodingalways including a fence micro-op that appears before theload micro-op The alternate decoding will then be triggeredor not based on runtime conditions The first considerationthen is what fence instruction to incorporate Most pro-cessors already provide a variety of fences and serializinginstructions We first study the performance and securityimpact of the existing suite of fences and serializing instruc-tions and then explore and propose new speculation fencesand context-sensitive fencing techniques that manage theimpact on performance For Spectre variant-2 CSF works

micro-op

Que

ue

Macro-op Stream Macro-to-microop

Translation Unit

CSD-Enabled Decoder

FENCE Injection

Point

micro-op

Dis

patc

her

Load Store-Queue

FU1 FU2

RSRS

RSRSRS

Cache

Cac

he

Con

trolle

r

Mem

Mem

C

ontro

ller

Fence Enforcement Point

Figure 4 Fence Enforcement Points

with CSD by providing alternate decodings of some con-trol flow and possibly insert fences andor branch predictorresetting micro-ops ndash more detail is in Section 53

511 Serializing Instructions and Memory FencesSerializing instructions are the strictest amongst all specula-tion fences and completely override speculative executionUpon decoding a serializing instruction the processor stallsfetching any subsequent instruction until all instructionspreceding the serializing instruction retire Due to the highpipeline depth and issue width of modern out-of-order su-perscalars the usage of serializing instructions could resultin long delays and considerable throughput loss Memoryfences on the other hand enforce a memory serializationpoint in the program instruction stream More specificallythese fences ensure that memory operations that appear inexecution after the fence are stalled until all the outstandingmemory requests (including the fence) complete executionTable 2 shows a complete list of Intelrsquos existing serializ-

ing instructions and fences [48] Most of the serializing in-structions need to be run in privileged mode which restrictstheir usage in defenses for victims that lack sufficient privi-leges (eg browsers) Moreover a majority of them modifythe state or contents of architectural registers the programcounter cache TLB etc and only serialize execution as aside effect thereby requiring an additional backuprestorestep when used for the sole purpose of serializing executionAn exception to this is the MOV to debug register instructionthat does not corrupt any architectural state if not actuallyin debug modeThe SFENCE instruction does not allow stores to pass

through it but does not affect loads That is it enforces thatall stores that precede the SFENCE are executed to comple-tion before any store that succeeds the SFENCE is fetchedThe MFENCE instruction restricts all memory operationsfrom passing through it Unlike SFENCE and MFENCE thatare memory ordering operations LFENCE performs a seri-alizing operation In particular LFENCE does not stop the

processor from fetching and decoding instructions that ap-pear after the LFENCE but it restricts the dispatch of suchinstructions until the instructions preceding the LFENCEcomplete execution Since LFENCE serializes execution andyet makes some progress in the front-end it has been recom-mended by Intel as a low-overhead fence that can be insertedat vulnerable points in execution to defend against Spec-tre [45] In other words while serializing instructions suchas CPUID are enforced at Fetch LFENCE is enforced at theinstruction queue level Therefore in this work we start withthe LFENCE and propose new fences that come with fewerrestrictions different enforcement policies andor sport ad-ditional optimizations

512 Fence Enforcement PoliciesIt is important to note that none of the existing instructionsthat provide fence support were actually created for the pur-pose we (or the whole industry) need for Spectre mitigationThus this work will examine existing fences variants ofexisting fences and also introduce a new fence primitive Tobetter understand the design landscape we examine severalpossible properties of fences in this section

Early vs Late Enforcement We first categorize fencesinto early-enforced and late-enforced fences based on thepipeline stage at which they are enforced In particular werefer to any serializing instruction such as an LFENCE thatis enforced at the instruction queue or earlier in the pipelineas early-enforced If the fence is enforced at a later stage suchas the reservation station the loadstore queue or the cachecontroller we refer to it as a late-enforced fence Late enforce-ment in essence shifts the fence enforcement point towardsthe leaking structure (eg the cache) reducing the impacton instructions that do not access that structure and allow-ing for the enforcement of more fine-grained serializationrules (eg allow cache hits but not misses) Figure 4 showspotential fence enforcement points at various stages in theprocessor pipelineIt is important to note that the later a fence is enforced

the fewer the side channels it protects against For exampleIntelrsquos LFENCE prevents information leakage through all mi-croarchitectural structures that appear after the instructionqueue However to prevent information leakage through theinstruction cache side channel we would have to resort toa regular serializing instruction or an MFENCE resultingin prohibitively high performance overheads Similarly afence enforced at the data cache controller level would onlymitigate data cache-based side channels and will not protectagainst an FPU-based side-channel Instead the FPU-basedside-channel may be mitigated using a different fence thatis appropriately configured and enforced at the reservationstation or the FPU

Strict vs Relaxed Enforcement Depending upon howprohibitive they are we next classify fences into strict andrelaxed Strict fences are highly prohibitive and do not allow

Table 3 Characteristics of Different Fence Types

Fence Name EnforcementPoint

StrictRelaxed

Instructionsnot Allowed

MitigatesVariants

ExistingNew

Intelrsquos SIs (CPUID) Fetch Strict All All ExistingLFENCE IQ Strict All All Existing

LSQ-LFENCE LSQ Relaxed Ld v1 NewLSQ-MFENCE LSQ Relaxed LdampSt v1v11v12 New

CFENCE CC Relaxed None v1 New CC Cache Controller RS Reservation Station

any instructions to pass through them until the fence retireswhereas relaxed fences allow certain types of instructionsto pass through them For example an SFENCE that is en-forced at the loadstore queue and allows all instructions topass except stores that have a greater sequence number thanitself is a late-enforced and relaxed fence On the other handall x86 serializing instructions including LFENCE are early-enforced and strict Customizing the micro-op stream withearly-enforced and strict fences typically results in slower ex-ecution when compared to customization with late-enforcedand relaxed fences However with carefully enforced con-straints the late-enforced and relaxed fences could offer sim-ilar if not better security guarantees

Early vs Late Commit A fence typically remains ef-fective until it gets committed or squashed Based on Intelrsquosmanual [47] a serializing instruction is only allowed to becommitted if there is no preceding outstanding store thatis waiting to be written back While this behavior might benecessary for device synchronization or memory orderingenforcement for the purpose of securing speculative exe-cution against Spectre attacks there is no need to wait forstores to be written back This is because write buffers arenrsquotcommitted to the cache until the store reaches retirementand therefore the fact that a store is waiting for an outstand-ing writeback request to complete is enough evidence that itdid not occur along a misspeculated path Allowing a fenceto commit early without waiting for preceding outstandingstores to write back can considerably improve performancebecause as soon as a fence gets committed a stream of in-structions can advance further in the pipeline Therefore inthis work we propose and study the effects of a late-commitversion of each fence that does not wait for stores to be writ-ten back However we note that these versions should onlybe limited to the security use case we describe and shouldnot be used for synchronization

Newly Proposed Fences Table 3 summarizes the char-acteristics of different existing and newly proposed fencesAlthough the existing set of fences defend against all vari-ants they incur prohibitively high costs on performance dueto their strict enforcement constraints To better understandthe potential for fences beyond those that already exist wepropose and evaluate three new types of fences In an attackscenario we typically insert one of these fences between abranch instruction and a load instruction that potentiallyleaks sensitive information via a cache side-channel Theproposed fences more effectively and efficiently mitigate the

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 4: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

The Spectre variants 11 and 12 dubbed Speculative BufferOverflows [55] exploit the store-to-load forwarding opti-mization to stitch together Spectre gadgets More specificallythese attacks bypass a stack buffer overflow check similarto variant-1 and execute a Spectre gadget that speculativelystores malicious content (the address of the next Spectregadget) into the return address on the stack Store-to-loadforwarding combined with the fact that most modern pro-cessors break down the return instruction into micro-opsthat load the return address from the stack before transfer-ring control then result in a ROP-style execution of Spectregadgets that leave a trail of microarchitectural state behindThe variant-5 attack [58] achieves similar effect by insteadmistraining the return address stack employed by most pro-cessors to speculatively predict the target of a procedurereturn

Spectre Mitigations The current set of mitigations forSpectre range from simple coding guidelines[10] to propos-als that advocate exposing microarchitectural details in theISA [64] To mitigate Meltdown Kernel Page Table Isolation(KPTI) [17 36] has been proposed and recently patched to theLinux kernel incurring about 6 in performance To mitigateSpectre v1 multiple chip manufacturers including Intel [45]ARM [6] and AMD [4] have suggested instrumenting codewith serializing instructions or fences to inhibit speculationat specific points in execution

For example consider Figure 2a that shows the assemblycode for the Spectre variant-1 gadget in Figure 1 Figure 2bshows a software-patched version that employs a serializinginstruction or a fence to prevent the speculative execution ofthe Spectre gadget (ie lines 6-9) In most implementationsupon decoding the fence the processor stops fetching newinstructions until the fence gets committed or squashedthereby serializing execution In our example if the attackercalls the vulnerable gadget with out-of-bounds values in theedi register the processor front-end stalls until the fence iscommitted thereby disallowing the speculative execution ofthe Spectre gadget and completely mitigating the attackThe associated performance overhead with liberal fence

insertion however could be as high as 10x [69] While itis possible to perform compiler-directed code instrumen-tation with fences at a lower performance overhead [72]locating the potential Spectre targets using static analysis isnon-trivial and could therefore result in less than full cov-erage [56] Speculative Load Hardening [11 12] and YSNB(You Shall Not Bypass) [69] propose to use predicated execu-tion [50 65 73] to alleviate the high overheads for fences byinjecting an artificial data dependency between the condi-tional branch and the Spectre gadget These proposals incur36-60 overhead in performanceTo cope with variant-2 both Intel and AMD have an-

nounced microcode update patches that introduce new fenceinstructions [4 45] that prevent instructions preceding thefence from controlling the indirect branch prediction of

branches that follow the fence The fences also prevent soft-ware with lower privileges from influencing the indirectbranch prediction of software with higher privileges Fur-thermore these patches restrict indirect branch predictionfrom being controlled by co-located threads via simultane-ous multithreading [83 84] Furthermore the use of retpo-lines [45 46 85] have been advocated to replace indirectjumps and calls with an equivalent push+ret instruction se-quence in order to bypass the indirect branch predictorHowever they could further expand the attack surface inthe wake of v5 [58 66] v11 and v12 attacks [54]

On the hardware front SafeSpec [53] and InvisiSpec [99]propose mitigating side effects of speculative execution byadding new shadow user-invisible structures for caches andTLBs that store transient results from speculative instruc-tions and committing them to main cacheTLB only if thespeculation was deemed correct and the corresponding in-structions gracefully retire Although these techniques makedisruptive changes to the processormemory architectureand consistency models they make significant strides in se-cure hardware design with minimal performance impact Fi-nally Dong et al [28 29] propose leveraging Virtual Ghost [22]to protect applications running on a compromised OS kernelfrom Spectre

Secure Instruction StreamCustomization Prior workhas established that instruction customization is an effectivemeans to instrument the dynamic execution stream with se-curity checks and therebymitigate several attack vectors at arelatively low performance overhead Instruction stream cus-tomization has been proposed and evaluated in various formsand levels ranging from secure virtual architectures at theISA and compiler level [23 88 90] to full-blown processor bi-nary translation at the microcode level [9 25 71 89] Corlisset al [18ndash20] propose dynamic instruction stream editing(DISE) a macro-engine that customizes the dynamic instruc-tion stream at the decoder level by pattern-matching user-defined production rules pushed into the decoder Taram etal [80] propose context-sensitive decoding that leverages theCISC to RISC translation feature of modern instruction setdecoders to dynamically alter the behavior of programmer-visible instructions without recompilation or binary trans-lation [26 91] While we leverage these approaches in thisresearch this work targets a different class of emerging at-tacks and proposes several additional techniques beyondmicrocode customization

Information Flow Tracking Suh et al [79] first pro-posed the idea of dynamic information flow tracking (DIFT)that tags data from untrusted channels as spurious and fur-ther tracks the information flow of spurious data flaggingany violation of an enforced security policy Multiple hard-ware information flow tracking techniques have been de-scribed at various levels of the hardware [14ndash16 21 24 81

86 92] as a detection mechanism to circumvent code injec-tion [98] cross-site scripting attacks [24 93] buffer over-flows [24 98] and SQL injection [24 39 40 60 102] Thiswork proposes decoder-level information-flow tracking asa detection mechanism for Spectre and addresses severalassociated challenges

3 Assumptions and Threat ModelAmong the Spectre variants the Bounds Check Bypass (variant-1) is the one which has had the highest impact in terms ofaffected platforms and devices Therefore the main focus ofthis work is to mitigate the variant-1 attack at an acceptablelevel of performance However owing to the flexibility ad-vantages of microcode customization we also propose newmitigations against other Spectre variants documented inTable 1

We assume that the goal of the attacker is to read arbi-trary memory contents by exploiting a Spectre gadget Alsowithout loss of generality in this work we defend against aSpectre attacker that targets the most obvious and the mostimportant victim ie the OS kernel but our frameworkallows us to extend the proposed techniques to mitigate dif-ferent targets such as virtual machines and browsers Ourattack model assumes an adversary with the following capa-bilities

bull Information Leakage We assume that the adver-sary has the ability and privilege to probe flush orevict any cache line including that of the kernel at anyparticular time Also they can make precise timingmeasurements using instructions such as rdtsc andrdtscp

bull Co-location We assume that the adversary is colo-cated with the victim and can not only leak informa-tion through cache-based side channels but can alsomistrain shared branch predictor structures includingbranch target buffers and return address stacks andcan further influence the branch outcome of the victimas described in the variant-2 attack

bull User-Mode Access We assume that the adversaryhas standard user-mode access and can invoke anysystem call with arbitrary and carefully chosen ar-guments and can further access devices such as thekeyboard or network to feed the kerneldriver codewith carefully chosen data

Moreover we assume that the only communication chan-nel between the attacker and the kernel code is throughmicro-architectural side channels and the attacker does nothave any other channel direct or indirect to leak data Amongall the micro-architectural side-channels data caches are theones that are predominantly used and are the easiest to ex-ploit Thus the mitigation strategies described in this workprimarily focus on those variants of the attack that use thedata cache as their covert channel to leak information Again

Configuration Registers (MSRs)

Mac

ro-o

p qu

eue

microop-Queue

Scratchpad Regs

Runtime System

DLIFTMacro-op Stream

Macro-to-microop Translation Unit

CSD-Enabled Decoder

INC [0x802ac] FENCE LD t0[0x802ac]ADD t0 t0 1ST [0x802ac] t0 Di

spat

cher Mistaint

Detection and Recovery

Figure 3 Architectural Overview

due to the flexiblity and programmability of the proposedstrategy it is possible to easily extend the approach to miti-gate other side-channels such as functional unit contentionand AVX timing channels [76]

4 Architectural OverviewFigure 3 gives the architectural overview of our defensestrategy ndash context-sensitive fencing The central piece of theproposed architecture is an x86 microcode engine that hascontext-sensitive decoding (CSD) capabilities [80] allow-ing it to optionally translate a native x86 instruction into acustomizable alternate set of micro-ops In this work weleverage this capability to perform the surgical insertion ofspeculation fences (some existing and some newly proposed)at potentially vulnerable Spectre code targets To this end weintroduce new custommicro-op flows and new configurationmechanisms that trigger such micro-op flowsThe CSD-enabled microcode engine is provisioned with

fine-grained reconfiguration capabilities via a set of model-specific registers (MSRs) that can control the frequency typeand enforcement criteria of speculation fences inserted intothe dynamic instruction stream Such a fine-grained recon-figuration capability is especially important to this workbecause speculation fences are particularly expensive allow-ing us to surgically insert fences that impose varying degreesof restrictions on speculative execution depending upon theruntime conditions current level of threat and the nature ofthe code being executedMoreover the context-sensitive fencing framework also

benefits from a novel decoder-level information flow track-ing (DLIFT) engine that has the ability to identify untrustedinstructions that are potentially in Spectre gadgets and trig-ger alternate micro-op flows that insert speculation fencesOwing to its decoder-level and inherently speculative imple-mentation DLIFT relies on a mistaint detection and recoveryhardware implemented in the execute stage to avoid over-tainting and undertainting scenarios Finally the proposeddefense framework also includes hardware and microcode-level mechanisms to achieve branch predictor state isola-tion across protection domains to mitigate the variant-2 andvariant-4 attacks

Table 2 List of Intelrsquos Serializing InstructionsType Instr Desc

PrivilegedSeria

lizingInstructions INVD Invalidate Internal Caches

INVEPT Invalidate Translations from EPTINVLPG Invalidate TLB EntriesINVVPID Invalidate Translations Based on VPIDLIDT Load Interrupt Descriptor Table RegisterLGDT Load Global Descriptor Table RegisterLLDT Load Local Descriptor Table RegisterLTR Load Task RegisterMOV Move to Control RegisterMOV Move to Debug Register

WBINVD Write Back and Invalidate CacheWRMSR Write to Model Specific Register

Non

Priv CPUID CPU Identification

IRET Interrupt ReturnRSM Resume from System Management Mode

Mem

Ordering SFENCE Store Fence

LFENCE Load Fence

MFENCE Memory Fence

5 Design and ImplementationIn this section we describe in greater detail the architecturaltechniques and building blocks that together constitute theproposed defense strategy ndash context-sensitive fencing Firstwe describe a microcode customization framework that en-ables the surgical insertion of fences to secure speculativeexecution with minimal performance impact Second we ex-amine the full design space of fences to provide the isolationproperties we need without undue obstruction of existingpipeline performance features Third we propose a decoder-level information flow tracking (DLIFT) technique to followpotentially malicious execution patterns and trigger securemicrocode customization Finally we propose protectionmechanisms that circumvent the mistraining of the branchpredictor and associated structures

51 Microcode CustomizationSpeculation fences are a processorrsquos primary mechanism tooverride speculative execution For Spectre variant-1 context-sensitive fencing works with CSD by providing alternate de-codings of all load instructions with the alternate decodingalways including a fence micro-op that appears before theload micro-op The alternate decoding will then be triggeredor not based on runtime conditions The first considerationthen is what fence instruction to incorporate Most pro-cessors already provide a variety of fences and serializinginstructions We first study the performance and securityimpact of the existing suite of fences and serializing instruc-tions and then explore and propose new speculation fencesand context-sensitive fencing techniques that manage theimpact on performance For Spectre variant-2 CSF works

micro-op

Que

ue

Macro-op Stream Macro-to-microop

Translation Unit

CSD-Enabled Decoder

FENCE Injection

Point

micro-op

Dis

patc

her

Load Store-Queue

FU1 FU2

RSRS

RSRSRS

Cache

Cac

he

Con

trolle

r

Mem

Mem

C

ontro

ller

Fence Enforcement Point

Figure 4 Fence Enforcement Points

with CSD by providing alternate decodings of some con-trol flow and possibly insert fences andor branch predictorresetting micro-ops ndash more detail is in Section 53

511 Serializing Instructions and Memory FencesSerializing instructions are the strictest amongst all specula-tion fences and completely override speculative executionUpon decoding a serializing instruction the processor stallsfetching any subsequent instruction until all instructionspreceding the serializing instruction retire Due to the highpipeline depth and issue width of modern out-of-order su-perscalars the usage of serializing instructions could resultin long delays and considerable throughput loss Memoryfences on the other hand enforce a memory serializationpoint in the program instruction stream More specificallythese fences ensure that memory operations that appear inexecution after the fence are stalled until all the outstandingmemory requests (including the fence) complete executionTable 2 shows a complete list of Intelrsquos existing serializ-

ing instructions and fences [48] Most of the serializing in-structions need to be run in privileged mode which restrictstheir usage in defenses for victims that lack sufficient privi-leges (eg browsers) Moreover a majority of them modifythe state or contents of architectural registers the programcounter cache TLB etc and only serialize execution as aside effect thereby requiring an additional backuprestorestep when used for the sole purpose of serializing executionAn exception to this is the MOV to debug register instructionthat does not corrupt any architectural state if not actuallyin debug modeThe SFENCE instruction does not allow stores to pass

through it but does not affect loads That is it enforces thatall stores that precede the SFENCE are executed to comple-tion before any store that succeeds the SFENCE is fetchedThe MFENCE instruction restricts all memory operationsfrom passing through it Unlike SFENCE and MFENCE thatare memory ordering operations LFENCE performs a seri-alizing operation In particular LFENCE does not stop the

processor from fetching and decoding instructions that ap-pear after the LFENCE but it restricts the dispatch of suchinstructions until the instructions preceding the LFENCEcomplete execution Since LFENCE serializes execution andyet makes some progress in the front-end it has been recom-mended by Intel as a low-overhead fence that can be insertedat vulnerable points in execution to defend against Spec-tre [45] In other words while serializing instructions suchas CPUID are enforced at Fetch LFENCE is enforced at theinstruction queue level Therefore in this work we start withthe LFENCE and propose new fences that come with fewerrestrictions different enforcement policies andor sport ad-ditional optimizations

512 Fence Enforcement PoliciesIt is important to note that none of the existing instructionsthat provide fence support were actually created for the pur-pose we (or the whole industry) need for Spectre mitigationThus this work will examine existing fences variants ofexisting fences and also introduce a new fence primitive Tobetter understand the design landscape we examine severalpossible properties of fences in this section

Early vs Late Enforcement We first categorize fencesinto early-enforced and late-enforced fences based on thepipeline stage at which they are enforced In particular werefer to any serializing instruction such as an LFENCE thatis enforced at the instruction queue or earlier in the pipelineas early-enforced If the fence is enforced at a later stage suchas the reservation station the loadstore queue or the cachecontroller we refer to it as a late-enforced fence Late enforce-ment in essence shifts the fence enforcement point towardsthe leaking structure (eg the cache) reducing the impacton instructions that do not access that structure and allow-ing for the enforcement of more fine-grained serializationrules (eg allow cache hits but not misses) Figure 4 showspotential fence enforcement points at various stages in theprocessor pipelineIt is important to note that the later a fence is enforced

the fewer the side channels it protects against For exampleIntelrsquos LFENCE prevents information leakage through all mi-croarchitectural structures that appear after the instructionqueue However to prevent information leakage through theinstruction cache side channel we would have to resort toa regular serializing instruction or an MFENCE resultingin prohibitively high performance overheads Similarly afence enforced at the data cache controller level would onlymitigate data cache-based side channels and will not protectagainst an FPU-based side-channel Instead the FPU-basedside-channel may be mitigated using a different fence thatis appropriately configured and enforced at the reservationstation or the FPU

Strict vs Relaxed Enforcement Depending upon howprohibitive they are we next classify fences into strict andrelaxed Strict fences are highly prohibitive and do not allow

Table 3 Characteristics of Different Fence Types

Fence Name EnforcementPoint

StrictRelaxed

Instructionsnot Allowed

MitigatesVariants

ExistingNew

Intelrsquos SIs (CPUID) Fetch Strict All All ExistingLFENCE IQ Strict All All Existing

LSQ-LFENCE LSQ Relaxed Ld v1 NewLSQ-MFENCE LSQ Relaxed LdampSt v1v11v12 New

CFENCE CC Relaxed None v1 New CC Cache Controller RS Reservation Station

any instructions to pass through them until the fence retireswhereas relaxed fences allow certain types of instructionsto pass through them For example an SFENCE that is en-forced at the loadstore queue and allows all instructions topass except stores that have a greater sequence number thanitself is a late-enforced and relaxed fence On the other handall x86 serializing instructions including LFENCE are early-enforced and strict Customizing the micro-op stream withearly-enforced and strict fences typically results in slower ex-ecution when compared to customization with late-enforcedand relaxed fences However with carefully enforced con-straints the late-enforced and relaxed fences could offer sim-ilar if not better security guarantees

Early vs Late Commit A fence typically remains ef-fective until it gets committed or squashed Based on Intelrsquosmanual [47] a serializing instruction is only allowed to becommitted if there is no preceding outstanding store thatis waiting to be written back While this behavior might benecessary for device synchronization or memory orderingenforcement for the purpose of securing speculative exe-cution against Spectre attacks there is no need to wait forstores to be written back This is because write buffers arenrsquotcommitted to the cache until the store reaches retirementand therefore the fact that a store is waiting for an outstand-ing writeback request to complete is enough evidence that itdid not occur along a misspeculated path Allowing a fenceto commit early without waiting for preceding outstandingstores to write back can considerably improve performancebecause as soon as a fence gets committed a stream of in-structions can advance further in the pipeline Therefore inthis work we propose and study the effects of a late-commitversion of each fence that does not wait for stores to be writ-ten back However we note that these versions should onlybe limited to the security use case we describe and shouldnot be used for synchronization

Newly Proposed Fences Table 3 summarizes the char-acteristics of different existing and newly proposed fencesAlthough the existing set of fences defend against all vari-ants they incur prohibitively high costs on performance dueto their strict enforcement constraints To better understandthe potential for fences beyond those that already exist wepropose and evaluate three new types of fences In an attackscenario we typically insert one of these fences between abranch instruction and a load instruction that potentiallyleaks sensitive information via a cache side-channel Theproposed fences more effectively and efficiently mitigate the

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 5: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

86 92] as a detection mechanism to circumvent code injec-tion [98] cross-site scripting attacks [24 93] buffer over-flows [24 98] and SQL injection [24 39 40 60 102] Thiswork proposes decoder-level information-flow tracking asa detection mechanism for Spectre and addresses severalassociated challenges

3 Assumptions and Threat ModelAmong the Spectre variants the Bounds Check Bypass (variant-1) is the one which has had the highest impact in terms ofaffected platforms and devices Therefore the main focus ofthis work is to mitigate the variant-1 attack at an acceptablelevel of performance However owing to the flexibility ad-vantages of microcode customization we also propose newmitigations against other Spectre variants documented inTable 1

We assume that the goal of the attacker is to read arbi-trary memory contents by exploiting a Spectre gadget Alsowithout loss of generality in this work we defend against aSpectre attacker that targets the most obvious and the mostimportant victim ie the OS kernel but our frameworkallows us to extend the proposed techniques to mitigate dif-ferent targets such as virtual machines and browsers Ourattack model assumes an adversary with the following capa-bilities

bull Information Leakage We assume that the adver-sary has the ability and privilege to probe flush orevict any cache line including that of the kernel at anyparticular time Also they can make precise timingmeasurements using instructions such as rdtsc andrdtscp

bull Co-location We assume that the adversary is colo-cated with the victim and can not only leak informa-tion through cache-based side channels but can alsomistrain shared branch predictor structures includingbranch target buffers and return address stacks andcan further influence the branch outcome of the victimas described in the variant-2 attack

bull User-Mode Access We assume that the adversaryhas standard user-mode access and can invoke anysystem call with arbitrary and carefully chosen ar-guments and can further access devices such as thekeyboard or network to feed the kerneldriver codewith carefully chosen data

Moreover we assume that the only communication chan-nel between the attacker and the kernel code is throughmicro-architectural side channels and the attacker does nothave any other channel direct or indirect to leak data Amongall the micro-architectural side-channels data caches are theones that are predominantly used and are the easiest to ex-ploit Thus the mitigation strategies described in this workprimarily focus on those variants of the attack that use thedata cache as their covert channel to leak information Again

Configuration Registers (MSRs)

Mac

ro-o

p qu

eue

microop-Queue

Scratchpad Regs

Runtime System

DLIFTMacro-op Stream

Macro-to-microop Translation Unit

CSD-Enabled Decoder

INC [0x802ac] FENCE LD t0[0x802ac]ADD t0 t0 1ST [0x802ac] t0 Di

spat

cher Mistaint

Detection and Recovery

Figure 3 Architectural Overview

due to the flexiblity and programmability of the proposedstrategy it is possible to easily extend the approach to miti-gate other side-channels such as functional unit contentionand AVX timing channels [76]

4 Architectural OverviewFigure 3 gives the architectural overview of our defensestrategy ndash context-sensitive fencing The central piece of theproposed architecture is an x86 microcode engine that hascontext-sensitive decoding (CSD) capabilities [80] allow-ing it to optionally translate a native x86 instruction into acustomizable alternate set of micro-ops In this work weleverage this capability to perform the surgical insertion ofspeculation fences (some existing and some newly proposed)at potentially vulnerable Spectre code targets To this end weintroduce new custommicro-op flows and new configurationmechanisms that trigger such micro-op flowsThe CSD-enabled microcode engine is provisioned with

fine-grained reconfiguration capabilities via a set of model-specific registers (MSRs) that can control the frequency typeand enforcement criteria of speculation fences inserted intothe dynamic instruction stream Such a fine-grained recon-figuration capability is especially important to this workbecause speculation fences are particularly expensive allow-ing us to surgically insert fences that impose varying degreesof restrictions on speculative execution depending upon theruntime conditions current level of threat and the nature ofthe code being executedMoreover the context-sensitive fencing framework also

benefits from a novel decoder-level information flow track-ing (DLIFT) engine that has the ability to identify untrustedinstructions that are potentially in Spectre gadgets and trig-ger alternate micro-op flows that insert speculation fencesOwing to its decoder-level and inherently speculative imple-mentation DLIFT relies on a mistaint detection and recoveryhardware implemented in the execute stage to avoid over-tainting and undertainting scenarios Finally the proposeddefense framework also includes hardware and microcode-level mechanisms to achieve branch predictor state isola-tion across protection domains to mitigate the variant-2 andvariant-4 attacks

Table 2 List of Intelrsquos Serializing InstructionsType Instr Desc

PrivilegedSeria

lizingInstructions INVD Invalidate Internal Caches

INVEPT Invalidate Translations from EPTINVLPG Invalidate TLB EntriesINVVPID Invalidate Translations Based on VPIDLIDT Load Interrupt Descriptor Table RegisterLGDT Load Global Descriptor Table RegisterLLDT Load Local Descriptor Table RegisterLTR Load Task RegisterMOV Move to Control RegisterMOV Move to Debug Register

WBINVD Write Back and Invalidate CacheWRMSR Write to Model Specific Register

Non

Priv CPUID CPU Identification

IRET Interrupt ReturnRSM Resume from System Management Mode

Mem

Ordering SFENCE Store Fence

LFENCE Load Fence

MFENCE Memory Fence

5 Design and ImplementationIn this section we describe in greater detail the architecturaltechniques and building blocks that together constitute theproposed defense strategy ndash context-sensitive fencing Firstwe describe a microcode customization framework that en-ables the surgical insertion of fences to secure speculativeexecution with minimal performance impact Second we ex-amine the full design space of fences to provide the isolationproperties we need without undue obstruction of existingpipeline performance features Third we propose a decoder-level information flow tracking (DLIFT) technique to followpotentially malicious execution patterns and trigger securemicrocode customization Finally we propose protectionmechanisms that circumvent the mistraining of the branchpredictor and associated structures

51 Microcode CustomizationSpeculation fences are a processorrsquos primary mechanism tooverride speculative execution For Spectre variant-1 context-sensitive fencing works with CSD by providing alternate de-codings of all load instructions with the alternate decodingalways including a fence micro-op that appears before theload micro-op The alternate decoding will then be triggeredor not based on runtime conditions The first considerationthen is what fence instruction to incorporate Most pro-cessors already provide a variety of fences and serializinginstructions We first study the performance and securityimpact of the existing suite of fences and serializing instruc-tions and then explore and propose new speculation fencesand context-sensitive fencing techniques that manage theimpact on performance For Spectre variant-2 CSF works

micro-op

Que

ue

Macro-op Stream Macro-to-microop

Translation Unit

CSD-Enabled Decoder

FENCE Injection

Point

micro-op

Dis

patc

her

Load Store-Queue

FU1 FU2

RSRS

RSRSRS

Cache

Cac

he

Con

trolle

r

Mem

Mem

C

ontro

ller

Fence Enforcement Point

Figure 4 Fence Enforcement Points

with CSD by providing alternate decodings of some con-trol flow and possibly insert fences andor branch predictorresetting micro-ops ndash more detail is in Section 53

511 Serializing Instructions and Memory FencesSerializing instructions are the strictest amongst all specula-tion fences and completely override speculative executionUpon decoding a serializing instruction the processor stallsfetching any subsequent instruction until all instructionspreceding the serializing instruction retire Due to the highpipeline depth and issue width of modern out-of-order su-perscalars the usage of serializing instructions could resultin long delays and considerable throughput loss Memoryfences on the other hand enforce a memory serializationpoint in the program instruction stream More specificallythese fences ensure that memory operations that appear inexecution after the fence are stalled until all the outstandingmemory requests (including the fence) complete executionTable 2 shows a complete list of Intelrsquos existing serializ-

ing instructions and fences [48] Most of the serializing in-structions need to be run in privileged mode which restrictstheir usage in defenses for victims that lack sufficient privi-leges (eg browsers) Moreover a majority of them modifythe state or contents of architectural registers the programcounter cache TLB etc and only serialize execution as aside effect thereby requiring an additional backuprestorestep when used for the sole purpose of serializing executionAn exception to this is the MOV to debug register instructionthat does not corrupt any architectural state if not actuallyin debug modeThe SFENCE instruction does not allow stores to pass

through it but does not affect loads That is it enforces thatall stores that precede the SFENCE are executed to comple-tion before any store that succeeds the SFENCE is fetchedThe MFENCE instruction restricts all memory operationsfrom passing through it Unlike SFENCE and MFENCE thatare memory ordering operations LFENCE performs a seri-alizing operation In particular LFENCE does not stop the

processor from fetching and decoding instructions that ap-pear after the LFENCE but it restricts the dispatch of suchinstructions until the instructions preceding the LFENCEcomplete execution Since LFENCE serializes execution andyet makes some progress in the front-end it has been recom-mended by Intel as a low-overhead fence that can be insertedat vulnerable points in execution to defend against Spec-tre [45] In other words while serializing instructions suchas CPUID are enforced at Fetch LFENCE is enforced at theinstruction queue level Therefore in this work we start withthe LFENCE and propose new fences that come with fewerrestrictions different enforcement policies andor sport ad-ditional optimizations

512 Fence Enforcement PoliciesIt is important to note that none of the existing instructionsthat provide fence support were actually created for the pur-pose we (or the whole industry) need for Spectre mitigationThus this work will examine existing fences variants ofexisting fences and also introduce a new fence primitive Tobetter understand the design landscape we examine severalpossible properties of fences in this section

Early vs Late Enforcement We first categorize fencesinto early-enforced and late-enforced fences based on thepipeline stage at which they are enforced In particular werefer to any serializing instruction such as an LFENCE thatis enforced at the instruction queue or earlier in the pipelineas early-enforced If the fence is enforced at a later stage suchas the reservation station the loadstore queue or the cachecontroller we refer to it as a late-enforced fence Late enforce-ment in essence shifts the fence enforcement point towardsthe leaking structure (eg the cache) reducing the impacton instructions that do not access that structure and allow-ing for the enforcement of more fine-grained serializationrules (eg allow cache hits but not misses) Figure 4 showspotential fence enforcement points at various stages in theprocessor pipelineIt is important to note that the later a fence is enforced

the fewer the side channels it protects against For exampleIntelrsquos LFENCE prevents information leakage through all mi-croarchitectural structures that appear after the instructionqueue However to prevent information leakage through theinstruction cache side channel we would have to resort toa regular serializing instruction or an MFENCE resultingin prohibitively high performance overheads Similarly afence enforced at the data cache controller level would onlymitigate data cache-based side channels and will not protectagainst an FPU-based side-channel Instead the FPU-basedside-channel may be mitigated using a different fence thatis appropriately configured and enforced at the reservationstation or the FPU

Strict vs Relaxed Enforcement Depending upon howprohibitive they are we next classify fences into strict andrelaxed Strict fences are highly prohibitive and do not allow

Table 3 Characteristics of Different Fence Types

Fence Name EnforcementPoint

StrictRelaxed

Instructionsnot Allowed

MitigatesVariants

ExistingNew

Intelrsquos SIs (CPUID) Fetch Strict All All ExistingLFENCE IQ Strict All All Existing

LSQ-LFENCE LSQ Relaxed Ld v1 NewLSQ-MFENCE LSQ Relaxed LdampSt v1v11v12 New

CFENCE CC Relaxed None v1 New CC Cache Controller RS Reservation Station

any instructions to pass through them until the fence retireswhereas relaxed fences allow certain types of instructionsto pass through them For example an SFENCE that is en-forced at the loadstore queue and allows all instructions topass except stores that have a greater sequence number thanitself is a late-enforced and relaxed fence On the other handall x86 serializing instructions including LFENCE are early-enforced and strict Customizing the micro-op stream withearly-enforced and strict fences typically results in slower ex-ecution when compared to customization with late-enforcedand relaxed fences However with carefully enforced con-straints the late-enforced and relaxed fences could offer sim-ilar if not better security guarantees

Early vs Late Commit A fence typically remains ef-fective until it gets committed or squashed Based on Intelrsquosmanual [47] a serializing instruction is only allowed to becommitted if there is no preceding outstanding store thatis waiting to be written back While this behavior might benecessary for device synchronization or memory orderingenforcement for the purpose of securing speculative exe-cution against Spectre attacks there is no need to wait forstores to be written back This is because write buffers arenrsquotcommitted to the cache until the store reaches retirementand therefore the fact that a store is waiting for an outstand-ing writeback request to complete is enough evidence that itdid not occur along a misspeculated path Allowing a fenceto commit early without waiting for preceding outstandingstores to write back can considerably improve performancebecause as soon as a fence gets committed a stream of in-structions can advance further in the pipeline Therefore inthis work we propose and study the effects of a late-commitversion of each fence that does not wait for stores to be writ-ten back However we note that these versions should onlybe limited to the security use case we describe and shouldnot be used for synchronization

Newly Proposed Fences Table 3 summarizes the char-acteristics of different existing and newly proposed fencesAlthough the existing set of fences defend against all vari-ants they incur prohibitively high costs on performance dueto their strict enforcement constraints To better understandthe potential for fences beyond those that already exist wepropose and evaluate three new types of fences In an attackscenario we typically insert one of these fences between abranch instruction and a load instruction that potentiallyleaks sensitive information via a cache side-channel Theproposed fences more effectively and efficiently mitigate the

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 6: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

Table 2 List of Intelrsquos Serializing InstructionsType Instr Desc

PrivilegedSeria

lizingInstructions INVD Invalidate Internal Caches

INVEPT Invalidate Translations from EPTINVLPG Invalidate TLB EntriesINVVPID Invalidate Translations Based on VPIDLIDT Load Interrupt Descriptor Table RegisterLGDT Load Global Descriptor Table RegisterLLDT Load Local Descriptor Table RegisterLTR Load Task RegisterMOV Move to Control RegisterMOV Move to Debug Register

WBINVD Write Back and Invalidate CacheWRMSR Write to Model Specific Register

Non

Priv CPUID CPU Identification

IRET Interrupt ReturnRSM Resume from System Management Mode

Mem

Ordering SFENCE Store Fence

LFENCE Load Fence

MFENCE Memory Fence

5 Design and ImplementationIn this section we describe in greater detail the architecturaltechniques and building blocks that together constitute theproposed defense strategy ndash context-sensitive fencing Firstwe describe a microcode customization framework that en-ables the surgical insertion of fences to secure speculativeexecution with minimal performance impact Second we ex-amine the full design space of fences to provide the isolationproperties we need without undue obstruction of existingpipeline performance features Third we propose a decoder-level information flow tracking (DLIFT) technique to followpotentially malicious execution patterns and trigger securemicrocode customization Finally we propose protectionmechanisms that circumvent the mistraining of the branchpredictor and associated structures

51 Microcode CustomizationSpeculation fences are a processorrsquos primary mechanism tooverride speculative execution For Spectre variant-1 context-sensitive fencing works with CSD by providing alternate de-codings of all load instructions with the alternate decodingalways including a fence micro-op that appears before theload micro-op The alternate decoding will then be triggeredor not based on runtime conditions The first considerationthen is what fence instruction to incorporate Most pro-cessors already provide a variety of fences and serializinginstructions We first study the performance and securityimpact of the existing suite of fences and serializing instruc-tions and then explore and propose new speculation fencesand context-sensitive fencing techniques that manage theimpact on performance For Spectre variant-2 CSF works

micro-op

Que

ue

Macro-op Stream Macro-to-microop

Translation Unit

CSD-Enabled Decoder

FENCE Injection

Point

micro-op

Dis

patc

her

Load Store-Queue

FU1 FU2

RSRS

RSRSRS

Cache

Cac

he

Con

trolle

r

Mem

Mem

C

ontro

ller

Fence Enforcement Point

Figure 4 Fence Enforcement Points

with CSD by providing alternate decodings of some con-trol flow and possibly insert fences andor branch predictorresetting micro-ops ndash more detail is in Section 53

511 Serializing Instructions and Memory FencesSerializing instructions are the strictest amongst all specula-tion fences and completely override speculative executionUpon decoding a serializing instruction the processor stallsfetching any subsequent instruction until all instructionspreceding the serializing instruction retire Due to the highpipeline depth and issue width of modern out-of-order su-perscalars the usage of serializing instructions could resultin long delays and considerable throughput loss Memoryfences on the other hand enforce a memory serializationpoint in the program instruction stream More specificallythese fences ensure that memory operations that appear inexecution after the fence are stalled until all the outstandingmemory requests (including the fence) complete executionTable 2 shows a complete list of Intelrsquos existing serializ-

ing instructions and fences [48] Most of the serializing in-structions need to be run in privileged mode which restrictstheir usage in defenses for victims that lack sufficient privi-leges (eg browsers) Moreover a majority of them modifythe state or contents of architectural registers the programcounter cache TLB etc and only serialize execution as aside effect thereby requiring an additional backuprestorestep when used for the sole purpose of serializing executionAn exception to this is the MOV to debug register instructionthat does not corrupt any architectural state if not actuallyin debug modeThe SFENCE instruction does not allow stores to pass

through it but does not affect loads That is it enforces thatall stores that precede the SFENCE are executed to comple-tion before any store that succeeds the SFENCE is fetchedThe MFENCE instruction restricts all memory operationsfrom passing through it Unlike SFENCE and MFENCE thatare memory ordering operations LFENCE performs a seri-alizing operation In particular LFENCE does not stop the

processor from fetching and decoding instructions that ap-pear after the LFENCE but it restricts the dispatch of suchinstructions until the instructions preceding the LFENCEcomplete execution Since LFENCE serializes execution andyet makes some progress in the front-end it has been recom-mended by Intel as a low-overhead fence that can be insertedat vulnerable points in execution to defend against Spec-tre [45] In other words while serializing instructions suchas CPUID are enforced at Fetch LFENCE is enforced at theinstruction queue level Therefore in this work we start withthe LFENCE and propose new fences that come with fewerrestrictions different enforcement policies andor sport ad-ditional optimizations

512 Fence Enforcement PoliciesIt is important to note that none of the existing instructionsthat provide fence support were actually created for the pur-pose we (or the whole industry) need for Spectre mitigationThus this work will examine existing fences variants ofexisting fences and also introduce a new fence primitive Tobetter understand the design landscape we examine severalpossible properties of fences in this section

Early vs Late Enforcement We first categorize fencesinto early-enforced and late-enforced fences based on thepipeline stage at which they are enforced In particular werefer to any serializing instruction such as an LFENCE thatis enforced at the instruction queue or earlier in the pipelineas early-enforced If the fence is enforced at a later stage suchas the reservation station the loadstore queue or the cachecontroller we refer to it as a late-enforced fence Late enforce-ment in essence shifts the fence enforcement point towardsthe leaking structure (eg the cache) reducing the impacton instructions that do not access that structure and allow-ing for the enforcement of more fine-grained serializationrules (eg allow cache hits but not misses) Figure 4 showspotential fence enforcement points at various stages in theprocessor pipelineIt is important to note that the later a fence is enforced

the fewer the side channels it protects against For exampleIntelrsquos LFENCE prevents information leakage through all mi-croarchitectural structures that appear after the instructionqueue However to prevent information leakage through theinstruction cache side channel we would have to resort toa regular serializing instruction or an MFENCE resultingin prohibitively high performance overheads Similarly afence enforced at the data cache controller level would onlymitigate data cache-based side channels and will not protectagainst an FPU-based side-channel Instead the FPU-basedside-channel may be mitigated using a different fence thatis appropriately configured and enforced at the reservationstation or the FPU

Strict vs Relaxed Enforcement Depending upon howprohibitive they are we next classify fences into strict andrelaxed Strict fences are highly prohibitive and do not allow

Table 3 Characteristics of Different Fence Types

Fence Name EnforcementPoint

StrictRelaxed

Instructionsnot Allowed

MitigatesVariants

ExistingNew

Intelrsquos SIs (CPUID) Fetch Strict All All ExistingLFENCE IQ Strict All All Existing

LSQ-LFENCE LSQ Relaxed Ld v1 NewLSQ-MFENCE LSQ Relaxed LdampSt v1v11v12 New

CFENCE CC Relaxed None v1 New CC Cache Controller RS Reservation Station

any instructions to pass through them until the fence retireswhereas relaxed fences allow certain types of instructionsto pass through them For example an SFENCE that is en-forced at the loadstore queue and allows all instructions topass except stores that have a greater sequence number thanitself is a late-enforced and relaxed fence On the other handall x86 serializing instructions including LFENCE are early-enforced and strict Customizing the micro-op stream withearly-enforced and strict fences typically results in slower ex-ecution when compared to customization with late-enforcedand relaxed fences However with carefully enforced con-straints the late-enforced and relaxed fences could offer sim-ilar if not better security guarantees

Early vs Late Commit A fence typically remains ef-fective until it gets committed or squashed Based on Intelrsquosmanual [47] a serializing instruction is only allowed to becommitted if there is no preceding outstanding store thatis waiting to be written back While this behavior might benecessary for device synchronization or memory orderingenforcement for the purpose of securing speculative exe-cution against Spectre attacks there is no need to wait forstores to be written back This is because write buffers arenrsquotcommitted to the cache until the store reaches retirementand therefore the fact that a store is waiting for an outstand-ing writeback request to complete is enough evidence that itdid not occur along a misspeculated path Allowing a fenceto commit early without waiting for preceding outstandingstores to write back can considerably improve performancebecause as soon as a fence gets committed a stream of in-structions can advance further in the pipeline Therefore inthis work we propose and study the effects of a late-commitversion of each fence that does not wait for stores to be writ-ten back However we note that these versions should onlybe limited to the security use case we describe and shouldnot be used for synchronization

Newly Proposed Fences Table 3 summarizes the char-acteristics of different existing and newly proposed fencesAlthough the existing set of fences defend against all vari-ants they incur prohibitively high costs on performance dueto their strict enforcement constraints To better understandthe potential for fences beyond those that already exist wepropose and evaluate three new types of fences In an attackscenario we typically insert one of these fences between abranch instruction and a load instruction that potentiallyleaks sensitive information via a cache side-channel Theproposed fences more effectively and efficiently mitigate the

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 7: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

processor from fetching and decoding instructions that ap-pear after the LFENCE but it restricts the dispatch of suchinstructions until the instructions preceding the LFENCEcomplete execution Since LFENCE serializes execution andyet makes some progress in the front-end it has been recom-mended by Intel as a low-overhead fence that can be insertedat vulnerable points in execution to defend against Spec-tre [45] In other words while serializing instructions suchas CPUID are enforced at Fetch LFENCE is enforced at theinstruction queue level Therefore in this work we start withthe LFENCE and propose new fences that come with fewerrestrictions different enforcement policies andor sport ad-ditional optimizations

512 Fence Enforcement PoliciesIt is important to note that none of the existing instructionsthat provide fence support were actually created for the pur-pose we (or the whole industry) need for Spectre mitigationThus this work will examine existing fences variants ofexisting fences and also introduce a new fence primitive Tobetter understand the design landscape we examine severalpossible properties of fences in this section

Early vs Late Enforcement We first categorize fencesinto early-enforced and late-enforced fences based on thepipeline stage at which they are enforced In particular werefer to any serializing instruction such as an LFENCE thatis enforced at the instruction queue or earlier in the pipelineas early-enforced If the fence is enforced at a later stage suchas the reservation station the loadstore queue or the cachecontroller we refer to it as a late-enforced fence Late enforce-ment in essence shifts the fence enforcement point towardsthe leaking structure (eg the cache) reducing the impacton instructions that do not access that structure and allow-ing for the enforcement of more fine-grained serializationrules (eg allow cache hits but not misses) Figure 4 showspotential fence enforcement points at various stages in theprocessor pipelineIt is important to note that the later a fence is enforced

the fewer the side channels it protects against For exampleIntelrsquos LFENCE prevents information leakage through all mi-croarchitectural structures that appear after the instructionqueue However to prevent information leakage through theinstruction cache side channel we would have to resort toa regular serializing instruction or an MFENCE resultingin prohibitively high performance overheads Similarly afence enforced at the data cache controller level would onlymitigate data cache-based side channels and will not protectagainst an FPU-based side-channel Instead the FPU-basedside-channel may be mitigated using a different fence thatis appropriately configured and enforced at the reservationstation or the FPU

Strict vs Relaxed Enforcement Depending upon howprohibitive they are we next classify fences into strict andrelaxed Strict fences are highly prohibitive and do not allow

Table 3 Characteristics of Different Fence Types

Fence Name EnforcementPoint

StrictRelaxed

Instructionsnot Allowed

MitigatesVariants

ExistingNew

Intelrsquos SIs (CPUID) Fetch Strict All All ExistingLFENCE IQ Strict All All Existing

LSQ-LFENCE LSQ Relaxed Ld v1 NewLSQ-MFENCE LSQ Relaxed LdampSt v1v11v12 New

CFENCE CC Relaxed None v1 New CC Cache Controller RS Reservation Station

any instructions to pass through them until the fence retireswhereas relaxed fences allow certain types of instructionsto pass through them For example an SFENCE that is en-forced at the loadstore queue and allows all instructions topass except stores that have a greater sequence number thanitself is a late-enforced and relaxed fence On the other handall x86 serializing instructions including LFENCE are early-enforced and strict Customizing the micro-op stream withearly-enforced and strict fences typically results in slower ex-ecution when compared to customization with late-enforcedand relaxed fences However with carefully enforced con-straints the late-enforced and relaxed fences could offer sim-ilar if not better security guarantees

Early vs Late Commit A fence typically remains ef-fective until it gets committed or squashed Based on Intelrsquosmanual [47] a serializing instruction is only allowed to becommitted if there is no preceding outstanding store thatis waiting to be written back While this behavior might benecessary for device synchronization or memory orderingenforcement for the purpose of securing speculative exe-cution against Spectre attacks there is no need to wait forstores to be written back This is because write buffers arenrsquotcommitted to the cache until the store reaches retirementand therefore the fact that a store is waiting for an outstand-ing writeback request to complete is enough evidence that itdid not occur along a misspeculated path Allowing a fenceto commit early without waiting for preceding outstandingstores to write back can considerably improve performancebecause as soon as a fence gets committed a stream of in-structions can advance further in the pipeline Therefore inthis work we propose and study the effects of a late-commitversion of each fence that does not wait for stores to be writ-ten back However we note that these versions should onlybe limited to the security use case we describe and shouldnot be used for synchronization

Newly Proposed Fences Table 3 summarizes the char-acteristics of different existing and newly proposed fencesAlthough the existing set of fences defend against all vari-ants they incur prohibitively high costs on performance dueto their strict enforcement constraints To better understandthe potential for fences beyond those that already exist wepropose and evaluate three new types of fences In an attackscenario we typically insert one of these fences between abranch instruction and a load instruction that potentiallyleaks sensitive information via a cache side-channel Theproposed fences more effectively and efficiently mitigate the

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 8: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

high impact variant-1 attacks by preventing informationleaks along misspeculated paths through cache-based sidechannels We describe each of them in greater detail below

The LSQ-LFENCE and the LSQ-MFENCE are relaxed fencesenforced at the loadstore queue While LSQ-LFENCE fenceis in effect it does not allow any subsequent load instructionto be issued out of the loadstore queue thereby preventingthe cache state from being changed by load instructions onmisspeculated paths Thus the LSQ-LFENCE mitigates theSpectre variant-1 attack On the other hand the more restric-tive LSQ-MFENCE does not allow any subsequent memoryinstruction (both loads and stores) to be issued out of theloadstore queue until the fence commits In addition tomitigating the variant-1 attack that the LSQ-LFENCE pro-tects against the LSQ-MFENCE mitigates the variant-11and variant-12 attacks that exploit store-to-load forwardingbetween speculative loads and stores

The CFENCE is a relaxed fence and is enforced at the cachecontroller level using the following set of rules First like anyother fence it allows all preceding instructions to proceedSecond since store instructions do not commit the contentsof the write buffer until the instruction retires they are unaf-fected by the CFENCE Finally it labels any subsequent loadas a non-modifying load and allows it to pass through thefence but the load is restricted from modifying the cachestate In particular a non-modifying load that results in acache hit is allowed to read the contents of the cache but isrestricted from changing the LRU and other metadata bitsA non-modifying load that results in a cache miss is markedas uncacheable allowing the memory read request to com-plete without altering the cache state In this way we avoidupdating the cache state upon encountering a speculativeload and donrsquot leave any observable cache footprints alongmisspeculated paths thereby mitigating the variant Spectrevariant-1 attack In addition due to locality of referencesthe miss rate of a reasonable program is typically very lowand therefore using CFENCE results in considerably lowerperformance overhead than other types of fencesNote that this can be applied recursively at each cache

level For example an L2 hit (L1 miss) will bypass the L1cache and not initiate a fill then read from the L2 cachewithout altering the LRU bits

513 Fence Frequency OptimizationThe most naiumlve yet secure way to insert fences is to liberallyinstrument every instruction of a vulnerable type (eg loadinstructions in the case of cache side-channels) in the pro-gram and add the fence micro-op to all of them In fact notevery instance of a vulnerable instruction type is necessarilyvulnerable for example all the loads in the program arenrsquotvulnerable to speculative attacks via cache side-channelsTherefore we can reduce the number of fences insertedHowever since failing to insert fences even in one scenariowould enable a Spectre attacker to read the whole victimrsquos

memory space it is of crucial importance that fence fre-quency optimizations be conducted meticulously In the fol-lowing we introduce two secure optimizations for reducingthe number of fencesBasic Block-Level Fence InsertionThe source of the Spec-tre attack is dynamic control speculation which implies thatthe speculation begins with a branch prediction and theprocessor starts speculatively executing along the predictedpath To fully mitigate this attack we want a fence betweeneach branch and subsequent loads but if one branch is fol-lowed by four loads we only need one fence to protect allfour Thus in this optimization we propose to only instru-ment (provide the alternate decoding for) the first instanceof a vulnerable instruction type (eg first load) of each ba-sic block and it is safe to leave the rest of the instructionsuninstrumented This is simply implemented by setting aflag in hardware whenever a branch is decoded then inserta micro-op in the alternative load decoding that along withthe fence also resets the flag When the flag is not set theoriginal decoding is usedTaint-Based Fence Insertion However even one load perbasic block is likely still too conservative In all known in-stances of the attacks the attacker performs some operations(mostly memory read) based on untrusted data that leadsto the information leak For example in Spectre variant-1the attacker provides an untrusted out-of-bound index to anarray In this work we assume any information that comesfrom the user address space and input devices (eg via thex86 IN instruction) as untrusted In addition we also con-sider DMArsquod pages as untrusted for which we rely on theIOMMU [5] to mark DMArsquod pages as tainted in the pagetable For the taint-based fence insertion optimization wepropose the insertion of fences for only vulnerabletaintedloads that operate on untrusted data by leveraging a noveldecoder-level information flow tracking (DLIFT) strategydescribed in the following section

52 Decoder-Level Information Flow TrackingAs its name indicates the distinguishing feature that makesDecoder-Level Information Flow Tracking (DLIFT) uniquefrom typical hardware taint tracking systems is that it canprovide the taint information at the decoder stage of thepipeline rather than at commit stage This is a critical distinc-tion for two reasons First of course our solution is decoder-based Second and perhaps more important execute- orcommit-based taint tracking comes too late in the pipelinefor any speculation-based attack and is of little useHowever commit is still the only stage where taint in-

formation is guaranteed to be correct so the design of adecode-based taint tracker is challenging In particular sincethe front-end of the pipeline typically runs far ahead in ex-ecution than the rest of the pipeline reading actual taintsfrom register files at the decode stage will read inaccuratevalues Therefore in the proposed DLIFT framework we

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 9: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

Macro-to-microop Translation

Unit

CSD-Enabled Decoder

Dis

patc

her

FetchTaint

Evaluator

Load Store Queue

2

Reg File

Actu

al T

aint

Bits

3

4

5

TLB

Tain

t Bitm

ap1

ADD rax rbx

raxSpecTaint |= rbxSpecTaint

Mistaint Recovery -Flushing the pipeline -Restarting translation and execution -Recovering speculative taint map

EA T

aint

CommitUnit

FetchDecode Execute

Speculative Taint Map

~Spec_Taint ampamp

Evaluated_Taint

True

Commit

Arch TaintMap

Arch RF

Figure 5 DLIFT integration with a CSD-enabled pipeline

separate the taint information into four taint structures ndash (a)a decoder-level taint map that tracks and maintains the taintinformation for architectural registers (b) the physical reg-ister file augmented with taint information that maintainsdynamically computed taint information at execute (c) theTLB and page tables augmented with a taint bitmap to trackcache block-level taint information and (d) a commit-leveltaint map that maintains verified architectural register taintinformation While the latter three structures are typical ofany standard DIFT implementation [79] the first structureis a new addition proposed by this work Instruction trans-lation thus depends on some speculative taint tracking thatmight not be exact potentially leaving vulnerable instruc-tions unfenced (ie translating to non-secure micro-ops) Forthis reason DLIFT relies on a mistaint recovery mechanismthat upon detecting a mistaint at the execution stage redi-rects and restarts the execution from the incorrectly taintedinstructionFigure 5 shows the integration of DLIFT with a CSD-

enabled pipeline The DLIFT engine maintains a taint mapthat stores a speculative taint bit for each architectural reg-ister For each incoming instruction (bull1 ) the DLIFT engineevaluates the taint of the destination register(s) based on thespeculative taint bits of the source registers Taint evaluationfollows the standard DIFT procedure [79] If a source regis-ter is marked as tainted in the speculative taint map DLIFTpropagates this taint to the destination register and furthertriggers context-sensitive translation (bull2 ) of the instruction(eg with speculative fence insertion) depending upon theconfigured levels of performance and security

Tracking taints at the decoder level is not always straight-forward First during speculative execution the DLIFT en-gine continues to propagate taints along the misspeculatedpath potentially leaving the taint map in an inconsistentstate upon recovery from a branch misprediction To thisend the DLIFT engine rolls back the state of the taint map

to its commit-level counterpart as part of the branch mis-prediction recovery Second the effective addresses of loadinstructions is usually not known at the decoder stage dueto which the DLIFT engine cannot accurately compute andpropagate taint information A conservative approach thatmarks all loads with unknown effective addresses as taintedcould severely degrade performance since overtainting typi-cally results in a high frequency of fence insertion Thereforewe take a more optimistic approach by assuming loads withunknown effective addresses are untainted and further relyon our mistaint detection mechanism at the execute stageto validate the predicted taints against the dynamically eval-uated taints (bull3 andbull5 ) In the event a mistaint is detectedwe update the speculative taint map flush the pipeline andrestart the translation and execution of the mistainted in-struction Note that we only perform mistaint recovery forthe undertainting scenario since it could potentially leavevulnerable instructions unfenced

Furthermore the DLIFT engine is capable of following in-struction sequences and execution patterns such as a taintedload followed by a branch allowing for the seamless de-tection of different Spectre gadget variants In this way al-though speculative the DLIFT engine has the potential forfast recovery from misspeculation and further allows themicrocode customization framework to perform a more tar-geted and surgical insertion of fences for particular vulnera-ble targets

53 Mitigations for Control-flow MistrainingTwo Spectre variants (v2 and v5) rely on the mistrainingof the branch predictor and the return stack buffer to influ-ence the victimrsquos branch outcome across different protec-tion domains To mitigate these attacks we examine severalhardening mechanisms In the simplest case where we areprotecting kernel branches from getting influenced we caninstrument syscall instructions to enforce branch predictorstate isolation If we want to protect a wider range of domaincrossings we could rely on a simple hardware mechanismthat identifies control transfer to a domain with a higherprivilege and then sets a flag The first control flow instruc-tion that decodes after that flag is set would then triggeran alternate decoding and reset the flag The alternate de-coding would insert a fence (to protect against this branchbeing mispredicted) then execute micro-ops that enforcebranch predictor state isolation and subsequently resumethe original control flowThis solution then is flexible enough that any region of

code could be protected in this way by configuringmodel spe-cific range registers (MSRRs) to indicate a protected regionthat would always reset the branch predictor upon entrythis then even prevents mistraining of sensitive code ofdifferent threads within the same process To prevent theMSRRs from being tampered by the attacker we only allow

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 10: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

Table 4 Architecture Detail for the Baseline x86 CoreBaseline Processor

Frequency 33 GHz I cache 32 KB 8 wayFetch width 4 fused uops D cache 32 KB 8 wayIssue width 6 unfused uops ROB size 168 entriesINTFP Regfile 160144 regs IQ 54 entriesRAS size 816 32 entries BTB size 256 512 1024 entriesLQSQ size 6436 entries Functional Int ALU(6) Mult(1)Branch Predictor LTAGE Units FP ALUMult(2) SIMD(2)

the operating system to configure MSRRs and we use specu-lative fences to guard x86rsquosWRMSR instruction Furthermorein the case of remapping via mmap() the operating systemis also responsible for reconfiguring MSRRs

The easiest and the most heavyweight way to isolate theBP state is to clear all the states by a series of returnbranchmicro-ops However such a hard reset of the BP can be morequickly and efficiently performed using a special micro-opthat clears some or all state of the branch predictor therebyenabling custom solutions to reset different structures withinthe prediction unit if such a micro-op is available In thiswork we assume a special micro-op to clear the branch targetbuffer (BTB) and the return stack buffer (RSB) the primarytargets of the Spectre attackIt should be noted that while resetting the BTB and the

RSB will induce more mispredictions (something the attackerwants) it does the attacker no good with respect to theseSpectre variants because the attacker can no longer controlthe target of the control flow misspeculation

6 MethodologyThis section details the experimental methodology for theperformance and security evaluation of the context-sensitivefencing frameworkOur baseline processor is modeled after the Intel Sandy-

bridge microarchitecture [43] Table 4 shows the architec-tural configuration of our baseline processor in more detailNote that we evaluate against three different branch pre-dictor configurations (with different BTB and RAS sizes) tomeasure the performance impact of our micro-op flows thatperform branch predictor state isolation across protection do-mains to defend against variant-2Wemodel this architectureusing the gem5 [8] architectural simulator which already fea-tures x86 micro-op translation Furthermore gem5 alreadyfeatures a Spectre test infrastructure and a visualization toolthat allow us to validate our claims regarding the securityguarantees of context-sensitive fencing [63]One of the primary goals of this work is to protect ker-

nel memory from being leaked along misspeculated pathsTherefore we use the full system simulation mode of gem5which allows us to boot an Ubuntu 1804 distribution ofLinux with a kernel version of 4813 Furthermore in orderto provide realistic estimates regarding attack coverage andperformance impact of our proposed techniques we selecta good mix of benchmarks that spend different amounts ofexecution time in kernel mode touch different aspects of the

Table 5 Benchmarks DescriptionBenchmark Description Kernel-Time

nginx HTTP Web Server 66ps Process information query 75ping Sends ICMP ECHO_REQUEST to loopback 95ls Performs two level directory listing 79llu Linked list traversal micro-benchmark 7

bzip2 Compression 34gcc C Language optimizing compiler 11

omnetpp Discrete event simulation 15

sjeng Artificial Intelligence (game tree searchand pattern recognition) 22

operating system and perform different number of systemcalls during execution as illustrated in Table 5 To this endwe include four benchmarks bzip2 gcc sjeng omnet fromthe SPEC CPU2006 suite [41] that exhibit varying degreesof instruction-level parallelism Furthermore we use threecommonly used Unix tools that target different functional-ities of the kernel and use different device drivers ndash (ping)that sends and receives ICMP packets (ps aux) that queriesprocess information and (ls ) that queries the filesystemin order to list multi-level directory information In additionwe also add a memory-intensive program llu that allocatesa large amount of memory for a linked list and traversesthe list making random memory accesses [103] Finally wealso evaluate the impact of our fences on the nginx webserver [74] using the wrk [35] framework to generate HTTPrequests

7 EvaluationIn this section we evaluate both security and performanceimpacts of the proposed strategy starting first with securityassessment and following that with performance evaluation

71 Security DiscussionThe proposed defense strategy can mitigate five variants ofthe Spectre attacks We discuss these in detail below

Variants 1 11 and 12 The goal of these variants is toleak memory contents of a victim along a misspseculatedpath by bypassing a bounds check andor further stitch to-gether such Spectre gadgets via store-to-load forwardingAll variants exploit a data cache-based side channel to leakinformation Context-Sensitive Fencing mitigates these vari-ants primarily by preventing information from being leakedalong misspeculated paths via fences that are surgically in-serted into the instruction stream and strategically placedbetween the conditional branch that performs the boundscheck and the first load in the Spectre gadget This workproposes three new fences ndash LSQ-LFENCE LSQ-MFENCEand CFENCE that each defend against the variant-1 attack bydisallowing loads along misspeculated paths to modify thedata cache state LSQ-MFENCE additionally protects againstthe variants 11 and 12 by preventing both speculative loadsand stores from being issued out of the loadstore queue

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 11: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

Nor

mal

ized

Exe

cution

Tim

e

LFENCE-LC

LSQ-MFENCE-LC

CFENCE-LC

Figure 6 Execution Time of Different Fence EnforcementLevels (normalized to insecure execution)

effectively avoiding store-to-load forwarding between spec-ulative loads and stores

The surgical insertion of these fences is facilitated by thedecoder-level information flow tracker (DLIFT) that can fol-low instruction sequences that could serve as Spectre gad-gets as described in Section 52 While the DLIFT itself per-forms speculative tracking of taint information it has theability to detect mistaints and recover within three stages ofthe pipeline We further experimentally show later in thissection that the speculative nature of the DLIFT engine maysometimes result in overtainting (where trusted operand-sinstructions get marked as tainted) but never results in anundetected and unrecovered undertainting (where untrustedoperandsinstructions remain untainted)

Variants 2 and 5 These variants reverse-engineer andmistrain the branch predictor and the return address stack toinfluence the branch outcomes of a victim which in our caseis the Linux kernel In addition to the surgical fence injec-tion and speculative taint tracking context-sensitive fencingfurther employs mistraining mitigations that circumventsuch attempts More specifically we intercept all protectiondomain crossings via syscall and int 0x80 instructions andperform a full reset of the branch target buffer (BTB) and thereturn stack buffer (RSB) clearing all previous state Thisprevents user code from influencing branch outcomes of thekernel code We record no instance of BTB collision betweenuser and kernel branches and we ensure that we alwaysstart with a clean BTB and RSB upon entering kernel mode

To show that our different fences and frequency optimiza-tions close Spectre attacks we use a proof of concept Spectreimplementation and visualization tool [63] The attacks failin every case when we use context-sensitive fencing to insertfences despite our performance optimizations

72 PerformanceFigure 6 measures the performance impact of three differentfences ndash (a) the standard x86 LFENCE (b) the LSQ-MFENCE

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

Injected for All Kernel Loads

100

105

110

115

120

125

130

135

140

145

Nor

mal

ized

Exe

cution

Tim

e

CFENCE-LC

CFENCE-EC

Figure 7 Execution Time of Early and Late Commit ofCFENCE (normalized to insecure execution)

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

NoF

ence

CF

EN

CE

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

bzip2 gcc omnet llu nginx sjeng ls ping ps

Non-Modifying Misses Non-Modifying Hits Misses Hits

Figure 8 Effects of injecting CFENCE on Cache Miss Rate

and (c) the CFENCE all enforced with the standard latecommit pessimistically inserted for every kernel load in theprogram Clearly from the figure the CFENCE incurs thelowest performance overhead since it is less restrictive thanthe other two and is enforced at a much later pipeline stageWe further study the effect of late and early commit policiesfor the CFENCE as shown in Figure 7 The early commitversion of the CFENCE consistently performs better than thelate commit version saving about 4 in overall executiontime on averageOverall the CFENCE reduces the incurred performance

overhead due to fencing by 23X bringing down the exe-cution time overhead from 48 to 21 Furthermore thisperformance improvement is consistent across all the bench-marks the only exception being llu which performs randommemory accesses due to the linked list traversal and suffersfrom a very high cache miss rateIn Figure 8 we study the effect of the CFENCE on cache

miss rate Recall that when a load passes a CFENCE it getsmarked as non-modifying and as a result if it ends up beinga miss in the cache it is deemed uncacheable and therefore

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 12: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

the fetched cache block isnrsquot brought into cache If the missrate of a program is already high as in the case of llu thepresence of a CFENCE in the instruction stream could po-tentially result in under-utilization of the cache since themissed blocks arenrsquot being filled back into the cache whilethe CFENCE is begin enforced For programs that have un-usually low hit rates we suggest using the standard LFENCEinstead of the CFENCE However in those cases it should benoted that the overhead of the mitigation is low regardlessof the fence usedFurthermore it is important to note that the number of

non-modifying loads in the dynamic instruction stream donot necessarily have a negative impact on performancewhile a CFENCE is being enforced For example while theping program has the most non-modifying loads it doesnot suffer much in terms of overall performance becausethe working set fits well into the cache and most of thenon-modifying accesses end up being hits In general wegain more from employing the CFENCE instead of standardserializing fences when the cache hit rate is lowFigure 9 shows the performance of our proposed fence

frequency optimization techniques In the first optimizationwe inject the CFENCE only for tainted loads and branches asindicated by the DLIFT engine This reduces the performanceoverhead of the defense from 21 to 11 on average We fur-ther optimize the number of fences inserted by performingbasic block-level fence insertion where we only instrumentthe first instance of a vulnerable load in each basic block Thisresults in an additional 4 improvement in performance AsFigure 9 shows the once per basic block opitimization is suc-cessful at improving the performance of all the benchmarksexcept llu That is because CFENCE changes the cache accesspattern usually at some cost but occasionally beneficiallyby bypassing accesses and reducing pressure on the cacheIn such a case like llu with its large working set executingfewer fences would then be slightly less beneficial Overallcompared to the state-of-the-art fence injection scheme thatpessimistically injects an LFENCE for all kernel loads ourDLIFT-Based CFENCE injection reduces the performanceoverhead from 48 to just 77 on averageWe next study the accuracy and coverage of our DLIFT

implementation Figure 10 examines overtainting scenarioswhere the DLIFT engine conservatively marks instructionsas tainted when theyrsquore actually not The results are nor-malized to the total number of dynamic load instructionsexecuted Further we examine two scenarios ndash the bars tothe left represent DLIFT-based fencing without the basic-block level fence insertion and the bars to the right includesthe basic-block level fence insertion optimization In mostcases the percentage of overtainted loads remains low inboth scenariosFigure 11 examines undertainting scenarios where the

DLIFT engine optimistically forgoes instrumenting certaininstructions that are actually vulnerable but recovers from

ps sjeng bzip2 ping llu gcc ls nginx omnet gmean

CFENCE

100

105

110

115

120

125

130

135

140

145

150

155

160

Nor

mal

ized

Exe

cution

Tim

e

Always Fencing

DLIFT-Based Fencing

DLIFT + Once per BB

Figure 9 Impact of Fence Frequency Optimizations on Exe-cution Time

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

DL

IFT

On

ceB

B

00

01

02

03

04

05

06

07

08

09

10

Rat

ioO

ver

All

Loa

ds

ping ps nginx bzip2 ls sjeng llu omnet gcc

Over-tainted Loads Tainted Loads

Figure 10 Accuracy of DLIFT Overtainting Rate of Loads

such scenarios later in the pipeline We observe that thepercentage of undertainted loads is low and consistentlybelow 10 with the outliers being the kernel-heavy applica-tions ping and ps which have the highest number of taintedinstructions (instructions with at least one tainted sourceregister) as shown in Figure 12Note that our DLIFT engine performs explicit informa-

tion flow tracking modeled after DIFT [79] that tracks copyload-address store-address and computational dependen-cies However we also evaluate CSF with a more conser-vative implicit information flow tracking model in whichwe taint the program counter when the branch outcomedepends upon a tainted value and further track both sidesof the branch This results in more tainted instructions andincurs 118 extra performance overhead on top of our taint-based CFENCE insertionTo evaluate the effect of our mistraining mitigations we

measure the impact of a full micro-op based BTB and RSBreset at every protection domain crossing We do this ex-periment on three separate branch predictor configurations(a) a small predictor with 256 BTB and 8 RAS entries (b) a

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 13: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

pssje

ngbz

ip2

ping llu gc

c ls

nginx

omne

t

Speculative DIFT ndash Sfence

000

001

002

003

004

005

006

007

008

009

Rat

ioof

Under

-tai

nte

dLoa

ds

Figure 11 Coverage of DLIFT Undertainting Rate of Loads

omnet bzip2 llu ls nginx ping ps sjeng gcc

Speculative DIFT ndash Sfence

00

01

02

03

04

05

06

07

08

09

10

Tai

nte

dIn

tstr

uct

ions

Tainted Untatinted

Figure 12 Ratio of instructions marked as tainted by DLIFT

medium predictor with 512 BTB and 16 RAS entries and(c) a large predictor with 1024 BTB and 32 RAS entries Wemeasure an average performance degradation of 27 on oursmall predictor 66 on our medium predictor and 152 onour large predictor Naturally the overhead is almost com-pletely due to the loss of prediction accuracy rather thanthe cost of micro-op expansionIn summary the proposed defense strategy introduces a

flexible microcode customization framework that performthe surgical insertion of newly introduced speculation fencesthat mitigate five variants of the Spectre class of attacksreducing the fencing overhead of state-of-the-art fence-basedSpectre mitigations by a factor of 6

8 ConclusionIn this work we propose context-sensitive fencing (CSF) aset of architectural techniques that provide high-performancedefense against Spectre-style attacks In particular we showthat we can reduce fencing overhead by a factor of 6 com-pared to a conservative fence insertion method

This is done by injecting fence instructions dynamicallyin the decoder with no recompilation and binary translationrequired This allows us to employ runtime information tostrategically insert fences when needed using taint infor-mation and avoiding redundant fences after branches Thiswork also introduces new fence primitives which protectsensitive structures from speculation-based microarchitec-tural effects with minimal impact on instruction throughputin the pipeline

AcknowledgmentsThe authors would like to thank the anonymous reviewersfor their helpful insights This research was supported inpart by NSF Grant CNS-1652925 NSFIntel FoundationalMicroarchitecture Research Grant CCF-1823444 and DARPAunder the agreement number HR0011-18-C-0020

References[1] Onur Acıiccedilmez Ccediletin Kaya Koccedil and Jean-Pierre Seifert 2007 Predict-

ing secret keys via branch prediction In CryptographersacircĂŹ Track atthe RSA Conference

[2] Onur Acıiccedilmez Werner Schindler and Ccediletin K Koccedil 2007 Cachebased remote timing attack on the AES In CryptographersacircĂŹ Trackat the RSA Conference

[3] Shaizeen Aga and Satish Narayanasamy 2017 InvisiMem SmartMemory Defenses for Memory Bus Side Channel In Proceedings ofthe 44th Annual International Symposium on Computer Architecture(ISCA rsquo17) ACM New York NY USA 94ndash106 httpsdoiorg10114530798563080232

[4] AMD 2018 White paper SOFTWARE TECHNIQUES FOR MANAG-ING SPECULATION ON AMD PROCESSORS Technical Report RE-VISION 12418 httpsdeveloperamdcomwp-contentresourcesManaging-Speculation-on-AMD-Processorspdf

[5] Nadav Amit Muli Ben-Yehuda Dan Tsafrir and Assaf Schuster 2011vIOMMU Efficient IOMMU Emulation In USENIX Annual TechnicalConference

[6] ARM 2018 Whitepaper Cache Speculation Side-channelsTechnical Report Version 22 httpsdeveloperarmcom-mediadeveloperpdfSecurity20update201020July2018CacheSpeculationSide-channels-v22pdfrevision=7de26366-a49f-4c23-85f0-34e8f5e38881

[7] Eli Biham and Adi Shamir 1997 Differential fault analysis of secretkey cryptosystems In Annual International Cryptology Conference

[8] Nathan L Binkert Ronald G Dreslinski Lisa R Hsu Kevin T Lim Ali GSaidi and Steven K Reinhardt 2006 The M5 Simulator ModelingNetworked Systems Micro IEEE (2006)

[9] Darrell Boggs Gary Brown Nathan Tuck and K S Venkatraman2015 Denver Nvidiarsquos First 64-bit ARM Processor IEEE Micro (2015)

[10] Chandler Carruth 2018 Mitigating Speculative Attacks inCrypto httpsgithubcomHACS-workshopspectre-mitigationsblobmastercryptoguidelinesmd Online accessed Jul 2018

[11] Chandler Carruth 2018 RFC Speculative Load Hardening (aSpectre variant1 mitigation) httpsdocsgooglecomdocumentd1wwcfv3UV9ZnZVcGiGuoITT61eKo3TmoCS3uXLcJR0edit Onlineaccessed Jul 2018

[12] Chandler Carruth 2018 RFC Speculative Load Hardening (a Spectrevariant1 mitigation) httpslistsllvmorgpipermailllvm-dev2018-March122085html Online accessed Jul 2018

[13] Guoxing Chen Sanchuan Chen Yuan Xiao Yinqian Zhang ZhiqiangLin and Ten H Lai 2018 SgxPectre Attacks Leaking Enclave

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 14: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

Secrets via Speculative Execution CoRR abs180209085 (2018)arXiv180209085 httparxivorgabs180209085

[14] H Chen X Wu L Yuan B Zang P c Yew and F T Chong 2008From Speculation to Security Practical and Efficient InformationFlow Tracking Using Speculative Hardware In 2008 InternationalSymposium on Computer Architecture 401ndash412 httpsdoiorg101109ISCA200818

[15] Shimin Chen Phillip B GibbonsMichael Kozuch and Todd CMowry2011 Log-based Architectures Using Multicore to Help SoftwareBehave Correctly SIGOPS Oper Syst Rev 45 1 (Feb 2011) 84ndash91httpsdoiorg10114519450231945034

[16] Hari Cherupalli Henry Duwe Weidong Ye Rakesh Kumar and JohnSartori 2017 Software-based Gate-level Information Flow Securityfor IoT Systems In Proceedings of the 50th Annual IEEEACM Interna-tional Symposium on Microarchitecture (MICRO-50 rsquo17) ACM NewYork NY USA 328ndash340 httpsdoiorg10114531239393123955

[17] JonathanCorbet 2017 The current state of kernel page-table isolationhttpslwnnetArticles741878 Online accessed Jan 2018

[18] Marc L Corliss E Christopher Lewis and Amir Roth 2003 DISE Aprogrammable macro engine for customizing applications In Com-puter Architecture 2003 Proceedings 30th Annual International Sym-posium on

[19] Marc L Corliss E Christopher Lewis and Amir Roth 2005 Low-overhead interactive debugging via dynamic instrumentation withDISE In 11th International Symposium on High-Performance ComputerArchitecture

[20] Marc L Corliss E Christopher Lewis and Amir Roth 2005 UsingDISE to Protect Return Addresses from Attack SIGARCH ComputArchit News (March 2005)

[21] Jedidiah R Crandall and Frederic T Chong 2004 Minos Control DataAttack Prevention Orthogonal toMemoryModel In Proceedings of the37th Annual IEEEACM International Symposium on Microarchitecture(MICRO 37) IEEE Computer Society Washington DC USA 221ndash232httpsdoiorg101109MICRO200426

[22] John Criswell Nathan Dautenhahn and Vikram Adve 2014 VirtualGhost Protecting Applications from Hostile Operating Systems InProceedings of the 19th International Conference on Architectural Sup-port for Programming Languages and Operating Systems (ASPLOS rsquo14)ACM New York NY USA 81ndash96 httpsdoiorg10114525419402541986

[23] John Criswell Andrew Lenharth Dinakar Dhurjati and Vikram Adve2007 Secure Virtual Architecture A Safe Execution Environmentfor Commodity Operating Systems In Proceedings of Twenty-firstACM SIGOPS Symposium on Operating Systems Principles (SOSP rsquo07)ACM New York NY USA 351ndash366 httpsdoiorg10114512942611294295

[24] Michael Dalton Hari Kannan and Christos Kozyrakis 2007 RakshaA Flexible Information Flow Architecture for Software Security InProceedings of the 34th Annual International Symposium on ComputerArchitecture (ISCA rsquo07) ACM New York NY USA 482ndash493 httpsdoiorg10114512506621250722

[25] James C Dehnert Brian K Grant John P Banning Richard JohnsonThomas Kistler Alexander Klaiber and Jim Mattson 2003 The Trans-meta Code Morphing Software using speculation recovery andadaptive retranslation to address real-life challenges In Proceedingsof the international symposium on Code generation and optimizationfeedback-directed and runtime optimization

[26] Matthew DeVuyst Ashish Venkat and Dean M Tullsen 2012 Ex-ecution migration in a heterogeneous-isa chip multiprocessor InProceedings of the Seventeenth International Conference on Architec-tural Support for Programming Languages and Operating Systems

[27] Craig Disselkoen David Kohlbrenner Leo Porter and Dean Tullsen2017 Prime+Abort A Timer-Free High-Precision L3 CacheAttack using Intel TSX In 26th USENIX Security Symposium

(USENIX Security 17) USENIX Association Vancouver BC 51ndash67 httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationdisselkoen

[28] Xiaowan Dong Zhuojia Shen John Criswell Alan Cox and SandhyaDwarkadas 2018 Spectres Virtual Ghosts and Hardware SupportIn Proceedings of the 7th International Workshop on Hardware andArchitectural Support for Security and Privacy (HASP rsquo18) ACM NewYork NY USA Article 5 9 pages httpsdoiorg10114532142923214297

[29] Xiaowan Dong Zhuojia Shen John Criswell Alan L Cox and Sand-hya Dwarkadas 2018 Shielding Software From Privileged Side-Channel Attacks In 27th USENIX Security Symposium (USENIX Se-curity 18) USENIX Association Baltimore MD 1441ndash1458 httpswwwusenixorgconferenceusenixsecurity18presentationdong

[30] Dmitry Evtyushkin Ryan Riley Nael CSE Abu-Ghazaleh ECE andDmitry Ponomarev 2018 BranchScope A New Side-Channel Attackon Directional Branch Predictor In Proceedings of the Twenty-ThirdInternational Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS rsquo18) ACM New York NYUSA 693ndash707 httpsdoiorg10114531731623173204

[31] Karine Gandolfi Christophe Mourtel and Francis Olivier 2001 Elec-tromagnetic analysis Concrete results In International Workshop onCryptographic Hardware and Embedded Systems

[32] Qian Ge Yuval Yarom David Cock and Gernot Heiser 2018 Asurvey of microarchitectural timing attacks and countermeasures oncontemporary hardware Journal of Cryptographic Engineering 8 1(2018) 1ndash27

[33] Daniel Genkin Adi Shamir and Eran Tromer 2014 RSA key ex-traction via low-bandwidth acoustic cryptanalysis In InternationalCryptology Conference

[34] Daniel Genkin Adi Shamir and Eran Tromer 2016 Acoustic Crypt-analysis Journal of Cryptology (2016)

[35] Github [n d] wrk - a HTTP benchmarking tool httpsgithubcomwgwrk Online accessed Jun 2018

[36] Daniel Gruss Moritz Lipp Michael Schwarz Richard Fellner Cleacute-mentine Maurice and Stefan Mangard 2017 Kaslr is dead long livekaslr In International Symposium on Engineering Secure Software andSystems Springer 161ndash176

[37] Daniel Gruss Raphael Spreitzer and Stefan Mangard 2015Cache Template Attacks Automating Attacks on InclusiveLast-Level Caches In 24th USENIX Security Symposium (USENIXSecurity 15) USENIX Association Washington DC 897ndash912httpswwwusenixorgconferenceusenixsecurity15technical-sessionspresentationgruss

[38] Maurice CleacutementineWagner Klaus Gruss Daniel and StefanMangard2016 Flush+Flush A Fast and Stealthy Cache Attack

[39] V Haldar D Chandra andM Franz 2005 Dynamic taint propagationfor Java In 21st Annual Computer Security Applications Conference(ACSACrsquo05) 9 ppndash311 httpsdoiorg101109CSAC200521

[40] W Halfond A Orso and P Manolios 2008 WASP Protecting WebApplications Using Positive Tainting and Syntax-Aware EvaluationIEEE Transactions on Software Engineering 34 1 (Jan 2008) 65ndash81httpsdoiorg101109TSE200770748

[41] John L Henning 2006 SPEC CPU2006 Benchmark DescriptionsSIGARCH Comput Archit News (Sept 2006)

[42] Jonathan J Hoch and Adi Shamir 2004 Fault analysis of streamciphers In International Workshop on Cryptographic Hardware andEmbedded Systems

[43] Intel 2008 2nd Generation Intel Core vPro Processor Family Tech-nical Report httpwwwintelcomcontentdamdocwhite-paperperformance-2nd-generation-core-vpro-family-paperpdf

[44] Intel 2014 Software guard extensions programming reference (2014)httpssoftwareintelcomsitesdefaultfiles329298-001pdf

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 15: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

[45] Intel 2018 White paper Intel Analysis of Speculative Exe-cution Side Channels Technical Report 336983-001 Revision10 httpsnewsroomintelcomwp-contentuploadssites11201801Intel-Analysis-of-Speculative-Execution-Side-Channelspdf

[46] Intel 2018 White paper Retpoline A Branch Target In-jection Mitigation Technical Report 337131-003 Revision003 httpssoftwareintelcomsitesdefaultfilesmanaged1d46Retpoline-A-Branch-Target-Injection-Mitigationpdf

[47] Intel Corporation 2009 Intelreg 64 and IA-32 Architectures OptimizationReference Manual

[48] Intel Corporation 2011 Intel 64 and IA-32 Architectures SoftwareDeveloperrsquos Manual Intel Corporation

[49] Project Zero Jann Horn 2018 Reading privileged memory with a side-channel httpsgoogleprojectzeroblogspotcom201801reading-privileged-memory-with-sidehtml Online accessed Jan 2018

[50] Morteza Mohajjel Kafshdooz Mohammadkazem Taram Sepehr As-sadi and Alireza Ejlali 2016 A Compile-Time Optimization MethodforWCET Reduction in Real-Time Embedded Systems Through BlockFormation ACMTrans Archit Code Optim 12 4 Article 66 (Jan 2016)25 pages httpsdoiorg1011452845083

[51] R Karri K Wu P Mishra and Yongkook Kim 2002 Concurrenterror detection schemes for fault-based side-channel cryptanalysisof symmetric block ciphers IEEE Transactions on Computer-AidedDesign of Integrated Circuits and Systems 21 12 (Dec 2002) 1509ndash1517httpsdoiorg101109TCAD2002804378

[52] Mehmet Kayaalp Khaled N Khasawneh Hodjat Asghari EsfedenJesse Elwell Nael Abu-Ghazaleh Dmitry Ponomarev and AamerJaleel 2017 RIC Relaxed Inclusion Caches for Mitigating LLC Side-Channel Attacks In Proceedings of the 54th Annual Design AutomationConference 2017 (DAC rsquo17) ACM New York NY USA Article 76 pages httpsdoiorg10114530616393062313

[53] Khaled N Khasawneh Esmaeil Mohammadian Koruyeh ChengyuSong Dmitry Evtyushkin Dmitry Ponomarev and Nael Abu-Ghazaleh 2018 SafeSpec Banishing the Spectre of a Meltdown withLeakage-Free Speculation arXiv preprint arXiv180605179 (2018)

[54] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative bufferoverflows Attacks and defenses arXiv preprint arXiv180703757(2018)

[55] Vladimir Kiriansky and Carl Waldspurger 2018 Speculative BufferOverflows Attacks and Defenses arXiv preprint arXiv180703757(2018)

[56] Paul Kocher 2018 Spectre Mitigations in MicrosoftrsquosCC++ Compiler httpswwwpaulkochercomdocMicrosoftCompilerSpectreMitigationhtml Online accessedJul 2018

[57] Paul Kocher Daniel Genkin Daniel Gruss Werner Haas MikeHamburg Moritz Lipp Stefan Mangard Thomas Prescher MichaelSchwarz and Yuval Yarom 2018 Spectre Attacks Exploiting Specu-lative Execution ArXiv e-prints (Jan 2018) arXiv180101203

[58] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[59] Esmaeil Mohammadian Koruyeh Khaled N Khasawneh ChengyuSong and Nael Abu-Ghazaleh 2018 Spectre Returns SpeculationAttacks using the Return Stack Buffer In 12th USENIXWorkshop on Of-fensive Technologies (WOOT 18) USENIX Association Baltimore MDhttpswwwusenixorgconferencewoot18presentationkoruyeh

[60] Monica S Lam Michael Martin Benjamin Livshits and John Wha-ley 2008 Securing Web Applications with Static and DynamicInformation Flow Tracking In Proceedings of the 2008 ACM SIG-PLAN Symposium on Partial Evaluation and Semantics-based Pro-gram Manipulation (PEPM rsquo08) ACM New York NY USA 3ndash12

httpsdoiorg10114513284081328410[61] Moritz Lipp Michael Schwarz Daniel Gruss Thomas Prescher

Werner Haas Anders Fogh Jann Horn Stefan Mangard Paul KocherDaniel Genkin Yuval Yarom and Mike Hamburg 2018 MeltdownReading Kernel Memory from User Space In 27th USENIX SecuritySymposium (USENIX Security 18)

[62] Fangfei Liu Yuval Yarom Qian Ge Gernot Heiser and Ruby BLee 2015 Last-Level Cache Side-Channel Attacks Are PracticalIn Proceedings of the 2015 IEEE Symposium on Security and Privacy(SP rsquo15) IEEE Computer Society Washington DC USA 605ndash622httpsdoiorg101109SP201543

[63] Jason Lowe-Power 2018 Visualizing Spectre with gem5 httpwwwlowepowercomjasonvisualizing-spectre-with-gem5html Onlineaccessed Jun 2018

[64] Jason Lowe-Power Venkatesh Akella Matthew K Farrens Samuel TKing and Christopher J Nitta 2018 Position Paper A Case forExposing Extra-architectural State in the ISA In Proceedings of the7th International Workshop on Hardware and Architectural Support forSecurity and Privacy (HASP rsquo18) ACM New York NY USA Article 86 pages httpsdoiorg10114532142923214300

[65] Scott A Mahlke David C Lin William Y Chen Richard E Hank andRoger A Bringmann 1992 Effective compiler support for predicatedexecution using the hyperblock In MICRO

[66] Giorgi Maisuradze and Christian Rossow 2018 Ret2Spec SpeculativeExecution Using Return Stack Buffers In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (CCSrsquo18) ACM New York NY USA 2109ndash2122 httpsdoiorg10114532437343243761

[67] Microsoft Security Response Center (MSRC) Matt Miller 2018Analysis and mitigation of speculative store bypass (CVE-2018-3639) httpsblogstechnetmicrosoftcomsrd20180521analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639 Onlineaccessed Jul 2018

[68] Keaton Mowery Sriram Keelveedhi and Hovav Shacham 2012 AreAES x86 cache timing attacks still feasible In Proceedings of the 2012ACM Workshop on Cloud computing security workshop

[69] Oleksii Oleksenko Bohdan Trach Tobias Reiher Mark Silbersteinand Christof Fetzer 2018 You Shall Not Bypass Employing datadependencies to prevent Bounds Check Bypass CoRR abs180508506(2018) arXiv180508506 httparxivorgabs180508506

[70] Yossef Oren Vasileios P Kemerlis Simha Sethumadhavan and An-gelos D Keromytis 2015 The Spy in the Sandbox Practical CacheAttacks in JavaScript and Their Implications In Proceedings of the22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity

[71] Guilherme Ottoni Thomas Hartin ChristopherWeaver Jason BrandtBelliappa Kuttanna and HongWang 2011 Harmonia A TransparentEfficient and Harmonious Dynamic Binary Translator Targeting theIntelampReg Architecture In Proceedings of the 8th ACM InternationalConference on Computing Frontiers (CF rsquo11) ACM New York NYUSA Article 26 10 pages httpsdoiorg10114520166042016635

[72] Andrew Pardoe 2018 Spectre mitigations in MSVC httpsblogsmsdnmicrosoftcomvcblog20180115spectre-mitigations-in-msvc Online accessed Jul 2018

[73] Joseph CH Park and Mike Schlansker 1991 On predicated executionHewlett-Packard Laboratories Palo Alto California

[74] Will Reese 2008 Nginx The High-performance Web Server andReverse Proxy Linux J 2008 173 Article 2 (Sept 2008) httpdlacmorgcitationcfmid=14122021412204

[75] Ryan Roemer Erik Buchanan Hovav Shacham and Stefan Savage2012 Return-oriented programming Systems languages and ap-plications ACM Transactions on Information and System Security(2012)

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References
Page 16: Context-Sensitive Fencing: Securing Speculative Execution ...av6ds/papers/asplos2019.pdf · Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization Mohammadkazem

[76] Michael Schwarz Martin Schwarzl Moritz Lipp and Daniel Gruss2018 NetSpectre Read Arbitrary Memory over Network arXivpreprint arXiv180710535 (2018)

[77] Hovav Shacham Matthew Page Ben Pfaff Eu-Jin Goh NagendraModadugu and Dan Boneh 2004 On the Effectiveness of Address-space Randomization In Proceedings of the 11th ACM Conference onComputer and Communications Security (CCS rsquo04) ACM New YorkNY USA 298ndash307 httpsdoiorg10114510300831030124

[78] Julian Stecklina and Thomas Prescher 2018 LazyFP Leaking FPURegister State using Microarchitectural Side-Channels arXiv preprintarXiv180607480 (2018)

[79] G Edward Suh Jae W Lee David Zhang and Srinivas Devadas2004 Secure Program Execution via Dynamic Information FlowTracking In Proceedings of the 11th International Conference onArchitectural Support for Programming Languages and OperatingSystems (ASPLOS XI) ACM New York NY USA 85ndash96 httpsdoiorg10114510243931024404

[80] Mohammadkazem Taram Ashish Venkat and Dean Tullsen 2018Mobilizing the Micro-Ops Exploiting Context Sensitive Decodingfor Security and Energy Efficiency In Proceedings of the 45th AnnualInternational Symposium on Computer Architecture (ISCA rsquo18)

[81] Mohit Tiwari Hassan MG Wassel Bita Mazloom ShashidharMysore Frederic T Chong and Timothy Sherwood 2009 Com-plete Information Flow Tracking from the Gates Up In Proceedings ofthe 14th International Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS XIV) ACM NewYork NY USA 109ndash120 httpsdoiorg10114515082441508258

[82] Caroline Trippel Daniel Lustig and Margaret Martonosi 2018MeltdownPrime and SpectrePrime Automatically-Synthesized At-tacks Exploiting Invalidation-Based Coherence Protocols CoRRabs180203802 (2018) arXiv180203802 httparxivorgabs180203802

[83] DeanM Tullsen Susan J Eggers Joel S Emer Henry M Levy Jack L Loand Rebecca L Stamm 1996 Exploiting choice Instruction fetch andissue on an implementable simultaneous multithreading processor InACM SIGARCH Computer Architecture News Vol 24 ACM 191ndash202

[84] Dean M Tullsen Susan J Eggers and Henry M Levy 1995 Si-multaneous Multithreading Maximizing On-chip Parallelism InProceedings of the 22Nd Annual International Symposium on Com-puter Architecture (ISCA rsquo95) ACM New York NY USA 392ndash403httpsdoiorg101145223982224449

[85] Paul Turner 2018 Retpoline a software construct for preventingbranch-target-injection httpssupportgooglecomfaqsanswer7625886 Online accessed Jul 2018

[86] N Vachharajani M J Bridges J Chang R Rangan G Ottoni J ABlome G A Reis M Vachharajani and D I August 2004 RIFLEAn Architectural Framework for User-Centric Information-Flow Se-curity In Microarchitecture 2004 MICRO-37 2004 37th InternationalSymposium on 243ndash254 httpsdoiorg101109MICRO200431

[87] Jo Van Bulck Marina Minkin Ofir Weisse Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Thomas F Wenisch Yu-val Yarom and Raoul Strackx 2018 Foreshadow Extracting theKeys to the Intel SGX Kingdom with Transient Out-of-Order Execu-tion In Proceedings of the 27th USENIX Security Symposium USENIXAssociation See also technical report Foreshadow-NG [95]

[88] Ashish Venkat 2018 Breaking the ISA Barrier in Modern ComputingPhD Dissertation UC San Diego

[89] Ashish Venkat Arvind Krishnaswamy Koichi Yamada and RajanPalanivel 2015 Binary Translation driven Program State RelocationIn United States Patent Grant US009135435B2

[90] Ashish Venkat S Shamasunder Hovav Shacham and DeanM Tullsen2016 HIPStR Heterogeneous-ISA Program State Relocation In Pro-ceedings of the International Symposium on Architectural Support forProgramming Languages and Operating Systems

[91] Ashish Venkat and Dean M Tullsen 2014 Harnessing ISA diversityDesign of a heterogeneous-ISA chip multiprocessor In Proceedingsof the International Symposium on Computer Architecture

[92] G Venkataramani I Doudalis Y Solihin and M Prvulovic 2008FlexiTaint A programmable accelerator for dynamic taint propaga-tion In 2008 IEEE 14th International Symposium on High PerformanceComputer Architecture 173ndash184 httpsdoiorg101109HPCA20084658637

[93] Philipp Vogt Florian Nentwich Nenad Jovanovic Engin KirdaChristopher Kruegel and Giovanni Vigna 2007 Cross Site Script-ing Prevention with Dynamic Data Tainting and Static Analysis InNDSS Vol 2007 12

[94] Zhenghong Wang and Ruby B Lee 2006 Covert and side channelsdue to processor architecture In Computer Security Applications Con-ference 2006 ACSACrsquo06 22nd Annual IEEE 473ndash482

[95] Ofir Weisse Jo Van Bulck Marina Minkin Daniel Genkin BarisKasikci Frank Piessens Mark Silberstein Raoul Strackx Thomas FWenisch and Yuval Yarom 2018 Foreshadow-NG Breaking theVirtual Memory Abstraction with Transient Out-of-Order Execu-tion Technical report (2018) See also USENIX Security paper Fore-shadow [87]

[96] Troy Wolverton 2018 Spectre and Meltdown are now a le-gal pain for Intel the chip maker faces 35 lawsuits over the at-tacks httpswwwbusinessinsidercom35-lawsuits-have-been-filed-against-intel-over-spectre-and-meltdown-2018-2 Online ac-cessed Aug 2018

[97] Zhenyu Wu Zhang Xu and Haining Wang 2012 Whispers in theHyper-space High-speed Covert Channel Attacks in the Cloud InPresented as part of the 21st USENIX Security Symposium (USENIX Se-curity 12) USENIX Bellevue WA 159ndash173 httpswwwusenixorgconferenceusenixsecurity12technical-sessionspresentationwu

[98] Wei Xu Sandeep Bhatkar and Ramachandran Sekar 2006 Taint-Enhanced Policy Enforcement A Practical Approach to Defeat aWide Range of Attacks In USENIX Security Symposium 121ndash136

[99] M Yan J Choi D Skarlatos A Morrison C Fletcher and J Torrellas2018 InvisiSpec Making Speculative Execution Invisible in the CacheHierarchy In 2018 51st Annual IEEEACM International Symposium onMicroarchitecture (MICRO) 428ndash441 httpsdoiorg101109MICRO201800042

[100] Yuval Yarom and Katrina Falkner 2014 FLUSH+ RELOAD A HighResolution Low Noise L3 Cache Side-Channel Attack In USENIXSecurity

[101] Yuval Yarom Daniel Genkin and Nadia Heninger 2016 CacheBleedA timing attack on OpenSSL constant time RSA In InternationalConference on Cryptographic Hardware and Embedded Systems

[102] Alexander Yip Xi Wang Nickolai Zeldovich and M Frans Kaashoek2009 Improving Application Security with Data Flow AssertionsIn Proceedings of the ACM SIGOPS 22Nd Symposium on OperatingSystems Principles (SOSP rsquo09) ACM New York NY USA 291ndash304httpsdoiorg10114516295751629604

[103] Craig Zilles 2001 Linked List Traversal Micro-Benchmark httpzillescsillinoisedullubenchmarkhtml Online accessed Jun 2018

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Assumptions and Threat Model
  • 4 Architectural Overview
  • 5 Design and Implementation
    • 51 Microcode Customization
    • 52 Decoder-Level Information Flow Tracking
    • 53 Mitigations for Control-flow Mistraining
      • 6 Methodology
      • 7 Evaluation
        • 71 Security Discussion
        • 72 Performance
          • 8 Conclusion
          • Acknowledgments
          • References