
My research: an overview

Jean-Pierre Lozi — New McF recruit


Who am I?


- From here! (born in Nice, grew up in Antibes)

- Before my PhD:

- DEUG MIAS, Licence d’Informatique + Licence de Mathématiques in Nice

- Télécom ParisTech (Paris)

- Master's degree at Université Pierre et Marie Curie (Paris)

Who am I?

PhD in Computer Science ("allocataire moniteur"), until July 2014

"Towards more scalable mutual exclusion for multicore architectures"

Under the supervision of Gilles Muller and Gaël Thomas

REGAL/WHISPER team: "Well-Honed Infrastructure Software for Programming Environments and Runtimes"

Laboratoire d'Informatique de Paris 6, UPMC, Paris

Postdoctoral Research Fellow, then University Research Associate, until Sept. 2015

Under the supervision of Alexandra Fedorova

SYNAR team: "Systems Networking and Architecture Research"

Simon Fraser University (SFU), Vancouver, Canada


Who am I?


- Three main projects:
  - Remote Core Locking: dedicating cores for the execution of critical sections

- Hector: automated fault detection in error-handling codes

- A decade of idle cores: scheduling bugs in Linux

- Domain: systems!
  - Multicore architectures, synchronization / lock algorithms

- Automated source code analysis, bug detection

- Schedulers (on multicore architectures, again)

- Probably not your domain…
  - I will just give a quick overview of my previous work, no details

- Objective: work with you on some projects! Low-level systems stuff needed for performance…

- If interested, we can discuss things in more detail

Project 1

Remote Core Locking: dedicating cores for faster execution of critical sections


Project 1: Remote Core Locking

Context: multicore architectures

- Decades of increasing CPU clock speeds, now issues with power usage / heat

- Increasing numbers of cores to keep increasing processing power
  - Possible because the number of transistors keeps increasing


[Figure: trends over time in transistor count, clock speed, power consumption, and the power/speed ratio]

Project 1: Remote Core Locking

Problem:

- Many legacy applications don’t scale well on modern multicore architectures

- For instance, Memcached on an x86 48-core machine (Get/Set requests):

[Figure: Memcached Get/Set throughput vs. number of cores; higher is better]


Project 1: Remote Core Locking

Poor scalability on multicore architectures: why?

- Bottleneck = critical sections, protected by locks

- High contention => lock acquisition is costly (more cores => higher contention)


[Figure: % of time spent in critical sections (0% to 100%) vs. number of cores (1, 4, 8, 16, 22, 32, 48) for SPLASH-2/Radiosity, SPLASH-2/Raytrace, Phoenix 2/LG, Phoenix 2/SM, Phoenix 2/MM, Memcached/Get, Memcached/Set, Berkeley DB/OS, and Berkeley DB/SL]

Project 1: Remote Core Locking

Poor scalability on multicore architectures: why?

- Bottleneck = critical sections, protected by locks

- High contention => lock acquisition is costly (more cores => higher contention)

Two possible solutions:

- Redesign applications (fine-grained locking)

- Costly (millions of lines of legacy code)

- Design better locks!


Project 1: Remote Core Locking

Designing better locks

- No need to redesign the application, better resistance to contention

- Custom microbenchmark to compare locks:

[Figure: microbenchmark, execution time per critical section from higher to lower contention, for the CAS spinlock, MCS [Mellor-Crummey ASPLOS '91], Flat Combining, and blocking locks; critical sections access 5 cache lines each; lower is better]
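For context, MCS [Mellor-Crummey ASPLOS '91] is the queue lock in the figure above. Here is a minimal C11 sketch (an illustration only, not the implementation that was benchmarked): each waiter spins on its own queue node rather than on a shared word, and unlocking hands the lock to the successor by writing its flag.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Minimal MCS queue lock sketch (illustrative). Each waiter spins on
     * ITS OWN node, so there is no global cache-line ping-pong; the
     * handover is a single write to the successor's flag. */
    struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;
    };

    typedef _Atomic(struct mcs_node *) mcs_lock_t;  /* tail of the queue */

    void mcs_lock(mcs_lock_t *lock, struct mcs_node *me) {
        atomic_store(&me->next, NULL);
        atomic_store(&me->locked, true);
        /* enqueue ourselves at the tail of the waiter queue */
        struct mcs_node *pred = atomic_exchange(lock, me);
        if (pred != NULL) {
            atomic_store(&pred->next, me);   /* link behind predecessor */
            while (atomic_load(&me->locked))
                ;                            /* spin on our own flag only */
        }
    }

    void mcs_unlock(mcs_lock_t *lock, struct mcs_node *me) {
        struct mcs_node *succ = atomic_load(&me->next);
        if (succ == NULL) {
            /* no known successor: try to swing the tail back to empty */
            struct mcs_node *expected = me;
            if (atomic_compare_exchange_strong(lock, &expected, NULL))
                return;
            while ((succ = atomic_load(&me->next)) == NULL)
                ;  /* a successor is enqueueing; wait for the link */
        }
        atomic_store(&succ->locked, false);  /* hand the lock over */
    }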

Project 1: Remote Core Locking

Question: why are lock algorithms inefficient?


Because critical paths are too long!

- Overhead 1: costly lock handovers

- Overhead 2: poor locality of critical sections


Project 1: Remote Core Locking

Overhead 1: costly lock handovers

[Figure: timeline of threads T1, T2, T3 executing critical sections CS1, CS2, CS3 in turn; the lock handovers between them sit on the critical path]


Lock handovers:

- Spinlocks: busy-waiting (see the sketch below)
- POSIX locks: context switch
- MCS: sending a message from one thread to the next
- Flat Combining: sometimes acquires a spinlock
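To make the busy-waiting case concrete, here is a minimal compare-and-swap (CAS) spinlock in C11 (an illustration only, not the exact lock from the evaluation): every waiter spins on the same lock word, so each release invalidates all waiters' cached copies before the next owner can win.

    #include <stdatomic.h>

    /* Minimal CAS spinlock (illustrative). All waiters spin on the same
     * cache line, so every unlock triggers an invalidation storm: this
     * is the costly handover. */
    typedef struct { atomic_int locked; } spinlock_t;

    void spin_lock(spinlock_t *l) {
        int expected = 0;
        /* try to flip 0 -> 1; on failure, re-read until the lock looks free */
        while (!atomic_compare_exchange_weak(&l->locked, &expected, 1)) {
            expected = 0;
            while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ;  /* busy-wait: the handover burns cycles on every waiter */
        }
    }

    void spin_unlock(spinlock_t *l) {
        atomic_store(&l->locked, 0);  /* all waiters' cached copies invalidated */
    }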

Project 1: Remote Core Locking

Overhead 2: poor locality of critical sections

[Figure: timeline of threads T1, T2, T3; critical sections CS1, CS2, CS3 access shared variables 1 and 2, causing cache misses along the critical path]


Project 1: Remote Core Locking

Idea: RCL = shorten the critical path as much as possible by dedicating a server core!

[Figure: timeline with a dedicated server core executing CS1, CS2, CS3 back to back for threads T1, T2, T3; shared variables 1 and 2 stay in the server's cache (no cache misses!) and the critical path shrinks]
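To make the mechanism concrete, here is a minimal single-lock, single-client sketch in C11 (my simplification for illustration only; the real RCL runtime, described below, handles multiple locks, multiple clients, and critical sections that block): the client publishes the critical section and its context in a request slot and spins locally, while the server core executes requests back to back.

    #include <stdatomic.h>
    #include <stddef.h>

    /* One request slot, padded to a cache line to avoid false sharing.
     * The client fills in the critical section and its context, then
     * spins locally until the server resets the function pointer. */
    struct rcl_request {
        void (*_Atomic cs)(void *);   /* critical section to run; NULL = done */
        void *ctx;                    /* shared variables, packed by the client */
    } __attribute__((aligned(64)));

    static struct rcl_request slot;

    /* Client side: replaces lock(); CS; unlock(). */
    void execute_rcl(void (*cs)(void *), void *ctx) {
        slot.ctx = ctx;                        /* written before cs is published */
        atomic_store(&slot.cs, cs);            /* publish the request */
        while (atomic_load(&slot.cs) != NULL)
            ;                                  /* local spin until completion */
    }

    /* Server side: the dedicated core loops over requests, so the
     * shared data stays hot in its cache. */
    void rcl_server_loop(void) {
        for (;;) {
            void (*cs)(void *) = atomic_load(&slot.cs);
            if (cs != NULL) {
                cs(slot.ctx);                  /* run the critical section */
                atomic_store(&slot.cs, NULL);  /* signal completion */
            }
        }
    }

Both overheads disappear at once: there is no lock handover (the server simply picks up the next request), and the shared variables never leave the server core's cache.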

Project 1: Remote Core Locking

Performance?

Microbenchmark:

[Figure: execution time per critical section, from higher to lower contention, for the CAS spinlock, MCS, the combining locks (Flat Combining and RCL), and blocking locks; lower is better]

Project 1: Remote Core Locking

That was the general idea… but RCL is much more than that!

Focus: legacy system (C) applications. RCL offers:

- A runtime designed to work with legacy applications: it works efficiently with multiple locks per server and/or multiple servers, and it handles critical sections that busy-wait or block, supporting condition variables, trylocks, and nested/recursive critical sections… lots of algorithmic/engineering problems to solve here!

- A reengineering tool that transforms applications to use RCL: RCL can't be used through lock/unlock functions alone; critical sections need to be encapsulated and shipped to server cores... We don't want to do this manually!

- A profiler to detect applications that can benefit from RCL: even with the reengineering tool, adopting RCL takes time… We want to make sure we can benefit from it! Use case: applications with highly contended locks or critical sections with poor locality...


Project 1: Remote Core Locking

Reengineering tool: a simple case

    /* Before: the critical section is protected by a POSIX lock. */

    pthread_mutex_t lock;

    void func(void) {
        int a, b, x;
        a = …;
        pthread_mutex_lock(&lock);
        a = f(a);
        f(b);
        pthread_mutex_unlock(&lock);
    }

    /* After: the shared variables used by the critical section move into
     * a context structure; the critical section is encapsulated in __cs()
     * and shipped to the server core by execute_rcl(). */

    struct context { int a, b; };

    void func(void) {
        struct context c;
        int x;
        c.a = …;
        execute_rcl(__cs, &c);
    }

    void __cs(struct context *c) {
        c->a = f(c->a);
        f(c->b);
    }


Project 1: Remote Core Locking

Performance in legacy applications:

[Figure: performance in legacy applications, ordered by % of time spent in critical sections: 44.7% (many data cache misses), 63.9%, 65.7%, 79.0%, 81.6%, 87.7%, 90.2%, 92.2%; higher is better]

Project 1: Remote Core Locking

Improved scalability:


Project 1: Remote Core Locking

Very efficient when more threads than cores: server always makes progress

[Figure: performance as the number of threads exceeds the number of cores; other locks collapse quickly, while RCL keeps making progress; higher is better]

Project 1: Remote Core Locking

Publications (44 citations):

- In CFSE '8 (national conference, best paper)

- In USENIX ATC '12 (international conference)

- A long version of the paper has been submitted to TOCS (international journal, awaiting reviews)

Several research works are already based on RCL:

- [Petrovic et al., PPoPP '14]: RCL for partially cache-coherent architectures

- [Pusukuri et al., PPoPP '14]: migrating threads to improve the locality of critical sections

- [Hassan et al., IPDPS '14]: dedicated server cores for transactional memory

Project 2

Hector: automated fault detection in error-handling code

Project 2: Hector

Problem:

- System applications are complex => impossible to avoid bugs

- A large part of system applications is error-handling code
  - Errors are tested after most function calls

- A common mistake: forgetting to release a resource in an error-handling code (EHC)
  - Releasing memory or a lock, unloading a device…

- Can have major consequences!
  - Memory leaks (exploits!), deadlocks, crashes...

Project 2: Hector

Existing solutions:

- Specification mining: automated search for "protocols", i.e., ways to use an API: e.g., function Y (release) often follows function X

- Problem: in practice, the set of protocols found must be pruned to avoid false positives, by filtering on high confidence and support (e.g., if Y follows X on 95 of the 100 paths where X appears, the protocol (X, Y) has support 100 and confidence 95%)

- These "macroscopic approaches" miss lots of acquire/release faults, because many acquire/release operations are used only a handful of times!

Project 2: Hector

Our solution: Hector

- Observation: when resources are released in a given way in an EHC, they are often released in the same way in nearby EHCs

- We first annotate functions globally, guessing whether they are acquire or release functions based on heuristics (whether they return a parameter, first/last access to a variable, …)

- Local ("microscopic") analysis of the Control Flow Graph:
  - Look for a release operation in an EHC; the flow graph before that EHC is the "exemplar"
  - Parts of the code with the same acquire/release operations as the exemplar but a missing release operation: fault candidate!

Basic example: from Linux

[Figure: control-flow graph with nodes a to f; Allocation 1 and Allocation 2 on the entry path (a, b), a goto at c leading to the error-handling path where Release 1 and Release 2 (d, e) precede the exit (f). A second fragment with the same acquire/release operations before its EHC but missing the release operations: candidate fault!]
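In C, the pattern looks like this (a hypothetical illustration with invented function bodies, not code taken from Linux or actual Hector output): two nearby functions acquire the same resources, one releases them correctly in its error-handling code, and the other forgets a release.

    #include <stdlib.h>

    /* Correct case: both allocations are released on the error path. */
    int ok_case(void) {
        char *buf1 = malloc(64);        /* Allocation 1 */
        if (!buf1)
            return -1;
        char *buf2 = malloc(64);        /* Allocation 2 */
        if (!buf2)
            goto err;                   /* error-handling code */
        /* ... use buf1 and buf2 ... */
        free(buf2);
        free(buf1);
        return 0;
    err:
        free(buf1);                     /* Release: present */
        return -1;
    }

    /* Buggy case: same allocations before the EHC as in ok_case(),
     * but the release is absent -> exactly the kind of nearby
     * inconsistency flagged as a fault candidate. */
    int buggy_case(void) {
        char *buf1 = malloc(64);        /* Allocation 1 */
        if (!buf1)
            return -1;
        char *buf2 = malloc(64);        /* Allocation 2 */
        if (!buf2)
            goto err;
        /* ... use buf1 and buf2 ... */
        free(buf2);
        free(buf1);
        return 0;
    err:
        /* Missing free(buf1): memory leak on the error path. */
        return -1;
    }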

Project 2: Hector

Our solution: Hector

- That was the general idea: in practice, smart heuristics on the Control Flow Graph handle more elaborate cases…

- Found 484 fault candidates in Linux, Python, Apache, Wine, PHP, and PostgreSQL… 371 were real faults! => few false positives

- Most wouldn't have been detected by macroscopic approaches!

- One of my main roles in this project: I showed that a malicious user can exploit some of these faults to crash servers!

Project 2: Hector

Publications (11 citations):

- In CFSE '9 (national conference)

- In DSN '13 (international conference, best paper)

Project 3

A decade of idle cores: scheduling bugs in Linux

Project 3: scheduling bugs in Linux

Problem:

- OS schedulers keep evolving with hardware
  - Completely new schedulers once in a while (the O(1) scheduler, the CFS scheduler…)

- Many heuristics added over time to overcome issues

- Especially true with recent machines: multicore, NUMA...

- Programs run and end, nobody seems to think there is a problem!

- Linus said:
  - "And you have to realize that there are not very many things that have aged as well as the scheduler. Which is just another proof that scheduling is easy."
  - "I suspect that making the scheduler use per-CPU queues together with some inter-CPU load balancing logic is probably _trivial_. Patches already exist, and I don't feel that people can screw up the few hundred lines too badly."


Project 3: scheduling bugs in Linux

So wait, what is the problem again?

- Read the scheduler code: it is actually very complex, full of heuristics that sometimes seem contradictory, and hard to really understand

- No effort to show that any of it is correct: it is simply tested on a few microbenchmarks, plus feedback from users

- We were curious, so we considered a set of implicit invariants

- Simple stuff, like "no core should run several threads when there have been idle cores in the system for a long time", or "two threads with the same load should run for a similar amount of time"

- We wrote "sanity checks" to verify these invariants at runtime

Project 3: scheduling bugs in Linux

Our shocking news: the Linux scheduler is rife with "scheduling bugs", i.e., failures to maintain the most basic implicit invariants!


Project 3: scheduling bugs in Linux

Example 1: when running two processes in two terminals, one with 2 threads and the other with 64 threads, many cores sit idle while other cores are overloaded!

- Reason: autogroups; the threads of the 2-thread process have a higher load (each autogroup carries a similar total weight, so roughly 1/2 of it per thread versus 1/64). The scheduler just looks at the average load of each node, which is balanced, and doesn't try to go further down in the hierarchy!


Project 3: scheduling bugs in Linux

Example 2: Oracle running TPC-H = many cores remain idle for no apparent reason.

- Reason: an initial inter-node load-balancing event caused by a transient thread leaves one node with more threads than cores; after that, threads keep waking up on busy cores because only node-local cores are considered at wake-up time (a faulty cache-optimization heuristic). All threads wait for each other: "holes" everywhere in the execution!


Project 3: scheduling bugs in Linux

- Other bugs: many other scheduling bugs were found, including bugs where only one node in the system is used because the topology is not built properly, etc.

- We argue that such bugs will always keep being added to the scheduler, due to its constant evolution. Formal proofs would be extremely complex, and regression testing would not find the bugs we found.

- We propose possible ways of redesigning the scheduler to reduce bugs, but even with this, sanity checks are needed to ensure more bugs won't be added.

Project 3: scheduling bugs in Linux

Our proposed implementation of sanity checks:

- Differs from watchdogs or assertions: there is no simple test for scheduling issues

- Sanity checks are called periodically; if a candidate issue is detected, they start a low-overhead recording of thread movements, and report a bug if the issue persists long enough

- If the issue does persist long enough, the whole machine is profiled with high overhead for 20 ms to generate a bug report that helps understand why the scheduler didn't reach a satisfactory state again during that time

We argue that sanity checks are the only practical way to efficiently ensure implicit invariants; they should be added to various parts of the kernel! A minimal sketch of such a check is shown below.

Paper mostly written, will be submitted to EuroSys next month.

Jean-Pierre Lozi — Past activity
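As an illustration of what such a check could look like, here is a minimal user-level sketch in C (assumptions: a fixed core count, a per-core runqueue-length snapshot supplied by some monitoring hook, and a simple persistence counter; the in-kernel implementation described in the paper records thread movements and triggers profiling rather than printing):

    #include <stdbool.h>
    #include <stdio.h>

    /* Checks the invariant "no core should run several threads while
     * another core has been idle for a long time", and only reports
     * once the violation has persisted: transient states are not bugs. */
    #define NCORES            48
    #define PERSIST_THRESHOLD 5   /* consecutive periodic checks */

    static int violation_count;   /* how long the violation has persisted */

    /* One periodic sanity check over a snapshot of runqueue lengths. */
    void sanity_check(const int nr_running[NCORES]) {
        bool has_idle = false, has_overloaded = false;
        for (int c = 0; c < NCORES; c++) {
            if (nr_running[c] == 0) has_idle = true;
            if (nr_running[c] > 1)  has_overloaded = true;
        }
        if (has_idle && has_overloaded) {
            if (++violation_count == PERSIST_THRESHOLD)
                printf("invariant violated for %d checks: start bug report\n",
                       violation_count);
            /* here the real system would record thread movements, then
             * trigger heavyweight profiling if the issue persists */
        } else {
            violation_count = 0;  /* reset on any healthy snapshot */
        }
    }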

And now?


Position profile: heterogeneous data warehouses

- My profile: more "systems" than most people in the team

- But data warehouses run on multicore processors, which need scheduling, etc.

- Objective: bringing my expertise in systems to projects in the team

Open to all suggestions for collaborations!

Jean-Pierre Lozi
