LINUX, LOCKING AND LOTS OF PROCESSORS

Peter Chubb

peter.chubb@nicta.com.au


A LITTLE BIT OF HISTORY

• Multics in the ’60s

• Ken Thompson and Dennis Ritchie in 1967–70

• USG and BSD

• John Lions 1976–95

• Andrew Tanenbaum 1987

• Linus Torvalds 1991

NICTA Copyright © 2014 From Imagination to Impact


A LITTLE BIT OF HISTORY

• Basic concepts well established

– Process model

– File system model

– IPC

• Additions:

– Paged virtual memory (3BSD, 1979)

– TCP/IP Networking (4.2BSD, 1983)

– Multiprocessing (Vendor Unices such as

Sequent’s ‘Balance’, 1984)


ABSTRACTIONS

[Figure: the Linux kernel and the three abstractions it provides: Files, Thread of Control, Memory Space]


PROCESS MODEL

• Root process (init)

• fork() creates (almost) exact copy

– Much is shared with parent — Copy-On-Write

avoids overmuch copying

• exec() overwrites memory image from a file

• Allows a process to control what is shared


FORK() AND EXEC()

➜ A process can clone itself by calling fork().

➜ Most attributes copied :

➜ Address space (actually shared, marked copy-on-write)

➜ current directory, current root

➜ File descriptors

➜ permissions, etc.

➜ Some attributes shared :

➜ Memory segments marked MAP_SHARED

➜ Open files
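The difference between a copied descriptor table and a shared open file can be observed from user space. The following is a minimal sketch, assuming a POSIX system; the helper name `offset_after_child_write` and the use of `tmpfile()` are illustrative choices, not taken from the slides.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: the child writes nbytes through a descriptor
 * inherited across fork(); the parent then reports its own view of the
 * file offset. Returns that offset, or -1 on error.
 */
static off_t offset_after_child_write(size_t nbytes)
{
    FILE *tmp = tmpfile();            /* anonymous temporary file */
    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);

    pid_t kid = fork();
    if (kid == -1) {
        fclose(tmp);
        return -1;
    }
    if (kid == 0) {                   /* child: write via the inherited fd */
        for (size_t i = 0; i < nbytes; i++)
            if (write(fd, "x", 1) != 1)
                _exit(1);
        _exit(0);
    }

    int status;
    if (waitpid(kid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        fclose(tmp);
        return -1;
    }

    /* The parent never wrote, yet its offset has moved: the open file
       (with its offset) is shared; only the descriptor table was copied. */
    off_t off = lseek(fd, 0, SEEK_CUR);
    fclose(tmp);
    return off;
}
```

Calling `offset_after_child_write(5)` from the parent should report an offset of 5 even though only the child wrote to the file.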

Files and Processes:

[Figure: Process A's file descriptor table (entries 0 to 7) points to open file descriptors, each holding an offset and a reference to an in-kernel inode. After dup(), two table entries refer to the same open file descriptor. After fork(), Process B's file descriptor table refers to the same open file descriptors as Process A's.]

FORK() AND EXEC()

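switch (kidpid = fork()) {
case 0: /* child: rewire stdin, stdout and stderr, then exec */
    close(0); close(1); close(2);
    /* dup() returns the lowest free descriptor: 0, then 1, then 2 */
    dup(infd); dup(outfd); dup(outfd);
    execve("path/to/prog", argv, envp);
    _exit(EXIT_FAILURE); /* reached only if execve() fails */
case -1:
    /* handle error */
    break;
default: /* parent: wait for the child to finish */
    waitpid(kidpid, &status, 0);
    break;
}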
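A self-contained variant of the same pattern, as a sketch only: it uses `dup2()` instead of the close()/dup() sequence, redirects just standard output, and sends it into a pipe so the parent can read it. The helper name `capture_stdout` and the choice of `/bin/sh -c` as the program to exec are assumptions made for the example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `/bin/sh -c cmd` with its standard output
 * redirected into a pipe, and copy up to cap-1 bytes of that output
 * into buf (NUL-terminated). Returns the child's exit status, or -1.
 */
static int capture_stdout(const char *cmd, char *buf, size_t cap)
{
    int p[2];
    if (cap == 0 || pipe(p) == -1)
        return -1;

    pid_t kid = fork();
    switch (kid) {
    case -1:
        close(p[0]);
        close(p[1]);
        return -1;
    case 0: {                         /* child */
        char *argv[] = { "sh", "-c", (char *)cmd, NULL };
        char *envp[] = { NULL };
        close(p[0]);
        dup2(p[1], STDOUT_FILENO);    /* fd 1 now refers to the pipe */
        close(p[1]);
        execve("/bin/sh", argv, envp);
        _exit(EXIT_FAILURE);          /* reached only if execve() fails */
    }
    default:
        break;
    }

    close(p[1]);                      /* parent keeps only the read end */
    size_t used = 0;
    ssize_t n;
    while (used < cap - 1 &&
           (n = read(p[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(p[0]);

    int status;
    if (waitpid(kid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent closes its copy of the write end before reading; otherwise `read()` would never see end-of-file, because the pipe would still have a writer.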


STANDARD FILE DESCRIPTORS

0 Standard Input

1 Standard Output

2 Standard Error

➜ Inherited from parent

➜ On login, all are set to controlling tty


FILE MODEL

• Separation of names from content.

• ‘regular’ files ‘just bytes’ → structure/meaning

supplied by userspace

• Devices represented by files.

• Directories map names to index node indices

(inums)

• Simple permissions model


FILE MODEL

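[Figure: resolving / bin / ls. The root directory (entries ., .., boot, sbin, bin, dev, var, vmlinux, etc, usr) maps the name bin to an inode number. That inode holds the bin directory (entries such as ., .., bash, sh, ls, which, rnano, busybox, setserial, bzcmp), which in turn maps ls to the inode of the file itself.]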


NAMEI

➜ translate name → inode

➜ abstracted per filesystem in VFS layer

➜ Can be slow: extensive use of caches to speed it up

dentry cache — becomes SMP bottleneck

➜ hide filesystem and device boundaries

➜ walks pathname, translating symbolic links
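The component-by-component walk can be illustrated with a toy model. This is only a sketch: the table `toy_fs`, its inode numbers, and the functions `toy_lookup` and `toy_namei` are invented for the example, and it ignores symbolic links, mount points and the dentry cache entirely.

```c
#include <stddef.h>
#include <string.h>

/* Toy directory entry: maps a name within directory `dir` to an inum. */
struct toy_dirent {
    unsigned dir;          /* inum of the containing directory */
    const char *name;      /* component name */
    unsigned inum;         /* inum the name refers to */
};

/* Made-up tree: inum 2 is the root directory. */
static const struct toy_dirent toy_fs[] = {
    { 2, "bin", 10 }, { 2, "etc", 11 },
    { 10, "ls", 20 }, { 10, "sh", 21 },
    { 11, "passwd", 30 },
};

/* Look up one component in one directory; 0 means "not found". */
static unsigned toy_lookup(unsigned dir, const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof toy_fs / sizeof toy_fs[0]; i++)
        if (toy_fs[i].dir == dir && strlen(toy_fs[i].name) == len &&
            memcmp(toy_fs[i].name, name, len) == 0)
            return toy_fs[i].inum;
    return 0;
}

/* Walk an absolute path one component at a time. */
static unsigned toy_namei(const char *path)
{
    unsigned cur = 2;                      /* start at the root */
    if (*path != '/')
        return 0;
    while (*path != '\0') {
        while (*path == '/')               /* skip separators */
            path++;
        size_t len = strcspn(path, "/");   /* length of next component */
        if (len == 0)
            break;
        cur = toy_lookup(cur, path, len);
        if (cur == 0)
            return 0;                      /* component does not exist */
        path += len;
    }
    return cur;
}
```

Every component costs one directory lookup, which is why the real implementation leans so heavily on caching.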


EVOLUTION

KISS:

➜ Simplest possible algorithm used at first

➜ Easy to show correctness

➜ Fast to implement

➜ As drawbacks and bottlenecks are found, replace with

faster/more scalable alternatives


C DIALECT

• Extra keywords:

– Section IDs: __init, __exit, __percpu etc

– Info Taint annotation __user, __rcu, __kernel, __iomem

– Locking annotations __acquires(x), __releases(x)

– extra typechecking (endian portability) __bitwise


C DIALECT

• Extra iterators

– type_name_foreach()

• Extra accessors

– container_of()
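As an illustration of what container_of() does, here is a userspace approximation built on `offsetof`. It is a sketch: the kernel's macro adds type checking that is omitted here, and `struct node`, `struct task` and `sum_pids` are hypothetical names.

```c
#include <stddef.h>

/* Userspace approximation of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal intrusive singly linked list node (hypothetical). */
struct node {
    struct node *next;
};

struct task {
    int pid;
    struct node link;      /* embedded list linkage */
};

/* Sum the pids of every task on a list threaded through task.link. */
static int sum_pids(struct node *head)
{
    int sum = 0;
    for (struct node *n = head; n != NULL; n = n->next)
        sum += container_of(n, struct task, link)->pid;
    return sum;
}
```

The list code only ever sees `struct node`; container_of() recovers the enclosing structure from the address of its embedded member.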


C DIALECT

• Massive use of inline functions

• Some use of CPP macros

• Little #ifdef use in code: rely on optimizer to elide

dead code.


SCHEDULING

Goals:

• O(1) in number of runnable processes, number of

processors

– good uniprocessor performance

• ‘fair’

• Good interactive response

• topology-aware


SCHEDULING

Implementation:

• Changes from time to time.

• Currently ‘CFS’ by Ingo Molnar.


SCHEDULING

Dual Entitlement Scheduler

[Figure: two task queues, Running (entitlements 0.5, 0.7, 0.1) and Expired (entitlements 0, 0)]


SCHEDULING

1. Keep tasks ordered by effective CPU runtime

weighted by nice in red-black tree

2. Always run left-most task.

Devil’s in the details:

• Avoiding overflow

• Keeping recent history

• multiprocessor locality

• handling too many threads

• Sleeping tasks

• Group hierarchy
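The two rules above can be sketched in a few lines. This is a toy model, not the kernel's code: a linear scan stands in for the red-black tree, every scheduling decision grants a fixed slice, and none of the listed details (overflow, sleepers, groups, multiple processors) are handled. The names `toy_task`, `pick_next` and `simulate` are invented; 1024 is the weight the kernel gives a nice-0 task.

```c
#include <stddef.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024u   /* weight of a nice-0 task */

struct toy_task {
    uint64_t vruntime;   /* weighted ("virtual") CPU time consumed */
    uint64_t runtime;    /* real CPU time consumed */
    unsigned weight;     /* larger weight corresponds to a lower nice value */
};

/* Stand-in for "left-most node of the red-black tree": a linear scan. */
static size_t pick_next(const struct toy_task *t, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (t[i].vruntime < t[best].vruntime)
            best = i;
    return best;
}

/* Make `ticks` scheduling decisions, each granting `slice` units of CPU. */
static void simulate(struct toy_task *t, size_t n,
                     unsigned ticks, uint64_t slice)
{
    for (unsigned k = 0; k < ticks; k++) {
        size_t i = pick_next(t, n);
        t[i].runtime += slice;
        /* Heavier tasks accumulate virtual time more slowly. */
        t[i].vruntime += slice * NICE_0_WEIGHT / t[i].weight;
    }
}
```

With two tasks of weight 1024 and 2048 (the second weight is made up), the heavier task ends up with twice the real runtime, because its virtual clock advances at half the rate.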


SCHEDULING

[Figure: processor topology. (Hyper)threads are grouped into cores and cores into packages; each package together with its local RAM forms a NUMA node.]


SCHEDULING

Locality Issues:

• Best to reschedule on same processor (don’t move

cache footprint, keep memory close)

– Otherwise schedule on a ‘nearby’ processor

• Try to keep whole sockets idle

• Somehow identify cooperating threads, co-schedule

on same package?


SCHEDULING

• One queue per processor (or hyperthread)

• Processors in hierarchical ‘domains’

• Load balancing per-domain, bottom up

• Aims to keep whole domains idle if possible (power

savings)


MEMORY MANAGEMENT

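[Figure: memory in zones. Physical memory is divided into DMA (from physical address 0 to 16M), Normal (16M to 900M) and Highmem (above that). In the virtual address space, user VM lies below 3G and the Linux kernel above it; the DMA and Normal zones are identity mapped with an offset into the kernel's part of the address space.]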


MEMORY MANAGEMENT

• Direct mapped pages become logical addresses

– __pa() and __va() convert between virtual and physical addresses for these

• small memory systems have all memory as logical

• More memory → kernel refers to memory by struct page
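For direct-mapped memory the conversion is a constant offset. A sketch, assuming the 32-bit layout from the zones figure with the kernel mapped at 3G; `TOY_PAGE_OFFSET`, `toy_pa` and `toy_va` are illustrative names standing in for the kernel's own definitions.

```c
#include <stdint.h>

/* Assumed 32-bit layout: kernel direct mapping starts at 3G. */
#define TOY_PAGE_OFFSET UINT32_C(0xC0000000)

/* Logical (direct-mapped) kernel virtual address -> physical address. */
static uint32_t toy_pa(uint32_t vaddr)
{
    return vaddr - TOY_PAGE_OFFSET;
}

/* Physical address -> logical kernel virtual address. */
static uint32_t toy_va(uint32_t paddr)
{
    return paddr + TOY_PAGE_OFFSET;
}
```

This only works for memory inside the direct mapping; Highmem frames have no such fixed virtual address, which is one reason the kernel refers to frames by struct page.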


MEMORY MANAGEMENT

struct page:

• Every frame has a struct page (up to 10 words)

• Track:

– flags

– backing address space

– offset within mapping or freelist pointer

– Reference counts

– Kernel virtual address (if mapped)


MEMORY MANAGEMENT

[Figure: struct task_struct points to a struct mm_struct (with an owner link), which refers to the hardware-defined page table and to a list of struct vm_area_struct kept in virtual address order. Each vm_area_struct refers to a struct address_space, which is backed by a file (or swap).]


MEMORY MANAGEMENT

Address Space:

• Misnamed: means collection of pages mapped from

the same object

• Tracks inode mapped from, radix tree of pages in

mapping

• Has ops (from file system or swap manager) to:

dirty: mark a page as dirty

readpages: populate frames from backing store

writepages: clean pages, making the backing store the same as the in-memory copy

migratepage: move pages between NUMA nodes

Others: other housekeeping


PAGE FAULT TIME

• Special case in-kernel faults

• Find the VMA for the address

– segfault if not found (unmapped area)

• If it’s a stack, extend it.

• Otherwise:

1. Check permissions, SIGSEGV if bad

2. Call handle_mm_fault():

– walk page table to find entry (populate higher levels if necessary until leaf found)

– call handle_pte_fault()


PAGE FAULT TIME

handle_pte_fault(): Depending on PTE status, can

• provide an anonymous page

• do copy-on-write processing

• reinstantiate PTE from page cache

• initiate a read from backing store.

and if necessary flushes the TLB.
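The decision made on PTE status can be sketched as a small classifier. This is a heavily reduced model, not the kernel's logic: `struct toy_pte`, `enum toy_action` and `classify_fault` are invented, the PTE is boiled down to three flags, and the page-cache and backing-store cases are folded into one action.

```c
#include <stdbool.h>

/* Hypothetical, much-reduced view of a page table entry. */
struct toy_pte {
    bool present;     /* entry maps a frame */
    bool writable;    /* frame may be written */
    bool swapped;     /* not present, but records a swap location */
};

enum toy_action {
    ACT_ANON_PAGE,    /* provide a fresh anonymous page */
    ACT_COW,          /* copy-on-write processing */
    ACT_FILE_FAULT,   /* use the page cache, reading from the file if needed */
    ACT_SWAP_IN,      /* read the page back from swap */
    ACT_NOTHING       /* entry is already usable for this access */
};

static enum toy_action classify_fault(struct toy_pte pte,
                                      bool vma_is_file_backed,
                                      bool is_write)
{
    if (!pte.present) {
        if (pte.swapped)
            return ACT_SWAP_IN;
        return vma_is_file_backed ? ACT_FILE_FAULT : ACT_ANON_PAGE;
    }
    if (is_write && !pte.writable)
        return ACT_COW;           /* write to a read-only mapped frame */
    return ACT_NOTHING;
}
```

The permission check against the VMA is assumed to have happened already, as in step 1 of the fault path, so a write to a present read-only entry is treated here as copy-on-write rather than as an error.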


DRIVER INTERFACE

Three kinds of device:

1. Platform device

2. enumerable-bus device

3. Non-enumerable-bus device


DRIVER INTERFACE

Enumerable buses:

static DEFINE_PCI_DEVICE_TABLE(cp_pci_tbl) = {
    { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, PCI_DEVICE_ID_REALTEK_8139), },
    { PCI_DEVICE(PCI_VENDOR_ID_TTTECH, PCI_DEVICE_ID_TTTECH_MC322), },
    { }, /* empty entry terminates the table */
};
MODULE_DEVICE_TABLE(pci, cp_pci_tbl);
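How such a table is used for matching can be modelled in user space. This sketch is an assumption-laden stand-in: `struct toy_pci_id`, `toy_tbl` and `toy_match` are invented names, the structure carries only two of the real fields, and the second table entry is made up. The first entry, 0x10ec:0x8139, is the RTL8139's vendor and device ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for the kernel's struct pci_device_id. */
struct toy_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* 0x10ec:0x8139 is the RTL8139; the second entry is made up. */
static const struct toy_pci_id toy_tbl[] = {
    { 0x10ec, 0x8139 },
    { 0x1234, 0x0001 },
    { 0, 0 },               /* all-zero entry terminates the table */
};

/* Return the matching table entry, or NULL if nothing matches. */
static const struct toy_pci_id *toy_match(const struct toy_pci_id *tbl,
                                          uint16_t vendor, uint16_t device)
{
    for (; tbl->vendor != 0 || tbl->device != 0; tbl++)
        if (tbl->vendor == vendor && tbl->device == device)
            return tbl;
    return NULL;
}
```

On an enumerable bus the bus code can read each device's IDs and run a comparison like this against every registered driver's table, calling the driver only when an entry matches.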


DRIVER INTERFACE

Driver interface:

init called to register driver

exit called to deregister driver, at module unload time

probe() called when bus-id matches; returns 0 if driver

claims device

open, close, etc as necessary for driver class


DRIVER INTERFACE

Platform Devices:

static struct platform_device nslu2_uart = {
    .name = "serial8250",
    .id = PLAT8250_DEV_PLATFORM,
    .dev.platform_data = nslu2_uart_data,
    .num_resources = 2,
    .resource = nslu2_uart_resources,
};


DRIVER INTERFACE

non-enumerable buses: Treat like platform devices


SUMMARY

• I’ve told you the status as of today

– Next week it may be different

• I’ve simplified a lot. There are many hairy details


FILE SYSTEMS

I’m assuming:

• You’ve already looked at ext[234]-like filesystems

• You’ve some awareness of issues around on-disk

locality and I/O performance

• You understand issues around avoiding on-disk

corruption by carefully ordering events, and/or by the

use of a Journal.


NORMAL FILE SYSTEMS

• Optimised for use on spinning disk

• RAID optimised (especially XFS)

• Journals, snapshots, transactions...


FLASH MEMORY

• NOR Flash

• NAND Flash

– MTD — Memory Technology Device

– eMMC, SDHC etc — A JEDEC standard

– SSD, USB — and other disk-like interfaces


NAND CHARACTERISTICS

[Figure: a NAND flash chip, made up of interface circuitry and an array of erase blocks, each divided into pages]


FLASH UPDATE

[Figure: a NAND flash chip (interface circuitry, erase blocks, pages) alongside a processor, RAM, and NOR flash or EEPROM]


THE CONTROLLER:

• Presents illusion of ‘standard’ block device

• Manages writes to prevent wearing out

• Manages reads to prevent read-disturb

• Performs garbage collection

• Performs bad-block management

Mostly documented in Korean patents referred to by US

patents!


WEAR MANAGEMENT

Two ways:

• Remap blocks when they begin to fail (bad block

remapping)

• Spread writes over all erase blocks (wear levelling)

In practice both are used.

Also:

• Count reads and schedule garbage collection after some threshold
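The combination of wear levelling and bad-block remapping can be sketched as follows. Real controllers are undocumented, so everything here is an assumption: the names `toy_block`, `pick_block` and `erase_block` are invented, and a fixed erase limit stands in for actually detecting that a block has begun to fail.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_block {
    uint32_t erase_count;   /* how often this erase block has been erased */
    int bad;                /* non-zero once retired as a bad block */
    int in_use;             /* non-zero while it holds live data */
};

/* Wear levelling: pick the free, good block with the fewest erases.
   Returns its index, or -1 if no block is available. */
static int pick_block(const struct toy_block *blk, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (blk[i].bad || blk[i].in_use)
            continue;
        if (best < 0 || blk[i].erase_count < blk[(size_t)best].erase_count)
            best = (int)i;
    }
    return best;
}

/* Erase a block before reuse, retiring it once it exceeds max_erases. */
static void erase_block(struct toy_block *b, uint32_t max_erases)
{
    b->erase_count++;
    if (b->erase_count > max_erases)
        b->bad = 1;         /* bad-block handling: never hand it out again */
}
```

Always choosing the least-erased free block spreads writes over the whole device, while the `bad` flag keeps worn-out blocks from being selected again.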


PREFORMAT

• Typically use FAT32 (or exFAT for SDXC cards)
• Always do cluster-size I/O (64k)
• First partition segment-aligned

Conjecture: the flash controller optimises for the preformatted FAT file system.
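Segment alignment is simple arithmetic. The sketch below rounds a starting sector up to the next segment boundary; the `align_up` name, the 512-byte sector and the 4M segment used in the usage note are assumptions for illustration, not values stated for any particular card.

```c
#include <assert.h>
#include <stdint.h>

/* Round `sector` up to the next multiple of `segment_sectors`. */
static uint64_t align_up(uint64_t sector, uint64_t segment_sectors)
{
    assert(segment_sectors > 0);
    return (sector + segment_sectors - 1) / segment_sectors * segment_sectors;
}
```

With 512-byte sectors a 4M segment is 8192 sectors, so a partition that would otherwise start at sector 63 is moved to sector 8192, while one already at sector 8192 stays where it is.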


FAT FILE SYSTEMS

[Figure: on-disk layout of a FAT file system. Labelled regions: Boot Param, Reserved, Cluster Info Block, FAT, Root Directory, Data Area.]


Conjecture: the controller has some number of buffers it treats specially, to allow more than one write locus.


TESTING SDHC CARDS


SD CARD CHARACTERISTICS

Card                     Price/GB   #AU   Page size   Erase size
Kingston Class 10        $0.80      2     128k        4M
Toshiba Class 10         $1.20      2     64k         8M
SanDisk Extreme UHS-1    $5.00      9     64k         8M
SanDisk Extreme Pro      $6.50      9     16k         4M


WRITE PATTERNS: FILE CREATE

[Figure: two plots of block number against time, each titled "Write 40M File" (on Toshiba Exceria card).]


[Figure: plot of the data in "~/iozone.dat" (column 2 against column 1); axis labels not recoverable.]


F2FS

• By Samsung

• ‘Use on-card FTL, rather than work against it’

• Cooperate with garbage collection

• Use FAT32 optimisations


• 2M segments written as whole chunks
  – always writes at log head
  – aligned with flash allocation units
• Log is the only data structure on-disk
• Metadata (e.g., head of log) written to FAT area in single-block writes
• Splits hot and cold data and inodes.


BENCHMARKS: POSTMARK 32K READ

[Figure: PostMark 32K read throughput (MB/s, axis 0 to 5) for the EXT4, FAT32 and F2FS filesystems, one panel each for the Kingston, Toshiba, SanDisk Extreme and SanDisk Extreme Pro cards.]


USING NON-F2FS


• Observation: XFS and ext4 already understand RAID

• RAID has multiple chunks, and a fixed stride, so...

• Configure FS as if for RAID
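One way to read "configure as if for RAID" is to treat the flash page as the RAID chunk and the erase block as the full stripe. That mapping is an assumption made here for illustration; the slides do not say which card parameters were used. The helper below (hypothetical names throughout) turns the card geometry into the stride and stripe width, in filesystem blocks, that one would then pass to a RAID-aware mkfs, for instance through mke2fs's `stride` and `stripe_width` extended options.

```c
#include <assert.h>
#include <stdint.h>

/* RAID-style geometry, expressed in filesystem blocks. */
struct raid_geometry {
    uint32_t stride;       /* one chunk: assumed here to be one flash page */
    uint32_t stripe_width; /* one full stripe: assumed to be one erase block */
};

/* Convert flash geometry (all sizes in bytes) to RAID-style parameters. */
static struct raid_geometry flash_as_raid(uint32_t page_bytes,
                                          uint32_t erase_bytes,
                                          uint32_t fs_block_bytes)
{
    struct raid_geometry g;

    assert(fs_block_bytes > 0);
    assert(page_bytes > 0 && page_bytes % fs_block_bytes == 0);
    assert(erase_bytes % page_bytes == 0);

    g.stride = page_bytes / fs_block_bytes;
    g.stripe_width = erase_bytes / fs_block_bytes;
    return g;
}
```

For a card with 64k pages and 8M erase blocks and a 4096-byte filesystem block, this gives a stride of 16 and a stripe width of 2048 blocks.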


Still running benchmarks; see the LCA talk next January for results!


SCALABILITY

The Multiprocessor Effect:

• Some fraction of the system’s cycles is not available for application work:

– Operating System Code Paths

– Inter-Cache Coherency traffic

– Memory Bus contention

– Lock synchronisation

– I/O serialisation


Amdahl’s law:

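If a process can be split such that a fraction σ of the running time cannot be sped up, but the rest is sped up by running on p processors, then overall speedup is

$\frac{p}{1 + \sigma(p - 1)}$

[Figure: running time T divided into a serial part Tσ and a parallel part T(1−σ) that is shared among the processors.]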
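As a quick numerical check of Amdahl's law, here is a minimal C sketch of the speedup formula. The function name `amdahl_speedup` is only an illustrative choice.

```c
#include <assert.h>

/*
 * Amdahl's law: speedup on p processors when a fraction sigma of the
 * running time is serial.  speedup = p / (1 + sigma * (p - 1))
 */
static double amdahl_speedup(double sigma, unsigned p)
{
    assert(sigma >= 0.0 && sigma <= 1.0);
    assert(p >= 1);
    return (double)p / (1.0 + sigma * ((double)p - 1.0));
}
```

With σ = 0 the speedup is simply p; with σ = 1 it stays at 1 however many processors are added; and for any σ > 0 it can never exceed 1/σ, so a 10% serial fraction caps the speedup below 10.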


[Figure: throughput against applied load, with curves for 1, 2 and 3 processors.]


[Figure: throughput and latency against applied load, for 2 and 3 processors.]


Gunther’s law:

$C(N) = \frac{N}{1 + \alpha(N - 1) + \beta N(N - 1)}$

where:

N is demand

α is the amount of serialisation: represents Amdahl’s law

β is the coherency delay in the system.

C is Capacity or Throughput
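A minimal C sketch of Gunther's law follows, useful for seeing how the two parameters shape the curve. The function name `usl_capacity` is an illustrative choice.

```c
#include <assert.h>

/*
 * Gunther's Universal Scalability Law:
 *   C(N) = N / (1 + alpha * (N - 1) + beta * N * (N - 1))
 * alpha models serialisation, beta models coherency delay.
 */
static double usl_capacity(double n, double alpha, double beta)
{
    assert(n >= 0.0);
    assert(alpha >= 0.0 && beta >= 0.0);
    return n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0));
}
```

With α = β = 0 capacity grows linearly with load; with β = 0 the expression has the same form as Amdahl's law and flattens out towards 1/α; with β > 0 capacity peaks and then falls. Setting the derivative to zero puts that peak at N = sqrt((1 − α)/β).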


[Figure: three plots of throughput against load under the USL:
  α = 0, β = 0 (USL with alpha=0, beta=0)
  α > 0, β = 0 (USL with alpha=0.015, beta=0)
  α > 0, β > 0 (USL with alpha=0.001, beta=0.0000001)]


Queueing Models:

[Figure: a queue feeding a server, with Poisson arrivals and Poisson service times.]

[Figure: a second queueing diagram. Labels: Queue, Server, Poisson service times, Same Server, High Priority, Normal Priority, Sink.]
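For the simplest case, a single queue in front of a single server, the standard M/M/1 results give a feel for why latency climbs as load approaches capacity. The sketch below assumes that model (Poisson arrivals at rate lambda, exponentially distributed service at rate mu); the function names are illustrative, and the priority-queue diagram is not covered.

```c
#include <assert.h>

/* Server utilisation: rho = lambda / mu. */
static double mm1_utilisation(double lambda, double mu)
{
    assert(mu > 0.0 && lambda >= 0.0);
    return lambda / mu;
}

/*
 * Mean time a job spends in the system (queueing plus service):
 *   W = 1 / (mu - lambda), valid only while lambda < mu.
 */
static double mm1_mean_response(double lambda, double mu)
{
    assert(lambda >= 0.0 && lambda < mu);
    return 1.0 / (mu - lambda);
}
```

At 60% utilisation (lambda = 6, mu = 10 jobs per second) the mean response time is 0.25 s; at 90% it is already 1 s, and it grows without bound as lambda approaches mu.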


Real examples:

[Figure: Postgres TPC throughput against load.]


[Figure: Postgres TPC throughput against load, together with the USL curve for alpha=0.342101, beta=0.017430.]


[Figure: Postgres TPC throughput against load, with a separate log disc.]


Another example:

[Figure: jobs per minute against number of clients, with curves for 1-way, 2-way, 4-way, 8-way and 12-way systems.]


SPINLOCKS         HOLD            WAIT
 UTIL    CON    MEAN(  MAX )   MEAN(  MAX  )(% CPU)      TOTAL  NOWAIT   SPIN  RJECT  NAME
72.3%  13.1%   0.5us(9.5us)    29us(  20ms)(42.5%)    50542055   86.9%  13.1%     0%  find_lock_page+0x30
0.01%  85.3%   1.7us(6.2us)    46us(4016us)(0.01%)        1113   14.7%  85.3%     0%  find_lock_page+0x130


struct page *find_lock_page(struct address_space *mapping,
                            unsigned long offset)
{
        struct page *page;

        spin_lock_irq(&mapping->tree_lock);
repeat:
        page = radix_tree_lookup(&mapping->page_tree, offset);
        if (page) {
                page_cache_get(page);
                if (TestSetPageLocked(page)) {
                        spin_unlock_irq(&mapping->tree_lock);
                        lock_page(page);
                        spin_lock_irq(&mapping->tree_lock);
                        . . .


[Figure: jobs per minute against number of clients, with curves for 1-way, 2-way, 4-way, 8-way, 12-way and 16-way systems.]


TACKLING SCALABILITY PROBLEMS

• Find the bottleneck
  – not always easy
• Fix or work around it
  – not always easy
• Check performance doesn’t suffer too much on the low end.
• Experiment with different algorithms, parameters


• Each solved problem uncovers another
• Fixing performance for one workload can worsen another
• Performance problems can make you cry


DOING WITHOUT LOCKS

Avoiding Serialisation:

• Lock-free algorithms

• Allow safe concurrent access without excessive serialisation

• Many techniques. We cover:

– Sequence locks

– Read-Copy-Update (RCU)


Sequence locks:

• Readers don’t lock

• Writers serialised.


Reader:

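volatile unsigned seq;          /* odd while a write is in progress */
unsigned lastseq;

do {
        do {
                lastseq = seq;
        } while (lastseq & 1);  /* wait for any writer to finish */
        rmb();
        ....                    /* read the protected data */
        rmb();                  /* finish the reads before rechecking seq */
} while (lastseq != seq);       /* retry if a writer ran in the meantime */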


Writer:

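spinlock(&lck);         /* writers are serialised */
seq++; wmb();           /* seq is now odd: readers wait or retry */
...
wmb(); seq++;           /* seq is even again: data is consistent */
spinunlock(&lck);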
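The reader and writer fragments above can be put together as a small user-space sketch using C11 atomics in place of the kernel's rmb() and wmb(). This is an illustration under stated assumptions, not the kernel's seqlock implementation: the `seq_pair` structure and function names are hypothetical, the writer lock is left to the caller, and the protected fields are relaxed atomics so that the racy reads are well defined in C11.

```c
#include <assert.h>
#include <stdatomic.h>

/* Two values that must always be observed as a consistent pair. */
struct seq_pair {
    atomic_uint seq;   /* odd while a write is in progress */
    atomic_int  a, b;  /* the protected data */
};

static void pair_init(struct seq_pair *p)
{
    atomic_init(&p->seq, 0u);
    atomic_init(&p->a, 0);
    atomic_init(&p->b, 0);
}

/* Writer: the caller must already hold the lock that serialises writers. */
static void pair_write(struct seq_pair *p, int a, int b)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);      /* stands in for wmb() */
    atomic_store_explicit(&p->a, a, memory_order_relaxed);
    atomic_store_explicit(&p->b, b, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release); /* even */
}

/* Reader: takes no lock, retries if a writer was active or intervened. */
static void pair_read(struct seq_pair *p, int *a, int *b)
{
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&p->seq, memory_order_acquire);
        *a = atomic_load_explicit(&p->a, memory_order_relaxed);
        *b = atomic_load_explicit(&p->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* stands in for rmb() */
        s1 = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1);
}
```

Readers never write to shared memory, so they cause no cache-line bouncing; the cost is that a reader may have to repeat its work when writes are frequent.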


RCU (see Background Reading):

[Figure: an RCU update shown in four numbered steps; the diagram is not recoverable.]
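As a rough illustration of the general read-copy-update idea (not necessarily the four steps the figure showed), the sketch below publishes a new version of a structure through an atomic pointer. Everything here is an assumption for illustration: the `config` structure and function names are hypothetical, updaters are assumed to be serialised by the caller, and `wait_for_readers` is an empty stand-in for a real grace-period wait, which is only safe because this sketch has no concurrent readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config {
    int timeout;
    int retries;
};

/* The currently published version; NULL until the first update. */
static _Atomic(struct config *) current_config;

/* Reader: a single acquire load, no lock taken. */
static const struct config *config_read(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}

/* Hypothetical stand-in for waiting until all pre-existing readers finish. */
static void wait_for_readers(void)
{
    /* Nothing to wait for: this sketch has no concurrent readers. */
}

/* Updater: copy the old version, change the copy, publish it, then reclaim. */
static int config_set_timeout(int timeout)
{
    struct config *old = atomic_load_explicit(&current_config,
                                              memory_order_relaxed);
    struct config *fresh = malloc(sizeof *fresh);

    if (!fresh)
        return -1;
    if (old) {
        *fresh = *old;                  /* copy */
    } else {
        fresh->timeout = 0;
        fresh->retries = 0;
    }
    fresh->timeout = timeout;           /* update the private copy */
    atomic_store_explicit(&current_config, fresh,
                          memory_order_release);    /* publish */
    wait_for_readers();                 /* old version may still be in use */
    free(old);                          /* deferred destruction */
    return 0;
}
```

Readers that fetched the old pointer before the publish step keep seeing a complete, unchanged structure; the expensive part, waiting before freeing, falls entirely on the updater.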


BACKGROUND READING

References

McKenney, P. E. (2004), Exploiting Deferred Destruction: An Analysis of Read-Copy-Update Techniques in Operating System Kernels, PhD thesis, OGI School of Science and Engineering at Oregon Health and Sciences University.
URL: http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf

McKenney, P. E., Sarma, D., Arcangeli, A., Kleen, A., Krieger, O. & Russell, R. (2002), Read copy update, in ‘Ottawa Linux Symp.’.
URL: http://www.rdrop.com/users/paulmck/rclock/rcu.2002.07.08.pdf
