smp/smt - computer science and engineeringcs9242/06/lectures/09-smpsmt.pdf · what problems do...

43
SMP/SMT Daniel Potts [email protected] University of New South Wales, Sydney, Australia and National ICT Australia Home Index – p.

Upload: vantuyen

Post on 26-Apr-2018

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

SMP/SMT

Daniel Potts

[email protected]

University of New South Wales, Sydney, Australia

and

National ICT Australia

Home Index – p.

Page 2: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Overview

• Today’s multi-processors

• Architecture

• New challenges

• Experience and issues for multi-processing in L4

Home Index – p.

Page 3: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Multi-processor architectures

What we see today...

• uni processor• multi-processor• multi-core (aka multi-processor in same package)• multi-threaded

• Symmetric multi-threading (SMT)• Interleaved multi-threading (IMT)

• combination?

Home Index – p.

Page 4: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

Home Index – p.

Page 5: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

• SMP: Share external cache; separate TLB and units

Home Index – p.

Page 6: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

• SMP: Share external cache; separate TLB and units• SMT: Share all caches, TLB, ALUs..

Home Index – p.

Page 7: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

• SMP: Share external cache; separate TLB and units• SMT: Share all caches, TLB, ALUs..• Multi-core: share level 2 cache, separate TLB ...

Home Index – p.

Page 8: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

• SMP: Share external cache; separate TLB and units• SMT: Share all caches, TLB, ALUs..• Multi-core: share level 2 cache, separate TLB ...

Home Index – p.

Page 9: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

Processing UnitCoreProcessor

• SMP: Share external cache; separate TLB and units• SMT: Share all caches, TLB, ALUs..• Multi-core: share level 2 cache, separate TLB ...

Home Index – p.

Page 10: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

Processing UnitCoreProcessor

• SMP: Share external cache; separate TLB and units• SMT: Share all caches, TLB, ALUs..• Multi-core: share level 2 cache, separate TLB ...• Memory (cache) latency may vary between processing

unitsHome Index – p.

Page 11: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

Processing UnitCoreProcessor

• SMP: Share external cache; separate TLB and units• SMT: Share all caches, TLB, ALUs..• Multi-core: share level 2 cache, separate TLB ...• Memory (cache) latency may vary between processing

unitsHome Index – p.

Page 12: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Configurations

Processing UnitCoreProcessor

• SMP: Share external cache; separate TLB and units• SMT: Share all caches, TLB, ALUs..• Multi-core: share level 2 cache, separate TLB ...• Memory (cache) latency may vary between processing

unitsHome Index – p.

Page 13: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Multi-core SMP + SMT: OpenSparc T1

• 8 cores• 4 threads per core

Home Index – p.

Page 14: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Processor Topology Considerations

What problems do these MPs create for us?Generally, two ways of interacting:

• memory — shared data accesses• latency ?• coherency ?

• interrupts• latency ?

local vs remote operations?

Home Index – p.

Page 15: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Memory: Types of multi-processors(MPs)

Uniform Memory Access (UMA)Access to all memory occurs at the same speedfor all processors

Non-Uniform Memory Access (NUMA)Access to some parts of memory is faster for someprocessors than other parts of memory.We focus on cache coherent NUMA.

Home Index – p.

Page 16: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

NUMA

Topology of inter-connect often leads to different latencyand bandwidth between sets of processors.

Home Index – p.

Page 17: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Cache Coherency

What happens if one CPU writes to address 0x1234,which gets stored in its cache, and another CPU readsfrom the same address?

Home Index – p.

Page 18: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Cache Coherency

What happens if one CPU writes to address 0x1234,which gets stored in its cache, and another CPU readsfrom the same address?

• Visibility of writes

Home Index – p.

Page 19: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

SMP Coherency Protocol: MESI [1]

Each cache line is in one of four states (MESI):

• Modified (M)• The line is valid in the cache and in only this cache.• The line is modified with respect to system memory

that is, the modified data in the line has not beenwritten back to memory.

• Exclusive (E)• The addressed line is in this cache only.• The data in this line is consistent with system

memory.

Home Index – p. 10

Page 20: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

SMP Coherency Protocol: MESI [2]

• Shared (S)• The addressed line is valid in the cache and in

at least one other cache.• A shared line is always consistent with system

memory. That is, the shared state isshared-unmodified; there is no shared-modifiedstate.

• Invalid (I)• This state indicates that the addressed line is

not resident in the cache and/or any datacontained is considered not useful.

Home Index – p. 11

Page 21: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

SMP Coherency Protocol: MESI [3]

RH = Read hitRMS = Read miss, sharedRME = Read miss,

exclusiveWH = Write hitWM = Write missSHI = Snoop hit on inv

SHR = Snoop hit on readLRU = replacement alg

Home Index – p. 12

Page 22: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Shared Kernel State

Data structures we need to consider:

• scheduling queue[s]• thread lists• page tables (and TLB)

Atomic operations on thread state:

• IPC• delete• running (scheduled)?

Home Index – p. 13

Page 23: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Kernel Locking

• Several CPUs can be executing kernel codeconcurrently.

• Need mutual exclusion on shared kernel data.• Issues:

• Lock implementation• Granularity of locking

Home Index – p. 14

Page 24: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Mutual Exclusion Techniques

• Disabling interrupts: CLI — STI

• Spin locks

• Lock objects

Kernel may use a variety of these techniques.

Home Index – p. 15

Page 25: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Mutual Exclusion Techniques

• Disabling interrupts: CLI — STI• Not suitable for MP

• Spin locks• Busy-waiting wastes cycles?

• Lock objects• Flag (lock) indicates object is locked.• Manipulating lock requires mutual exclusion.

Kernel may use a variety of these techniques.

Home Index – p. 15

Page 26: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Hardware Provided Locking Primi-tives

• test_and_set• compare_and_swap• atomic_increment• load_linked + store_conditional

Home Index – p. 16

Page 27: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

L4 Design Issues

• One scheduling queue per processing unit?• IPC rendezvous and optimisations• How do we share shared data structures?

Does one approach suit all?

Home Index – p. 17

Page 28: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Scheduler

Scheduler per processing unit:• Keeps data access local• Migration performed explicitly• No load balancing (except at user-level or...)

Global scheduler (shared queue):• Shared queue between processing units• Mutual exclusion required• Load balancing natural

Hybrids: Scheduling domains• Use both approaches to suit processor topology

Home Index – p. 18

Page 29: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Scheduler: SMP and ccNUMA

Home Index – p. 19

Page 30: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Scheduler: SMP and ccNUMA

Home Index – p. 19

Page 31: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Scheduler: SMP and ccNUMA

Home Index – p. 19

Page 32: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Scheduler: SMP and ccNUMA

• Inter-processor interrupt (IPI) in 4000 cycles

• memory latency >100 cycles

Preference to keep data local.Solution:Use RPC via IPI to perform operation.

Home Index – p. 20

Page 33: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Scheduler: SMT

• IPI <100 cycles• memory latency <50 cycles

Shared data structures may be viableSolution:Use locking or lock-free data structures

Home Index – p. 21

Page 34: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

L4 IPC PathUP

Activate src

Src == Polling Block waiting

Yes

Transfer message

Block polling

Receive?

Dest = Running

Dest == waiting

return running

return running

IPC with send

IPC without send

No

No

No

Yes

Yes

Home Index – p. 22

Page 35: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

L4 IPC Path

Needs to be atomicBefore continuing, Dest = locked

SMT

Activate src

Src == Polling Block waiting

Yes

Transfer message

Block polling

Receive?

Dest = Running

Dest == waiting

return running

return running

IPC with send

IPC without send

No

No

No

Yes

Yes

Home Index – p. 22

Page 36: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

L4 IPC Path

Needs to be atomic

Needs to be atomicBefore continuing, Dest = locked

SMT

Activate src

Src == Polling Block waiting

Yes

Transfer message

Block polling

Receive?

Dest = Running

Dest == waiting

return running

return running

IPC with send

IPC without send

No

No

No

Yes

Yes

Home Index – p. 22

Page 37: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

L4 IPC Path

Who do we switch to?highest priority?current?both?

SMT Optimisation

Activate src

Src == Polling Block waiting

Yes

Transfer message

Block polling

Receive?

Dest = Running

Dest == waiting

return running

return running

IPC with send

IPC without send

No

No

No

Yes

Yes

Home Index – p. 22

Page 38: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

L4 IPC PathUP

Activate src

Src == Polling Block waiting

Yes

Transfer message

Block polling

Receive?

Dest = Running

Dest == waiting

return running

return running

IPC with send

IPC without send

No

No

No

Yes

Yes

Home Index – p. 22

Page 39: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

L4 IPC Path

RPC(xcpu_recv)

RPC(xcpu_send_done)

SMP

Is local?

RPC(xcpu_send)

Activate src

Src == Polling Block waiting

Yes

Transfer message

Block polling

Receive?

Dest = Running

Dest == waiting

return running

return running

IPC with send

IPC without send

No

No

No

Yes

Yes

Home Index – p. 22

Page 40: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

IPC Summary

• SMP/UP + SMT work together• Some overlap of SMT run-time possible• Atomic operations on SMT path required

Home Index – p. 23

Page 41: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Priorities

What do priorities mean on an SMP system?

• No longer guarantee lower priorites won’t run• Priority inversion problems

Home Index – p. 24

Page 42: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Other SMP/SMT considerations

• Interrupt routing

• Cost of atomic instructions and code sequences

• Power management

Home Index – p. 25

Page 43: SMP/SMT - Computer Science and Engineeringcs9242/06/lectures/09-smpsmt.pdf · What problems do these MPs create for us? ... • No load balancing (except at user-level or ... Other

Conclusion / Observation

• No perfect locking mechanism for all situations• No single approach to data structures

Interesting research and development topic!

Home Index – p. 26