  • Vivek Kumar, Stephen M Blackburn The Australian National University

    Faster Work–Stealing With Return Barriers

  • The New Era of Computing

    •  Commodity processors with parallel execution abilities

    •  A fundamental turn toward concurrency in software

    Faster Work-Stealing With Return Barriers | Kumar & Blackburn | VMIL’12

    Background

  • The Challenge

    •  Modern hardware requires software parallelism

    •  Software parallelism is difficult to identify and expose
        –  Hard-coded optimizations may get you there…

    •  Hard to realize the potential of modern processors

    Goal: performance and productivity

  • Options?

    •  Language-based features to expose parallelism
        –  Dynamic task parallelism
        –  Work–stealing scheduler

    •  A runtime to hide the hardware complexities
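To make "dynamic task parallelism on a work-stealing scheduler" concrete, here is a minimal sketch that uses the JDK's fork/join library as a stand-in. It is not the system described in the talk; the class name `RangeSum`, the `THRESHOLD` cutoff, and the `sum` helper are all assumptions made for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums the integers in [lo, hi) by recursive splitting.
class RangeSum extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // assumed cutoff for sequential summing
    final int lo, hi;

    RangeSum(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long s = 0;
            for (int i = lo; i < hi; i++) s += i;
            return s;
        }
        int mid = lo + (hi - lo) / 2;
        RangeSum left = new RangeSum(lo, mid);
        left.fork();                                  // expose the left half; an idle worker may steal it
        long right = new RangeSum(mid, hi).compute(); // run the right half in the current worker
        return left.join() + right;                   // wait for the forked half and combine
    }

    // Runs the task on a pool with the given number of worker threads.
    static long sum(int n, int workers) {
        ForkJoinPool pool = new ForkJoinPool(workers);
        try {
            return pool.invoke(new RangeSum(0, n));
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `RangeSum.sum(100_000, 4)` splits the range into about a hundred leaf tasks. The programmer only marks where work can be split; which worker runs each half is decided at run time. `ForkJoinPool.getStealCount()` reports an estimate of how many tasks were stolen.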

  • Contributions

    ✔ In-depth analysis
      of the overheads associated with stealing tasks

    ✔ A new design
      A simple extension to the JVM that reuses an old idea

    ✔ Detailed performance study
      using standard work-stealing benchmarks

    ✔ Results
      that show we can significantly reduce the task-stealing overhead

  • Understanding Work–Stealing


  • Work–Stealing

    [Four figure-only slides; the images are not preserved in this transcript.]

  • Work–Stealing

    [Figure not preserved. Labels: Initiation, Termination, State Management]

  • Our Prior Work

  • OOPSLA 2012

  • Eliminating Sequential Overheads

    •  Sequential overheads
        –  Initiation
        –  State management
        –  Code restructuring

    •  Exploit existing JVM mechanisms
        –  Initiation: Execution stack for steal initiation
        –  State management: Extract state from stack & registers
        –  Code restructuring: Try–catch blocks for control flow

    •  Eliminated most sequential overheads

    Fork–Join: 200%, X10: 400%

    12%

  • Motivating Analysis


  • Methodology

    •  Benchmarks
        –  Jacobi
        –  LU Decomposition
        –  Heat Diffusion
            •  High steal ratio

    •  Hardware Platform
        –  2 Intel Xeon E7530
            •  6 cores each

    •  Software Platform
        –  Jikes RVM (3.1.2)

  • Steals:Task Ratio

    [Figure: steals per task (log scale, 1.00E-06 to 1.00E+00) versus number of threads (2 to 12) for Heat, Jacobi, and LUD.]

    Measured using JavaWS (Try–Catch)

  • Steal Rate

    [Figure: steals per second (0 to 3500) versus number of threads (2 to 12) for Heat, Jacobi, and LUD.]

    Measured using JavaWS (Try–Catch)

  • Insight

    •  Steal ratio and steal rate not correlated
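
The two metrics divide the same steal count by different quantities (tasks versus elapsed time), so they can move independently. The sketch below works through that arithmetic. The class `StealStats` and every number in it are made up for illustration and are not measurements from the talk.

```java
// Hypothetical counters for one benchmark run; all figures used with this
// class in the example are invented, not taken from the talk.
final class StealStats {
    final long steals;    // number of successful steals
    final long tasks;     // number of tasks created
    final double seconds; // wall-clock run time

    StealStats(long steals, long tasks, double seconds) {
        this.steals = steals;
        this.tasks = tasks;
        this.seconds = seconds;
    }

    // Steal ratio: what fraction of tasks end up being stolen.
    double stealsPerTask() {
        return (double) steals / tasks;
    }

    // Steal rate: how often steals happen in time.
    double stealsPerSecond() {
        return steals / seconds;
    }
}
```

With these invented runs, `new StealStats(1_000, 10_000, 10.0)` has a ratio of 0.1 steals per task but only 100 steals per second, while `new StealStats(5_000, 5_000_000, 2.0)` has a ratio of 0.001 but 2500 steals per second. A benchmark can therefore rank high on one metric and low on the other.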


  • Steal Overhead

    [Figure: victim wait time (% of CPU cycles, 0 to 10) versus number of threads (2 to 12) for Heat, Jacobi, and LUD.]

    Measured using JavaWS (Try–Catch)

  • Insights

    •  Steal ratio and steal rate not correlated

    •  Higher steal rate correlates with higher steal overhead

  • Our Approach


  • Reducing Stealing Overhead

    •  Two-pronged attack
        –  Reduce the cost of each steal
            •  Return barriers
        –  Reduce total number of steal events
            •  Steal more than one continuation at a time
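
The second prong, taking several items per steal so that fewer steal events are needed, can be sketched with a simple deque. This is only a stand-in: the talk steals continuations from the victim's execution stack, not entries from a Java collection. The class `VictimDeque` and the helper `eventsToDrain` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical victim-side deque: the owner pushes and pops at the tail,
// thieves take the oldest entries from the head.
final class VictimDeque<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();
    private int stealEvents = 0;

    synchronized void push(T t) {
        items.addLast(t);
    }

    synchronized T pop() {
        return items.pollLast();
    }

    // One steal event that removes up to `max` of the oldest entries.
    synchronized List<T> steal(int max) {
        List<T> loot = new ArrayList<>();
        while (loot.size() < max && !items.isEmpty()) {
            loot.add(items.pollFirst());
        }
        if (!loot.isEmpty()) stealEvents++; // count only steals that took something
        return loot;
    }

    synchronized int stealEvents() {
        return stealEvents;
    }

    // Drains `n` entries using steals of size `batch`; returns the number of steal events.
    static int eventsToDrain(int n, int batch) {
        VictimDeque<Integer> d = new VictimDeque<>();
        for (int i = 0; i < n; i++) d.push(i);
        while (!d.steal(batch).isEmpty()) {
            // keep stealing until the victim has nothing left
        }
        return d.stealEvents();
    }
}
```

Draining eight entries takes eight steal events at one entry per steal and four at two entries per steal. Each event pays the synchronization cost once, which is the motivation for batching.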

  • Implementation


  • Return Barrier

    •  Allows runtime to intercept a common event

    •  Hijack a return and bridge to some other method

    •  Register and stack state preserved

    [Figure: stack frames A (BASE) to E (TOP) with the stack growth direction marked. Labels: Frame Pointer, Return Address, Frame C, Method C, Frame D, trampoline method, Hijacked Frame Pointer, Hijacked Return Address.]
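
The hijack-and-bridge idea can be illustrated with a toy model. A real return barrier patches the return address stored in a machine stack frame inside the VM; in this sketch a "return address" is just a `Runnable`. The names `ToyStack`, `Frame`, `installBarrier`, and `trampoline` are hypothetical and do not come from Jikes RVM.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a call stack in which a frame's return address is a Runnable.
final class ToyStack {
    static final class Frame {
        final String method;
        Runnable returnAddress;  // where control goes when this frame returns
        Runnable hijackedReturn; // saved original; non-null while a barrier is installed

        Frame(String method, Runnable returnAddress) {
            this.method = method;
            this.returnAddress = returnAddress;
        }
    }

    final List<String> log = new ArrayList<>();

    // Install the barrier: save the real return address and redirect
    // the frame's return into the trampoline.
    void installBarrier(Frame f) {
        f.hijackedReturn = f.returnAddress;
        f.returnAddress = () -> trampoline(f);
    }

    // The trampoline lets the runtime observe the return, removes the
    // barrier, and then bridges to the original return address.
    private void trampoline(Frame f) {
        log.add("barrier hit returning from " + f.method);
        Runnable original = f.hijackedReturn;
        f.hijackedReturn = null;
        f.returnAddress = original;
        original.run();
    }

    // Simulates the frame executing its return instruction.
    void doReturn(Frame f) {
        f.returnAddress.run();
    }

    static List<String> demo() {
        ToyStack s = new ToyStack();
        Frame d = new Frame("D", () -> s.log.add("resumed in C"));
        s.installBarrier(d);
        s.doReturn(d);
        return s.log;
    }
}
```

`ToyStack.demo()` logs the barrier hit first and the resumption in the caller second. The returning method runs unmodified, and the runtime gets control only at the moment of return.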

  • Thief Installs Return Barrier

    [Figure: stack frames A (BASE) to E (TOP) with the stack growth direction marked. Labels: Yieldpoint Mechanism, A.]

  • Victim Moves The Return Barrier

    [Figure: stack frames A (BASE) to D (TOP) with the stack growth direction marked. Labels: A, D.]

  • Robbing A Victim With Return Barrier

    [Figure: stack frames B (BASE) to D (TOP) with the stack growth direction marked. Labels: A, D, B, Yieldpoint Mechanism.]

  • Performance Evaluation


  • Jacobi

    [Figure, two panels comparing No Return Barrier and With Return Barrier: victim wait time (% of CPU cycles, 0 to 10) and speedup over sequential (0 to 12), each versus number of threads (2 to 12).]

  • LUD

    [Figure, two panels comparing No Return Barrier and With Return Barrier: victim wait time (% of CPU cycles, 0 to 10) and speedup over sequential (0 to 12), each versus number of threads (2 to 12).]

  • Summary

    •  Steal overhead dominated by steal rate

    •  Two-pronged attack
        –  Reduce the cost of each steal
            •  Return barriers
        –  Reduce total number of steals
            •  Steal more than one

    •  No change in speedup

    58%

  • Future Work

    •  Steal overhead dominated by steal rate

    •  Two-pronged attack
        –  Reduce the cost of each steal
            •  Return barriers
        –  Reduce total number of steals
            •  Steal more than one

    •  No change in speedup

    •  Merge both the techniques

    58%

    ?%