DATA STRUCTURES OPTIMISATION FOR MANY-CORE SYSTEMS Matthew Freeman | Supervisor: Maciej Golebiewski CSIRO Vacation Scholar Program 2013-14


Post on 26-Dec-2015



The Multi-core Age

• Mobile phone: 2-4 cores
• PC: 4-16 cores
• Intel Xeon Phi: 61 cores
• CSIRO ‘Bragg’ compute cluster: 2048 cores

Programming for multi-cores

Divide the problem

[Figure: a problem is divided among CPU Core 1, CPU Core 2, CPU Core 3 and CPU Core 4. Labels: Machine Instructions, Execution.]

Amdahl's Law

• The maximum speedup depends on the percentage of the problem that can run in parallel.

Maximum speedup by parallel percentage (chart axis: 0 to 25):
• Single-core processor: 1x speed
• 50% parallel: 2x speedup
• 75% parallel: 4x speedup
• 90% parallel: 10x speedup
• 95% parallel: 20x speedup
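The maximum speedups quoted above follow from Amdahl's Law. As a minimal sketch (the function names `amdahl_speedup` and `max_speedup` are illustrative and do not come from the slides), the speedup for a parallel fraction p on n cores is 1 / ((1 - p) + p / n), which approaches 1 / (1 - p) as n grows:

```cpp
#include <cassert>

// Amdahl's Law: speedup on n cores when a fraction p of the work is parallel.
double amdahl_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}

// Upper bound on the speedup as the core count grows without limit.
// Requires p < 1, otherwise the bound is unbounded.
double max_speedup(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

For example, `max_speedup(0.95)` is about 20, matching the chart, while `amdahl_speedup(0.95, 61)` is about 15.25 for a 61-core processor.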

Data structures:

• Memory (data) is still a shared resource.

[Figure: in a single-core computer, one CPU core accesses memory (data); in a 4-core computer, four CPU cores share the same memory (data).]

Linked-list (Stack) Data Structure

[Figure: TOP points to a “node” that holds data and a link to the next data point; the final link is EMPTY.]
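A node and the TOP pointer described above might be declared as follows. This is only a sketch: the type names `Node` and `Stack` and the use of `std::string` for the data are assumptions made for illustration, and `nullptr` stands in for EMPTY.

```cpp
#include <string>

// One node: the data it holds plus a link to the next node.
struct Node {
    std::string data;
    Node* next;  // nullptr plays the role of EMPTY
};

// The stack only needs to remember where TOP is.
struct Stack {
    Node* top = nullptr;  // an empty stack has TOP == EMPTY
};
```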

Add new item (Push)

We want to add a chunk of data (Data B) to the structure.

[Figure: before the push, TOP -> Data A -> EMPTY; Data B is not yet linked.]

Steps: For new data B

1) Find the start of the structure (TOP).
2) Link Data B into the structure by pointing its link at the current TOP.
3) Update TOP so that it points to Data B.

[Figure: after the push, TOP (new) -> Data B -> Data A -> NULL.]
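The three push steps can be sketched in single-threaded C++ as below. The names `Node`, `Stack` and `push` are illustrative assumptions, not taken from the project's code.

```cpp
#include <string>

struct Node {
    std::string data;
    Node* next;  // nullptr plays the role of EMPTY / NULL
};

struct Stack {
    Node* top = nullptr;
};

// Single-threaded push: safe only when one thread uses the stack.
void push(Stack& s, Node* n) {
    Node* old_top = s.top;  // 1) find the start of the structure (TOP)
    n->next = old_top;      // 2) link the new node into the structure
    s.top = n;              // 3) update TOP
}
```

Pushing Data A and then Data B leaves TOP pointing at Data B, whose link points at Data A.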

Resulting structure

• Like stacking dinner plates.
• Only need to keep track of where TOP is to access the rest.

[Figure: TOP -> Data -> Data -> Data -> Data -> Data -> NULL.]

What happens in multi-core systems?

Two threads trying to operate on the stack structure:

• Thread 1 attempts at time T.
• Thread 2 attempts at time T + 1 nanosecond.

Because each of the steps takes time to complete, errors occur.

This causes the interleaving of steps:

• Thread 1 reads TOP (1)
• Thread 2 reads TOP (1)
• Thread 1 sets the next pointer (2)
• Thread 2 sets the next pointer (2)
• Thread 1 updates TOP (3)
• Thread 2 updates TOP (3)

[Figure: Thread 1 and Thread 2 push Data B and Data C at the same time onto the stack TOP -> Data A -> EMPTY.]

Data B is lost forever because it is no longer linked to TOP (stack failure).
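The lost update can be reproduced deterministically by replaying the interleaved steps on a single thread. The sketch below assumes that Thread 1 pushes Data B and Thread 2 pushes Data C (the slide does not say which thread pushes which node); the helper names `interleaved_push` and `reachable` are hypothetical.

```cpp
#include <string>

struct Node {
    std::string data;
    Node* next;
};

// Replays the interleaving on one thread so the outcome is deterministic.
// b is the node pushed by "thread 1", c the node pushed by "thread 2".
// Returns the final value of TOP.
Node* interleaved_push(Node* top, Node* b, Node* c) {
    Node* seen_by_t1 = top;  // Thread 1 reads TOP (1)
    Node* seen_by_t2 = top;  // Thread 2 reads TOP (1)
    b->next = seen_by_t1;    // Thread 1 sets the next pointer (2)
    c->next = seen_by_t2;    // Thread 2 sets the next pointer (2)
    top = b;                 // Thread 1 updates TOP (3)
    top = c;                 // Thread 2 updates TOP (3), overwriting thread 1
    return top;
}

// Walks the list from TOP and reports whether target can still be reached.
bool reachable(const Node* top, const Node* target) {
    for (const Node* n = top; n != nullptr; n = n->next) {
        if (n == target) return true;
    }
    return false;
}
```

After the replay, TOP points at Data C and Data C links to Data A, but nothing links to Data B any more, so `reachable` returns false for it.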

How do we fix this?

• Use “data locks”.
• Protect the 3 steps.
• One thread at a time is granted access to the stack.
• Complete an operation and release the lock.

This is the standard approach for multithreaded structures.

Locks

Easy to use. 2 lines of code added to fix:
- Get lock.
- Steps 1, 2, 3.
- Release lock.

× Slow. Only one thread at a time can hold the lock.

This becomes sequential code. This is the code that cannot run in parallel.

Analogy: Merging highway traffic into a single lane.
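A lock-protected push might look like the sketch below. It uses `std::mutex` from the C++ standard library as an assumption; the project itself may have used OpenMP locks instead, and the class name `LockedStack` is illustrative.

```cpp
#include <mutex>
#include <string>

struct Node {
    std::string data;
    Node* next;
};

class LockedStack {
public:
    void push(Node* n) {
        lock_.lock();          // get lock
        Node* old_top = top_;  // step 1: find TOP
        n->next = old_top;     // step 2: link into the structure
        top_ = n;              // step 3: update TOP
        lock_.unlock();        // release lock
    }

    Node* top() {
        std::lock_guard<std::mutex> guard(lock_);  // released at end of scope
        return top_;
    }

private:
    std::mutex lock_;
    Node* top_ = nullptr;
};
```

The explicit `lock()` and `unlock()` calls mirror the two added lines mentioned above. In production C++ a scoped `std::lock_guard`, as used in `top()`, is usually preferred because it releases the lock on every exit path.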

Lock-free

New method:

• Lock-free data structure.
• A special low-level instruction checks that TOP is unchanged and updates it in one computer instruction, so the three steps behave as a single operation.
• Removes the need for locks.
• Called a Compare-Exchange.
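A common textbook form of a lock-free push built on Compare-Exchange is sketched below, using `std::atomic` and `compare_exchange_weak` from the C++ standard library. Whether the project used exactly this form is not stated in the slides, and the class name `LockFreeStack` is an assumption.

```cpp
#include <atomic>
#include <string>

struct Node {
    std::string data;
    Node* next;
};

class LockFreeStack {
public:
    void push(Node* n) {
        Node* expected = top_.load();  // 1) read TOP
        do {
            n->next = expected;        // 2) link to the TOP we saw
            // 3) update TOP only if it still equals expected. On failure,
            //    expected is refreshed with the current TOP and we retry.
        } while (!top_.compare_exchange_weak(expected, n));
    }

    Node* top() const { return top_.load(); }

private:
    std::atomic<Node*> top_{nullptr};
};
```

If another thread changes TOP between steps 1 and 3, the Compare-Exchange fails and the loop relinks the node against the new TOP, so no push is lost and no thread ever waits for a lock.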

Lock-free

• Downside: Writing lock-free code is difficult (hence the project).
• The Compare-Exchange operation forms the base for writing lock-free code.
• The project implements specifications taken from research papers.

Lock-free

Implemented a range of lock-free optimizations for the stack.

Used open coding standards (C++, OpenMP).

Benchmarked using an Intel Xeon Phi 61-core processor.

The lock-free structure performed about 2x better for pure stack operations.

Summary

Amdahl’s Law shows that it’s important to optimize sequential sections of code.

The shared data structures are often sequential bottlenecks.

Implementing lock-free data structures reduced this bottleneck.