adapting convergent scheduling using machine learning diego puppin*, mark stephenson †, una-may...

34
Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson , Una-May O’Reilly , Martin Martin , and Saman Amarasinghe tute for Information Science and Technologies, Italy MMMM MMMMMMMMM MM MMMMMMMMMM, MMM

Upload: loreen-stevens

Post on 28-Dec-2015

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Adapting Convergent Scheduling Using Machine

Learning

Diego Puppin*, Mark Stephenson†, Una-May O’Reilly†, Martin Martin†, and

Saman Amarasinghe†

*Institute for Information Science and Technologies, Italy† , Massachusetts Institute of Technology USA

Page 2: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Outline

This talk shows how one can apply machine learning techniques to find good phase orderings for an instruction scheduler

First, I’ll introduce the scheduler that we are interested in improving

Then, I’ll discuss genetic programmingThen, I’ll present experimental results

Page 3: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

R4000 likeProcessorCore

Operandnetwork

Clustered Architectures

Memory and registers separated into clustersRAWClustered VLIWs

When scheduling, we try to co-locate data with computation

Page 4: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Convergent Scheduling

Convergent scheduling passes are symmetric

Each pass takes as input a preference map and outputs a preference map

Passes are modular and can be applied in any order

Page 5: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Convergent SchedulingPreference Maps

Inst

ruct

ions

Clusters

Tim

e

0 1 2 3

4

5

6

7

0

1

2

3

Each entry is a weightThe weights correspond

to the “confidence” of a space-time assignment for a given instruction

Page 6: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Four clusters

High confidence

Low confidence

Example Dependence Graph

Page 7: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Critical Path Strengthening

Page 8: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Path Propagation

Page 9: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Parallelism Distribute

Page 10: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Path Propagation

Page 11: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Final Schedule

Page 12: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Convergent Scheduling

“Classical” scheduling passes make absolute decisions that can’t be undone

Convergent scheduling passes make soft decisions in the form of preferencesMistakes made early on can be undone

Passes don’t impose order!

Pass Pass

Page 13: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Double-Edged Sword

The good news: convergent scheduling does not constrain phase orderNice interface makes writing and integrating

passes easy

The bad news: convergent scheduling does not constrain phase orderLimitless number of phase orders to consider,

some of which are much better than others

Page 14: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Our Proposal

Use genetic programming to automatically search for a phase ordering that’s catered to a givenArchitectureCompiler

Our inspiration comes from Cooper’s work [Cooper et al., LCTES 1999]

Page 15: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Genetic Programming

Searching algorithm analogous to Darwinian evolutionMaintain a population of expressions

(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))

Page 16: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Genetic Programming

Searching algorithm analogous to Darwinian evolutionMaintain a population of expressionsSelection

The fittest expressions in the population are more likely to reproduce

ReproductionCrossing over subexpressions of two expressions

Mutation

Page 17: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Randomly generated initial population

Create Variants

done?

Page 18: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Create Variants

done?

Compiler is modified to use the given expression as the phase ordering

Each expression is evaluated by compiling and running the benchmark(s)

Fitness is the relative speedup over our original phase ordering on the benchmark(s)

Page 19: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Create Variants

done?

Just as with Natural Selection, the fittest individuals are more likely to survive

Page 20: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

General Flow

Create initial population(initial solutions)

Evaluation

Selection

Create Variants

done?

Use crossover and mutation to generate new expressions

And thus, generate new and hopefully improved phase orderings

Page 21: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Experimental Setup

We use an in-house VLIW compiler (SUIF, MachSUIF) and simulator

Compiler and simulator are parameterized so we can easily change VLIW configurations

Experiments presented here are for clustered architecturesDetails of the architectures are in the paper

Page 22: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Convergent Scheduling Heuristics

Noise Introduction Initial Time Assignment Preplacement Critical Path Strengthening Communication Minimization Parallelism Distribution Load Balance Dependence Enforcement Assignment Strengthening Functional Unit Distribution Push to first cluster Critical Path Distance Cluster Creation Register Pressure Reduction in Time Register Pressure Reduction in Space

Page 23: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Hand-Tuned Results4-cluster VLIW, Rich Interconnect

0

0.5

1

1.5

2

2.5

3

3.5

4

vvmul rbsorf yuv tomcatv mxm fir cholesky

Spe

edup

PCC

UAS

Convergent

Page 24: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Results4-cluster VLIW, Limited Interconnect

Page 25: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Training an Improved Sequence

Goal: find a sequence that works well for all the benchmarks in the last graph (vmul, rbsorf, yuv, etc.)

Train a sequence using these benchmarks then…For each expression in the population compile

and run all the benchmarks, take the average speedup as fitness

Page 26: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

The Schedule

Evolved sequence is much more conservative in communication

inittime func dep func load func dep func comm dep func comm place

func reduces weights of instructions on overloaded clusters

dep increases probability that dependent instruction scheduled “nearby”

comm tries to keep neighboring instructions in same cluster

Page 27: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Results4-cluster VLIW, Limited Interconnect

Page 28: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

ResultsLeave-One-Out Cross Validation

Page 29: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Summary of Results

When we changed the architecture, the hand-tuned sequence failedUAS and PCC outperform convergent

schedulingOur GP system found a sequence that

usually outperforms UAS and PCCCross validation suggests that it is

possible to find a “general-purpose” sequence

Page 30: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Running Time

Using about 20 machines in a small cluster of workstations it takes about 2 days to evolve a sequence

This is a one-time process!Performed by the compiler vendor

Page 31: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Disappointing Result

Unfortunately, sequences with conditionals are weeded out of the GP selection processOur system rewards parsimonyConvergent scheduling passes make soft

decisions, so running an extra pass may not be detrimental

We’d like to get to the bottom of this unexpected result

Page 32: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Conclusions

Using GP we’re able to find architecture-specific, application-independent sequences

We can quickly retune the compiler whenThe architecture changesThe compiler itself changes

Page 33: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *
Page 34: Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *

Implemented Tests