
Post on 17-Jan-2016


Whole Program Paths

James R. Larus

Outline

1. Find acyclic path fragments

2. Convert into whole-program path

3. Determine hot subpaths

Acyclic Paths

As per the Ball & Larus paper we implemented

Calculating Acyclic Paths

Instrument chords

Sum along paths is unique
– Postprocess for functions

Loop iteration is new path
– New: also function calls

Dump path ID to file
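
The "sum along paths is unique" property can be sketched as follows. This is a minimal illustration of Ball-Larus-style path numbering, not the instrumentation described on the slide: it places a value on every edge rather than only on spanning-tree chords, and the graph `CFG` and the helper names are assumptions made for the example.

```python
# Hypothetical acyclic control-flow graph: entry "A", exit "F".
CFG = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["E", "F"],
    "E": ["F"],
    "F": [],
}

def number_paths(cfg):
    """Assign edge values so that the sum along each path to the exit is unique."""
    num_paths = {}   # node -> number of paths from node to the exit
    edge_val = {}    # (src, dst) -> increment added when the edge is taken

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        if not cfg[v]:           # exit node: exactly one (empty) path
            num_paths[v] = 1
            return 1
        total = 0
        for w in cfg[v]:
            edge_val[(v, w)] = total   # skip over IDs used by earlier successors
            total += visit(w)
        num_paths[v] = total
        return total

    for v in cfg:
        visit(v)
    return num_paths, edge_val

def all_paths(cfg, v, exit_node):
    """Enumerate every path from v to exit_node (for checking only)."""
    if v == exit_node:
        yield [v]
        return
    for w in cfg[v]:
        for rest in all_paths(cfg, w, exit_node):
            yield [v] + rest

def path_id(path, edge_val):
    """Path ID = sum of the edge values along the path."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))

num_paths, edge_val = number_paths(CFG)
ids = sorted(path_id(p, edge_val) for p in all_paths(CFG, "A", "F"))
print(num_paths["A"], ids)
```

In a real profiler the running sum would be kept in a register and written out at the end of each acyclic fragment; the enumeration here only serves to check that the IDs are distinct and compact.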

Acyclic Paths Output

Outline

1. Find acyclic path fragments

2. Convert into whole-program path
   1. Compress output string
   2. Coalesce common substrings
   3. Store efficiently

3. Determine hot subpaths

Compress and Coalesce

Grammatical Benefits

Explain output string as context-free grammar:
– Efficient compression (~20x)
– Automatic subsequence grouping

Grammar creation
– Append symbols to start rule
– Digrams appear at most once
– Rules must be used at least twice

Example: 121213121214
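
To make the two grammar constraints concrete, here is a small sketch for the example string. The grammar below was written by hand and is only one grammar that satisfies both constraints; an online SEQUITUR run may well produce a different one. The rule names `S`, `A`, `C` and the helper functions are assumptions for illustration.

```python
from collections import Counter

# Hand-written grammar for 121213121214 (terminals are the digit strings).
GRAMMAR = {
    "S": ["C", "3", "C", "4"],
    "C": ["A", "A", "1"],
    "A": ["1", "2"],
}

def expand(grammar, symbol="S"):
    """Expand a symbol back into the original string of terminals."""
    if symbol not in grammar:
        return symbol
    return "".join(expand(grammar, s) for s in grammar[symbol])

def digrams_unique(grammar):
    """Check that no pair of adjacent symbols occurs twice in the grammar.
    (Simplified: ignores SEQUITUR's special case for overlapping pairs.)"""
    seen = set()
    for body in grammar.values():
        for pair in zip(body, body[1:]):
            if pair in seen:
                return False
            seen.add(pair)
    return True

def rule_uses(grammar, start="S"):
    """Count how many times each non-start rule is referenced."""
    uses = Counter(s for body in grammar.values() for s in body if s in grammar)
    return {rule: uses[rule] for rule in grammar if rule != start}

print(expand(GRAMMAR))          # reproduces the input string
print(digrams_unique(GRAMMAR))  # digram constraint
print(rule_uses(GRAMMAR))       # utility constraint: every count >= 2
```

The 12-symbol input is represented by 9 symbols on the right-hand sides; on long, repetitive traces the saving is far larger.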

SEQUITUR

Execution Representation

Not a control-flow graph!

Execution sequence = post-order traversal of DAG

Whole Paths

Efficient representation
– Create grammar online

Execution context information
– e.g., A runs after B

Frequency information

Simple path aggregation
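
As a rough illustration of how frequency information can be read off the grammar DAG without expanding it, the sketch below propagates counts from the start rule down to the terminals. The grammar is a hand-written one for the string 121213121214, and the function names are assumptions for the example.

```python
from collections import Counter

# Hand-written grammar for 121213121214; "S" is the start rule.
GRAMMAR = {
    "S": ["C", "3", "C", "4"],
    "C": ["A", "A", "1"],
    "A": ["1", "2"],
}

def topological_order(grammar, start="S"):
    """Order rules so each rule comes before every rule it references."""
    seen, post = set(), []

    def visit(rule):
        if rule in seen:
            return
        seen.add(rule)
        for sym in grammar[rule]:
            if sym in grammar:
                visit(sym)
        post.append(rule)

    visit(start)
    return post[::-1]   # reverse DFS post-order

def symbol_frequencies(grammar, start="S"):
    """Count occurrences of every symbol in the expanded string."""
    freq = Counter({start: 1})
    for rule in topological_order(grammar, start):
        # Each use of `rule` contributes one use of every symbol in its body.
        for sym in grammar[rule]:
            freq[sym] += freq[rule]
    return freq

freq = symbol_frequencies(GRAMMAR)
print({sym: freq[sym] for sym in ["A", "C", "1", "2", "3", "4"]})
```

The work is proportional to the size of the grammar rather than the length of the trace, which is what makes aggregation over the compressed form cheap.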

Outline

1. Find acyclic path fragments

2. Convert into whole-program path

3. Determine hot subpaths
   1. Find short frequent subsequences
   2. ???
   3. Profit!

Outline

1. Find acyclic path fragments

2. Convert into whole-program path

3. Determine hot subpaths
   1. Find short frequent subsequences
   2. Heavily optimize that 1%
   3. Applies to 75% of cache misses

Hot Subpaths

Looking for minimal hot subpaths
– L or fewer consecutive acyclic path fragments with cost of C or greater
– Cost = execution frequency x costs of acyclic path fragments
– Path fragment cost = number of instructions
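
The cost definition above can be tried out with a brute-force sketch over a fully expanded trace. This is not the grammar-based search from the talk; it only illustrates the thresholds L and C. The per-fragment instruction counts are made up, overlapping occurrences are counted, and "minimal" is interpreted here as "contains no shorter hot subpath", which is an assumption.

```python
from collections import Counter

def hot_subpaths(trace, instr_cost, max_len, min_cost):
    """Return {subpath: cost} for subpaths of at most max_len fragments
    whose cost (frequency x summed instruction counts) is >= min_cost."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(trace) - n + 1):
            counts[tuple(trace[i:i + n])] += 1
    hot = {}
    for sub, freq in counts.items():
        cost = freq * sum(instr_cost[p] for p in sub)
        if cost >= min_cost:
            hot[sub] = cost
    return hot

def contains(big, small):
    """True if `small` occurs as a contiguous run inside `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))

def minimal(hot):
    """Keep only hot subpaths that contain no shorter hot subpath (assumed meaning)."""
    return {
        s: c for s, c in hot.items()
        if not any(len(t) < len(s) and contains(s, t) for t in hot)
    }

# Trace of acyclic path IDs (the example string 121213121214).
TRACE = [1, 2, 1, 2, 1, 3, 1, 2, 1, 2, 1, 4]
# Hypothetical instruction counts per path fragment.
INSTR_COST = {1: 5, 2: 10, 3: 20, 4: 20}

hot = hot_subpaths(TRACE, INSTR_COST, max_len=3, min_cost=50)
print(hot)
print(minimal(hot))
```

With L = 3 and C = 50 the hot set contains two pairs and two triples, and only the pairs survive the minimality filter.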

Finding Hot Subpaths

Recursively look for hot minimal subpaths
1. Split between children
2. Processed at lower recursive level

Results

Typically:
– 30 MB/sec program trace (@200MHz)
– 1 MB/sec program path
– 30 grammar rules per path fragment
– 100,000 rules in grammar

Number of hot paths grows slowly with maximum length

Space sublinear in input size, time supralinear


Summary

Contributions
– Stream out acyclic path fragments in order
– Compress and structure with grammar
– Find hot subpaths from whole program path

Limitations
– 15x runtime slowdown
– Space-based limits on runtime
– High number of hot paths found

Questions

What other potentially useful information does this data structure give?
– Order-dependent code errors

What potential for optimization does this open up?
– Other applications?
– Experimental hot-path results?
