Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization

A. Epshteyn1, M. Garzaran1, G. DeJong1, D. Padua1, G. Ren1, X. Li1, K. Yotov2, K. Pingali2

1 University of Illinois at Urbana-Champaign

2 Cornell University

Two approaches to code optimization:

• Models

– E.g., calculate the best tile size for MM as a function of cache size.

– Fast

– May be inaccurate

– No verification through feedback

• Empirical Search

– E.g., execute and measure different versions of MM code with different tile sizes.

– Slow

– Accurate because of feedback

Hybrid Approach

• Faster than empirical search

• More accurate than the model

– Use the model as a prior

– Use active sampling to minimize the amount of searching

Why is Speed Important?

• Adaptation may have to be applied at runtime, where running time is critical.

• Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator)

• Library routines can be used as a benchmark to evaluate alternative machine designs.

Problem: Matrix Multiplication

• Tiling

– Improves the locality of references

• Cache Blocking (NB): Matrix is decomposed into smaller subblocks of size NBxNB

• Matrix multiplication - illustrative example for testing the hybrid approach

• Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.
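The cache-blocking idea above can be sketched as a tiled loop nest. This is a minimal illustration in plain Python, not the code generated by ATLAS or used in the experiments; the function name `matmul_tiled` and the loop order are assumptions made for the example. Python itself will not show the cache benefit, so the sketch only shows the loop structure that NB controls.

```python
def matmul_tiled(a, b, nb):
    """Multiply square matrices a and b (lists of lists) using NB x NB tiles.

    Hypothetical helper for illustration; nb is the blocking factor NB.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # Outer three loops walk over tiles; min() handles n not divisible by nb.
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                # Inner three loops multiply one pair of NB x NB subblocks.
                for i in range(ii, min(ii + nb, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + nb, n)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + nb, n)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

Any NB gives the same product; only the order of memory accesses changes, which is why NB can be tuned purely for performance.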

Empirical Search: ATLAS

• Try tiling parameters NB in the range $16 \ldots \min(80, \sqrt{\text{L1 cache size}})$ in steps of 4
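A small sketch of this search, assuming the range is 16 to min(80, sqrt(L1 cache size)) in steps of 4 and that the cache size is expressed in matrix elements. The names `atlas_nb_candidates`, `empirical_search` and `measure` are hypothetical; `measure` stands in for compiling and timing one code version and is not part of ATLAS.

```python
import math


def atlas_nb_candidates(l1_size_elements, lo=16, cap=80, step=4):
    """Candidate NB values: lo .. min(cap, sqrt(L1 size)) in the given step.

    Assumes l1_size_elements is the L1 data cache capacity in matrix elements.
    """
    hi = min(cap, math.isqrt(l1_size_elements))
    return list(range(lo, hi + 1, step))


def empirical_search(candidates, measure):
    """Return the candidate with the best measured performance.

    measure(nb) is a caller-supplied benchmark (higher is better).
    """
    return max(candidates, key=measure)
```

Every candidate is executed and measured, which is what makes the empirical search accurate but slow.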

Model (Yotov et al.)

• Compute NB which optimizes the use of the L1 cache.

• Constructed by analyzing the memory access trace of the matrix multiplication code.

• Formula:

$$\max NB \ \text{such that}\ \left\lceil \frac{NB^2}{L1_{\text{line size}}} \right\rceil + 3 \left\lceil \frac{NB}{L1_{\text{line size}}} \right\rceil + 1 \le \frac{L1_{\text{size}}}{L1_{\text{line size}}}$$

• Has been extended to optimize the use of the L2 cache
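The L1 model can be evaluated directly. The sketch below assumes the inequality ceil(NB^2 / line size) + 3 * ceil(NB / line size) + 1 <= L1 size / line size, with both sizes given in matrix elements; the function name `model_nb` and the linear scan are choices made for the example, not the authors' implementation.

```python
def model_nb(l1_size, l1_line_size):
    """Largest NB satisfying the L1 model inequality.

    Assumes l1_size and l1_line_size are measured in matrix elements.
    """
    def ceil_div(x, y):
        # Integer ceiling division without floating point.
        return -(-x // y)

    def fits(nb):
        lines_needed = (ceil_div(nb * nb, l1_line_size)
                        + 3 * ceil_div(nb, l1_line_size) + 1)
        return lines_needed <= l1_size / l1_line_size

    nb = 0
    # The left-hand side never decreases with NB, so a linear scan is enough.
    while fits(nb + 1):
        nb += 1
    return nb
```

For example, with a 4096-element cache and 4-element lines, `model_nb(4096, 4)` returns 62: NB = 62 needs 961 + 48 + 1 = 1010 of the 1024 lines, while NB = 63 would need 1042. No code is executed or timed, which is why the model is fast but unverified.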

Model in action:

• Figure: performance curve, with vertical lines marking the model-predicted L1 and L2 blocking factors

• Whether to tile for the L1 or the L2 cache depends on the architecture and the application

Hybrid approach

• Model performance with a family of regression curves

• Regression (nonparametric)

– Minimize the average error

• Regression (ML)

– Distribution over regression curves

– Pick the most likely curve

Regression (Bayesian)

• Prior distribution P(curve) over regression curves

– Make regression curves with model-predicted maxima more likely

• Posterior distribution given the data (Bayes rule):

– P(curve|data) = P(data|curve) P(curve) / P(data)

• Pick the maximum a posteriori curve

– Picks curves with peaks in model-predicted locations when the data sample is small

– Picks curves which fit the data best when the sample is large
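The maximum a posteriori choice can be illustrated with a deliberately simplified curve family. Everything specific here is an assumption for the sketch: each candidate curve is a single Gaussian-shaped bump parameterized only by its peak location, the prior is a Gaussian centered on the model-predicted NB, and the measurement noise is Gaussian. The slides do not specify the actual curve family or distributions.

```python
import math


def curve(x, peak, height=1.0, width=10.0):
    """Hypothetical regression curve: one bump centered at `peak`."""
    return height * math.exp(-((x - peak) ** 2) / (2.0 * width ** 2))


def log_prior(peak, model_peak, prior_sd=8.0):
    """log P(curve) up to a constant: favors peaks near the model prediction."""
    return -((peak - model_peak) ** 2) / (2.0 * prior_sd ** 2)


def log_likelihood(peak, data, noise_sd=0.1):
    """log P(data|curve) up to a constant, assuming Gaussian noise."""
    return sum(-((y - curve(x, peak)) ** 2) / (2.0 * noise_sd ** 2)
               for x, y in data)


def map_peak(data, model_peak, candidate_peaks):
    """Peak of the curve maximizing log P(data|curve) + log P(curve).

    P(data) is the same for every curve, so it can be dropped.
    """
    return max(candidate_peaks,
               key=lambda p: log_prior(p, model_peak) + log_likelihood(p, data))
```

With no data, `map_peak([], 40, range(16, 81, 4))` returns the model-predicted 40. With 17 noise-free measurements taken from a curve peaked at 60, the likelihood term outweighs the prior and the same call returns 60.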

Active sampling

• Objectives:

1) Sample at lower tile sizes – takes less time

2) Explore – don’t oversample in the same region

3) Get information about the dominant peak

Solution: Potential Fields (objectives 1, 2)

• Positive charge at the origin

• Negative charges at previously sampled points

• Sample at the point which minimizes the field
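A one-dimensional sketch of this rule follows. The sign convention is an assumption: the positive charge at the origin lowers the field value (attracting samples toward small tile sizes) and each negative charge at a sampled point raises it (repelling new samples). The 1/distance form, the charge magnitudes and the softening term `eps` are also hypothetical choices, as are the function names.

```python
def field(x, sampled, origin_charge=1.0, sample_charge=1.0, eps=1.0):
    """Field value at tile size x; lower is more attractive.

    eps keeps the value finite at the origin and at sampled points.
    """
    attract = -origin_charge / (abs(x) + eps)           # pull toward small NB
    repel = sum(sample_charge / (abs(x - s) + eps)      # push away from samples
                for s in sampled)
    return attract + repel


def next_sample(candidates, sampled, **charges):
    """Pick the candidate tile size that minimizes the field."""
    return min(candidates, key=lambda x: field(x, sampled, **charges))
```

With no previous samples the smallest candidate wins, and once a point has been sampled its repulsion steers the next sample elsewhere. Raising `origin_charge` relative to `sample_charge` strengthens the preference for cheap, small tile sizes.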

Potential Fields (objective 3)

• Positive charge in the region of the dominant peak

• How do we know which peak dominates?

– Distribution over regression curves

• Can compute:

– P(peak1 is located at x), P(peak2 is located at x)

– P(peak1 is of height h), P(peak2 is of height h)

• Hence, can compute P(peak1 dominates peak2)

• Impose a positive charge in the region of each peak proportional to its probability of domination
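As a worked illustration of the domination probability, suppose (purely as an assumption for this sketch) that the two peak heights have independent Gaussian posteriors. Then P(peak1 dominates peak2) has a closed form, and the charges can be split in proportion to it. The function names and the `total_charge` parameter are hypothetical; the slides do not state how the height distributions are represented.

```python
import math


def prob_dominates(mean1, sd1, mean2, sd2):
    """P(h1 > h2) for independent Gaussian heights h1 and h2.

    h1 - h2 is Gaussian with mean mean1 - mean2 and variance sd1^2 + sd2^2.
    """
    z = (mean1 - mean2) / math.sqrt(sd1 ** 2 + sd2 ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def peak_charges(mean1, sd1, mean2, sd2, total_charge=1.0):
    """Positive charges for the two peak regions, proportional to domination."""
    p1 = prob_dominates(mean1, sd1, mean2, sd2)
    return total_charge * p1, total_charge * (1.0 - p1)
```

When the two height estimates are identical the charges are equal, so both peak regions keep being explored; as evidence accumulates for one peak, its region attracts most of the remaining samples.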

Results I – Regression Curves

Results II – Time, Performance

Performance (MFLOPS)

        Model    Hybrid   ATLAS
Sparc   376.66   851.04   832.63
SGI     499.81   553.15   505.4

Time (mins)

        Model    Hybrid   ATLAS
Sparc   0:00     3:12     8:59
SGI     0:00     14:02    59:00

• Sparc – actual improvement due to the hybrid search for NB: ~10%

• SGI – improvement over both the model and ATLAS due to choosing to tile for the L2 cache

Results III – Library Performance

Conclusion

• Hybrid approach: incorporates the model as a prior.

• Active sampling: picks samples in the most informative region.

• Decreases the search time of the empirical search, improves on the model’s performance.
