analytic models and empirical search: a hybrid approach to code optimization a. epshteyn 1, m....
TRANSCRIPT
Analytic Models and Empirical Search: A Hybrid Approach to
Code OptimizationA. Epshteyn1, M. Garzaran1, G. DeJong1,
D. Padua1, G. Ren1, X. Li1,
K. Yotov2, K. Pingali2
1 University of Illinois at Urbana-Champaign2 Cornell University
Two approaches to code optimization:
• Models– E.g., calculate the
best tile size for MM as a function of cache size.
– Fast– May be inaccurate– No verification
through feedback
• Empirical Search– E.g., execute and
measure different versions of MM code with different tile sizes.
– Slow– Accurate because of
feedback
Hybrid Approach
• Faster than empirical search
• More accurate than the model– Use the model as a prior– Use active sampling to minimize the amount
of searching
Why is Speed Important?
• Adaptation may have to be applied at runtime, where running time is critical.
• Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator)
• Library routines can be used as a benchmark to evaluate alternative machine designs.
Problem: Matrix Multiplication
• Tiling– Improves the locality of references
• Cache Blocking (NB): Matrix is decomposed into smaller subblocks of size NBxNB
• Matrix multiplication - illustrative example for testing the hybrid approach
• Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.
Empirical Search: ATLAS
• Try tiling parameters NB in the range
in steps of 4)1,80min(...16 sizecacheL
Model (Yotov et. al.)
• Compute NB which optimizes the use of the L1 cache. • Constructed by analyzing the memory access trace of the
matrix multiplication code.• Formula:
• Has been extended to optimize the use of the L2 cache
sizelineL
sizeL
sizelineL
NB
sizelineL
NB
NB
1
11
1*3
1such that
max2
≤+⎥⎥
⎤⎢⎢
⎡+⎥⎥
⎤⎢⎢
⎡
Model in action:
• Performance curve: • Vertical lines: model-predicted L1 and L2 blocking factors
• Whether to tile for the L1 or the L2 cache depends on the architecture and the application
Hybrid approach
• Model performance with a family of regression curves
• Regression (nonparam)
– minimizing the average error
• Regression (ML)
– Distribution over regression curves
– Pick the most likely curve
Regression (Bayesian)
• Prior distribution curve) over regression curves– Make regression curves with model-predicted maxima
more likely
• Posterior distribution given the data (Bayes rule):– P(curve|data)=P(data|curve) (curve)/P(data)
• Pick the maximum a-posteriori curve– Picks curves with peaks in model-predicted locations
when the data sample is small– Picks curves which fit the data best when the sample
is large
Active sampling
• Objectives:1) Sample at lower-tile sizes – takes less time
2) Explore – don’t oversample in the same region
3) Get information about the dominant peak
Solution: Potential Fieldsobjectives 1,2
• Positive charge at the origin
• Negative charges at previously sampled points
• Sample at the point which minimizes the field
Potential Fields objective 3
• Positive charge in the region of the dominant peak
• How do we know which peak dominates:– Distribution over regression curves
• can compute:
P(peak1 is located at x), P(peak2 is located at x),
P(peak1 is of height h), P(peak2 is of height h)• Hence, can compute P(peak1 dominates peak2)• Impose a positive charge in the region of each peak
proportional to its probability of domination
⇒
Results I – Regression Curves
Results II – Time, Performance
Model Hybrid ATLAS
Sparc 376.66 851.04 832.63
SGI 499.81 553.15 505.4
Model Hybrid ATLAS
Sparc 0:00 3:12 8:59
SGI 0:00 14:02 59:00
Performance (MFLOPS) Time (mins)
• Sparc – actual improvement due to the hybrid search for NB: ~10%• SGI – improvement over both the model and ATLAS due to choosing to tile for the L2 cache
Results III – Library Performance
Conclusion
• Approach: incorporates the prior.
• Active sampling: actively picks to sample in the most informative region.
• Decreases the search time of the empirical search, improves on the model’s performance.