![Page 1: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/1.jpg)
Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization
A. Epshteyn¹, M. Garzaran¹, G. DeJong¹, D. Padua¹, G. Ren¹, X. Li¹, K. Yotov², K. Pingali²
¹ University of Illinois at Urbana-Champaign
² Cornell University
![Page 2: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/2.jpg)
Two approaches to code optimization:
• Models
– E.g., calculate the best tile size for matrix multiplication (MM) as a function of cache size.
– Fast
– May be inaccurate
– No verification through feedback
• Empirical Search
– E.g., execute and measure different versions of MM code with different tile sizes.
– Slow
– Accurate because of feedback
![Page 3: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/3.jpg)
Hybrid Approach
• Faster than empirical search
• More accurate than the model
– Use the model as a prior
– Use active sampling to minimize the amount of searching
![Page 4: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/4.jpg)
Why is Speed Important?
• Adaptation may have to be applied at runtime, where running time is critical.
• Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator)
• Library routines can be used as a benchmark to evaluate alternative machine designs.
![Page 5: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/5.jpg)
Problem: Matrix Multiplication
• Tiling
– Improves the locality of references
• Cache Blocking (NB): Matrix is decomposed into smaller subblocks of size NB×NB
• Matrix multiplication - illustrative example for testing the hybrid approach
• Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.
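A minimal sketch of cache blocking for square matrices, written in plain Python for readability rather than speed. The function name `tiled_matmul` and the loop order are illustrative assumptions, not the code generated by ATLAS or by the authors; `nb` plays the role of the blocking factor NB.

```python
def tiled_matmul(a, b, nb):
    """Multiply square matrices a and b (lists of lists) using NB x NB tiles."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # Outer three loops walk over tiles; inner three loops work inside one tile,
    # so each tile of a, b and c is reused while it is (ideally) still in cache.
    for ii in range(0, n, nb):
        for jj in range(0, n, nb):
            for kk in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    for k in range(kk, min(kk + nb, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + nb, n)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

The result does not depend on `nb`; only the memory access order changes, which is why NB can be tuned freely for performance.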
![Page 6: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/6.jpg)
Empirical Search: ATLAS
• Try tiling parameters NB in the range $16 \le NB \le \min\left(80, \sqrt{L1_{\text{cache size}}}\right)$ in steps of 4
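The search range can be enumerated directly. The sketch below reads the range as NB from 16 up to min(80, √(L1 cache size)) in steps of 4, and assumes the cache size is expressed in matrix elements (not bytes); the function name `atlas_nb_candidates` is hypothetical.

```python
import math


def atlas_nb_candidates(l1_size_elements):
    """List candidate NB values: 16, 20, ... up to min(80, sqrt(L1 size))."""
    # Assumption: l1_size_elements counts matrix elements that fit in L1.
    upper = min(80, math.isqrt(l1_size_elements))
    return list(range(16, upper + 1, 4))
```

Each candidate corresponds to one compile-and-time run in an exhaustive empirical search, which is the cost the hybrid approach tries to reduce.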
![Page 7: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/7.jpg)
Model (Yotov et al.)
• Compute NB which optimizes the use of the L1 cache.
• Constructed by analyzing the memory access trace of the matrix multiplication code.
• Formula:
$$\max NB \quad \text{such that} \quad \left\lceil \frac{NB^2}{L1_{\text{line size}}} \right\rceil + 3\left\lceil \frac{NB}{L1_{\text{line size}}} \right\rceil + 1 \le \frac{L1_{\text{size}}}{L1_{\text{line size}}}$$
• Has been extended to optimize the use of the L2 cache
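The model's formula can be evaluated with a short loop. This sketch assumes the constraint ⌈NB²/line⌉ + 3⌈NB/line⌉ + 1 ≤ L1 size / line size, with both sizes given in matrix elements; the function names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def model_nb(l1_size, l1_line_size):
    """Largest NB whose working set fits in L1 under the model's constraint."""
    capacity_lines = l1_size // l1_line_size

    def lines_needed(nb):
        # Cache lines for one NB x NB tile, plus three length-NB strips, plus one.
        return ceil_div(nb * nb, l1_line_size) + 3 * ceil_div(nb, l1_line_size) + 1

    nb = 0
    # lines_needed is non-decreasing in nb, so a linear scan finds the maximum.
    while lines_needed(nb + 1) <= capacity_lines:
        nb += 1
    return nb
```

For example, a cache holding 2048 elements with 4-element lines gives `model_nb(2048, 4) == 43`: NB = 43 needs 497 lines, while NB = 44 would need 518, exceeding the 512 available.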
![Page 8: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/8.jpg)
Model in action:
• Performance curve:
• Vertical lines: model-predicted L1 and L2 blocking factors
• Whether to tile for the L1 or the L2 cache depends on the architecture and the application
![Page 9: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/9.jpg)
Hybrid approach
• Model performance with a family of regression curves
• Regression (nonparametric)
– Minimizing the average error
• Regression (maximum likelihood)
– Distribution over regression curves
– Pick the most likely curve
![Page 10: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/10.jpg)
Regression (Bayesian)
• Prior distribution P(curve) over regression curves
– Make regression curves with model-predicted maxima more likely
• Posterior distribution given the data (Bayes rule):
– P(curve|data) = P(data|curve) P(curve) / P(data)
• Pick the maximum a posteriori curve
– Picks curves with peaks in model-predicted locations when the data sample is small
– Picks curves which fit the data best when the sample is large
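A toy illustration of the maximum a posteriori choice, not the authors' regression family. It assumes single-peak Gaussian-shaped curves indexed only by peak location, a Gaussian prior centered on the model-predicted peak, and Gaussian measurement noise; all names and constants are hypothetical.

```python
import math


def curve(x, peak, height=1.0, width=10.0):
    """Assumed curve family: one bump centered at `peak`."""
    return height * math.exp(-((x - peak) ** 2) / (2.0 * width ** 2))


def log_prior(peak, model_peak, prior_sd=8.0):
    """log P(curve): favors peaks near the model-predicted location."""
    return -((peak - model_peak) ** 2) / (2.0 * prior_sd ** 2)


def log_likelihood(peak, xs, ys, noise_sd=0.05):
    """log P(data | curve) under Gaussian noise, up to a constant."""
    sq_err = sum((y - curve(x, peak)) ** 2 for x, y in zip(xs, ys))
    return -sq_err / (2.0 * noise_sd ** 2)


def map_peak(xs, ys, model_peak, candidates):
    """Peak location maximizing log prior + log likelihood (P(data) cancels)."""
    return max(
        candidates,
        key=lambda p: log_prior(p, model_peak) + log_likelihood(p, xs, ys),
    )
```

With no measurements the prior decides and the model-predicted peak is returned; as measurements accumulate the likelihood term grows with the sample size and eventually outweighs the prior.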
![Page 11: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/11.jpg)
Active sampling
• Objectives:
1) Sample at smaller tile sizes – takes less time
2) Explore – don’t oversample in the same region
3) Get information about the dominant peak
![Page 12: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/12.jpg)
Solution: Potential Fields (objectives 1, 2)
• Positive charge at the origin
• Negative charges at previously sampled points
• Sample at the point which minimizes the field
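The slide does not give the functional form of the field, so the sketch below assumes one: a term that grows linearly with distance from the origin (pulling samples toward small tile sizes) plus an inverse-distance term around each previously sampled point (pushing new samples away). The names and the constants `attract` and `repel` are hypothetical.

```python
import math


def potential(x, sampled, attract=0.05, repel=8.0):
    """Assumed field value at tile size x; lower means more desirable."""
    if x in sampled:
        return math.inf  # never resample the same point
    pull_to_origin = attract * x
    push_from_samples = sum(repel / abs(x - s) for s in sampled)
    return pull_to_origin + push_from_samples


def next_sample(candidates, sampled, attract=0.05, repel=8.0):
    """Candidate tile size that minimizes the assumed field."""
    return min(candidates, key=lambda x: potential(x, sampled, attract, repel))
```

With candidates 16, 20, ..., 80 and nothing sampled yet, the smallest tile size is chosen first; after sampling 16, the minimum moves to 28, a compromise between staying cheap and moving away from the existing sample.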
![Page 13: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/13.jpg)
Potential Fields (objective 3)
• Positive charge in the region of the dominant peak
• How do we know which peak dominates:
– Distribution over regression curves ⇒ can compute:
P(peak1 is located at x), P(peak2 is located at x),
P(peak1 is of height h), P(peak2 is of height h)
• Hence, can compute P(peak1 dominates peak2)
• Impose a positive charge in the region of each peak proportional to its probability of domination
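One way to turn the height distributions into a domination probability, sketched under two assumptions that the slides do not state: the heights are discretized, and the two peak heights are treated as independent. The function names are hypothetical.

```python
def prob_peak1_dominates(heights1, heights2):
    """P(height1 > height2) for discrete distributions {height: probability}."""
    # Assumption: the two heights are independent, so joint = product.
    return sum(
        p1 * p2
        for h1, p1 in heights1.items()
        for h2, p2 in heights2.items()
        if h1 > h2
    )


def peak_charges(heights1, heights2, total_charge=1.0):
    """Split a positive charge between the peaks by domination probability."""
    p1 = prob_peak1_dominates(heights1, heights2)
    # Ties are counted in favor of peak 2 in this sketch.
    return total_charge * p1, total_charge * (1.0 - p1)
```

For instance, if peak 1 is equally likely to reach 100 or 200 MFLOPS and peak 2 is certain to reach 150, each peak region receives half of the charge.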
![Page 14: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/14.jpg)
Results I – Regression Curves
![Page 15: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/15.jpg)
Results II – Time, Performance
Performance (MFLOPS)

| | Model | Hybrid | ATLAS |
|---|---|---|---|
| Sparc | 376.66 | 851.04 | 832.63 |
| SGI | 499.81 | 553.15 | 505.4 |

Time (mins)

| | Model | Hybrid | ATLAS |
|---|---|---|---|
| Sparc | 0:00 | 3:12 | 8:59 |
| SGI | 0:00 | 14:02 | 59:00 |

• Sparc – actual improvement due to the hybrid search for NB: ~10%
• SGI – improvement over both the model and ATLAS due to choosing to tile for the L2 cache
![Page 16: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/16.jpg)
Results III – Library Performance
![Page 17: Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1,](https://reader036.vdocuments.net/reader036/viewer/2022080916/56649eb35503460f94bbb0c6/html5/thumbnails/17.jpg)
Conclusion
• Hybrid approach: incorporates the analytic model as a prior.
• Active sampling: picks samples in the most informative region.
• Decreases the search time relative to empirical search and improves on the model’s performance.