acceleration targets: a study of popular benchmark...

16
Acceleration Targets: A Study of Popular Benchmark Suites Lisa Wu and Martha A. Kim, Department of Computer Science, Columbia University {lisa,martha}@cs.columbia.edu 1 Introduction Using dark silicon [14, 3] to deploy specialized ac- celerators is an idea that is gaining traction in the architecture community [4, 5, 6, 1]. The underlying rationale is that specialized hardware and its atten- dant efficiency is the most effective way to draw per- formance in the anticipated power-limited scenarios. Given the cost associated with designing, verifying, and deploying an accelerator, conventional wisdom dictates that a particular operation becomes an eco- nomical and realistic acceleration target when it is used across a range of applications. In this study, we survey a set of popular benchmark suites, assessing the potential of several acceleration targets within them. In particular, we explore the following three questions: Do the benchmarks exhibit any common func- tionality at or above the function level? What impact does the language or programming environment have on the potential acceleration of a suite of applications? How many unique accelerators would be required to see benefits across a particular benchmark suite? Does this change across suites and source programming languages? 2 Methodology To explore these questions, we profile four bench- mark suites: SPEC2006 (C) [11], SPECJVM (Java) [12], Dacapo (Java) [2], and Unladen-Swallow (Python) [13]. Each source language provides a slightly different set of potential acceleration targets. For example, SPEC2006 is written in C and offers two target granularities: individual functions or en- tire applications. In contrast, a Java benchmark of- fers three granularities: methods, classes (i.e., all of the methods for a particular class), and entire appli- cations. We classify each of these potential targets as fine, medium, or coarse granularity according to Table 1. For each class of acceleration targets, we sort the targets by decreasing execution time across the en- tire benchmark suite. Assuming that building an ac- celerator for a particular target (1) provides infinite speedup of the target, and (2) incurs no data or con- trol transfer overhead upon invocation or return, we compute an upper bound on the speedup of the over- all suite for the most costly target(s). We repeat this Benchmark Granularity Suite fine medium coarse SPEC2006 function application SPECJVM method class package DACAPO method class package UNLADEN-SWALLOW function object Table 1: Acceleration Targets for Each Suite analysis for each target granularity in each bench- mark suite, as outlined in Table 1. 3 Results and Analysis Our results show that popular benchmark suites exhibit minimal functional level commonality. For example, it would take 500 unique, idealized accel- erators to gain a 48X speedup across the SPEC2006 benchmark suite. The C code is simply not mod- ular for acceleration, and few function accelerators can be re-used across a range of applications. For benchmarks written in Java, however, we see more commonality as language level constructs such as classes encapsulate operations for easy re-use. The question remains whether building 20 accelerators for SpecJVM or 50 accelerators for Dacapo is worth the investment for the 10X speedups to be had. In the particular Python benchmark suite we used, we found that the applications made minimal use of the built- ins (e.g., dict or file) resulting in very minimal op- portunity for acceleration beyond the methods them- selves. Our intuition is that this may be an artifact of a computationally-oriented performance benchmark suite, and is likely not reflective of the overall space of Python workloads. 4 Conclusion Our analyses of SPEC2006 confirm what C- cores [14], ECO-cores [10], and DYSER [5] also found: that when accelerating unstructured C code, the best targets are large swaths of highly-application-specific code. Our Java analyses indicate some hope for com- mon acceleration targets in classes, though the ad- vantage of targeting classes over individual methods appears modest. Across the board, our data show that filling dark silicon with specialized accelerators will require systems containing tens or even hundreds of accelerators. In light of this, we believe the infras- tructure associated with these accelerators (e.g., net- works, memory models [7, 9, 8], and toolchains[14]) will only increase in importance. 1

Upload: nguyenphuc

Post on 16-Feb-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Acceleration Targets: A Study of Popular Benchmark SuitesLisa Wu and Martha A. Kim,

Department of Computer Science, Columbia University

{lisa,martha}@cs.columbia.edu

1 IntroductionUsing dark silicon [14, 3] to deploy specialized ac-

celerators is an idea that is gaining traction in thearchitecture community [4, 5, 6, 1]. The underlyingrationale is that specialized hardware and its atten-dant efficiency is the most effective way to draw per-formance in the anticipated power-limited scenarios.Given the cost associated with designing, verifying,and deploying an accelerator, conventional wisdomdictates that a particular operation becomes an eco-nomical and realistic acceleration target when it isused across a range of applications.In this study, we survey a set of popular benchmark

suites, assessing the potential of several accelerationtargets within them. In particular, we explore thefollowing three questions:

• Do the benchmarks exhibit any common func-tionality at or above the function level?

• What impact does the language or programmingenvironment have on the potential accelerationof a suite of applications?

• How many unique accelerators would be requiredto see benefits across a particular benchmarksuite? Does this change across suites and sourceprogramming languages?

2 MethodologyTo explore these questions, we profile four bench-

mark suites: SPEC2006 (C) [11], SPECJVM(Java) [12], Dacapo (Java) [2], and Unladen-Swallow(Python) [13]. Each source language provides aslightly different set of potential acceleration targets.For example, SPEC2006 is written in C and offerstwo target granularities: individual functions or en-tire applications. In contrast, a Java benchmark of-fers three granularities: methods, classes (i.e., all ofthe methods for a particular class), and entire appli-cations. We classify each of these potential targetsas fine, medium, or coarse granularity according toTable 1.For each class of acceleration targets, we sort the

targets by decreasing execution time across the en-tire benchmark suite. Assuming that building an ac-celerator for a particular target (1) provides infinitespeedup of the target, and (2) incurs no data or con-trol transfer overhead upon invocation or return, wecompute an upper bound on the speedup of the over-all suite for the most costly target(s). We repeat this

Benchmark GranularitySuite fine medium coarse

SPEC2006 function – application

SPECJVM method class package

DACAPO method class package

UNLADEN-SWALLOW function – object

Table 1: Acceleration Targets for Each Suite

analysis for each target granularity in each bench-mark suite, as outlined in Table 1.

3 Results and AnalysisOur results show that popular benchmark suites

exhibit minimal functional level commonality. Forexample, it would take 500 unique, idealized accel-erators to gain a 48X speedup across the SPEC2006benchmark suite. The C code is simply not mod-ular for acceleration, and few function acceleratorscan be re-used across a range of applications. Forbenchmarks written in Java, however, we see morecommonality as language level constructs such asclasses encapsulate operations for easy re-use. Thequestion remains whether building 20 accelerators forSpecJVM or 50 accelerators for Dacapo is worth theinvestment for the 10X speedups to be had. In theparticular Python benchmark suite we used, we foundthat the applications made minimal use of the built-ins (e.g., dict or file) resulting in very minimal op-portunity for acceleration beyond the methods them-selves. Our intuition is that this may be an artifact ofa computationally-oriented performance benchmarksuite, and is likely not reflective of the overall spaceof Python workloads.

4 ConclusionOur analyses of SPEC2006 confirm what C-

cores [14], ECO-cores [10], and DYSER [5] also found:that when accelerating unstructured C code, the besttargets are large swaths of highly-application-specificcode. Our Java analyses indicate some hope for com-mon acceleration targets in classes, though the ad-vantage of targeting classes over individual methodsappears modest. Across the board, our data showthat filling dark silicon with specialized acceleratorswill require systems containing tens or even hundredsof accelerators. In light of this, we believe the infras-tructure associated with these accelerators (e.g., net-works, memory models [7, 9, 8], and toolchains[14])will only increase in importance.

1

Page 2: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Figure 1: Max speedup of benchmark suite for {fine, medium, and coarse}-granular acceleration targets.

References[1] C. Cascaval et al. A taxonomy of accelerator ar-

chitectures and their programming models. IBMJournal of Research and Development, 54(5):1–10, 2010.

[2] The Dacapo Benchmark Suite. http://dacapobench.org/.

[3] H. Esmaeilzadeh et al. Dark silicon and the endof multicore scaling. In ISCA, pages 365–376,2011.

[4] N. Goulding-Hotta et al. GreenDroid : A mobileapplication processor for a future of dark silicon.IEEE Micro, 31(2):86–95, 2011.

[5] V. Govindaraju et al. Dynamically specializeddatapaths for energy efficient computing. InHPCA, pages 503–514, 2011.

[6] R. Hameed et al. Understanding sources of inef-ficiency in general-purpose chips. In ISCA, pages37–47, June 2010.

[7] J. Kelm et al. Cohesion: a hybrid memory modelfor accelerators. In ISCA, pages 429–440, June2010.

[8] M. Lyons et al. The accelerator store framework

for high-performance, low-power accelerator-based systems. IEEE Computer ArchitectureLetters, 9(2):53–56, Feb. 2010.

[9] B. Saha et al. Programming model for a hetero-geneous x86 platform. In Proceedings of the 2009ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation (PLDI).ACM, 2009.

[10] J. Sampson et al. Efficient complex operatorsfor irregular codes. In Proceedings of the 17thInternational Symposium on High PerformanceComputer Architeture (HPCA), pages 491–502.ACM, Feb 2011.

[11] Standard Performance Evaluation Corporation.http://www.spec.org/cpu2006/.

[12] Standard Performance Evaluation Corporation.http://www.spec.org/jvm2008/.

[13] Unladen Swallow Benchmarks. http://code.google.com/p/unladen-swallow/wiki/Benchmarks.

[14] G. Venkatesh et al. Conservation cores: reducingthe energy of mature computations. In ASPLOS,pages 205–218, Mar. 2010.

2

Page 3: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia UniversityJune 10, 2012

Acceleration Targets:A Study of Popular Benchmark SuitesLisa Wu and Martha A. Kim

Tuesday, June 19, 2012

Page 4: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

But what should we accelerate?

The story starts like this:

Princess Ruruna and her helper Cain have a problem: To face dark silicon head on, they want to find applications that have acceleration potential. But what can they do to tackle the problem?

Let’s see what Tico the fairy has to say...

2

Tuesday, June 19, 2012

Page 5: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

Let’s start by looking at some popular benchmark suites

!Do the benchmarks exhibit any common functionality?

!If so, is it at or above the function level?

1

3

Tuesday, June 19, 2012

Page 6: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

I will profile SPEC2006 and see if I can answer this question...

If the hottest function runs lightening fast, how much faster would the suite be?

!"

!#"

!##"

!###"

!####"

#" $##" !###" !$##" %###" %$##"

&'(")*+

+,-*

"./")-01+"

2304-+"566+7+8'1.89"

):;<%##="

/-36>.3"

4

!"

!#"

!##"

!###"

#" !##" $##" %##" &##" '##" (##" )##" *##" +##" !###"

,-."/01

1230

"45"/3671"

896:31";<<1=1>-74>?"

/@AB$##("C44D12"E9"

539<F49"

To get a 10X speedup, we need to accelerate over 189

unique functions!

Tuesday, June 19, 2012

Page 7: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

Hmm...

What if we accelerated a bigger

target?

!"

!#"

!##"

!###"

!####"

#" $##" !###" !$##" %###" %$##"

&'(")*+

+,-*

"./")-01+"

2304-+"566+7+8'1.89"

):;<%##="

/-36>.3"

'**706'>.3"

!"

!#"

!##"

!###"

#" $%" %#" &%" !##" !$%" !%#" !&%" $##"

'()"*+,

,-.+

"/0"*.12,"

3415.,"677,8,9(2/9:"

*;<=$##>"?//@,-"A4"

0.47B/4"

(++817(B/4"

Good! It only takes 21!

Oh wait...we need to accelerate 21 different applications for a 12X

speedup?!21 different applications

5

Tuesday, June 19, 2012

Page 8: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

2!What about benchmark suites that are not written in C?

!What impact does the language or programming environment have on acceleration potential?

It seems that SPEC2006 cannot be accelerated easily...

How about other benchmark suites?

6

Tuesday, June 19, 2012

Page 9: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

fine

medium

coarse

Each source language provides a

slightly different set of potential acceleration targets.

Using dark silicon [14, 3] to deploy specialized ac-celerators is an idea that is gaining traction in thearchitecture community [4, 5, 6, 1]. The underlyingrationale is that specialized hardware and its atten-

ective way to draw per-formance in the anticipated power-limited scenarios.

Benchmark GranularitySuite fine medium coarse

SPEC2006 function – application

SPECJVM method class package

DACAPO method class package

UNLADEN-SWALLOW function – object

Table 1: Acceleration Targets for Each Suite

7

Tuesday, June 19, 2012

Page 10: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

!"

!#"

!##"

!###"

!####"

#" $#" !##" !$#" %##"

&'(")*+

+,-*

"./")-01+"

2304-+"566+7+8'1.89"

:5;5<="

>+1?.,"

67'99"

*'6@'A+"

org.apache.axis.transport.http.HTTPSender.readHeadersFromSocket()org.apache.axis.transport.http.HTTPSenderorg.apache.axis.transport.httpJava? Go for

it!

Okay...it takes78 methods,59 classes, or33 packages

to get a 10X speedup on Dacapo suite.

8

Tuesday, June 19, 2012

Page 11: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University

!"

!#"

!##"

!###"

!####"

#" $#" %#" &#" '#" !##"

()*"+,-

-./,

"01"+/23-"

4526/-"788-9-:)30:;"

+,-8<=("

>-3?0."

89);;"

,)8@)A-"

Cain, would you help me with other Java benchmark

suites?

Of course!

How about SpecJVM?

9

23 methods

18 classesor 14 packages. SpecJVM is better than

Dacapo!

Tuesday, June 19, 2012

Page 12: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University10

What have we concluded from our study today?

! Unstructured C code can only be accelerated in swaths of highly application-specific codes.

! Java has potential to use classes as targets. Is accelerating fifty unique classes worth a 10X performance gain?

! Filling dark silicon will require tens to hundreds of specialized accelerators.

C-cores

Eco-coresand DySER had the

same conclusion

Tuesday, June 19, 2012

Page 13: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University11

! What are the benchmarks we should use to evaluate potential accelerators?

! The infrastructure associated with accelerators are increasingly important!

! What happens when we factor in actual costs?

We have some open questions...

Looks like we have a lot

more research to

do!

Tuesday, June 19, 2012

Page 14: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University12

Good work today! Hope you learned something about

accelerating targets!

Questions?

Yeah, it was quite a day. Thank you

Tico!

Thanks!

Tuesday, June 19, 2012

Page 15: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University13

fin

Tuesday, June 19, 2012

Page 16: Acceleration Targets: A Study of Popular Benchmark Suitesarcade.cs.columbia.edu/accels-dasi12-slides.pdf · Acceleration Targets: A Study of Popular Benchmark Suites ... N. Goulding-Hotta

Columbia University14

!"

!#"

!##"

!###"

!####"

#" $#" !##" !$#" %##" %$#"

&'(")*+

+,-*

"./")-01+"

2304-+"566+7+8'1.89"

237',+3:);'77.;"

<+1=.,"

67'99"

Tuesday, June 19, 2012