keynote (dr chien-ping lu) - how many cores will we need? - by dr chien-ping lu, sr director –...
DESCRIPTION
Keynote presentation, How Many Cores Will We Need?, by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc., at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
TRANSCRIPT
HOW MANY CORES WILL WE NEED? IN SEARCH OF PARALLEL KILLER APPS
CHIEN-PING LU, PHD
MEDIATEK INC
| HOW MANY CORES WILL WE NEED? | DECEMBER 11, 2013 | CONFIDENTIAL 2
A GROUP OF HIPPOS IS CALLED …
A Crash
A GROUP OF CROWS IS CALLED …
A Murder
A GROUP OF GIRAFFES IS CALLED …
A Tower
From Wikipedia
SO, IT IS NOT SURPRISING THAT WE USE
“A Parade” of elephants, “An Army” of ants, “A Herd” of sheep
FROM FREQUENCY TO MULTICORE SCALING
[Figure: performance vs. time. Single-core scaling (curves labeled frequency and power) gives way to multi-core scaling at the power wall in 2005.]
IT SEEMS INEVITABLE THAT WE WILL NEED A MASSIVE NUMBER OF CORES
[Figure: performance vs. time, progressing from a moderate to a massive number of cores.]
DARK SILICON (OR DARK CORES)?
[Figure: performance vs. time; core counts growing 2x, 4x, 8x and 16x are paired with smaller gains of 3x, 4x and 4x.]
HOW TO LIGHT UP THE CORES?
[Figure: power vs. degree of parallelism, showing big cores, little cores and SIMT “cores” against a power ceiling and a parallelism wall.]
• Redefine the cores to be heterogeneous
• Search for parallel killer apps (e.g., H.264 encoding, ray tracing)
[Figure: three clusters of SIMT cores, each cluster sharing a single front end across a row of ALUs.]
ARMY OF ANTS: SIMT CORES FOR SIMT (SINGLE-INSTRUCTION-MULTIPLE-THREAD) EXECUTION
A branch is emulated through divergence: lanes that do not take a path are masked off while that path executes
SIMT is the execution model of HSA and is implemented in modern GPUs, combining MIMD flexibility with SIMD efficiency
A cluster of SIMT cores shares one front end in a SIMD manner
Parallel.For (…)
    If (…) then
        …
        …
    Else
        …
        …
A SIMT core runs 1 iteration of the parallel loop
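The divergence mechanism can be sketched in Python. This is a toy model under stated assumptions, not how any particular GPU or HSA runtime implements it: a list stands in for the lanes of one SIMT cluster, and the hypothetical helper `simt_if_else` issues the *then* path under a lane mask and then the *else* path under the inverted mask, so a group whose lanes disagree pays for both paths.

```python
from typing import Callable, List, Tuple


def simt_if_else(
    xs: List[int],
    cond: Callable[[int], bool],
    then_fn: Callable[[int], int],
    else_fn: Callable[[int], int],
) -> Tuple[List[int], int]:
    """Toy model of one if/else executed by SIMT lanes sharing a front end.

    Returns the per-lane results and how many branch paths the shared
    front end had to issue (1 if all lanes agree, 2 if they diverge).
    """
    mask = [cond(x) for x in xs]  # every lane evaluates the predicate
    out = list(xs)
    issued = 0

    if any(mask):  # issue the 'then' path; lanes with mask False sit idle
        issued += 1
        for lane, x in enumerate(xs):
            if mask[lane]:
                out[lane] = then_fn(x)

    if not all(mask):  # issue the 'else' path with the mask inverted
        issued += 1
        for lane, x in enumerate(xs):
            if not mask[lane]:
                out[lane] = else_fn(x)

    return out, issued


# One iteration of the parallel loop per lane.
results, paths = simt_if_else([1, 2, 3, 4], lambda x: x % 2 == 0,
                              lambda x: x * 10, lambda x: -x)
print(results, paths)  # [-1, 20, -3, 40] 2
```

When every lane takes the same branch, only one path is issued; when lanes diverge, the cluster spends time on both paths, which is the efficiency cost of emulating branches this way.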
[Figure: SIMT clusters widened into wider SIMT, alongside specialized processing engines (SPEs).]
MASSIVELY PARALLEL WORKLOADS
• Problem size N can keep growing
• Visible serial workload s can be kept constant
• Parallel workload is sped up by a factor of P, the number of cores
• Reduction overhead is proportional to log P (by a factor of r)
• "Embarrassingly" parallel, when there is no reduction overhead (r=0)
[Figure: time on one core, s + N, versus time on P cores, s + N/P + r log P; the difference is the time saved by P cores.]
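The execution-time model in these bullets can be written down directly. A minimal sketch, assuming the reduction overhead uses a base-2 logarithm (the slide writes only log P) and that s, N and r are in the same time units; the function names are hypothetical.

```python
import math


def time_on_p_cores(s: float, n: float, p: int, r: float) -> float:
    """Serial part s, parallel work n split across p cores,
    plus a reduction overhead of r * log2(p) (log base assumed)."""
    return s + n / p + r * math.log2(p)


def time_saved(s: float, n: float, p: int, r: float) -> float:
    # On one core the time is s + n: log2(1) == 0, so there is no reduction.
    return time_on_p_cores(s, n, 1, r) - time_on_p_cores(s, n, p, r)


print(time_on_p_cores(0.5, 16, 4, 0.5))  # 5.5
print(time_saved(0.5, 16, 4, 0.5))       # 11.0
print(time_saved(0.5, 16, 4, 0.0))       # 12.0 (embarrassingly parallel, r = 0)
```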
[Figure, three panels: speedup vs. degree of parallelism P (1 to 8192, powers of two), with s = 50% and r = 50%. Curves for N = 16, 64 and 256; the second and third panels add a P = N curve, and the third extends the speedup axis to 10,000.]
REVISITING AMDAHL'S LAW
$$\text{Speedup} = \frac{s + N}{s + N/P + r \log P}$$
$$\text{Speedup} = \frac{s + P}{s + 1 + r \log P} \quad (N = P)$$
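To see why a curve with fixed N levels off while the P = N curve keeps climbing, the model Speedup = (s + N) / (s + N/P + r log P), and its N = P case (s + P) / (s + 1 + r log P), can be evaluated over the same power-of-two range of P as the charts. A sketch assuming base-2 logarithms and the charts' s = 50% and r = 50% taken as s = 0.5 and r = 0.5; it does not reproduce the original plots.

```python
import math


def speedup(s: float, n: float, p: int, r: float) -> float:
    """Speedup = (s + N) / (s + N/P + r * log2 P)."""
    return (s + n) / (s + n / p + r * math.log2(p))


def scaled_speedup(s: float, p: int, r: float) -> float:
    """The N = P case: (s + P) / (s + 1 + r * log2 P)."""
    return speedup(s, p, p, r)


s, r = 0.5, 0.5
for p in (1, 16, 256, 8192):
    fixed = speedup(s, 256, p, r)     # problem size held at N = 256
    scaled = scaled_speedup(s, p, r)  # problem size grows with the cores
    print(f"P={p:5d}  N=256: {fixed:8.2f}  N=P: {scaled:9.2f}")
```

With N held at 256, going from P = 256 to P = 8192 lowers the speedup (from about 46.6 to about 36.5), because r log P keeps growing while N/P shrinks toward zero; with N = P, the speedup at P = 8192 exceeds 1,000.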
GRAPHICS KEEP MOVING
Pac-Man, 1980: the highest-grossing video game of all time, recognized by 94% of American consumers
Mobile 3D graphics:
GLBenchmark 2.1 Egypt
GLBenchmark 2.5 Egypt
GFXBench 2.7 T-Rex
GFXBench 3.0 Manhattan
MEDIATEK FACE BEAUTIFICATION: WHEN IT COMES TO BEAUTY, THERE SEEMS TO BE NO LIMIT
[Images: before; skin tone adjustment; wrinkle removal; thinner face, bigger eyes]
HPC from 1993 to 2012:
• GFLOPS: ~130,000x
• Cores: ~11,000x
• GHz: ~10x
HIGH-PERFORMANCE COMPUTING (HPC) KEEPS SCALING OUT
Higher grid resolution
More time steps
More atoms
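The three HPC growth factors can be combined with simple arithmetic. Treating them as exact (the slide gives them only approximately), the part of the GFLOPS growth not explained by more cores or a higher clock, which can be read as throughput per core per clock, is roughly 130,000 / (11,000 × 10) ≈ 1.2x, so most of the gain came from scaling out. The sketch also converts each factor into an average annual rate over the 19 years from 1993 to 2012; the helper names are hypothetical.

```python
def residual_factor(gflops_x: float, cores_x: float, ghz_x: float) -> float:
    """Growth left over after core count and clock rate are accounted for."""
    return gflops_x / (cores_x * ghz_x)


def annual_rate(total_x: float, years: int) -> float:
    """Compound annual growth rate implied by a total growth factor."""
    return total_x ** (1 / years) - 1


years = 2012 - 1993
print(f"per core per clock: {residual_factor(130_000, 11_000, 10):.2f}x")
for name, factor in (("GFLOPS", 130_000), ("Cores", 11_000), ("GHz", 10)):
    print(f"{name}: {annual_rate(factor, years):.0%} per year")
```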
[Figure: top system of the Top500 list, 1993-2012: GFLOPS, cores and GHz relative to 1993, on a log scale from 1 to 1,000,000.]
THE MISSING LINKS
[Diagram: Moore’s law, higher frequency, more cores, more complex software, bigger problems, bigger data, better user experience]
IN SEARCH OF PARALLEL KILLER APPS
• What bigger problems can we solve with bigger data?
• How does solving bigger problems lead to a better user experience?
Mining bigger data with Machine Learning
MACHINE LEARNING: TREND PREDICTION WITH POWERFUL MODELS
Powerful models (with many knobs) tend to over-fit the noise if the data set is not sufficiently large
The explosive growth of data has made powerful models feasible
In the Google Brain experiment, a model with 1 billion knobs, trained on 10 million images from YouTube, figured out the concepts of cats and human faces by itself
[Figure: linear, 2nd-order and 6th-order polynomial fits to the sample data; the 6th-order polynomial undulates excessively with only 4 samples.]
Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning
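The over-fitting effect can be reproduced in a few lines of pure Python. This is a hypothetical toy, not the slide's data: four noisy samples of a roughly linear trend are fitted once with a least-squares line and once with the polynomial that passes exactly through all four points. A cubic is used rather than the chart's 6th-order polynomial because four points determine a cubic uniquely.

```python
from typing import List, Tuple

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.2, 1.8, 3.1]  # hypothetical noisy samples of y ≈ x


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def interpolate(xs: List[float], ys: List[float], x: float) -> float:
    """Lagrange polynomial through every sample (degree at most len(xs) - 1)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


a, b = fit_line(xs, ys)
for x in (1.5, 5.0):
    print(f"x={x}: line {a + b * x:.2f}, cubic {interpolate(xs, ys, x):.2f}")
```

The cubic has zero error on the four samples, yet at x = 5 it predicts 13.0 while the line predicts about 4.99: the extra knobs were spent fitting the noise.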
HOW TO DISTINGUISH CATS FROM DOGS?
ASIRRA (Animal Species Image Recognition for Restricting Access), from Microsoft Research
CAN ASIRRA BE CRACKED?
WHY IS IT HARD?
Source: training set of Kaggle.com Dogs vs. Cats competition
IS THERE A MODEL THAT CAN FIGURE OUT THAT THESE ARE THE SAME DOG?
Prancer, a 5-year-old toy poodle, before and after grooming
MINE THE SOLUTIONS FROM THE DATA
Dog-Cat classifier
Theory of the differences between dogs and cats?
Learn from many (12,500) photos labeled as dogs or cats
Machine Learning
SMART AND SMARTER CLIENTS IN THE ERA OF BIG DATA
[Diagram. Client: sensing and connectivity send input data to a powerful model in the cloud, built by machine learning on big data (a big training set), which returns an answer. Smarter client: better sensing and better connectivity, a bigger model built by bigger machine learning on bigger data (a bigger training set) in the cloud or the clients, plus local machine learning, which returns a better answer.]
PARALLEL COMPUTING IN THE CLOUD AND AT THE CLIENTS
Samples: $(x_n, y_n)$
Model: $y = f(x; a_i)$, with knobs $a_i$
Machine Learning: tweak the knobs $a_i$ to minimize the error between $y_n$ and $f(x_n; a_i)$
Examples:
• $x$: dog/cat photos; $y$: dog or cat
• $x$: sensor readings; $y$: jogging, walking or driving
Cloud: parallel computing with more samples
Client: parallel computing with more knobs
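The "tweak the knobs" step can be made concrete with gradient descent. A minimal sketch under assumptions the slide does not state: a hypothetical two-knob linear model f(x; a) = a0 + a1·x, a squared-error loss, and made-up samples from y = 2 + 3x. The samples are split into chunks whose partial gradients are summed, which is where parallel computing with more samples comes in: each chunk could run on its own core, followed by a reduction. Here the chunks simply run one after another.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x_n, y_n)


def f(x: float, a: Sequence[float]) -> float:
    # Hypothetical model with two knobs: f(x; a) = a0 + a1 * x
    return a[0] + a[1] * x


def chunk_gradient(chunk: List[Sample], a: Sequence[float]) -> Tuple[float, float]:
    """Gradient of 0.5 * sum of squared errors over one chunk of samples."""
    g0 = g1 = 0.0
    for x, y in chunk:
        err = f(x, a) - y
        g0 += err
        g1 += err * x
    return g0, g1


def train(samples: List[Sample], workers: int, lr: float = 0.05,
          steps: int = 2000) -> List[float]:
    a = [0.0, 0.0]
    n = len(samples)
    chunks = [samples[w::workers] for w in range(workers)]  # one chunk per worker
    for _ in range(steps):
        partials = [chunk_gradient(c, a) for c in chunks]  # independent per chunk
        g0 = sum(p[0] for p in partials)  # reduction step
        g1 = sum(p[1] for p in partials)
        a = [a[0] - lr * g0 / n, a[1] - lr * g1 / n]
    return a


data = [(float(x), 2.0 + 3.0 * x) for x in range(5)]
print(train(data, workers=1), train(data, workers=4))
```

Both runs converge to approximately [2.0, 3.0]; splitting the samples across workers changes only the order of the floating-point sums, not the result.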
WHY HSA?
• Machine learning happens in the cloud and at the clients
• Models run in the cloud or at the clients
• We need the same ease of programming, and write-once-run-everywhere portability, across heterogeneous cores
MediaTek is one of the co-founders of the HSA Foundation
MediaTek was the first to introduce in a mobile SoC:
• True Octa-Core
• Heterogeneous Multiprocessing (HMP)
SCALE OUT AND SCALE IN WITH HETEROGENEOUS CORES
• Both the cloud and mobile clients are limited by power
• Mobile devices need to keep cool in our palms
• Datacenters need to keep our environment clean
• The carbon footprint of US datacenters is at the same level as that of the airline industry
• A 1,000 m² datacenter consumes 1.5 MW, enough to power 1,000 typical US homes
[Image: 1,000 typical homes in the US]
In order to scale out, we need to scale in with heterogeneous cores in the cloud and in our palms
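The datacenter comparison reduces to a unit conversion. Taking the slide's figures at face value, 1.5 MW shared across 1,000 homes is 1.5 kW of continuous power per home, which over a year of 8,760 hours (leap years ignored) is about 13,140 kWh per home, or roughly 13 GWh for the whole datacenter. A small sketch; the helper name is hypothetical.

```python
from typing import Tuple

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def per_home(datacenter_mw: float, homes: int) -> Tuple[float, float]:
    """Return (average kW per home, kWh per home per year)."""
    kw = datacenter_mw * 1000 / homes
    return kw, kw * HOURS_PER_YEAR


kw, kwh = per_home(1.5, 1000)
print(f"{kw} kW per home, {kwh:,.0f} kWh per home per year")
# 1.5 kW per home, 13,140 kWh per home per year
```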
BACKUP
THE NEW VIRTUOUS CYCLE
Moore’s law and beyond
Bigger data
Better user experience
More heterogeneous cores
Mining bigger data with Machine Learning
PERHAPS, LEADING TO COMPUTING LIKE OUR BRAIN
MASSIVELY PARALLEL WORKLOADS
• Can keep growing the problem size N
• The serial workload s can be kept constant
• The parallel workload is sped up by a factor of P, the number of cores
• The reduction overhead is proportional to log P (by a factor of r)
• "Embarrassingly" parallel, when there is no reduction overhead (r=0)
[Figure: time on one core, s + N, versus time on P cores, s + N/P + r log P; the difference is the time saved by P cores.]
[Figure: six CPU cores, each with its own front end and ALU.]
THE ELEPHANTS: CPU CORES FOR MULTIPLE-INSTRUCTION-MULTIPLE-DATA (MIMD) EXECUTION
A CPU core runs 1 iteration of the parallel loop
The same color means the same piece of code
CPU cores are retrofitted for moderately parallel workloads and are not very efficient for massively parallel workloads
Parallel.For (i)
    If (…)
        …
        …
    Else
        …
        …
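For contrast with SIMT execution, a sketch of the MIMD model: each iteration of the parallel loop is handed to its own worker, which follows only the branch its own data selects, so no masking is needed. A thread pool from Python's standard library stands in for the CPU cores; in CPython the global interpreter lock means this shows the structure of the model rather than a real speedup. The function name and the default of four workers are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def mimd_parallel_for(
    xs: List[int],
    cond: Callable[[int], bool],
    then_fn: Callable[[int], int],
    else_fn: Callable[[int], int],
    cores: int = 4,
) -> List[int]:
    """Run one loop iteration per task; each task takes only its own branch."""
    def body(x: int) -> int:
        # Each worker has its own "front end", so it simply branches.
        return then_fn(x) if cond(x) else else_fn(x)

    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(body, xs))  # map() keeps results in input order


print(mimd_parallel_for([1, 2, 3, 4], lambda x: x % 2 == 0,
                        lambda x: x * 10, lambda x: -x))
# [-1, 20, -3, 40]
```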
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.