big.little tc2
TRANSCRIPT
-
8/12/2019 Big.littLE TC2
Update on big.LITTLE on TC2
Morten Rasmussen
Technology Researcher
-
Agenda
big.LITTLE Software solutions overview
ARM's Test Chip 2 overview
Benchmarking Methodology and Use Cases
IKS status update
big.LITTLE MP status update
-
big.LITTLE overview
Performance and power efficiency in one system:
Benchmark     Cortex-A15 vs Cortex-A7 Performance    Cortex-A7 vs Cortex-A15 Energy Efficiency
Dhrystone     1.9x                                   3.5x
FDCT          2.3x                                   3.8x
IMDCT         3.0x                                   3.0x
MemCopy L1    1.9x                                   2.3x
MemCopy L2    1.9x                                   3.4x
-
IKS solution basics
In-Kernel Switcher (IKS):
Targeted first generation big.LITTLE products.
[Diagram labels: Cortex-A7, Cortex-A15, Kernel scheduler, IKS, Task 1, Task 2, Logical CPU, ?]
-
MP solution
[Diagram labels: Cortex-A7, Cortex-A15, Kernel scheduler, Task 1, Task 2, ?]
-
ARM's Test Chip 2 (TC2): An Overview
A Versatile Express core tile, publicly available
Capabilities:
2 x A15 (r2p1) @ up to 1.2 GHz
3 x A7 (r0p1) @ up to 1 GHz
CCI/DMC/GIC/ADB (r0p0)
DMA (PL330)
2 GB external DDR2 memory @ 400 MHz
64k internal SRAM
CoreSight debug (including JTAG and ITM trace but no STM)
No GPU
cpufreq support: Independent for each cluster with limited voltage scaling
cpuidle support: Cluster power gating
[Diagram: TC2]
-
Benchmarking Methodology
Results:
Performance
Power
Configurable: CCI, ftrace, streamline
CSV config: use case, scheduling model, number of cores to use, scaling governors
Automated system for running user workloads on target device
Choose workload
Choose CPU mode: Cortex-A7, Cortex-A15, Migration (cluster or CPU), or MP
Choose active cores in each cluster (TC2: 1-2 big, 1-3 LITTLE)
Choose DVFS
-
IKS solution
Targeted first generation big.LITTLE products.
[Diagram labels: Cortex-A7, Cortex-A15, Kernel scheduler, IKS, Task 1, Task 2, Logical CPU, ?]
-
IKS: CPU Migration
big.LITTLE extends DVFS
DVFS algorithm monitors load on each CPU
When load is low it can be handled on a LITTLE processor
When load is high the context is transferred to a big processor
The unused processor can be powered down
When all processors in a cluster are inactive the cluster and its L2 cache can be powered down
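The switching rule on this slide can be sketched in C as follows. This is a minimal illustration, not the IKS implementation: the names `iks_pick_core` and `cluster_can_power_down`, the two load thresholds, and the hysteresis gap between them are all assumptions made for the example. The slide only states that low load runs on LITTLE, high load runs on big, and a fully inactive cluster can be powered down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum core_type { CORE_LITTLE, CORE_BIG };

/* Hypothetical thresholds (percent load); not taken from the slides. */
#define UP_THRESHOLD_PCT   80u
#define DOWN_THRESHOLD_PCT 30u

/* Decide which core of a big/LITTLE pair should back the logical CPU. */
enum core_type iks_pick_core(enum core_type current, unsigned int load_pct)
{
    assert(load_pct <= 100u);
    if (current == CORE_LITTLE && load_pct >= UP_THRESHOLD_PCT)
        return CORE_BIG;     /* high load: transfer context to big */
    if (current == CORE_BIG && load_pct <= DOWN_THRESHOLD_PCT)
        return CORE_LITTLE;  /* low load: LITTLE can handle it */
    return current;          /* in between: stay put (assumed hysteresis) */
}

/* A cluster and its L2 cache may be powered down only when
 * every core in the cluster is inactive. */
bool cluster_can_power_down(const bool *core_active, size_t ncores)
{
    for (size_t i = 0; i < ncores; i++)
        if (core_active[i])
            return false;
    return true;
}
```

The gap between the two thresholds is a design choice of this sketch: it keeps a load hovering around a single threshold from bouncing the context back and forth between the two cores.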
-
IKS: Results for Audio on TC2
Power compared to executing the use case on A15
IKS does not use A15s during Audio run
70% saving
TC2: A15 up to 1.2 GHz, A7 up to 1 GHz
Better results expected on representative silicon.
-
IKS: Results for BBench + Audio on TC2
Performance is measured from page loading times of BBench
Results normalised to power and performance consumed on same use case run on A15 only
[Chart: BBench page + Audio]
-
IKS: OPPs on TC2
-
IKS: Interactive governor on TC2
if (cpu_load >= go_hispeed_load) {
    ...
    new_freq = max_freq * cpu_load / 100;
    ...
} else {
    ...
    new_freq = hispeed_freq * cpu_load / 100;
    ...
}
For A15 on TC2 with go_highspeed at 85% (default), this algorithm only uses the overdrive section of A15.
Approach is to introduce a second point of inflection: highspeed2
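A runnable version of the rule on this slide, together with one possible shape for a second inflection point, might look like the sketch below. The first function follows the two branches shown on the slide. The second is an assumption: the slides do not say how highspeed2 is applied, so the `hispeed2_freq` and `go_hispeed2_load` fields, the function names, and the three-band rule are hypothetical. Frequencies are in kHz.

```c
#include <assert.h>

struct gov_params {
    unsigned long max_freq;         /* kHz */
    unsigned long hispeed_freq;     /* kHz */
    unsigned int  go_hispeed_load;  /* percent */
    /* Hypothetical second inflection point (not defined on the slide). */
    unsigned long hispeed2_freq;    /* kHz */
    unsigned int  go_hispeed2_load; /* percent, above go_hispeed_load */
};

/* The two-branch rule as shown on the slide. */
unsigned long target_freq(const struct gov_params *p, unsigned int cpu_load)
{
    assert(cpu_load <= 100u);
    if (cpu_load >= p->go_hispeed_load)
        return p->max_freq * cpu_load / 100;
    return p->hispeed_freq * cpu_load / 100;
}

/* Assumed extension: a middle band that scales against hispeed2_freq,
 * so that only loads above go_hispeed2_load scale against max_freq. */
unsigned long target_freq_hispeed2(const struct gov_params *p,
                                   unsigned int cpu_load)
{
    assert(cpu_load <= 100u);
    if (cpu_load >= p->go_hispeed2_load)
        return p->max_freq * cpu_load / 100;
    if (cpu_load >= p->go_hispeed_load)
        return p->hispeed2_freq * cpu_load / 100;
    return p->hispeed_freq * cpu_load / 100;
}
```

With `max_freq` at 1,200,000 kHz (the A15 limit quoted for TC2) and `go_hispeed_load` at 85, the first function never returns less than 1,020,000 kHz once the threshold is crossed. That matches the slide's observation that the upper branch only reaches the top of the A15 frequency range.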
-
IKS: Hispeed2
-
IKS: Results: BBench + Audio
Power improves with no performance cost
[Chart: BBench page + Audio]
-
MP solution
[Diagram labels: Cortex-A7, Cortex-A15, Kernel scheduler, Task 1, Task 2, ?]
-
MP solution more details
Scheduler modifications:
Treat big and LITTLE cpus as separate scheduling domains.
Use
-
MP: Experimental Implementation
Scheduler modifications:
Apply balance load_balance
select_task_rq_fair()
Forced migration
-
MP: ARM TC2: Audio
Workload: Audio (mp3 playback)
Performance/Energy target: A7 energy
Status:
Audio related tasks do not use A15s, but the power consumption is still significantly more than A7 alone.
MP not as power efficient as IKS yet
Todo:
Target spurious wake-ups on A15. All the extra power comes from the A15s, which shouldn't be used at all.
[Chart: Audio energy, axis 0 to 100, for A15, A7 2 CPU, IKS and MP. Energy labels as extracted: A7 0.79%, MP 9.86%.]
-
MP: Audio workload analysis
Where is the extra energy spent with MP?
Need a look at why A15s consume power when they are not necessary.
[Chart: Audio energy breakdown. Energy (axis 0 to 1.6) for A7 and MP, split into A15 cluster and A7 cluster.]
hrtimer functions    cpu0   cpu1   cpu2   cpu3   cpu4
hrtimer_wakeup          2      2   1212     17    190
tick_sched_timer        0     58      8    507    779

WQ functions         cpu0   cpu1   cpu2   cpu3   cpu4
vmstat_update           0      2     27     25     28
cache_reap             15      2      1      1      1
phy_state_machine       1      0      0      0      0

Enter idle           cpu0   cpu1   cpu2   cpu3   cpu4
0                       6      2    279    260      2
1                     801    807    816     97   9652
-
Scale invariant load
Load accumulation rate does not scale with available compute capacity (frequency, big/LITTLE cpu)
Currently, there is no link between cpufreq and the scheduler
Tasks may be migrated away from a cpu at low frequency by the scheduler before cpufreq has increased the frequency to match the cpu load.
Scaling the tracked load accumulation to match the current frequency mitigates this issue.
Tasks cannot accumulate enough load at low frequency to trigger migration and must wait for cpufreq to react first.
[Diagram labels: Freq x, Freq 2x]
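One way to picture the scaling described on this slide is to weight each slice of runnable time by the ratio of the current frequency to the maximum frequency before it is added to the tracked load. The sketch below assumes a simple linear scaling. The function name, the units and the linear rule are assumptions for illustration, since the slide does not give the exact formula.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a runnable-time delta by cur/max frequency, so that time spent
 * running at a low frequency contributes proportionally less tracked
 * load than the same time spent at the maximum frequency. */
uint64_t scale_runnable_delta(uint64_t delta_us,
                              uint32_t cur_khz, uint32_t max_khz)
{
    assert(max_khz != 0 && cur_khz <= max_khz);
    return delta_us * cur_khz / max_khz;
}
```

Under this rule, 1000 us of run time at 600 MHz on a cpu whose maximum is 1.2 GHz counts as 500 us. A task therefore cannot reach the migration threshold purely because the cpu was clocked low.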
-
Scale invariant load
[Figure: two plots over time, y-axis 0 to 1000, titled "Original" and "Frequency invariant".]
-
Load accumulation rate
For some workloads tracked load saturates too fast and leads to unnecessary task migrations.
Extending the tracked load history reduces tracked load variations due to sudden changes in the load characteristics.
Increasing the y factor in the load expression decreases the load accumulation and decay rates.
\[
\mathrm{load} = \frac{u_0 + u_1 y + u_2 y^2 + \dots + u_n y^n}{1024\,(1 + y + y^2 + \dots + y^n) + 1}
\]
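To make the effect of y concrete, the load expression can be evaluated directly. The sketch below assumes the expression is a geometrically weighted sum: `u[i]` is the run time (0 to 1024) in the i-th most recent period, weighted by y to the power i, and divided by 1024 times the sum of the same weights, plus 1. That reading, the function name and the floating-point form are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate the tracked load for a history of n periods.
 * u[0] is the most recent period; each u[i] is in the range 0..1024.
 * y (0 < y < 1) is the per-period decay factor. */
double tracked_load(const unsigned int *u, size_t n, double y)
{
    double num = 0.0;   /* u0 + u1*y + u2*y^2 + ... */
    double den = 0.0;   /* 1024 * (1 + y + y^2 + ...) */
    double w = 1.0;     /* y^i */

    assert(y > 0.0 && y < 1.0);
    for (size_t i = 0; i < n; i++) {
        num += (double)u[i] * w;
        den += 1024.0 * w;
        w *= y;
    }
    return num / (den + 1.0);
}
```

For a task that was idle and then ran flat out for the last 10 of 100 periods, this gives a tracked load of roughly 0.22 at y = 0.9785 and roughly 0.15 at y = 0.99. A larger y therefore makes the load build up more slowly.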
[Figure: weight y^n (axis 0 to 1) against Time [ms], for y = 0.9785.]
-
Load accumulation rate
Increasing y leads to a more conservative tracked load
Should lead to fewer up/down migrations
Increases the up/down migration delay for tasks that need to be migrated.
[Figure: Load accumulation rate. Tracked load against Time [ms] for a task, plotted for three values of y, one of which is y = 0.9785.]
-
MP Top Issues
Spurious wakeups
A15s are woken up by scheduler ticks (mainly)
Workqueues
Timers
RCU
cpu wakeup prioritisation: Pick the cheapest target cpu
Global balancing
Spread load to A7s when A15s are overloaded
Pack vs. spread
Cluster aware cpufreq governors
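As an illustration of what "pick the cheapest target cpu" could mean, the sketch below ranks candidate cpus by a made-up wake-up cost and returns the lowest. The cost model (big cores cost more than LITTLE, and waking a powered-down cluster costs more still), the weights, and all names are hypothetical. The slides list this only as an open issue and do not describe a policy.

```c
#include <stdbool.h>
#include <stddef.h>

struct cpu_info {
    bool is_big;      /* true for an A15, false for an A7 */
    bool cluster_on;  /* is this cpu's cluster already powered? */
};

/* Hypothetical relative cost of directing a wake-up at this cpu. */
static int wakeup_cost(const struct cpu_info *c)
{
    int cost = c->is_big ? 4 : 1;  /* assumed: big cores cost more */
    if (!c->cluster_on)
        cost += 8;                 /* assumed: cluster power-up is expensive */
    return cost;
}

/* Return the index of the cheapest cpu, or -1 if the list is empty.
 * Ties go to the lowest index. */
int pick_cheapest_cpu(const struct cpu_info *cpus, size_t n)
{
    int best = -1;
    int best_cost = 0;

    for (size_t i = 0; i < n; i++) {
        int cost = wakeup_cost(&cpus[i]);
        if (best < 0 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

On a TC2-like layout with both A15s powered down and the three A7s running, this picks the first A7, so timer and workqueue wake-ups would not pull the A15 cluster out of its low-power state.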
-
Questions?