title of presentation second line of title subheadings (if...
TRANSCRIPT
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Anand Muralidhar ([email protected])
Sourjya Bhaumik, Shoban Chandrabose, Kashyap Jataprolu, Gautam Kumar, Paul Polakos, Vikram Srinivasan , Thomas Woo
CloudIQ
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Baseband processing at a Base station
Base station (BTS) consists of Antenna
and baseband unit (BBU)
BBU processes cellular signals
Provisioned to handle peak load
Needs active cooling, maintenance
But typical cells have 30% load…
Can we exploit variations in load to
save energy?
Energy constitutes 25% of total cost
[C-Ran White paper]
BBU
0102030405060708090
100
1 3 5 7 9 11 13 15 17 19 21 23
Lo
ad
(%)
Hour
Cell D: Daily Load Vs Hour
2/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Cloud-RAN: Cellular processing in the cloud
Cloud-Radio access network (C-RAN)
centralized baseband processing
Backhaul IQ samples from BTS
Many advantages
OPEX savings – site visits, upgrades, cooling
[up to 50%, C-RAN white paper]
Higher spectral efficiency – Network MIMO
[up to 50%, C-RAN white paper]
Real-time constraints limit fiber length
Fiber
BBU1 BBU2
BBU3 BBU4
Switch
3/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
[C-RAN white paper]
Transition towards programmable hardware
From ASIC/SoC to general purpose processors (GPP)
GPPs have increased ability to
process baseband signals
Multi-core, SIMD, cache, DVFS
Performance-per-watt is increasing
GPP allows open platforms
Cellular operator not bound to a vendor
Can we replace BBUs with GPPs?
4/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Cellular processing over GPPs
Baseband processing on GPPs in data center
Exploit variations in BTS load
Pooling: process multiple BTS on single GPP
Energy savings: switch it off if not used
What are the potential pooling gains?
Analyze real-world data
CloudIQ: A resource management framework
How do we build the system?
Switch
GPP1
BBU
GPP2
GPP3 GPP4
Data center
5/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Outline of the talk
LTE profiling results
Analysis of real-world traffic
CloudIQ: A resource management framework
System Design
6/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
OFDMA: spectrum is divided to physical resource blocks (PRB)
Subframe processed every 1 ms
Send ACK/NACK after 3 ms
PHY layer is compute intensive – FFT/IFFT, Turbo codes
Ack
0
3
A brief primer on LTE
1 ms
Subframe
time
freq
PRB
... ... SF2
SF1
SF0
0 1 2 time
Ack
1
4
Ack
2
5
3 ms
7/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Profile processing load in LTE
OpenAir: LTE, 5MHz
Open source, from Eurecom – France
Executes on GPPs
Profile code by varying modulation and coding schemes (MCS)
Load = time to process a subframe
8/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Observations after profiling code
Load is (almost) linear → base load + dynamic load
Large variation – offers potential for pooling many BTS
Extend LTE observations to WCDMA
9/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Analysis of real-world traffic
WCDMA in dense urban setting
175 base stations, 2 weeks
Downlink logs, aggregated at 15 mins
QAMs used, codes available, total traffic
Derive “distribution” on load in subframe
10/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Hard guarantee – every subframe is processed correctly
Cannot guarantee if multiple BTSs processed in one GPP
BTS load can suddenly peak and miss deadlines
“Statistical guarantees” – pick failure prob PF
Calculate load L for BTS
Load exceeds L with probability PF
Acceptable model for cellular systems
PF
L
Guarantees for processing multiple BTS on GPP
11/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
PF
L
Potential gains from resource pooling
Choose failure prob PF – select load L for each BTS
Compute total load across all BTSs
All signals processed by one computing resource
Assume it can handle peak load
How much is the resource utilized?
22% pooling gains at PF = 10-8
Conservative estimate on total load
Hour
12/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
CloudIQ: A resource management framework
Set of BTSs to be scheduled
Set of compute resources – multi-core GPPs
Solve two coupled problems:
PART: partition BTSs to sets to be scheduled on GPP
SCHED: real-time schedule to process each set on a GPP
Design real-time system with statistical guarantees
Separation principle: decouple PART and SCHED
Design around a simple cyclic schedule
GPP1 GPP2 GPP3
13/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
GPP
SF3
SF2
M BTSs need to be scheduled on N-core GPP
BTS processes subframe every 1 ms
Processing load = p ms, ( p > 1 )
Deadline = d ms, ( d > p > 1 )
Most real-time systems consider ( p < d = 1)
Cyclic schedule
Offline schedule, subframe processed in a core
At time t,
SF1
??
Cyclic schedule Core 1
Core 2
Core 3
Core N
Core ((tM + j) mod (N) + 1)
j-th BTS
SF0
0 1 2 3 time
p
d
14/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
6
5
Example of cyclic schedule
Cores = 4, BTS = 3
Bi(t) = i-th BTS’s job at time t
Each BTS has p = 4/3, d = 2
Same order repeats across cores
Core1 Core4 Core3 Core2
B2(5) B3(5)
B1(5)
2
3
B1(4) B2(4)
B3(4)
4
0
1 B1(0) B2(0) B3(0)
B2(1) B3(1) B1(1)
B1(3) B2(3)
B3(3)
B3(2)
B1(2) B2(2)
B1(6) B2(6)
B3(6)
Time
15/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
More examples of cyclic schedules
5
Core1 Core4 Core3 Core2
B2(5) B3(5)
B1(5)
2
3
B1(4) B2(4)
B3(4)
4
0
1 B1(0) B2(0) B3(0)
B2(1) B3(1) B1(1)
B1(3) B2(3)
B3(3)
B3(2)
B1(2) B2(2)
4
2
Core1 Core4 Core3 Core2
0
1 B1(0) B2(0)
B1(1) B2(1)
3 B1(2) B2(2)
B1(3) B2(3)
5 B1(4) B2(4)
B1(5) B2(5)
Many configurations possible for different loads
One configuration = one GPP
How many GPPs (or configurations) do we need?
16/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Choose PF, compute load for each BTS
How many GPPs do we need?
Also, cyclic schedule allows many configs
Scheduling BTS on GPPs
PF
L
GPP1 GPP2
GPP3
p = 2 p = 4/3 p = 1
17/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
PART: Construct “super-BTS”
Solve variable size bin-packing
Sets of super BTSs are allocated to GPP
SCHED: schedule the super-BTSs on GPPs using cyclic schedule
Delay guarantees come for free!
Super-BTS creation is conservative
Deadline is missed only if total load is exceeded
Solving PART and SCHED
GPP1 GPP2
18/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
CloudIQ solution
Pick a PF and compute load for each BTS from its ccdf
Solve PART and SCHED
Variable-size bin packing and cyclic schedule
59 processors for peak load
16% savings at PF = 10-8
PF
L
Hour
19/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
System design
Hardware: Intel Xeon W3690, 3.47 GHz, 6 cores
OS: Linux 2.6.31 with PREEMPT_RT
Made OpenAir multi-threaded
Need to make it cache conscious for better isolation
Super BTS can take less time to execute than total time
Cache misses are fewer
20/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Summary
Analyzed WCDMA traces, pooling gains can exceed 20%
Statistical guarantees in processing signals
Developed CloudIQ framework – achieves gains of up to 16%
Simple cyclic schedule for real-time guarantee
Prototyped system on GPP
Multithreaded implementation of LTE
21/22
All Rights Reserved © Alcatel-Lucent 2012
Anand, Mobicom ‘12
Turbo, FFT/IFFT
Ongoing efforts
Heterogeneous systems
GPP + GPU + FPGA
Algorithms, architecture?
Pooling decisions at smaller time scales
10 ms, 100ms?
What are the savings in energy?
Thanks!
Switch
GPP1
BBU
GPP2
FPGA1 FPGA2
Data center
GPU1 GPU2
22/22