title of presentation second line of title subheadings (if...

22
All Rights Reserved © Alcatel-Lucent 2012 Anand, Mobicom ‘12 Anand Muralidhar ([email protected] ) Sourjya Bhaumik, Shoban Chandrabose, Kashyap Jataprolu, Gautam Kumar, Paul Polakos, Vikram Srinivasan , Thomas Woo CloudIQ

Upload: dodat

Post on 08-Mar-2018

215 views

Category:

Documents


2 download

TRANSCRIPT

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Anand Muralidhar ([email protected])

Sourjya Bhaumik, Shoban Chandrabose, Kashyap Jataprolu, Gautam Kumar, Paul Polakos, Vikram Srinivasan , Thomas Woo

CloudIQ

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Baseband processing at a Base station

Base station (BTS) consists of Antenna

and baseband unit (BBU)

BBU processes cellular signals

Provisioned to handle peak load

Needs active cooling, maintenance

But typical cells have 30% load…

Can we exploit variations in load to

save energy?

Energy constitutes 25% of total cost

[C-Ran White paper]

BBU

0102030405060708090

100

1 3 5 7 9 11 13 15 17 19 21 23

Lo

ad

(%)

Hour

Cell D: Daily Load Vs Hour

2/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Cloud-RAN: Cellular processing in the cloud

Cloud-Radio access network (C-RAN)

centralized baseband processing

Backhaul IQ samples from BTS

Many advantages

OPEX savings – site visits, upgrades, cooling

[up to 50%, C-RAN white paper]

Higher spectral efficiency – Network MIMO

[up to 50%, C-RAN white paper]

Real-time constraints limit fiber length

Fiber

BBU1 BBU2

BBU3 BBU4

Switch

3/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

[C-RAN white paper]

Transition towards programmable hardware

From ASIC/SoC to general purpose processors (GPP)

GPPs have increased ability to

process baseband signals

Multi-core, SIMD, cache, DVFS

Performance-per-watt is increasing

GPP allows open platforms

Cellular operator not bound to a vendor

Can we replace BBUs with GPPs?

4/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Cellular processing over GPPs

Baseband processing on GPPs in data center

Exploit variations in BTS load

Pooling: process multiple BTS on single GPP

Energy savings: switch it off if not used

What are the potential pooling gains?

Analyze real-world data

CloudIQ: A resource management framework

How do we build the system?

Switch

GPP1

BBU

GPP2

GPP3 GPP4

Data center

5/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Outline of the talk

LTE profiling results

Analysis of real-world traffic

CloudIQ: A resource management framework

System Design

6/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

OFDMA: spectrum is divided to physical resource blocks (PRB)

Subframe processed every 1 ms

Send ACK/NACK after 3 ms

PHY layer is compute intensive – FFT/IFFT, Turbo codes

Ack

0

3

A brief primer on LTE

1 ms

Subframe

time

freq

PRB

... ... SF2

SF1

SF0

0 1 2 time

Ack

1

4

Ack

2

5

3 ms

7/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Profile processing load in LTE

OpenAir: LTE, 5MHz

Open source, from Eurecom – France

Executes on GPPs

Profile code by varying modulation and coding schemes (MCS)

Load = time to process a subframe

8/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Observations after profiling code

Load is (almost) linear → base load + dynamic load

Large variation – offers potential for pooling many BTS

Extend LTE observations to WCDMA

9/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Analysis of real-world traffic

WCDMA in dense urban setting

175 base stations, 2 weeks

Downlink logs, aggregated at 15 mins

QAMs used, codes available, total traffic

Derive “distribution” on load in subframe

10/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Hard guarantee – every subframe is processed correctly

Cannot guarantee if multiple BTSs processed in one GPP

BTS load can suddenly peak and miss deadlines

“Statistical guarantees” – pick failure prob PF

Calculate load L for BTS

Load exceeds L with probability PF

Acceptable model for cellular systems

PF

L

Guarantees for processing multiple BTS on GPP

11/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

PF

L

Potential gains from resource pooling

Choose failure prob PF – select load L for each BTS

Compute total load across all BTSs

All signals processed by one computing resource

Assume it can handle peak load

How much is the resource utilized?

22% pooling gains at PF = 10-8

Conservative estimate on total load

Hour

12/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

CloudIQ: A resource management framework

Set of BTSs to be scheduled

Set of compute resources – multi-core GPPs

Solve two coupled problems:

PART: partition BTSs to sets to be scheduled on GPP

SCHED: real-time schedule to process each set on a GPP

Design real-time system with statistical guarantees

Separation principle: decouple PART and SCHED

Design around a simple cyclic schedule

GPP1 GPP2 GPP3

13/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

GPP

SF3

SF2

M BTSs need to be scheduled on N-core GPP

BTS processes subframe every 1 ms

Processing load = p ms, ( p > 1 )

Deadline = d ms, ( d > p > 1 )

Most real-time systems consider ( p < d = 1)

Cyclic schedule

Offline schedule, subframe processed in a core

At time t,

SF1

??

Cyclic schedule Core 1

Core 2

Core 3

Core N

Core ((tM + j) mod (N) + 1)

j-th BTS

SF0

0 1 2 3 time

p

d

14/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

6

5

Example of cyclic schedule

Cores = 4, BTS = 3

Bi(t) = i-th BTS’s job at time t

Each BTS has p = 4/3, d = 2

Same order repeats across cores

Core1 Core4 Core3 Core2

B2(5) B3(5)

B1(5)

2

3

B1(4) B2(4)

B3(4)

4

0

1 B1(0) B2(0) B3(0)

B2(1) B3(1) B1(1)

B1(3) B2(3)

B3(3)

B3(2)

B1(2) B2(2)

B1(6) B2(6)

B3(6)

Time

15/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

More examples of cyclic schedules

5

Core1 Core4 Core3 Core2

B2(5) B3(5)

B1(5)

2

3

B1(4) B2(4)

B3(4)

4

0

1 B1(0) B2(0) B3(0)

B2(1) B3(1) B1(1)

B1(3) B2(3)

B3(3)

B3(2)

B1(2) B2(2)

4

2

Core1 Core4 Core3 Core2

0

1 B1(0) B2(0)

B1(1) B2(1)

3 B1(2) B2(2)

B1(3) B2(3)

5 B1(4) B2(4)

B1(5) B2(5)

Many configurations possible for different loads

One configuration = one GPP

How many GPPs (or configurations) do we need?

16/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Choose PF, compute load for each BTS

How many GPPs do we need?

Also, cyclic schedule allows many configs

Scheduling BTS on GPPs

PF

L

GPP1 GPP2

GPP3

p = 2 p = 4/3 p = 1

17/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

PART: Construct “super-BTS”

Solve variable size bin-packing

Sets of super BTSs are allocated to GPP

SCHED: schedule the super-BTSs on GPPs using cyclic schedule

Delay guarantees come for free!

Super-BTS creation is conservative

Deadline is missed only if total load is exceeded

Solving PART and SCHED

GPP1 GPP2

18/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

CloudIQ solution

Pick a PF and compute load for each BTS from its ccdf

Solve PART and SCHED

Variable-size bin packing and cyclic schedule

59 processors for peak load

16% savings at PF = 10-8

PF

L

Hour

19/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

System design

Hardware: Intel Xeon W3690, 3.47 GHz, 6 cores

OS: Linux 2.6.31 with PREEMPT_RT

Made OpenAir multi-threaded

Need to make it cache conscious for better isolation

Super BTS can take less time to execute than total time

Cache misses are fewer

20/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Summary

Analyzed WCDMA traces, pooling gains can exceed 20%

Statistical guarantees in processing signals

Developed CloudIQ framework – achieves gains of up to 16%

Simple cyclic schedule for real-time guarantee

Prototyped system on GPP

Multithreaded implementation of LTE

21/22

All Rights Reserved © Alcatel-Lucent 2012

Anand, Mobicom ‘12

Turbo, FFT/IFFT

Ongoing efforts

Heterogeneous systems

GPP + GPU + FPGA

Algorithms, architecture?

Pooling decisions at smaller time scales

10 ms, 100ms?

What are the savings in energy?

Thanks!

Switch

GPP1

BBU

GPP2

FPGA1 FPGA2

Data center

GPU1 GPU2

22/22