dynamiq processor solutions (using cortex- a75 & 5 © 2017 arm limited 4g logical ran...
TRANSCRIPT
© 2017 Arm Limited Arm Tech Symposia 2017
DynamIQ Processor Solutions (Using Cortex-
A75 & Cortex-A55) for 5G Networks & Mobile
Satoshi Nakajima | FAE manager| Arm
Norio Kisumi | Sr Manager, Operator Relations| Arm
© 2017 Arm Limited
5G Networks
3 © 2017 Arm Limited
5G Technical Requirement (3GPP Release-15 and beyond)
eMBB(enhanced Mobile BroadBand)
URLLC(Ultra-Reliable and Low Latency Communications)
mMTC(massive Machine Type Communications)
Throughput >10Gbps
Mobility 500km/h
Latency <1mS
Reliability 99.999%
Density 1M-device/km2
Battery Life >10years
100x
1.5x
30 – 50x
10x
100-1000x
100x energy efficiency
5G Target Enhancement from 4G
4 © 2017 Arm Limited
5G Dynamics Across the Network: Edge to Cloud
Devices
ACC
ACS ion
Storage
ion
Storage
Packet Flows Packet Flows
Acceleration
Storage
Compute
Packet Flows
S
ACS
C
AC
A
S
SA
CIoT, Automotive,
Industrial, Medical
Core / DCAccess/Edge EdgeCompute
• New Radio Technology• Heterogeneous Network• Network Virtualization • Network Slicing• Mobile Edge Computing
• Flexible• Scalable• Diverse • Performance• Energy efficient
require
computing platform
to be more …
5 © 2017 Arm Limited
4G logical RAN architecture
RU
RU
RU
RU
RU
DU
EPC
DU
DU
DU
RU
RU
RU
RU
DU
RU
RU
DU
RU
eNB
eNB
eNB eNB
Radio
RadioL1 / Phy
MAC RLC PDCPRadio
Radio
Management and control
CPRI
Distributed Unit (DU)Remote Unit (RU)
eNodeB
MME
PGWSGW
Evolved Packet Core (EPC)
S11
S5
HSS
S6a
PCRF
Gx
RU : Radio Unit
DU : Distributed Unit
EPC : Evolved Packet Core
Source : Ericsson review, 3GPP TR 38.801
6 © 2017 Arm Limited
RURU
RU
RU
RU
RU
Next gen logical RAN architecture
RU
DU
CUEPC/NGC
DU
DU
DU
DU
DU
RU
RU
RU
RURU
RU : Radio Unit
DU : Distributed Unit
EPC : Evolved Packet Core
CU: Central Unit
NGC : Next Generation Core
gNBPDCP
Central Unit
(CU)
Switch
Management
L1 / Phy
MAC RLC
Management
Distributed Unit (DU)
Source : Ericsson review, 3GPP TR 38.801
7 © 2017 Arm Limited
Network/Wireless Infrastructure:Three Distinct Segments – Two Distinct Approaches
5G Antenna MEC Core Network/Data Centre
Server
NetworkOffload
Server
NetworkAnd RAN
Offload
CPU
Air Interface
Processing
Typically standardized platform designs
Typically customized platform designs
Base Station
8 © 2017 Arm Limited
Network/Wireless Infrastructure: Edge Access
Arm: Meet diverse needs with heterogeneous platforms
• Mix of latency, throughput and power
• Supplement sub-systems approach with accelerator offload.
• Sub-system/packet processing platform
• ASIC based designs and dedicated platforms
• Merchant Silicon from multiple suppliers
CCIX and connection to FPGA technology
Edge Network Architecture
5G Air Interface Management• Antenna, RRH, DFE, Distributed Base Band,
Multi-Access Compute Acceleration.• IoT Gateway and End applications.• Optimized power, performance, latency
5G Antenna
CPU
Air Interface
Processing
Base Station
Hardware acceleration for advanced signal processing and terabit throughput.
Customized platform designs
Packet processing / front end / edge design .
9 © 2017 Arm Limited
5G Network Hierarchy Design
• Multi-Access Edge Compute, Distributed Evolved Packet Core, Distributed Applications Processing, Distributed Content Delivery
• Application of virtualized/containerized workloads
Network/Wireless Infrastructure: Core Network
Leverage energy efficient Arm servers with specifics for network infrastructure
Linaro Server group activity, VNF porting activity, OPNFV actions, AIDC, OSEC
Standardized general purpose compute platforms on Arm
Performance/Density/Scalability
• Support dedicated offload through silicon/software.
• DPDK/ODP support
Control plane functions, content and applications delivery can use general purpose virtualized cloud computing.
Core Network/Data Centre
Server
NetworkOffload
Server-like Architecture – more efficient on Arm
10 © 2017 Arm Limited
Network/Wireless Infrastructure: Mobile Edge Compute
• MEC will be critical in meeting 5G goals
• Low latency, low power required
• E2E Automotive assisted driving, IoT and industrial use cases
• 5G network Slicing dependent
• Designs range from chassis, multi chip to SOC designs
Unique designs tailored to MEC form factor will prevail.
Mix of: • Core network needs (software
driven, compute performance)• Edge Network needs (Latency,
throughput, power optimized)
5G network slicing – power and latency
MEC
Server
NetworkAnd RAN
Offload
Workload optimized general purpose compute and accelerators for user plane processing. Edge Compute Architecture
11 © 2017 Arm Limited
5G Phase
Wireless Infrastructure – Market Influence & Timescales
2017 2018 2019 2020 2021 2022
Early “5G” deployments5G Specifications
Distributed L1/L2) processing
5G Mesh antenna
LTE-Advanced Pro
NFV Early Deployment/Core
Edge Cloud/MEC Early Deployments
Broadband Mesh & Metro WiFi (WiGIG)
WRC-15 Frequencies (sub 6GHz) WRC-19 Frequencies (5G cm & mm up to 100GHz)
OPNFV Deployments/Core
Enhanced Security Requirements
Pre-5G Phase
Broad-band Mesh
Common 3G/4G Edge and Core Network
TechnologyPreparation
5G NG Cellular drivers:• 5G/LTE-Pro as wireline replacement for last mile.
Speed over features • LTE-Pro as a testing ground for new antenna
algorithms. Beam forming and MIMO• Re-Farmed spectrum and channel aggregation
allows 4G to stretch to high throughput needs• MEC driven by new uses and traffic type
Massive MIMO, Beamforming
5G vol. ramp
© 2017 Arm Limited
DynamIQ Processor Solutions
© 2017 Arm Limited 13
Networking requires diverse platforms
GPP
Mem
GPP
GPP
DSP NPUGPP
GPP NPU
DSP PLD
PLDAccel
Networking
GPP
Storage
Acceleration
NPU
…Wireless-access-point / vCPE …Switch / router blade
…Base station / modem …Scale out / server
GPP Accel
DSP
Networking
GPP
Storage
Acceleration
…Edge content caching
& optimization
…vCPE
Networking
GPP
Acceleration
…NFV applianceStorage
GPP GPP GPP GPP
GPP GPP GPP GPP
GPP GPP GPP GPP
GPP GPP GPP GPP
© 2017 Arm Limited 14
New DynamIQ-based CPUs for new possibilities
All comparisons at ISO process and frequency
Baseline to Cortex-A72 Baseline to Cortex-A53
All comparisons at ISO process and frequency
1.97x
1.42x
1.21x
1
2
3
1.30x
1.39x
1.23x
1
2
3
Arm Cortex-A55Arm Cortex-A75
© 2017 Arm Limited 15
Next-generation features
Dot product and half-precision float for AI/ML processing.
Virtualized Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements.
Cache stashing and atomic operations improves multicore networking performance and improves latency.
Cache clean to persistence to support storage class memory.
Infrastructure class RAS enhancement including data poisoning and improved error management.
© 2017 Arm Limited 16
DynamIQ cluster
0 - 7 CoresCore 0
Snoop filter
PowerManagement
L3 Cache
BusI/F
ACP andperipheral
port I/F
Core 7
Asynchronous bridges
DynamIQ Shared Unit (DSU)
DynamIQ Shared Unit (DSU)
Streamlines traffic across bridges
Advanced power management features
Latency and bandwidth optimizations
Support for multiple performance domains
Scalable interfaces for edge to cloud applications
Supports large amounts of local memory
Low latency interfaces for closely coupled accelerators
© 2017 Arm Limited 17
Increasing performance through cache stashing
Enables reads/writes into the shared L3 cache or per-core L2 cache.
Allows closely coupled accelerators and I/O agents to gain access to core memory.
AMBA 5 CHI and Accelerator Coherency Port (ACP) can be used for cache stashing.
More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases.
Acceleratoror I/O
CoreLink CMN-600
DMC-620 DMC-620
Agile System Cache
DDR4 DDR4
L3 Cache
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
Cortex-A
Agile System Cache
L3 Cache
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
Cortex-A
Stash critical data to any cache level
DynamIQ cluster
0–7 Cores
Core0
Snoop filter
PowerMngmt
L3 Cache
BusI/F
ACP andperipheral
port I/F
Core7
Asynchronous bridges
DynamIQ Shared Unit (DSU)
L1 cache
L2 cache
L1 cache
L2 cache
© 2017 Arm Limited 18
Networking: Compute and packet-processing solution
Build the right mix of compute
• High-performance Cortex-A75
• High-efficiency throughput Cortex-A55
• DSP, accelerators
Scale CoreLink CMN-600 for application
• Access points: 2~5W
• Macro BTS: 20~30W
• Telecom server: 60~100W
DSP
DMC-620
Cortex-A75
Level-3 Cache
DMC-620
Data plane processing• Throughput compute• Optional local accelerators
Scalable coherent SoC memory system
Control plane processing• Performance compute• Timing critical processing
IO IO
Ethernet
Accelerator
Agile System Cache Agile System Cache
Local
AcceleratorCortex-A55
Level-3 Cache
CoreLink CMN-600
© 2017 Arm Limited 19
Data center server blade: Maximize compute density
Cortex-A75 can deliver 1.4x to 2.9x higher performance .
• >1.4x performance with 32 core Cortex-A75 + CMN-600 vs. 32 core Cortex-A72 + CCN-512
• 2.9x performance with 64 core Cortex-A75 +CMN-600
• Higher rate performance than competitive architectures per socket
• 1-1.5W/core enables sustained operation at maximum performance
Power: 60W compute.
Cache Cache Cache Cache
Cache Cache Cache Cache
Cache Cache Cache Cache
Cache Cache Cache CacheD
MC
-60
0
DD
R4
-32
00
DM
C-6
00
DD
R4
-32
00
DMC-600
DDR4-3200
DMC-600
DDR4-3200
DM
C-6
00
DD
R4
-32
00
DM
C-6
00
DD
R4
-32
00
DMC-600
DDR4-3200
DMC-600
DDR4-3200100GbE
CM
L
PCIe
CM
L
PC
IeP
CIe
20 © 2017 Arm Limited
まとめ
5Gでモバイルネットワークは大きな変換期を迎えます。
✓ 多様なユースケースに対応する仮想化やネットワークスライス技術
✓ モバイルエッジコンピューティング(MEC)に要求される相反する機能の両立。サーバ的な要求(ソフトウェアベース、高いコンピューティング性能、汎用性)とエッジネットワークの要求(低遅延、高スループット、電力効率)。
✓ RAN~Edge~Core~Data Centerプラットホームの互換性、柔軟性、およびスケーラビリティ
アームでは、
✓ DynamIQは、高いコンピューティング性能と消費電力の最適化を両立させ、かつ優れた拡張性を提供することで、真のスケーラブルなSoC開発を可能にします。
✓ Armとエコシステム・パートナーより提供されるソフトウェアによって、5Gネットワーク開発を支援します。
2121
Thank You!Danke!Merci!谢谢!ありがとう!Gracias!Kiitos!
© 2017 Arm Limited
2222 © 2017 Arm Limited
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks