Prepared: OELPW Anders Åhlander
Approved:
Checked:
Date: 2015-02-08
Confidentiality Class: COMPANY UNCLASSIFIED
Document Number: en
Revision: PA1
Document Name:
High Performance Embedded Computing
Challenges in long lifetime applications
Chalmers Computing Lab Tech Talks, February 8 2015
ELECTRONIC DEFENCE SYSTEMS
Senior Vice President
Staff functions
• SURFACE RADAR SYSTEMS
• FUTURE SENSOR SYSTEMS
• AIRBORNE SURVEILLANCE SYSTEMS
• SOURCING & PRODUCTION
• EW SYSTEMS
KEY CAPABILITIES AND FIGURES
World-leading centre of competence for microwave and antenna technology. Advanced airborne, ground-based and naval radar systems as well as radar upgrade expertise.
Full range of assets in the Electronic Warfare area, for signals intelligence, warning and self-protection.
MSEK            2013    2012    2011
Order intake    7,620   2,739   3,229
Sales           4,161   4,276   4,561
No. employees   2,588   2,620   2,557
GLOBAL OPERATIONS
Competence centres in five countries.
Systems in operation in more than 30 countries.
Key markets: Sweden, Germany, UK, US, Brazil, Middle East, India, Thailand, South Korea.
Outline
Long-lifetime HPEC applications
• Particular challenges
• Aspects of cost-effective application development
Exploiting the processing technology development
Possibilities to obtain cost-efficiency
• Previous and ongoing “FoT” projects
An ongoing project: ESCHER
Summary
Example: AESA based sensor systems
Active Electronically Scanned Array (AESA)
Benefit: Powerful system operation
Cost: High demands on signal processor; high complexity
[Figure: performance vs. time; GFLOPS before AESA, TFLOPS after AESA]
Processing challenges
AESA SP performance in the same “box” as for conventional systems, considering
• Physical size
• Power dissipation
• Physical robustness
Cost-effective development of the processing
• Engineering efficiency; mastering complexity
• Flexibility; easy and efficient enabling of various options
• Sustainability; application support over many years of lifetime
Contradictory?
Engineering efficiency
Mastering complexity
• high computational load – high parallelism
• multi-functional – complex resource management and scheduling
The aggregated complexity scales differently with the system
• More channels ->
  - same complexity on algorithm level (only larger matrices)
  - more complexity in realization (more processors)
• More functions/new algorithms ->
  - more complex control and interactions
Technology improvement may make it possible to trade hardware performance for reduced development time
• use higher levels of abstraction
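The point about channel count can be illustrated with a minimal sketch, assuming a plain weighted sum over channels as the algorithm. The function name `beamform` and the data are hypothetical and not taken from the talk. The algorithm-level code is the same for any number of channels; only the input sizes grow. The added realization complexity (spreading the work over more processors) is not shown here.

```python
def beamform(weights, samples):
    """Weighted sum over channels for each time sample.

    weights: one weight per channel
    samples: samples[t][c] is the sample from channel c at time t
    """
    return [sum(w * x for w, x in zip(weights, row)) for row in samples]


# Same code, different channel counts: only the input sizes grow.
two_channels = beamform([1, 1], [[1, 2], [3, 4]])
four_channels = beamform([1, 1, 1, 1], [[1, 2, 3, 4]])
print(two_channels)   # [3, 7]
print(four_channels)  # [10]
```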
Flexibility
Possibility to enable software options for different products or for different users
• different functions
• different types and number of sensors
• efficient testing and deployment of the systems
Possibility to configure the system for the mission at hand
Capability to meet changing demands on the fly
Sustainability
Technology insertion
• Possible to upgrade the SW and refresh the HW over lifetime
Scalability
• Easy to scale up or down functionality/performance in order to tolerate different hardware implementations of the system
• Easy to “ride on Moore's law”
Layered application development
• A modern, sustainable codebase
• A clear separation of hardware features from the application requirements
General-purpose vs. special-purpose hardware
• Certain functions can be implemented in acceleration hardware as long as there is a clear path (and a clean interface) to replace the hardware when it becomes obsolete
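The idea of a clean interface between application and hardware can be sketched as follows. This is only an illustration under stated assumptions: the class names `FirBackend`, `GeneralPurposeFir` and `AcceleratorFir`, and the choice of a FIR filter as the accelerated function, are hypothetical and not taken from the talk.

```python
from abc import ABC, abstractmethod


class FirBackend(ABC):
    """Hardware-facing layer: the only part that knows how a FIR filter runs."""

    @abstractmethod
    def fir(self, taps, signal):
        ...


class GeneralPurposeFir(FirBackend):
    """Portable reference implementation for a general-purpose processor."""

    def fir(self, taps, signal):
        out = []
        for n in range(len(signal)):
            acc = 0
            for k, tap in enumerate(taps):
                if n - k >= 0:
                    acc += tap * signal[n - k]
            out.append(acc)
        return out


class AcceleratorFir(FirBackend):
    """Stand-in for special-purpose hardware (hypothetical).

    A real version would hand the data to an accelerator driver; here it
    simply reuses the reference code so that the sketch stays runnable.
    """

    def fir(self, taps, signal):
        return GeneralPurposeFir().fir(taps, signal)


def smooth(backend, signal):
    """Application layer: states what to compute, not where it runs."""
    return backend.fir([0.5, 0.5], signal)


print(smooth(GeneralPurposeFir(), [2, 4, 6]))  # [1.0, 3.0, 5.0]
print(smooth(AcceleratorFir(), [2, 4, 6]))     # [1.0, 3.0, 5.0]
```

When the accelerator becomes obsolete, only the backend class has to be replaced; `smooth` is untouched.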
Desired platform properties
Possible to
• take advantage of the rapid technology development to shorten the application development time
• decouple system SW from the HW implementation
• choose between multiple implementation options for any given application
• replace or add hardware modules in a simple way
• develop application SW in layers, with support for re-use of SW components
• scale in terms of problem size as well as technology development
How to combine
• high processing efficiency with
• high engineering efficiency?
Generality-performance trade-off
[Figure: performance vs. generality for COTS and custom architectures over time. Labels: enough generality for application domain; architecture advantage; performance gain; time lead; scales with technology development]
Combine the best of two worlds: performance and engineering efficiency
The application domain
[Figure labels: high performance; multi-functional; high complexity; linear operations (MM, FIR, FFT); dataset]
Technology development
The International Technology Roadmap for Semiconductors (ITRS)
• Includes time-lines up to about 15 years into the future
• ITRS is sponsored by
  - the European Semiconductor Industry Association
  - the Japan Electronics and Information Technology Industries Association
  - the Korean Semiconductor Industry Association
  - the Taiwan Semiconductor Industry Association
  - the United States Semiconductor Industry Association
ITRS market drivers
ITRS identifies different market drivers for the technology development
Portable/consumer, Medical, Defense, Automotive, etc.
The market drivers drive the development of, e.g., microprocessors and System-on-a-chip (SoC) devices
Portable/consumer market driver is chosen here
• High power efficiency is paramount in HPEC
Number of processing cores on a chip
[Figure: trend according to ITRS]
Fortunately, this trend typically suits our applications well
Performance trend
[Figure: SoC Consumer Portable processing performance trends (source: ITRS). X axis: year, 2009 to 2024. Y axis: log(normalized performance), 0 to 5. Series: trend, performance; requirement, performance.]
Power trend
[Figure: SoC Consumer Portable total power trend (source: ITRS). X axis: year, 2009 to 2024. Y axis: power (watt), 0 to 14. Series: trend, total chip power; requirement, total chip power.]
Total power = static power (gate leakage etc.) + dynamic power (switching)
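The slide gives only the sum of the two terms. As a rough sketch, the dynamic term can be estimated with the common first-order CMOS switching model (activity factor × switched capacitance × supply voltage squared × clock frequency). That model, the function names and all numbers below are assumptions for illustration, not ITRS data or figures from the talk.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power in watts (assumed model)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def total_power(static_w, dynamic_w):
    """Total chip power as stated on the slide: static + dynamic."""
    return static_w + dynamic_w


# Made-up numbers, for illustration only.
p_dyn = dynamic_power(activity=0.5, capacitance_f=1e-9,
                      voltage_v=1.0, frequency_hz=1e9)
print(total_power(static_w=0.5, dynamic_w=p_dyn))
```

In this model, lowering the supply voltage reduces the dynamic term quadratically, while the static term is unaffected by the clock frequency.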
Performance/power ratio trend
[Figure: Consumer Portable performance per power trend, normalized to 2009. X axis: year, 2009 to 2024. Y axis: normalized ratio, 0 to 50. Series: trend, performance/power ratio.]
Sensor signal processing
[Figure: illustration, requirements on SP power efficiency vs. HW trend. X axis: year, 2009 to 2024. Y axis: normalized ratio, 0 to 50. Series: trend, performance/power ratio. Labels: 1 channel; 32 channels; short-term; long-term.]
Energy efficiency
Different aspects of energy efficiency
Larger high-end systems
• Many watts of available power but very high computational performance
• E.g. multi-channel adaptive processing in AESA based systems
Small handheld systems
• Short range - low transmit power, thus more focus on SP power
• Often high frequency - many resolution cells, thus high SP performance
• Battery powered
(illustration)
How to exploit the technology development
Processor chips are moving towards many-core (100+ cores)
How shall we efficiently use all the cores?
• Application programming
• Mapping on the processor architecture
Joint industry/academy research projects address this
Research projects
[Figure: timeline, 1995 to 2014, of national/international joint industry/academic projects, with EDA projects 2009 to 2014. Project labels: REMAP, HSSP, EEE, TELLUS, SPREWS, TELLUS 2, SMECY, ESCHER, EPC, JUMP, ERTCENS, … and others]
High Speed Signal Processing
Example of project result
• Mainly COTS application development environment
  - Commercial RTOS
  - ANSI C
  - In-house algorithmic library
• A TFLOPS system realization proposal
  - Five BYB601 cassettes
  - LSI Logic G13 ASICs with commercial RISC masters
  - LVDS ring network, 1.6 GB/s data channel + control channel
  - Realizable in year 2001
An ongoing project: ESCHER
Embedded Streaming Computations on Heterogeneous Energy-efficient aRchitectures
KK-foundation HÖG project, 2014-2016
Lead: CERES at Halmstad University
Processing eras
Single-Core Era
• Enabled by: Moore’s Law; voltage scaling; micro architecture/RISC
• Constrained by: power; complexity
• Programming: Assembler => C => C++/Java
• [Chart: single-thread performance vs. time, with a “we are here” marker]
Multi-Core Era
• Enabled by: Moore’s Law; desire for throughput; 20 years of SMP
• Constrained by: power; parallel SW availability; scalability
• Programming: pthreads => OpenMP/TBB ...
• [Chart: throughput performance vs. time (# processors), with a “we are here” marker]
Many-Core/Heterogeneous Systems Era
• Enabled by: power efficiency through high parallelism; Moore’s Law
• Currently constrained by: power; programming models; communication overhead
• Programming: OpenCL/CUDA, StreamIt, CAL, Occam-Pi, Chapel, ZPL, ...
• [Chart: targeted application performance vs. time, with a “we are here” marker]
Inspired by "The Future Is Heterogeneous Computing", Mike Houston, Advanced Micro Devices, 2010
ESCHER
Two main parts:
• Application Development Support and Languages for Heterogeneous Many-core Architectures
• Heterogeneous Many-core Architectures for Real-Time Embedded Streaming Systems
Embedded Real-Time Streaming Computations
The applications of the industrial partners are often in the form of embedded streaming applications like:
Sensor systems
Autonomous Vehicles
Vision/Video
Communication
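A streaming application of this kind can be pictured as a chain of stages, each consuming one stream of samples and producing another. The sketch below uses Python generators to show that structure only; the stage names (`source`, `moving_average`, `threshold`) and the data are hypothetical and not part of ESCHER or the talk.

```python
def source(samples):
    """Emit raw samples one at a time, like a sensor front end."""
    for s in samples:
        yield s


def moving_average(stream, width):
    """Smooth the stream with a sliding window of the given width."""
    window = []
    for s in stream:
        window.append(s)
        if len(window) > width:
            window.pop(0)
        if len(window) == width:
            yield sum(window) / width


def threshold(stream, limit):
    """Flag each value that exceeds the limit (a toy detection stage)."""
    for value in stream:
        yield value > limit


pipeline = threshold(moving_average(source([1, 1, 9, 9, 1]), width=2), limit=4)
print(list(pipeline))  # [False, True, True, True]
```

Each stage only depends on its input stream, which is what makes such applications natural candidates for mapping onto many cores.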
ESCHER
Maximize the “four Ps”
• Programmer Productivity
• Program Portability
• Performance
• Power efficiency
Applications: Scientific Computing; Streaming Real-time Processing; Data Mining; Machine Learning
Programmability: Parallel Programming Language
Architectures: Many-core; GPGPU; Coarse-grained Reconfig. Arch.; FPGA
Inspired by Kevin J. Brown, et al., “A Heterogeneous Parallel Framework for Domain-Specific Languages”, PACT 2011
Applications: Scientific Computing; Streaming Real-time Processing; Data Mining; Machine Learning
Programmability Chasm
Architectures: Many-core; GPGPU; Coarse-grained Reconfig. Arch.; FPGA
Inspired by Kevin J. Brown, et al., “A Heterogeneous Parallel Framework for Domain-Specific Languages”, PACT 2011
Applications: Scientific Engineering; Streaming Real-time Processing; Data Mining; Machine Learning
How to bridge the Programmability Chasm?
Architectures: Many-Core; GPGPU; Coarse-grained Reconfig. Arch.; FPGA
Saab EDS in ESCHER
Brings in expertise
• Signal and data processing
• HW and SW development for sensor processing
Provides application use cases
Evaluates the developed design environments
Goal: learn whether the studied development approaches will outperform traditional approaches in aspects such as engineering efficiency, performance, and power
A concrete goal: a proof-of-concept realization of an AESA signal processing chain using the design tool flows on real hardware
Summary
High performance efficiency and high engineering efficiency must typically be combined in HPEC
Typically a lifetime mismatch between SP technology and sensor applications
Possibilities
• Cost-effective sustainable solutions are possible
• Possible to ride on Moore’s law – scale in problem size and function
Risks
• Applications are not portable to new hardware platforms
Solutions
• Layered application development
• Domain specific languages with the right abstraction level
• Intermediate representations that support portability over hardware
• HW architecture that is “general enough”