Universität Dortmund
Lab of Hardware-Software
Design of Embedded Systems
Davide Rossi
DEI University of Bologna
AA 2016-2017
Objective of this course
• Design of digital circuits with Hardware Description Language (HDL) based digital design flows
• Programming and debugging of embedded architectures based on microcontrollers (MCUs)
• Extending MCU-based embedded architectures implemented on FPGAs with custom HDL blocks
Free access to the lab
• Monday 11 - 18
• Tuesday -
• Wednesday -
• Thursday 14 - 18
• Friday 9 - 18
• Please fill in the sheet with:
Name, Surname, Student ID number (matricola), e-mail address, course schedule
Notes on the course
• Web: http://www-micrel.deis.unibo.it/LABMPHSENG
• Teacher: Prof. [email protected]
• Organization
– Lectures - theory and examples (covered in MPHS)
– Labs - practical experience and exercises (mostly covered in this course)
• Examination
– Project + Oral presentations
• Pre-requisites
– Basic digital electronics
– Basic computer architecture
– C programming
– Basic linux operating system
Structure of this course
HW design
HDL-based flow
(HW part of the class)
Extending Microcontrollers
Architecture on FPGAs
(HW + SW)
Programming
Microcontrollers
(SW part of the class)
Relations with Hardware-Software Design of Embedded Systems
[Figure: course map. Topics: Intro; Hardware design flows with Hardware Description Languages; HDL coding, simulation and verification; Microcontroller architectures; Microcontroller programming and debugging; Extending MCU architecture on FPGAs; Parallel programming on GP-GPUs. The topics are divided between "HW-SW design of embedded systems" and "Lab of HW-SW design of embedded systems".]
Embedded systems overview
• Computing systems are everywhere
• Most of us think of “desktop” computers
– PCs
– Laptops
– Mainframes
– Servers
• But there’s another type of computing system
– Far more common...
Embedded systems overview
• Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define: nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile
[Figure: everyday devices, captioned "Computers are in here... and here... and even here..." and "Lots more of these, though they cost a lot less each."]
A “short list” of embedded systems
And the list goes on and on
Anti-lock brakes
Auto-focus cameras
Automatic teller machines
Automatic toll systems
Automatic transmission
Avionic systems
Battery chargers
Camcorders
Cell phones
Cell-phone base stations
Cordless phones
Cruise control
Curbside check-in systems
Digital cameras
Disk drives
Electronic card readers
Electronic instruments
Electronic toys/games
Factory control
Fax machines
Fingerprint identifiers
Home security systems
Life-support systems
Medical testing systems
Modems
MPEG decoders
Network cards
Network switches/routers
On-board navigation
Pagers
Photocopiers
Point-of-sale systems
Portable video games
Printers
Satellite phones
Scanners
Smart ovens/dishwashers
Speech recognizers
Stereo systems
Teleconferencing systems
Televisions
Temperature controllers
Theft tracking systems
TV set-top boxes
VCR’s, DVD players
Video game consoles
Video phones
Washers and dryers
Some common characteristics of embedded systems
• Single-functioned
– Executes a single program, repeatedly
• Tightly-constrained
– Low cost, low power, small, fast, etc.
• Reactive and real-time
– Continually reacts to changes in the system’s environment
– Must compute certain results in real-time without delay
An embedded system example -- a digital camera
[Figure: block diagram of a digital camera chip fed by a lens and a CCD. Blocks: A2D, CCD preprocessor, Pixel coprocessor, D2A, Microcontroller, JPEG codec, DMA controller, Memory controller, ISA bus interface, UART, LCD ctrl, Display ctrl, Multiplier/Accum.]
• Single-functioned -- always a digital camera
• Tightly-constrained -- Low cost, low power, small, fast
• Reactive and real-time -- only to a small extent
Design challenge – optimizing design metrics
• Obvious design goal:
– Construct an implementation with desired functionality
• Key design challenge:
– Simultaneously optimize numerous design metrics
• Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge
Design challenge – optimizing design metrics
• Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– Size: the physical space required by the system
– Performance: the execution time or throughput of the system
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost
Design challenge – optimizing design metrics
• Common metrics (continued)
– Time-to-prototype: the time needed to build a working version of the system
– Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
– Maintainability: the ability to modify the system after its initial release
– Correctness, safety, many more
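The interplay between unit cost and NRE cost can be made concrete with a small calculation. The sketch below is only an illustration: the cost model (total cost = NRE cost + units × unit cost) follows from the definitions above, the $300k NRE cost and the $30 unit cost echo the example figures these slides quote for full-custom ICs and FPGAs, and the remaining numbers (zero NRE cost for the FPGA, $5 unit cost for the full-custom IC) are assumptions chosen for the example.

```python
def total_cost(nre, unit_cost, units):
    """Total cost of producing `units` copies: one-time NRE plus per-unit cost."""
    return nre + unit_cost * units


def per_product_cost(nre, unit_cost, units):
    """Cost of each copy once the NRE cost is spread over the whole production run."""
    return total_cost(nre, unit_cost, units) / units


def break_even_units(nre_a, unit_a, nre_b, unit_b):
    """Volume at which technology A (higher NRE, lower unit cost) costs the same as B."""
    return (nre_a - nre_b) / (unit_b - unit_a)


# Illustrative numbers only; the $5 unit cost and the zero FPGA NRE are assumptions.
FULL_CUSTOM = {"nre": 300_000, "unit_cost": 5}
FPGA = {"nre": 0, "unit_cost": 30}

for volume in (1_000, 100_000):
    fc = per_product_cost(FULL_CUSTOM["nre"], FULL_CUSTOM["unit_cost"], volume)
    fp = per_product_cost(FPGA["nre"], FPGA["unit_cost"], volume)
    print(f"{volume} units: full-custom {fc:.2f} per unit, FPGA {fp:.2f} per unit")

crossover = break_even_units(
    FULL_CUSTOM["nre"], FULL_CUSTOM["unit_cost"], FPGA["nre"], FPGA["unit_cost"]
)
print(f"break-even volume: {crossover:.0f} units")
```

Under these assumed numbers the FPGA is cheaper per product at low volume and the full-custom IC wins at high volume, which is the kind of metric competition the designer has to weigh.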
Design metric competition – improving one may worsen others
• Expertise with both software and hardware is needed to optimize design metrics
– Not just a hardware or software expert, as is common
– A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
[Figure: the competing metrics Size, Performance, Power and NRE cost, shown next to the digital camera chip block diagram, which spans hardware and software.]
Three key embedded system technologies
• Technology
– A manner of accomplishing a task, especially using technical processes, methods, or knowledge
• Three key technologies for embedded systems
– Processor technology
– Design technology
– IC technology
Processor technology
• The architecture of the computation engine used to implement a system’s desired functionality
• Processor does not have to be programmable
– “Processor” not equal to general-purpose processor
[Figure: three processor architectures, each drawn as a controller plus a datapath with a data memory.
General-purpose (“software”): controller with control logic and state register, IR, PC, and a program memory holding assembly code for "total = 0, for i = 1 to …"; datapath with register file and general ALU.
Application-specific: controller with control logic and state register, IR, PC, and a program memory holding the same assembly code; datapath with registers and custom ALU.
Single-purpose (“hardware”): controller with control logic and state register, no program memory; datapath with index and total registers and an adder.]
Processor technology
• Processors vary in their customization for the problem at hand
total = 0
for i = 1 to N loop
    total += M[i]
end loop
[Figure: the desired functionality above mapped onto a general-purpose processor, an application-specific processor, or a single-purpose processor.]
General-purpose processors
• Programmable device used in a variety of applications
– Also known as “microprocessor”
• Features
– Program memory
– General datapath with large register file and general ALU
• User benefits
– Low time-to-market and NRE costs
– High flexibility
• “Pentium” the most well-known, but there are hundreds of others
[Figure: general-purpose processor. Controller (control logic and state register, IR, PC) with program memory holding assembly code; datapath with register file and general ALU; data memory.]
Single-purpose processors
• Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
• Features
– Contains only the components needed to execute a single program
– No program memory
• Benefits
– Fast
– Low power
– Small size
[Figure: single-purpose processor. Controller (control logic and state register); datapath with index and total registers and an adder; data memory.]
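To make the single-purpose idea concrete, here is a minimal software model of such a circuit for the running "total" example. It is a sketch, not the circuit from the figure: the state names, the one-state-per-cycle timing and the function name `run_single_purpose_sum` are assumptions made for illustration. The model keeps only what the slide lists (a state register, an index register, a total register and an adder) and has no program memory. Python lists are 0-based, so the index runs from 0 to N-1 rather than 1 to N.

```python
from enum import Enum, auto


class State(Enum):
    """Values of the controller's state register (names are hypothetical)."""
    INIT = auto()
    LOOP = auto()
    DONE = auto()


def run_single_purpose_sum(M):
    """Cycle-by-cycle model of a hard-wired circuit that sums the data memory M.

    Returns (total, cycles). There is no instruction fetch or decode: the
    behaviour is fixed by the control logic below.
    """
    state = State.INIT
    index = 0   # datapath register
    total = 0   # datapath register
    cycles = 0
    while state is not State.DONE:
        cycles += 1
        if state is State.INIT:
            index, total = 0, 0
            state = State.LOOP
        elif state is State.LOOP:
            if index < len(M):
                total = total + M[index]  # the datapath adder
                index = index + 1
            else:
                state = State.DONE
    return total, cycles


if __name__ == "__main__":
    total, cycles = run_single_purpose_sum([3, 1, 4, 1, 5])
    print(f"total={total} after {cycles} cycles")
```

A general-purpose processor running the same loop would spend additional cycles fetching and decoding instructions from program memory, which is one reason the single-purpose version can be faster and smaller.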
Application-specific processors
• Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power
[Figure: application-specific processor. Controller (control logic and state register, IR, PC) with program memory holding assembly code; datapath with registers and custom ALU; data memory.]
IC technology
• The manner in which a digital (gate-level) implementation is mapped onto an IC
– IC: Integrated circuit, or “chip”
– IC technologies differ in their customization to a design
– ICs consist of numerous layers (perhaps 10 or more)
• IC technologies differ with respect to who builds each layer and when
[Figure: an IC package containing an IC, with a transistor cross-section showing source, channel, drain, oxide, gate and silicon substrate.]
IC technology
• Three types of IC technologies
– Full-custom/VLSI
– Semi-custom ASIC (gate array and standard cell)
– FPGAs (Field Programmable Gate Array)
Full-custom/VLSI
• All layers are optimized for an embedded system’s particular digital implementation
– Placing transistors
– Sizing transistors
– Routing wires
• Benefits
– Excellent performance, small size, low power
• Drawbacks
– High NRE cost (e.g., $300k), long time-to-market
• Only used for timing/power critical or analog blocks
Semi-custom
• Lower layers are fully or partially built
– Designers are left with routing of wires and maybe placing some blocks
• Benefits
– Good performance, good size, less NRE cost than a full-custom implementation
• Drawbacks
– Still requires weeks to months to develop
• De facto standard for digital circuit design
FPGAs
• All layers already exist
– Designers can purchase an IC
– Connections on the IC are programmed to implement desired functionality
• Benefits
– Low NRE costs, almost instant IC availability
• Drawbacks
– Bigger, expensive (perhaps $30 per unit), power hungry, slower
• De facto standard for prototyping of digital ICs
Independence of processor and IC technologies
• Basic tradeoff
– General vs. custom
– With respect to processor technology or IC technology
– The two technologies are independent
[Figure: two independent axes. Processor technology: general-purpose processor, ASIP, single-purpose processor. IC technology: FPGAs, semi-custom, full-custom. One region is labelled "Target technology of this course".
General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume).
Customized, providing improved: power efficiency, performance, size, cost (high volume).]
What is an FPGA?
• An FPGA is primarily a semiconductor device that can be configured by the user (customer or designer) after the manufacturing process has been completed
• The term "field-programmable" means the device is programmed by the customer, not the manufacturer
• It can be programmed using a logic circuit diagram or source code in VHDL or Verilog
• It supports partial reconfiguration of a portion of the design
What is an FPGA?
• An FPGA (Field Programmable Gate Array) is a reprogrammable chip that contains hundreds of thousands of logic gates, which are connected internally to build complex digital circuitry.
[Figure: ZedBoard with a Xilinx ZYNQ FPGA.]
FPGAs vs. general-purpose Processors
• FPGAs excel at computing non-data-dependent algorithms in parallel.
• A customizable datapath and ALU allow very large amounts of data to be transferred and processed within a few clock cycles.
• Despite lower clock frequencies, FPGAs can outperform conventional CPUs on certain data processing tasks
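A back-of-the-envelope calculation shows how a lower clock frequency can still yield higher throughput when many operations run in parallel. The clock rates and operations-per-cycle figures below are hypothetical values picked for illustration, not measurements of any particular CPU or FPGA.

```python
def throughput_ops_per_second(clock_hz, ops_per_cycle):
    """Peak throughput = clock frequency x operations completed per clock cycle."""
    return clock_hz * ops_per_cycle


# Hypothetical CPU: high clock, a handful of operations per cycle.
cpu = throughput_ops_per_second(clock_hz=3_000_000_000, ops_per_cycle=4)

# Hypothetical FPGA design: much lower clock, but 256 independent
# operations per cycle thanks to a custom parallel datapath.
fpga = throughput_ops_per_second(clock_hz=200_000_000, ops_per_cycle=256)

speedup = fpga / cpu
print(f"CPU:  {cpu:.3e} ops/s")
print(f"FPGA: {fpga:.3e} ops/s")
print(f"FPGA / CPU = {speedup:.2f}x")
```

The advantage only holds when the operations really are independent; an algorithm with long chains of data dependencies cannot fill the parallel datapath.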
FPGAs vs. ASICs
• Advantages of FPGAs over ASICs:
– Shorter time to market
– Can be re-programmed in the field to fix bugs, which lowers engineering costs
– Hardware can be developed and validated on ordinary FPGAs before being committed to a finalized version that can no longer be modified
The reasons FPGAs may NOT fit your design are:
• Power consumption
– FPGAs fundamentally use a lot more power than ASICs
• Price
– They also fundamentally cost more
• Speed
– ASICs can still be considerably faster than any FPGA, although design techniques can help with this issue
• Density
– ASICs can still pack a lot more logic into a single chip than an FPGA
Architectural Features of FPGAs
• Common FPGA architecture involves:
– Configurable Logic Blocks (CLBs)
– I/O blocks
– Routing paths
Xilinx Programmable Gate Arrays
[Figure: array of CLBs surrounded by IOBs, with wiring channels between them.]
• CLB - Configurable Logic Block
– 5-input, 1 output function
– or 2 4-input, 1 output functions
– optional register on outputs
– Built-in fast carry logic
– Can be used as memory
• Three types of routing
– direct
– general-purpose
– long lines of various lengths
• RAM-programmable
– can be reconfigured
CLB Slice Structure
• Each slice contains two sets of the following:
– Four-input LUT
  • Any 4-input logic function,
  • or 16-bit x 1 sync RAM (SLICEM only)
  • or 16-bit shift register (SLICEM only)
– Carry & Control
  • Fast arithmetic logic
  • Multiplier logic
  • Multiplexer logic
– Storage element
  • Latch or flip-flop
  • Set and reset
  • True or inverted inputs
  • Sync. or Async. control
Details of One Virtex Slice
[Figure: schematic of one Virtex slice. Annotated examples: a 4-input function; a 3-input function, registered.]
Implement Some Larger Functions
[Figure: slice resources combined to implement larger functions, e.g. 9-input parity.]
LUT (Look-Up Table) Functionality
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
[Figure: two example LUT configurations with inputs x1 x2 x3 x4 and output y. The second example depends only on x1 and x2.]
x1 x2 x3 x4 | y (example 1) | y (example 2)
0 0 0 0 | 0 | 1
0 0 0 1 | 1 | 1
0 0 1 0 | 0 | 1
0 0 1 1 | 0 | 1
0 1 0 0 | 0 | 1
0 1 0 1 | 1 | 1
0 1 1 0 | 0 | 1
0 1 1 1 | 1 | 1
1 0 0 0 | 0 | 1
1 0 0 1 | 1 | 1
1 0 1 0 | 0 | 1
1 0 1 1 | 0 | 1
1 1 0 0 | 1 | 0
1 1 0 1 | 1 | 0
1 1 1 0 | 0 | 0
1 1 1 1 | 0 | 0
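A 4-input LUT is simply a 16-entry memory addressed by its inputs, which is why it can implement any function of 4 inputs. The sketch below models that behaviour using the two output columns from the slide (0100010101001100 and 1111111111110000). The helper names `make_lut`, `config_for` and `parity9` are hypothetical, and the bit ordering (x1 as the most significant address bit, rows listed from 0000 to 1111) is an assumption about how the slide's table is laid out. The last part shows one plausible way to build the 9-input parity function mentioned earlier out of three 4-input LUTs; it is not necessarily the decomposition drawn on the slide.

```python
def make_lut(truth_column):
    """Build a 4-input LUT from 16 configuration bits given as a '0'/'1' string.

    Character k is the output for the input row whose binary value is k,
    with x1 as the most significant bit (assumed ordering).
    """
    if len(truth_column) != 16 or set(truth_column) - {"0", "1"}:
        raise ValueError("expected exactly 16 configuration bits")
    bits = [int(c) for c in truth_column]

    def lut(x1, x2, x3, x4):
        # The inputs form the address into the 16-entry table.
        return bits[(x1 << 3) | (x2 << 2) | (x3 << 1) | x4]

    return lut


def config_for(func):
    """Compute the 16 configuration bits that make a LUT implement func."""
    return "".join(
        str(func((k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1) & 1)
        for k in range(16)
    )


# The two example configurations from the slide.
example1 = make_lut("0100010101001100")
example2 = make_lut("1111111111110000")  # equals NOT (x1 AND x2)

# A LUT configured as a 4-input XOR, then three of them combined
# into a 9-input parity function.
XOR4 = make_lut(config_for(lambda a, b, c, d: a ^ b ^ c ^ d))


def parity9(bits):
    """Parity of 9 bits using three 4-input LUTs (the last has one unused input)."""
    first = XOR4(*bits[0:4])
    second = XOR4(*bits[4:8])
    return XOR4(first, second, bits[8], 0)


if __name__ == "__main__":
    print(example1(0, 0, 0, 1), example2(1, 1, 0, 0))
    print(parity9([1, 0, 1, 1, 0, 0, 1, 0, 1]))
```

Changing the function only means loading different configuration bits; the hardware structure of the LUT stays the same.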
Xilinx Routing
• The general architecture of Xilinx FPGAs consists of a two-dimensional array of programmable blocks, called Configurable Logic Blocks (CLBs), with horizontal and vertical routing channels between the rows and columns of CLBs.
Island Style Architecture
[Figure: island-style FPGA architecture.]
Connection boxes
[Figure: connection boxes with flexibility of connection Fc = 2. Can A connect to B?]
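The question "Can A connect to B?" can be explored with a small model. This sketch assumes the usual reading of Fc as the number of routing tracks in the adjacent channel that each logic block pin can reach through its connection box; the slide does not spell this out. The channel width of 4 and the track assignments for the pins are hypothetical.

```python
def connection_box(pin_tracks, channel_width, fc):
    """Return the set of channel tracks a pin can reach, checking it against Fc."""
    tracks = frozenset(pin_tracks)
    if len(tracks) != fc:
        raise ValueError(f"a pin must connect to exactly Fc={fc} tracks")
    if not all(0 <= t < channel_width for t in tracks):
        raise ValueError("track index outside the channel")
    return tracks


def shared_tracks(pin_a, pin_b):
    """Tracks through which two pins on the same channel can be wired directly."""
    return sorted(pin_a & pin_b)


W, FC = 4, 2                           # assumed channel width and flexibility
pin_a = connection_box({0, 1}, W, FC)  # hypothetical assignments
pin_b = connection_box({1, 2}, W, FC)
pin_c = connection_box({2, 3}, W, FC)

print("A-B shared tracks:", shared_tracks(pin_a, pin_b))
print("A-C shared tracks:", shared_tracks(pin_a, pin_c))
```

In this model A reaches B directly over a shared track, while A and C share no track, so a connection between them would have to change tracks elsewhere in the routing fabric. A larger Fc makes direct connections more likely at the cost of more programmable switches.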
Switch Boxes
• Fs defines, for a wiring segment entering the S block, the number of other wiring segments it can be connected to
Routings using C and S Boxes
Programming Technologies
• Fuse and anti-fuse
– fuse makes or breaks link between two wires
– one-time programmable
• Flash
– High density
– Process issues
• RAM-based (most commonly used)
– memory bit controls a switch that connects/disconnects two wires
– can be programmed and re-programmed easily (tested at factory)
CAD process to implement a circuit in an FPGA
• Logic optimization
– Performs two-level or multi-level minimization of the Boolean equations to optimize area, delay, or a combination of both.
• Technology mapping
– Transforms the Boolean equations into a circuit of FPGA logic blocks. This step also optimizes the total number of logic blocks required (area optimization) or the number of logic blocks in time-critical paths (delay optimization).
CAD process to implement a circuit in an FPGA
• Placement
– Selects the specific location for each logic block in the FPGA, while trying to minimize the total length of interconnect required.
• Routing
– Connects the logic blocks positioned by the placement tool using the FPGA’s available routing resources, carrying signals from where they are generated to where they are used.
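The placement objective of minimizing total interconnect length can be illustrated with a toy cost function. The sketch below uses half-perimeter wirelength (the width plus height of the bounding box of each net) as the length estimate; this is a common approximation, but its use here is an assumption, since the slides do not name a specific metric. The block names, the nets and the 2x2 grid are hypothetical.

```python
def hpwl(net, placement):
    """Half-perimeter wirelength of one net: bounding-box width plus height."""
    xs = [placement[block][0] for block in net]
    ys = [placement[block][1] for block in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, placement):
    """Placement cost: the sum of the estimated lengths of all nets."""
    return sum(hpwl(net, placement) for net in nets)


def swap(placement, a, b):
    """Return a new placement with the locations of blocks a and b exchanged."""
    new = dict(placement)
    new[a], new[b] = placement[b], placement[a]
    return new


# Four logic blocks on a 2x2 grid, connected in a chain A-B-C-D.
nets = [("A", "B"), ("B", "C"), ("C", "D")]
placement = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}

before = total_wirelength(nets, placement)
improved = swap(placement, "B", "C")
after = total_wirelength(nets, improved)
print(f"wirelength before swap: {before}, after swap: {after}")
```

A placement tool would evaluate many such candidate moves and keep the ones that lower the cost; the router then has shorter connections to realize with the available routing resources.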
Summary
• Embedded systems are everywhere
• Key challenge: optimization of design metrics
– Design metrics compete with one another
• A unified view of hardware and software is necessary to improve productivity
• Three key technologies
– Processor: general-purpose, application-specific, single-purpose
– Design: compilation/synthesis, libraries/IP, test/verification
– IC: full-custom, semi-custom, FPGAs
• FPGA architectural features
– Main technology target of this course