TI EMBEDDED PROCESSOR SOLUTIONS
Embedded Processing Portfolio

Microcontrollers (MCU)
• 16-bit Ultra-Low Power & Value Line MCUs: MSP™ MCU
  – MSP430 MCU
  – Applications: measurement, sensing, general purpose, consumer, medical
• 32-bit Real-Time MCUs: C2000™ MCU
  – Delfino, Piccolo single-core MCU
  – Concerto C28x + ARM Cortex™-M
  – Applications: motor control, digital power, lighting, renewable energy, smart grid
• 32-bit MCUs: Stellaris® ARM MCU & Hercules™ Safety ARM MCU
  – ARM Cortex™-M
  – ARM Cortex™-R
  – Applications: motion control, human machine interface, industrial automation, smart grid, safety, transportation, industrial & medical

ARM® MPU
• 32-bit Microprocessors: Sitara™
  – ARM Cortex-A8
  – ARM9™
  – Applications: industrial automation, point-of-service, human machine interface, portable navigation

DSP
• 32-bit Multicore DSPs: C6000™-based multicore DSP
  – Fixed/floating-point: DSP + ARM, C66x multicore DSP, DaVinci video processors
  – Applications: high performance real-time computing, video security and analytics, video communications, multimedia infrastructure
• 16/32-bit Single-core DSPs: C6000™ & C5000™ single-core DSP
  – C6000 high performance fixed/floating-point DSP
  – C5000 ultra-low power fixed-point DSP
  – Applications: connected audio/voice, video, fingerprint biometrics, portable medical, sensors

Further information: Microcontroller (MCU) Portfolio at-a-glance; ARM® Portfolio at-a-glance; Digital Signal Processor (DSP) Portfolio at-a-glance; Software, Tools, Kits & Boards
C67x Architecture and Features
’C62x Fixed-Point CPU Core (block diagram)
• Program Fetch, Instruction Dispatch, Instruction Decode
• Data Path 1: A Register File with functional units L1, S1, M1, D1
• Data Path 2: B Register File with functional units L2, S2, M2, D2
• Control Registers, Control Logic, Interrupts, Emulation, Test
C6x VLIW CPU Core
• DSP architecture challenge:
  – DSP algorithms have a high degree of parallelism
  – Cost-effective control of parallelism is difficult
• VLIW architecture solution:
  – Provides simple, cost-effective control of parallelism
    • fetches 8 instructions/cycle
    • executes 1-8 instructions/cycle, reducing
      – code size
      – program fetches
      – power consumption
  – Can support high-performance compilers
    • 3x improvement in efficiency based on DSP benchmark suite
  – Can scale to support architectural enhancements
C67x Floating-Point Core
• Performance (Comm/Ind)
  – IEEE floating-point format
    • Double precision
    • Single precision
  – 668 Multiplies & Accumulates, single precision
    • 2 multipliers (334 MFLOPS)
    • 2 ALUs (334 MFLOPS)
  – 420 MFLOPS, double precision
  – 250 Multiplies & Accumulates, double precision
    • 1 result/4 cycles (83.5 MFLOPS)
    • 1 result/2 cycles (167 MFLOPS)
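The per-unit MFLOPS figures on this slide can be checked with simple arithmetic. The sketch below assumes a 167 MHz clock (the C6701 speed quoted in the comparison slide), two units of each kind, and that "1 result/4 cycles" applies to the multipliers while "1 result/2 cycles" applies to the ALUs. The function name and that unit assignment are assumptions made for illustration, not statements from the slide.

```python
def peak_mflops(clock_mhz, units, cycles_per_result=1):
    """Peak MFLOPS for `units` identical units, each producing one
    floating-point result every `cycles_per_result` clock cycles."""
    return clock_mhz * units / cycles_per_result


CLOCK_MHZ = 167  # assumed clock, taken from the C6701 entry in the comparison slide

# Single precision: every unit delivers one result per cycle.
sp_multipliers = peak_mflops(CLOCK_MHZ, units=2)
sp_alus = peak_mflops(CLOCK_MHZ, units=2)
sp_total = sp_multipliers + sp_alus

# Double precision: assumed 4 cycles per multiply result, 2 per ALU result.
dp_multipliers = peak_mflops(CLOCK_MHZ, units=2, cycles_per_result=4)
dp_alus = peak_mflops(CLOCK_MHZ, units=2, cycles_per_result=2)
dp_total = dp_multipliers + dp_alus

print("SP:", sp_multipliers, "+", sp_alus, "=", sp_total)
print("DP:", dp_multipliers, "+", dp_alus, "=", dp_total)
```

Under these assumptions the single-precision total is 668 and the double-precision total is 250.5, which lines up with the 668 and 250 figures on the slide.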
VelociTI™: Speed with efficiency
• Execute: CPU executes 1 to 8 instructions/cycle
• As a result, fetch packets can contain multiple execute packets
• Parallelism is determined at compile/assembly time and can be:
  – Fully serial
  – Serial/parallel
  – Fully parallel
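How one 8-instruction fetch packet breaks into execute packets can be sketched in a few lines. On C6000 devices each instruction carries a p-bit that says whether the next instruction executes in parallel with it; that detail is not on the slide and is brought in here as background. The function name and the list-of-bits representation are hypothetical.

```python
def split_execute_packets(p_bits):
    """Group instruction indices of one fetch packet into execute packets.

    p_bits[i] == 1 means instruction i + 1 runs in parallel with instruction i;
    p_bits[i] == 0 closes the current execute packet.
    """
    packets, current = [], []
    for index, p_bit in enumerate(p_bits):
        current.append(index)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:  # trailing p-bit of 1: the chain would continue past this packet
        packets.append(current)
    return packets


fully_serial = split_execute_packets([0, 0, 0, 0, 0, 0, 0, 0])
serial_parallel = split_execute_packets([1, 1, 0, 1, 0, 0, 1, 0])
fully_parallel = split_execute_packets([1, 1, 1, 1, 1, 1, 1, 0])

for name, packets in [("fully serial", fully_serial),
                      ("serial/parallel", serial_parallel),
                      ("fully parallel", fully_parallel)]:
    print(name, "->", len(packets), "execute packet(s):", packets)
```

The three calls correspond to the three cases named on the slide: eight one-instruction packets, a mix, and a single eight-instruction packet.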
Floating Point DSP Comparison
Device       | C6701B | C6713B | C6727
Clock        | 167 MHz | 200 MHz | 250 MHz
MIPS         | 1336 (167 x 8) | 1600 | 2000
MFLOPS       | 1000 | 1200 | 1500
Architecture | C67x | C67x | C67x+
Memory       | 64KB Data Memory, 64KB Program Memory | 4KB L1-P, 4KB L1-D, 256KB L2 Cache/SRAM | 32KB L1-P, 256KB L2 SRAM, 384KB ROM
HPI          | HPI-16 | 1 32/16-bit | 1 UHPI 32/16-bit
EMIF         | 100MHz 32-bit (SDRAM) | 100MHz 32-bit (SDRAM) | 100MHz 32-bit (SDRAM)
DMA          | 4-ch DMA | 16-ch EDMA | 16-ch dMAX
McBSP        | 2 | 2 | 0
McASP        | 0 | 2 | 3
I2C          | 0 | 2 | 3
SPI          | 0 | 0 | 2 (10MHz)
Package      | 429-pin Ceramic BGA (27mm, 1.27mm); 352-pin Plastic BGA (35.2mm, 1.27mm) | 272-pin PBGA 27x27mm, 1.27mm | 256-pin PBGA 16x16mm, 1.0mm (Ceramic Package TBD)
All three devices are software compatible.
TMS320C672x Device Overview
• Large on-chip memory
  – 384KB on-chip ROM
  – 256KB on-chip RAM
  – 32KB instruction cache (internal memory + EMIF)
  – EMIF for expansion
• Enhanced audio I/O
  – 16 serial data pins
  – Up to 6 different clock rates
• dMAX
  – Support for DMA, circular and multi-tap memory delay (for reverb)
• HPI supports mux A/D and non-mux A/D
• 300 MHz DSP core
  – 300 MHz 67x+™ core
  – 64 registers + additional FP instructions
  – Code compatible with 6713 devices
TMS320C672x Floating-Point DSP (block diagram)
• C67x+™ DSP Core with 32K-byte instruction cache
• Memory Controller: 768K bytes ROM, 256K bytes SRAM
• Switch connecting EMIF, HPI, dMAX (DMA, Max, Max, Control) and Config
• Peripherals: SPI 0, SPI 1, IIC 0, IIC 1, McASP 0, McASP 1, McASP 2, RTI Timer
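The "circular and multi-tap memory delay (for reverb)" mentioned for dMAX refers to a standard audio structure that can be modelled in software. The sketch below is a plain software model of a multi-tap delay line on a circular buffer. It is not the dMAX programming interface, and the class name, tap format, and gains are all assumptions chosen for illustration.

```python
class MultiTapDelay:
    """Software model of a multi-tap delay line on a circular buffer."""

    def __init__(self, length, taps):
        # taps: list of (delay_in_samples, gain); each delay must fit in the buffer
        if any(delay < 1 or delay > length for delay, _ in taps):
            raise ValueError("tap delays must be between 1 and the buffer length")
        self.buffer = [0.0] * length
        self.write_index = 0
        self.taps = list(taps)

    def process(self, sample):
        """Return the input plus the weighted delayed samples, then store the input."""
        size = len(self.buffer)
        output = sample
        for delay, gain in self.taps:
            # Index wraps around: this is the circular addressing.
            output += gain * self.buffer[(self.write_index - delay) % size]
        self.buffer[self.write_index] = sample
        self.write_index = (self.write_index + 1) % size
        return output


delay_line = MultiTapDelay(length=4, taps=[(2, 0.5), (4, 0.25)])
impulse_response = [delay_line.process(x) for x in [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
print(impulse_response)
```

Feeding an impulse shows one echo per tap, each at its own delay and gain; a reverb would typically combine many such taps.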
Memory Architecture
• New memory architecture
  – Improved instruction cache
    • Size increased from 4KB to 32KB
    • Cache miss penalty to internal memory reduced 40%
    • Supports internal RAM/ROM and EMIF
  – Direct single-level flat memory for data, single-cycle access (ROM and RAM)
  – All RAM and ROM is accessible as program or data (like C6713)
Enhancements – DP, Code Density
• Changes in 67x+
  – All changes are backwards compatible to 67x CPU (C6713)
  – General-purpose registers increased from 32 to 64
  – New MPYSPDP instruction: SP x DP into DP
  – New MPYSP2DP instruction: SP x SP into DP
  – Additional ADDSP, ADDDP, SUBSP, SUBDP in S unit
    • Now have 4 floating-point adds or subtracts in parallel
  – Execution packets can span fetch packets (64x feature)
    • Code size reduction (5 to 10%) since no padding with NOPs
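Why an "SP x SP into DP" multiply is useful can be shown numerically: the product of two single-precision values generally needs more mantissa bits than single precision holds, but fits in double precision. The sketch below models only the arithmetic, not the instruction's latency, rounding modes, or encoding. The helper names are hypothetical.

```python
import struct


def to_sp(value):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def mpysp_model(a, b):
    """SP x SP into SP: the product is rounded back to single precision."""
    return to_sp(to_sp(a) * to_sp(b))


def mpysp2dp_model(a, b):
    """SP x SP into DP: the product is kept in double precision."""
    return to_sp(a) * to_sp(b)


a = 1.0 + 2.0 ** -23          # exactly representable in single precision
sp_product = mpysp_model(a, a)
dp_product = mpysp2dp_model(a, a)

print("SP result:", repr(sp_product))
print("DP result:", repr(dp_product))
print("bits lost in SP:", dp_product - sp_product)
```

Here the double-precision result keeps the low-order term 2^-46 that the single-precision result rounds away, which matters when many such products are accumulated.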
Benchmark Performance
Performance: The BDTImark™
™ Berkeley Design Technology, Inc., Berkeley, CA
BDTImark benchmark functions:
• Real block FIR filter
• Complex block FIR filter
• Single-sample LMS-adaptive FIR filter
• Single-sample real FIR filter
• Single-sample IIR filter
• Vector dot product
• Vector add
• Vector maximum
• IS-54 convolutional encoder
• Finite state machine
• 256-point FFT
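To make the first benchmark function concrete, here is a minimal reference model of a real block FIR filter. It is a plain textbook formulation, not BDTI's benchmark code and not an optimized DSP kernel. The function name and the zero-history convention are assumptions.

```python
def block_fir(samples, coeffs):
    """Real block FIR: y[n] = sum over k of coeffs[k] * x[n - k],
    treating samples before the start of the block as zero."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(coeffs):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate per tap
        output.append(acc)
    return output


# Two-tap moving average applied to a short ramp.
filtered = block_fir([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
print(filtered)
```

The inner loop is one multiply-accumulate per tap, which is why MAC throughput dominates this kind of benchmark.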
’C67x: Floating-Point Performance*
BDTImark™: A DSP Speed Metric (Source: www.BDTI.com, ©1999 BDTI)
[Bar chart] Processors compared:
• TI TMS320C67x, 1 GFLOPS
• TI TMS320C4x, 25 MIPS, 60 MFLOPS
• TI TMS320C3x, 30 MIPS, 80 MFLOPS
• ADI ADSP-2106x, 60 MIPS
• Intel Pentium, 200 MHz
Bar values shown on the chart: 23, 17, 9, 7, 65
*Commercial Temp
’C67x: Benchmark Performance*
Floating-point performance, execution time (in Sec)
[Bar chart] Benchmarks: Matrix Vector Multiply, Convolution, Block FIR, Complex Radix-4 FFT
Processors compared: Typical Floating-Point DSP (60 MFLOPS); TI TMS320C6701 (1 GFLOPS)
Values shown on the chart: 108.33, 0.420, 0.828, 13.296; 149, 16.6, 1.25, 1,672
*Commercial Temp
C28x Digital Signal Controller: TMS320F2812 (block diagram)
• CPU: 150 MIPS C28x™ 32-bit DSP
  – 32x32-bit multiplier
  – RMW atomic ALU
  – 32-bit register file
  – 32-bit timers (3)
  – Real-time JTAG
  – Interrupt management
• Memory bus: 128Kw Flash + 1Kw OTP, 18Kw RAM, 4Kw Boot ROM, XINTF
• Peripheral bus: Event Mgr A, Event Mgr B, 12-bit ADC, Watchdog, GPIO, McBSP, CAN 2.0B, SCI/UART-A, SCI/UART-B, SPI
TMS320F2812 Features and Benefits
Feature: 150-MHz C28x 32-bit DSP core
Benefit: The C28x 32-bit DSP core enables high-speed execution of control algorithms. Faster control code execution gives headroom for advanced control techniques, enabling greater efficiency and cutting-edge features.
Feature: Unique control peripherals
Benefit: The 12-bit high-speed dual-sample-and-hold ADC allows simultaneous sampling of power system currents and voltages; the Event Manager modules provide a hardware interface for sensored or sensorless three-phase inverter control.
Feature: On-chip communication peripherals
Benefit: CAN, I2C, SPI, UART, and the external memory interface allow for a full system implementation.
C28x CPU
• 32-bit fixed-point DSP
• RISC instruction set
• 8-stage protected pipeline
• 32x32-bit fixed-point MAC for single-cycle 32-bit multiply
• Dual 16x16-bit fixed-point MACs
• Single-cycle instruction execution
Modified Harvard Bus Architecture
• Separate data and instruction buses
• Two data buses: one for read, one for write
• Enables fetch, read, and write in a single cycle
• Essential to maximizing single-cycle MAC
Emulation Logic
• Real-time emulation allows interrupt servicing even when main program is halted
• Debug host has direct access to registers and memory
• Multiple hardware debug events and breakpoints
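The "dual 16x16-bit fixed-point MACs" bullet can be illustrated with a small fixed-point model. The sketch below computes a Q15 dot product with two multiply-accumulates per loop step, keeping the running sums as Q30 integers. It is a numerical illustration only: the function names, the Q15/Q30 formats, and the even/odd accumulator split are assumptions, and accumulator overflow is not modelled because Python integers are unbounded.

```python
def to_q15(value):
    """Convert a float in [-1, 1) to a Q15 integer, saturating at the limits."""
    return max(-32768, min(32767, int(round(value * 32768))))


def dual_mac_dot(a_q15, b_q15):
    """Q15 dot product using two 16x16 multiply-accumulates per step.

    Returns the sum as a Q30 integer (product of two Q15 values).
    """
    if len(a_q15) != len(b_q15) or len(a_q15) % 2:
        raise ValueError("vectors must have equal, even length")
    acc_even = 0
    acc_odd = 0
    for i in range(0, len(a_q15), 2):
        acc_even += a_q15[i] * b_q15[i]          # first MAC of the pair
        acc_odd += a_q15[i + 1] * b_q15[i + 1]   # second MAC of the pair
    return acc_even + acc_odd


def q30_to_float(value):
    """Convert a Q30 integer back to a float."""
    return value / (1 << 30)


a = [to_q15(x) for x in (0.5, -0.25, 0.125, 0.75)]
b = [to_q15(0.5)] * 4
dot = q30_to_float(dual_mac_dot(a, b))
print(dot)
```

Processing two taps per step halves the loop count for a filter or dot product, which is the practical benefit of a dual MAC.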
C28x Core: Bus Structure
The C28x multiple-bus architecture makes better use of the processor cycles: instruction fetch, decode and execute can happen on the same clock cycle.
[Block diagram]
• Buses: Program Address Bus (22), Program Data Bus (32), Data Address Bus (32), Data Data Bus (32), Program Write Bus (32), Data Write Bus (32), Register Bus
• Registers: XAR0 to XAR7, SP, DP @X, ARAU
• Execution: MPY 32x32, XT, P, ACC, ALU, R-M-W Atomic ALU
• Debug: Real-Time Emulation & Test Engine, JTAG
• Memory: Data (4 G * 16), Program (4 M * 16)
• Standard Peripherals, External Interfaces
C28x Core: Protected Pipeline
8-stage pipeline:
• F1: Instruction address
• F2: Instruction content
• D1: Decode instruction
• D2: Resolve operand address
• R1: Operand address
• R2: Get operand
• X: CPU doing “real” work
• W: Store content to memory
[Pipeline diagram: instructions A through G enter the pipeline on successive cycles, each passing through F1 F2 D1 D2 R1 R2 X W. E & G access the same address.]
Protected pipeline
• Order of results is as written in source code
• Programmer need not worry about the pipeline
• Writes are “free”
Many MCUs
• Shared bus for program and data address and content
• Typically results in only one instruction in 4 cycles
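The staggered pipeline diagram can be regenerated with a short script. The sketch below models an ideal schedule with one instruction entering F1 per cycle and no stalls, using the stage names from the slide. How the hardware resolves the "E & G access same address" case is not modelled. The function name and the cycle numbering from 1 are assumptions.

```python
STAGES = ["F1", "F2", "D1", "D2", "R1", "R2", "X", "W"]


def pipeline_schedule(instructions):
    """Return {name: {stage: cycle}} for an ideal pipeline with no stalls,
    where one instruction enters F1 per cycle (cycles numbered from 1)."""
    return {
        name: {stage: start + offset for offset, stage in enumerate(STAGES)}
        for start, name in enumerate(instructions, start=1)
    }


schedule = pipeline_schedule("ABCDEFG")
total_cycles = len(schedule) + len(STAGES) - 1

for name, stages in schedule.items():
    row = ["  "] * total_cycles
    for stage, cycle in stages.items():
        row[cycle - 1] = stage
    print(name, " ".join(cell.ljust(2) for cell in row))
```

In this ideal schedule, G's operand read (R2) falls in the same cycle as E's write (W). That is the kind of conflict a protected pipeline has to guard against so that results still appear in source-code order.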
C28x Core: Instruction Set for Control
Read/Modify/Write and Atomic Operation
Offers sufficient hardware resources to efficiently handle control algorithms
[Diagram: Registers and CPU ALU / MPY exchanging data with Memory through LOAD/READ and STORE/WRITE]

RISC Read/Modify/Write (6 words / 5 cycles):
    SETC INTM
    MOV  AL,*XAR2
    AND  AL,#1234h
    MOV  *XAR2,AL
    CLRC INTM

DSP Read/Modify/Write (5 words / 4 cycles):
    SETC INTM
    AND  AL,*XAR2,#1234h
    MOV  *XAR2,AL
    CLRC INTM

C28x Atomic Operation (2 words / 1 cycle):
    AND  *XAR2,#1234h

Atomic instructions benefits:
• Simpler programming
• Smaller, faster code
• Non-interruptible operations
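Why the multi-instruction sequences are wrapped in SETC INTM / CLRC INTM can be shown with a toy model. The sketch below simulates an interrupt service routine that modifies the same memory word, and compares an unprotected load/AND/store, the same sequence with interrupts masked, and a single atomic AND. The memory dictionary, the address, the ISR, and the point at which the interrupt arrives are all assumptions made for illustration.

```python
MASK = 0x1234
ADDR = 0x8000  # hypothetical address standing in for *XAR2


def isr_set_bit0(memory):
    """Hypothetical interrupt service routine that sets bit 0 of the shared word."""
    memory[ADDR] |= 0x0001


def load_and_store(memory, interrupts_masked):
    """Model of MOV AL,*XAR2 / AND AL,#1234h / MOV *XAR2,AL with an
    interrupt arriving between the load and the store."""
    al = memory[ADDR]                # load
    al &= MASK                       # modify in the register
    if not interrupts_masked:
        isr_set_bit0(memory)         # ISR runs mid-sequence
    memory[ADDR] = al                # store overwrites whatever the ISR wrote
    if interrupts_masked:
        isr_set_bit0(memory)         # pending interrupt is serviced afterwards


def atomic_and(memory):
    """Model of AND *XAR2,#1234h: the read-modify-write cannot be interrupted."""
    memory[ADDR] &= MASK
    isr_set_bit0(memory)             # interrupt is serviced after the instruction


unprotected = {ADDR: 0xFFFE}
load_and_store(unprotected, interrupts_masked=False)

masked = {ADDR: 0xFFFE}
load_and_store(masked, interrupts_masked=True)

atomic = {ADDR: 0xFFFE}
atomic_and(atomic)

print(hex(unprotected[ADDR]), hex(masked[ADDR]), hex(atomic[ADDR]))
```

Without protection the ISR's update to bit 0 is lost. Masking interrupts and the atomic instruction both preserve it, but the atomic form needs no masking instructions at all.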
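PIE: Peripheral Interrupt Expansion
[Block diagram]
• Internal sources: EV and non-EV peripherals (EV, ADC, SPI, SCI, McBSP, CAN); timer interrupts TINT0, TINT1, TINT2
• External sources: XINT1, XINT2, PDPINTx, RS, XNMI_XINT13
• PIE (Peripheral Interrupt Expansion) block between the interrupt sources and the core
• C28x core interrupt inputs: INT1, INT2, INT3, ... INT12, INT13, INT14, NMI, RS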
C28x Core: Fast Interrupt Response
Latency is minimized
Latency: time from when an interrupt occurs to decoding (D2) the first ISR instruction
[Timing diagram, cycle counts in parentheses]
• External signal: sync interrupt signal (2)
• Internal signal: PIE HW sync, set PIEIFR (1)
• Set IFR (1)
• INTx: interrupt jammed into pipeline
• Vector fetch, auto context save (8)
• Decode 1st ISR instruction
Minimum latency:
• Internal peripherals: 10-14 cycles (100 ns @ 100 MHz)
• External signals: 11 cycles (110 ns @ 100 MHz)
Maximum latency: depends on wait states, ready, INTM, etc.
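The nanosecond figures follow directly from the cycle counts and the clock rate. The sketch below does that conversion for the cycle counts quoted on the slide. The function name is hypothetical, and the 150 MHz line is included only because that is the F2812 core speed given earlier in the deck.

```python
def latency_ns(cycles, clock_mhz):
    """Convert a latency in CPU cycles to nanoseconds at the given clock."""
    return cycles * 1000.0 / clock_mhz


internal_min_ns = latency_ns(10, 100)   # lower end of the internal range
internal_max_ns = latency_ns(14, 100)   # upper end of the internal range
external_ns = latency_ns(11, 100)       # external signal
internal_150_ns = latency_ns(10, 150)   # same cycle count at 150 MHz

print(internal_min_ns, internal_max_ns, external_ns, round(internal_150_ns, 1))
```

At 100 MHz each cycle is 10 ns, so the quoted 100 ns matches the 10-cycle end of the internal range, and 14 cycles would be 140 ns.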
C2000™ real-time controllers software
ControlSuite™ Software
• Software infrastructure and tools for every stage of development and evaluation
• Allows customers to focus on differentiation, not basics
Key Functional Areas:
• Device Support (Bit fields, API Drivers, Examples)
• Library Repository (Math Library, DSP Library, Application Library, Utilities)
• Development Kits (Hardware Package, Software Examples, Complete System Framework, Graphical User Interfaces)
• Debug and Software Tools (IDE, RTOS, Emulation)
Integrated Development Environment (IDE):
• Eclipse-based Code Composer Studio™ IDE supports all
Application Specific Software:
• Motor Control Software Library: supports multiple motor types and control techniques (ex: FOC (sensored and sensorless) for ACI, PMSM)
• Digital Power Software Library: library for both C28x Core and CLA
Tools/Reference Designs: ControlSticks, ControlCards, Evaluation Kits
Software Highlights
ControlSuite, Application Notes, Users Guide
Getting Started
Development Tools
Tools
• Code Composer is an Integrated Development Environment (IDE) similar to MS Visual C++ and built specifically for DSP
• DSP/BIOS is a library of scheduling, instrumentation, and communications functions that provides real-time analysis and RTDX™ (Real-Time Data Exchange)
• Hardware emulation and evaluation tools allow code debug on actual silicon and low-cost analysis of performance in early stages of the development cycle
• Code Composer Studio provides an extensible tool plug-in and seamless integration between the host and target DSP tools
10/19/11
CCSv4/v5
Tabbed editor windows
Tab data displays together to save space
Fast view windows don’t display until you click on them
Perspectives contain separate window arrangements depending on what you are doing
Customize toolbars & menus
Code Composer Studio v5
CCSv5 is split into two phases:
• 5.0
  – Not a replacement for CCSv4
  – Targeted at users who are using devices running Linux & multi-core C6000
  – Addresses a need (Linux debug) that is not supported by CCSv4
  – Available today
• 5.1
  – Replacement for CCSv4, targeted at all users
  – Available fall 2011
• Supports both Windows & Linux
  – Note that not all emulators will be supported on Linux
    • SD DSK/EVM onboard emulators, XDS560 PCI are not supported
    • Most USB/LAN emulators will be supported: XDS100, SD 510USB/USB+, 560v2, BH 560m/bp/lan
  – http://processors.wiki.ti.com/index.php/Linux_Host_Support
Code Composer Studio v4
• Easy to use, Eclipse based IDE: Compiler, linker, more
• Supports all MSP430 MCUs
• Enhancements since CCE v3:
– Speed
– Code size improvements
– Auto-updating
• $495 for CCS v4 MCU Edition
• Free for apps <16KB
• Same look and feel as Code Composer Essentials
http://wiki.msp430.com/wiki/index.php?title=Category:Code_Composer_Studio_v4
Analyze: Visualize data
Graphical Signal Analysis:
– View signals in native format
– Change variables on the fly and see their effects
– Numerous application-specific graphical plots
  • FFT waterfall
  • Eye diagram
  • Constellation plot
  • Image displays & more
– Requires no additional code
BACKUP
C6701 DSP Block Diagram
C672x DSP Block Diagram
THANK YOU