
Renesas Electronics America Inc.

© 2010 Renesas Electronics America Inc. All rights reserved.

ID 112C: MCU Architecture Evolution – Now Better than Ever – So who’s the Best?

Mark Rootz

Sr. Marketing Manager

12 October 2010

Version: 1.2


Mark Rootz

Renesas Sr. Marketing Manager, 32-bit MCUs

Definition and Promotion of 32-bit MCUs, N. America

BSEE and MSEE from University of Missouri – Rolla

Seven years at STMicroelectronics

Marketing Manager, STR9 32-bit ARM9 MCU line (France)

Product Marketing Manager, uPSD 8-bit 8051 MCU (San Jose CA)

Product definition, technical marketing, business mgt, infrastructure

Three years at Waferscale Inc

Applications Manager, uPSD MCUs

Tools, software, training, documentation, solutions, silicon validation

Three years at Hypertech Inc

Project Manager and engineering

Automotive powertrain controller software and hardware

Twelve years at McDonnell Aircraft (now Boeing)

Project Manager and engineering

F15/F18 fighter avionics systems engineering (weapons, radar, navigation)

Real-time simulation/test environment for complete avionics suite

Embedded MCUs, MPUs, PLDs software and hardware design


Renesas Technology and Solution Portfolio

Microcontrollers & Microprocessors: #1 Market share worldwide*

Analog and Power Devices: #1 Market share in low-voltage MOSFET**

ASIC, ASSP & Memory: Advanced and proven technologies

Solutions for Innovation

* MCU: 31% revenue basis from Gartner "Semiconductor Applications Worldwide Annual Market Share: Database" 25 March 2010

** Power MOSFET: 17.1% on unit basis from Marketing Eye 2009


Microcontroller and Microprocessor Line-up

Superscalar, MMU, Multimedia Up to 1200 DMIPS, 45, 65 & 90nm process

Video and audio processing on Linux

Server, Industrial & Automotive

Up to 500 DMIPS, 150 & 90nm process

600uA/MHz, 1.5 uA standby

Medical, Automotive & Industrial

Legacy Cores

Next-generation migration to RX

High Performance CPU, FPU, DSC

Embedded Security

Up to 10 DMIPS, 130nm process

350 uA/MHz, 1uA standby

Capacitive touch

Up to 25 DMIPS, 150nm process

190 uA/MHz, 0.3uA standby

Application-specific integration

Up to 25 DMIPS, 180, 90nm process

1mA/MHz, 100uA standby

Crypto engine, Hardware security

Up to 165 DMIPS, 90nm process

500uA/MHz, 2.5 uA standby

Ethernet, CAN, USB, Motor Control, TFT Display

High Performance CPU, Low Power

Ultra Low Power General Purpose


RX: Performance without Sacrifice

High Performance CPU, FPU, DSC

High Performance CPU, Low Power

Superscalar, MMU, Multimedia Up to 1200 DMIPS, 45, 65 & 90nm process

Video and audio processing on Linux

Server, Industrial & Automotive

Up to 500 DMIPS, 150 & 90nm process

600uA/MHz, 1.5 uA standby

Medical, Automotive & Industrial

High Performance CPU, Low Power

Legacy Cores

Next-generation migration to RX

Up to 165 DMIPS, 90nm process

500uA/MHz, 2.5 uA standby

Ethernet, CAN, USB, Motor Control, TFT Display

Key Attributes

RX Innovation – Single Chip Enablement

There are many 32-bit MCU/DSP Architectures covering varied capabilities: PIC32, CortexM3/M4, ColdFire, Kinetis, TMS320, ARM7/9, AVR32

In a single Family of devices, RX will Encompass / Exceed these Capabilities

RX Innovation – Single Chip Enablement

A single RX MCU can:

• Interpret a multitude of analog and digital input sources

• Generate precision analog and digital outputs in real time


RX Innovation – Single Chip Enablement

One MCU family for many applications

* Photos are examples of end-products that

could use an RX600 MCU. RX600 MCUs not

necessarily used in these products.


RX Microcontrollers … Best of the Best

RX MCUs were conceived and designed from the best CPU

architecture and technology available in the industry today

delivering the perfect blend of:

• CPU and Memory Performance

• Analog and DSP Capability

• Power and Memory Efficiency

• Scalability

• Connectivity

• System Cost

“Best of the Best”


Agenda

Traditional Architectures

32-bit Choices

RX Architecture

Memory Speed vs. Performance

Comparing with Other 32-bit MCUs

Who’s the Best?

Q & A


Key Takeaways

By the end of this session you will be able to:

Understand Key MCU Architectural Elements

Understand RX Architecture

Compare RX with Other Architectures

Make an Informed Decision


MCU, DSP, Digital Signal Controller … What’s the Difference?

Traditional MCUs

• Single-Chip Device

• Interrupt Management System

• Fast Interrupt Response

• Efficient General Instructions

• Fine Power Management

• Wide Connectivity Choice

• Rich Supervisory Functions

• Easily Programmed in C

• Simple Low-Cost Tools

• Broad Ecosystem

• Simple Integer Math

Traditional DSPs

• Multi-Chip Solution

• Single-Task Oriented

• Slower Interrupt Response

• Very Specific Instructions

• High Power Consumption

• Limited Connectivity Choice

• Few Supervisory Functions

• Complex Software

• More Expensive Special Tools

• Narrow Selection of 3rd Parties

• Hardware Multiply and Divide

• Saturating Math

• 1-Cycle, wide Multiply-Accumulate

• Barrel Shifters

• Simultaneous Code/Data Access

• Floating Point Unit

DSC: Optimum Blend of MCU and DSP


The Evolved DSC, Many Practical Uses

More MCUs are gaining DSC Features

MCUs now have better analog capabilities

Signal processing is a must

Pushes bandwidth limits of traditional MCUs

DSC Applications

Motor Control

Digital Power Management

Audio Codecs

Medical Monitoring

Factory Automation

Even benefits traditional MCU applications

More work in less time


16/32-bit MCUs and DSCs in the Market

MCUs

| Core | Vendor | CPU Width (bits) | DMIPS/MHz of CPU Core | Available Frequency (MHz) | Flash Speed (MHz) | Max Flash Size (KB) |
| V850ES | Renesas | 32 | 1.90 | 20 - 50 | 32 | 1024 |
| ARM CortexM3 | Various | 32 | 1.25 [7] | 60 - 150 | <=50 [2] | 1024 |
| PIC32 [6] | Microchip | 32 | 1.56 | 40 - 80 | 30 | 512 |
| ARM7TDMI (Flash) | Various | 32 | 0.95 [7] | 24 - 60 | <=30 [8] | 1024 |

DSCs

| Core | Vendor | CPU Width (bits) | DMIPS/MHz of CPU Core | Available Frequency (MHz) | Flash Speed (MHz) | Max Flash Size (KB) | MAC (result width bits) | FPU (width bits) |
| SH-2A (Flash) | Renesas | 32 | 2.00 | 100 - 200 | 100 | 1024 | 32 and 64 | 64 |
| RX600 | Renesas | 32 | 1.65 | 80 - 100 | 100 | 2046 | 48 and 80 | 32 |
| AVR32 [9][10][11] | Atmel | 32 | 1.50 | 40 - 66 | 33 | 512 | 32, 48, and 64 | - |
| ARM CortexM4 [12][13] | Various | 32 | 1.25 | 150 [1] | <=50 [2] | 1024 | 32 and 64 | 32 [3] |
| STR9 ARM966E [14] | ST | 32 | 1.10 | 96 | 33 | 2048 | 32 and 64 | - |
| TMS320 Delfino (Flash) [15] | TI | 32 | n/a | 100 - 150 | 27 | 512 | 64 | 32 |
| TMS320 Piccolo [16] | TI | 32 | n/a | 40 - 60 | 25 | 128 | 64 | - |
| 56F8000/8300 [17] | Freescale | 16 | 1.00 [4] | 32 - 60 | No spec | 512 | 36 | - |
| dsPIC [18] | Microchip | 16 | 0.50 [5] | 60 - 80 | No spec | 256 | 40 | - |

[1] Core is capable of, no released product yet

[2] Based on existing CM3 and CM4-based MCUs in mass production today

[3] Optional FPU

[4] MIPS, not DMIPS

[5] MIPS, not DMIPS. 80MHz external clock yields 40MIPS

[6] Microchip, PIC32MX3XX/4XX Family Data Sheet, DS61143E

[7] ARM, “An Introduction to the ARM Cortex-M3 Processor”, Oct 2006

[8] Renesas 32-bit Flash MCU market assessment

[9] Atmel, AVR32 brochure 7919F-AVR32-07/09/5K

[10] Atmel, AVR32 Architecture Document 32000B-AVR32-11/07

[11] Atmel, AT32UC3A datasheet 32058G-AVR32-01/09

[12] ARM, CortexM4 Features Summary, www.arm.com

[13] ARM, Cortex-M4 Technical Reference Manual r0p0

[14] ST, STR91xFAxxx datasheet 13495 rev 6

[15] TI, Data Manual, TMS320F283xx & TMS320F282xx DSCs, SPRS439H, March 2010

[16] TI, Data Manual, TMS320F280xx MCUs, SPRS584D, June 2010

[17] Freescale, Data Sheet, 56F8323/56F8123 16-bit DSCs, MC56F8323 rev 17, May 2007

[18] Microchip, Data Sheet, dsPIC33FJXXXMCX06A/X08A/X10A, 16-bit DSCs, DS70594B, 2009

RX is Best of Both CISC and RISC

Traditional CISC (Complex Instruction Set Computer). GOAL: Small Memory Footprint

• Any inst accesses memory

• Many rich instructions

• Many addressing modes

• Variable instruction formats

• Smaller code size in memory

• Single register set

• Multi-clock instructions

• Less to no pipelining

• Longer interrupt response

Traditional RISC (Reduced Instruction Set Computer). GOAL: 1 Clock per Instruction

• Only load/store mem access

• Few instructions

• Few addressing modes

• Fixed instruction formats

• Larger code size in memory

• Multiple register sets

• Single-clock instructions

• Highly pipelined

• Faster interrupt response

Let’s Build an RX…

From CISC:

• Mem-to-Mem instructions

• 73 Inst + DSP + FPU

• 10 addressing modes

• 1 to 8 byte instructions

• Up to 28% smaller code

From RISC:

• 16 x 32-bit registers

• One clock per instruction

• 5-stage pipeline

• 5-clock interrupt response

Plus it has an FPU.

RX Architecture … CPU Core and Pipeline

RX600 CISC CPU: 100MHz CPU Core, 1.65 DMIPS/MHz

• 16 x 32bit General Purpose Registers

• 9 x 32bit Control Registers

• 32bit Floating Point Unit

• 16x16 or 32x32 MAC, 48bit or 80bit Result

• 32 x 32 DIV or MULT, 32bit or 64bit Result

• Memory Protect Unit, Interrupt Control, On-Chip Debug

ENHANCED HARVARD ARCHITECTURE

• Instruction: 64bit path to the Memory Interface, typically Flash Memory. RX Flash is 10 nsec, or 100 MHz zero-wait

• Operand (Data): 32bit path to the Memory Interface, typically SRAM. RX SRAM is also 10 nsec

• PRE-FETCH QUEUE (PFQ): Holds 4 to 32 Instructions for Slower Memory

• WRITE BUFFER: For Slow Memory. Buffer Only for Writes

5-STAGE PIPELINE

F = FETCH INSTRUCTION

D = DECODE INSTRUCTION

E = EXECUTE INSTRUCTION

M = READ OR WRITE MEMORY

W = WRITE BACK TO REGISTER

[Figure: pipeline timing diagram showing successive instructions overlapping in the F, D, E, M, and W stages on each clock TICK]

Achieves One Clock-Per-Instruction (CPI)
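The overlap in the pipeline figure can be sketched in a few lines of Python. This is only an idealized illustration, assuming every instruction spends exactly one clock in each of the five stages with no stalls; the function names are hypothetical and nothing here models the actual RX600 hardware.

```python
# Idealized 5-stage pipeline: one instruction enters per clock, no stalls.
STAGES = ["F", "D", "E", "M", "W"]


def pipeline_chart(n_instructions):
    """Return one text row per instruction; column k is the stage held on clock k."""
    total_clocks = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_clocks
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage  # instruction i enters F on clock i
        rows.append("".join(row))
    return rows


def clocks_per_instruction(n_instructions):
    """Average clocks per instruction once the pipeline fill time is included."""
    total_clocks = n_instructions + len(STAGES) - 1
    return total_clocks / n_instructions


if __name__ == "__main__":
    for row in pipeline_chart(5):
        print(row)
    print(clocks_per_instruction(5))     # short run: fill time dominates
    print(clocks_per_instruction(1000))  # long run: approaches 1
```

For long instruction streams the four-clock fill time becomes negligible, which is the sense in which a 5-stage pipeline approaches one clock per instruction.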

RX Architecture … Memory Interface

[Figure: RX600 MCU block diagram showing the RX600 CPU (100MHz; PIPELINE, PFQ, BUFFER; 64b INST, 32b DATA) as Bus Master of Internal Main Bus 1, Flash Memory (100MHz Access, 64 bits), SRAM (100MHz Access, 64 bits), the BUS MATRIX, Internal Main Bus 1 (32 bits), the External Bus Controller (BSC, 32 bits) with External Bus Pins for CPU, and a Bus Bridge to the Peripherals]

• 100 MHz Flash and SRAM means zero wait-state code and data access

• PFQ minimizes stalls from slower memory, such as external memory

• Bus master of Internal Bus 1 is the CPU

Next we look at Internal Bus 2…

RX Architecture … System Interface

Multiple Peripheral Busses to Spread Bandwidth Loading

[Figure: RX600 MCU block diagram showing the RX600 CPU (100MHz; PIPELINE, PFQ, BUFFER; 64b INST, 32b DATA) as Bus Master of Internal Main Bus 1 (32 bits), Flash Memory and SRAM (100MHz Access, 64 bits each), the BUS MATRIX, Bus Bridges, Internal Main Bus 2 (32 bits), the External Bus Controller (BSC) with External Bus Pins for CPU serving One External Device and Another External Device, and the Ethernet MAC with two 2K FIFOs]

Bus masters shown with Internal Main Bus 2:

• DTC (bus master)

• DMAC (bus master)

• Ethernet DMAC (bus master)

• EXDMA (external bus master)

Peripheral groups, shown with CNTL blocks:

• Communication (USB, CAN, SCI, SPI, I2C)

• Timers (MTU, TPU, TMR, CMT)

• Analog (DAC, ADC, PGA)

• GPIO

• System Control (DMA, E2P, ICU, LVD, RTC, WDG, CLKS)

RX CPU Core Performance

[Figure: bar chart of DMIPS per MHz (axis marks at 1.0 and 1.5) comparing ARM7, ARM9, Cortex-M3, Cortex-M4, and RX. RX: 1.65 DMIPS/MHz]

Note: Dhrystone 2.1 numbers for ARM processors taken from www.arm.com

Up to 43% Power Reduction

Low power design techniques

• Clock gating

• Low power HVT transistors in slower paths

• Power gating

Low power modes

• 500uA* per MHz in Run Mode, All Peripherals ON

• Four Low-Power Modes: Sleep, All-Module Stop, Standby, Deep Standby

• 2.5uA* in Deep Standby (RX63x, RTC ON)

[Figure: bar chart of Milliwatts* per DMIPS (axis marks at 1.0 and 2.0) comparing RX600 with a Cortex-M3 based MCU. RX600 is 43% less]

Note: Derived from IDD specifications stated in product datasheets

* Typical Conditions, 3.3V and 25°C, all peripheral clocks on
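The milliwatts-per-DMIPS figure can be cross-checked from numbers quoted elsewhere in this deck: 500uA/MHz run current, a 3.3V supply, and 1.65 DMIPS/MHz. The sketch below is a back-of-the-envelope calculation under those typical conditions, not a datasheet method; the function name is an assumption made for illustration.

```python
def milliwatts_per_dmips(ua_per_mhz, supply_volts, dmips_per_mhz):
    """Convert run current per MHz into power per DMIPS."""
    ma_per_mhz = ua_per_mhz / 1000.0         # uA -> mA
    mw_per_mhz = ma_per_mhz * supply_volts   # mA * V = mW
    return mw_per_mhz / dmips_per_mhz


if __name__ == "__main__":
    rx600 = milliwatts_per_dmips(500, 3.3, 1.65)
    print(round(rx600, 2))                   # mW per DMIPS for RX600
    # If RX600 is "43% less", the compared MCU is implied to be about:
    print(round(rx600 / (1 - 0.43), 2))
```

Under these assumptions the RX600 lands at about 1 mW/DMIPS, which agrees with the answer given on the Questions slide.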


RX600 Instruction Set

= Single clock instruction

RX Instruction Set Summary and Size

| Instruction Length (bytes) | List of Instructions | Number of Instructions |
| 1 | NOP, RTS, BRK | 3 |
| 1-3 | BCnd | 1 |
| 1-4 | BRA | 1 |
| 2 | RMPA, ROLC, RORC, SAT, SATR, POP, POPC, POPM, PUSHC, PUSHM, JMP, JSR, SCMPU, SMOVB, SMOVF, SMOVU, SSTR, SUNTIL, SWHILE, CLRPSW, RTE, RTFI, SETPSW, WAIT | 24 |
| 2-3 | ABS, NEG, NOT, SHAR, SHLL, SHLR, RTSD | 7 |
| 2-4 | MOVU, PUSH, BSR | 3 |
| 2-5 | SUB, BCLR, BSET, BTST | 4 |
| 2-6 | ADD, AND, CMP, MUL, OR | 5 |
| 2-8 | MOV | 1 |
| 3 | ROTL, ROTR, REVL, REVW, INT, MVFC, MACHI, MACLO, MULHI, MULLO, MVFACHI, MVFACMI, MVTACHI, MVTACLO, RACW | 15 |
| 3-5 | FTOI, ROUND, SCCnd, BMCnd, BNOT | 5 |
| 3-6 | SBB, ITOF, XCHG | 3 |
| 3-7 | DIV, DIVU, EMUL, EMULU, MAX, MIN, TST, XOR, FADD, FCMP, FDIV, FMUL, FSUB, MVTC | 14 |
| 4-6 | ADC | 1 |
| 4-7 | STNZ, STZ | 2 |

Total = 89 instructions

• 6% have minimum instruction length of 1 byte

• 49% have minimum instruction length of 2 bytes

• 42% have minimum instruction length of 3 bytes

MOV instruction length is 2-8 bytes
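The percentages on this slide follow directly from the instruction counts in the table. A small Python check, using only the (minimum length, maximum length, count) triples from the table above, reproduces them; the variable and function names are illustrative.

```python
# (min_bytes, max_bytes, number_of_instructions), one tuple per table row.
ROWS = [
    (1, 1, 3), (1, 3, 1), (1, 4, 1),
    (2, 2, 24), (2, 3, 7), (2, 4, 3), (2, 5, 4), (2, 6, 5), (2, 8, 1),
    (3, 3, 15), (3, 5, 5), (3, 6, 3), (3, 7, 14),
    (4, 6, 1), (4, 7, 2),
]


def share_by_min_length(rows):
    """Return {min_bytes: rounded percent of all instructions}."""
    total = sum(count for _, _, count in rows)
    by_min = {}
    for min_bytes, _, count in rows:
        by_min[min_bytes] = by_min.get(min_bytes, 0) + count
    return {k: round(100 * v / total) for k, v in sorted(by_min.items())}


if __name__ == "__main__":
    print(sum(count for _, _, count in ROWS))  # total instruction count
    print(share_by_min_length(ROWS))
```

The remaining share, instructions whose minimum length is 4 bytes, is the small group made up of ADC, STNZ, and STZ.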

MOV instruction example

Instruction length (bytes): diagram scale from 1 to 8

| Function | Source | Destination | Encoded fields |
| MEM→MEM | [Rs] | [Rd] | opcode, Rs, Rd |
| MEM→REG | [Rs] | Rd | opcode, Rs, Rd |
| REG→MEM | Rs | [Rd] | opcode, Rs, Rd |
| REG→REG | Rs | Rd | opcode, Rs, Rd |
| IMM→REG | #IMM:32 | Rd | opcode, Rd, #IMM:32 |
| IMM→MEM | #IMM:8 | [Rd] | opcode, Rd, #IMM:8 |
| IMM→MEM | #IMM:16 | [Rd] | opcode, Rd, #IMM:16 |
| IMM→MEM | #IMM:32 | [Rd] | opcode, Rd, #IMM:32 |
| IMM→MEM | #IMM:32 | dsp:16[Rd] | opcode, Rd, dsp:16, #IMM:32 |

Direct Memory-to-Memory operation

Example: Moving data in memory

Direct Memory-to-Memory operation allows RX to avoid lengthy load/store operations and results in smaller code size

RX

MOV [r1], [r2] (2 bytes)

Code size = 2 bytes

Number of Cycles = 3

Traditional RISC

LDR r3, [r1] (2 bytes)

STR r3, [r2] (2 bytes)

Code size = 4 bytes

Number of Cycles = 4

Up to 28% Code Size Reduction

[Figure: bar chart of Code size (relative), axis mark at 1.0, comparing RX600 with a Cortex-M3 based MCU for five benchmark categories: Motor control, Data communication, Data conversion, Real-time control, System control. Bar labels: 28% less, 19% less, 17% less, 25% less, 25% less]

Note: Internal benchmark test, your results may vary

RX makes Out-of-Order Instruction Decisions

Instructions:

1) MOV [R1], R2

2) ADD R4, R5

3) SUB R4, R5

[Figure: two pipeline timing diagrams against the CPU Clock. Legend: F = Fetch, D = Decode, E = Execute, M = Memory, WB = Write Back, S = Stall. In both diagrams instruction 1) takes two Memory clocks (F D E M M WB)]

First diagram: Instructions 2) and 3) delayed, waiting on 1). Each spends two clocks in Stall (S S).

Second diagram: Delay is Eliminated. Instructions 2) and 3) execute and write back without waiting on 1).

• Is possible when there are no dependencies

• Multiple WB within same clock cycle OK if destination is different

Interrupt Handling

[Figure: timelines from IRQ through ISR to return. The figure legend distinguishes steps that are Automatic by CPU from steps Done by Firmware]

RX Normal Interrupt: IRQ -> Resolve Interrupt, PC & PSW to Stack (7 clks typ.) -> Optional Push Gen Regs to Stack -> ISR -> Optional Pop Gen Regs from Stack -> POP PC & PSW from Stack, Return (6 clks)

RX Fast Interrupt: IRQ -> Resolve Interrupt, PC & PSW to Backup Regs (5 clks typ.) -> Optional Push Gen Regs to Stack -> ISR -> Optional Pop Gen Regs from Stack -> PC & PSW from B/U Regs, Return (3 clks). Save 5 clocks

RX Fast Interrupt plus Gen Register Usage: IRQ -> Resolve Interrupt, PC & PSW to Backup Regs (5 clks typ.) -> ISR -> Return (3 clks). Save many clocks

General CPU Registers: R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15

* ARM, Technical Reference Manuals: CortexM3 r1p1, CortexM4 r0p0

Interrupt Handling: RX and ARM Cortex M3 or M4*

[Figure: timelines from IRQ through ISR to return. The figure legend distinguishes steps that are Automatic by CPU from steps Done by Firmware]

RX Typical Interrupt: IRQ -> Resolve Interrupt, PC & PSW to Stack (7 clks typ.) -> Optional Push Gen Regs to Stack -> ISR -> Optional Pop Gen Regs from Stack -> POP PC & PSW from Stack, Return (6 clks)

RX Fast Interrupt: IRQ -> Resolve Interrupt, PC & PSW to Backup Regs (5 clks typ.) -> Optional Push Gen Regs to Stack -> ISR -> Optional Pop Gen Regs from Stack -> PC & PSW from B/U Regs, Return (3 clks)

RX Fast Interrupt plus Gen Register Usage: IRQ -> Resolve Interrupt, PC & PSW to Backup Regs (5 clks typ.) -> ISR -> Return (3 clks)

ARM Cortex M3 or M4*: IRQ -> Resolve Interrupt, and Push CPU State and 5 Regs to Stack (12 clks) -> ISR -> Pop CPU State and 5 regs from Stack, and Return (12 clks)

Save up to 16 clocks

* ARM, Technical Reference Manuals: CortexM3 r1p1, CortexM4 r0p0
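The "Save 5 clocks" and "Save up to 16 clocks" callouts can be reproduced from the entry and exit clock counts on these two slides. The sketch below simply adds them up; it leaves out the optional firmware push and pop of general registers on RX, and the dictionary keys are labels chosen here for illustration.

```python
# (entry clocks, exit clocks) as quoted on the slides.
INTERRUPT_CLOCKS = {
    "RX normal interrupt": (7, 6),
    "RX fast interrupt": (5, 3),
    "ARM Cortex M3 or M4": (12, 12),
}


def overhead(name):
    """Total automatic entry plus exit clocks for one interrupt."""
    entry, exit_ = INTERRUPT_CLOCKS[name]
    return entry + exit_


def clocks_saved(faster, slower):
    """How many clocks the faster scheme saves per interrupt."""
    return overhead(slower) - overhead(faster)


if __name__ == "__main__":
    print(clocks_saved("RX fast interrupt", "RX normal interrupt"))
    print(clocks_saved("RX fast interrupt", "ARM Cortex M3 or M4"))
```

Note that the ARM figure already includes pushing and popping five registers, so the comparison is closest to the RX case where the ISR works in dedicated general registers and needs no stack traffic.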

Floating-Point Unit

Typical Operation: the Floating-Point Unit uses Dedicated Data Registers, with Load/Store between them and the General Registers

RX Operation: FPU directly accesses General Registers. No Load/Store Instructions Needed

• Higher FPU performance

• Smaller code size


FPU Applications


Pressure regulator

Pump control

Thermo couple conversion

Motion Control

Motor Control

Flow Control

Digital filtering

FPU benefits: Two examples

1- Motor Control

• FPU removes limitations due to scaling or saturation

• Improves accuracy for motor position and speed

• Increases motor efficiency

• Easy code development and maintenance. Write formulas directly into C code

• Reduces CPU loading

• Reduces code size

[Figure: Sensorless vector motor control compiled for Fixed Integer vs Floating Point FPU]

2- Thermocouple Conversion

[Figure: bar chart of Size in Byte (axis 0 to 500) for four implementations: Look Up Table, Fixed Point Math, Software Library, FPU]

FPU provides the best combined execution time and code size

FPU Comparison

The FPU provides a dramatic increase in performance and code efficiency over math libraries.

Example: Conversion of thermocouple reading to temperature

Thermocouple formula: $\text{Temperature} = \sum_{n=0}^{5} a_n x^n$

n = 0 ~ 5; $a_0$ ~ $a_5$ are constants; $x$ is A/D reading

| MCU | Operating Frequency (MHz) | CPU Cycles (count) | Actual Execution Time (usec) | Execution Time with Ideal Memory (usec) | Code Size (bytes) |
| RX600 | 100 | 94 | 0.94 | 0.94 | 48 |
| A CM3-based MCU | 72 | 1130 | 15.7 | 14.7 | 892 |

RX600 is > 16x Faster and > 18x Smaller

• RX610 MCU: Renesas Compiler v0.02 Alpha, Size Max

• A CM3-based MCU: IAR Compiler v4.42A, Size Max
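To make the benchmark workload concrete, here is a minimal Python sketch of a fifth-order polynomial evaluation of the kind the thermocouple formula describes. It uses Horner's rule, which is one common way to evaluate such a polynomial; the slides do not say how the benchmark code was written, and the coefficient values below are invented placeholders, not real thermocouple constants.

```python
def thermocouple_temperature(x, coeffs):
    """Evaluate sum(a_n * x**n) for n = 0..len(coeffs)-1 using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):  # start from the highest-order coefficient
        result = result * x + a
    return result


# Hypothetical constants a0..a5, chosen only so the example runs.
EXAMPLE_COEFFS = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    adc_reading = 2.0  # stand-in for an A/D reading
    print(thermocouple_temperature(adc_reading, EXAMPLE_COEFFS))
```

With six coefficients this takes five multiplies and five adds in floating point, which is the kind of arithmetic a hardware FPU executes directly and a software math library must emulate. The ratios in the table are consistent with the callouts: 15.7 / 0.94 is about 16.7 and 892 / 48 is about 18.6.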

DSP Arithmetic Functions

Multiply and Accumulate (MAC)

• Two 16-bit operands, each from a General register

• Multiply-Accumulate unit with 48-bit result

Repeated Multiply and Accumulate (RMPA)

• Two 32-bit operands, from Memory (coefficients) and Memory (ADC Samples)

• Multiply-Accumulate unit with 80-bit result, accumulated on each repetition

Performance and Flash Speed

[Figure: Processing performance versus MCU frequency (marks at 30 MHz, 60 MHz, and 100 MHz), with pipeline timing (IF D E M WB) drawn for each case. Series: RX with 100 MHz Flash, and Competing MCU with 30 MHz Flash. The competing MCU's regions are labeled no wait, 1 wait cycle, and 2 wait cycles, with wait cycles (W) inserted before each instruction]
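The effect in this figure can be approximated with a very simple model. The sketch below assumes a worst case in which every instruction fetch pays the full Flash wait states, with no prefetch queue, wide fetch, or memory accelerator; real MCUs soften this, so treat the numbers as an upper bound on the penalty. The function names are illustrative.

```python
def wait_states(cpu_mhz, flash_mhz):
    """Wait states needed so one Flash access fits in whole CPU clocks."""
    clocks_per_access = -(-cpu_mhz // flash_mhz)  # ceiling division on integers
    return clocks_per_access - 1


def worst_case_fetch_rate_mhz(cpu_mhz, flash_mhz):
    """Instruction fetches per microsecond if every fetch pays the wait states."""
    return cpu_mhz / (1 + wait_states(cpu_mhz, flash_mhz))


if __name__ == "__main__":
    for cpu in (30, 60, 90):
        print(cpu, wait_states(cpu, 30), worst_case_fetch_rate_mhz(cpu, 30))
    print(100, wait_states(100, 100), worst_case_fetch_rate_mhz(100, 100))
```

In this model, raising the CPU clock above the Flash speed adds wait states without raising the fetch rate, while a Flash that matches the CPU clock scales one for one.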

DSP and Benefit of 10nsec Flash

FIR Filter, RX600 and a CM3-based MCU

[Figure: line chart, Lower is Better. X axis: MCU Operating Frequency (MHz), 16 to 100. Y axis: Completion Time, 100 iterations of FIR Algorithm (usec), 0.000 to 5.000. The CM3 curves are annotated at 1 wait state and 2 wait states, and "Better, but delay remains" for Memory Acceleration]

Series:

• A CM3 MCU Theoretical (73 CPU cycles per Iteration)

• A CM3 MCU Actual w/ Memory Acceleration

• A CM3 MCU Actual w/o Memory Acceleration

• RX600 Theoretical (46 CPU cycles per Iteration)

• RX600 Actual

What the curves show:

• Theoretical performance with “No-Wait Memory” for this CM3 MCU

• Performance loss due to Flash slower than CPU demand on a CM3 MCU

• Mitigation effect of Memory Acceleration on a CM3 MCU

• Theoretical performance with “No-Wait Memory” for RX600

• Theoretical is Identical to Actual performance for RX600 because of 10nsec Flash

RX has 63% better performance

Test conditions:

• 8 Tap FIR Filter, 16 x 16 to 32-bit accumulate

• RX610 MCU: Renesas compiler v1.0, Speed 2, macro used for RMPA

• A CM3-based MCU: IAR Compiler v5.40.0.315, Speed Max
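For readers unfamiliar with the benchmark kernel, an 8-tap FIR filter is a sliding multiply-accumulate over the last eight samples. The Python sketch below shows the arithmetic only; it is not the benchmark source, it does not model the 32-bit accumulator width (Python integers do not overflow), and the function name is an assumption.

```python
def fir_8tap(samples, coeffs):
    """Return the FIR output for every position where 8 samples are available."""
    if len(coeffs) != 8:
        raise ValueError("this sketch expects exactly 8 coefficients")
    outputs = []
    for i in range(len(coeffs) - 1, len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            acc += c * samples[i - k]  # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Moving-sum example: all coefficients are 1.
    print(fir_8tap(list(range(1, 11)), [1] * 8))
```

Each output costs eight multiply-accumulates, which is why single-cycle MAC hardware and zero-wait instruction fetch both show up directly in the completion time.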

Flash-MCU History and Speed

[Figure: chart of Operating Frequency (MHz), axis marks at 10 and 100, versus Year (1990, 1995, 2000, 2005, 2010). Series: MCU Freq., Renesas Flash Freq., General Flash Freq. (Competitors). Process nodes marked on the curves: 0.8um, 0.5um, 0.35um, 0.18um, 0.15um, 90nm, 40nm. Annotations: MONOS for EEPROM & IC-card; Flash-MONOS]

Renesas MONOS reaches 100MHz single cycle access

Source: Renesas

RX Family Roadmap

[Figure: roadmap chart of Max MHz (axis marks at 50, 100, 200) versus time (Existing MCUs, 2010, 2011, 2012)]

Existing MCUs: H8SX (32 Bit), R32C (32 Bit), M16C (16 Bit), H8S (16 Bit)

RX600 Series: 32 Bit, 90nm. Extreme High Performance, High Efficiency

RX200 Series: 32 Bit, 130 nm. High Performance, Low Power / Low Voltage

RX600: 40 nm, 100MHz+


RX600 System On A Chip


RX600 Series Portfolio

Packages:

• LGA64: 5x5mm, 0.5mm

• LQFP64: 10x10mm, 0.5mm

• LQFP80: 14x14mm, 0.65mm

• LGA85: 7x7mm, 0.65mm

• LQFP100: 14x14mm, 0.5mm

• LQFP112: 20x20mm, 0.65mm

• LQFP144: 20x20mm, 0.5mm

• LGA145: 9x9mm, 0.65mm

• BGA176: 13x13mm, 0.8mm

RX Migration Between Series

[Figure: chart of Flash size (32KB, 64KB, 128KB, 256KB, 384KB, 512KB, 1MB, 2MB) versus Pins (32, 48, 64, 80/85, 100, 112, 144/145, 176), showing Migration Within RX Family]

RX600 Series - 100MHz Extreme Performance

RX200 Series - 50MHz Low Power / Low Voltage

RX600: 500uA/MHz (all peripherals on), 2.5uA RTC Deep Standby, 2.7V to 3.6V

RX200: 200uA/MHz (all peripherals on), <1uA RTC Deep Standby, 1.62V to 3.6V

Common CPU & Peripherals


RX Solutions

Motor Control, RX62T

Drive Sensorless PMAC Motor

Field Oriented Control, 3-phase

High integration, low system cost

Direct Drive TFT-LCD, RX62N

Drive 4.3” Color WQVGA TFT-LCD by RGB

Full basic graphic library and demo

Source code included

WiFi

802.11b/g/n WiFi, RX62N

Simple SPI connection to WiFi module

Kit contains driver and examples

Very low power 802.11b/g/n connectivity

Connectivity, RX62N RDK

Ethernet, USB Host/Device/USB, CAN

Many surrounding functions/features

Source code, built-in JTAG debugger

See www.am.renesas.com/rx for details

RX Tools for Solutions

See www.am.renesas.com/rx for details

E1, On-Chip Debug, $99*

• JTAG and USB-HS connection

• Program Flash

• Single step execution

• 256 Software break points

• 12 Hardware breakpoints

• PC and data breakpoints

• On-chip Trace: 256 branches/cycles

• Read/Write SRAM

• Read/Write C variables

• Performance monitoring

• Non-intrusive

• Hot-plug capable

E20, Hi-Speed Trace, $995*

• JTAG, USB-HS, plus 6 lines connection

• Trace depth: 2M branches/cycles

• SRAM monitor, 4 KB

HEW4 Plus Renesas C/C++, $1200*

• Single Integrated Development & Debugging Environment

• HEW4 also supports GNU-RX C/C++ compiler, all at $0

Wide 3rd Party Support for IDE, Compilers, Middleware, RTOS:

• Micrium, IAR, Segger, CMX, KPIT Cummings, freeRTOS, and more

* Suggested resale price when sold individually

Comparing other 32-bit CPU Architectures

| Feature | Unit | RX600 | CortexM3 [1] | CortexM4 [2] | AVR32A [3] | PIC32 [4] |
| CPU Type | - | CISC, DSC | RISC, MCU | RISC, DSC | RISC, DSC | RISC, MCU |
| Performance | DMIPS/MHz | 1.65 | 1.25 | 1.25 | 1.50 | 1.50 |
| Pipeline Length | Stages | 5 | 3 | 3 | 3 | 5 |
| Inst Lengths | Bytes | 1 to 8 | 2 and 4 | 2 and 4 | 2 and 4 | 2 and 4 |
| # of Instructions | For CPU, DSP | 80, 9 | 97, 3 | 97, 83 | 115, 8 | 129, 2 |
| FPU | # of instructions | Yes, 8 | No, 0 | Option, 25 | No, 0 | No, 0 |
| General Regs | # of regs, bits | 15 x 32 | 12 x 32 | 12 x 32 | 13 x 32 | 27 x 32 |
| Min Intr Latency | CPU Clocks | 7 or 5 | 12 or 6 | 12 or 6 | 12 or 2 | 12 instructions |
| MPU | - | Option | Option | Option | Option | No |
| Bit Manipulation | - | Yes | Yes | Yes | Yes | Yes |
| Debug | Connection | JTAG or 2-wire | JTAG or 2-wire | JTAG or 2-wire | JTAG | JTAG |
| Hi-Speed Trace | Connection | 6-wire | 6-wire | 6-wire | 12-wire | 4, 8, or 16-wire |

References:

[1] ARM, CortexM3 Technical Reference Manual Revision:r1p1, ARMv7-M Architecture Reference Manual DDI 0403C_errata_v3

[2] ARM, CortexM4 Technical Reference Manual Revision:r0p0, ARMv7-M Architecture Reference Manual DDI 0403C_errata_v3

[3] Atmel, AVR32C Technical Reference Manual 32002A-AVR32-03/07

[4] Microchip, PIC32MX Family Reference Manual DS611271C. MIPS Technology, MIPS32 Architecture for Programmers Vol II: MIPS32 Instruction Set, rev 2.5, MIPS32 MK4 Processor Core Datasheet, Rev 02.01

Who’s the Best? You Decide based on what you have seen.

To help your decision, here are publicly released benchmark results based on widely acknowledged CoreMark™ from EEMBC.

Sorted by CoreMark/MHz

| *Vendor | *Processor | Type | *CPU Freq (MHz) | *CoreMark / MHz | *CoreMark | *Compiler | Comment |
| Microchip | PIC32MX360F512L | MCU | 30 | 2.599 | 78 | GCC 4.3.2 | Only 30 MHz operation |
| Microchip | PIC32MX360F512L | MCU | 80 | 2.297 | 184 | GCC 4.3.2 | Negative effect of slow Flash |
| Renesas | RX610 | DSC | 100 | 2.240 | 224 | GNURX 201009 | Full speed with no loss of performance |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 50 | 1.921 | 96 | Keil V4.0.0.524 | |
| ST | STM32 CortexM3 120MHz. 90nm | MCU | 120 | 1.905 | 229 | KEIL 4.0.0.524 | Has new “ART” memory accelerator |
| Microchip | PIC24HJ128GP202 | MCU | 40 | 1.862 | 74 | GCC4.0.3 | |
| ST | STM32F103RB CortexM3 | MCU | 24 | 1.797 | 43 | GCC 4.4.1 | |
| NXP | LPC1768 | MCU | 100 | 1.753 | 175 | ARMCC 4.0 | |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 80 | 1.596 | 127 | Keil V4.0.0.524 | Negative effect of slow Flash |
| ST | STM32F103RB CortexM3 | MCU | 72 | 1.504 | 108 | GCC 4.4.1 | Negative effect of slow Flash |
| Freescale | ColdFire MCF52233 | MCU | 60 | 1.038 | 62 | IAR EW 1.20 | |
| Freescale | ColdFire MCF5274 | MCU | 150 | 0.773 | 115 | GCC4.1.1 | |

*Source: www.coremark.org as of 1 Sep 2010
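The two rankings in this part of the deck come from the same data sorted by different keys. The Python sketch below re-enters the published rows (processor, CPU MHz, CoreMark/MHz, CoreMark) and sorts them both ways; the short processor labels and helper names are chosen here for illustration.

```python
# (processor label, CPU MHz, CoreMark/MHz, CoreMark) from the table above.
RESULTS = [
    ("PIC32MX360F512L @30", 30, 2.599, 78),
    ("PIC32MX360F512L @80", 80, 2.297, 184),
    ("RX610", 100, 2.240, 224),
    ("LM3S9B96 @50", 50, 1.921, 96),
    ("STM32 120MHz 90nm", 120, 1.905, 229),
    ("PIC24HJ128GP202", 40, 1.862, 74),
    ("STM32F103RB @24", 24, 1.797, 43),
    ("LPC1768", 100, 1.753, 175),
    ("LM3S9B96 @80", 80, 1.596, 127),
    ("STM32F103RB @72", 72, 1.504, 108),
    ("MCF52233", 60, 1.038, 62),
    ("MCF5274", 150, 0.773, 115),
]


def ranked_by_efficiency(results):
    """Highest CoreMark/MHz first."""
    return sorted(results, key=lambda r: r[2], reverse=True)


def ranked_by_score(results):
    """Highest raw CoreMark first."""
    return sorted(results, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    print([r[0] for r in ranked_by_efficiency(RESULTS)[:3]])
    print([r[0] for r in ranked_by_score(RESULTS)[:3]])
```

Per-MHz efficiency favors parts measured at low clock rates where Flash wait states are absent, while raw score favors parts that hold their efficiency at full clock speed, which is why the same device can sit near the top of one list and near the bottom of the other.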

Who’s the Best? Now sorted by raw CoreMark, not CoreMark/MHz

Sorted by CoreMark

| *Vendor | *Processor | Type | *CPU Freq (MHz) | *CoreMark / MHz | *CoreMark | *Compiler | Comment |
| ST | STM32 CortexM3 120MHz. 90nm | MCU | 120 | 1.905 | 229 | KEIL 4.0.0.524 | Much Higher CPU freq needed for same result |
| Renesas | RX610 | DSC | 100 | 2.240 | 224 | GNURX 201009 | Positive effect of efficient CPU and fast Flash |
| Microchip | PIC32MX360F512L | MCU | 80 | 2.297 | 184 | GCC 4.3.2 | |
| NXP | LPC1768 | MCU | 100 | 1.753 | 175 | ARMCC 4.0 | |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 80 | 1.596 | 127 | Keil V4.0.0.524 | |
| Freescale | ColdFire MCF5274 | MCU | 150 | 0.773 | 115 | GCC4.1.1 | |
| ST | STM32F103RB CortexM3 | MCU | 72 | 1.504 | 108 | GCC 4.4.1 | |
| TI | Stellaris LM3S9B96 CortexM3 | MCU | 50 | 1.921 | 96 | Keil V4.0.0.524 | |
| Microchip | PIC32MX360F512L | MCU | 30 | 2.599 | 78 | GCC 4.3.2 | |
| Microchip | PIC24HJ128GP202 | MCU | 40 | 1.862 | 74 | GCC4.0.3 | |
| Freescale | ColdFire MCF52233 | MCU | 60 | 1.038 | 62 | IAR EW 1.20 | |
| ST | STM32F103RB CortexM3 | MCU | 24 | 1.797 | 43 | GCC 4.4.1 | |

*Source: www.coremark.org as of 1 Sep 2010

Questions

1: What is the read access time of RX600 Flash Memory?

10 nsec (100MHz) across entire voltage range 2.7V to 3.6V

2: How many DMIPS/MHz does RX600 produce, and how many mW/DMIPS does it consume?

1.65 DMIPS/MHz, and 1mW/DMIPS

3: What does the RMPA instruction do?

Repeat Multiply Accumulate. One instruction automatically multiplies data from two different memory arrays, adds the result to an 80-bit accumulator, then post-increments to the next two values. It repeats until the specified array length is met. DSP!!
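The RMPA behavior described in answer 3 can be mimicked in Python to show what the single instruction replaces. This is a behavioral sketch only, assuming signed operands and an accumulator that wraps at 80 bits; it says nothing about the real instruction's operand sizes, register usage, or flags, and the function name is hypothetical.

```python
ACC_BITS = 80  # accumulator width named on the slide


def rmpa(array_a, array_b, length, acc=0):
    """Multiply array_a[i] * array_b[i] for `length` elements, summing into acc."""
    mask = (1 << ACC_BITS) - 1
    index = 0
    while length > 0:
        acc = (acc + array_a[index] * array_b[index]) & mask  # wrap to 80 bits
        index += 1    # post-increment to the next pair of values
        length -= 1   # repeat until the specified length is met
    if acc >= 1 << (ACC_BITS - 1):
        acc -= 1 << ACC_BITS  # reinterpret the 80-bit pattern as signed
    return acc


if __name__ == "__main__":
    coefficients = [4, 5, 6]
    adc_samples = [1, 2, 3]
    print(rmpa(coefficients, adc_samples, 3))
```

Written in a load/store instruction set, the loop body above needs separate loads, a multiply-accumulate, pointer updates, a counter update, and a branch, which is the overhead a repeat-style instruction folds into one opcode.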


Innovation – Single Chip Enablement

One MCU Family for many applications

See www.am.renesas.com/rx for details


Feedback Form

Please fill out the feedback form!

If you do not have one, please raise your hand


Thank You!

www.am.renesas.com/rx

Renesas Electronics America Inc.