NVIDIA Tegra K1 Presentation

Anurag Sekhsaria


Posted on 09-Jan-2017


TRANSCRIPT

Page 1: Nvidia tegra K1 Presentation

Anurag Sekhsaria

Page 2: Nvidia tegra K1 Presentation

Introduction, GPU features, Applications

Page 3: Nvidia tegra K1 Presentation

Two versions: Logan and Denver
- Logan: 32-bit quad-core 4-PLUS-1 ARM Cortex-A15 CPU; up to 2.3 GHz; 28 nm process
- Logan: two part numbers available, CD575M and CD575MI
- Denver: 64-bit dual-core CPU based on the ARMv8 architecture; up to 2.5 GHz
- 64 kB L1 (32 kB I-cache + 32 kB D-cache); 2 MB L2 cache

OUR FOCUS → LOGAN

Page 4: Nvidia tegra K1 Presentation

- Vector Graphics
- Rasterisation
- Variable Symmetric Multiprocessing (vSMP)
- Streaming Multiprocessor (SMX)
- Dynamic Parallelism
- Hyper-Q
- Polymorph Engine
- Bindless Textures

Page 5: Nvidia tegra K1 Presentation

Vector graphics is the use of geometrical primitives such as points, lines, curves, and shapes or polygons—all of which are based on mathematical expressions—to represent images in computer graphics

Page 6: Nvidia tegra K1 Presentation

Rasterisation (or rasterization) is the task of taking an image described in a vector graphics format (shapes) and converting it into a raster image (pixels or dots) for output on a video display or printer, or for storage in a bitmap file format.
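To make the vector-to-raster step concrete, here is a minimal sketch that rasterises one vector primitive (a line segment) into pixel coordinates using Bresenham's line algorithm. It is a textbook CPU illustration, not how the Tegra K1 GPU rasteriser works internally; the function name `rasterise_line` is hypothetical.

```c
#include <stdlib.h>

/* Bresenham line rasteriser: converts the vector primitive
   "line from (x0,y0) to (x1,y1)" into integer pixel coordinates.
   Stores at most max_pts pixels in xs/ys and returns the total
   number of pixels on the line. */
int rasterise_line(int x0, int y0, int x1, int y1,
                   int *xs, int *ys, int max_pts)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;   /* combined error term for both axes */
    int n = 0;

    for (;;) {
        if (n < max_pts) { xs[n] = x0; ys[n] = y0; }
        n++;
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step along x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step along y */
    }
    return n;
}
```

For example, the segment (0,0) to (4,2) produces 5 pixels ending at (4,2); integer-only arithmetic is why this algorithm is the classic choice for line rasterisation.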

Page 7: Nvidia tegra K1 Presentation

Reason: mobile devices are in the standby state for almost 80% of the time → power saving

4-PLUS-1 CPU → 4 high-performance, more power-hungry cores and 1 low-power, low-performance core

Switching between cores is done on the basis of the processing required, with intelligent software hysteresis

Total power: $P_{total} = P_{leakage} + P_{dynamic}$, with $P_{dynamic} \propto f \cdot V^2$ (f = frequency, V = voltage)
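A small worked example of the power relation on this slide (dynamic power ∝ frequency × voltage²). The effective switched capacitance `k`, the leakage value and the operating points below are hypothetical numbers chosen only to show the arithmetic.

```c
/* Total power = leakage + dynamic, with dynamic = k * f * V^2.
   k (effective switched capacitance, in farads) is an assumed constant. */
double total_power(double leakage_w, double k, double f_hz, double v)
{
    return leakage_w + k * f_hz * v * v;
}

/* Ratio of dynamic power at operating point 2 versus point 1,
   assuming the same switched capacitance at both points. */
double dynamic_power_ratio(double f1_hz, double v1, double f2_hz, double v2)
{
    return (f2_hz * v2 * v2) / (f1_hz * v1 * v1);
}
```

Halving the clock from 2 GHz to 1 GHz while lowering the supply from 1.0 V to 0.8 V gives a ratio of 0.5 × 0.64 = 0.32, i.e. roughly a third of the original dynamic power. The quadratic voltage term is why running slow work on a low-voltage core pays off.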

Page 8: Nvidia tegra K1 Presentation

Fast Process = Optimized for high frequency operation, but higher leakage

Low Power Process = Operates at lower frequency with lower leakage

High to low performance crossover at 600MHz

- Low-power core has a peak frequency of 1 GHz
- Both core types are OS-transparent
- Not all 4 high-performance cores need to be active; they are dynamically enabled/disabled

Note: the 5 cores can never all be active simultaneously
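The switching policy described on these slides (software hysteresis around a 600 MHz crossover) can be sketched as below. The ±100 MHz hysteresis band and all names are assumptions for illustration; the real policy lives in the OS/driver and is not specified here.

```c
/* Hypothetical cluster-switch policy with hysteresis around the
   600 MHz crossover between the low-power and fast cores. */
typedef enum { CLUSTER_LP, CLUSTER_FAST } cluster_t;

#define CROSSOVER_MHZ 600
#define HYST_MHZ      100   /* assumed half-width of the hysteresis band */

cluster_t select_cluster(cluster_t current, unsigned demand_mhz)
{
    if (current == CLUSTER_LP && demand_mhz > CROSSOVER_MHZ + HYST_MHZ)
        return CLUSTER_FAST;   /* demand clearly above the crossover */
    if (current == CLUSTER_FAST && demand_mhz < CROSSOVER_MHZ - HYST_MHZ)
        return CLUSTER_LP;     /* demand clearly below the crossover */
    return current;            /* inside the band: do not switch */
}
```

The band stops the system from ping-ponging between clusters when the load hovers near 600 MHz, since each switch has its own time and energy cost.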

Page 9: Nvidia tegra K1 Presentation
Page 10: Nvidia tegra K1 Presentation
Page 11: Nvidia tegra K1 Presentation
Page 12: Nvidia tegra K1 Presentation

- Motive: free the CPU, handle varied workloads and use the GPU efficiently
- Run complex, less structured tasks
- Any kernel can launch another kernel and can create the necessary streams, events and dependencies needed to process additional work without host CPU interaction

Page 13: Nvidia tegra K1 Presentation

- The GPU can be used by multiple CPUs
- Enables multiple CPU cores to launch work on a single GPU simultaneously
- Increases GPU utilization and slashes CPU idle times
- 32 simultaneous, hardware-managed connections (?)

Page 14: Nvidia tegra K1 Presentation

Hyper-Q....contd

Page 15: Nvidia tegra K1 Presentation

Applications: Internet of Things (IoT), Medical, Traffic Monitoring, Video Analytics

Page 16: Nvidia tegra K1 Presentation
Page 17: Nvidia tegra K1 Presentation

Snoop Control Unit (SCU)

About the SCU: the SCU connects one to four Cortex-A9 processors to the memory system through the AXI interfaces.

The SCU functions are to:
- maintain data cache coherency between the Cortex-A9 processors
- initiate L2 AXI memory accesses
- arbitrate between Cortex-A9 processors requesting L2 accesses
- manage ACP accesses
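Cache coherency of the kind the SCU maintains is usually explained with the MESI protocol (the Cortex-A9 L1 data caches use MESI). Below is a simplified textbook sketch of how one core's cache line reacts when another core's access to the same address is snooped; it is an illustration, not the exact SCU implementation, and the names are hypothetical.

```c
/* Simplified MESI snoop response for one cache line. */
typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_t;

/* Returns the new state of the line in THIS core's cache when another
   core reads (remote_is_write == 0) or writes the same address.
   *must_supply_data is set when this copy is dirty and must be written
   back or forwarded, since memory does not hold the latest data. */
mesi_t snoop(mesi_t state, int remote_is_write, int *must_supply_data)
{
    *must_supply_data = (state == MESI_M);
    if (remote_is_write)
        return MESI_I;          /* the other core takes ownership */
    if (state == MESI_M || state == MESI_E)
        return MESI_S;          /* line is now shared between cores */
    return state;               /* S stays S, I stays I */
}
```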

Page 18: Nvidia tegra K1 Presentation

AGENDA
- AVP
- MPIO
- Interrupt Controllers
- Clock
- Boot
- Power States
- PMC
- Flow Controller
- Power Architecture
- Memory Controller
- Peripherals

Page 19: Nvidia tegra K1 Presentation

AVP - Audio Video Processor

Functions:
- Manage the initial boot stages
- Control and assist the hardware audio decoding blocks, BSEA and VCP2
- Control and assist the hardware video decoder, VDE

256 kB local RAM (IRAM), 8 kB cache

Page 20: Nvidia tegra K1 Presentation

Multi-purpose I/O: MPIO

Each MPIO consists of:
- Output driver with:
  - Tri-state capability
  - Drive strength controls
  - Push-pull mode, open-drain mode or both
- Input receiver with Schmitt mode, CMOS mode or both
- Weak pull-up or pull-down

MPIOs stay in their POR state until changed by software (bootloader or OS). The default pad drive impedance is 50 ohms.

Page 21: Nvidia tegra K1 Presentation

MPIO....contd.

5 types of MPIO pads:
- ST (Standard)
- DD (Dual Driver): 3.3 V tolerant (with pull-up resistor) regardless of the I/O voltage; must be set to open-drain mode; special power-sequencing considerations apply
- OD (Open Drain): 5 V tolerant; no push-pull driver
- CZ (Controlled Z): tightly controlled impedance
- LV: 1.8 V tolerant

Page 22: Nvidia tegra K1 Presentation

MPIO....contd.

Each MPIO can have up to 5 functions: up to 4 SFIOs (special functions, used by peripherals) and 1 GPIO.

The pinmux controller handles MPIO functionality and has one register per MPIO.
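Since there is one pinmux register per MPIO, configuring a pin comes down to composing a register value. The sketch below assumes a hypothetical bit layout (function select, pull, tristate, input enable, open drain, lock); check the Tegra K1 TRM for the actual field positions before using anything like this.

```c
#include <stdint.h>

/* HYPOTHETICAL pinmux register layout (not taken from the TRM):
   [1:0] function select (SFIO 0-3)   [3:2] pull-up/pull-down
   [4]   tristate                      [5]   input enable
   [6]   open drain                    [7]   lock                 */
enum pull { PULL_NONE = 0, PULL_DOWN = 1, PULL_UP = 2 };

uint32_t pinmux_value(unsigned func, enum pull pupd,
                      int tristate, int input_en, int open_drain, int lock)
{
    return  ((uint32_t)(func & 0x3u))
          | ((uint32_t)(pupd & 0x3u) << 2)
          | ((uint32_t)(tristate   ? 1u : 0u) << 4)
          | ((uint32_t)(input_en   ? 1u : 0u) << 5)
          | ((uint32_t)(open_drain ? 1u : 0u) << 6)
          | ((uint32_t)(lock       ? 1u : 0u) << 7);
}
```

Under this assumed layout, SFIO 2 with pull-up and input enabled is `pinmux_value(2, PULL_UP, 0, 1, 0, 0)` = 42, and an unused pin (tri-stated, input buffer disabled, as recommended for power saving) is `pinmux_value(0, PULL_NONE, 1, 0, 0, 0)` = 16.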

Page 23: Nvidia tegra K1 Presentation

MPIO....contd.

Page 24: Nvidia tegra K1 Presentation

GPIO Controller
- The GPIO controller is divided into 8 banks
- Each bank handles up to 32 MPIOs
- Within each bank, GPIOs are arranged as 4 ports of 8 bits each
- 162 GPIOs in all
- Individually configurable as input, output, or interrupt source with edge/level triggering
- Lock bit functionality (optional) ensures the GPIO configuration is not modified during runtime; a system reset can clear this bit
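The bank/port/bit organisation above maps a flat GPIO number to a register location with simple integer arithmetic. A minimal sketch, assuming GPIOs are numbered consecutively across ports and banks (the struct and function names are hypothetical):

```c
/* Locate a GPIO given the organisation on the slide:
   8 banks x 4 ports x 8 bits = 32 GPIOs per bank. */
typedef struct { unsigned bank, port, bit; } gpio_loc;

gpio_loc gpio_locate(unsigned gpio)
{
    gpio_loc l;
    l.bank = gpio / 32;         /* 32 GPIOs per bank */
    l.port = (gpio % 32) / 8;   /* 4 ports of 8 within a bank */
    l.bit  = gpio % 8;          /* bit position inside the port */
    return l;
}
```

For example, GPIO 45 lands in bank 1, port 1, bit 5, and the last GPIO (161) lands in bank 5, port 0, bit 1.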

Page 25: Nvidia tegra K1 Presentation

Unused Pins - Power Saving
- Assert tri-state and disable the input buffer
- If all pins in a pad control group are unused, set the drive strengths and slew rates to a minimum
- If all pins on a power rail are unused, assert E_NO_IOPOWER for that rail in the PMC registers

Page 26: Nvidia tegra K1 Presentation

Two interrupt controllers: vGIC (virtual Generic Interrupt Controller) and LIC (Legacy Interrupt Controller)

- vGIC for the ARM Cortex-A15 CPUs; LIC for the ARM7 AVP
- 160 hardware interrupts, grouped into slices of 32; each slice can be configured independently
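With 160 interrupts in slices of 32 there are 5 slices, and an interrupt number splits into a slice index and a bit position. A minimal sketch (names are hypothetical):

```c
/* 160 hardware interrupts, 32 per slice -> 5 independently
   configurable slices. */
#define NUM_HW_IRQS    160
#define IRQS_PER_SLICE 32

/* Returns 0 and fills slice/bit for a valid interrupt, -1 otherwise.
   The bit mask for the slice's registers is then (1u << *bit). */
int irq_slice(unsigned irq, unsigned *slice, unsigned *bit)
{
    if (irq >= NUM_HW_IRQS)
        return -1;
    *slice = irq / IRQS_PER_SLICE;
    *bit   = irq % IRQS_PER_SLICE;
    return 0;
}
```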

Page 27: Nvidia tegra K1 Presentation

There is one vGIC per CPU cluster, and it runs at half the clock frequency of that cluster

The vGIC supports 256 interrupts, each with a unique ID

Interrupt sources for the vGIC:
- Software Generated Interrupts (SGI)
- Private Peripheral Interrupts (PPI)
- Shared Peripheral Interrupts (SPI)

Page 28: Nvidia tegra K1 Presentation

Interrupt Controllers.....contd.

- SGIs (also called IPIs, i.e. Inter-Processor Interrupts) are generated by writing to vGIC registers; there are at most 16 of them, with IDs 0 to 15
- PPIs are generated by a peripheral that is specific to a CPU; 7 PPIs per CPU. nFIQ and nIRQ are provided as pins (?)
- SPIs are external hardware interrupts delivered via IRQ pins and also by internal SoC units; they are level-triggered
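The ID ranges above follow the ARM GIC convention (0-15 SGI, 16-31 PPI, 32 and up SPI). The sketch below classifies an ID and builds the value written to generate an SGI, assuming the vGIC exposes a standard ARM GICv2 distributor, whose GICD_SGIR register holds the target CPU list in bits [23:16] and the SGI ID in bits [3:0]. Function names are hypothetical.

```c
#include <stdint.h>

typedef enum { IRQ_SGI, IRQ_PPI, IRQ_SPI, IRQ_INVALID } irq_class;

/* GIC ID ranges, limited to the 256 IDs the vGIC supports. */
irq_class classify_irq(unsigned id)
{
    if (id < 16)  return IRQ_SGI;
    if (id < 32)  return IRQ_PPI;
    if (id < 256) return IRQ_SPI;
    return IRQ_INVALID;
}

/* Value for GICD_SGIR (GICv2 layout, assumed): TargetListFilter = 0
   so the interrupt goes to the CPUs set in cpu_mask. */
uint32_t gicd_sgir_value(unsigned sgi_id, uint8_t cpu_mask)
{
    return ((uint32_t)cpu_mask << 16) | (sgi_id & 0xFu);
}
```

For example, sending SGI 1 to CPU 1 only means writing `gicd_sgir_value(1, 0x2)` = 0x00020001 to the distributor's SGIR.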

Page 29: Nvidia tegra K1 Presentation

Two external clocks: 32.768 kHz (for the PMC and RTC) and 12 MHz

16 PLLs

For saving power by clock gating, refer to page 78 of the TRM

Each peripheral has its own CLOCK_SOURCE register: 2 bits to select from 4 clock sources and 8 bits for the clock divider (7 integer bits and 1 fractional bit)

CL-DVFS (Closed Loop Dynamic Voltage and Frequency Scaling) registers help control the clock and power supply of the FCPU (fast CPU) complex
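An 8-bit divider with 7 integer bits and 1 fractional bit allows divide ratios in steps of 0.5. The sketch assumes the divide-by-(N/2 + 1) convention used by Tegra clock dividers, i.e. f_out = f_src × 2 / (N + 2); confirm the exact formula in the TRM. The 408 MHz source is just an example value.

```c
#include <stdint.h>

/* Assumed 7.1 fixed-point divider: field N gives a divide ratio of
   (N + 2) / 2, i.e. 1.0, 1.5, 2.0, ... 128.5. */
uint32_t clk_out_hz(uint32_t src_hz, uint8_t n)
{
    return (uint32_t)(((uint64_t)src_hz * 2) / ((uint64_t)n + 2));
}

/* Smallest N whose output does not exceed target_hz (clamped to 0..255). */
uint8_t clk_divisor_for(uint32_t src_hz, uint32_t target_hz)
{
    if (target_hz == 0)
        return 255;
    uint64_t twice = (uint64_t)src_hz * 2;
    uint64_t d = (twice + target_hz - 1) / target_hz;  /* ceil(2*src/target) */
    if (d < 2)
        d = 2;                                          /* ratio cannot go below 1 */
    if (d - 2 > 255)
        return 255;
    return (uint8_t)(d - 2);
}
```

From a 408 MHz source, a 102 MHz target needs N = 6 (exact), while a 100 MHz target rounds to N = 7, giving about 90.7 MHz so the limit is never exceeded.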

Page 30: Nvidia tegra K1 Presentation

RTC
- Maintains seconds and milliseconds counters
- 5 alarm registers
- Always-ON power domain
- Can issue interrupts in LP states
- Hardware adjusts for clock drift caused by PPM variations of the oscillator
- All registers (except BUSY) use the 32 kHz clock domain
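To see why the RTC needs drift adjustment, a quick worked calculation: an oscillator off by a given number of parts per million accumulates an error proportional to elapsed time. The 20 ppm figure below is an example, not a Tegra K1 specification.

```c
/* Accumulated timekeeping error, in milliseconds, after `seconds`
   of running on an oscillator that is off by `ppm` parts per million. */
double rtc_drift_ms(double ppm, double seconds)
{
    return ppm * 1e-6 * seconds * 1000.0;
}
```

At 20 ppm, one day (86 400 s) accumulates about 1728 ms, i.e. almost 2 seconds per day, or close to a minute per month, without correction.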

Page 31: Nvidia tegra K1 Presentation

TIMERS
- RTC
- NVIDIA generic timers (10 in number)
- WDT: 5 in number, 1 per FCPU and 1 for the COP (AVP) [LP CPU doesn't have a WDT?]
- GIT: ARM CPU Generic Timers (4 timers per CPU: Secure and Non-secure Physical Timers, Hypervisor Timer and Virtual Timer)
- TSC: Generic Time System Counter, the reference for GIT; it is part of the PMC

Note: any timer can be used as a WDT

Page 32: Nvidia tegra K1 Presentation

Power On Reset (POR): deasserted externally (SYS_RESET_N pin)

Reset by thermal sensor

Watchdog timer, two types:
- Deadman timer (legacy)
- WDT: on the 1st expiry an interrupt is issued; on the 2nd, a reset, but only of some subunits
- WDT2: on the 1st expiry an interrupt is issued; on the 2nd, an FIQ; on the 3rd, a CPU reset; on the 4th, a full system reset

Software reset: config bit in the PMC; resets the whole chip

LP0 wakeup reset: controlled by PMC logic
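The WDT2 escalation ladder can be modelled as a counter of unserviced expiries. In this sketch, the assumption is that servicing ("kicking") the watchdog restarts the ladder; the type and function names are hypothetical.

```c
/* Escalation ladder of the WDT2-style watchdog: each unserviced
   expiry moves one step further, ending in a full system reset. */
typedef enum {
    WDT_ACT_INTERRUPT = 1,   /* 1st expiry */
    WDT_ACT_FIQ,             /* 2nd expiry */
    WDT_ACT_CPU_RESET,       /* 3rd expiry */
    WDT_ACT_SYSTEM_RESET     /* 4th expiry */
} wdt_action;

typedef struct { unsigned expiries; } wdt2;

/* Called on each timer expiry; returns the action to take. */
wdt_action wdt2_expire(wdt2 *w)
{
    if (w->expiries < 4)
        w->expiries++;
    return (wdt_action)w->expiries;
}

/* Software servicing the watchdog: assumed to restart the ladder. */
void wdt2_kick(wdt2 *w) { w->expiries = 0; }
```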

Page 33: Nvidia tegra K1 Presentation

POR

During POR or a system reset, the reset controller deasserts the boot blocks first, and then the CPU and COP after 511 oscillator clock periods; this prevents the COP/CPU from talking to these boot devices while the devices are still in reset

Non-boot devices are brought out of reset by software

At POR, the bits of registers RST_DEVICES_L/H/U/V/W/X and CLK_OUT_ENB_L/H/U/V/W/X are set by hardware (page 90 of the TRM)

Page 34: Nvidia tegra K1 Presentation

Blocks necessary for boot:
- AVP with its L1
- All system buses, such as AHB and APB
- Timer
- RTC
- NOR flash controller
- eFUSE
- GPIO
- CoreSight debug controller (one per cluster)

Page 35: Nvidia tegra K1 Presentation

BOOT SOURCES: SPI flash, eMMC, USB recovery

Page 36: Nvidia tegra K1 Presentation

Power States: Active, Suspend (LP1), Deep Sleep (LP0), OFF

Page 37: Nvidia tegra K1 Presentation

Power States..contd.

Page 38: Nvidia tegra K1 Presentation

Power States..contd.

LP2 cluster switch (a variant of LP2):
- Cluster 1 to 0 switch
- Cluster 0 to 1 switch: CPU3, i.e. the last CPU of cluster 0, initiates this switch

Page 39: Nvidia tegra K1 Presentation

Power States..contd.

LP3 (per CPU)
- If a CPU is idle for a short time, its clock is gated, i.e. the CPU is halted (the CPU is not power-gated; only its clock is stopped)
- Only the small wake-up logic clock stays enabled; the others are gated
- LP3 is exited on detection of an IRQ or FIQ
- The flow controller is not needed; clock gating/ungating is internal to the FCPUs and the LP CPU

Page 40: Nvidia tegra K1 Presentation

AVP Low Power States
- There is no specific instruction to halt the AVP
- However, its memory bus can be put into a WAIT state by the flow controller (HALT state)
- IRQ/FIQ and other wake events can bring the AVP out of the halt state
- During halt, AVP clock gating/ungating is handled automatically by hardware
- The AVP is NOT power-gated

Page 41: Nvidia tegra K1 Presentation

Power Management Controller (PMC)

Page 42: Nvidia tegra K1 Presentation

PMC....contd.
- Provides the interface to the external PMIC
- Controls voltage switching/transitions as the processor changes power states (e.g. LP0, LP1)
- Processes power/clock requests (acting as a slave) from various peripherals
- To speed up operation, the PMC register file operates in the local peripheral interface bus domain (APB) rather than in the 32 kHz clock domain used for PMC processing

Page 43: Nvidia tegra K1 Presentation

Flow Controller (IMPORTANT)
- Provides sequencing of hardware-controlled CPU power states
- Handles switching between CPU clusters 0 and 1, and also switching them OFF
- Receives CPU power-state requests from the CPUs and sends power ON/OFF requests to the PMC, which power-gates/ungates the corresponding CPUs
- Monitors per-CPU interrupts and events to determine CPU wake events
- Initiates CPU wake
- The WFI (Wait For Interrupt) instruction is used to trigger low-power states

Page 44: Nvidia tegra K1 Presentation

Flow Controller....contd.

Page 45: Nvidia tegra K1 Presentation

Flow Controller....contd.

Note: the flow controller has 3 different state machines:
- Main CPU flow controller state machine (shown in the figure above)
- CPU rail power-UP state machine
- State machine for the COP

The flow controller uses the CPU ID (in the MPIDR register) to identify the cores
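The CPU ID comes from ARM's Multiprocessor Affinity Register (MPIDR), which a core reads about itself on ARMv7 with `mrc p15, 0, <Rt>, c0, c0, 5`. Its affinity fields are Aff0 (bits [7:0], the CPU within a cluster) and Aff1 (bits [15:8], the cluster). A minimal decode sketch; the example value is hypothetical.

```c
#include <stdint.h>

/* Aff0, bits [7:0]: CPU number within its cluster. */
unsigned mpidr_cpu(uint32_t mpidr)     { return mpidr & 0xFFu; }

/* Aff1, bits [15:8]: cluster number. */
unsigned mpidr_cluster(uint32_t mpidr) { return (mpidr >> 8) & 0xFFu; }
```

For a hypothetical MPIDR value of 0x80000103 (bit 31 is set on multiprocessor-capable cores), the decode gives CPU 3 in cluster 1.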

Page 46: Nvidia tegra K1 Presentation

Power Architecture: there are sense pins for the various system voltage domains, which monitor them continuously

Page 47: Nvidia tegra K1 Presentation

Power Gating and Ungating
- For CCPLEX PG partitions, sequencing is ensured by hardware when power gating is done via the flow controller
- For SoC (non-CCPLEX) PG partitions, sequencing is done by software
- Power gating controllers, two in number:
  1. SoC PG controller
  2. GPU PG controller

Page 48: Nvidia tegra K1 Presentation

SoC PG Controller
- Controls 8 zones and uses a fixed power ON/OFF sequence with a fixed set of delays
- The power OFF sequence is the reverse of the power ON sequence
- The same programming register is used for all zones
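The "fixed sequence with fixed delays, OFF is the reverse of ON" scheme can be sketched as a single step table walked in either direction. The step names and microsecond delays below are placeholders, not values from the TRM; for power-OFF each step is undone in reverse order.

```c
#include <stddef.h>

/* PLACEHOLDER power-ON steps for one SoC PG zone (not TRM values). */
typedef struct { const char *name; unsigned delay_us; } pg_step;

static const pg_step PG_ON_SEQ[] = {
    { "enable zone power switch", 10 },
    { "remove isolation clamps",   2 },
    { "release zone reset",        1 },
};
#define PG_NSTEPS (sizeof PG_ON_SEQ / sizeof PG_ON_SEQ[0])

/* Fills `order` with step indices in execution order (forward for ON,
   reverse for OFF) and returns the total fixed delay in microseconds. */
unsigned pg_sequence(int power_on, size_t order[])
{
    unsigned total = 0;
    for (size_t i = 0; i < PG_NSTEPS; i++) {
        size_t idx = power_on ? i : PG_NSTEPS - 1 - i;
        order[i] = idx;
        total += PG_ON_SEQ[idx].delay_us;
    }
    return total;
}
```

Keeping one table for all zones mirrors the slide's point that the same programming register and sequence serve all 8 zones.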

Page 49: Nvidia tegra K1 Presentation

GPU PG Controller
- GPU PG is controlled by the GPMU unit inside the Kepler GPU
- Independent of SoC/CPU PG
- If the CPU and GPU share the same voltage rail (for cost reduction), software settings should ensure that the CPU and GPU are never power-gated simultaneously, to avoid di/dt issues

Page 50: Nvidia tegra K1 Presentation

Fast CPU PG Controller
- Used to power-gate the fast CPU partitions
- Functions similarly to the SoC PG controller

Page 51: Nvidia tegra K1 Presentation

Power Gating
- The flow controller uses a separate state machine for PG
- Each CPU is power-gated based on its CPU ID
- Only one request is handled at a time, to avoid power-noise issues
- The flow controller-PMC interface carries the core ID, not the cluster ID
- As shown in the figure, CPU and non-CPU components can be power-gated separately

Page 52: Nvidia tegra K1 Presentation

Power Gating....contd.

Page 53: Nvidia tegra K1 Presentation

Power Gating....contd.
- At boot, the CPU rail is OFF by default; it can be enabled by the AVP through a write to the PMC registers
- The CPU rail can also be switched ON by the PMIC (I2C write)
- The COP can switch OFF the FCPUs
- CPU and non-CPU blocks cannot be switched simultaneously

Page 54: Nvidia tegra K1 Presentation

Hardware Accelerators: NEON, ISP

Page 55: Nvidia tegra K1 Presentation

Memory Controller - RAM
- Only DDR3L and LPDDR3 are supported and tested by NVIDIA
- x32-bit or x64-bit configuration
- 4 chip selects
- 4 individually controllable clock enables
- 4 individually controllable ODTs
- Rank 0 size ≥ rank 1 size
- 3 bank-address (BA) bits
- Column width: 9 to 12 bits
- Row width: 12 to 16 bits
- DDR3 up to 966 MHz
- Up to 4 GB supported (as per datasheet)
- 1T and 2T command timing support
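The address-width parameters above determine how much memory one rank can hold: 2^(row + column + BA bits) locations, each as wide as the data bus. A minimal sketch (function name hypothetical; it ignores ECC and any controller-specific address limits):

```c
#include <stdint.h>

/* Bytes in one rank: 2^(row + column + bank-address bits) locations,
   each bus_width_bits wide. */
uint64_t rank_bytes(unsigned row_bits, unsigned col_bits,
                    unsigned ba_bits, unsigned bus_width_bits)
{
    return (1ull << (row_bits + col_bits + ba_bits)) * (bus_width_bits / 8);
}
```

For example, 15 row bits, 10 column bits and 3 BA bits on a x32 bus give 2^28 × 4 bytes = 1 GiB per rank; 16 row bits on a x64 bus give 4 GiB, matching the 4 GB ceiling quoted above.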

Page 56: Nvidia tegra K1 Presentation

Peripherals- USB

Page 57: Nvidia tegra K1 Presentation

Peripherals- USB....contd.
- USB_OTG supports USB recovery boot
- USB2 and USB3 support host mode only
- XUSB supports host mode only

Page 58: Nvidia tegra K1 Presentation

Peripherals- AUDIO

Features:
- I2S controllers
- 1 S/PDIF controller

Page 59: Nvidia tegra K1 Presentation

Peripherals- Display Controller
- 2 independent display controllers, which can drive 2 independent displays

Page 60: Nvidia tegra K1 Presentation

Peripherals- MIPI CSI 2.0
- 2 CSI interfaces, each supporting up to 4 lanes
- 2 image sensors can be used simultaneously (e.g. stereo applications)
- CSI B can support one additional single-lane input

Page 61: Nvidia tegra K1 Presentation

Peripherals- Video Input(VI)

Page 62: Nvidia tegra K1 Presentation

Peripherals- SD/MMC Controller

Page 63: Nvidia tegra K1 Presentation

Peripherals-SD/MMC Controller

Page 64: Nvidia tegra K1 Presentation

Peripherals- SATA & PCIe
- SATA spec Rev 3.1 and AHCI spec Rev 1.3.1
- 5-lane PCIe; Gen 1 (2.5 GT/s) and Gen 2 (5 GT/s) supported

Page 65: Nvidia tegra K1 Presentation

Peripherals- I2C
- 6 I2C interfaces
- I2C 3.0 spec compliant
- Modes supported:
  - Standard mode (up to 100 kbps)
  - Fast mode (up to 400 kbps)
  - Fast-mode Plus (up to 1 Mbps)
  - High-speed mode (up to 3.4 Mbps)

Page 66: Nvidia tegra K1 Presentation

Peripherals- UART, SPI & Misc.
- 4 UART interfaces (with RTS and CTS); up to 12.5 Mbps baud rate
- SPI master up to 65 MHz and slave up to 45 MHz; six chip selects
- JTAG
- 4 PWFM interfaces
- Serial Transport Stream (TS) controller for digital TV