
Progress Report (4/14/04)

Virtual Prototyping of Advanced Space System Architectures based on RapidIO

Sponsor:
Honeywell Space Systems, Clearwater, FL

Principal Investigator:
Dr. Alan D. George

OPS Graduate Assistants:
David Bueno, Ian Troxel

RA Graduate Assistants:
Chris Conger, Adam Leko

Modeling and Simulation (MS) Group
HCS Research Laboratory, ECE Department
University of Florida

Presentation Outline

Project Motivation and Goals
Project Task Outline
– Literature Review
– RIO Component Modeling
– GMTI Modeling
– Systems Modeling
– Test Plan
Conclusions
Future Collaboration Possibilities

Project Motivation and Goals

Determine the optimal means by which to develop RapidIO for space systems
– Perform RIO switch, board, and system tradeoff studies
– Identify limitations of space-based RIO design
– Determine design feasibility using SBR case study
– Provide assistance for Honeywell proposal efforts
Lay the groundwork for future Honeywell system prototyping

Project Tasks

Literature Review
– RIO spec, RIO components, SBR, misc.
RIO Component Modeling
– Layers, endpoints, switches, processors, etc.
– GMTI traffic models, memory boards, backplanes
RIO System Modeling
– Proposed systems models, test plan
Simulation Experiments
Data Analysis and Report

Literature Review: Overview

Scope
– Encompass major issues surrounding RapidIO-based systems
– Goal: find previous work related to issues in this project
Overview of results
– RapidIO information and specifications
– RapidIO implementation spec sheets
– General information on space-based radar (GMTI and SAR)
– Switch issues
    Multicasting
    Load balancing
    Shared-memory switch issues (buffer management)
Deliverable available on Honeywell HCS page:
– http://www.hcs.ufl.edu/~leko/Honeywell/

RapidIO Specifications, Extensions

Protocol Issues, Extensions
Not specified or not supported by RapidIO protocol
Main topics considered thus far include:
– Dynamic load balancing
    Papers regarding dynamic load balancing schemes in other protocols, e.g. InfiniBand
– RapidFabric Protocol Extensions
    End-to-end flow control, support for thousands of nodes
    Data streaming and traffic management
    “Next Generation Physical Layer”
– Multicast algorithms
    ACM paper discusses multicasting in switch-based parallel systems
    Dr. Sarp Oral’s dissertation
    Other papers discuss general multicast issues, deadlock avoidance
    RapidFabric specs also address multicast
– System-level failure recovery

RapidIO Specs
All information by RapidIO Trade Association
– MM, MP, GSM logical layers
– Common transport layer
– Parallel, serial physical layer
– Error management extensions
Specifications are extremely detailed and describe all requirements for each layer and feature
Used as main reference for model design
Motorola technical whitepaper
– Brief but thorough discussion of all specifications above
– Also good reference, much shorter than official specs with many diagrams
– [RIO Spec] techwhitepaper_rev3

Literature Review: Switch Issues

Multicasting
– Switches being considered are store-and-forward; no deadlock possible
– Modify switch internal routing table/Command and Status Registers to support multi-valued destinations
– May also use RapidIO’s built-in “Multicast event” if no payload is needed
Buffer management
– Static thresholds for different traffic classes good, but will require tuning for load
– Dynamic thresholds better and require only slightly more logic
– Pushout best, but requires lots of additional logic (and silicon)
Load balancing
– Simple method: static load balancing based on routing tables
– Methods proposed for use with InfiniBand may be adapted for RapidIO with extensions to protocol
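
The difference between static and dynamic thresholds can be illustrated with a small sketch. It assumes a shared buffer counted in packets and uses the commonly described dynamic-threshold rule, in which a traffic class may queue up to `alpha` times the currently free space. The function names, `alpha`, and the buffer sizes are hypothetical and are not taken from the papers reviewed.

```python
def static_accept(queue_len, static_limit):
    """Static threshold: accept while this class's queue is below a fixed limit."""
    return queue_len < static_limit


def dynamic_accept(queue_len, total_used, capacity, alpha=1.0):
    """Dynamic threshold: the per-class limit shrinks as the shared buffer fills.

    queue_len  -- packets already queued for this traffic class
    total_used -- packets queued across all classes
    capacity   -- total shared buffer size in packets (assumed value)
    alpha      -- hypothetical tuning factor applied to the free space
    """
    free = capacity - total_used
    return free > 0 and queue_len < alpha * free


CAPACITY = 16  # hypothetical shared-buffer size in packets

# Same class queue length, different overall load on the shared buffer.
light_load = dynamic_accept(queue_len=6, total_used=6, capacity=CAPACITY)
heavy_load = dynamic_accept(queue_len=6, total_used=12, capacity=CAPACITY)
fixed_limit = static_accept(queue_len=6, static_limit=8)
```

With a static limit the decision ignores how full the rest of the buffer is, which is why it needs tuning for load; the dynamic rule adapts with one extra subtraction and comparison.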

Literature Review: Space-Based Radar (GMTI)

Found solid background information on GMTI (Ground-Moving Target Indicator) and STAP (Space-Time Adaptive Processing)
Several different GMTI algorithm variants
– Pre-Doppler
– Post-Doppler
– Post-Doppler PRI-staggered
Different partitioning methods proposed
– Parallel pipelined
– Staggered iterations
Interesting note: Honeywell baseline spec for input data size is much larger than those used in existing literature

RapidIO Component Modeling: Overview

RapidIO packet formats
– Message-passing requests and responses
– Flow control symbols
– Packet control symbols
RapidIO endpoint model
– Message-passing logical layer
    Expandable to include I/O logical layer with minimal effort
– Common transport layer
– Parallel physical layer
RapidIO central memory switch model
– Common transport layer
– Parallel physical layer
GMTI traffic models
– Memory board model sources/sinks traffic
Statistics gathering
– RIO request stats (average latency and BW)
– RIO response stats (average latency and BW)
– Additional statistics components to be developed as necessary

RapidIO Component Modeling: Endpoint Model

Key Features
– Message-Passing Logical Layer
– Common Transport Layer
– Parallel Physical Layer
    Receiver-controlled flow control
    Transmitter-controlled flow control
    Error detection and recovery
    Priority scheme for buffer management
Key Adjustable Parameters
– Packet assembly delay
– Packet disassembly delay
– Clock frequency
– Link width
– Input queue length
– Output queue length
– Four priority thresholds
    Determine the maximum number of packets that may be in a buffer to still accept a packet of a given priority
    Example: If threshold for priority 0 packets is 4, incoming priority 0 packets will be rejected if there are 5 or more packets currently in the input buffer
– Number of device ID bytes in packet (affects packet size and max number of devices in system)
– Buffer memory copy delay per byte
Note: As there is no “link model,” parameters such as clock frequency and link width are incorporated into the endpoint model.

[Figure: High-level Endpoint Model]
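
The priority-threshold rule can be written out as a short sketch. Only the priority 0 threshold of 4 comes from the example above; the other three threshold values, the queue length, and the function name are hypothetical, as is the choice to give higher priorities higher thresholds.

```python
# Threshold for priority 0 is from the slide's example; the rest are assumed.
THRESHOLDS = {0: 4, 1: 5, 2: 6, 3: 7}
INPUT_QUEUE_LENGTH = 8  # hypothetical input queue length in packets


def accepts(priority, packets_in_buffer,
            thresholds=THRESHOLDS, queue_length=INPUT_QUEUE_LENGTH):
    """Return True if an incoming packet of this priority is accepted.

    A packet is accepted only if the buffer still has room and the number of
    packets already buffered does not exceed the threshold for its priority.
    """
    if packets_in_buffer >= queue_length:
        return False  # buffer is full regardless of priority
    return packets_in_buffer <= thresholds[priority]


# The slide's example: threshold 4 for priority 0.
p0_with_4_buffered = accepts(0, 4)   # still accepted
p0_with_5_buffered = accepts(0, 5)   # rejected: 5 or more packets buffered
p3_with_5_buffered = accepts(3, 5)   # accepted under the assumed threshold of 7
```

Under these assumed values, a nearly full buffer keeps admitting high-priority packets after it has started rejecting low-priority ones.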

RapidIO Component Modeling: Endpoint Verification Tests

2-node BW/latency results shown to right
2 test cases: single packet, continuous packet stream
Single-packet test
– Average latency, BW for all possible packet sizes
– Theoretical BW limit: 4 Gbps
– Latency determined by:
    transmit time + packet disassembly time
– Error detection and correction
    Insert packet CRC error, control symbol parity bit error, ack error
    Observe/verify that link partners correct error
    Also insert error in control symbol related to error recovery, verify behavior
Packet-stream test (256-byte packets)
– Around 3.5 Gbps data generation rate, link becomes saturated
– Once saturated, average latency increases with …

[Figure: Single Packet BW (Rx Controlled). Actual BW (Rx) and Eff. BW (Rx) versus packet size (32, 64, 128, 256)]

[Figure: Stream Test. Horizontal axis: Data Generation Rate (Gbps), 2.980 to 4.172]
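
As a rough check on the 4 Gbps theoretical limit, the raw rate of a parallel double-data-rate link is clock frequency × 2 × link width. The sketch below uses the 125 MHz DDR clock from the baseline configuration and assumes a 16-bit link width, which is not stated in the slides; it is one combination that yields 4 Gbps. The helper names and the zero disassembly delay are hypothetical.

```python
def raw_link_rate_bps(clock_hz, link_width_bits, ddr=True):
    """Raw link rate: one transfer per clock edge when DDR, one per cycle otherwise."""
    transfers_per_second = clock_hz * (2 if ddr else 1)
    return transfers_per_second * link_width_bits


def transmit_time_s(packet_bytes, link_rate_bps):
    """Time to clock one packet onto the link at the raw link rate."""
    return packet_bytes * 8 / link_rate_bps


def single_packet_latency_s(packet_bytes, link_rate_bps, disassembly_delay_s):
    """Latency as described above: transmit time + packet disassembly time."""
    return transmit_time_s(packet_bytes, link_rate_bps) + disassembly_delay_s


# Assumed: 16-bit link width. The 125 MHz DDR clock is the baseline value.
rate = raw_link_rate_bps(clock_hz=125e6, link_width_bits=16)
xt_256 = transmit_time_s(256, rate)
latency_256 = single_packet_latency_s(256, rate, disassembly_delay_s=0.0)
```

Header and control-symbol overhead are not modeled here, so this gives only the raw ceiling, not the effective bandwidth plotted in the figure.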

RapidIO Component Modeling: Central-Memory Switch Model

Key Features
– Selectable cut-through or store-and-forward routing
– Non-blocking architecture
– Routes packets based solely on destination ID (read from a routing table file) as per RIO spec
– RIO Common Transport Layer
– RIO Parallel Physical Layer
Key Adjustable Parameters
– Cut-through/store-and-forward behavior
– Average central-memory read latency
– Average central-memory write latency
– Central-memory size
– Link width, clock frequency, and other RapidIO physical layer parameters
– Static priority threshold scheme based on free memory in switch

RapidIO Component Modeling: Switch/Small-System Verification

Verified N-hop latency for N switches (endpoint-to-endpoint latency)
– Cut-through: Latency = AT + XT + N × WL + DT
– Store-and-forward: Latency = AT + XT + N × (WL + XT) + DT
– Passed packets through multiple switches, observed timestamps at various points, and compared them with expected values
    XT – transmission time
    WL – switch memory write latency
    DT – packet disassembly delay
    AT – packet assembly delay
    N – number of hops between endpoints
Error correction, flow control verification (Rx/Tx)
– Error tests similar to 2-node, except using switch port as partner
– Flow control verified by using various window sizes, observing link partner chatter
– Adjusted endpoint and switch priority thresholds, verified correct acceptance/denial
All-to-one, one-to-all delivery verification
– Using system shown to right, packets from generator sent to all nodes round-robin
– All nodes configured to receive packets and redirect to single memory sink
– Even with saturated links, each node eventually receives expected packets
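
The two N-hop latency expressions can be evaluated directly. The sketch below transcribes them as written; the numeric values (in nanoseconds) are hypothetical and chosen only to show that the two modes differ by N × XT, since a store-and-forward switch retransmits the whole packet at every hop.

```python
def cut_through_latency(at, xt, wl, dt, n):
    """Latency = AT + XT + N * WL + DT."""
    return at + xt + n * wl + dt


def store_and_forward_latency(at, xt, wl, dt, n):
    """Latency = AT + XT + N * (WL + XT) + DT."""
    return at + xt + n * (wl + xt) + dt


# Hypothetical values in nanoseconds, for illustration only.
AT, XT, WL, DT, N = 100, 512, 72, 100, 2

ct = cut_through_latency(AT, XT, WL, DT, N)
sf = store_and_forward_latency(AT, XT, WL, DT, N)
penalty = sf - ct  # extra cost of store-and-forward over cut-through
```

With zero hops the two expressions coincide, which is a quick sanity check for a direct endpoint-to-endpoint link.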

GMTI Modeling: Algorithm Description

Global memory board sends groups of RapidIO packets to processing boards
Each processing board has 4 processors
GMTI algorithm used is a post-Doppler variant with 4 stages:
– Pulse compression
– Doppler processing
– Weight computation/Beamforming (STAP)
– CFAR
After processing, packets containing detection information get sent back to main memory

GMTI Modeling: Algorithm Flow Overview

[Figure labels: Input data cube; Global memory; Processing boards 1, 2, 3, … 9]

GMTI Modeling: Processing Board Overview

[Figure labels: Pulse compression, Doppler, STAP, CFAR; Processing element; CPU; RIO interface; RIO switch]

GMTI Modeling: Data Cube Generator

Key features
– Can control amount of data generated, as well as rate
– Evenly spreads out packets over entire CPI
    Diagram to right shows effect of increasing generation rate while keeping data size constant
    Blocks represent packets, dashed lines represent CPI
    Must be careful to not saturate → balance data size/CPI
– No endpoint, only generates data

Legend
Blue oval – signal new CPI, calculate number of packets to create
Red square – loop over number of packets
Purple square – time delay between each packet
Green square – create a packet, fill out necessary fields
Pink oval – create last packet, may be smaller size
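
The generator steps in the legend can be sketched as follows. This is a minimal illustration, not the simulation model itself: it assumes packets are spaced at equal intervals of CPI divided by packet count, and the function name and example sizes are hypothetical.

```python
import math


def generate_cpi_packets(cube_bytes, packet_bytes, cpi_s, start_s=0.0):
    """Return a list of (send_time_s, size_bytes) for one CPI's worth of data."""
    # New CPI: calculate the number of packets to create.
    n_packets = math.ceil(cube_bytes / packet_bytes)
    # Assumed spacing: spread packets evenly over the whole CPI.
    gap_s = cpi_s / n_packets

    packets = []
    for i in range(n_packets):  # loop over number of packets
        remaining = cube_bytes - i * packet_bytes
        size = min(packet_bytes, remaining)  # the last packet may be smaller
        packets.append((start_s + i * gap_s, size))
    return packets


def offered_rate_bps(cube_bytes, cpi_s):
    """Average data generation rate; compare against link capacity to avoid saturation."""
    return cube_bytes * 8 / cpi_s


# Hypothetical example: 1000 bytes of data, 256-byte packets, 256 ms CPI.
schedule = generate_cpi_packets(cube_bytes=1000, packet_bytes=256, cpi_s=0.256)
rate = offered_rate_bps(1000, 0.256)
```

Shrinking the CPI while holding the data size constant shortens the gap between packets, which is the saturation risk the slide warns about.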

GMTI Modeling: Global Memory Board

Key features
– May serve as sensor data source, global memory endpoint (port), or both
– One endpoint and one generator per model
– Using multiple instances in a system represents:
    Multiple sensors each generating part of the complete data cube
    Multiple ports to the same global memory
– 3 simple blocks in yellow circle give significant flexibility in controlling data traffic
Key adjustable parameters
– Ranges, channels, pulses
– CPI
– Size per element, packet size
– Packet group size (message length)
– Destination ID(s) to send to
– Memory delay per byte
– All RapidIO endpoint parameters

Figure legend: BLUE – data cube generator; YELLOW – traffic shaper; RED – RapidIO endpoint

GMTI Modeling: Baseline Board/Backplane Models

Baseline Processor Board Model
– Four processors
    Compute node ASIC + RIO endpoint
– One 8-port switch
– One link to each endpoint
– Four links available for backplane connection
Baseline Backplane Model
– Four 8-port switches
    Minimal number for a symmetric configuration
– Two links to each processor board
    “Wastes” 2 links per board
– One link to memory board
– Many additional configurations to explore

GMTI Modeling: System Model

Baseline configuration
– 125 MHz DDR RIO links
– Receiver-controlled flow control
– Two-link trunk between each switch
– Static, store-and-forward routing
– 10 kB switch central-memory size
– 72 ns average read/write latency per packet for switch memory
– Baseline GMTI algorithm
    Pulses = 126
    Ranges = 120000
    Beams = 6
    CPI = 256 ms

Additional Proposed System Model #1

Generic Backplane Model
Double links for up to 12 boards
Routing tables and switch setup independent of number of boards actually used, or application
Two levels of switches
– Each 1st-level switch has two links to level 2
– 2 free links per switch, currently forms ring
– Each 2nd-level switch is connected to all 1st-level switches, as well as double global memory link (red circles, bottom)
Uniform, dual-link access to all other boards
– Logical neighbors one less hop with ring

N-Board Configuration (N = 1 … 12)
Diagram to left shows system with 6 boards inserted
All 4 global-memory links are used (bottom), assuming 4 ports/endpoints to memory; input data cube from global memory also
System purposefully built to be application independent, reusable
– Attempt to maintain performance with increased versatility
– GMTI may be pipelined or staggered
Easy to simulate applications with different numbers of processors
Each board also has two free links, could be used

Additional Proposed System Model #2

Figure legend: Blue circles – GMTI algorithm tasks; Red circles – Global memory ports

Custom System Configuration – Pipelined Algorithm
– Currently assumes # boards as specified in GMTI spreadsheet
    Changing # procs/boards may require remaking routing tables, switch layout
– For tasks with > 1 board allocated, form star topology among boards for ↑ intra-task communication
    If not necessary/worthwhile, use extra board links for double connections to switches
    All switches have free links for small architectural adjustments
– Algorithm will be pipelined
– Not all global memory ports must be used; currently have switch support for up to 4 input, 3 output

Proposed Experiments

Motivation
– Maximize performance
    Minimize the time needed for one CPI of the GMTI algorithm (must be < 256 ms for baseline configuration)
– Minimize cost/power
    Number of boards/processors/switches
Independent variables
– System configuration
– Routing tables
– Store-and-forward vs. cut-through routing
– Flow control method (tx-controlled vs. rx-controlled)
– Link rate (125 MHz vs. 250 MHz)
– Priority thresholds
– Endpoint queue lengths
– Switch central memory size
Initial experiments will examine the effects of varying these one at a time
Experiments to follow will seek correlations between the parameters
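
A one-at-a-time sweep over the independent variables can be enumerated mechanically. The sketch below covers only the three variables whose alternatives are named on the slides, with baseline values taken from the baseline configuration; the dictionary keys and the function name are hypothetical.

```python
# Baseline values are from the baseline system configuration.
BASELINE = {
    "routing": "store-and-forward",
    "flow_control": "rx-controlled",
    "link_rate_mhz": 125,
}

# Alternatives named in the list of independent variables.
ALTERNATIVES = {
    "routing": ["cut-through"],
    "flow_control": ["tx-controlled"],
    "link_rate_mhz": [250],
}


def one_at_a_time(baseline, alternatives):
    """Return the baseline run plus one run per alternative value.

    Each non-baseline run changes exactly one variable, so any difference in
    results can be attributed to that variable alone.
    """
    runs = [dict(baseline)]
    for name, values in alternatives.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            runs.append(config)
    return runs


runs = one_at_a_time(BASELINE, ALTERNATIVES)
```

This keeps the initial run count linear in the number of alternatives; the follow-on correlation experiments would instead need combinations of variables.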

Additional Features to Explore

Multicast and flow control extensions for RIO
– Spec to be released this year
– Part of RapidFabric extensions
Dynamic load balancing
– RIO spec technically allows only static load balancing
    Must accomplish using clever routing tables
– Possible to extend the spec to allow some dynamic load balancing
    Literature search on this topic revealed several promising directions
    Primary challenge is RIO packet delivery ordering rules
– Experiments can be conducted to determine optimal method for dynamic load balancing, if Honeywell’s applications warrant
SBR algorithm alternatives
– Pipelining of GMTI algorithm
– Staggering of GMTI algorithm
– SAR (our initial focus directed at GMTI)
RapidIO I/O Logical Layer
– Remote reads/writes instead of message passing
– Can be easily added to the models if deemed important by Honeywell for their applications

[Figure: RapidIO Architecture Layers Highlighting RapidFabric Extensions. Diagram courtesy of: RapidFabric RapidIO Extensions Whitepaper, March 2004.]

Project Timetable

Literature Review [complete, yet ongoing]
– RIO spec, RIO components, SBR, misc.
RIO Component Modeling [complete]
– Layers, endpoints, switches, processors, etc.
– GMTI traffic models, memory boards, backplanes
RIO System Modeling [in progress]
– Proposed systems models, test plan
Simulation Experiments [to begin May 10th]
Data Analysis and Report [to begin June 20th]

Conclusions

Completed literature review
– Will include future topics as required
RIO components and systems are well underway
GMTI case study is well underway
Experiments identified
Foundation for future integrated payload prototyping is well underway

Future Collaboration Possibilities

Expanding the current project
– Jeremy Ramos’s suggestions for ST-8
    Examine latency sensitivity in RIO systems with data and control packets
    Examine RIO’s ability to throttle flow control for buffered pipeline processing with systems that have limited software support (i.e. RC devices)
– Include other aspects of UF’s Fast and Accurate Simulation Environment (FASE)
    Algorithm profiling, system tradeoff analysis
New directions
– Wireless RC interconnects proposal
– ST-8 / Integrated Payload middleware study
    Green Hills and other RTOS providers (evaluation / collaboration)
    Monitoring and management of RC components (light version of UF’s CARMA)
    Algorithm / system prototyping
– Other possibilities?
