
Upload: nora-gallagher

Post on 03-Jan-2016


The Project

Asymmetric FPGA-loaded hardware accelerators for FPGA-enhanced CPU systems with Linux


Performed by: Avi Werner, William Backshi

Instructor: Evgeny Fiksman

Duration: 1 year (2 semesters)


Mid-project presentation

30/03/2009

RMI Processor

RMI – SW Programming Model

RMI Processor - RMIOS

Agenda

Project description

Design considerations and schematics

System diagram and functionality

Preparing the demo

Planned future progress

Project definition

An FPGA-based asymmetric multiprocessor system, with a Master CPU and several slave Accelerators (modified soft-core CPUs with RAM) with the same or different OpCode.

The Master CPU runs a single-processor Linux OS; the Accelerators' functionality is provided to applications in the OS through a driver API.

Platform: ML310 with PPC405.

Accelerators: based on uBlaze soft-core microprocessors.

Controllers: an IRQ controller for each core.

“Accelerator” refers to microprocessor + IRQ generator + RAM.

The Platform

Project Progress

Theoretical research

Found and read articles on HW accelerators, both by the faculty staff and external (CELL – IBM, etc.).

Met with most of the MATRICS group to check their interest in our platform and their possible demands.

Met with Systems Dpt. members at IBM (Muli Ben-Yehuda) for a concept review.

The system architecture has undergone significant changes.

Practical achievements – attempt to load Linux on ML310

Compiled a kernel for PPC-405 with ML310 support (no PCI support).

Booted the ML310 from CF with the Xilinx pre-loaded Linux.

Introduced additional hardware into the FPGA and tested liveness.

Practical achievements – creating HW system platform

Moved to Xilinx 10.1 to get a single system bus (PLB v.4.6) with multi-port memory.

Created a template for the Accelerator core (IRQ Generator and microprocessor).

Designed the interconnect topology.

Connected the devices at the HW level and tested system liveness and independence.

HW Design considerations

Scalability – the design is CPU-independent.

The Accelerator works with interrupts – no polling (improved performance).

The OS does not work with interrupts – it polls the IRQ generators instead, for generic HW compatibility and scalability.

Separate register space – not using main memory for flags / device data / etc.

Single cycle transaction for checking / setting accelerator status.

Data Mover stub init includes chunk size – no character recognition needed.
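The OS-side polling described above could look roughly like the following sketch. The status encoding (`ACC_STATUS_COMPLETE`) and the function name `poll_first_complete` are assumptions made for illustration; the slides do not define the register values or the driver interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the "complete" state in a status register. */
#define ACC_STATUS_COMPLETE 3u

/*
 * Scan the status registers of all IRQ generators once.
 * Each check is a single register read, in line with the
 * "single cycle transaction" consideration.
 * Returns the index of the first accelerator that reports
 * "complete", or -1 if none does.
 */
int poll_first_complete(const volatile uint32_t *status_regs, size_t count)
{
    assert(status_regs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (status_regs[i] == ACC_STATUS_COMPLETE)
            return (int)i;
    }
    return -1;
}
```

In a real system `status_regs` would point into the separate register space of the IRQ generators rather than at an ordinary array.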

Accelerator Schematics

[Figure: accelerator block diagram. Labels: Accelerator; Data & Instr. dual-port RAM; CPU (uBlaze); IRQ Generator; General Purpose Registers; Slave; Master; PLB v.4.6; IRQ.]

HW Design Schematics

[Figure: system block diagram. Labels: PPC; MMU; DDR MEM; MEM Controller (×2); instruction bus; data bus; PLB v.4.6 bus; Accelerator (×3); Data & Instr MEM (×3).]

Current System layer

[Figure: layer diagram. Labels: Accelerated Software platform; FPGA; PPC 405; Accelerator; DDR MEM; MMU; Memory test demo; LED Accelerator demo; Instr MEM & Data MEM; Software Stub (Data mover & executer); Manual execution.]

Manual execution: since we don't have an OS, we can't load any executable into the DDR without JTAG. Thus we have to load it manually, and set up and execute the stub manually.

Complete System layer

[Figure: layer diagram. Labels: Accelerated Software platform; FPGA; PPC 405; Accelerator; DDR MEM; MMU; Linux (Debian); Driver; Virtual communication Layer (SW); Instr MEM & Data MEM; Software Stub (Data mover & executer).]

System Functionality

Functionality

HW is loaded on the FPGA, the demo application (in the future, the Linux kernel) runs on the central PPC core, and the accelerators are preloaded with the client software stub.

The SW driver is loaded into memory (in the kernel, using the insmod command).

Accelerator-aware SW is executed (in the kernel, it communicates with the driver API).

To commit a job to a specific accelerator, the SW initializes the registers of the accelerator's IRQ controller and sets the “run” flag in the status register.

The client stub runs in an idle loop until the accelerator's IRQ controller issues an interrupt, which is initiated by driver code running on the PPC core.

The stub reads the IRQ controller registers, which initialize the Data Mover (in the 1st stage, with the start address and length of the code).

The Data Mover sets a flag in the IRQ generator status register that signals a working accelerator core.

The Data Mover initiates transactions with the main memory until the whole code segment has been brought in, then passes control to the 1st byte of the code segment.

The target code includes an “rtid” instruction to return control to the Data Mover after execution: when the target code finishes, the inserted “rtid” passes control back to the Data Mover stub.

The Data Mover changes the status register of the IRQ generator to “complete” and returns to the idle loop (the stub can support returning resulting data structures to the main memory).
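The handshake described above can be sketched as a host-side simulation in C. Everything named here is hypothetical: the register layout (`struct irq_gen_regs`), the status values, and the function names are assumptions made for illustration, since the slides only mention a start address, a length, a “run” flag, a working flag, and a “complete” state. The jump into the loaded code is replaced by a comment because it cannot be reproduced on a host machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status values; the slides name only "run" and "complete". */
enum acc_status { ACC_IDLE = 0, ACC_RUN = 1, ACC_BUSY = 2, ACC_COMPLETE = 3 };

/* Hypothetical register block of one accelerator's IRQ generator. */
struct irq_gen_regs {
    uint32_t code_addr;  /* offset of the code segment in main memory */
    uint32_t code_len;   /* length of the code segment in bytes */
    uint32_t status;     /* one of enum acc_status */
};

/* Driver side (PPC core): describe the job first, set the run flag last. */
void driver_commit_job(volatile struct irq_gen_regs *regs,
                       uint32_t code_addr, uint32_t code_len)
{
    regs->code_addr = code_addr;
    regs->code_len = code_len;
    regs->status = ACC_RUN;  /* in hardware, this is what triggers the IRQ */
}

/*
 * Stub side (accelerator): one pass of the Data Mover.
 * Returns 0 if no job is pending (the stub stays in its idle loop),
 * 1 if a job was fetched and marked complete.
 */
int stub_service(volatile struct irq_gen_regs *regs,
                 const uint8_t *main_mem, size_t main_mem_size,
                 uint8_t *local_mem, size_t local_mem_size)
{
    if (regs->status != ACC_RUN)
        return 0;

    uint32_t addr = regs->code_addr;
    uint32_t len = regs->code_len;

    /* Reject jobs that do not fit in either memory. */
    assert(len <= local_mem_size);
    assert(addr <= main_mem_size && len <= main_mem_size - addr);

    regs->status = ACC_BUSY;                  /* signal a working core */
    memcpy(local_mem, main_mem + addr, len);  /* bring in the code segment */

    /* A real stub would pass control to local_mem here and regain it
     * when the target code returns. */

    regs->status = ACC_COMPLETE;
    return 1;
}
```

Setting the run flag only after the address and length are written keeps the stub from ever seeing a half-initialized job, which is the reason the sketch orders the register writes that way.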

Preparing Accelerator SW

Compile the accelerator target code with an execution-only segment (there is no data segment; data is inserted inline).

The target code should be compiled with Program starting address = 0x1000, set via Compiler options, using the Default linker script.

At the end of the code, insert a call to a “return” function whose address is taken from 0xFFC:

asm("andi r1, r1, 0x0; lwi r15, r1, 0xFFC; rtsd r15, 0;");

Open the Xilinx EDK Shell and run the following to convert the ELF to binary code:

mb-objcopy -O binary --remove-section=.stab --remove-section=.stabstr executable.elf target.bin

Preparing the system

Download bitstream to FPGA (PPC code and uBlaze stub).

Launch XMD on PPC core.

Download the target accelerator BIN to DRAM as data:

dow -data target.bin 0xSTART_ADDR

Set the IRQ Generator parameters:

1. Base address – 0xSTART_ADDR + 0x1000.

2. Length of BIN in DRAM.

3. Run bit.

4. Set run bit again, if you liked it.
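The first two parameters can be derived from the download address and the size of target.bin. The helper below is a minimal sketch that follows the formula on the slide literally; `struct job_params` and `make_job_params` are hypothetical names, and the 0x1000 offset is assumed to be the program starting address set in the compiler options.

```c
#include <assert.h>
#include <stdint.h>

/* Program starting address from the compiler options (0x1000 on the slides). */
#define TARGET_LINK_ADDR 0x1000u

/* Hypothetical container for the values written to the IRQ Generator. */
struct job_params {
    uint32_t base_addr;  /* 0xSTART_ADDR + 0x1000 */
    uint32_t length;     /* length of the BIN in DRAM, in bytes */
};

/* Compute the IRQ Generator parameters for a BIN downloaded at start_addr. */
struct job_params make_job_params(uint32_t start_addr, uint32_t bin_len)
{
    /* Guard against wrapping around the 32-bit address space. */
    assert(start_addr <= UINT32_MAX - TARGET_LINK_ADDR);

    struct job_params p = { start_addr + TARGET_LINK_ADDR, bin_len };
    return p;
}
```

The run bit is deliberately left out of the struct: it is written as a separate, final step once the other two registers are set.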

Planned future progress

Load Linux on the platform.

Update the stub to allow data passing.

Finish writing the driver API for Linux.

Write an additional demo application for uBlaze.

Write a demo application for PPC (Linux).

Backup slides
