System Development: Numerical Techniques for Matrix Inversion

Posted on 17-Dec-2015



The Elementary Technique

Matrix Inversion using Co-Factors

Inversion using Co-Factors? Not Suitable Computationally!!

• This technique is a very poor candidate for implementation. Complexity: N! (N × (N−1) × (N−2) × … × 3 × 2 × 1) (evaluated for SIMD machines)

• A recursive algorithm may offer an elegant solution, but it
  – Devours memory resources with extreme greed
  – Drags the processor out of the Grand Prix and into a traffic jam

• Therefore, it is a computationally very expensive algorithm with enormous memory requirements

• Above all, an SPS (Special-Purpose System) hardware architecture for this technique is a distant prospect, because its recursive algorithm imposes irregular, global communication requirements
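To make the N! claim concrete, here is a minimal sketch of inversion by co-factors (adjugate divided by determinant), with the determinant computed by recursive cofactor expansion. The function names are hypothetical, the matrix is assumed to hold integers with N ≥ 2, and exact `Fraction` arithmetic is used only to keep the example easy to check.

```python
from fractions import Fraction

def det_cofactor(m, counter):
    """Determinant by recursive cofactor (Laplace) expansion along row 0.
    counter[0] is incremented once per multiplication m[0][j] * minor_det."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]  # drop row 0, column j
        sign = -1 if j % 2 else 1
        total += sign * m[0][j] * det_cofactor(minor, counter)
        counter[0] += 1
    return total

def inverse_cofactor(m):
    """A^-1 = adj(A) / det(A) for an integer matrix with n >= 2 (exact arithmetic)."""
    n = len(m)
    counter = [0]
    d = det_cofactor(m, counter)
    if d == 0:
        raise ValueError("matrix is singular")
    inv = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Minor M_ij: drop row i and column j, then recurse again
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
            cofactor = (-1) ** (i + j) * det_cofactor(minor, counter)
            inv[j][i] = Fraction(cofactor, d)  # adjugate = transposed cofactors
    return inv
```

The multiplication count of `det_cofactor` follows M(n) = n + n·M(n−1): 40 for a 4 × 4 matrix and already 69,280 for 8 × 8, and the full inverse repeats this recursion for all N² minors. The recursion depth and the per-level minor copies also show where the memory appetite comes from.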

Any Alternatives?

• Fortunately, YES!
• A technique which employs LU Decomposition and Triangular Matrix Inversion for its solution. Complexity: N³ (evaluated for SIMD machines)

• What are these numerical techniques? (We’ll soon get to learn them)

• The distinct advantage of these techniques is that their solution mimics the Gaussian Elimination procedure, which in turn is an excellent candidate for systolic implementations

To the Computationally Efficient Numerical Techniques

Matrix Inversion using LU Decomposition and Triangular Matrix Inversion

Matrix Inversion
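The identity that ties the two steps together is standard linear algebra: once A is factored into a lower triangular L and an upper triangular U (both nonsingular, i.e. with nonzero diagonal entries), the inverse is the product of two triangular inverses taken in reverse order.

```latex
A = LU \quad\Longrightarrow\quad A^{-1} = (LU)^{-1} = U^{-1} L^{-1}
```

Here U^{-1} is again upper triangular and L^{-1} is again lower triangular, so the problem reduces to one decomposition, two triangular inversions and one matrix product, each of O(N³) cost.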

LU Decomposition

LU Decomposition (cont.)

LU Decomposition (cont.)

LU Decomposition (cont.)
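As a rough reference for the LU step, here is a minimal sketch of Doolittle-style LU decomposition (unit diagonal on L, no pivoting). This particular variant is an assumption; the slides may use a different normalization, and a practical implementation would usually add pivoting.

```python
def lu_doolittle(a):
    """Doolittle LU decomposition without pivoting: A = L U, with L unit lower
    triangular and U upper triangular. Assumes no zero pivot is encountered."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
        # Row i of U
        for j in range(i, n):
            U[i][j] = a[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0:
            raise ZeroDivisionError("zero pivot: A is singular or needs pivoting")
        # Column i of L below the diagonal (the elimination multipliers)
        for j in range(i + 1, n):
            L[j][i] = (a[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U
```

For A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]] this gives L = [[1, 0, 0], [2, 1, 0], [4, 3, 1]] and U = [[2, 1, 1], [0, 1, 1], [0, 0, 2]]. The entries of L are exactly the multipliers Gaussian Elimination would use, which is the mimicry mentioned earlier.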

Triangular Matrix Inversion

Upper Triangular Matrix

Triangular Matrix Inversion (cont.)

Triangular Matrix Inversion (cont.)

Triangular Matrix Inversion (cont.)
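A minimal sequential sketch of upper triangular inversion, assuming the usual approach of solving U X = I column by column with back substitution (the function name is hypothetical). It shows the data dependencies a systolic array has to respect: each entry above the diagonal needs the entries below it in the same column.

```python
def inv_upper(u):
    """Invert a nonsingular upper triangular matrix by back substitution,
    solving U X = I one column at a time. X is also upper triangular."""
    n = len(u)
    x = [[0.0] * n for _ in range(n)]
    for j in range(n):                      # column j of X
        x[j][j] = 1.0 / u[j][j]             # diagonal: reciprocal of the pivot
        for i in range(j - 1, -1, -1):      # rows above the diagonal, bottom-up
            s = sum(u[i][k] * x[k][j] for k in range(i + 1, j + 1))
            x[i][j] = -s / u[i][i]
    return x
```

For U = [[2, 1, 1], [0, 1, 1], [0, 0, 2]] the result is [[0.5, −0.5, 0], [0, 1, −0.5], [0, 0, 0.5]].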

Triangular Matrix Inversion

Lower Triangular Matrix

Triangular Matrix Inversion (cont.)

Triangular Matrix Inversion (cont.)

Triangular Matrix Inversion (cont.)
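The lower triangular case is the mirror image: solve L X = I with forward substitution, working down each column. Again a minimal sketch with a hypothetical function name, not the slides' own formulation.

```python
def inv_lower(l):
    """Invert a nonsingular lower triangular matrix by forward substitution,
    solving L X = I one column at a time. X is also lower triangular."""
    n = len(l)
    x = [[0.0] * n for _ in range(n)]
    for j in range(n):                      # column j of X
        x[j][j] = 1.0 / l[j][j]             # diagonal: reciprocal of the pivot
        for i in range(j + 1, n):           # rows below the diagonal, top-down
            s = sum(l[i][k] * x[k][j] for k in range(j, i))
            x[i][j] = -s / l[i][i]
    return x
```

For the unit lower triangular L = [[1, 0, 0], [2, 1, 0], [4, 3, 1]] the result is [[1, 0, 0], [−2, 1, 0], [2, −3, 1]].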

A Systolic Architecture for Triangular Matrix Inversion

Matrix Order is 4 x 4

Regular Cells

Boundary Cells

The following architecture’s abstract computational working is illustrated using the upper triangular matrix. The same architecture, after some rearrangement of the data, can be employed to invert a lower triangular matrix.
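One possible form of that data rearrangement (an assumption, not necessarily the one used in the slides) is transposition: L^T is upper triangular and (L^{-1})^T = (L^T)^{-1}, so an upper triangular inverter can serve both cases if the lower matrix is fed in column order instead of row order. The sketch below expresses this in software; `inv_lower_via_upper` is a hypothetical name.

```python
def transpose(m):
    """Swap rows and columns (in hardware: feed columns where rows went)."""
    return [list(row) for row in zip(*m)]

def inv_upper(u):
    """Upper triangular inverse by back substitution (the 'shared' engine)."""
    n = len(u)
    x = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x[j][j] = 1.0 / u[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(u[i][k] * x[k][j] for k in range(i + 1, j + 1))
            x[i][j] = -s / u[i][i]
    return x

def inv_lower_via_upper(l):
    """L^-1 = ((L^T)^-1)^T: reuse the upper triangular inverter unchanged."""
    return transpose(inv_upper(transpose(l)))
```

With L = [[1, 0, 0], [2, 1, 0], [4, 3, 1]] this returns [[1, 0, 0], [−2, 1, 0], [2, −3, 1]], the same result a dedicated forward-substitution routine would give.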

Array for LU Decomposition?

Left for you to practice! Try to work out its dataflow independently and without any help. It will give you an excellent understanding of systolic data flows.

A Systolic System for the Complete Matrix Inversion Algorithm

Mapping

Mapping is a procedure through which we can achieve Resource Reuse. Mapping means that two or more algorithms use the same hardware architecture for their execution. It turns out that the best candidates for Resource Reuse are Arithmetic Blocks or, as in our case, the Processing Elements. Usually, before mapping algorithms onto the same set of Processing Elements, we need to develop a Scheduling Algorithm. A Scheduling Algorithm decides at ‘which time interval’ a particular processing element will execute ‘what data’ for a particular algorithm, out of the given set of algorithms to be mapped onto the system.
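A minimal sketch of a static Scheduling Algorithm, assuming the operations of each algorithm are independent of each other (no data dependencies) and each occupies one PE for one time slot. `build_schedule` and the task names are hypothetical; it interleaves the algorithms round-robin and packs the merged stream onto the shared PEs, producing the "which time interval, which PE, what data" table described above.

```python
from itertools import zip_longest

def build_schedule(tasks, num_pes):
    """Map the operations of several algorithms onto num_pes shared PEs.
    tasks: dict {algorithm_name: [operation, ...]} (independent operations).
    Returns one entry per time slot: a list of (pe, algorithm, operation)."""
    # Round-robin merge so that no algorithm starves the others
    queues = [[(name, op) for op in ops] for name, ops in tasks.items()]
    merged = []
    for group in zip_longest(*queues):
        merged.extend(item for item in group if item is not None)
    # Pack num_pes operations into each time slot
    schedule = []
    for t in range(0, len(merged), num_pes):
        batch = merged[t:t + num_pes]
        schedule.append([(pe, name, op) for pe, (name, op) in enumerate(batch)])
    return schedule
```

With `{"band": ["b0", "b1", "b2"], "square": ["s0", "s1"]}` on two PEs, slot 0 runs b0 on PE 0 and s0 on PE 1, slot 1 runs b1 and s1, and slot 2 runs b2 alone. Real schedules must also honor data dependencies and pipeline latencies, which this sketch ignores.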

An Example for Mapping

The Square Matrix Multiplication Array on the Band Matrix Multiplication Array

The Array for Band Matrices

The Array for Square Matrices

The Combined or “Mapped” Array

The ‘maroon’ lines represent connections common to both arrays

The control signal, in a sense, performs the scheduling of operations
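To illustrate how a control signal "performs the scheduling", here is a behavioral sketch of a single multiply-accumulate PE shared by two mapped algorithms through a 2:1 multiplexer. The operand streams, the one-accumulator-per-algorithm arrangement and the function names are assumptions made for illustration; the actual mapped array in the figure is not modeled.

```python
def mux2(sel, in0, in1):
    """2:1 multiplexer: passes in0 when sel == 0, in1 when sel == 1."""
    return in1 if sel else in0

def run_shared_mac(sources, select):
    """Simulate one MAC PE shared by two mapped algorithms.

    sources: two operand streams, each a list of (x, y) pairs (hypothetical data).
    select:  one control bit per clock cycle; it drives the input mux and picks
             which accumulator register is updated, so it encodes the schedule.
    """
    ptr = [0, 0]  # read pointer into each operand stream
    acc = [0, 0]  # one accumulator register per algorithm
    for sel in select:
        # Both streams present their current operands; (0, 0) once a stream is empty
        heads = [s[p] if p < len(s) else (0, 0) for s, p in zip(sources, ptr)]
        x, y = mux2(sel, heads[0], heads[1])
        acc[sel] += x * y
        ptr[sel] += 1
    return acc
```

For example, `run_shared_mac([[(1, 2), (3, 4)], [(5, 6), (7, 8)]], [0, 1, 0, 1])` returns `[14, 86]`: the same PE computes both dot products, and only the select sequence decides which algorithm owns each cycle.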

In my experience, the Multiplexer is arguably the single most important logic element for Datapath design. Its use is especially essential to resource-efficient system design, as well as to devising the data flow (data routing) between the various devices within the system. Therefore, learning to utilize and eventually control multiplexers in system interconnection is critical for system design. I urge you to develop a thorough understanding of this device, as expertise with it will ease your design process and help you grow into excellent ‘Special-Purpose-System’ Datapath Designers.

A Sincere Piece of Advice!!

General Framework for Datapath Development involving Processing Elements which require various Data Sources

Procedure that can be adopted for routing the data of various algorithms and tasks that may be utilizing the same Processing Elements

The Do-It-Yourself Thing

Resource Efficiency

‘Mapping’ is a technique that reduces Logic Resource Consumption. Another effective technique for Area Optimization is developing ‘Partially-Parallel/Semi-Parallel Architectures’ from the Fully-Parallel Algorithm Datapath. This is considered a ‘Time to Area Tradeoff’ approach: fewer processing elements need more clock cycles, so it is valid only as long as the result still meets the Real-time requirements of the Special Purpose System being developed.
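The tradeoff can be quantified with a back-of-the-envelope model. The sketch below assumes an N × N matrix product in which each PE computes one output element at a time (N multiply-accumulate cycles per element, one MAC per PE per cycle, no pipeline or I/O overhead); the function names are hypothetical. The cycle budget is the clock frequency multiplied by the real-time deadline.

```python
def cycles_needed(n, pes):
    """Cycles for an n x n matrix product when `pes` PEs each compute one
    output element (n MAC cycles) per round; assumes no other overhead."""
    rounds = -(-(n * n) // pes)  # ceil(n^2 / pes) rounds of output elements
    return rounds * n

def min_pes(n, cycle_budget):
    """Smallest PE count that still meets the real-time cycle budget,
    or None if even the fully parallel n*n-PE array is too slow."""
    max_rounds = cycle_budget // n
    if max_rounds == 0:
        return None
    return -(-(n * n) // max_rounds)  # ceil(n^2 / max_rounds)
```

For instance, at 100 MHz with a 1 µs deadline the budget is 100 cycles, and `min_pes(8, 100)` gives 6 PEs (88 cycles) instead of the 64 PEs of a fully parallel 8 × 8 array; 5 PEs would need 104 cycles and miss the deadline.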

I’ll throw light upon SPS Semi-Parallel Architectures using the Matrix Multiplication Problem

The Single Processing Element Approach

The Fully Parallel (Simple and Systolic) Architecture for Matrix Multiplication

The Semi-Parallel (Simple and Systolic) Architecture for Matrix Multiplication
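A behavioral sketch that covers all three cases with one parameter, assuming (as a simplification) that each active PE produces one output element c[i][j] in N multiply-accumulate cycles and that output elements are handed out in rounds of `pes`. It computes the product and counts cycles; systolic skewing and data-movement latency are not modeled, and the function name is hypothetical.

```python
def semi_parallel_matmul(a, b, pes):
    """pes = 1: single-PE approach; pes = n*n: fully parallel; otherwise
    semi-parallel. Returns (product, clock cycles under this simple model)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    outputs = [(i, j) for i in range(n) for j in range(n)]
    cycles = 0
    for start in range(0, len(outputs), pes):
        batch = outputs[start:start + pes]  # one output element per active PE
        for k in range(n):                  # n MAC cycles per round
            for i, j in batch:              # these MACs run in parallel in hardware
                c[i][j] += a[i][k] * b[k][j]
            cycles += 1
    return c, cycles
```

For a 2 × 2 product the cycle count is 8 (N³) with one PE, 4 with two PEs and 2 (N) with four PEs, while the result is the same in every case.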

Towards Complete Systems

Kalman Filter Equations

Extended Kalman Filter Equations
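As a reference point, a minimal sketch of one predict/update cycle of the standard linear Kalman filter in the scalar case; the model parameters a, h, q and r are illustrative assumptions, not values from the slides. In matrix form the gain is K = P H^T (H P H^T + R)^{-1}, which is one place a matrix inversion engine like the one above is needed; in the scalar case that inverse collapses to a division. The Extended Kalman Filter follows the same structure but uses Jacobians of nonlinear models in place of a and h.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=0.0, r=1.0):
    """One predict/update cycle of a scalar linear Kalman filter.
    x, p: previous state estimate and its variance; z: new measurement;
    a: state transition; h: measurement model; q, r: process/measurement noise."""
    # Predict
    x_pred = a * x
    p_pred = a * p * a + q
    # Update
    s = h * p_pred * h + r          # innovation variance (H P H^T + R in matrix form)
    k = p_pred * h / s              # gain P H^T S^-1; the division is the 1 x 1 inverse
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new
```

Starting from x = 0, p = 1 and a measurement z = 1 with the default parameters, one step returns (0.5, 0.5): the estimate moves halfway toward the measurement because prior and measurement variances are equal.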

The Local Control

• These are usually state machines or counters
• In this particular example they are used to
  – Generate addresses and read/write signals for the data storages
  – Specify the function to be performed by the processing elements of the array
  – Possibly also select the data inputs of multiplexers for data transfer between the arrays, and drive the set, reset and load operations of various registers
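A minimal behavioral sketch of counter-based local control, assuming a single counter that streams `num_words` operands out of one data storage and writes the results into another after a fixed pipeline latency. The signal names, base addresses and latency value are hypothetical.

```python
def local_control(num_words, latency, read_base=0, write_base=0):
    """Yield one control word per clock cycle:
    (read_en, read_addr, write_en, write_addr).
    One counter drives everything; writes lag reads by `latency` cycles,
    standing in for the pipeline depth of the processing array."""
    for count in range(num_words + latency):
        read_en = count < num_words
        write_en = count >= latency
        read_addr = read_base + count if read_en else None
        write_addr = write_base + count - latency if write_en else None
        yield read_en, read_addr, write_en, write_addr
```

With `num_words=3` and `latency=2`, reads run in cycles 0 to 2, writes to addresses 0 to 2 run in cycles 2 to 4, and the two overlap in cycle 2. In hardware this is one up-counter plus two comparators.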

The Global Control

• These are usually wait-assert or interrupt-based state machines
• This may again be a state machine or a counter (at times rather large and complex)
• May be a Programmable State Machine!
• Programmable State Machine?
• These are like small microcontrollers that can be programmed through software
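A minimal sketch of wait-assert global control: the global state machine asserts `start` to one array's local controller, waits until that controller asserts `done`, then moves on to the next stage. The stage names (for example "LU" and "TRI_INV") and their latencies are hypothetical; a real local controller would raise `done` itself rather than being modeled by a countdown.

```python
def global_control(latencies):
    """Wait-assert global control (behavioral sketch).
    latencies: ordered dict {stage_name: cycles until that stage's local
    controller raises done}. Returns a trace of (cycle, stage, event)."""
    if any(v < 1 for v in latencies.values()):
        raise ValueError("every stage needs at least one busy cycle")
    stages = list(latencies)
    trace = []
    idx, state, timer, cycle = 0, "ASSERT", 0, 0
    while idx < len(stages):
        stage = stages[idx]
        if state == "ASSERT":
            trace.append((cycle, stage, "start"))  # pulse start to the local controller
            timer = latencies[stage]
            state = "WAIT"
        else:  # WAIT: idle until the local controller reports done
            timer -= 1
            done = timer == 0
            trace.append((cycle, stage, "done" if done else "wait"))
            if done:
                idx += 1
                state = "ASSERT"
        cycle += 1
    return trace
```

`global_control({"LU": 2, "TRI_INV": 1})` starts LU in cycle 0, sees its `done` in cycle 2, then starts TRI_INV in cycle 3 and finishes in cycle 4. The global machine stays small because all the detailed sequencing lives in the local controllers.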

HW/SW Co-Design

• HW/SW co-design stands for hardware/software co-design
• The concept is to solve part of the problem in software and the rest in hardware
• Why software? Because sequential problems are better suited to software solutions
• Let’s look at the particular example of Kalman/H-Infinity Filter design using the Xilinx 8-bit PicoBlaze, or KCPSM (Constant Coded Programmable State Machine)

A Glance at the PicoBlaze Architecture

But Why? Why PicoBlaze?

Application of Wait-Assert type Global Control in Kalman System Design

Down Memory Lane

Remember and Relate!!

Q & As
