A Dimension Abstraction Approach to Vectorization in Matlab

Neil Birkbeck, Jonathan Levesque, Jose Nelson Amaral

Computing Science, University of Alberta

Edmonton, Alberta, Canada

CGO 2007


Page 2: Cgo2007 P3 3 Birkbeck

Problem

Problem Statement: Generate equivalent, error-free vectorized source code for Matlab source while utilizing higher-level matrix operations when possible to improve efficiency.

Page 3: Cgo2007 P3 3 Birkbeck

Motivation

Loop-based code is slower than vector code in Matlab. Why?

interpretive overhead (type/shape checking,…)

resizing of arrays in loops

Vectorization also useful for compiled Matlab code, where optimized vector routines could be substituted.

Loop version:

    n=1000;
    for i=1:n,
      A(i)=B(i)+C(i);
    end

Vectorized version (5x faster!):

    n=1000;
    A(1:n)=B(1:n)+C(1:n);

Page 4: Cgo2007 P3 3 Birkbeck

Related Work

Data dependence vectorization

    Allen & Kennedy’s Codegen algorithm

    Build data dependence graph

    Topological visit of strongly connected components

Abstract Matrix Form (AMF) [Menon & Pingali]

    Axioms used to transform array code

    Takes advantage of matrix multiplication

    Not clear if it is easily extensible or allows for vectorization of irregular access (e.g., access to the diagonal)

Page 5: Cgo2007 P3 3 Birkbeck

Incorrect Vectorization

Example 1:

    for i=1:n,
      a(i)=b(i)+c(i);
    end

Pull the statement out of the loop. Index variable substitution (i → 1:n):

    a(1:n)=b(1:n)+c(1:n)

Vectorization is correct if a, b, and c are row vectors or column vectors.

If this is not true, the vectorized code will introduce an error!

Page 6: Cgo2007 P3 3 Birkbeck

Incorrect Vectorization

Example 2:

    for i=1:n,
      x(i)=y(i,h)*z(h,i);
    end

Matlab is untyped. Vectorization depends on whether h is a vector or a scalar.

If h is a scalar:

    x(1:n)=y(1:n,h).*z(h,1:n)';

Otherwise:

    x(1:n)=sum(y(1:n,h).*z(h,1:n)',2);
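The two vectorized forms for Example 2 can be checked on a small case. The following is a rough Python sketch rather than Matlab, using nested lists and 0-based indices; the function names and the sample data are made up for illustration. It models y(i,h)*z(h,i) as an inner product over the entries of h, and the vectorized form as an element-wise product followed by a sum along each row.

```python
def loop_version(y, z, h):
    """x(i) = y(i,h) * z(h,i): a 1xk row times a kx1 column, i.e. an inner product."""
    n = len(y)
    return [sum(y[i][k] * z[k][i] for k in h) for i in range(n)]


def vectorized_version(y, z, h):
    """sum(y(1:n,h) .* z(h,1:n)', 2): element-wise product, then sum along each row."""
    n = len(y)
    y_sel = [[y[i][k] for k in h] for i in range(n)]    # models y(1:n,h), n x k
    z_sel_t = [[z[k][i] for k in h] for i in range(n)]  # models z(h,1:n)', n x k
    prod = [[a * b for a, b in zip(ry, rz)] for ry, rz in zip(y_sel, z_sel_t)]
    return [sum(row) for row in prod]


# Hypothetical sample data: y is 2x3, z is 3x2.
y = [[1, 2, 3], [4, 5, 6]]
z = [[1, 0], [2, 1], [0, 3]]

print(loop_version(y, z, [0, 2]), vectorized_version(y, z, [0, 2]))  # h is a vector
print(loop_version(y, z, [1]), vectorized_version(y, z, [1]))        # h is a scalar
```

When h holds a single index, each row of the product has one entry, so the row sum is a no-op and the result matches the plain element-wise form.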

Page 7: Cgo2007 P3 3 Birkbeck

Overview of Solution

Flowchart:

    Inputs: vectorizable statement (from the data dependence-based vectorizer) and knowledge of the shape of variables

    Propagate dimensionality up the parse tree

    Dimensions agree?

        No: leave the statement in the loop

        Yes: perform transformations, then output the vector statement

Page 8: Cgo2007 P3 3 Birkbeck

More Specifically

Represent the dimensionality of expressions as a list of symbols, each 1 or “*” (>1). Assume it is known for variables.

Examples:

    Type          dim
    scalar        (1)
    1xn vector    (1,*)
    nx1 vector    (*,1), (*)
    mxn matrix    (*,*)

Propagate up the parse tree according to Matlab rules.

Compatibility: dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1’s)
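The compatibility test can be sketched in a few lines of Python. This assumes that "redundant 1's" means trailing 1's, which is what listing an nx1 vector as both (*,1) and (*) suggests; the slides do not spell the rule out, and the names normalize and compatible are hypothetical.

```python
def normalize(dim):
    """Drop trailing 1's from a dimensionality list, keeping at least one symbol.

    Assumption: only trailing 1's count as redundant.
    """
    dim = list(dim)
    while len(dim) > 1 and dim[-1] == 1:
        dim.pop()
    return tuple(dim)


def compatible(dim_a, dim_b):
    """Two dimensionalities are compatible when their normalized lists are equal."""
    return normalize(dim_a) == normalize(dim_b)


column = ("*", 1)      # nx1 vector
column_short = ("*",)  # the same vector written as (*)
row = (1, "*")         # 1xn vector

print(compatible(column, column_short))  # the two column forms
print(compatible(row, column))           # row versus column
```

Under this reading a row vector and a column vector stay distinct, which matches the earlier warning that mixing the two makes a vectorized statement incorrect.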

Page 9: Cgo2007 P3 3 Birkbeck

Vectorized Dimensionality

Vectorized dimensionality: the representation of dimensions after vectorization of a loop, denoted dim_i for a loop with index variable i.

Introduce a new symbol r_i for index variable i.

    for i=1:n,
      a(i)=10+i;
    end

    exp     dim(exp)    vectorized    dim_i(exp)
    10      (1)         10            (1)
    i       (1)         1:n           (1,r_i)
    a(i)    (1)         a(1:n)        (r_i)
    a       (*)         a             (*)
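The table can be reproduced with a small sketch of how a vectorized dimensionality might be derived for index expressions. It is a simplification: every subscript is assumed to be either a bare loop index or a scalar, and the function names are invented for this example.

```python
def subscript_symbol(subscript, loop_vars):
    """A loop index i becomes the range symbol r_i; any other subscript counts as scalar."""
    return "r_" + subscript if subscript in loop_vars else 1


def vectorized_index_dim(subscripts, loop_vars):
    """Vectorized dimensionality of an indexed expression such as a(i) or B(j,i)."""
    return tuple(subscript_symbol(s, loop_vars) for s in subscripts)


def vectorized_loop_var_dim(var):
    """A bare loop variable i is replaced by 1:n, a row vector, hence (1, r_i)."""
    return (1, "r_" + var)


print(vectorized_index_dim(["i"], {"i"}))            # a(i)
print(vectorized_index_dim(["j", "i"], {"i", "j"}))  # B(j,i)
print(vectorized_loop_var_dim("i"))                  # i
```

Subscripts that mix loop indices with arithmetic, such as a(i+1), would need more than this lookup and are left out of the sketch.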

Page 10: Cgo2007 P3 3 Birkbeck

Vectorized Dimensionality

Expressions with incompatible vectorized dimensionality should not be vectorized.

When do dimensionalities agree?

Assignment expressions: e_lhs = e_rhs

    dim_i(e_lhs) ≈ dim_i(e_rhs)  ||  dim_i(e_rhs) ≈ (1)

Element-wise binary operators: e = e_lhs Θ e_rhs, with Θ in {+,-,.*,…}

    dim_i(e_lhs) ≈ (1)  ||  dim_i(e_rhs) ≈ (1)  ||  dim_i(e_lhs) ≈ dim_i(e_rhs)
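The two agreement rules translate almost directly into predicates. The sketch below is one possible Python reading, not the prototype's code. It assumes compatibility means equality after dropping trailing 1's, and all names are hypothetical.

```python
SCALAR = (1,)


def normalize(dim):
    """Drop trailing 1's (assumed to be the redundant ones), keeping one symbol."""
    dim = list(dim)
    while len(dim) > 1 and dim[-1] == 1:
        dim.pop()
    return tuple(dim)


def compatible(dim_a, dim_b):
    return normalize(dim_a) == normalize(dim_b)


def assignment_agrees(lhs_dim, rhs_dim):
    """e_lhs = e_rhs: both sides compatible, or the right-hand side is scalar."""
    return compatible(lhs_dim, rhs_dim) or compatible(rhs_dim, SCALAR)


def elementwise_agrees(lhs_dim, rhs_dim):
    """e_lhs op e_rhs for pointwise op: either side scalar, or both compatible."""
    return (compatible(lhs_dim, SCALAR)
            or compatible(rhs_dim, SCALAR)
            or compatible(lhs_dim, rhs_dim))


# Operands shaped like B(j,i) and C(i,j) inside loops over i and j.
print(elementwise_agrees(("r_j", "r_i"), ("r_i", "r_j")))
```

The final call uses the operand shapes of B(j,i)+C(i,j), the kind of statement these rules refuse to vectorize without a further transformation.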

Page 11: Cgo2007 P3 3 Birkbeck

Vectorized Dimensionality

Rules are very restrictive. Assume dim(A)=dim(B)=dim(C)=(*,*):

    for i=1:100,
      for j=1:100
        A(i,j)=B(j,i)+C(i,j);
      end
    end

    dim_{i,j}(B(j,i)) = (r_j,r_i)
    dim_{i,j}(C(i,j)) = (r_i,r_j)

Vectorization fails because (r_i,r_j) is not compatible with (r_j,r_i).

Page 12: Cgo2007 P3 3 Birkbeck

Transpose Transformation

Extension to utilize transpose when necessary is straightforward.

For assignment: if dim_i(A) ≈ reverse(dim_i(B)) then A=B^T is allowable.

    for i=1:m,
      for j=1:n
        A(i,j)=B(j,i);
      end
    end

    dim_{i,j}(A(i,j)) = reverse(dim_{i,j}(B(j,i))) = (r_i,r_j)

    A(1:m,1:n)=(B(1:n,1:m))'

Page 13: Cgo2007 P3 3 Birkbeck

Transpose Transformation

Extension to utilize transpose when necessary is straightforward.

Similar for pointwise operations:

    if dim_i(A) ≈ reverse(dim_i(B)) then A Θ B^T is allowable; propagate dim_i(A Θ B^T) = dim_i(A)

    if reverse(dim_i(A)) ≈ dim_i(B) then A^T Θ B is allowable; propagate dim_i(A^T Θ B) = dim_i(B)
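A sketch of how the transpose fallback for pointwise operators might be coded is shown here in Python. It tries the direct rule first and then each transpose rule in turn. The return convention (a tag plus the propagated dimensionality) and every name are assumptions made for illustration, as is the trailing-1's reading of compatibility.

```python
def normalize(dim):
    """Drop trailing 1's (assumed to be the redundant ones), keeping one symbol."""
    dim = list(dim)
    while len(dim) > 1 and dim[-1] == 1:
        dim.pop()
    return tuple(dim)


def compatible(dim_a, dim_b):
    return normalize(dim_a) == normalize(dim_b)


def reverse(dim):
    return tuple(reversed(dim))


def pointwise_with_transpose(dim_a, dim_b):
    """Return (action, propagated dim) for A op B, or None when no rule applies."""
    if compatible(dim_a, dim_b):
        return (None, dim_a)               # already agree, no transpose needed
    if compatible(dim_a, reverse(dim_b)):
        return ("transpose rhs", dim_a)    # A op B', result shaped like A
    if compatible(reverse(dim_a), dim_b):
        return ("transpose lhs", dim_b)    # A' op B, result shaped like B
    return None


# Operands shaped like C(i,j) and B(j,i) in a loop nest over i and j.
print(pointwise_with_transpose(("r_i", "r_j"), ("r_j", "r_i")))
```

Because each decision is local to one operator, a long expression can pick up more transposes than strictly necessary.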

Page 14: Cgo2007 P3 3 Birkbeck

Pattern Database

Dimensionality disagreement at binary operators inhibits vectorization.

Recognizing patterns (consisting of operator type and operand dimensionalities) can be used to identify a transformation enabling vectorization.

    for i=1:m,
      for j=1:n,
        A(i,j)=B(i,j)+C(i);
      end
    end

Pattern:

    lhs          operation    rhs        output
    (r_i,r_j)    Θ            (r_i,1)    (r_i,r_j)

Transformed result:

    B(i,j)+C(i);  →  B(1:m,1:n)+repmat(C(1:m),1,n);
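One way to picture such a pattern database is as a lookup table keyed by operand dimensionalities and operator class. The Python sketch below is hypothetical throughout: the table layout, the key format, and the string-rewriting callback are invented here and are not taken from the prototype. It matches literal symbol names, whereas a real database would presumably match symbols up to renaming.

```python
# Hypothetical table: (lhs dim, operator class, rhs dim) -> (output dim, rhs rewrite).
PATTERNS = {
    (("r_i", "r_j"), "pointwise", ("r_i", 1)): (
        ("r_i", "r_j"),
        # Replicate the column operand across the columns of the other operand.
        lambda rhs_text, extents: "repmat(%s,1,%s)" % (rhs_text, extents["r_j"]),
    ),
}


def apply_pattern(lhs_dim, op_class, rhs_dim, rhs_text, extents):
    """Look up a pattern; return (output dim, rewritten rhs text) or None."""
    entry = PATTERNS.get((lhs_dim, op_class, rhs_dim))
    if entry is None:
        return None
    out_dim, rewrite = entry
    return out_dim, rewrite(rhs_text, extents)


print(apply_pattern(("r_i", "r_j"), "pointwise", ("r_i", 1), "C(1:m)", {"r_j": "n"}))
```

Keeping the rewrite as data next to the pattern is what would let an end user add entries without touching the vectorizer itself.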

Page 15: Cgo2007 P3 3 Birkbeck

Pattern Database

Diagonal access pattern:

    for i=1:n,
      a(i)=A(i,i)*b(i);
    end

Pattern:

    lhs          operation    rhs    output
    (r_i,r_i)    (index)      nil    (1,r_i)

Transformed result (column-major indexing of A):

    a(1:n)=A((1:n)+size(A,1)*((1:n)-1)).*b(1:n);
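The linear-index expression works because Matlab stores arrays in column-major order, so element (i,i) of a matrix with R rows sits at 1-based linear position i+R*(i-1). The Python sketch below checks that arithmetic on a made-up 4x3 matrix; the helper names are hypothetical.

```python
def column_major(matrix):
    """Flatten a list-of-rows matrix in column-major order, as Matlab stores it."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def diagonal_linear_indices(n, rows):
    """1-based linear indices (1:n)+rows*((1:n)-1) of the first n diagonal entries."""
    return [i + rows * (i - 1) for i in range(1, n + 1)]


# Hypothetical 4x3 matrix whose entries encode their own (row, column) position.
A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33],
     [41, 42, 43]]

flat = column_major(A)
indices = diagonal_linear_indices(3, len(A))
diagonal = [flat[k - 1] for k in indices]  # convert 1-based to 0-based
print(indices, diagonal)
```

Using size(A,1) rather than n in the Matlab expression is what keeps the formula valid for non-square A.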

Page 16: Cgo2007 P3 3 Birkbeck

Additive Reduction Statements

Additive-reduction statements use a loop variable to perform an accumulation. Not all loop nest index variables appear in the output dimensionality.

General form:

    for i1=…,
      for i2=…,
        …
        for ik=…
          A(J)=A(J)+E;
        …
        end
      end
    end

Loop nest variables I={i1,i2,…,ik}; J is a subset of I.

Example:

    for i=1:m,
      for j=1:n,
        a(i)=a(i)+B(i,j);
      end
    end

    I={i,j}  J={i}

Page 17: Cgo2007 P3 3 Birkbeck

Additive Reduction (Solution)

Maintain/propagate dimensionality and reduced variables for an expression. ρ(E) denotes the reduced variables for expression E.

When checking statement A(J)=A(J)+E, ensure dim_{i1,i2,…,ik}(A(J)) ≈ dim_{i1,i2,…,ik}(E) and ρ(E)=I-J. Any variable r_i in I-J but not in ρ(E) must be reduced.

Example 1:

    for i=1:m
      a=a+b(i);
    end

    I={i}, J={}, I-J={i}, ρ(b(i))={}
    r_i in dim_i(b(i))=(r_i,1)
    Reduce: b(i) → sum(b(i),1);
    Vectorize: a=a+sum(b(1:m));

Example 2:

    for i=1:m
      a=a+10;
    end

    I={i}, J={}, I-J={i}, ρ(10)={}
    r_i not in dim_i(10)
    Reduce: 10 → m*10, ρ(m*10)={r_i}
    Vectorize: a=a+m*10;
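The reduction rule can be read as a small planning step: for each loop variable in I-J that has not yet been reduced, sum along its dimension if its symbol appears in the expression's dimensionality, otherwise scale by the trip count. The Python sketch below follows that reading. The function name, argument layout, and action strings are all invented for illustration.

```python
def plan_reduction(loop_vars, lhs_vars, expr_dim, reduced):
    """List (variable, action) pairs for variables in I-J that are not yet in rho(E).

    loop_vars: ordered loop index names (I)
    lhs_vars:  index names used on the left-hand side (J)
    expr_dim:  vectorized dimensionality of E, e.g. ("r_i", "r_j")
    reduced:   index names already reduced within E (rho(E))
    """
    plan = []
    for var in loop_vars:
        if var in lhs_vars or var in reduced:
            continue
        symbol = "r_" + var
        if symbol in expr_dim:
            # The variable spans a dimension of E: sum along that dimension.
            plan.append((var, "sum over dimension %d" % (expr_dim.index(symbol) + 1)))
        else:
            # E does not vary with the variable: n additions collapse to a product.
            plan.append((var, "multiply by trip count"))
    return plan


print(plan_reduction(["i"], set(), ("r_i", 1), set()))  # shaped like a=a+b(i)
print(plan_reduction(["i"], set(), (1,), set()))        # shaped like a=a+10
```

The two calls correspond to the b(i) and constant cases: the first needs a sum, the second only a multiplication by m.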

Page 18: Cgo2007 P3 3 Birkbeck

Additive Reduction via Matrix Multiplication

Matrix multiplication can be used to perform reductions on e=e_lhs*e_rhs, provided:

1. dim_{i1,…,ik}(e_lhs) = (S_l,r_k)

2. dim_{i1,…,ik}(e_rhs) = (r_k,S_r)

3. r_k is a reduction variable.

Implies:

    dim_{i1,…,ik}(e) = (S_l,S_r)
    ρ(e) = union(ρ(e_lhs), ρ(e_rhs), {r_k})

    for i=1:m
      for j=1:n
        a(i)=a(i)+B(i,j)*x(j);
      end
    end

• j is used for reduction

• dim_{i,j}(B(i,j)) = (r_i,r_j)

• dim_{i,j}(x(j)) = (r_j)

    a(1:m)=a(1:m)+...
        B(1:m,1:n)*x(1:n);
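The three conditions amount to a check on the inner symbols of the two operands. A possible Python rendering follows, as a sketch only: it treats a one-symbol operand such as (r_j) as the column (r_j,1), and the function name and return format are assumptions.

```python
def matmul_reduction(lhs_dim, rhs_dim, lhs_rho, rhs_rho, reduction_symbols):
    """Return (dim, rho) for e_lhs*e_rhs when the shared inner symbol can be reduced.

    Returns None when the operands do not fit the (S_l, r_k) * (r_k, S_r) shape
    or when r_k is not a reduction variable.
    """
    if len(lhs_dim) != 2:
        return None
    # Assumption: a one-symbol operand such as (r_j) is read as the column (r_j, 1).
    rhs = tuple(rhs_dim) + (1,) if len(rhs_dim) == 1 else tuple(rhs_dim)
    if len(rhs) != 2:
        return None
    s_l, r_k = lhs_dim
    if rhs[0] != r_k or r_k not in reduction_symbols:
        return None
    return (s_l, rhs[1]), set(lhs_rho) | set(rhs_rho) | {r_k}


# B(i,j)*x(j) with j as the reduction variable.
print(matmul_reduction(("r_i", "r_j"), ("r_j",), set(), set(), {"r_j"}))
```

The resulting dimensionality (r_i,1) is compatible with (r_i), the shape of a(i), so the accumulation statement passes the dimension check.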

Page 19: Cgo2007 P3 3 Birkbeck

Additive Reduction Example

    for i=1:m,
      for j=1:n,
        d(i)=d(i)+a(i,j)*b(j)+c(i,j)
      end
    end

Term a(i,j)*b(j):

    ρ(a(i,j))={}, dim_{i,j}(a(i,j))=(r_i,r_j)
    ρ(b(j))={}, dim_{i,j}(b(j))=(r_j)
    r_j is a reduction variable
    Use matrix multiplication to reduce r_j:
    ρ(a(i,j)*b(j))={r_j}, dim_{i,j}(a(i,j)*b(j))=(r_i)

Term c(i,j):

    ρ(c(i,j))={}, dim_{i,j}(c(i,j))=(r_i,r_j)
    Need to reduce r_j: c(i,j) → sum(c(i,j),2);

Whole right-hand side:

    ρ(a(i,j)*b(j)+sum(c(i,j),2))={r_j}, dim_{i,j}(a(i,j)*b(j)+sum(c(i,j),2))=(r_i)

Dimensionality and reduced variables agree, now replace index variables:

    d(1:m)=d(1:m)+a(1:m,1:n)*b(1:n)+sum(c(1:m,1:n),2);

Page 20: Cgo2007 P3 3 Birkbeck

Implementation Prototype

Pattern database and corresponding transformations are specified in a modular, end-user-extensible manner.

Flowchart (boxes, in order): Original Loop; Octave Parser; Embedded Control Statements? (yes/no); Create DDG; Vectorizer: Dimension Check, Success? (yes/no), Vectorize Statement; Code Generator; Vectorized Loop.

Page 21: Cgo2007 P3 3 Birkbeck

Results

Source-to-source transformation

Timing results averaged over 100 runs

Platform: Matlab 7.2.0.283, 3.0 GHz Pentium D Processor

Page 22: Cgo2007 P3 3 Birkbeck

Results

Histogram Equalization:

Input source:

    h=hist(im(:),[0:255]); %histogram
    heq=255*cumsum(h(:))/sum(h(:));
    for i=1:size(im,1),
      for j=1:size(im,2),
        im2(i,j)=heq(im(i,j)+1);
      end
    end

Vectorized result:

    h=hist(im(:),[(0:255)]);
    heq=255*cumsum(h(:))/sum(h(:));
    im2(1:size(im,1),1:size(im,2))=...
        heq(im(1:size(im,1),1:size(im,2))+1);

For a monochrome 8-bit 800x600 image (original/vectorized):

    Entire routine:     0.178s/0.114s    (speedup: 1.56)
    Loop portion only:  0.0814s/0.0176s  (speedup: 4.6)

Page 23: Cgo2007 P3 3 Birkbeck

Results (Menon & Pingali Examples)

Example 1, input:

    for k=1:p,
      for j=1:(i-1),
        X(i,k)=X(i,k)-L(i,j)*X(j,k);
      end
    end

Example 1, output:

    X(i,1:p)=X(i,1:p)-L(i,1:i-1)*X(1:i-1,1:p);

Example 2, input:

    for i=1:N,
      for j=1:N
        phi(k)=phi(k)+a(i,j)*x_se(i)*f(j);
      end
    end

Example 2, output:

    phi(k)=phi(k)+sum(a(1:N,1:N)'*x_se(1:N).*f(1:N),1);

Example 3, input:

    for i=1:n,
      for j=1:n,
        for k=1:n,
          for l=1:n
            y(i)=y(i)+x(j)*A(i,k)*B(l,k)*C(l,j);
          end
        end
      end
    end

Example 3, output:

    y(1:n)=y(1:n)+x(1:n)'*...
        (A(1:n,1:n)*B(1:n,1:n)'*C(1:n,1:n))';

    Settings         Input time (s)    Output time (s)    Speedup
    i=500, p=5000    0.536             0.030              17
    N=1000           0.174             0.012              14
    n=40             0.622             0.0001             5000
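The four-deep loop nest and its matrix form can be compared on a small instance. The Python sketch below uses nested lists with a hand-written matmul and made-up 2x2 data. It starts y at zero, whereas the Matlab statement accumulates into an existing y, and it only checks equivalence without measuring any speedup.

```python
def matmul(P, Q):
    """Plain triple-loop matrix product of list-of-rows matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def loop_version(x, A, B, C):
    """y(i) = sum over j,k,l of x(j)*A(i,k)*B(l,k)*C(l,j), starting from y = 0."""
    n = len(x)
    y = [0] * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    y[i] += x[j] * A[i][k] * B[l][k] * C[l][j]
    return y


def vectorized_version(x, A, B, C):
    """x' * (A * B' * C)': a 1xn row times an nxn matrix."""
    M = transpose(matmul(matmul(A, transpose(B)), C))
    return matmul([x], M)[0]


# Hypothetical 2x2 test data.
x = [1, 2]
A = [[1, 2], [3, 4]]
B = [[0, 1], [2, 1]]
C = [[2, 0], [1, 3]]

print(loop_version(x, A, B, C), vectorized_version(x, A, B, C))
```

The loop nest performs on the order of n^4 multiply-adds, while the matrix form needs only a few n^3 products, which is consistent with the largest speedup in the table coming from this example.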

Page 24: Cgo2007 P3 3 Birkbeck

Remaining Issues/Future Work

Each pattern transformation is local; no optimization over entire statement.

    e.g., we do not optimize and distribute transposes

Control flow within loop

Function calls

    functions are treated as pointwise operators (correct for many predefined arithmetic functions)

Incorporate our analysis directly with shape analysis

Page 25: Cgo2007 P3 3 Birkbeck

Summary

Contributions:

    A simple method to prevent incorrect vectorization in Matlab

    A user-extensible operator/dimensionality pattern database can be used to improve vectorization

    These patterns can make use of higher-level semantics (e.g., matrix multiplication) or diagonal accesses in vectorization.

Page 26: Cgo2007 P3 3 Birkbeck

Acknowledgements

Funding provided by NSERC

Grateful for reviewers’ comments and suggestions

Page 27: Cgo2007 P3 3 Birkbeck

Thank You

Questions?