Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

University of Veszprém Department of Image Processing and Neurocomputing Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs Zoltán Nagy, Péter Szolgay

Post on 02-Jan-2016

Page 1: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

University of Veszprém, Department of Image Processing and Neurocomputing

Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Zoltán Nagy, Péter Szolgay

Page 2: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Nagy 2 MAPLD 2005/153

Introduction

• Cellular Neural/Nonlinear Networks Universal Machine (CNN-UM)
• Ocean modeling
• Results
• Conclusions

Page 3: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Cellular Neural/Nonlinear Networks (CNN)

• 2 or N dimensional grid
• Locally connected
• Analog processing elements
• State value is continuous in time

Page 4: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Structure of a CNN cell

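• u_ij input
• x_ij state
• y_ij output
• z_ij constant bias
• A_ij,kl feedback template
• B_ij,kl feed-forward template

$$C_x \dot{x}_{ij}(t) = -\frac{1}{R_x} x_{ij}(t) + \sum_{kl \in S_r(ij)} A_{ij,kl}\, y_{kl}(t) + \sum_{kl \in S_r(ij)} B_{ij,kl}\, u_{kl}(t) + z_{ij}$$

[Figure: circuit diagram of a CNN cell with input u_ij, state x_ij, output y_ij, bias z_ij, capacitor C_x, resistors R_x and R_y, and controlled current sources I_xu(ij,kl), I_xy(ij,kl) and I_yx]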
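A minimal Python sketch of how such cell dynamics can be emulated in software. It assumes forward-Euler time stepping, 3x3 templates, zero-valued boundary cells and the standard piecewise-linear CNN output y = 0.5(|x+1| - |x-1|); none of these choices is stated on the slide, and the function names and the step size `h` are illustrative only.

```python
import numpy as np

def cnn_output(x):
    # Piecewise-linear output function (assumption, not given on the slide).
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def apply_template(field, template):
    # Weighted sum over the 3x3 neighbourhood of every cell, zero boundary.
    rows, cols = field.shape
    padded = np.pad(field, 1, mode="constant")
    out = np.zeros((rows, cols), dtype=float)
    for k in range(3):
        for l in range(3):
            out += template[k, l] * padded[k:k + rows, l:l + cols]
    return out

def cnn_euler_step(x, u, A, B, z, h=0.1, C=1.0, R=1.0):
    # One forward-Euler step of C*dx/dt = -x/R + A*y + B*u + z.
    y = cnn_output(x)
    dxdt = (-x / R + apply_template(y, A) + apply_template(u, B) + z) / C
    return x + h * dxdt

if __name__ == "__main__":
    state = np.zeros((4, 4))
    no_coupling = np.zeros((3, 3))
    print(cnn_euler_step(state, state, no_coupling, no_coupling, z=1.0))
```

With all templates set to zero and z = 1, a single step with h = 0.1 moves every state from 0 to 0.1, which is a quick sanity check of the update rule.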

Page 5: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

CNN-UM implementations

• Software simulation
  - Easy to implement
  - Slow, even when using processor-specific instructions
• Emulated digital VLSI
  - Specialized digital architecture
  - Selectable computing precision (Castle architecture: 1, 6, 12 bit)
  - Orders of magnitude faster than software simulation
  - Long design time
• Analog VLSI
  - Huge computing power (~TeraOP/s)
  - Low accuracy (7-8 bit)
  - Noise and temperature sensitivity

Page 6: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Structure of the Falcon emulated digital CNN-UM

• Mixer
  - Contains cell values for the next updates
• Memory unit
  - Contains a belt of the cell array
• Template memory
• Arithmetic unit
• Processors can be connected on a grid
  - Linear speedup

[Figure: block diagram of one core processor (memory unit, mixer unit, template memory, arithmetic unit) with ports StateIn, ConstIn, TmpselIn, StateOut, ConstOut, TmpselOut, LeftIn, LeftInNew, LeftOut, RightIn, RightOut, RightOutNew]

[Figure: array of core processors connected by input lines, output lines and control lines]

Page 7: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Structure of the arithmetic unit

• Cell update in row wise order

• Cycle time depends on template size

• Fully pipelined

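[Figure: block diagram of the arithmetic unit: three multipliers (Mult) fed by state inputs S1, S2, S3 and template inputs T1, T2, T3, followed by adders, shift registers, an accumulator (ACC) and registers Reg1 to Reg4, with additional inputs g_ij and x_ij]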
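The dependence of cycle time on template size can be illustrated with a small software model. This is a hypothetical sketch, not the Falcon hardware description: it assumes the three multipliers process one template row per clock cycle and that the products are summed into an accumulator, so a 3x3 template costs three cycles per cell update. The function name and the `n_mult` parameter are invented for illustration.

```python
def update_cell(window, template, n_mult=3):
    """Multiply-accumulate a template over a cell neighbourhood.

    Up to n_mult products are formed per modelled clock cycle
    (assumption: one template row per cycle for a 3x3 template).
    Returns the accumulated sum and the number of cycles used.
    """
    acc = 0.0
    cycles = 0
    for t_row, s_row in zip(template, window):
        for start in range(0, len(t_row), n_mult):
            t_part = t_row[start:start + n_mult]
            s_part = s_row[start:start + n_mult]
            acc += sum(t * s for t, s in zip(t_part, s_part))
            cycles += 1
    return acc, cycles

if __name__ == "__main__":
    window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    template = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(update_cell(window, template))
```

In this model a larger template simply needs more passes through the multipliers, which is one plausible reading of "cycle time depends on template size".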

Page 8: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Configurable parameters

• State, template and constant width from 2 to 64 bits
• Number of templates
• Size of the templates
• Width of the cell array slice
• Number of layers
• Number and arrangement of the processor cores
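To make the effect of a configurable bit width concrete, here is a minimal Python sketch of signed fixed-point quantization. The two's-complement layout, the saturation on overflow and the split into `total_bits` and `frac_bits` are assumptions chosen for illustration; the slides do not describe the number format in this detail.

```python
def to_fixed(value, total_bits, frac_bits):
    """Quantize a real value to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    quantized = int(round(value * scale))
    return max(lowest, min(highest, quantized))

def from_fixed(quantized, frac_bits):
    """Convert the fixed-point integer back to a real value."""
    return quantized / (1 << frac_bits)

if __name__ == "__main__":
    for bits in (10, 18, 24):
        frac = bits - 2  # hypothetical split: sign bit, one integer bit, rest fraction
        q = to_fixed(0.3, bits, frac)
        print(bits, abs(from_fixed(q, frac) - 0.3))
```

Each additional fractional bit halves the worst-case rounding error, which is why error curves plotted against precision are usually drawn on a logarithmic axis.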

Page 9: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

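Example: Solution of a simple PDE on CNN

• The wave equation

$$\frac{d^2 u}{dt^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \quad 0 < x < l,\ t > 0$$

$$u(0,t) = 0, \quad u(l,t) = 0, \quad t > 0$$

$$u(x,0) = f(x), \quad \frac{du}{dt}(x,0) = g(x), \quad 0 < x < l$$

• Spatial discretization
• 2 layer CNN

$$A_{21} = \frac{c^2}{\Delta x^2} \begin{bmatrix} 1 & -2 & 1 \end{bmatrix}, \quad A_{12} = 1$$

[Figure: two-layer CNN with Layer d and Layer v coupled by the templates A12 and A21]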
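The two-layer formulation can be tried out with a short Python sketch. It assumes that layer d holds the displacement and layer v its time derivative, coupled as dd/dt = v and dv/dt = (c^2/Δx^2)(d[i-1] - 2 d[i] + d[i+1]), with fixed zero boundaries. The semi-implicit (symplectic) Euler update is a choice made here for numerical stability and is not taken from the slides; all names are illustrative.

```python
import numpy as np

def wave_two_layer(f, g, c, length, n_cells, dt, n_steps):
    """Integrate the 1D wave equation as two coupled layers d and v."""
    x = np.linspace(0.0, length, n_cells)
    dx = x[1] - x[0]
    d = np.asarray(f(x), dtype=float).copy()
    v = np.asarray(g(x), dtype=float).copy()
    d[0] = d[-1] = 0.0   # fixed ends
    v[0] = v[-1] = 0.0
    coeff = c ** 2 / dx ** 2
    for _ in range(n_steps):
        # Layer v is driven by the [1 -2 1] template applied to layer d.
        laplace = d[:-2] - 2.0 * d[1:-1] + d[2:]
        v[1:-1] += dt * coeff * laplace
        # Layer d is driven by layer v through a single-element template.
        d[1:-1] += dt * v[1:-1]
    return x, d

if __name__ == "__main__":
    x, d = wave_two_layer(lambda s: np.sin(np.pi * s),
                          lambda s: np.zeros_like(s),
                          c=1.0, length=1.0, n_cells=101, dt=0.001, n_steps=1000)
    # Half a period of the standing wave: the shape should be inverted.
    print(np.max(np.abs(d + np.sin(np.pi * x))))
```

For the initial shape sin(πx) with c = 1 and l = 1 the exact solution is sin(πx)cos(πt), so after t = 1 the displacement should be close to the negative of the initial shape.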

Page 10: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Ocean models

• Barotropic model
• Baroclinic models
  - z-coordinate model
  - σ-coordinate model
  - isopycnal
• Fine resolution models
  - Real-time forecast
  - Fishing industry
  - Search and rescue
• Coarse resolution models
  - Long term predictions
  - Climate modeling

Page 11: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

The Princeton Ocean Model (POM)

• Sigma coordinate model
  - Vertical coordinate is scaled on the water column depth
• Second moment turbulence closure sub-model
  - Provides vertical mixing coefficients
• Solution technique: Mode splitting
  - Internal mode (3D)
    o Vertical structure equations
    o Implicit solution
  - External mode (2D)
    o Vertically integrated equations
    o Explicit solution (Leapfrog method)
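As a reminder of what the leapfrog method does, here is a generic Python sketch for du/dt = f(u). It is not code from POM: the first step is started with a single forward-Euler step (an assumption made here), and the harmonic oscillator used as the test problem is purely illustrative.

```python
import numpy as np

def leapfrog(f, u0, dt, n_steps):
    """Advance du/dt = f(u) with u[n+1] = u[n-1] + 2*dt*f(u[n])."""
    u_prev = np.asarray(u0, dtype=float)
    u_curr = u_prev + dt * f(u_prev)      # Euler start-up step
    for _ in range(n_steps - 1):
        u_next = u_prev + 2.0 * dt * f(u_curr)
        u_prev, u_curr = u_curr, u_next
    return u_curr

if __name__ == "__main__":
    # Harmonic oscillator u' = v, v' = -u, exact solution (cos t, -sin t).
    oscillator = lambda s: np.array([s[1], -s[0]])
    print(leapfrog(oscillator, [1.0, 0.0], dt=0.001, n_steps=1000))
```

The scheme needs two previous time levels but only one evaluation of f per step, which keeps an explicit update simple.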

Page 12: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Governing equations of the external (2D) mode

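• u_x, u_y mass transport
• η free surface elevation
• Ω angular rotation of the Earth
• Θ latitude
• H depth of the ocean
• g gravitational acceleration
• τ_w, τ_b wind and bottom stress
• A lateral viscosity

$$\frac{du_x}{dt} = 2\Omega \sin\Theta\, u_y - gH \frac{\partial \eta}{\partial x} + \tau_{wx} - \tau_{bx} + A \nabla^2 u_x - \frac{u_x}{H} \frac{\partial u_x}{\partial x} - \frac{u_y}{H} \frac{\partial u_x}{\partial y}$$

$$\frac{du_y}{dt} = -2\Omega \sin\Theta\, u_x - gH \frac{\partial \eta}{\partial y} + \tau_{wy} - \tau_{by} + A \nabla^2 u_y - \frac{u_x}{H} \frac{\partial u_y}{\partial x} - \frac{u_y}{H} \frac{\partial u_y}{\partial y}$$

$$\frac{d\eta}{dt} = -\frac{\partial u_x}{\partial x} - \frac{\partial u_y}{\partial y}$$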

Page 13: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Solution on CNN

• Spatial discretization on a uniform grid
• 3-layer CNN structure
• Non-linear template required for advection term

$$u_x \frac{\partial u_x}{\partial x} \;\rightarrow\; A_{x,ij} = \frac{u_{x,ij}}{2\Delta x} \begin{bmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

• Cannot be solved on analog VLSI CNN chips
• Solvable on the modified Falcon architecture
  - Support of non-linearity
  - Specialized cell model
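The non-linearity of the advection term is easiest to see in code: the central-difference weights are scaled by the value of the cell itself, so the effective template differs from cell to cell. The Python sketch below assumes that the second array axis is the x direction and leaves boundary cells at zero; both are illustrative choices, as is the function name.

```python
import numpy as np

def advection_x(u, dx):
    """Approximate u * du/dx with central differences on interior columns.

    Assumption: axis 1 of the array is the x direction; boundary
    columns are left at zero.
    """
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    # Linear part: (u[i, j+1] - u[i, j-1]) / (2*dx), the [-1 0 1] template.
    gradient = (u[:, 2:] - u[:, :-2]) / (2.0 * dx)
    # Non-linear part: the result is scaled by the cell's own value.
    out[:, 1:-1] = u[:, 1:-1] * gradient
    return out

if __name__ == "__main__":
    dx = 0.5
    x = np.arange(6) * dx
    field = np.tile(x, (3, 1))     # u = x, so u * du/dx = x
    print(advection_x(field, dx))
```

Because the weight depends on the state, a fixed linear template cannot express this term, which matches the statement that a non-linear template is required.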

Page 14: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

The modified arithmetic unit of the Falcon architecture

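[Figure: datapath of the modified arithmetic unit, built from multipliers (*), adders (+) and subtractors (-). Labelled inputs: f_ij, u_y,ij, recH_ij, the neighbour pair (i-1,j) and (i+1,j), w_x,ij, A_ij, gH_ij, and u_x at cell (i,j) and at its neighbours (i-1,j), (i+1,j), (i,j-1), (i,j+1)]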

Page 15: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Implementation on FPGA

• Complicated arithmetic unit
• Fixed-point number representation
• Configurable precision
• High level hardware description language required (e.g. Handel-C)

[Figure: Area requirements. Mult18x18 and 18kbit BRAM usage (vertical axis, 0 to 70) versus precision in bits (horizontal axis, 10 to 36)]

Page 16: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Performance

[Figure: Speedup compared to an Athlon64 2GHz (logarithmic vertical axis, 1 to 10000) versus precision in bits (10 to 34), for the XC2V1000, XC2V6000 and XC4VSX55 devices]

[Figure: Number of processors (vertical axis, 0 to 30) versus precision in bits (10 to 36), for the XC2V1000, XC2V6000 and XC4VSX55 devices]

Page 17: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

The Seamount problem

Page 18: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Results after 72 hours

[Figure: two panels over a 2000 km by 2000 km domain, axes X (km) and Y (km). Left: Circulation pattern (colour scale 0.2 to 1.6). Right: Elevation (colour scale -1.5 to 1.5, x 10^-3)]

Page 19: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Error of the solution

[Figure: Error of the mass transport u_y versus precision in bits (10 to 34), logarithmic error axis from 1.0E-05 to 1.0E+00, curves Case1 to Case6]

[Figure: Error of the mass transport u_x versus precision in bits (10 to 34), logarithmic error axis from 1.0E-05 to 1.0E+00, curves Case1 to Case6]

Page 20: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Error of the solution

[Figure: Error of the elevation versus precision in bits (10 to 34), logarithmic error axis from 1.0E-07 to 1.0E-02, curves Case1 to Case6]

Page 21: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Memory requirements of the internal (3D) equations

• Extended memory hierarchy
  - New level stores 3 cross sectional slices from the 3D array
    o Large memory required (e.g. 512x512x64 sized grid, 3x512x64 elements per state variable)
    o Cannot be stored on-chip
    o Off-chip storage requires huge I/O bandwidth
• Processor array should be used
  - The 3D array is divided between the processors
  - Optimal data set for on chip storage: 2048 elements per cross sectional slice (512x32x64 sized grid per processor)
  - Each processor located on a separate FPGA
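The figures quoted on this slide can be checked with a few lines of Python. The sketch assumes that a cross sectional slice spans the second and third grid dimensions and that the grid is split along the second dimension only; the helper names are invented for illustration.

```python
def slice_buffer_elements(ny, nz, n_slices=3):
    """Elements buffered per state variable when n_slices cross sections are kept."""
    return n_slices * ny * nz

def processors_needed(ny, ny_per_processor):
    """Number of processors when the second grid dimension is split evenly (rounded up)."""
    return -(-ny // ny_per_processor)

if __name__ == "__main__":
    # Full 512x512x64 grid: three 512x64 slices per state variable.
    print(slice_buffer_elements(512, 64))        # 3 * 512 * 64
    # 512x32x64 sub-grid per processor: one slice is 32x64 elements.
    print(32 * 64)
    print(processors_needed(512, 32))
```

Under these assumptions the full grid needs 98304 buffered elements per state variable, while a 32x64 slice is exactly the 2048 elements named as the on-chip optimum, which implies 16 processors for the example grid.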

Page 22: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Solution of the internal (3D) equations

• Implicit solution
  - Fixed-point solution
    o Requires large precision to avoid rounding errors
    o Seems to be impractical
  - Floating-point solution
    o Requires large area (especially add/sub)
• Explicit solution
  - Smaller timestep
  - Simpler arithmetic unit

Page 23: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs

Conclusions

• Ocean modeling using emulated digital CNN is very promising
• Moderate precision is required in 2D mode
  - 1% accuracy using 24 bits
• Expected speedup (compared to an Athlon64 2GHz microprocessor)
  - 80 times on our RC200 prototyping board
  - 3700 times on the largest available FPGA