aparallel out-of-core algorithm for time-domain adaptive integral...

84
A Parallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Method Guneet KAUR and Ali E. YILMAZ Department of Electrical & Computer Engineering University of Texas at Austin IEEE International Conference on Computational Electromagnetics (ICCEM) Hong Kong, February 2-5, 2015

Upload: others

Post on 23-Apr-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

A Parallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Method

Guneet KAUR and Ali E. YILMAZ

Department of Electrical & Computer EngineeringUniversity of Texas at Austin

IEEE International Conference on Computational Electromagnetics (ICCEM)Hong Kong, February 2-5, 2015

Page 2: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Motivation- Integral Equation Methods (Historical Perspective)- TD-AIM vs. FD-AIM- Memory Hierarchy- Out of Core Algorithms

Page 3: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

(High-Frequency) IE Methods: Historical Progress since RWG

*

* http://web.corral.tacc.utexas.edu/BioEM-Benchmarks/historicalProgress/index.html

[1] S. M. Rao, D. R. Wilton, A. W. Glisson, IEEE TAP, May 1982.[2] A. Taflove, K. Umashankar, IEEE TEMC, Nov. 1983.[3] D. H. Schaubert, D. R. Wilton, A. W. Glisson, IEEE TAP, Jan. 1984.[4] M. F. Catedra, J. G. Cuevas, L. Nuno, IEEE TAP, Dec. 1988.[5] M. F. Catedra, E. Gago, L. Nuno, IEEE TAP, May 1989.[6] T. Cwik, J. Patterson, D. Scott, Proc. Supercomp., Nov. 1992.[7] H. Gan, W. C. Chew, J EM Waves Appl., 1995.[8] J. M. Song and W. C. Chew, MOTL, Sep. 1995.[9] E. Bleszynski et al., Radio Sci., Sep.-Oct. 1996.[10] C. F. Wang and J. M. Jin, IEEE TMTT, May 1998.[11] J. Song et al., IEEE TAP, June 1998.[12] S. Velamparambil, W. C. Chew, J. Song, IEEE AP Mag., Apr. 2003.[13] A. Rubinstein, et al., IEEE TEMC, May 2003.[14] Ö. Ergül, L. Gürel, IEEE TAP, Aug. 2008.[15] J. M. Taboada et al., IEEE AP Mag., Dec. 2009.[16] A. Heldring et al., IEEE TAP, Jan. 2013.

TDIE papers lagging:(1) Late-time instability: implicit solvers, better temporal basis functions, well-conditioned IE formulations, exact/semi-analytical integration techniques, ….(2) High computation time: fast algorithms(3) High(er) memory requirement(4) …

[17] F. Wei, A. E. Yılmaz, IEEE TAP, Feb. 2014.[18] C.L. Bennett , H. Mieras, Radio Sci., Nov.-Dec.1981.[19] S. M. Rao, D. R. Wilton, IEEE TAP, Jan. 1991.[20] D.A. Vechinski and S. M. Rao, IEEE TAP, June 1992.[21] M. J. Bluck, S. P. Walker, IEEE TAP, May 1997.[22] S. J. Dodson, S. P. Walker, M. J. Bluck, IEEE AP Mag., Aug. 1998.[23] B. Shanker, et al., IEEE TAP, Apr. 2000.[24] J. L. Hu, C. H. Chan, and Y. Xu, MOTL, May 2000.[25] A. E. Yılmaz et al., IEEE TAP, July 2002.[26] B. Shanker,et al., IEEE TAP, Mar. 2003.[27] A. E. Yılmaz, J.M. Jin, E. Michielssen, IEEE TAP, Oct. 2004.[28] H. Bağcı et al., IEEE TEMC, May 2007.[29] H. Bağcı et al., IEEE TEMC, Feb 2010.[30] G. Kaur and A.E. Yılmaz, Proc. ACES., Apr. 2012.[31] G. Kaur and A.E. Yılmaz, submitted IEEE TAP, Sep. 2014.[32] G. Kaur and A.E. Yılmaz, ICCEM, Feb. 2015.

This talk

Page 4: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

TD-AIM vs. FD-AIM

102

103

104

105

106

10710

0

101

102

103

104

105

106

107

Ns

Mat

rix-F

ill Ti

me

(s)

O(Ns)

102

103

104

105

106

10710

-3

10-2

10-1

100

101

102

103

104

105

Ns

Mar

chin

g/So

lutio

n Ti

me

per t

ime/

freq.

ste

p(s)

O(Ns3/2logN

s)

O(Ns5/4log2N

s)

O(Ns3/2log2N

s)

O(Nslog2N

s)

102

103

104

105

106

10710

-3

10-2

10-1

100

101

102

103

Ns

Mem

ory

(GB)

O(Ns)

O(Ns2)

O(Ns3/2)

FD‐AIM Plate

TD‐AIM Plate

FD‐AIM Sphere

TD‐AIM Sphere

Number of frequency samples bandwidth of fieldsNumber of time samples maximum frequency content of the fields

Envelope-tracking methods[1-2] requires time samples bandwidth, and stores smaller time history than traditional time-domain methods

[1] A. Mohan and D. S. Weile, IEEE TAP, 2005.[2] G. Kaur and A. E. Yılmaz, IEEE TAP, to appear in 2015.

Page 5: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Memory Hierarchy

Registers

Processor

Control Unit

On-chip Cache

Main Memory

Local Secondary Storage

Remote Secondary StorageIncr

easi

ng c

apac

ity a

nd a

cces

s tim

esD

ecre

asin

g co

sts

and

frequ

ency

of a

cces

s Access Time

300 ps

1-10 ns

10-20 ns

50-100 ns(~20-75 ns)

5-10 ms

?

Off-Chip Cache

http://www.edn.com/design/systems-design/4397051/Memory-Hierarchy-Design-part-1John Hennessy and David Patterson, Computer Architecture, Fifth Edition: A Quantitative Approach, Morgan Kaufmann, San Francisco, CA, 1996.

TransferTime/byte

1-10 ps

25-50 ps

50-100 ps

50-200 ps(~20 ps)

0.2-2 ns

?(~2 ns)

Size

1000 bytes

64-256 kB(L1: 32 kBL2: 256 kB)

2-4 MB(L3: 20 MB)

4-16 GB(32 GB

2GB/core)

4-16 TB(80-430 GB)

O(PB)(14 PB)

* Stampede@TACC#7 supercomputer in

top500 list, Nov. 2014

Page 6: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Memory Hierarchy

F/A-18 HornetTop speed: 1915 km/h

Sustained speed:

~1.66 mm/s

http://www.scase.co.uk/snailracing/

Inspiration for this analogy:http://blog.scoutapp.com/articles/2011/02/10/understanding-disk-i-o-when-should-you-be-worried

Access time:50 ns vs. 5 msTransfer time:20 ps vs. 0.2-2 ns

53.2 10´

Wells (2014 world snail racing champ)

McLaren 12c (fasteststock car 0-200 kph)

http://www.dragtimes.ru/en/blogs/view/2659 29-

Highway/top speed:

65 mph/210 mph

( ~access time ratio)

(~transfer time ratio)

Page 7: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

102 103 104 10510-5

10-4

10-3

10-2

10-1

100

101

102

103

Number of Unknowns

Tim

e pe

r MV

M (s

)

Out of CoreIn Core

O(Ns2)

• Use disk space in addition to core memory+ Lots of space: (or more)- I/O to disk much slower than memory

To read/write N bytes:Access time (latency): Transfer time (bandwidth):

+ Possible to amortize I/O costsLatency: Read/write large chunks of data (simple)Bandwidth: Many FLOPs per byte read/written (not so simple)Example: Dense-matrix-vector product (double-precision) on Stampede

• Goal: Reduce memory requirement without (significantly) increasing run time

Out of Core Algorithms

rw lat bw5

lat lat1 2

bw bw

10

10

IO IO

IO core

IO core

Nt t Nt

t t

t t-

= +

= ´

= ´

Tim

e

I/O volume (bytes)

latIOt

bwSlope: IOt

3 610 - ´

x 50- Store 100 kBin memory

- Read data in100 kB chunks & multiply

102 103 104 10510-4

10-3

10-2

10-1

100

101

102

Number of Unknowns

Mem

ory

(GB

)

Out of CoreIn Core

O(Ns2)

Page 8: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Background- Time Domain Integral Equations Basics- Method of Moments- Time Marching- Alternative Ways to Calculate Right-Hand Side

Page 9: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

- Frequency interval:

- Time interval:

1

( , )( , )( , , )

( , ) ( , )t S

ttg t ds

t t

mf e-

ì üì ü ¢ï ïï ï ï ïï ï ¢ ¢= *í ý í ý¢ ¢ï ï ï ï¶ - ⋅ï ï ï ïî þ î þòò

J rA rr r

r J r

sca 2

sca 1

( , ) ( , ) ( , )

( , ) ( , )t t t

t t

t t t

t t

f

m-

ì ü ì üï ï ï ï¶ -¶ -¶ï ï ï ïï ï ï ï=í ý í ýï ï ï ï¶ ´¶ï ï ï ïï ï ï ïî þ î þ

E r A r r

H r A r

inc inc,E HS

min maxf f f

( , )tJ r

• EFIE, MFIE, CFIE2 inc

1 inc

10

ˆ ˆ ˆ ˆ( ( , ) ( , )) ( , ) (TD-EFIE)

ˆ ˆ( , ) ( , ) ( , ) (TD-MFIE)

TD-CFIE TD-EFIE (1 )TD-MFIE

t t t

t t t

n n t t n n t

t n t n t

f

m

h a a

-

-

- ´ ´ ¶ +¶ =- ´ ´¶

¶ - ´´¶ = ´¶

= + -

A r r E r

J r A r H r

TDIE Basics

,e m

PEC

( )n r

sca inc

sca inc

( , ) ( , ) ˆ ˆ ˆ ˆ

( , ) ( , ) ( , )ˆ ˆt t

t t t

n n t n n t S

t n t n t

- ´ ´¶ = ´ ´¶ " Î

¶ - ´¶ = ´¶

E r E r r

J r H r H r

S,e m ,e m

inc inc,E H

( )/( , , )

4

t cg t

d

p

¢- -¢ =

¢-

r rr r

r r

max0 t T

Page 10: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Numerical Solution

- Discretize geometry, expand unknowns

• Method of moments

S T

,1 1

( , ) ( ) ( )N N

k l kk l

t I T t l t¢ ¢ ¢¢ ¢= =

¢» - DååJ r S r

Rao, Wilton, Glisson, IEEE Trans. AP, May 1982.

Must resolve fastest variations:

- Testing

( ) ( ) TD-CFIEk

S

dt ds t l td¥

- D ⋅ò òò S r

Tfor 1,...,l N=

2max

min S 2

max T max max

/ 10,

1/ 10 ,

(HF)A

fs N S

c

t f N T f

lD

D

- System of equations

incT

1

for 1,...,l

l l l ll

l N¢ ¢-¢=

= =åZ I V G. Manara et al., IEEE Trans. AP, Mar. 1997.D. S. Weile et al., IEEE Trans. AP, Jan. 2004.G. Kaur and A. E. Yilmaz, MOTL, June 2011.

Page 11: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Marching on in Time

g g g

g g g

g g T

i10 1

1 0 2

2 1 0 3

3 2 1 0 4

1 2 1 0

1 2 1 0 1

1 2 1 0

0

0

0 0

N N N

N N N

N N N

-

- +

-

é ù é ùê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê ú

=ê ú ê úê ú ê ú⋅ ⋅ ⋅ê ú ê úê ú ê ú

⋅ ⋅ ⋅ê ú ê úê ú ê úê ú ê úê ú ê úê ú ê ú⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ê úê ú ë ûë û

VZ IZ Z IZ Z Z IZ Z Z Z I

Z Z Z Z Z I

Z Z Z Z Z I

Z Z Z Z Z I

g

g

TT S

nc

inc2inc3inc4

inc

inc1

inc

1

N

N

N N N

+

´

é ùê úê úê úê úê úê úê úê úê úê úê úê úê úê úê úê úê úê úë û

V

V

V

V

V

V

- Due to causality, time invariance, uniform time-step size,(discretely) causal sub-domain space-time basis functions

- Frequency domain: 11 1 1

2 2 2 2

F F F FF S

inc,

inc,

inc,

1

0 0

0

0 N N N N

ff f f

f f f f

f f f fN N ´

é ùé ù é ùê úê ú ê úê úê ú ê úê úê ú ê ú = ê úê ú ê úê úê ú ê úê úê ú ê úê úê ú ê ú ê úë û ë û ë û

VZ I

Z I V

Z I V

Matrix properties:

- Lower triangular, Toeplitz, sparse blocks

- Non-zero entries:

- Unique entries:

- Unique blocks:

2T S

( )N NQ

2S

( )NQ

- Diagonal, unique,dense blocks

- Non-zero/uniqueentries:

2F SN N

g1N +

Page 12: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Marching on in Time

g gg g

g g

T T

inc10 1inc2 10 2inc

2 130 3inc

3 2 10 4 4

inc1 2 10

inc0 1 1

inc0

0

0

0 0

0

0

N NN N

N N

N N

-

+ +

é ùé ù ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê ú= -ê ú ê úê ú ⋅ ⋅ ⋅ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê úë û ë û

VZ IV ZZ I

Z ZVZ IZ Z ZZ I V

Z Z Z ZZ I V

Z I V

Z I V

g

g g g

g g T

1

2

3

4

1 2 1 1

1 2 1

0 0

0 0 0

N

N N N

N N N

- +

-

é ù é ùê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê úê ú ê ú

⋅ ⋅ ⋅ê ú ê úê ú ê úê ú ê úê ú ê úê ú ê ú⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ê úê ú ë ûë û

IIII

I

Z Z Z Z I

Z Z Z Z I

• Computational Costs

Storage: +0 S

( )N NQ

matrix Current/Field vectors

g S( )N NQ

0Z

g1 N-Z Z matrices

2S

( )NQ +

Iterative solution Right-hand sideFLOPs: +

T I 0 S( )O N N N N 2

T S( )O N N

Matrix fill2S

( )O N +

Page 13: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Marching on in Time: High-Frequency- HF: and unknowns distributed such that =>

, , c t

c t

c t

c t sD D 0 SN N

ttD 2 tDg

( 1)N t+ D

• Computational Costs

* Graph describes radiation by a point source that is a pulse of width turned on at time 0

tD

Storage:

matrix Current/Field vectors0

Zg1 N

-Z Z matrices

+

Iterative solution Right-hand sideFLOPs:

T I 0 S( )O N N N N 2

T S( )O N N

Matrix fill2S

( )O N +

1/2g S

( )N O N=

0 S( )N NQ

g S( )N NQ2

S( )NQ

Page 14: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Scattered-Field (Right-Hand-Side) Computation

Storage:

matrix Current/Field vectors0

Zg1 N

-Z Z matrices

+

Iterative solution Right-hand sideFLOPs:

T I 0 S( )O N N N N 2

T S( )O N N

Matrix fill2S

( )O N +

• Computational Costs

g

T

1

2

3

4

5

6

7

8

N

N

IIIIIIII

I

I

g

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

0 0

0

0

0

0 0

0

0

0

0

0

0

0

0 0

0

0

0 0

N

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

g 8 7 6 5 4 3 2 1

0NZ Z Z Z Z Z Z Z Z

g

T

sca1sca2sca3sca4sca5sca6sca7sca8

sca

sca

N

N

V

V

V

V

V

V

V

V

V

V

=

0 S( )N NQ

g S( )N NQ2

S( )NQ

Page 15: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

g g 1 T1 2 3 4 5 6 7 8 N N N+

g

T

1

2

3

4

5

6

7

8

N

N

IIIIIIII

I

I

g

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

0 0

0

0

0

0 0

0

0

0

0

0

0

0

0 0

0

0

0 0

N

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

g 8 7 6 5 4 3 2 1

0NZ Z Z Z Z Z Z Z Z

g

T

sca1sca2sca3sca4sca5sca6sca7sca8

sca

sca

N

N

V

V

V

V

V

V

V

V

V

V

=

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

- Must access all of the stored data at every time step g

l N³

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+

l

MemoryAccessed

S. P. Walker, IEEE AP Mag., Oct. 1997. g S( )N NQ2

S( )NQ

2S g S

( ) ( )N N NQ +Q

Page 16: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

g

T

1

2

3

4

5

6

7

8

N

N

IIIIIIII

I

I

g

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

0 0

0

0

0

0 0

0

0

0

0

0

0

0

0 0

0

0

0 0

N

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

g 8 7 6 5 4 3 2 1

0NZ Z Z Z Z Z Z Z Z

g

T

sca1sca2sca3sca4sca5sca6sca7sca8

sca

sca

N

N

V

V

V

V

V

V

V

V

V

V

=

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+

1l

MemoryAccessed

S. P. Walker, IEEE AP Mag., Oct. 1997. g S( )N NQ2

S( )NQ

2S g S

( ) ( )N N NQ +Q

Page 17: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

g

T

1

2

3

4

5

6

7

8

N

N

IIIIIIII

I

I

g

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

7 6 5 4 3 2 1

0 0

0

0

0

0 0

0

0

0

0

0

0

0

0 0

0

0

0 0

N

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z

g 8 7 6 5 4 3 2 1

0NZ Z Z Z Z Z Z Z Z

g

T

sca1sca2sca3sca4sca5sca6sca7sca8

sca

sca

N

N

V

V

V

V

V

V

V

V

V

V

=

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

- Must access all of the stored data at every time step T g

l N N£ -

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+

l

MemoryAccessed

T g1 2 3 -N N

S. P. Walker, IEEE AP Mag., Oct. 1997. g S( )N NQ2

S( )NQ

2S g S

( ) ( )N N NQ +Q

Page 18: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1l

MemoryAccessed

eventually

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

2S g S

( ) ( )N N NQ +Q

Page 19: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 20: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 21: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 22: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4 5l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 23: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4 5 6l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 24: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4 5 6 7l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 25: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4 5 6 7 8l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 26: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4 5 6 7 8 9l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 27: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4 5 6 7 8 9 10l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 28: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

=

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

1 2 3 4 5 6 7 8 9 10 11l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 29: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

=1 2 3 4 5 6 7 8 9 10 11 12

l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 30: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

=1 2 3 4 5 6 7 8 9 10 11 12 13

l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 31: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

=1 2 3 4 5 6 7 8 9 10 11 12 13 14

l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+g S

( )N NQ2S

( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

Page 32: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

• Row multiplication: Store current history

• Column multiplication: Store future fields

• Multilevel block multiplication: Store both

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

IIIIIIIIIIIIIIII

sca1sca2sca3sca4sca5sca6sca7sca8sca9sca10sca11sca12sca13sca14sca15sca16

VVVVVVVVVVVVVVVV

=

Storage needed:

Current/Field vectorsg1 N

-Z Z matrices

+

l

MemoryAccessed

Scattered-Field (Right-Hand-Side) Computation

+ Only access all of the stored data once every time steps

g/ 2N

g S( )N NQ2

S( )NQ

eventually 2S g S

( ) ( )N N NQ +Q

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4

0

0

0

0

0

0

0

0

0

0

0

0

0

Z

Z Z

Z Z Z

Z Z Z Z

Z Z Z Z Z

Z Z Z Z Z Z

Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z

Page 33: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Time Domain Adaptive Integral Method

Page 34: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

TD-AIM

Step 1: Anterpolate

L

cDEmbed in regular grid with nodesSCN

{ , , }k x y zÎ

corr † A †kl l l l

kl ll l ll l l

f¢¢ ¢ ¢ ¢ ¢- ¢-

- -¢+ L LL» + LåZ I GZ I IGI

• Computational Costs (HF, while time marching)

Storage:Inter/anter

FLOPs: T( )O N pN

( )pNQ

L

Inter/anterpolation order: 3, ( 1)m p m= +

Grid spacing: cD

Page 35: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

TD-AIM

Step 1: AnterpolateStep 2: Propagate

L

G

Embed in regular grid with nodesSCN

c †Aorr †ll l l ll l l

kl l l

kl

f¢ ¢ ¢ ¢-

¢ ¢ ¢- -

- ¢» +L L+ L Lå G GZ I I I IZ

Storage:

current/field vectors on grid,

l l ¢-G

g C( )N NQ +

Inter/anterFLOPs: +

4-D blocked space-time FFTs2

T C C g[log log ]( )N N N NO +

T( )O N pN

( )pNQ

L

• Computational Costs (HF, while time marching)

Inter/anterpolation order: 3, ( 1)m p m= +

Grid spacing: cD

{ , , }k x y zÎ

Page 36: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

TD-AIM

Step 1: AnterpolateStep 2: Propagate

Step 3: Interpolate L

G†L

Embed in regular grid with nodesSCN

c †A†orrll l l ll l ll l l

kl

k f¢ ¢ ¢ ¢-

¢ ¢ ¢- -

- ¢» +L L+ L Lå G GZ I I I IZ

Storage:

current/field vectors on grid,

l l ¢-G

g C( )N NQ +

Inter/anterFLOPs: +

4-D blocked space-time FFTs2

T C C g[log log ]( )N N N NO +

T( )O N pN

( )pNQ

L

• Computational Costs (HF, while time marching)

Inter/anterpolation order: 3, ( 1)m p m= +

Grid spacing: cD

{ , , }k x y zÎ

Page 37: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

TD-AIM

Step 1: AnterpolateStep 2: Propagate

Step 3: InterpolateStep 4: Correct

L

G†L

corrZ

Embed in regular grid with nodesSCN

c †A†orrll l l ll

kl

klll l l

f¢ ¢ ¢ ¢ ¢-

¢- -¢ ¢-» + L+L L LåZ I IGI IGZ

• Computational Costs (HF, while time marching)

Storage: +near( )NQ

current/field vectors on grid

corrl l ¢-Z

,l l ¢-G

g C( )N NQ +

Inter/anterFLOPs: ++

Correct 4-D blocked space-time FFTs2

T C C g[log log ]( )N N N NO +near

T( )O N N

T( )O N pN

( )pNQ

L

Inter/anterpolation order: 3, ( 1)m p m= +

Grid spacing: cD

Correction region size: 2g =

{ , , }k x y zÎ

Page 38: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

TD-AIM

Step 1: AnterpolateEmbed in regular grid with nodesS

CN

Inter/anterpolation order: 3, ( 1)m p m= +

Grid spacing: cD

LStep 2: Propagate

G

Step 3: Interpolate†L

Step 4: CorrectcorrZ

Correction region size: 2g =

c †A†orrll l l ll

kl

klll l l

f¢ ¢ ¢ ¢ ¢-

¢- -¢ ¢-» + L+L L LåZ I IGI IGZ

• Computational Costs (HF, while time marching)

Storage: +near( )NQ

current/field vectors on grid

corrl l ¢-Z

,l l ¢-G

g C( )N NQ +

Inter/anterFLOPs: ++

Correct 4-D blocked space-time FFTs2

T C C g[log log ]( )N N N NO +near

T( )O N N

T( )O N pN

( )pNQ

L

{ , , }k x y zÎ

Page 39: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

4-D Blocked Space-Time FFTs

g

1inc

0max(1,

c

)

A† †orrll

l

ll l ll

k

N

kl

ll l

G GZ ZI V I

A. E. Yilmaz, J. M. Jin, and E. Michielssen, IEEE Trans. AP, Oct. 2004.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =

3i =

4i =

ìïïïïïïïïíïïïïïïïïî

ìïïïíïïïî

{{

{ , , }k x y zÎ

Unique Blocks:

Block is of size

Storage space needed to store block

to store largest block:

2 glog{1, , 1}Ni Î +ê úê úë û1 1

C C2 2i iN N- -´i

C: (2 )ii NQ

g C( )N NQ

Page 40: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

A. E. Yilmaz, J. M. Jin, and E. Michielssen, IEEE Trans. AP, Oct. 2004.

MOTTD-AIM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15l

MemoryAccessed 2

S g S( ) ( )N N NQ +Q

g C( )N NQ

{ , , }k x y zÎ

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

4-D Blocked Space-Time FFTs

g

1inc

0max(1,

c

)

A† †orrll

l

ll l ll

k

N

kl

ll l

G GZ ZI V I

Page 41: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Out-of-Core Algorithm

Page 42: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

- Block requires bytes of memory - Introduce an integer threshold parameter- Limit the memory use to

- Modify TD-AIM propagation stage for blocks

- Minimize effect of latency by managing data layout in disk

Out-of-Core Algorithm: Synopsisi

C(2 )iNQ

4IO =

2IO =

IO

C(2 )IONQ

i IO>

Page 43: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Out-of-Core Algorithm

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

2IO =

1i =1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Page 44: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Out-of-Core Algorithm

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

2IO =

2i =

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Page 45: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Out-of-Core Algorithm

1

2

3

5

6

7

8

9

10

11

12

13

14

15

16

4

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in core memory Stored in disk

2IO =

4

5

6

7

8

9

10

11

12

13

14

1

1

1

5

3

6

2

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

1i =

Page 46: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

1i =

1

2

3

5

6

7

8

9

10

11

12

13

14

15

16

4

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

4

5

6

7

8

9

10

11

12

13

14

1

1

1

5

3

6

2

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 47: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

1i =5

6

7

8

9

1

10

11

12

13

14

1

6

2

3

4

5

1

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 48: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

2IO

2IO

12i-

12i-

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

3i =

For each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

, 1 Cu u N£ £

5

6

7

8

9

1

10

11

12

13

14

1

6

2

3

4

5

1

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

2IO

12 2i IO- -

Page 49: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

2IO

2IO

12i-

12i-

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

3i =

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

For each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

2IO

12 2i IO- -

, 1 Cu u N£ £

5

6

7

8

9

1

10

11

12

13

14

1

6

2

3

4

5

1

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 50: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

2IO

2IO

12i-

12i-

2IO

12 2i IO- -1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

3i =

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

For each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

, 1 Cu u N£ £

5

6

7

8

9

1

10

11

12

13

14

1

6

2

3

4

5

1

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 51: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

12i-

12i-

2IO

12 2i IO- -1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

3i =

1

2

3

4

5

9

10

11

12

13

14

15

16

6

7

8 =

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

For each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

2IO

2IO

, 1 Cu u N£ £

6

7

8

9

1

2

3

4

10

11

12

13

14

15

16

5

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 52: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

1i =

1

2

3

4

5

9

10

11

12

13

14

15

16

6

7

8 =

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

6

7

8

9

1

2

3

4

10

11

12

13

14

15

16

5

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 53: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

1i =

1

2

3

4

5

6

9

10

11

12

13

14

15

16

7

8 =

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

7

8

9

10

11

12

1

2

13

14

15

16

5

3

4

6

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 54: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

2i =

1

2

3

4

5

6

9

10

11

12

13

14

15

16

7

8 =

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

7

8

9

10

11

12

1

2

13

14

15

16

5

3

4

6

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 55: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

2i =

1

2

3

4

5

6

7

9

10

11

12

13

14

15

16

8 = f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

8

9

10

11

1

2

3

4

12

13

14

6

5

1

5

1

7

6

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 56: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

1i =

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

9

10

11

12

13

1

5

6

7

14

15

2

3

6

8

1

4

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 57: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

4i =

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

For each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

12i-

12i-

2IO

2IO

, 1 Cu u N£ £

9

10

11

12

13

1

5

6

7

14

15

2

3

6

8

1

4

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

2IO

12 2i IO- -

Page 58: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

2IO

2IO

12i-

12i-

2IO

12 2i IO- -1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

4i =

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

For each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

, 1 Cu u N£ £

9

10

11

12

13

1

5

6

7

14

15

2

3

6

8

1

4

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 59: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

2IO

2IO

12i-

12i-

2IO

12 2i IO- -1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

4i =

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

Out-of-Core AlgorithmFor each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

, 1 Cu u N£ £

9

10

11

12

13

1

5

6

7

14

15

2

3

6

8

1

4

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 60: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

2IO

2IO

12i-

12i-

2IO

12 2i IO- -1

2 1

3 2 1

4 3 2 1

5 4 3 2 1

6 5 4 3 2 1

7 6 5 4 3 2 1

8 7 6 5 4 3 2 1

9 8 7 6 5 4 3 2 1

10 9 8 7 6 5 4 3 2 1

11 10 9 8 7 6 5

0

0

0

0

0

0

0

0

0

0

0

f

f f

f f f

f f f f

f f f f f

f f f f f f

f f f f f f f

f f f f f f f f

f f f f f f f f f

f f f f f f f f f f

f f f f f f f

G

G G

G G G

G G G G

G G G G G

G G G G G G

G G G G G G G

G G G G G G G G

G G G G G G G G G

G G G G G G G G G G

G G G G G G G 4 3 2 1

12 11 10 9 8 7 6 5 4 3 2 1

13 12 11 10 9 8 7 6 5 4 3 2 1

14 13 12 11 10 9 8 7 6 5 4 3 2 1

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

0

0

0

0

0

f f f f

f f f f f f f f f f f f

f f f f f f f f f f f f f

f f f f f f f f f f f f f f

f f f f f f f f f f f f f f f

G G G G

G G G G G G G G G G G G

G G G G G G G G G G G G G

G G G G G G G G G G G G G G

G G G G G G G G G G G G G G G

1i =2i =3i =4i =

Stored in disk

2IO =

Out-of-Core Algorithm

4i =

1

2

3

4

5

6

7

8

9

10

13

14

15

16

11

12

=

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

f

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

V

For each node :

Fetch anterpolated currentsamples from the memoryand the remaining

samples from diskCompute the contribution of

these current samples to thepotentials at future timesteps

Read (previously computed)partial potentials from thedisk and add to thecontribution computed instep (2)

Write the currents at theprevious time steps andthe updated potentialsbeyond time steps intothe future to the disk.

, 1 Cu u N£ £

9

10

11

12

13

1

1

2

3

4

5

6

7

1

8

4

5

16

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

L

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

I

Stored in core memory

Page 61: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout: In Core vs. Out-of-Core

g1,

Nf fG G

g1,...,l N=

12

CN

g1,...,l N=

12

1,...,2 IOl=

Stored in core memory

Stored in disk

CN

Stored in core memory

Page 62: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Cost Analysis (No Buffering)

fl

rw

rw lat bw

: Cost of one floating point operation

: Average cost of read/write of one byte from/to disk

(must minimize effect of latency/access time)IO IO

t

t

Nt t Nt= +

Memory:

current/field vectors on grid,l l ¢-G

g C( )N NQ

In-Core TD-AIM Out-of-Core TD-AIM

C( )2IONQ

Disk space: - g C[ 2 ]( )ION NQ -

Time for FLOPs:

4-D blocked space-time FFTs2

T C C gfl[log log( ])N Nt NO N +

Time for I/O: -

2T C C gfl

[log log( ])N Nt NO N +

grw T Clog( )N N Nt O

Page 63: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

g

2 1,...,IOl N+=

Stored in disk

12

CN

- Divide into blocks of 2IO

Page 64: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

g2 1,...,IOl N+=

bufN

buf2N

Store in disk using fixed-length record unformatted direct access files

C buf 1N N- +

Stored in disk

12

CN

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

Page 65: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

1=rec2=rec

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

Store in disk using fixed-length record unformatted direct access files

Step: Write the currents at the previous time steps2IO

Page 66: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

4l=

Store in disk using fixed-length record unformatted direct access files

Step: Write the currents at the previous time steps2IO

Page 67: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

4l=

Store in disk using fixed-length record unformatted direct access files

Step: Write the currents at the previous time steps2IO

Page 68: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

4l=

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

Step: Write the currents at the previous time steps2IO

Store in disk using fixed-length record unformatted direct access files

Page 69: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

8l=

Step: Write the currents at the previous time steps2IO

Store in disk using fixed-length record unformatted direct access files

Page 70: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

8l=

Step: Write the currents at the previous time steps2IO

Store in disk using fixed-length record unformatted direct access files

Page 71: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

8l=

Step: Write the currents at the previous time steps2IO

Store in disk using fixed-length record unformatted direct access files

Page 72: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

gl N=

Step: Write the currents at the previous time steps2IO

Store in disk using fixed-length record unformatted direct access files

Page 73: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

gl N=

Step: Write the currents at the previous time steps2IO

Store in disk using fixed-length record unformatted direct access files

Page 74: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

gl N=

Step: Write the currents at the previous time steps2IO

Store in disk using fixed-length record unformatted direct access files

Page 75: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Data Layout in Disk

bufEach record contains bytes, formed from 2 temporal data for 2 spatial points

IO IOB N B

g

2 1,...,IOl N+=

bufN

buf2N

C buf 1N N- +

Stored in disk

12

CN

l1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

CMemory 2iN

1=rec2=rec

g

2IO

N=rec

g C

buf

1 12IO

N N

N

æ ö÷ç ÷ç ÷= - +ç ÷ç ÷çè ørec

gl N=

Store in disk using fixed-length record unformatted direct access files

Step: Fetch current samples from disk

12 2i IO- -

Page 76: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Cost Analysis (with Buffer)

fl

rw

rw lat bw

: Cost of one floating point operation

: Average cost of read/write of one byte from/to disk

(must minimize effect of latency/access time)IO IO

t

t

Nt t Nt= +

Memory:

current/field vectors on grid,l l ¢-G

g C( )N NQ

In-Core TD-AIM Out-of-Core TD-AIM

C g buf( )2 [ 2 ]IO ION N N+ -Q

Disk space: - g C[ 2 ]( )ION NQ -

Time for FLOPs:

4-D blocked space-time FFTs2

T C C gfl[log log( ])N Nt NO N +

Time for I/O:-

2T C C gfl

[log log( ])N Nt NO N +

Tlat

bw

C buf g

T C g

( / ) log

log

( )

+ ( )

IO

IO

t O

t O

N N N N

N N N

Page 77: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Cost Analysis (with Buffer + Parallelization)

Memory:

current/field vectors on grid,l l ¢-G

g C/( )N N PQ

In-Core TD-AIM Out-of-Core TD-AIM

C g buf( )2 / [ 2 ]IO ION P N N-Q +

Disk space: - g C]([ )2 /ION N PQ -

- Every process needs memory space for the buffer- Buffer limits parallel scalability of memory requirement: max C

bufg

2~

[ 2 ]

IO

IO

NP

NN -

Page 78: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Results- Sphere - Model Airplane

Stampede

Page 79: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Sphere

min 1 m, 3

0.24 ns

2, 100 kB

t

IO B

l g= =D =

= =

2

2

ˆ( / 8 )inc 2

c

c bw bw

ˆˆ( , ) cos(2 [ 8 ])

200 MHz, 3 2 ; 100 MHz

t z cz

t xe f tc

f f f

s

s p s

s p

+ ⋅ -- ⋅

= + -

= = =

rr

E r

s(m)L sN CN tN

0.5

1

2

4

8

16

684

3384

10 947

44 595

179 130

742 059

32 2 903 916 3512

3256

3128

380

348

327

318

gN

260

285

300

420

545

760

1375

32

44

74

132

247

471

919

Page 80: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Sphere

102 103 104 105 106 10710-4

10-3

10-2

10-1

100

101

102

103

104

105

Ns

Cor

e M

emor

y R

equi

rem

ent(G

B)

In-core TD-AIMOut-of-core TD-AIMBuffer MemoryFD-AIM

O(Ns3/2)

O(Ns2)

O(Ns1/2)

102 103 104 105 106 10710-2

10-1

100

101

102

103

104

105

Ns

Mar

chin

g/S

olut

ion

Tim

e pe

r tim

e/fre

quen

cy s

tep

(s)

In-core TD-AIMOut-of-core TD-AIMFD-AIM

O(Ns3/2log2Ns)

O(Ns3/2logNs)

Page 81: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Model Airplane

-0.1 0 0.2 0.4 0.6 0.8 1 1.20

0.005

0.01

Range-ctd/2 (m)

P (V

)

FD-AIMTD-AIM

incE

y

z x

k

max

(GHz)

fsN CN tN

2.5

5

10

20

23 217

92 868

371 472

1 485 888 2576 160´

2288 80´

2144 45´

272 27´

gN

500

1000

1600

2500

119

217

416

813

2

2

c bwc

c bw

ˆ( / 8 )inc 2

c

bw max bw

( )sca

ˆˆ( , ) cos(2 [ 8 ])

/ 2.5, 3 2

1 ˆ( , , ) lim ( , )

t y c

j t

r

yt ze f t

cf f f

P t r e d

s

s

w ww w

qw w

p s

s p

q f q w wp

+ ⋅ --

+-

¥-

⋅= + -

= =

= ⋅ò

rr

E r

E r

max

3

1 / 14

2, 100 kB

t f

IO B

g =D =

= =max 20 GHzf =

Page 82: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Model Airplane

104 105 106 10710-3

10-2

10-1

100

101

102

103

104

105

Ns

Cor

e M

emor

y R

equi

rem

ent(G

B)

In-core TD-AIMOut-of-core TD-AIMBuffer MemoryFD-AIM

O(Ns3/2)

O(Ns2)

O(Ns1/2)

104 105 106 107100

101

102

103

104

105

106

Ns

Mar

chin

g/S

olut

ion

Tim

e pe

r tim

e/fre

quen

cy s

tep

(s)

In-core TD-AIMOut-of-core TD-AIMFD-AIM

O(Ns3/2log2Ns)

O(Ns3/2logNs)

Page 83: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Model Airplane @ 10GHz

2 3 4 5 6 7 8102

103

104

IO

Cor

e M

emor

y R

equi

rem

ent (

GB

)

In Core

Out of Core

2 3 4 5 6 7 80

100

200

300

400

500

600

700

800

900

1000

IO

Mar

chin

g Ti

me

per t

ime

step

(s)

In Core

Out of Core~(logNg-IO)

Page 84: AParallel Out-of-Core Algorithm for Time-Domain Adaptive Integral Methodusers.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/gy_iccem... · 2015-03-27 · AParallel Out-of-Core Algorithm

Conclusions

• A parallel multilevel out-of-core algorithm for TD-AIM+ Core-memory requirement only ~ 3-4x FD-AIM when IO = 2 + Buffering => only ~8-10x slower on Stampede when IO = 2

+ Control knob to trade-off memory for speed:

- Buffer limits parallel scalability (can be ameliorated by using hybrid

parallelism on multi-core clusters)

• Acknowledgments

21 log

gIO Nê ú£ £ ê úë û