
TECHNIQUES OF SCIENTIFIC COMPUTING FOR ENERGY

AND THE ENVIRONMENT

FRÉDÉRIC MAGOULÈS AND

RIAD BENELMIR, EDITORS

Nova Science Publishers, Inc., New York


    Copyright 2007 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical, photocopying, recording or otherwise, without the written permission of the Publisher.

    For permission to use material from this book please contact us:

    Telephone 631-231-7269; Fax 631-231-8175

    Web Site: http://www.novapublishers.com

NOTICE TO THE READER

The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No

    liability is assumed for incidental or consequential damages in connection with or arising out of

    information contained in this book. The Publisher shall not be liable for any special,

consequential, or exemplary damages resulting, in whole or in part, from the reader's use of, or

reliance upon, this material. Any parts of this book based on government reports are so indicated

    and copyright is claimed for those parts to the extent applicable to compilations of such works.

    Independent verification should be sought for any data, advice or recommendations contained in

    this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage

    to persons or property arising from any methods, products, instructions, ideas or otherwise

    contained in this publication.

    This publication is designed to provide accurate and authoritative information with regard to the

    subject matter covered herein. It is sold with the clear understanding that the Publisher is not

    engaged in rendering legal or any other professional services. If legal or any other expert

    assistance is required, the services of a competent person should be sought. FROM A

    DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE

    AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.

    LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

    Available upon request

    ISBN 13: 978-1-60692-823-3

    Published by Nova Science Publishers, Inc. New York


    CONTENTS

Preface vii
Frédéric Magoulès and Riad Benelmir

Stability Analysis of Abnormal Multiplication of Plankton Considering Flow Velocity 1
T. Yamauchi and M. Kawahara

Achieving High-Performance Computing in Geomechanics by Development of Parallel Finite Element Package 21
F. Okulicka-Duzewska

Large-Scale Data Visualization Using Multi-Language Programming Applied to Environmental Problems 37
F. Magoulès and R. Putanowicz

An Analysis of Flow Around a Propeller Using Fictitious Domain Finite Element Method 69
K. Harada and M. Kawahara

Numerical Simulation of Supersonic Combustion Using Parallel Computing 85
E. von Lavante and M. Kallenberg

Index 99


    PREFACE

    Frederic Magoules and Riad Benelmir

    The research and development in Scientific Computing and Computational Science

    have considerably increased the power of numerical simulation. Engineers and researchers

    are now able to solve large and complex problems which were impossible to solve in the

    past. This Special Issue presents some techniques, methods and algorithms for solving

    engineering problems arising in energy and environment.

The first article, by T. Yamauchi and M. Kawahara, presents a numerical method for the abnormal multiplication of plankton caused by water pollution. In this study, the basic equation represents the food chain of an ecological model, which consists of phytoplankton, zooplankton, and nutrient. The stability problem, the eigenvalue problem and the parameter identification technique are introduced. The experimental data used in the numerical simulation come from Lake Kasumigaura, located in the Ibaraki Prefecture in Japan. A finite element discretisation is considered for the numerical experiments.

The paper of F. Okulicka-Duzewska analyzes geotechnical problems: the rising of an embankment, the building of a dam and the settlement of the underground. A finite element software package is used for this analysis. Due to the large amount of data used in such models, these environmental problems appear to be difficult to solve with classical finite element methods, and the use of high performance computers is thus mandatory. For this reason, the finite element software is parallelized on a network of PCs, and details of the proposed approach are presented by the author.

The paper written by F. Magoulès and R. Putanowicz describes a technique well suited for the visualization and analysis of the large data sets arising in environmental problems. This technique is based on a multi-language programming approach using the Visualization Toolkit (VTK) library and components written in Tcl and C++. VTK is an open-source software system for visualisation, computer graphics and imaging. Though it is possible to write a whole VTK application in a scripting language like Tcl, it is more suitable, for efficiency reasons, to implement some functionality in a compiled language like C/C++. This is especially the case when working with the large data sets arising from environmental analysis, as presented here. Pieces of code and detailed examples are provided to allow readers to program their own software.

The paper by K. Harada and M. Kawahara describes the analysis of the flow around a propeller. This analysis is related to the minimisation of the turbulence around the propeller, which leads to a loss of energy for the engine. The proposed analysis is based on the fictitious domain method, and a finite element discretisation is performed for the numerical experiments. Finally, the paper of E. von Lavante and M. Kallenberg discusses the numerical simulation of supersonic combustion using parallel computing. This analysis is related to a better understanding of the transfer of energy involved in such combustion. For this purpose, the


unsteady, three-dimensional, supersonic flow in a channel with transverse hydrogen injection is simulated. The time accurate computation was accelerated by an implicit method and implemented on a massively parallel computer. The parallelization is accomplished using domain decomposition on distributed memory systems. The relative efficiency and relative speedup of the parallel algorithm are analyzed for various problem sizes and numbers of processor units.

    Naturally, the present issue cannot provide a complete record of the many approaches,

    applications, features, and numerical methods related to energy and environment. However,

    it does give an indication of the progress that is being made in addressing these issues and

    the possibilities that are available for future research in this area.

Frédéric Magoulès
Université Henri Poincaré
Institut Élie Cartan de Nancy, BP 239
54506 Vandoeuvre-lès-Nancy Cedex, France

Riad Benelmir
Université Henri Poincaré
École Sup. Sc. Tech. Ing. de Nancy
2 rue Jean Lamour
54519 Vandoeuvre-lès-Nancy Cedex, France


In: Techniques of Scientific Computing for Energy ...
Editors: F. Magoules and R. Benelmir, pp. 1-20
ISBN 1-60021-921-7
© 2007 Nova Science Publishers, Inc.

    STABILITY ANALYSIS OF ABNORMAL

    MULTIPLICATION

    OF PLANKTON CONSIDERING FLOW VELOCITY

    Tomohiro Yamauchi and Mutsuto Kawahara

    Department of Civil Engineering, Chuo University,

Kasuga 1-13-27, Bunkyou-ku, Tokyo 112-8551, Japan

    Abstract

This paper presents a numerical method for the abnormal multiplication of plankton caused by water pollution. In this study, the basic equation represents the food chain of an ecological model, which consists of phytoplankton, zooplankton, and nutrient. Flow velocity and a time-lag increase are added to the ecological model as a new approach. The flow velocity is obtained using the nonlinear shallow water equations. The stabilized bubble function finite element method is applied to the spatial discretization in the analysis of the nonlinear shallow water flow. In this paper, an abnormal multiplication is regarded as an instability problem; therefore, if there is no water quality problem, the system is stable. The stability of the system is investigated by introducing the eigenvalues of the basic equation. The stability of the system can be judged by the eigenvalues, based on Lyapunov's stability theory. In this paper, the Arnoldi-QR method is used to obtain the eigenvalues and eigenvectors of the system. Lake Kasumigaura, located in the Ibaraki Prefecture in Japan, is selected, and actual data from 1991 are used in order to estimate the behaviour of the plankton in the lake. Mode analysis is employed to construct the initial distribution in Lake Kasumigaura. Finally, the change of the distribution of plankton patchiness for various time stages and the equilibrium solution are obtained.

    Keywords: plankton, shallow water equation, ecological model, stability, parameter iden-

    tification

    1. Introduction

Recently, environmental problems have become serious. Water pollution supplies a large amount of nutrient, because the industrial waste water streamed into rivers, lakes, the sea, etc. includes a lot of nutrient, namely nitrogen and phosphorus. The phytoplankton feeds on the nutrient. As a result, a large amount of nutrient causes an abnormal multiplication of the plankton. Fish and shellfish die of suffocation, because a large amount of plankton consumes a large quantity of oxygen. Therefore, the abnormal multiplication of the plankton has seriously damaged the fishing industry. In fact, the income of fishing villages


has decreased dramatically, with damage amounting to more than a billion yen. This abnormal multiplication of the plankton is called a red tide or a blue-green algae bloom 1),2),3),4).

Prediction of the abnormal multiplication of plankton using numerical analysis leads to its prevention. In recent studies, the abnormal multiplication has been treated as an instability problem 5),6). The purpose of this study is to prevent an abnormal multiplication of plankton by investigating the stability of the system considering the flow velocity. To obtain the initial spatial distribution of plankton in Lake Kasumigaura, the mode analysis is applied. This spatial distribution is used as the initial data.

    2. Basic Equation

    2.1. Nonlinear Shallow Water Equation

The two-dimensional nonlinear shallow water equations are used to calculate the water flow, and are written as follows:

\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} + g\frac{\partial \zeta}{\partial x} - \nu\left[\left(2\frac{\partial^2 u}{\partial x^2}\right) + \left(\frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 v}{\partial x\,\partial y}\right)\right] + f u = 0, \qquad (1)

\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + g\frac{\partial \zeta}{\partial y} - \nu\left[\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 u}{\partial x\,\partial y}\right) + \left(2\frac{\partial^2 v}{\partial y^2}\right)\right] + f v = 0, \qquad (2)

\frac{\partial \zeta}{\partial t} + u\frac{\partial \zeta}{\partial x} + v\frac{\partial \zeta}{\partial y} + H\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}\right) = 0, \qquad (3)

where u and v are the velocity components, \zeta is the water elevation, H = h + \zeta is the total water depth, \nu is the kinematic eddy viscosity, g is the gravitational acceleration and f is the bottom friction coefficient. The boundary conditions can be expressed as

u = \hat{u} \quad \text{on } \Gamma_D, \qquad (4)

v = \hat{v} \quad \text{on } \Gamma_D, \qquad (5)

\zeta = \hat{\zeta} \quad \text{on } \Gamma_D, \qquad (6)

u_n = u\,n_x = \hat{u}_n \quad \text{on } \Gamma_N, \qquad (7)

v_n = v\,n_y = \hat{v}_n \quad \text{on } \Gamma_N. \qquad (8)

The stabilized bubble function element is used for the discretization by the finite element method 7). The bubble function is capable of eliminating the barycenter point by static condensation. The discretized form derived from the bubble function element is equivalent to that obtained from the SUPG method 9). Therefore, the stabilized parameter derived from the bubble function element is expressed as follows for the momentum equations of the shallow water flow:

\tau_{eB} = \langle\phi_e, 1\rangle^2_{\Omega_e}\, A_e^{-1} \left[ \frac{1}{\Delta t}\,\|\phi_e\|^2_{\Omega_e} + (\nu + \nu')\,\|\phi_{e,j}\|^2_{\Omega_e} + f\,\|\phi_e\|^2_{\Omega_e} \right]^{-1}, \qquad (9)

and for the continuity equation:

\tau_{eB} = \langle\phi_e, 1\rangle^2_{\Omega_e}\, A_e^{-1} \left[ \frac{1}{\Delta t}\,\|\phi_e\|^2_{\Omega_e} + \nu'\,\|\phi_{e,j}\|^2_{\Omega_e} \right]^{-1}, \qquad (10)

where \nu' is the stabilization control parameter. From the criteria for the stabilized parameter corresponding to the SUPG, an optimal parameter can be given as follows for the momentum equations of the shallow water flow:

\tau_{eB} = \left( \frac{1}{2}\,\tau_{eS}^{-1} + \frac{1}{\Delta t} \right)^{-1}, \qquad (11)

\tau_{eS}^{-1} = \left[ \left( \frac{2|U_i|}{h_e} \right)^2 + \left( \frac{4\nu}{h_e^2} \right)^2 \right]^{1/2}, \qquad (12)

and for the continuity equation:

\tau_{eB} = \left( \frac{1}{2}\,\tau_{eS}^{-1} + \frac{1}{\Delta t} \right)^{-1}, \qquad (13)

\tau_{eS}^{-1} = \frac{2|U_i|}{h_e}, \qquad (14)

where

\nu' = \frac{A_e\,\|\phi_e\|^2_{\Omega_e}}{\langle\phi_e, 1\rangle^2_{\Omega_e}}, \qquad (15)

h_e = \sqrt{2 A_e}, \qquad (16)

|U_i| = \sqrt{u^2 + v^2 + g H}, \qquad (17)

\phi_e is the bubble function, \Omega_e is the element domain and \langle u, v\rangle_{\Omega_e} = \int_{\Omega_e} u\,v\, d\Omega.

    2.2. Ecological Model

    In this study, a simple mathematical model is employed, which is suggested in 1). There

    are many parameters in these equations. The ecological model is shown in Figure 1.

\frac{\partial P}{\partial t} = D_{1x}\frac{\partial^2 P}{\partial x^2} + D_{1y}\frac{\partial^2 P}{\partial y^2} - u\frac{\partial P}{\partial x} - v\frac{\partial P}{\partial y} + f(P,Z,N), \qquad (18)

\frac{\partial Z}{\partial t} = D_{2x}\frac{\partial^2 Z}{\partial x^2} + D_{2y}\frac{\partial^2 Z}{\partial y^2} - u\frac{\partial Z}{\partial x} - v\frac{\partial Z}{\partial y} + g(P,Z,N), \qquad (19)


    Figure 1. Ecological System

\frac{\partial N}{\partial t} = D_{3x}\frac{\partial^2 N}{\partial x^2} + D_{3y}\frac{\partial^2 N}{\partial y^2} - u\frac{\partial N}{\partial x} - v\frac{\partial N}{\partial y} + h(P,Z,N), \qquad (20)

where P is the phytoplankton, Z the zooplankton and N the nutrient, P, Z and N denoting the concentration of each component. In these equations, D_{1x}, D_{1y}, D_{2x}, D_{2y}, D_{3x} and D_{3y} are the non-dimensional diffusion coefficients of P, Z and N, respectively. The terms f(P,Z,N), g(P,Z,N) and h(P,Z,N) are the biological reaction terms, which are expressed as follows:

f(P,Z,N) = \frac{N P}{\alpha + N} - \beta Z\left[1 - \exp\{-\gamma(P - P^{*})\}\right] - p P, \qquad (21)

g(P,Z,N) = \beta Z\left[1 - \exp\{-\gamma(P - P^{*})\}\right] - \varepsilon Z^{2}\left[1 - \exp\{-\gamma(P - P^{*})\}\right], \qquad (22)

h(P,Z,N) = -\frac{N P}{\alpha + N} + p P + \varepsilon Z^{2}\left[1 - \exp\{-\gamma(P - P^{*})\}\right], \qquad (23)

where the underlined positive term represents the increase due to the time lag.

    3. Stability Problem

3.1. Lyapunov's Stability Theory

The stability analysis based on Lyapunov's stability theory is employed. In this theory, equilibrium points and perturbations are considered in order to investigate the stability of the system: the equilibrium is the steady state, and the perturbation is a small oscillation around it. If the system is completely stable, the oscillation settles down as time goes by; if the system is unstable, the oscillation grows without bound as time passes. In this study, the eigenvalues are employed to decide the stability of the system. The judging criteria of the stability are described in Table 1.


Table 1. Judging Criteria

    Eigenvalue      System
    \lambda < 0     Completely stable
    \lambda = 0     Neutral
    \lambda > 0     Unstable
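As a small illustration of the criterion in Table 1 (not from the original paper), the following Python sketch classifies a system from the largest real part of the eigenvalues of a given linearized system matrix; the matrix values used in the example are hypothetical.

import numpy as np

def judge_stability(F, tol=1e-12):
    # largest real part of the eigenvalues of the linearized system matrix
    lam_max = np.max(np.linalg.eigvals(F).real)
    if lam_max < -tol:
        return "completely stable"
    if lam_max > tol:
        return "unstable"
    return "neutral"

# hypothetical 3x3 linearization about an equilibrium point
F = np.array([[-0.5, 0.1, 0.2],
              [0.05, -0.3, 0.0],
              [0.1, 0.2, -0.4]])
print(judge_stability(F))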

    3.2. Linearization

In order to obtain the eigenvalues of the system, the basic equations are linearized on the basis of Lyapunov's stability theory. First, the equilibrium point is sought. The following procedure is employed in order to linearize the equations.

1: Consider the solution around the equilibrium points as

\bar{P} + P', \quad \bar{Z} + Z', \quad \bar{N} + N',

where \bar{P}, \bar{Z} and \bar{N} are the equilibrium points of each component, determined by the incremental method until the change of value is less than 1.0 \times 10^{-5}, and P', Z' and N' are the perturbations. Thus, substituting \bar{P} + P', \bar{Z} + Z' and \bar{N} + N' into eqs. (18)-(20):

\frac{\partial(\bar{P} + P')}{\partial t} = D_1\nabla^2(\bar{P} + P') + f(\bar{P} + P', \bar{Z} + Z', \bar{N} + N'), \qquad (24)

\frac{\partial(\bar{Z} + Z')}{\partial t} = D_2\nabla^2(\bar{Z} + Z') + g(\bar{P} + P', \bar{Z} + Z', \bar{N} + N'), \qquad (25)

\frac{\partial(\bar{N} + N')}{\partial t} = D_3\nabla^2(\bar{N} + N') + h(\bar{P} + P', \bar{Z} + Z', \bar{N} + N'). \qquad (26)

2: Employing a Taylor expansion and omitting terms of higher than first order, the linearized form of eqs. (24)-(26) is obtained as follows:

\frac{\partial \Phi}{\partial t} = F\,\Phi, \qquad (27)

where

\Phi = \begin{Bmatrix} P' \\ Z' \\ N' \end{Bmatrix}, \qquad
F = \begin{bmatrix}
D_1\nabla^2 + \frac{\partial f}{\partial P} & \frac{\partial f}{\partial Z} & \frac{\partial f}{\partial N} \\
\frac{\partial g}{\partial P} & D_2\nabla^2 + \frac{\partial g}{\partial Z} & \frac{\partial g}{\partial N} \\
\frac{\partial h}{\partial P} & \frac{\partial h}{\partial Z} & D_3\nabla^2 + \frac{\partial h}{\partial N}
\end{bmatrix},

in which \frac{\partial f}{\partial P} denotes the derivative of f with respect to P evaluated at the equilibrium solution.


    3.3. Discretization by FEM

The following perturbations are substituted:

P' = \tilde{P} e^{\lambda t}, \qquad (28)

Z' = \tilde{Z} e^{\lambda t}, \qquad (29)

N' = \tilde{N} e^{\lambda t}. \qquad (30)

Using eqs. (28)-(30) and discretizing eq. (27), the following equation is obtained:

\lambda [M]\Phi = [H]\Phi, \qquad (31)

where

M = \begin{bmatrix} \bar{M} & & \\ & \bar{M} & \\ & & \bar{M} \end{bmatrix}, \qquad
H = \begin{bmatrix}
D_1 S + F_P & F_Z & F_N \\
G_P & D_2 S + G_Z & G_N \\
H_P & H_Z & D_3 S + H_N
\end{bmatrix},

\bar{M} = \int_V \Phi\,\Phi^T\, dV, \qquad S = \int_V \Phi_{,i}\,\Phi_{,i}^T\, dV. \qquad (32)

    4. Eigenvalue Problem

4.1. Arnoldi's Method

To obtain the eigenvalues of the system, Arnoldi's method is applied in this research. This method makes it possible to reduce the memory requirement and the computation time. The algorithm for the standard eigenvalue and eigenvector problem (Cu = \lambda u) is as follows.

1: Start: choose an initial vector v_1 of unit norm and a number of steps m.

2: Iterate: for j = 1, 2, \ldots, m do:

\hat{v}_{j+1} = C v_j - \sum_{i=1}^{j} h_{ij} v_i, \qquad (33)

where

h_{ij} = (C v_j, v_i), \quad i = 1, \ldots, j, \qquad (34)

h_{j+1,j} = \|\hat{v}_{j+1}\|_2, \qquad (35)

v_{j+1} = \hat{v}_{j+1} / h_{j+1,j}. \qquad (36)

This algorithm produces an orthonormal basis V_m = [v_1, v_2, \ldots, v_m] of the Krylov subspace K_m = \mathrm{span}\{v_1, C v_1, \ldots, C^{m-1} v_1\}. In this basis the restriction of C to K_m is represented by the upper Hessenberg matrix H_m whose entries are the h_{ij} produced by the algorithm, i.e.,

H_m = \{h_{ij}\}. \qquad (37)

The eigenvalues of C are approximated by those of H_m, where

H_m = V_m^T C V_m. \qquad (38)
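The iteration of eqs. (33)-(38) can be sketched as follows in Python (a generic illustration, not the authors' implementation); C is any square matrix or operator supporting matrix-vector products.

import numpy as np

def arnoldi(C, v1, m):
    # builds an orthonormal basis V_m of the Krylov subspace and the upper
    # Hessenberg matrix H_m whose eigenvalues approximate those of C, eq. (38)
    n = len(v1)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = C @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]        # h_ij = (C v_j, v_i), eq. (34)
            w -= H[i, j] * V[:, i]       # orthogonalization, eq. (33)
        H[j + 1, j] = np.linalg.norm(w)  # eq. (35)
        if H[j + 1, j] < 1e-14:          # invariant subspace found, stop early
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]    # eq. (36)
    return V[:, :m], H[:m, :m]

# example: the eigenvalues of H_m approximate the dominant eigenvalues of C
rng = np.random.default_rng(0)
C = rng.standard_normal((200, 200))
V, Hm = arnoldi(C, rng.standard_normal(200), 30)
print(np.sort(np.linalg.eigvals(Hm).real)[-3:])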


    4.2. Application for Generalized Eigenvalue Problem

    If one wishes to find out the leading eigenvalue with maximum real part,it is common to

    use the shift and invert strategy. If0 is an approximation to an eigenvalue of interest, then

    the shifted and inverted problem is;

    (C0I)1u = u, (39)

    where, = 1/(0).Thus,eigenvalues of C close to 0 correspond to eigenvalues ofeq.(39) with large absolute value, and one expects the Arnoldis method to converge to

    such eigenvalues. In order to apply the Arnoldis method to eq.(39) for the generalized

    eigenvalue problem eq.(31), eq(39) may be described as;

    (H0M)1Mu = u, (40)

    and to apply to the Arnoldis method the LU decomposition of H - 0Monce is performed,and then each time(H-0M)

    1Mv is needed, we solve (H0M)w = Mv by forward andbackward analysis. This is much more economical than forming the matrix of eq.(40)

    explicitly since it is usually full and also its dimension is much larger than M.
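A minimal sketch of the shift-and-invert strategy of eq. (40) with SciPy is shown below (an illustration under stated assumptions, not the code used in the paper): the LU factorization of H - \sigma_0 M is computed once, each Arnoldi step only performs a forward/backward solve, and \lambda is recovered from \mu = 1/(\lambda - \sigma_0). The matrices H and M here are small sparse stand-ins for the finite element matrices.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def shift_invert_operator(H, M, sigma0):
    # operator applying (H - sigma0 M)^{-1} M, eq. (40); factorize once,
    # then each application is a forward/backward substitution
    lu = spla.splu((H - sigma0 * M).tocsc())
    n = H.shape[0]
    return spla.LinearOperator((n, n), matvec=lambda v: lu.solve(M @ v),
                               dtype=H.dtype)

# stand-in sparse matrices for H and M
n = 500
H = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
M = sp.identity(n, format="csc")
op = shift_invert_operator(H, M, sigma0=0.1)

# Arnoldi (ARPACK) applied to the shifted and inverted operator
mu, _ = spla.eigs(op, k=3)
print(0.1 + 1.0 / mu)   # eigenvalues of (H, M) closest to sigma0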

    5. Parameter Identification

    5.1. Performance Function

When a parameter in the equations is changed, the stability of the system changes. To obtain the parameter value for which the system is stable, the parameter identification technique is applied. This technique amounts to an estimation by minimization of the performance function J, defined as the sum of the squared residuals between the calculated and objective values. This function is written as

J = \frac{1}{2}\int_V \left(\lambda(k) - \hat{\lambda}\right)^T \left(\lambda(k) - \hat{\lambda}\right) dV, \qquad (41)

where \hat{\lambda} is the objective eigenvalue and \lambda(k) is the eigenvalue of the system. In a word, the optimal parameter value can be determined by minimizing the performance function J using the parameter identification technique.

    5.2. Algorithm

In this research, the conjugate gradient method is employed to minimize the performance function J. The algorithm of the parameter identification technique is as follows:

1. Assume the initial parameter value k^{(0)} and decide the convergence criterion \varepsilon_J.
2. Calculate the state value \lambda(k)^{(0)}.
3. Calculate the performance function J^{(0)}.
4. Calculate the sensitivity matrix [\partial\lambda(k)/\partial k]^{(0)}.
5. Calculate the initial gradient direction d^{(0)} = -[\partial J/\partial k]^{(0)}.
6. Calculate the step size \alpha so as to minimize J(k^{(i)} + \alpha d^{(i)}).
7. Renew the parameter k^{(i+1)} = k^{(i)} + \alpha d^{(i)}.
8. Calculate the state value \lambda(k)^{(i+1)}.
9. Calculate the performance function J^{(i+1)}.
10. Calculate the sensitivity matrix [\partial\lambda(k)/\partial k]^{(i+1)}.
11. Calculate \beta = \dfrac{[\partial J/\partial k]^{(i+1)T}\,[\partial J/\partial k]^{(i+1)}}{[\partial J/\partial k]^{(i)T}\,[\partial J/\partial k]^{(i)}}.
12. Calculate the new search direction d^{(i+1)} = -[\partial J/\partial k]^{(i+1)} + \beta d^{(i)}.
13. If |J^{(i+1)} - J^{(i)}| < \varepsilon_J, then stop.
14. Set i = i + 1 and go to 6.
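The loop above can be sketched in Python as follows; this is a simplified, hypothetical illustration with a single scalar parameter, in which the sensitivity is approximated by finite differences and a crude backtracking line search replaces the exact step-size minimization of step 6 (the paper itself uses the left-eigenvector sensitivity of Section 5.3).

import numpy as np

def identify_parameter(max_eig, k0, lam_hat, eps=1e-8, max_iter=200, h=1e-6):
    # Fletcher-Reeves conjugate gradient loop following steps 1-14;
    # max_eig(k) returns the leading eigenvalue for parameter k (user supplied),
    # lam_hat is the objective eigenvalue, J = 0.5 (max_eig(k) - lam_hat)^2
    J = lambda k: 0.5 * (max_eig(k) - lam_hat) ** 2
    dJ = lambda k: (J(k + h) - J(k - h)) / (2.0 * h)   # finite-difference sensitivity

    k = k0
    g = dJ(k)
    d = -g                                   # step 5 (initial descent direction)
    J_old = J(k)
    for _ in range(max_iter):
        alpha = 1.0                          # step 6: backtracking line search
        while J(k + alpha * d) > J_old and alpha > 1e-12:
            alpha *= 0.5
        k = k + alpha * d                    # step 7
        J_new, g_new = J(k), dJ(k)           # steps 8-10
        if abs(J_new - J_old) < eps:         # step 13
            break
        beta = (g_new * g_new) / (g * g)     # step 11
        d = -g_new + beta * d                # step 12
        J_old, g = J_new, g_new              # step 14
    return k

# hypothetical model: the leading eigenvalue decreases linearly with the parameter
print(identify_parameter(lambda k: 0.5 - 0.8 * k, k0=0.2, lam_hat=-0.35))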

    5.3. Sensitivity Matrix

In order to compute the sensitivity matrix, the left eigenvalue problem has to be used in this study. The right and left eigenvalue problems are

\lambda M \Phi = H \Phi, \qquad (42)

\lambda M^T \Psi = H^T \Psi, \qquad (43)

where M^T (resp. H^T) is the transposed matrix of M (resp. H). The eigenvectors of eqs. (42) and (43) are not the same, but the eigenvalues are the same. In this study, the eigenvalue of maximum real part is investigated, and the real and imaginary parts of the right and left eigenvectors are employed to compute the sensitivity matrix. The real part of the sensitivity is obtained as follows:

\frac{\partial(\mathrm{Re}\,\lambda)}{\partial k} = \frac{AC + BD}{A^2 + B^2}, \qquad (44)

where

A = \Psi_{re}^T M \Phi_{re} - \Psi_{im}^T M \Phi_{im}, \qquad (45)

B = \Psi_{re}^T M \Phi_{im} + \Psi_{im}^T M \Phi_{re}, \qquad (46)

C = \Psi_{re}^T \frac{\partial H}{\partial k} \Phi_{re} - \Psi_{im}^T \frac{\partial H}{\partial k} \Phi_{im}, \qquad (47)

D = \Psi_{im}^T \frac{\partial H}{\partial k} \Phi_{re} + \Psi_{re}^T \frac{\partial H}{\partial k} \Phi_{im}, \qquad (48)

\Phi_{re}: real part of the right eigenvector, \Phi_{im}: imaginary part of the right eigenvector, \Psi_{re}: real part of the left eigenvector, \Psi_{im}: imaginary part of the left eigenvector.

By calculating this sensitivity, the parameter identification technique can be applied and the optimal parameter value which makes the system stable can be obtained.
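A compact Python sketch of eqs. (44)-(48) is given below; it is an illustration only (dense matrices, \partial H/\partial k supplied by central differences, hypothetical test data), not the authors' implementation.

import numpy as np
import scipy.linalg as la

def d_re_lambda_dk(H, M, dH_dk):
    # sensitivity of the real part of the leading eigenvalue, eqs. (44)-(48)
    lam, phi = la.eig(H, M)                  # right eigenpairs, eq. (42)
    lam_l, psi = la.eig(H.T, M.T)            # left eigenpairs, eq. (43)
    i = np.argmax(lam.real)                  # eigenvalue of maximum real part
    j = np.argmin(np.abs(lam_l - lam[i]))    # matching left eigenvector
    pr, pi_ = phi[:, i].real, phi[:, i].imag
    qr, qi = psi[:, j].real, psi[:, j].imag
    A = qr @ M @ pr - qi @ M @ pi_           # eq. (45)
    B = qr @ M @ pi_ + qi @ M @ pr           # eq. (46)
    C = qr @ dH_dk @ pr - qi @ dH_dk @ pi_   # eq. (47)
    D = qi @ dH_dk @ pr + qr @ dH_dk @ pi_   # eq. (48)
    return (A * C + B * D) / (A ** 2 + B ** 2)   # eq. (44)

# hypothetical small system; dH/dk obtained by central differences
rng = np.random.default_rng(1)
H0, H1 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
H = lambda k: H0 + k * H1
M, dk = np.eye(4), 1e-6
print(d_re_lambda_dk(H(0.3), M, (H(0.3 + dk) - H(0.3 - dk)) / (2 * dk)))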


    6. Initial Distribution of Plankton

    6.1. Modal Analysis

To represent the whole distribution of the plankton as the initial distribution, the concept of modal analysis is utilized, as shown in Figure 2. If the eigenvalue of the linear Laplacian in the area V is denoted as \omega^2, the spectrum of the linear Laplacian is determined by the Helmholtz equation as follows:

Figure 2. Modal analysis

\nabla^2\phi + \omega^2\phi = 0, \qquad (49)

where

\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. \qquad (50)

In eq. (49), \phi represents the basic mode of phytoplankton, zooplankton and nutrient. The boundary condition is

\frac{\partial\phi}{\partial n} = 0. \qquad (51)

    6.2. Eigenvalue Problem by FEM

To obtain the eigenvalues \omega^2 and the eigenvectors \phi, the finite element method is employed. The Galerkin method is used for the spatial discretization of eq. (49):

S\phi - \omega^2 M\phi = 0, \qquad (52)

where

S = \int_V \Phi_{,i}\,\Phi_{,i}^T\, dV, \qquad (53)

M = \int_V \Phi\,\Phi^T\, dV. \qquad (54)


Eq. (52) is treated as a generalized eigenvalue problem. The Householder-QR method is employed to find the eigenvalues \omega^2; however, this method cannot be applied directly to the generalized eigenvalue problem, so the problem is transformed into a standard eigenvalue problem. The matrix M is symmetric and can therefore be split into two factors by the Cholesky method:

M = L^T L. \qquad (55)

Substituting eq. (55) into eq. (52),

S\phi - \omega^2 L^T L\phi = 0, \qquad (56)

where the eigenvector is replaced by using the following equation:

z = L\phi, \qquad (57)

it is obtained that

S L^{-1} z = \omega^2 L^T z, \qquad (58)

\omega^2 z = L^{-T} S L^{-1} z, \qquad (59)

where

A = L^{-T} S L^{-1}. \qquad (60)

Substituting eq. (60) into eq. (59), the following equation is derived:

\omega^2 z = A z. \qquad (61)

To obtain the eigenvalues \omega^2 and the eigenvectors z of eq. (61), the Householder-QR method and the inverse iteration method are employed. Since \phi equals L^{-1} z, the eigenvector \phi is then found by the backward substitution method 8).
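The transformation of eqs. (55)-(61) can be sketched as follows (a generic NumPy illustration with simple stand-in matrices; the paper uses the Householder-QR and inverse iteration methods, whereas this sketch calls a standard symmetric eigensolver).

import numpy as np

def helmholtz_modes(S, M, n_modes):
    # Cholesky factorization M = L^T L (eq. (55)); NumPy returns M = C C^T with
    # C lower triangular, so L = C^T
    L = np.linalg.cholesky(M).T
    Linv = np.linalg.inv(L)
    A = Linv.T @ S @ Linv                # A = L^{-T} S L^{-1}, eq. (60)
    w2, Z = np.linalg.eigh(A)            # standard eigenvalue problem, eq. (61)
    Phi = Linv @ Z                       # phi = L^{-1} z (here via the explicit
                                         # inverse; the paper uses backward substitution)
    return w2[:n_modes], Phi[:, :n_modes]

# stand-in 1-D Galerkin matrices playing the role of the FEM S and M
n, h = 50, 1.0 / 50
S = (np.diag(np.full(n, 2.0)) + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1)) / h
M = (np.diag(np.full(n, 4.0)) + np.diag(np.full(n - 1, 1.0), 1)
     + np.diag(np.full(n - 1, 1.0), -1)) * h / 6.0
w2, Phi = helmholtz_modes(S, M, 3)
print(w2)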

    6.3. Superposition of Spectra

    6.3.1. Performance Function

The two-dimensional distribution is calculated by the superposition of the eigenmodes and the observation data: the obtained spectra, which represent the spatial density, are superposed. This is called the modal analysis method. The superposition is computed as follows. The performance function is defined as

J = \frac{1}{2}\int_V (u - \hat{u})^2\, dV, \qquad (62)

where

u = \sum_{i=1}^{n} c_i u_i, \qquad (63)

and \hat{u} is the observed value. The coefficients c_1, \ldots, c_n are determined so as to minimize J, where

J = \frac{1}{2}\int_V (u - \hat{u})^2\, dV, \qquad (64)


J = \frac{1}{2}\sum_{j=1}^{m_x} (u_j - \hat{u}_j)^2, \qquad (65)

J = \frac{1}{2}\sum_{j=1}^{m_x} \left(u_j^2 - 2 u_j \hat{u}_j + \hat{u}_j^2\right), \qquad (66)

J = \frac{1}{2}\sum_{j=1}^{m_x} \left[\left(\sum_{i=1}^{n} u_{ij} c_i\right)^2 - 2\left(\sum_{i=1}^{n} u_{ij} c_i\right)\hat{u}_j + \hat{u}_j^2\right], \qquad (67)

\frac{\partial J}{\partial c_l} = \frac{1}{2}\sum_{j=1}^{m_x} \left[2\left(\sum_{i=1}^{n} u_{ij} c_i\right)u_{lj} - 2\, u_{lj}\hat{u}_j\right], \qquad (68)

= \sum_{j=1}^{m_x} \left[\left(\sum_{i=1}^{n} u_{ij} c_i\right)u_{lj} - u_{lj}\hat{u}_j\right], \qquad (69)

= \sum_{j=1}^{m_x} \left[u_{lj}\left(\sum_{i=1}^{n} u_{ij} c_i - \hat{u}_j\right)\right]. \qquad (70)

(l = 1, 2, 3, \ldots, n)

    6.3.2. Minimization Method

The spatial distribution is made by the superposition of the eigenmodes, the contribution of each mode being weighted by the unknown constants c_i. The conjugate gradient method is applied to the above equations to obtain the c_i.
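The determination of the coefficients c_i can be sketched as follows (an illustration under simplifying assumptions: the normal equations obtained by setting the gradient (70) to zero are solved directly, instead of the conjugate gradient iteration used in the paper, and the modes and observations are synthetic).

import numpy as np

def fit_mode_coefficients(modes, u_obs):
    # minimize the discrete performance function of eq. (65): setting the
    # gradient (70) to zero gives the normal equations (U U^T) c = U u_obs
    U = np.asarray(modes)        # shape (n_modes, n_points), rows are the modes u_i
    return np.linalg.solve(U @ U.T, U @ u_obs)

# synthetic example: recover known coefficients of three sine modes
x = np.linspace(0.0, 1.0, 200)
modes = np.array([np.sin((k + 1) * np.pi * x) for k in range(3)])
u_obs = 0.8 * modes[0] + 0.3 * modes[1] - 0.1 * modes[2]
print(fit_mode_coefficients(modes, u_obs))   # approximately [0.8, 0.3, -0.1]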

    7. Numerical Example

    7.1. Verification of Ecological Model

    7.1.1. Case 1

Figure 3 shows the finite element mesh. The total numbers of elements and nodes are 400 and 303, respectively.

    Figure 3. Used Mesh

Assuming that the nutrient is sufficient, the change of phytoplankton, zooplankton and nutrient is computed and represented in Figures 4-7. The vertical axis is the normalized fraction of the total nutrient Nt, and the horizontal axis is the scaled distance of the plankton patchiness.

Figure 4 shows the initial condition. In Figure 5, the phytoplankton increases as the nutrient decreases, and all patches are moved by the velocity.

In Figure 6, the zooplankton increases as the phytoplankton decreases, because the zooplankton grazes on the phytoplankton. The nutrient is also consumed by the phytoplankton. No patch moves outside through the boundary. In Figures 5 and 6, all patches are moved by velocity and diffusion. A small amount of nutrient remains, mainly due to the extinction of the zooplankton. In Figure 7, the equilibrium solution is obtained. The equilibrium solution means the balanced values of the ecological model.


Figure 4. t = 0.00 (normalized fraction of Nt vs. scaled distance; phytoplankton, zooplankton, nutrient)

Figure 5. t = 3.00 (normalized fraction of Nt vs. scaled distance; phytoplankton, zooplankton, nutrient)

Figure 6. t = 9.00 (normalized fraction of Nt vs. scaled distance; phytoplankton, zooplankton, nutrient)


Figure 7. t = 50.00 (normalized fraction of Nt vs. scaled distance; phytoplankton, zooplankton, nutrient)

    7.1.2. Case 2

    Figure 8. Used Mesh

The mesh in Figure 8 is used in Case 2. The total number of elements is 3600 and that of nodes is 1861.

Figure 9 shows the initial condition of the nutrient; phytoplankton and zooplankton are constant. From Figure 10, it is confirmed that the nutrient diffuses and is transported by the flow velocity.

    7.2. Analysis in Lake Kasumigaura

In this research, Lake Kasumigaura is chosen as the analysis field. This lake has an area of 220 square kilometres, which makes it the second largest in Japan. In this lake there have been water quality problems and the damage has been very serious. One of the well-known problems is the outbreak of Microcystis aeruginosa. It is a kind of phytoplankton, like a red tide, and the eutrophication has caused water quality problems such as Microcystis blooms. The location of Lake Kasumigaura is shown in Figure 11, and the mesh used is shown in Figure 12. The total number of elements is 1409 and that of nodes is 804.


Figure 9. t = 0.00 (nutrient concentration N, contour values 0.05-0.95)

The parameter values of the system are given in Table 2.

Table 2. Biological Parameters

    Definition                               Value
    Michaelis constant                       0.1
    Zooplankton maximum grazing threshold    1.2
    Zooplankton egestion coefficient         2.31
    Ivlev constant                           1.0
    Phytoplankton loss coefficient (p)       0.083
    Zooplankton grazing threshold            0.15

    7.2.1. Initial Distribution

Figures 13 and 14 show the results of the modal analysis. The convergence of the performance function J is shown in Figure 13. Figure 14 gives the control value c_i of each spectrum; from it, it can be confirmed which modes have a significant influence on the components in Lake Kasumigaura.

Figure 15 represents the initial distribution of phytoplankton. The plankton patchiness is changed by predation and the flow velocity in Lake Kasumigaura. In this study, the values in Figure 16 are regarded as the equilibrium solution, which means the balanced values of the ecological system.

    7.2.2. Stability Analysis

In this study, the outbreak of plankton is related to the stability problem through the eigenvalues: if the maximum eigenvalue is negative, the system is stable; if it is positive, the system is considered to be unstable. Figure 17 shows the stability of the system. Changing the parameter \alpha, the system shifts from stable to unstable.


Figure 10. t = 10.00 (nutrient concentration N, contour values 0.001-0.012)

    Figure 11. Place of the lake Kasumigaura


    Figure 12. Used mesh

Figure 13. Performance function of phytoplankton (J vs. iteration)

Figure 14. Control quantity of phytoplankton (control value vs. mode number)


Figure 15. t = 0.00 (phytoplankton P, contour values 0.18-3.22)

Figure 16. t = 6.00 (phytoplankton P, contour values 0.18-3.22)


Figure 17. Stability of the system (maximum eigenvalue vs. parameter \alpha)

Figure 18. Performance function (J vs. iteration)

Figure 19. Maximum eigenvalue (vs. iteration)


Figure 20. Parameter \alpha (vs. iteration)

The parameter value at which the stability changes is estimated to be about 9.45. From this result, when the parameter \alpha is small, abnormal multiplication occurs; a small \alpha means that the phytoplankton absorbs a large amount of nutrient. Figure 18 shows the convergence of the performance function, Figure 19 shows that of the real part of the maximum eigenvalue, and Figure 20 shows the convergence of \alpha. In this case, the convergence value of the performance function J is 1.0 \times 10^{-5}. From the result, when the maximum eigenvalue is -0.32, the parameter \alpha is 1.1. Therefore, 1.1 is the value of the parameter which makes the system stable.

    8. Conclusion

In this study, the two-dimensional spatial distribution is made by the superposition of the spectra of mode numbers 1 to 30. From Figure 14, mode No. 1 has a significant influence on the spatial distribution. The relation of the food chain is represented and applied to the phenomena in Lake Kasumigaura. The nonlinear shallow water equations are employed to represent the influence of the flow velocity. The computed result is used as the initial distribution for the stability analysis.

The main purpose of this study is to judge the stability of the system from the real part of the maximum eigenvalue. Figure 17 is the result of the forward analysis, and Figures 18, 19 and 20 are the results of the backward analysis of the parameter \alpha. When the parameter \alpha is small, the phytoplankton absorbs a large amount of nutrient. When the parameter \alpha is 1.1, the system is stable.

    References

[1] J. S. Wroblewski and J. J. O'Brien (1976), A Spatial Model of Phytoplankton Patchiness, Marine Biology 35, 161-172.

[2] J. S. Wroblewski (1977), A model of phytoplankton plume formation during variable Oregon upwelling, Journal of Marine Research, 358-394.

[3] N. F. Britton (1999), Reaction-diffusion equations and their application to biology, Academic Press, pp. 109-137.

[4] Peter J. S. Franks and Changsheng Chen (1996), Plankton production in tidal fronts: A model of Georges Bank in summer, Journal of Marine Research 54, pp. 631-651.


[5] G. Ono and M. Kawahara (2004), Stability Analysis of Multiplication of Plankton Using Parameter Identification Technique, Int. J. Num. Meth. Fluids, Vol. 44, Part 1, p. 71.

[6] Y. Ding and M. Kawahara (1998), Bifurcation Analysis of Brown Tide in Tidal Flow Using Finite Element Method, Oceanographic Literature Review, p. 502.

[7] J. Matsumoto, T. Umetsu and M. Kawahara (1998), Shallow Water and Sediment Transport Analysis by Implicit FEM, Journal of Applied Mechanics, Vol. 3, 263-274.

[8] N. Nayar and J. M. Ortega (1993), Computation of Selected Eigenvalues of Generalized Eigenvalue Problems, Journal of Computational Physics 108, pp. 8-14.

[9] Hughes, T. J. R., Franca, L. P. and Balestra, M. (1986), A New Finite Element Formulation for Computational Fluid Dynamics: V, Comput. Methods Appl. Mech. Engrg. 59, 85-99.


In: Techniques of Scientific Computing for Energy ...
Editors: F. Magoules and R. Benelmir, pp. 21-35
ISBN 1-60021-921-7
© 2007 Nova Science Publishers, Inc.

    ACHIEVING HIGH-PERFORMANCE COMPUTING

    IN GEOMECHANICS BY DEVELOPMENT

    OF PARALLEL FINITE ELEMENT PACKAGE

    Felicja Okulicka - Duzewska

    Faculty of Mathematics and Information Science

    Warsaw University of Technology, Pl. Politechniki 1,

    00-661 Warsaw, POLAND

    Abstract

The parallelization of the finite element method (FEM) algorithm is considered. The parallel versions of the FEM package are developed on the basis of the sequential one. The elasto-plastic behaviour of geotechnical constructions can be modelled, calculated and analyzed by the package. Cray Fortran compiler directives are applied for the parallelization of the source code on shared memory machines, and the MPI library is used in the distributed environment. As engineering examples of geotechnical problems, the rising of an embankment, the building of a dam and the settlement of the underground are modelled, calculated and analyzed.

    1. Introduction

The finite element method (FEM) is the most general and powerful tool for solving engineering problems. The FEM algorithm, well known and widely used in practice, is very appropriate for parallelization. Thanks to the parallelization of the code, high performance calculations can be done, very large structures can be modelled and considerable speed-up is reached. For smaller engineering problems it would be desirable to have the result as fast as the drawing of a mouse cursor at a new pixel. Considering finite element modelling, one of the essential problems which we face in software development is the parallelization of existing sequential codes, which very often have been developed over years. The question arises whether it is worth doing and how great an effort is required. Advantages and difficulties of the parallelization of a finite element method package are presented and discussed in the paper. The FEM package Hydro-Geo, oriented at hydro- and geotechnical problems, is presented in Section 2. The program was developed at Warsaw University of Technology and then extended to allow parallel calculations. The sequential version of the program is the starting point for developing the parallel versions step by step. The package is composed of three main programs: the preprocessor for mesh generation and data preparation, the processor for the main mechanical calculation, and the graphical postprocessor. In the paper two parallel versions of the processor are compared: the first working on shared memory machines


and the second in a distributed environment. In Section 3 the numerical procedure implemented in the package is recalled after [6, 7]. Section 4 contains the algorithm of the processor for shared memory machines. In Section 5 the message passing method implemented in the distributed Hydro-Geo is described. The computational results are included in Section 6. The version for shared memory machines was implemented and tested thanks to access to the Sun 10000E supercomputer owned by COI PW (Computing Center of Warsaw University of Technology). The distributed version of the package was implemented and tested thanks to the support of the European Community - Access to Research Infrastructure action of the Improving Human Potential programme (contract No. HPRi-1999-CT00026).

    2. Finite Element Method Package Hydro-Geo

The finite element package HYDRO-GEO [8] is oriented at hydro- and geotechnical problems. The structure of the package is shown in Figure 1.

[Figure 1 diagram: management shell; preprocessor; processor; postprocessor; soil-water data structure; mesh generation; AutoCAD interface; graphical presentation; coupled results selection]

Figure 1. Relation between various programs in the finite element package HYDRO-GEO.

The finite element method package algorithm can be divided into three separate parts:

1. the preprocessor, for mesh generation, mesh optimization and data input;
2. the processor, for stiffness matrix calculation, solving the global set of equations and the analysis of strains and stresses;
3. the postprocessor, for graphical and numerical presentation of the computed results.

The above tasks are independent and they are realized by separate modules of the package. In Hydro-Geo these programs can run under the management shell or can be executed on different machines as well, where the data between the programs can be sent over the network.


The data transfer between modules is done through text files. In the package the format of the data transfer files between modules is fixed. This allows parts of the package to be exchanged, which is very useful when we consider the data preparation and the presentation of the results. In fact, a few preprocessors exist in the package; they are written in Fortran and in C. The main part, the processor, was developed over years by a group of people. It is written in Fortran.

The most time-consuming part of the modelling is the processor, in which the numerical finite element algorithms are implemented, and it is really worth parallelizing. In the processor the coefficient matrix of the set of equations is calculated for each stage of the construction building and each time increment. Parallel calculation speeds up the process, and putting the data into distributed memories increases the number of elements that can be processed. The set of equations is solved several times; a parallel solver can also speed up the calculation process radically. The structure of the package gives the opportunity to execute the calculation on a parallel machine without changing the format of the data input and output. The parallelization of the processor presented in the paper is done in such a way that the procedures responsible for the modelling are not changed, and the program can be developed further by others without problems.

    3. Numerical Procedure

The description of different mechanical phenomena, such as flow, mechanical behaviour and thermal effects, leads to coupled systems of differential equations. To solve the resulting initial boundary value problems, finite element methods can be used. In such situations, where a few phenomena are taken into account, the final form of the global equation set takes a block form. In general, the times when important phenomena were considered separately belong to the past; now we want to model very complicated and complex effects. For example, if we consider a car engine, we have to solve the mechanical and thermal differential equations as a coupled system. Coupled systems appear in modern mechanics very often: groundwater flow and mechanical behaviour (deformations and stresses), transport of pollutants, thermal flow, etc. Coupled problems are much more complicated than each effect considered separately, but solving them gives a very realistic behaviour of complex problems [12]. To solve the systems of linear equations obtained during the modelling of coupled systems, standard solvers, which are available from the Internet, can be used. Here we consider direct solvers; the ScaLAPACK [24] library for distributed blocked matrix equations is very appropriate for the problem. Iterative solvers need preconditioning, because even for relatively small problems the convergence of the algorithm is difficult to reach.

In Hydro-Geo the virtual work principle and the continuity equation with boundary conditions are the starting points for the numerical formulation. The finite element method is applied to solve the initial boundary value problems. Several procedures stemming from elasto-plastic modelling can be coupled with the time stepping algorithm during the consolidation process. The elasto-plastic soil behaviour is modelled by means of the visco-plastic theory (Perzyna, 1966). The finite element formulation for the elasto-plastic consolidation combines overlapping numerical processes. The elasto pseudo-viscoplastic algorithm for the numerical modelling of elasto-plastic behaviour is used after Zienkiewicz and Cormeau (1974).


The stability of the time marching scheme was proved by Cormeau (1975). The pseudo-viscous algorithm developed in the finite element computer code Hydro-Geo has been successfully applied to solve a number of boundary value problems, Dluzewski (1993). The visco-plastic procedure was extended to cover geometrically non-linear problems by Kanchi et al. (1978) and also developed for large strains in consolidation, Dluzewski (1997) [6]. The pseudo-viscous procedure is adopted herein for modelling elasto-plastic behaviour in consolidation. In the procedure two times appear: the first is the real time of consolidation, and the second is only a parameter of the pseudo-relaxation process. The global set of equations for the consolidation process is derived as follows:

\begin{bmatrix} K_T & L \\ L^T & -(S + \Delta t\, H_i) \end{bmatrix}
\begin{Bmatrix} \Delta u_i \\ \Delta p_i \end{Bmatrix}
=
\begin{bmatrix} 0 & 0 \\ 0 & \Delta t\, H_i \end{bmatrix}
\begin{Bmatrix} u_i \\ p_i \end{Bmatrix}
+
\begin{Bmatrix} \Delta F_i \\ \Delta q \end{Bmatrix} \qquad (1)

where K_T is the tangent stiffness array, taking large strain effects into account, L is the coupling array, S is the array responsible for the compressibility of the fluid, H is the flow array, \Delta u_i are the nodal displacement increments, \Delta p_i are the increments of the nodal excesses of the pore pressure, and \Delta F_i is the load nodal vector defined below:

\Delta F_i = \Delta F_L + \Delta R^i_I + \Delta R^i_{II}, \qquad (2)

where \Delta F_L is the load increment, \Delta R^i_I is the vector of nodal forces due to the pseudo-visco iteration, and \Delta R^i_{II} is the unbalanced nodal vector due to the geometrical nonlinearity. \Delta R^i_I takes the following form:

\Delta R^i_I = \int_{{}^{t+\Delta t}V} {}^{(i-1)}B^T\, D\,\Delta\varepsilon_{vp}^{\,i}\; dv, \qquad (3)

and is defined in the current configuration of the body. The subscripts indicate the configuration of the body, and the superscripts indicate the time at which the value is defined (notation after Bathe (1982)). \Delta R^i_I stands for the nodal vector which results from the relaxation of the stresses. For each time step an iterative procedure is engaged to solve the materially non-linear problem; the index i indicates the iteration step. Both local and global criteria for terminating the iterative process are used. The iterations are continued until the calculated stresses are acceptably close to the yield surface, F \le \mathrm{Tolerance} at all checked points, where F is the value of the yield function. At the same time, the global criterion for this procedure is defined in the final configuration of the body. The global criterion takes its roots from the conjugated variables in the virtual work principle, where the Cauchy stress tensor is coupled with the linear part of the Almansi strain tensor. For the two-phase medium, the unbalanced nodal vector \Delta R^i_{II} is calculated at every iterative pseudo-time step.

R^{k-1} = \int_{{}^{t+\Delta t}V} N^T\, {}^{t+\Delta t}f\; dV + \int_{{}^{t+\Delta t}S} N^T\, {}^{t+\Delta t}t\; dS - \int_{{}^{t+\Delta t}V} {}^{(k-1)}B^T \left( {}^{t+\Delta t}\sigma^{(k-1)} + m\, {}^{t+\Delta t}p^{(k-1)} \right) dV \qquad (4)

The square norm of the unbalanced nodal forces is used as the global criterion of equilibrium. The iterative process is continued until both criteria are fulfilled.
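As an illustration of the block structure of eq. (1), one load increment of the coupled system can be assembled and solved as follows. This is a sketch with hypothetical small matrices standing in for K_T, L, S and H, not the Hydro-Geo code (which is written in Fortran).

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def consolidation_increment(KT, L, S, H, p, dF, dq, dt):
    # assemble the 2x2 block matrix of eq. (1) and the right-hand side
    # [dF; dt*H*p + dq], then solve for the increments (du, dp)
    A = sp.bmat([[KT, L],
                 [L.T, -(S + dt * H)]], format="csc")
    rhs = np.concatenate([dF, dt * (H @ p) + dq])
    dx = spla.spsolve(A, rhs)
    n = KT.shape[0]
    return dx[:n], dx[n:]

# hypothetical small arrays standing in for the finite element matrices
n_u, n_p = 6, 3
KT = sp.identity(n_u) * 10.0
L = sp.random(n_u, n_p, density=0.5, random_state=0)
S = sp.identity(n_p) * 0.01
H = sp.identity(n_p) * 0.1
du, dp = consolidation_increment(KT, L, S, H, p=np.ones(n_p),
                                 dF=np.ones(n_u), dq=np.zeros(n_p), dt=1.0)
print(du, dp)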


    4. Parallel Hydro-Geo For Shared Memory Supercomputers

In programs for shared memory machines all variables are visible to all threads created during the program execution. The most popular and simple way of parallelization is the division of loops into threads which run on separate processors. We can make the compiler do it automatically by adding the autoparallel directive during the compilation process; no changes need to be made in the source code. Explicit parallelization is achieved by putting directives into the source code before the parts that can be executed concurrently. The number of processors should be known during compilation, because it determines the number of threads. The parallel versions of the FEM package are built on the basis of the sequential one; the structure of the package is not changed. In the first step, all auxiliary files used for keeping data during the calculation process are removed and all data are put into memory, so the number of read/write operations on the disk is reduced. The order in which single elements are calculated becomes unimportant. That allows us to parallelize the main loops which calculate the local values and the local matrices for single elements. The Hydro-Geo processor algorithm for shared memory machines, with compiler parallelizing directives, can be written as follows [14]:

Start
  Data reading, initial computations
  For each stage of the construction and each increment of load do
    Read the data specific for the stage
    Parallelize the following loop
    For each element do
      Calculate the local stiffness matrix, coupling matrix, flow matrix
      Calculate the initial stresses
    End do
    Calculate the global set of the fully coupled system
    First part of the solver (forward substitution) - parallel calculation
    For each Gauss point do
      Second part of the solver (backward substitution) - parallel calculation
      Parallelize the following loop
      For each element do
        Calculation of strains and stresses
      End do
      Print the result in the disk file
    End do
  End do
Stop

The loops calculating the values for each element (the local stiffness matrix, coupling matrix, flow matrix, stresses, strains) are divided into threads that are executed concurrently.


It is a kind of domain decomposition, done by splitting the set of elements into subsets. The calculated variables are visible to the rest of the program commands and procedures. Such an approach needs a large amount of memory for big problems. The speed-up is reached not only due to the parallelism but also due to the reduction of disk operations.
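As a language-neutral illustration of this element-loop parallelism (the package itself uses Cray Fortran compiler directives, so the following Python sketch with a hypothetical per-element routine is only an analogy):

import numpy as np
from concurrent.futures import ProcessPoolExecutor

def local_element_work(element_id):
    # hypothetical stand-in for the per-element work: in Hydro-Geo this would be
    # the local stiffness, coupling and flow matrices of a single element
    rng = np.random.default_rng(element_id)
    B = rng.standard_normal((3, 6))
    return element_id, B.T @ B

def parallel_element_loop(element_ids, n_workers=4):
    # split the element set into chunks processed concurrently; the order in
    # which elements are computed does not matter, as noted in the text
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(local_element_work, element_ids, chunksize=64))

if __name__ == "__main__":
    local_matrices = parallel_element_loop(range(1000))
    print(len(local_matrices), local_matrices[0].shape)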

    5. Distributed Calculation

    5.1. Distributed FEM Package

The Message Passing Interface (MPI) standard is used as the tool for the parallelization [13, 25]. For distributed memory machines, a number of processes are created during the execution of the parallel program. The processes are distinguished by their own unique identifiers, called ranks, which are integer numbers, and they have their own private variables. Each process has to have copies of all variables needed for its calculations; all results calculated by one process and needed by another must be sent or broadcast. In our approach only one process, called the master, reads the data. The others obtain the data by a broadcast from the master, make the calculations using the received data and send the results needed for solving the global set of equations back to the master. The distributed version is based on the program working in memory: no auxiliary files for keeping data during the calculation process are created.

The parallelization is similar to the one described in the previous section [14, 17]. The calculation is done concurrently in such a way that the loops which calculate the local values for each element are divided between the processes. Each process calculates the local values connected with single elements (the local stiffness matrix, coupling matrix, flow matrix, initial stresses) for its own private subset of elements. The subsets are determined at the beginning and remain fixed during the whole calculation. All processes know which subset of elements belongs to each one, and each process keeps the data connected with the elements of its private subset only. When the local matrices are calculated, they are sent to the master, which calculates the global matrix of the fully coupled system and solves the set of linear equations. The result is broadcast to all processes to allow them to continue the calculation. For the master process, which is the process with rank 0, the algorithm can be written as follows:

Start
  Data reading, initial computations
  Sharing the computation -
    determine the subsets of elements calculated by separate processes
  Broadcast the read data to the other processes
  For each stage of the construction and each increment of load do
    For my subset of the set of elements do
      Calculate the local stiffness matrix, coupling matrix, flow matrix
      Calculate the initial stresses
    End do
    Synchronization point 1
    Gather the local stiffness matrices from all processes
    Calculate the global set of the fully coupled system
    For each Gauss point do
      Solve the set of equations
      Synchronization point 2
      Broadcast the solution to all processes
      For my subset of the set of elements do
        Calculation of strains and stresses
      End do
    End do
    Synchronization point 3
    Receive the results from all processes
    Print the calculated values
  End do
Stop

Processes with ranks greater than 0 receive the read data, initialize the data connected with their subsets of elements, calculate the local matrices connected with their private subsets of elements and send them to the master. The algorithm for the slave processes is as follows:

Start
  Receive the data from process 0
  For each stage of the construction and each increment of load do
    For my subset of the set of elements do
      Calculate the local stiffness matrix, coupling matrix, flow matrix
      Calculate the initial stresses
    End do
    Synchronization point 1
    Send the local matrices to process number 0
    For each Gauss point do
      Synchronization point 2
      Receive the solution from process number 0
      For my subset of the set of elements do
        Calculation of strains and stresses
      End do
    End do
    Synchronization point 3
    Send the results to process 0
  End do
Stop

The synchronization points are added to ensure proper communication and exchange of data.
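The master/slave exchange described above can be sketched with the mpi4py library as follows. This is an illustrative Python analogue under stated assumptions - the element work and the "global solve" are hypothetical stand-ins; the actual package uses MPI from Fortran.

# run with, e.g.: mpiexec -n 4 python sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# only the master reads the data; the others obtain it by broadcast
n_elements = comm.bcast(1000 if rank == 0 else None, root=0)

# fixed private subset of elements for each process
my_elements = range(rank, n_elements, size)

# per-element work: hypothetical stand-in for the local matrices
local_sum = np.zeros((6, 6))
for e in my_elements:
    rng = np.random.default_rng(e)
    B = rng.standard_normal((3, 6))
    local_sum += B.T @ B

# synchronization point 1: gather the local contributions on the master
parts = comm.gather(local_sum, root=0)

# the master assembles and "solves", then broadcasts the solution
# (synchronization point 2)
solution = None
if rank == 0:
    K = sum(parts) + np.eye(6)
    solution = np.linalg.solve(K, np.ones(6))
solution = comm.bcast(solution, root=0)

if rank == 0:
    print("solution norm:", np.linalg.norm(solution))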


Table 1. Timetable of the embankment rising

    Stage   Description            Time increment (days)   Total time (days)
    0       Initial stresses       0                       0
    I       Rising of stage I      12                      12
            Consolidation          189                     201
    II      Rising of stage II     12                      213
            Consolidation          239                     452
    III     Rising of stage III    12                      464
            Consolidation          590                     1054

5.2. Parallel Numerical Algorithm for Solving the Linear Equations for the Consolidation Problem

The block formulation of the coupled problem makes the application of block methods for solving the sets of linear equations natural. The large matrices can be split into blocks and placed in the separate memories of a network of computers. Parallelism is achieved through matrix operations on the separate blocks. The standard numerical algorithms have to be rewritten in a block version, or standard libraries can be used [3, 4, 5]. For big problems the matrix of the system of linear equations is stored in distributed memory and iterative methods are used to obtain the solution of the set of equations. For the consolidation problem the coefficient matrix is ill-conditioned [22], so preconditioners are needed to improve the convergence even for problems that are not big [1, 2, 9, 10, 11, 12, 19, 21, 22, 23].
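To fix ideas, the sketch below (in C++) shows a Jacobi, i.e. diagonally, preconditioned conjugate gradient iteration on a small dense symmetric positive definite system. It is only an illustration of how a preconditioner enters a Krylov iteration, not the solver used in the package; in the block-distributed setting the matrix-vector product and the dot products additionally require communication between the processes.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Jacobi-preconditioned conjugate gradient for a dense SPD matrix A (n x n).
    // Purely illustrative: small, sequential, dense storage, x starts at zero.
    void pcg(const std::vector<double> &A, const std::vector<double> &b,
             std::vector<double> &x, int n, int max_it, double tol)
    {
      std::vector<double> r(b), z(n), p(n), q(n);       // r = b - A*x with x = 0
      for (int i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];  // z = M^{-1} r, M = diag(A)
      p = z;
      double rz = 0.0;
      for (int i = 0; i < n; ++i) rz += r[i] * z[i];
      for (int it = 0; it < max_it; ++it)
      {
        for (int i = 0; i < n; ++i)                     // q = A * p
        {
          q[i] = 0.0;
          for (int j = 0; j < n; ++j) q[i] += A[i * n + j] * p[j];
        }
        double pq = 0.0;
        for (int i = 0; i < n; ++i) pq += p[i] * q[i];
        double alpha = rz / pq;
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i)
        {
          x[i] += alpha * p[i];
          r[i] -= alpha * q[i];
          rnorm += r[i] * r[i];
        }
        if (std::sqrt(rnorm) < tol) break;
        double rz_new = 0.0;
        for (int i = 0; i < n; ++i)
        {
          z[i] = r[i] / A[i * n + i];
          rz_new += r[i] * z[i];
        }
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
      }
    }

    int main()
    {
      // 2x2 SPD example: A = [4 1; 1 3], b = (1, 2)
      std::vector<double> A = {4.0, 1.0, 1.0, 3.0}, b = {1.0, 2.0}, x(2, 0.0);
      pcg(A, b, x, 2, 100, 1e-10);
      std::printf("x = (%g, %g)\n", x[0], x[1]);
      return 0;
    }

In practice the preconditioner would be one of those discussed in the references above, and the matrix would be kept in a sparse, block-distributed format.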

    6. Engineering Problems

    6.1. Embankment Rising

To study the influence of the large deformation description, the rising of an embankment on peat is modelled [18]. The Coulomb-Mohr yield criterion and a non-associated flow rule with a dilatancy angle equal to zero are used. A permeable boundary below the peat layer is assumed. The layer is 10 m thick and the embankment slope is 1:2. The embankment is built in four stages. At the beginning the initial stresses are introduced into the subsoil. The first stage of the embankment is raised up to a height of 2.0 m, the second up to 4.0 m, the third up to 6.0 m and the fourth up to 7.0 m. The timetable of the embankment rising is given in Table 1. The mesh contains 1879 nodes. An example of the calculated results, the excess pore pressure, is presented in Figure 4.

The mesh is shown in Figure 2. Six-noded isoparametric elements are used. The non-consistent formulation of the consolidation is applied (here, pore pressures are calculated at all nodes). Three different times are compared: the real wall-clock time, the processor calculation time (the user time from the point of view of the system) and the system time, i.e. the time spent on synchronization and on the management of disk and memory access. The calculations were made for the elastic model on a Sun 6500. The speedup reached is presented in Figure 3. The speedup depends on the size of the problem; big models were not calculated on the shared memory machine, which is why the results are not spectacular.

Figure 2. Embankment rising - finite element mesh.

    6.2. Besko Dam

The Besko dam has been raised on the Carpathian flysch [15, 16]. The height of the dam is about 40 m. Inclined schist layers are located in the subsoil; the parallel schist layers with various material parameters create the specific foundation conditions typical for Polish dams in the south. The dam is built from concrete. A clay-concrete screen of 0.8 m thickness is constructed; the height of the screen is about 25 m.

Figure 3. Speedup of user time reached on the Sun 6500.

The numerical modelling is done in three stages. In the first stage the initial stresses are introduced into the subsoil. In the second one, the heavy concrete dam is built; special teeth are formed between the subsoil and the dam body for better interaction between the dam and the subsoil, and the rising of the dam is modelled by adding elements. In the third stage the loading caused by filling the reservoir is applied.

    Figure 4. Embankment rising - excess of the pore pressure.

    Figure 5. Besko dam - calculated stresses.

The performance of the different parallel versions of the package is compared with respect to user time. The real time strongly depends on the number of users concurrently working on the supercomputer. The system time is connected mostly with the synchronization of threads and with the number of input/output commands, which is the same in all parallel versions. The average speedup of the sequential version working in memory, compared with the sequential version of the HYDRO-GEO processor working with auxiliary files, is about 10. In the case of automatic parallelization the speedup for user time, compared with the sequential version working in memory, is about 2 and does not change much when the number of processors is changed.

6.3. The Settlements of the Warsaw Underground

The settlements of the underground structures of a Warsaw metro station are analyzed. The calculations are performed in six stages:

1 - introduction of the initial stresses in the subsoil, introduction of the diaphragm wall, constructing of the ground ceiling,
2 - soil excavation,
3 - constructing of the foundation plate,
4 - adding columns,
5 - loading from the station trains and traffic,
6 - extra loading from a 10-storey building.

    Figure 6. Besko dam - isolines of displacements.

In each stage a new arrangement of the global equilibrium system is set up. Some additional boundary conditions (supporting the internal walls) change the global numbering of the equations. The problem is nonlinear due to the elasto-plastic soil model based on the Coulomb-Mohr yield criterion and a non-associated flow rule [2].

The chosen results are shown in Figures 7, 8 and 9.

Three different times are compared: the real wall-clock time, the processor calculation time (the user time from the point of view of the system) and the system time, i.e. the time spent on synchronization and on the management of disk and memory access. To compare the speedup, the calculations are done for the elastic model. The maximum speedup for user time is bounded by Amdahl's law; the bound is determined by the sequential part of the computation and does not depend on the number of processors. The parallel versions are compared with the sequential version working in memory because they are based on it.
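This bound can be stated explicitly. If f denotes the fraction of the computation that has to remain sequential, Amdahl's law gives the speedup on p processors as

    S(p) = 1 / ( f + (1 - f) / p ),   hence S(p) < 1/f for every p.

For example, with f = 1/2 the speedup can never exceed 2, however many processors are used, which is consistent with the observation below that about half of the processor's calculation remains sequential.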

The average speedup is as follows:

1. for the version of the processor obtained by compilation with the option autoparallel, the speedup is about 2;
2. for the version obtained by compilation with the option parallel (i.e. explicit and auto), the speedup is about 3.5;
3. for the version obtained by compilation with the option parallel (i.e. explicit and auto) and with the parallel band matrix solver, the speedup is about 5.

In our case about half of the calculation of the processor has to be done sequentially. The speedup reached by the parallel package, compared with the sequential version, is about 2 for different problems. A comparison of the calculation times of both versions is possible only for small problems. Bigger problems cannot be calculated sequentially in memory only; auxiliary files have to be used to keep data between the called procedures, because the memory capacity of a single machine is too small. For big problems, calculation without using the disk for writing and reading intermediate results is possible only when the data is divided between the memories of different machines. Big problems can be calculated quickly, keeping all data in memory, only in parallel.

Figure 7. Warsaw underground - the finite element mesh; 8-noded isoparametric elements are used.

Figure 8. Warsaw underground - the displacements in the form of contour lines.

Figure 9. Warsaw underground - the displacement of the station with the subsoil.

    Conclusions

The shared memory version in practice does not work for problems with a huge number of elements; such an approach leads to difficulties with shared memory access via the bus. It can be used for programs with a large number of iterations over the Gauss points. This version is easier to implement because there is no exchange of data between processes and the compiler ensures the synchronization. Considering the parallelization of the finite element source code, the first steps are obvious. First, the element procedures (calculating the stiffness matrices and next calculating the strains and stresses) are parallelized. In the second step the frontal procedure for solving the set of linear equations is replaced by a band matrix solver. The distributed version is not limited in the number of elements of the calculated problems and can be used for really huge models. For small models the speedup is small, or the parallel version can even run longer than the sequential one because of the communication and synchronization procedures.

    References

[1] O. Axelsson, Iterative Solution Methods, Cambridge University Press, 1994.
[2] O. Axelsson, V.A. Barker, Finite Element Solution of Boundary Value Problems, Academic Press, Inc., 1984.
[3] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, 1994.
[4] J.J. Dongarra, Performance of Various Computers Using Standard Linear Equations Software, 1999.
[5] J. Demmel, Applied Numerical Linear Algebra, 1997.
[6] J.M. Duzewski, Non-linear consolidation in finite element modelling, Proceedings of the Ninth International Conference on Computer Methods and Advances in Geomechanics, Wuhan, China, November 1997.
[7] J.M. Duzewski, Nonlinear problems during consolidation process, in: Advanced Numerical Applications and Plasticity in Geomechanics, D.V. Griffiths and G. Gioda (eds.), Springer Verlag, Lecture Notes in Computer Science, 2001.
[8] J.M. Duzewski, HYDRO-GEO - finite element package for geotechnics, hydrotechnics and environmental engineering, Warsaw, 1997 (in Polish).
[9] V. Eijkhout, T. Chan, ParPre: a parallel preconditioners package, reference manual for version 2.0.21, revision 1.
[10] M.J. Grote, T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM Journal on Scientific Computing, 18 (1997), pp. 838-853.
[11] P. Krzyzanowski, On block preconditioners for non-symmetric saddle point problems, SIAM Journal on Scientific Computing, Vol. 23, No. 1, 2001, pp. 157-169.
[12] R.W. Lewis, B.A. Schrefler, The Finite Element Method in the Static and Dynamic Deformation and Consolidation of Porous Media, John Wiley & Sons, 1998.
[13] MPI: A Message-Passing Interface Standard, June 1995.
[14] F. Okulicka, High-performance computing in geomechanics by a parallel finite element approach, Applied Parallel Computing, 5th International Workshop, PARA 2000, Bergen, Norway, June 2000, Lecture Notes in Computer Science 1947, pp. 391-398.
[15] F. Okulicka, Block parallel solvers for coupled geotechnical problems, 10th International Conference on Computer Methods and Advances in Geomechanics, January 7-12, 2001, Tucson, Arizona, USA, Vol. 1, A.A. Balkema, Rotterdam, Brookfield, 2001, pp. 861-866.
[16] F. Okulicka, Parallel calculations of geotechnical problems by means of the parallel finite element code HYDRO-GEO, Proceedings of the IASTED International Symposia Applied Informatics, Innsbruck, Austria, February 19-22, 2001, pp. 440-443.
[17] F. Okulicka, Achieving high performance calculation by the parallelization of the code, The Eighth International Conference on Advanced Computer Systems ACS 2001, October 17-19, 2001, Mielno, Poland, pp. 259-268.
[18] F. Okulicka, Parallelization of a finite element package by the MPI library, International Conference of MPI/PVM Users, MPI/PVM 01, Santorini, 2001, Lecture Notes in Computer Science 2131, pp. 425-436.
[19] P.S. Pacheco, A User's Guide to MPI, 1998.
[20] S. Parter, Preconditioning Legendre spectral collocation methods for elliptic problems I: Finite difference operators, SIAM Journal on Numerical Analysis, Vol. 39, No. 1, 2001, pp. 320-347.
[21] S. Parter, Preconditioning Legendre spectral collocation methods for elliptic problems II: Finite element operators, SIAM Journal on Numerical Analysis, Vol. 39, No. 1, 2001, pp. 348-362.
[22] K.K. Phoon, K.C. Toh, S.H. Chan, F.H. Lee, An efficient diagonal preconditioner for finite element solution of Biot's consolidation equations, International Journal for Numerical Methods in Engineering (to appear).
[23] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, 2003.
[24] http://www.netlib.org/scalapack/
[25] http://www-unix.mcs.anl.gov/mpi


    LARGE-SCALE DATA VISUALIZATION USING

    MULTI-LANGUAGE PROGRAMMING APPLIED

    TO ENVIRONMENTAL PROBLEMS

Frederic Magoules and Roman Putanowicz

    Institut Elie Cartan de Nancy, Universite Henri

    Poincare, BP 239, 54506 Vandoeuvre-les-Nancy Cedex, France

    Institute of Computer Methods in Civil Engineering

    (L5), Cracow University of Technology, Cracow, Poland

    Abstract

Environmental problems lead to large and complex data sets, which are difficult to analyze. Scientific visualization, which transforms raw data into images, has been recognized as an effective way to understand such data. In practice, most existing scientific software packages have their own data formats, and special visualization interfaces or independent software are used to display these data. In this paper a technique for the visualization of large-scale data using multi-language programming is investigated. Adding Tcl, C++ and Fortran components to the VTK library makes it possible to build efficient and robust applications. This article presents in detail how to build such applications; it provides an elegant solution to the problem of accessing VTK objects from different languages and shows how to mix Tcl, C++ and Fortran components in one single application.

Keywords: image processing, visualization, graphics, scripting language, compiled language, multi-language programming

    1. Introduction

The amount of data collected and stored electronically is doubling every three years. Even if the development of standard data interface protocols solves the data access problems, the analysis of this information becomes an emerging problem. Visualization technology provides effective data presentation. Unfortunately, we are now reaching the limits of interactive visualization of large-scale data sets, since the amount of data to be analyzed is overwhelming. Despite the large number of scientific visualization software packages and the multiple options available in them, researchers have particular requirements, and the development of home-made visualization software is very common.



The Visualization ToolKit (VTK) is a software system for computer graphics, visualization and image processing. The VTK library is written in C++; however, it provides interfaces to the scripting languages Tcl, Python and Java. Though it is possible to write a whole VTK application in a scripting language like Tcl, it is more suitable, for efficiency reasons, to implement some functionality in a compiled language like C/C++. This is especially the case when working with large data sets arising from environmental analysis, such as noise studies. For example, when the noise level distribution generated by cars or airplanes over a city is analyzed, large data sets are considered, since huge amounts of data are needed to model the whole city. An example of such a model is illustrated in Figures 1, 2 and 3. These figures have been obtained with the technique presented in this paper.

    Figure 1. Example of a city.

    Figure 2. Example of a city (bis).

This article presents in thorough detail how to access VTK objects from different languages and how to mix Tcl and C++ components in one application. Several source code examples are shown in order to help the reader write a complete application on his own. The paper is organized as follows. In Section 2, the concept of programmable filters in VTK is recalled. Then in Section 3, the way to access VTK object data is detailed. Section 4 presents programmable filters written in C++, and Section 5 presents programmable filters written in Tcl. In Section 6, dynamically linked functions used as filter methods in Tcl are investigated. Finally, Section 7 contains the conclusions of this paper.

Figure 3. Example of a city (ter).

    2. Programmable Filters

In programs using the VTK library, the visualization process can be described in terms of data flow through a so-called visualization network or visualization pipeline [14, 12].

During visualization, data, represented by data objects, are passed between process objects connected into the visualization pipeline. The process objects operate on input data to generate output data.

The process objects can be divided into the following categories: source objects, filter objects and mapper objects. Filter objects require one or more input data objects and generate one or more output data objects. VTK provides several filter objects which perform various visualization operations (e.g. extracting geometry, extracting and modifying data attributes, etc.).

When new processing capabilities are required, and when they cannot be obtained by a combination of existing filters, new classes of filters can be added. This requires the introduction of a new class and a modification of the source code. However, it is possible to create new types of filter objects without creating new classes, and even to create new kinds of filters at run time and from scripting languages like Tcl or Python. To make this possible, VTK provides a family of programmable filter classes which have all the common properties of ordinary filter classes, except that their processing routine can be set to a specified user function. In this way the user only has to write the processing function, create a new instance of a programmable filter and use it to build the visualization pipeline. Each time the filter is requested to execute, it will call the user specified function. The following sections show how to write functions for programmable filters (vtkProgrammableFilter in particular) in Tcl, C++ and Fortran.

    3. Accessing VTK Objects Data

It might happen that we want to extend visualization programs by functions written in Fortran, or that we have a large Fortran legacy code we would like to interface with a visualization program written using the VTK library. One problem that immediately appears is that in Fortran (including Fortran 90) we do not have direct access to C++ objects. In theory it is possible to pass a C++ object pointer to a Fortran function and then, knowing the memory layout of the object, manipulate it directly from Fortran, but this is restricted to the simplest cases and is highly non-portable. What should be done instead is to extract all necessary information from the C++ object, pack it into ordinary variables and arrays and pass that data to a Fortran function. When the Fortran function returns the modified arguments, they are used to alter the C++ objects or to create new ones.

We assume that the reader is already familiar with the basic VTK components and in particular with the VTK data model. If not, we suggest reading chapters 4 and 5 from [14] or chapter 11 from [12]. Nevertheless, we will start our discussion with a very simple example which introduces one of the VTK array classes, vtkDoubleArray.

    3.1. Creating and Manipulating VTK Arrays

The example below shows how to create an array of double values with 10 rows and 3 columns. Such an array could be used, for instance, to hold the point coordinates of a three-dimensional mesh in finite element methods.

    #include <iostream>
    #include "vtkDoubleArray.h"

    using namespace std;

    int main(void)
    {
      vtkDoubleArray *array;
      int m = 10;
      int n = 3;
      double buff[3];

      /* Creating the VTK array object */
      array = vtkDoubleArray::New();
      array->SetNumberOfComponents(n);
      array->Allocate(m);
      array->SetNumberOfTuples(m);
      for (int i=0; i<m; i++)
      {
        buff[0] = buff[1] = buff[2] = (double)i;
        array->SetTuple(i, buff);
      }

      /* copy the values from the array object to the plain carray array */
      int nrows = array->GetNumberOfTuples();
      int ncols = array->GetNumberOfComponents();
      double *carray = new double [nrows * ncols];
      for (int i=0; i<nrows; i++)
      {
        for (int j=0; j<ncols; j++)
        {
          carray[i*ncols + j] = array->GetComponent(i, j);
        }
      }

      cout << "copied " << nrows << " tuples" << endl;
      delete [] carray;
      array->Delete();
      return 0;
    }


The reverse operation, copying the values stored in a vtkDoubleArray into a plain C array, can be wrapped in a small helper function such as the one below (the function name is ours; the length of the allocated array is returned through the second argument):

    double *GetDoubleArrayValues(vtkDoubleArray *array, int *length)
    {
      assert (array != NULL);
      int maxId = array->GetMaxId();
      // in C/C++ indexing starts from 0
      double *carray = new double[maxId + 1];
      if (carray != NULL)
      {
        for (int i=0; i<=maxId; i++)
        {
          carray[i] = array->GetValue(i);
        }
        if (length != NULL)
        {
          *length = maxId + 1;
        }
      }
      return carray;
    }

In Appendix A we present more functions which can help to transfer data from and to VTK objects. Those functions are organized into a small library called dpl (data passing library). That library covers only a few cases of accessing VTK objects, in particular the following classes: vtkUnstructuredGrid, vtkPoints, vtkPointData. If necessary, more robust and universal functions can easily be written based on the dpl examples.

    3.3. Passing Data from VTK Object to Fortran Routine

To finish this section we present another example of accessing a VTK object. This time we use the dpl functions to extract the values of the point scalar attribute called temperature, pack them into an array and send that array, together with its length, to a Fortran function which replaces each element by its sine.

    1  #include "dpl.h"
    2  #include "vtkProgrammableAttributeDataFilter.h"
    3  #include "vtkUnstructuredGrid.h"
    4  #include "vtkDataSet.h"
    5  void SinusTemperature(void *arg) {
    6    vtkProgrammableAttributeDataFilter *myFilter;
    7    vtkDataSet *input;
    8    vtkPointData *pd;
    9    double *indata;
    10   int length;
    11   myFilter = (vtkProgrammableAttributeDataFilter *)arg;
    12   input = myFilter->GetInput();
    13   pd = input->GetPointData();
    14   /* get the point data array */
    15   indata = dplGetScalarsCArray(pd, "temperature", &length);
    16   if (indata != NULL)
    17   {
    18     /* pass the input data to the Fortran routine */
    19     sinusfortran (&length, indata);
    20   }
    21   /* set back the transformed data */
    22   dplSetScalarsFromCArray (pd, indata, length, "temperature");
    23 }

The code for the Fortran function is as follows:

          SUBROUTINE SINUSFORTRAN(N, A)
          DOUBLE PRECISION A(*)
          DO 10 I = 1, N
            A(I) = SIN(A(I))
    10    CONTINUE
          END

The call of the Fortran function in line 19 depends on the combination of C/C++ and Fortran compilers, so the appropriate compiler documentation should be consulted for each particular compiler.
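For instance, with compilers that follow the common Unix convention of appending an underscore to Fortran symbol names (the GNU compilers do this by default), the routine could be declared on the C/C++ side as sketched below; this is only one convention among several and has to be checked against the compilers actually used:

    /* assumed name mangling: lower case plus a trailing underscore;
       all Fortran arguments are passed by address */
    extern "C" void sinusfortran_(int *n, double *a);

With such a declaration, the call in line 19 would be written sinusfortran_(&length, indata).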

The example above uses the following helper functions:

    double *dplGetScalarsCArray(vtkUnstructuredGrid *, const char *, int *);
    int dplSetScalarsFromCArray(vtkUnstructuredGrid *, double *, int, const char *);

The first of them takes the point data object and the name of a scalar field, allocates an array and copies the values of the scalar field into it. It returns the length of the allocated array through the last argument. The second function does the reverse: it takes the array and sets the named point attribute to the array values.
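A possible implementation of the second helper is sketched below. This is not the code from Appendix A, merely one way such a function could be written with standard VTK calls, following the signature given above.

    #include "vtkDoubleArray.h"
    #include "vtkPointData.h"
    #include "vtkUnstructuredGrid.h"

    /* illustrative sketch only: wrap the C array in a named vtkDoubleArray
       and attach it to the grid as the point scalars */
    int dplSetScalarsFromCArray(vtkUnstructuredGrid *grid, double *data,
                                int length, const char *name)
    {
      if (grid == NULL || data == NULL) return 0;
      vtkDoubleArray *scalars = vtkDoubleArray::New();
      scalars->SetName(name);
      scalars->SetNumberOfComponents(1);
      scalars->SetNumberOfTuples(length);
      for (int i = 0; i < length; i++)
      {
        scalars->SetValue(i, data[i]);
      }
      grid->GetPointData()->SetScalars(scalars);
      scalars->Delete();  /* the grid keeps its own reference */
      return 1;
    }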

    4. Programmable Filters in C++

We will start the discussion of the vtkProgrammableFilter class with an example of using a programmable filter in C++ code. Though the use of C++ slightly complicates the example, it allows us to show some of the internal workings of the class, the understanding of which is necessary when trying to use vtkProgrammableFilter in Tcl.

4.1. Simple Pipeline Example

To simplify our discussion and the example, we present a program in which the visualization pipeline has been reduced to a minimum and consists of only two objects, a vtkUnstructuredGridReader connected to a vtkUnstructuredGridWriter (Figure 4).

Figure 4. Simple pipeline.

The program looks as follows:

    #include "vtkUnstructuredGridReader.h"
    #include "vtkUnstructuredGridWriter.h"

    int main()
    {
      vtkUnstructuredGridReader *reader;
      vtkUnstructuredGridWriter *writer;

      reader = vtkUnstructuredGridReader::New();
      reader->SetFileName("2Dmesh.vtk");
      writer = vtkUnstructuredGridWriter::New();
      writer->SetFileName("newMesh.vtk");
      // connect the objects to form the pipe
      writer->SetInput(reader->GetOutput());
      // initialize pipe processing
      writer->Update();
      writer->Delete();
      reader->Delete();
      return 0;
    }

As can be seen, this program does nothing else than copy the file 2Dmesh.vtk into newMesh.vtk.

    4.2. Pipeline with Filter

Now we introduce a programmable filter in order to transform an unstructured grid by a user specified function. The layout of the program is shown in Figure 5: a vtkUnstructuredGridReader feeds a vtkProgrammableFilter, driven by a user function, whose output goes to a vtkUnstructuredGridWriter. We assume that the data file specifies a scalar attribute for each point. The user function copies the grid topology and geometry and sets new point attributes which are the old values multiplied by 10. With a user function as above it would be better to use vtkProgrammableAttributeDataFilter, but we will use vtkProgrammableFilter to show how to create a new output object and how to copy the grid topology and geometry.

Figure 5. Pipeline with filter.

The vtkProgrammableFilter class provides, among others, the following method:

    vtkProgrammableFilter::SetExecuteMethod (void (*f)(void *), void *arg)

This method takes two arguments, the first being the pointer to the user function. The user function must take one argument of void pointer type and return void. The second argument is a pointer to client data, which will be passed to the user function upon its execution. The client data allows all the information the function needs to perform its job to be passed to it. The client data will usually contain a pointer to the filter itself, which allows the function to retrieve the filter's input and output objects. In the simplest case the client data will be the pointer to the filter alone.

Let us assume that the function

    void ScaleBy10 (void *arg);

is going to be used by the filter. Here is the new program:

    1  #include "vtkUnstructuredGridReader.h"
    2  #include "vtkUnstructuredGridWriter.h"
    3  #include "vtkProgrammableFilter.h"
    4  void ScaleBy10 (void *arg);
    5  int main() {
    6    vtkUnstructuredGridReader *reader;
    7    vtkUnstructuredGridWriter *writer;
    8    vtkProgrammableFilter *filter;
    9    reader = vtkUnstructuredGridReader::New();
    10   reader->SetFileName("2Dmesh.vtk");
    11   writer = vtkUnstructuredGridWriter::New();
    12   writer->SetFileName("newMesh.vtk");
    13   filter = vtkProgrammableFilter::New();
    14   filter->SetExecuteMethod (ScaleBy10, (void*)filter);
    15   // connect the objects to form the pipe
    16   filter->SetInput(reader->GetOutput());
    17   writer->SetInput(filter->GetUnstructuredGridOutput());
    18   // initialize pipe processing
    19   writer->Update();
    20   writer->Delete();
    21   filter->Delete();
    22   reader->Delete();
    23   return 0;
    24 }

Note line 14 where, as mentioned, we pass the pointer to the filter object as the client data.

    4.3. User Function

The task of the user function ScaleBy10 is to create a new grid with the same geometry and topology as the input grid but with the point attributes scaled by 10. First the topology and geometry are copied from the input object. Then the point data array is copied and modified. At the end the modified data array is inserted into the output mesh as the new point data. The code for the user function is given below.

    1 #include "vtkDataSet.h"

    2 #include "vtkDoubleArray.h"

    3 #include "vtkDataArray.h"

    4 #include "vtkProgrammableFilter.h"

    5 #include "vtkUnstructuredGrid.h"

    6 void ScaleBy10(void *arg) {

    7 vtkIdType numPts;

    8 vtkDa