mixed finite element solution of a semi-conductor problem ...+days/2017/pdf/c-as.pdf ·...

LJLL: 15 Dec.2017

Mixed finite element solution ofa semi-conductor problem by

Dissection sparse direct solver

Atsushi Suzuki1

1Cybermedia Center, Osaka [email protected]

December 15, 2017

Mixed finite element formulation

Find (u, p) ∈ V ×Qa(u, v) + b(v, p) = (f, v) ∀v ∈ V,

b(u, q) = (g, q) ∀q ∈ Q.Stokes equations−∇ · 2D(u) +∇p = f , [D(u)]i j = (∂jui + ∂iuj), 1 ≤ i, j < d

∇ · u = 0 ,u = 0 on ΓD .

V := v ∈ H1(Ω)d ; v = 0 on ΓD, Q := L2(Ω).

a(u, v) = 2∫

Ω

D(u) : D(v), b(v, q) = −∫

Ω

∇ · vq.

Poisson equationu = −∇p ,

∇ · u = g ,

u = 0 on ∂Ω .

V := H(div ; Ω), Q := L2(Ω).

a(u, v) = 2∫

Ω

u · v, b(v, q) = −∫

Ω

∇ · vq.

matrix form of mixed finite element method : 1/2

~u ∈ RNu , ~p ∈ RNp

K

[~u~p

]=

[A BT

B −εI

] [~u~p

]=

[~f~g

]

I A: coercive (A~x, ~x ) > 0 ∀~x 6= ~0I BT : satisfies discrete inf-sup conditon, i.e. kerBT = ~0I ε > 0

Schur complement matrix S = εI +BA−1BT

(S ~p, ~p ) = ε(~p, ~p ) + (BA−1BT ~p, ~p ) ≥ ε(~p, ~p ) > 0

I LDLT -factorization is possible for any orderingI we need to set appropriate regularization ε > 0

√ε-perturbation

a regularization techniqueA11 A12

A210 αβ 0

→A11 A12

A21

√ε αβ 0

I iterative refinement to improve accuracy of a solutionI user can/have to specify perturbation parameter for unsymmetric

matrix (default = 10−13 for Pardiso)

matrix form of mixed finite element method : 2/2

~u ∈ RNu , ~p ∈ RNp

K

[~u~p

]=

[A BT

B 0

] [~u~p

]=

[~f~g

]

I A: coercive (A~x, ~x ) > 0 ∀~x 6= 0I BT : satisfies discrete inf-sup conditon, i.e. kerBT = ~0

Schur complement matrix S = BA−1BT

(S ~p, ~p ) = (BA−1BT ~p, ~p ) = (A−1BT ~p,BT ~p )

kerBT = ~0∧~p 6= ~0⇒ (A−1BT ~p,BT ~p ) > 0.

I LDLT -factorization of K with “block” ordering as [~u, ~p ]I not clear with node-wise [u1, u2, p] ordering

⇒ postponing factorization + 2× 2 pivoting

pivoting strategy

full pivoting : A = ΠTLLUΠR partial pivoting : A = ΠLU

find maxk<i,j≤n|A(i, j)| find maxk<i≤n|A(i, k)|

k k

symmetric pivoting : A = ΠTLDUΠ 2× 2 pivoting : A = ΠT L D UΠ

find maxk<i≤n|A(k, k)| find maxk<i,j≤ndet∣∣∣∣A(i, i) A(i, j)A(j, i) A(j, j)

∣∣∣∣k k

sym. pivoting is mathematically not always possible

understanding pivoting strategy by solution in subspaces

A = ΠTLDUΠ: symmetric pivotingD: diagonal, L: lower triangle, L(i, i) = 1, U : upper tri., U(i, i) = 1.

I index set i1, i2, · · · , imI Vm = span[~ei1 , ~ei2 , · · · , ~eim ] ⊂ RN

I Pm : RN → Vm orthogonal projection.

find ~u ∈ Vm (A~u− ~f,~v ) = 0 ∀~v ∈ Vm.

∃Π : A = ΠTLDUΠ⇒ ∃i1, i2, · · · , iN s.t. PmAP

Tm : invertible on Vm 1 ≤ ∀m ≤ N .

2× 2 pivoting: Vm−1, Vm, Vm+1, Vm+2, Vm+3, by skipping Vm+1.

J. R. Bunch, L. Kaufman. Some stable methods for calculatinginertia and solving symmetric linear systems,Math. Comput, 31 (1977) 163–179.

R. Bank, T.-F. Chan. An analysis of the composite step biconjugate gradientmethod. Numer. Math, 66 (1993) 295–320.

nested dissection by graph decomposition

8 94

c

d

6

a b5 e f7

2 31

8 9

4

a b

5

c d

6

e f

7

2 3

1

sparse solver

dense solver

dense solver

dense solver

A. George. Numerical experiments using dissection methods to solve n by n gridproblems. SIAM J. Num. Anal. 14 (1977),161–179.

software package:METIS : V. Kumar, G. Karypis, A fast and high quality multilevel scheme forpartitioning irregular graphs. SIAM J. Sci. Comput. 20 (1998) 359–392.SCOTCH : F. Pellegrini J. Roman J, P. Amestoy, Hybridizing nested dissectionand halo approximate minimum degree for efficient sparse matrix ordering.Concurrency: Pract. Exper. 12 (2000) 69–84.

recursive generation of Schur complement»A11 A21

A21 A22

–=

»A11 0A21 S22

– »I1 A−1

11 A12

0 I2

–S22 = A22 −A21A−1

11 A12 = A22 − (A21U−111 )D−1

11 L−111 A12 : recursively computed

8 9 a b c d e f 4 5 6 7 2 3 1

88

99

aa

bb

cc

dd

ee

ff

84

94

a5

b5

c6

d6

e7

f7

82

92

a2

b2

d3

e3

f3

91

b1

c1

d1

e1

44

55

66

77

42

52

73

61

22

33

11

21

31

48 49

5a 5b

6c 6d

7e 7f

28 29 2a 2b

3d 3e 3f

1b 1c 1d 1e

24 25

16

37

12 1319

42

52

73

63

22

33

11

21

31

24 25

36 37

12 13

22

33

11

21

31

12 13

41

51

61

71

14 15 16 17

Schur complement

by

Schur complement

by

11

Schur complement

by

sparse solver

dense solver

dense solver

44

55

66

77

dense factorization

sparse part : completely in paralleldense part : better use of BLAS 3; dgemm, dtrsm

symmetric pivoting with postponing for block strategyI nested-dissection decomposition may produce singular

sub-matrix for indefinite matrix

τ : given threshold for postponing, 10−2 for MUMPS, Dissection|A(i, i)|/|A(i− 1, i− 1)| < τ ⇒ A(k, j)i≤k,j are postponed

i − 1i

44

55

66

77

22

33

11

31

21

71

61

51

41

73

63

52

42

invertible entries

candidates ofnull pivot

Schur complement matrix from postponed pivots is computedkernel detection algorithm : QR-factorization by MUMPS

Parallel performace of Dissection solver : 1/2

matrices are given from TOSHIBA memory by a joint project:n=569, 139, nnz=8, 338, 291, cond=1.90090141·105

sovler relative error errordissection 7.5867639·10−6 4.5545655·10−13

pardiso 6.4318791·10−2 1.8114969·10−7

[email protected] [email protected]# cores elapsed CPU elapsed CPU

1 17.019 17.018 27.028 27.0302 9.217 18.300 13.564 26.9803 6.281 18.490 9.251 27.3904 5.224 20.461 7.085 27.7808 3.744 28.53012 2.686 29.34016 2.185 30.620

Parallel performace of Dissection solver : 2/2

n=1, 867, 029 nnz=29, 311, 205, cond=8.93145673·105

sovler relative error errordissection 1.2970417·10−5 3.3747264·10−12

pardiso 7.0464281·10−1 1.6031101·10−5

[email protected] x2

double precision quadruple precision# cores elapsed CPU elapsed CPU

1 197.58 197.89 23,988.0 23,988.02 102.73 200.02 12,225.0 24,007.04 55.22 204.66 6,195.5 24,022.08 30.17 208.47 3,154.5 24,016.016 18.08 222.63 1,567.3 24,036.024 14.95 250.26 1,088.9 24,059.032 13.47 289.29 824.5 24,083.036 13.38 311.15 754.1 24,124.0

quadruple arithmetic⇐ double-double qd library 2.3.17+ Dissection C++ template

Drift-Diffusion system at stationary state : 1/2

I ϕ : electrostatic potentialI n : electron concentrationI p : hole concentration

div(εE) = q(p− n+ C(x)) E = −∇ϕ−divJn = 0 Jn = −q(µnn∇ϕ− µnDn∇n)

divJp = 0 Jp = −q(µpp∇ϕ+ µpDp∇p)

following Maxwell-Boltzmann statistics : p = niexp(ϕp − ϕVth

)

I q : positive electron chargeI ε : dielectric constant of the materialsI ϕp : quasi-Fermi levelI ni : intrinsic concentration of the semiconductorI Vth = KBT/q : thermal voltageI KB : Boltzmann constantI T : lattice temperature

Drift-Diffusion system at stationary state : 2/2

dimensionless system (by unit scaling)

−div(λ2∇ϕ) = p− n+ C(x)−divJn = 0 Jn = ∇n− n∇ϕ

divJp = 0 Jp = −(∇p+ p∇ϕ)

with boundary conditions

ϕ = f on ΓD∂ϕ

∂ν= 0 on ΓN

n = g on ΓD Jn · ν = 0 on ΓN ⇐∂n

∂ν= 0

p = h on ΓD Jp · ν = 0 on ΓN ⇐∂p

∂ν= 0

Slotboom variables, η, ξ : n = ηeϕ, p = ξe−ϕ

∇n = ∇ηeϕ + ηeϕ∇ϕ = ∇ηeϕ + n∇ϕJn = ∇n− n∇ϕ = eϕ∇η

e−ϕJn = ∇η eϕJp = −∇ξ

Finite volume discretization with Scharfetter-Gummel method

approximation of eϕJp = −∇ξ in an interval [xi, xi+1], h = xi+1 − xi∫ xi+1

xi

eϕdx Jp i+1/2 = −h∇ξi+1/2 ' −(ξi+1 − ξi)

I ϕ : assumed to be linear in the interval [xi, xi+1].∫ xi+1

xi

eϕdx =1dϕd x

[eϕ(x)

]xi+1

xi

=h

ϕi+1 − ϕi(eϕi+1 − eϕi)

Jp i+1/2 ' −(ξi+1 − ξi)ϕi+1 − ϕi

h

1eϕi+1 − eϕi

= −ϕi+1 − ϕi

h

(e−ϕi+1ξi+1

1− eϕi−ϕi+1− e−ϕiξieϕi+1−ϕi − 1

)= −ϕi+1 − ϕi

h

(pi+1

1− eϕi−ϕi+1− pi

eϕi+1−ϕi − 1

)= − 1

h(B(ϕi − ϕi+1)pi+1 −B(ϕi+1 − ϕi)pi)

I B(x) = x/(ex − 1) : Bernoulli function

mixed variational formulation : 1 / 2

Slotboom variable ξ : p = ξe−ϕ

div(Jp) = 0 in Ω,eϕJp = −∇ξ in Ω.

function space :H(div) = v ∈ L2(Ω)2 ; div v ∈ L2(Ω),

Σ = v ∈ H(div) ; v · ν = 0 on ΓN

integration by parts leads to∫Ω

eϕJp · v =−∫

Ω

∇ξ · v =∫

Ω

ξ∇ · v −∫

∂Ω

ξv · ν∫Ω

eϕJp · v −∫

Ω

ξ∇ · v = −∫

ΓD

heϕv · ν −∫

ΓN

ξv · ν ∀v ∈ Σ

∫Ω

∇ · Jp q = 0 ∀q ∈ L2(Ω)

mixed variational formulation : 2 / 2

mixed-type weak formulation

find (Jp, ξ) ∈ Σ× L2(Ω)∫Ω

eϕJp · v −∫

Ω

ξ∇ · v =−∫

ΓD

heϕv · ν ∀v ∈ Σ

−∫

Ω

∇ · Jp q = 0 ∀q ∈ L2(Ω)

symmetric indefinite

replacing ξ = eϕp again,

find (Jp, p) ∈ Σ× L2(Ω)∫Ω

eϕJp · v −∫

Ω

eϕp∇ · v =−∫

ΓD

heϕv · ν ∀v ∈ Σ

−∫

Ω

∇ · Jp q = 0 ∀q ∈ L2(Ω)

unsymmetric indefinite

hybridization of mixed formulation + mass lumping⇒ FVM

boundary conditions for a diode device (thanks to Dr. Sho )

(4) (3)

(5)

(6) (2)

(1)

N region : x < 1/4 ∧ y > 3/4,C(x, y) = nd = 1017

P region : othersC(x, y) = −na = −1020

n p = n2i , ni = 1.08× 1010

charge neutralityp− n+C(x, y) = 0 on (1), (4).

N region :n =

√n2

i + C2/4 + C/2 ' nd

P region :p =

√n2

i + C2/4− C/2 ' na

(1) ϕ = − log(na

ni) + 1

Vthϕappl n =

ni

nap =

na

ni(2), (3), (5), (6) ∂νϕ = 0 ∂νn = 0 ∂νp = 0

(4) ϕ = log(nd

ni) n =

nd

nip =

ni

nd

nonlinear iteration to obtain the stationary state

a variant of Gummel mapϕm, nm, pm : given→ ϕm+1, nm+1, pm+1 by a fixed point methodSlotboom variable ξm, ηm

pm = ξme−ϕm

nm = ηmeϕm

to find a solution of the nonlinear eq. : −div(λ2∇ϕ) = p− n+ C(x)

F (ηm, ξm ; ϕ,ψ) = λ2

∫Ω

∇ϕ · ∇ψ −∫

Ω

(ξme−ϕ − ηme−ϕ + C)ψ = 0

differential calculus with δϕ ∈ H1(Ω) ; ψ = 0 on ΓD leads to

F (ηm, ξm ; ϕm + δϕ, ψ)− F (ηm, ξm ; ϕm, ψ)

= λ2

∫Ω

∇δϕ · ∇ψ −∫

Ω

ξm(e−ϕm−δϕ − e−ϕm

)ψ − ηm

(eϕm+δϕ − eϕm

)ψ

= λ2

∫Ω

∇δϕ · ∇ψ +∫

Ω

(ξme−ϕm

+ ηmeϕm)δϕψ

= λ2

∫Ω

∇δϕ · ∇ψ +∫

Ω

(pm + nm)δϕψ ϕm+1 = ϕm + δϕ

FreeFem++ scriptFinite element approximation

I Jp ∈ H(div) : Raviar-Thomas : RT1(K) = (P1(K))2 + ~xP1(K),I p ∈ L2(Ω) : piecewise linear : P1(K),I ϕ ∈ H1(Ω) : piecewise linear : P1(K).

load "Element_Mixte"load "Dissection"defaulttoDissection();fespace Vh(Th, RT1); fespace Ph(Th, P1);fespace Xh(Th, P1);Vh [up1, up2], [v1, v2]; Ph pp, q;Xh phi; // obtained in Gummel mapproblem DDp([up1, up2, pp],[v1, v2, q],

solver=sparsesolver,strategy=3,tolpivot=1.0e-2,tgv=1.0e+30) =

int2d(Th,qft=qf9pT)(exp(phi) * (up1 * v1 + up2 * v2)- exp(phi) * pp * (dx(v1) + dy(v2))+ q * (dx(up1) + dy(up2)))

+int1d(Th, 1)(gp1 * exp(phi) * (v1 * N.x + v2 * N.y))+int1d(Th, 4)(gp4 * exp(phi) * (v1 * N.x + v2 * N.y))+ on(2, 3, 5, 6, up1 = 0, up2 = 0);

matrix representation and preconditioning

linear system for hole concentration by RT1/P1[A(~ϕ) B1(~ϕ)T

−B2 0

] [~u~p

]=

[~f(~ϕ)~0

]matrices and vectors weighted with exponential of electrostaticpotential

~v TA(~ϕ)~u =∫

Ω

eϕu · v

~v TB1(~ϕ)~p = −∫

Ω

eϕp∇ · v

~v T ~f(~ϕ) = −∫

ΓD

heϕv · ν

~q TB2~u = −∫

Ω

q∇ · u

preconditioned system by scaling matrix [W ]i i = 1/[A(~ϕ)]i i[W A(~ϕ) W B1(~ϕ)T

−B2 0

] [~u~p

]=

[W ~f(~ϕ)

~0

]

numerical result of a semi-conductor problem

I compute thermal equilibrium with p = niexp(−ϕ)Vth andn = niexp(ϕ)Vth

I Newton iteration for the potential equation and fixed pointiteration for the whole system : a kind of Gummel map

electrostatic potential hole concentration

Raviart-Thomas finite element

RT0(K) = (P0(K))2 + ~xP0(K) ⊂ (P1(K))2.I K : triangle elementI Ei : edges of KI ~νi : outer normal of K on Ei

I ~Ei : normal to edge Ei

~νi~Ei

Ei

Pi

K

~v ∈ RT0(K) ⇒ ~v|Ei · ni ∈ P0(Ei), div~v ∈ P0(K)

finite element basis

~Ψi(~x) = σi|Ei||K|

(~x− ~Pi) σi = ~Ei · ~νi, Pi : node of K

finite element vector value is continuous on the middle point on Ei.inner finite element approximation of H(div ; Ω)∫

K

eϕ~Ψi · ~Ψj ←∫

K

eϕ1λ1+ϕ2λ2+ϕ3λ3λkλl by exact quadrature∫K

eϕ~Ψi · ~Ψj '1|K|

∫K

eP

k ϕkλk

∫K

~Ψi · ~Ψj : exponential fitting

λ1, λ2, λ3 : barycentric coordinates of K

summary

I Dissection direct solver can factorize indefinite unsymmetricmatrix with symmetric pivoting

I mixed finite element formulation for Dirft-Diffusion equations withstandard numerical integration instead of Schafetter-Gummelscheme / exponential fitting

I possible extension with higher order elements withRT1(K)/P1(K)/P2(K)

ReferencesI F. Brezzi, L. D. Marini, S. Micheletti, P. Pietra, R. Sacco, S. Wang.

Discretization of semiconductor device problems (I)F. Brezzi et al., Handbook of Numerical Analysis vol XIII,Elsevier 2005

I personal communication with Dr. Sho Shohiro @ Osaka Univ.

mixed finite element solution of a semi-conductor problem ...+days/2017/pdf/c-as.pdf ·...

Documents