mixed finite element solution of a semi-conductor problem ...+days/2017/pdf/c-as.pdf ·...
TRANSCRIPT
LJLL: 15 Dec.2017
Mixed finite element solution ofa semi-conductor problem by
Dissection sparse direct solver
Atsushi Suzuki1
1Cybermedia Center, Osaka [email protected]
December 15, 2017
Mixed finite element formulation
Find (u, p) ∈ V ×Qa(u, v) + b(v, p) = (f, v) ∀v ∈ V,
b(u, q) = (g, q) ∀q ∈ Q.Stokes equations−∇ · 2D(u) +∇p = f , [D(u)]i j = (∂jui + ∂iuj), 1 ≤ i, j < d
∇ · u = 0 ,u = 0 on ΓD .
V := v ∈ H1(Ω)d ; v = 0 on ΓD, Q := L2(Ω).
a(u, v) = 2∫
Ω
D(u) : D(v), b(v, q) = −∫
Ω
∇ · vq.
Poisson equationu = −∇p ,
∇ · u = g ,
u = 0 on ∂Ω .
V := H(div ; Ω), Q := L2(Ω).
a(u, v) = 2∫
Ω
u · v, b(v, q) = −∫
Ω
∇ · vq.
matrix form of mixed finite element method : 1/2
~u ∈ RNu , ~p ∈ RNp
K
[~u~p
]=
[A BT
B −εI
] [~u~p
]=
[~f~g
]
I A: coercive (A~x, ~x ) > 0 ∀~x 6= ~0I BT : satisfies discrete inf-sup conditon, i.e. kerBT = ~0I ε > 0
Schur complement matrix S = εI +BA−1BT
(S ~p, ~p ) = ε(~p, ~p ) + (BA−1BT ~p, ~p ) ≥ ε(~p, ~p ) > 0
I LDLT -factorization is possible for any orderingI we need to set appropriate regularization ε > 0
√ε-perturbation
a regularization techniqueA11 A12
A210 αβ 0
→A11 A12
A21
√ε αβ 0
I iterative refinement to improve accuracy of a solutionI user can/have to specify perturbation parameter for unsymmetric
matrix (default = 10−13 for Pardiso)
matrix form of mixed finite element method : 2/2
~u ∈ RNu , ~p ∈ RNp
K
[~u~p
]=
[A BT
B 0
] [~u~p
]=
[~f~g
]
I A: coercive (A~x, ~x ) > 0 ∀~x 6= 0I BT : satisfies discrete inf-sup conditon, i.e. kerBT = ~0
Schur complement matrix S = BA−1BT
(S ~p, ~p ) = (BA−1BT ~p, ~p ) = (A−1BT ~p,BT ~p )
kerBT = ~0∧~p 6= ~0⇒ (A−1BT ~p,BT ~p ) > 0.
I LDLT -factorization of K with “block” ordering as [~u, ~p ]I not clear with node-wise [u1, u2, p] ordering
⇒ postponing factorization + 2× 2 pivoting
pivoting strategy
full pivoting : A = ΠTLLUΠR partial pivoting : A = ΠLU
find maxk<i,j≤n|A(i, j)| find maxk<i≤n|A(i, k)|
k k
symmetric pivoting : A = ΠTLDUΠ 2× 2 pivoting : A = ΠT L D UΠ
find maxk<i≤n|A(k, k)| find maxk<i,j≤ndet∣∣∣∣A(i, i) A(i, j)A(j, i) A(j, j)
∣∣∣∣k k
sym. pivoting is mathematically not always possible
understanding pivoting strategy by solution in subspaces
A = ΠTLDUΠ: symmetric pivotingD: diagonal, L: lower triangle, L(i, i) = 1, U : upper tri., U(i, i) = 1.
I index set i1, i2, · · · , imI Vm = span[~ei1 , ~ei2 , · · · , ~eim ] ⊂ RN
I Pm : RN → Vm orthogonal projection.
find ~u ∈ Vm (A~u− ~f,~v ) = 0 ∀~v ∈ Vm.
∃Π : A = ΠTLDUΠ⇒ ∃i1, i2, · · · , iN s.t. PmAP
Tm : invertible on Vm 1 ≤ ∀m ≤ N .
2× 2 pivoting: Vm−1, Vm, Vm+1, Vm+2, Vm+3, by skipping Vm+1.
J. R. Bunch, L. Kaufman. Some stable methods for calculatinginertia and solving symmetric linear systems,Math. Comput, 31 (1977) 163–179.
R. Bank, T.-F. Chan. An analysis of the composite step biconjugate gradientmethod. Numer. Math, 66 (1993) 295–320.
nested dissection by graph decomposition
8 94
c
d
6
a b5 e f7
2 31
8 9
4
a b
5
c d
6
e f
7
2 3
1
sparse solver
dense solver
dense solver
dense solver
A. George. Numerical experiments using dissection methods to solve n by n gridproblems. SIAM J. Num. Anal. 14 (1977),161–179.
software package:METIS : V. Kumar, G. Karypis, A fast and high quality multilevel scheme forpartitioning irregular graphs. SIAM J. Sci. Comput. 20 (1998) 359–392.SCOTCH : F. Pellegrini J. Roman J, P. Amestoy, Hybridizing nested dissectionand halo approximate minimum degree for efficient sparse matrix ordering.Concurrency: Pract. Exper. 12 (2000) 69–84.
recursive generation of Schur complement»A11 A21
A21 A22
–=
»A11 0A21 S22
– »I1 A−1
11 A12
0 I2
–S22 = A22 −A21A−1
11 A12 = A22 − (A21U−111 )D−1
11 L−111 A12 : recursively computed
8 9 a b c d e f 4 5 6 7 2 3 1
88
99
aa
bb
cc
dd
ee
ff
84
94
a5
b5
c6
d6
e7
f7
82
92
a2
b2
d3
e3
f3
91
b1
c1
d1
e1
44
55
66
77
42
52
73
61
22
33
11
21
31
48 49
5a 5b
6c 6d
7e 7f
28 29 2a 2b
3d 3e 3f
1b 1c 1d 1e
24 25
16
37
12 1319
42
52
73
63
22
33
11
21
31
24 25
36 37
12 13
22
33
11
21
31
12 13
41
51
61
71
14 15 16 17
Schur complement
by
Schur complement
by
11
Schur complement
by
sparse solver
dense solver
dense solver
44
55
66
77
dense factorization
sparse part : completely in paralleldense part : better use of BLAS 3; dgemm, dtrsm
symmetric pivoting with postponing for block strategyI nested-dissection decomposition may produce singular
sub-matrix for indefinite matrix
τ : given threshold for postponing, 10−2 for MUMPS, Dissection|A(i, i)|/|A(i− 1, i− 1)| < τ ⇒ A(k, j)i≤k,j are postponed
i − 1i
44
55
66
77
22
33
11
31
21
71
61
51
41
73
63
52
42
invertible entries
candidates ofnull pivot
Schur complement matrix from postponed pivots is computedkernel detection algorithm : QR-factorization by MUMPS
Parallel performace of Dissection solver : 1/2
matrices are given from TOSHIBA memory by a joint project:n=569, 139, nnz=8, 338, 291, cond=1.90090141·105
sovler relative error errordissection 7.5867639·10−6 4.5545655·10−13
pardiso 6.4318791·10−2 1.8114969·10−7
[email protected] [email protected]# cores elapsed CPU elapsed CPU
1 17.019 17.018 27.028 27.0302 9.217 18.300 13.564 26.9803 6.281 18.490 9.251 27.3904 5.224 20.461 7.085 27.7808 3.744 28.53012 2.686 29.34016 2.185 30.620
Parallel performace of Dissection solver : 2/2
n=1, 867, 029 nnz=29, 311, 205, cond=8.93145673·105
sovler relative error errordissection 1.2970417·10−5 3.3747264·10−12
pardiso 7.0464281·10−1 1.6031101·10−5
double precision quadruple precision# cores elapsed CPU elapsed CPU
1 197.58 197.89 23,988.0 23,988.02 102.73 200.02 12,225.0 24,007.04 55.22 204.66 6,195.5 24,022.08 30.17 208.47 3,154.5 24,016.016 18.08 222.63 1,567.3 24,036.024 14.95 250.26 1,088.9 24,059.032 13.47 289.29 824.5 24,083.036 13.38 311.15 754.1 24,124.0
quadruple arithmetic⇐ double-double qd library 2.3.17+ Dissection C++ template
Drift-Diffusion system at stationary state : 1/2
I ϕ : electrostatic potentialI n : electron concentrationI p : hole concentration
div(εE) = q(p− n+ C(x)) E = −∇ϕ−divJn = 0 Jn = −q(µnn∇ϕ− µnDn∇n)
divJp = 0 Jp = −q(µpp∇ϕ+ µpDp∇p)
following Maxwell-Boltzmann statistics : p = niexp(ϕp − ϕVth
)
I q : positive electron chargeI ε : dielectric constant of the materialsI ϕp : quasi-Fermi levelI ni : intrinsic concentration of the semiconductorI Vth = KBT/q : thermal voltageI KB : Boltzmann constantI T : lattice temperature
Drift-Diffusion system at stationary state : 2/2
dimensionless system (by unit scaling)
−div(λ2∇ϕ) = p− n+ C(x)−divJn = 0 Jn = ∇n− n∇ϕ
divJp = 0 Jp = −(∇p+ p∇ϕ)
with boundary conditions
ϕ = f on ΓD∂ϕ
∂ν= 0 on ΓN
n = g on ΓD Jn · ν = 0 on ΓN ⇐∂n
∂ν= 0
p = h on ΓD Jp · ν = 0 on ΓN ⇐∂p
∂ν= 0
Slotboom variables, η, ξ : n = ηeϕ, p = ξe−ϕ
∇n = ∇ηeϕ + ηeϕ∇ϕ = ∇ηeϕ + n∇ϕJn = ∇n− n∇ϕ = eϕ∇η
e−ϕJn = ∇η eϕJp = −∇ξ
Finite volume discretization with Scharfetter-Gummel method
approximation of eϕJp = −∇ξ in an interval [xi, xi+1], h = xi+1 − xi∫ xi+1
xi
eϕdx Jp i+1/2 = −h∇ξi+1/2 ' −(ξi+1 − ξi)
I ϕ : assumed to be linear in the interval [xi, xi+1].∫ xi+1
xi
eϕdx =1dϕd x
[eϕ(x)
]xi+1
xi
=h
ϕi+1 − ϕi(eϕi+1 − eϕi)
Jp i+1/2 ' −(ξi+1 − ξi)ϕi+1 − ϕi
h
1eϕi+1 − eϕi
= −ϕi+1 − ϕi
h
(e−ϕi+1ξi+1
1− eϕi−ϕi+1− e−ϕiξieϕi+1−ϕi − 1
)= −ϕi+1 − ϕi
h
(pi+1
1− eϕi−ϕi+1− pi
eϕi+1−ϕi − 1
)= − 1
h(B(ϕi − ϕi+1)pi+1 −B(ϕi+1 − ϕi)pi)
I B(x) = x/(ex − 1) : Bernoulli function
mixed variational formulation : 1 / 2
Slotboom variable ξ : p = ξe−ϕ
div(Jp) = 0 in Ω,eϕJp = −∇ξ in Ω.
function space :H(div) = v ∈ L2(Ω)2 ; div v ∈ L2(Ω),
Σ = v ∈ H(div) ; v · ν = 0 on ΓN
integration by parts leads to∫Ω
eϕJp · v =−∫
Ω
∇ξ · v =∫
Ω
ξ∇ · v −∫
∂Ω
ξv · ν∫Ω
eϕJp · v −∫
Ω
ξ∇ · v = −∫
ΓD
heϕv · ν −∫
ΓN
ξv · ν ∀v ∈ Σ
∫Ω
∇ · Jp q = 0 ∀q ∈ L2(Ω)
mixed variational formulation : 2 / 2
mixed-type weak formulation
find (Jp, ξ) ∈ Σ× L2(Ω)∫Ω
eϕJp · v −∫
Ω
ξ∇ · v =−∫
ΓD
heϕv · ν ∀v ∈ Σ
−∫
Ω
∇ · Jp q = 0 ∀q ∈ L2(Ω)
symmetric indefinite
replacing ξ = eϕp again,
find (Jp, p) ∈ Σ× L2(Ω)∫Ω
eϕJp · v −∫
Ω
eϕp∇ · v =−∫
ΓD
heϕv · ν ∀v ∈ Σ
−∫
Ω
∇ · Jp q = 0 ∀q ∈ L2(Ω)
unsymmetric indefinite
hybridization of mixed formulation + mass lumping⇒ FVM
boundary conditions for a diode device (thanks to Dr. Sho )
(4) (3)
(5)
(6) (2)
(1)
N region : x < 1/4 ∧ y > 3/4,C(x, y) = nd = 1017
P region : othersC(x, y) = −na = −1020
n p = n2i , ni = 1.08× 1010
charge neutralityp− n+C(x, y) = 0 on (1), (4).
N region :n =
√n2
i + C2/4 + C/2 ' nd
P region :p =
√n2
i + C2/4− C/2 ' na
(1) ϕ = − log(na
ni) + 1
Vthϕappl n =
ni
nap =
na
ni(2), (3), (5), (6) ∂νϕ = 0 ∂νn = 0 ∂νp = 0
(4) ϕ = log(nd
ni) n =
nd
nip =
ni
nd
nonlinear iteration to obtain the stationary state
a variant of Gummel mapϕm, nm, pm : given→ ϕm+1, nm+1, pm+1 by a fixed point methodSlotboom variable ξm, ηm
pm = ξme−ϕm
nm = ηmeϕm
to find a solution of the nonlinear eq. : −div(λ2∇ϕ) = p− n+ C(x)
F (ηm, ξm ; ϕ,ψ) = λ2
∫Ω
∇ϕ · ∇ψ −∫
Ω
(ξme−ϕ − ηme−ϕ + C)ψ = 0
differential calculus with δϕ ∈ H1(Ω) ; ψ = 0 on ΓD leads to
F (ηm, ξm ; ϕm + δϕ, ψ)− F (ηm, ξm ; ϕm, ψ)
= λ2
∫Ω
∇δϕ · ∇ψ −∫
Ω
ξm(e−ϕm−δϕ − e−ϕm
)ψ − ηm
(eϕm+δϕ − eϕm
)ψ
= λ2
∫Ω
∇δϕ · ∇ψ +∫
Ω
(ξme−ϕm
+ ηmeϕm)δϕψ
= λ2
∫Ω
∇δϕ · ∇ψ +∫
Ω
(pm + nm)δϕψ ϕm+1 = ϕm + δϕ
FreeFem++ scriptFinite element approximation
I Jp ∈ H(div) : Raviar-Thomas : RT1(K) = (P1(K))2 + ~xP1(K),I p ∈ L2(Ω) : piecewise linear : P1(K),I ϕ ∈ H1(Ω) : piecewise linear : P1(K).
load "Element_Mixte"load "Dissection"defaulttoDissection();fespace Vh(Th, RT1); fespace Ph(Th, P1);fespace Xh(Th, P1);Vh [up1, up2], [v1, v2]; Ph pp, q;Xh phi; // obtained in Gummel mapproblem DDp([up1, up2, pp],[v1, v2, q],
solver=sparsesolver,strategy=3,tolpivot=1.0e-2,tgv=1.0e+30) =
int2d(Th,qft=qf9pT)(exp(phi) * (up1 * v1 + up2 * v2)- exp(phi) * pp * (dx(v1) + dy(v2))+ q * (dx(up1) + dy(up2)))
+int1d(Th, 1)(gp1 * exp(phi) * (v1 * N.x + v2 * N.y))+int1d(Th, 4)(gp4 * exp(phi) * (v1 * N.x + v2 * N.y))+ on(2, 3, 5, 6, up1 = 0, up2 = 0);
matrix representation and preconditioning
linear system for hole concentration by RT1/P1[A(~ϕ) B1(~ϕ)T
−B2 0
] [~u~p
]=
[~f(~ϕ)~0
]matrices and vectors weighted with exponential of electrostaticpotential
~v TA(~ϕ)~u =∫
Ω
eϕu · v
~v TB1(~ϕ)~p = −∫
Ω
eϕp∇ · v
~v T ~f(~ϕ) = −∫
ΓD
heϕv · ν
~q TB2~u = −∫
Ω
q∇ · u
preconditioned system by scaling matrix [W ]i i = 1/[A(~ϕ)]i i[W A(~ϕ) W B1(~ϕ)T
−B2 0
] [~u~p
]=
[W ~f(~ϕ)
~0
]
numerical result of a semi-conductor problem
I compute thermal equilibrium with p = niexp(−ϕ)Vth andn = niexp(ϕ)Vth
I Newton iteration for the potential equation and fixed pointiteration for the whole system : a kind of Gummel map
electrostatic potential hole concentration
Raviart-Thomas finite element
RT0(K) = (P0(K))2 + ~xP0(K) ⊂ (P1(K))2.I K : triangle elementI Ei : edges of KI ~νi : outer normal of K on Ei
I ~Ei : normal to edge Ei
~νi~Ei
Ei
Pi
K
~v ∈ RT0(K) ⇒ ~v|Ei · ni ∈ P0(Ei), div~v ∈ P0(K)
finite element basis
~Ψi(~x) = σi|Ei||K|
(~x− ~Pi) σi = ~Ei · ~νi, Pi : node of K
finite element vector value is continuous on the middle point on Ei.inner finite element approximation of H(div ; Ω)∫
K
eϕ~Ψi · ~Ψj ←∫
K
eϕ1λ1+ϕ2λ2+ϕ3λ3λkλl by exact quadrature∫K
eϕ~Ψi · ~Ψj '1|K|
∫K
eP
k ϕkλk
∫K
~Ψi · ~Ψj : exponential fitting
λ1, λ2, λ3 : barycentric coordinates of K
summary
I Dissection direct solver can factorize indefinite unsymmetricmatrix with symmetric pivoting
I mixed finite element formulation for Dirft-Diffusion equations withstandard numerical integration instead of Schafetter-Gummelscheme / exponential fitting
I possible extension with higher order elements withRT1(K)/P1(K)/P2(K)
ReferencesI F. Brezzi, L. D. Marini, S. Micheletti, P. Pietra, R. Sacco, S. Wang.
Discretization of semiconductor device problems (I)F. Brezzi et al., Handbook of Numerical Analysis vol XIII,Elsevier 2005
I personal communication with Dr. Sho Shohiro @ Osaka Univ.