Large-Scale Nonlinear Device-Level Power Electronic Circuit Simulation on GPUs [1]
Venkata Dinavahi
University of Alberta, Edmonton, Alberta, Canada.
Aug. 2019
[1] S. Yan, Z. Zhou, V. Dinavahi, "Large-Scale Nonlinear Device-Level Power Electronic Circuit Simulation on Massively Parallel Graphics Processing Architectures," IEEE Trans. on Power Electronics, vol. 33, no. 6, pp. 4660-4678, June 2018.
AMPS Panel Session IEEE PES GM, Atlanta, 2019.
Outline
1 Introduction
2 GPU Specs
3 Massive-Thread Parallel Modules for Power Electronic Circuit Simulation
4 Case Studies and Data Analysis
5 Conclusion
Introduction
Power Electronic Circuit Simulation

Computational speed is an overriding concern in large-scale power electronic circuit simulation using device-level circuit simulators.
Modeling complex systems composed of power electronic subsystems, such as HVDC grids, transportation systems, renewable energy systems, and microgrids, can be very challenging.
System size, modeling complexity, and simulation duration must be traded off to obtain a reasonable execution time for the application.
Introduction: Applications
Computational bottlenecks arise due to the repeated nonlinear transient simulations required for:
robust design of power electronic systems, involving optimization to fine-tune parameters at the circuit and component levels, which needs several design iterations and hundreds of simulation runs;
simulation-based statistical and sensitivity analysis to reduce design costs and time;
comprehensive fault and reliability analysis of systems using a matrix of faults representing possible device/component failures;
visualization and analysis of large sets of post-simulation results to extract performance indices;
detailed modeling of complex mixed-signal and multi-domain subsystems with widely different time constants, e.g., electronic, electrical, magnetic, thermal, and hydraulic systems.
Introduction: Simulation Tools
A variety of device-level simulators exist, such as SaberRD, Orcad, PSIM, PLECS, LTspice, PECS, PETS, DesignLab, etc.
They offer a plethora of models for semiconductor devices such as diodes, BJTs, JFETs, MOSFETs, IGBTs, and basic circuit components;
an assortment of studies such as nonlinear dc, transient, linear ac (small-signal), and Monte Carlo analysis; and native mixed-signal capabilities or co-simulation interfaces.
Common attribute: single-thread sequential program execution on the CPU. Some distributed processing is possible on multiple CPU cores.
Model-order-reduction approaches for improving execution speed include circuit size reduction and using averaged or linearized models for the switching devices to observe only the system-level behavior.
Evaluating comprehensive system behavior then entails maintaining models of varying size and complexity on different simulation tools.
Stream Multiprocessors of GPUs
The attained task parallelism is coarse-grained at best. The resulting computational efficiency of device-level circuit simulators was primarily derived from an increase in CPU clock speed, which until the mid-2000s could be relied upon to provide the necessary acceleration. However, computer chip manufacturers no longer rely on clock speed for higher processing power; they have implemented multicore CPU and many-core GPU architectures to increase chip performance. While multiple-thread concepts on CPUs such as hyperthreading were introduced early on, they were hardly taken advantage of by circuit simulators, mainly due to the cumbersome task of rewriting the program code to enable multiple threads of execution.

Model order reduction is the usual course for improving computational speed in device-level circuit simulators. Model simplifications include circuit size reduction and using averaged or linearized models for the switching devices to observe only the system-level behavior. However, evaluating the system's comprehensive behavior entails maintaining many different models of varying size and complexity on different simulation tools. It would be far more effective if the same simulation tool could efficiently show both the system-level and device-level results over long time frames. Taking the IGBT as an example, there are several models ranging from detailed physics-based models to the ideal switch model. Hefner brought up the first complete analytical physics-based model available for a device-level circuit simulator, which is implemented in SaberRD [9], [10]. As the most common voltage source converter for HVdc applications, the modular multilevel converter (MMC) is normally simulated using simplified IGBT and diode models because of its complex structure consisting of a series of submodules (SMs); device-level details of the IGBT and diode cannot be observed using simplified system-level models of the MMC.

Since the GPU offers a massively parallel architecture composed of thousands of cores grouped into streaming multiprocessors, though its clock frequency is comparatively lower than the CPU's, it has higher compute power and throughput in floating-point calculations than traditional CPUs, as compared in Table III. The attained data parallelism is fine-grained and conforms to the single instruction, multiple data (SIMD) format; therefore, to fully exploit GPU acceleration, the device models and the numerical algorithms have to be rewritten in the SIMD format [11]. Mature application programming interfaces are available for SIMD abstraction, such as CUDA, DirectCompute, and OpenCL. The user develops C/C++ code interlaced with special constructs and functions to access the parallel cores and distributed memories on the GPU. Furthermore, optimized numerical libraries such as CUBLAS [12] and CUFFT [13] are available for linear solvers and data processing. GPU-based massively parallel processing has been used worldwide for myriad applications, such as computational fluid dynamics, life sciences, medical imaging, game physics, seismic simulations, etc., and impressive acceleration has been reported [14]-[17]. For power system computation, GPUs have been used for various studies, such as transient stability simulation [18], electromagnetic transient simulation [19], dynamic state estimation [20], [21], ionized field computation [22], and power flow calculation [23].
TABLE I: GPU SPECIFICATIONS

Chip                               GK110 (Kepler)   GP104 (Pascal)
Fabrication                        28 nm            16 nm
Number of single-precision cores   2880             2560
Memory bandwidth                   336 GB/s         320 GB/s
Memory size                        6 GB             8 GB
Memory type                        GDDR5            GDDR5X
Core clock                         837 MHz          1607 MHz
In this paper, a massively parallel simulation of large-scale power electronic circuits using nonlinear physics-based device-level models on the GPU architecture is proposed. The massive-thread implementations of the physics-based IGBT and power diode utilize Hefner's IGBT model and Lauritzen's diode model [24]. The numerical solution modules, including the Newton-Raphson iterative method and the matrix equation solution using partial LU decomposition, are implemented in the massive-thread framework. Based on the MMC circuit characteristic, a mathematical method is proposed to decompose the semiblock Jacobian matrix. In addition, a variable time-stepping scheme using the predictor-corrector method is adopted to increase simulation efficiency. The GPU-based massive-thread simulation is compared with the SaberRD simulator as well as a complete CPU implementation, in terms of accuracy and computational efficiency.

This paper is organized as follows. Section II briefly describes the GPU hardware architecture and the CUDA abstraction for parallel computation. Section III describes the details of the developed massive-thread parallel modules for the IGBT, power diode, and the numerical algorithms. Section IV shows the case studies of the MMC with simulation results and discussion. Finally, Section V gives the conclusion of this paper.
II. BACKGROUND ON GPU ARCHITECTURE AND PROGRAMMING INTERFACE
Two GPU architectures, NVIDIA Kepler GK110 (2012) and Pascal GP104 (2016), whose specifications are given in Table I, are used to implement the massively parallel power electronic circuit simulation codes. In hybrid computational systems, the CPU and GPU cooperate as host and device, respectively. The host sends all essential instructions and data to the device through a PCIe 3.0 x16 interface at up to 15.754 GB/s. Instructions are distributed through the GigaThread interface to each streaming multiprocessor, and data are transferred to global memory on the GPU board. Every 32 threads in a streaming multiprocessor are grouped into one warp as an execution unit, which operates simultaneously, while other threads are parallelized by the pipeline automatically. Finally, results saved in global memory are transferred back to the host through the PCIe 3.0 bus.

Growing from NVIDIA's Fermi architecture, the Kepler architecture has 15 streaming multiprocessors with registers, caches, and shared memory, shown in Fig. 1(a) [25]. Each multiprocessor contains 192 CUDA cores.
[Fig. 1: Streaming multiprocessor architecture: (a) Kepler; (b) Pascal.]
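As a concrete illustration of this host-device flow, the following minimal CUDA sketch (illustrative only, not the paper's code) copies data to GPU global memory, launches a kernel whose threads execute in 32-thread warps, and copies the result back over PCIe:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// One thread per element: the GigaThread scheduler distributes blocks to
// streaming multiprocessors, where threads execute in 32-thread warps.
__global__ void scaleKernel(const double* in, double* out, double k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * in[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(double);
    double *h_in = (double*)malloc(bytes), *h_out = (double*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = 1.0;

    double *d_in, *d_out;
    cudaMalloc(&d_in, bytes); cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);   // host -> device

    int block = 256;                                         // 8 warps per block
    int grid = (n + block - 1) / block;
    scaleKernel<<<grid, block>>>(d_in, d_out, 2.0, n);       // SIMT execution
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost); // device -> host

    printf("out[0] = %f\n", h_out[0]);
    cudaFree(d_in); cudaFree(d_out); free(h_in); free(h_out);
    return 0;
}
```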
Nonlinear Physics-Based Power Diode Model
Physics-based Diode Model
[Figure: Physics-based power diode model: (a) device structure with P+ and N+ regions and the i-region (voltages vE, vM); (b) nonlinear equivalent circuit with reverse-recovery current iR, junction capacitance CJ, i-region resistance RM, and contact resistance RS between anode vA(t), internal node vin(t), and cathode vK(t); (c) discretized equivalent circuit with conductances gR, gJ and equivalent current sources iReq, iJeq.]
Nonlinear Diode Model
Reverse recovery phenomena:

i_R(t) = \frac{q_E(t) - q_M(t)}{T_M}, \qquad (1)

0 = \frac{dq_M(t)}{dt} + \frac{q_M(t)}{\tau} - \frac{q_E(t) - q_M(t)}{T_M}, \qquad (2)

q_E(t) = I_S \tau \left( e^{v_E(t)/V_T} - 1 \right). \qquad (3)

Voltage drop across the i-region:

v_M(t) = \frac{V_T T_M \, i(t)}{q_M(t)}. \qquad (4)

Voltage drop across the diode:

v(t) = 2v_M(t) + 2v_E(t) + R_S \left[ i_E(t) + \frac{dq_J(t)}{dt} \right]. \qquad (5)
Charge on the junction capacitance:

q_J(t) = \int C_J(t) \, d(2v_E). \qquad (6)

The junction capacitance C_J(t) is given as

C_J(t) = \begin{cases} \dfrac{C_{J0}}{\left(1 - \frac{2v_E(t)}{\phi_B}\right)^m} & v_E < \dfrac{\phi_B}{4} \\[2ex] \left[ 2^{m+2} m \dfrac{v_E(t)}{\phi_B} - (m-1)\,2^m \right] C_{J0} & v_E \ge \dfrac{\phi_B}{4} \end{cases}. \qquad (7)

Discretizing (2) gives

q_M = \frac{\Delta t \cdot q_E(t)}{2T_M \left(1 + \frac{k_1 \Delta t}{2}\right)} + \frac{q_{hist}(t - \Delta t)}{1 + \frac{k_1 \Delta t}{2}}, \qquad (8)
where the history term is given as

q_{hist}(t - \Delta t) = \frac{\Delta t}{2T_M} q_E(t - \Delta t) + \left(1 - \frac{k_1 \Delta t}{2}\right) q_M(t - \Delta t). \qquad (9)

The equivalent reverse-recovery current i_{Req} is given as

i_{Req} = k_2 I_S \tau \left( e^{v_E(t)/V_T} - 1 \right) - \frac{q_{hist}(t - \Delta t)}{T_M \left(1 + \frac{k_1 \Delta t}{2}\right)} - 2v_E(t)\, g_R, \qquad (10)

where g_R is the dynamic conductance defined as

g_R = \frac{1}{2V_T} k_2 I_S \tau \, e^{v_E(t)/V_T}. \qquad (11)

The equivalent junction-capacitance current i_{Jeq} is obtained as

i_{Jeq} = i_J(t) - 2v_E(t)\, g_J, \qquad (12)
where the equivalent junction conductance g_J is given as

g_J = \frac{2}{\Delta t} C_J(t). \qquad (13)

The discretized and linearized diode equations can be obtained as follows:

G_{Diode} \cdot V_{Diode} = I_{Diode\,eq}, \qquad (14)

where

G_{Diode} = \begin{bmatrix} g_R + g_J & -g_R - g_J & 0 \\ -g_R - g_J & g_R + g_J + \frac{1}{R_M + R_S} & -\frac{1}{R_M + R_S} \\ 0 & -\frac{1}{R_M + R_S} & \frac{1}{R_M + R_S} \end{bmatrix}, \qquad (15)

V_{Diode} = [v_A, \; v_{in}, \; v_K]^T, \ \text{and} \qquad (16)

I_{Diode\,eq} = [-i_{Req} - i_{Jeq}, \; i_{Req} + i_{Jeq}, \; 0]^T. \qquad (17)
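To make the per-device parallelism concrete before the implementation figure, here is a hedged CUDA sketch of how the linearized stamp (10)-(13) could map to one thread per diode; the DiodeArrays layout and the way parameters are passed are assumptions for illustration, not the authors' code:

```cuda
#include <math.h>

// Illustrative structure-of-arrays layout for n diodes in global memory.
struct DiodeArrays {
    double *vE, *qhist, *CJ;       // per-diode states
    double *gR, *iReq, *gJ, *iJeq; // per-diode stamp outputs
};

// One thread evaluates the linearized stamp of equations (10)-(13)
// for one diode; IS, tau, TM, VT, k1, k2 are shared model parameters.
__global__ void diodeStampKernel(DiodeArrays d, int nDiodes, double dt,
                                 double IS, double tau, double TM,
                                 double VT, double k1, double k2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nDiodes) return;

    double e = exp(d.vE[i] / VT);
    double denom = TM * (1.0 + k1 * dt / 2.0);

    // dynamic conductance, eq. (11)
    d.gR[i] = (k2 * IS * tau / (2.0 * VT)) * e;
    // equivalent reverse-recovery current source, eq. (10)
    d.iReq[i] = k2 * IS * tau * (e - 1.0) - d.qhist[i] / denom
              - 2.0 * d.vE[i] * d.gR[i];
    // equivalent junction conductance, eq. (13)
    d.gJ[i] = 2.0 / dt * d.CJ[i];
    // equivalent junction-capacitance current source, eq. (12);
    // the iJ(t) state is elided here for brevity
    d.iJeq[i] = -2.0 * d.vE[i] * d.gJ[i];
}
```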
Massive-Thread Parallel Implementation of Diode Model
[Figure: Massive-thread parallel implementation of the diode model. Kernel0 computes, for every diode in parallel (threads RJ1 ... RJn), the reverse-recovery and junction-capacitance quantities gR(t), iReq(t), gJ(t), iJeq(t), and the history term qhist(t) from vE(t) and qhist(t-Δt) held in global memory. Kernel1 performs the nonlinear solution (threads NS1 ... NSn) and updates the node voltages v(t); the two kernels repeat until convergence.]
Nonlinear Physics-Based IGBT Model
Physics-based IGBT Model
[Figure: Physics-based IGBT model: (a) device structure showing the p+/n-/p+ layers, depletion region, MOSFET and BJT portions, and oxide capacitances Coxs, Coxd; (b) analog equivalent circuit among gate (g), cathode (c), anode (a), drain (d), emitter (e), and base (b) nodes, with capacitances Cgs, Cgd, Cdsj, Ccer, Ceb, Cmult, current sources imos, imult, icss, ibss, and base resistance rb.]
IGBT nodal equation:
G_{IGBT} \cdot V_{IGBT} = I_{IGBT\,eq}, \qquad (18)

where

V_{IGBT} = [v_c \;\; v_g \;\; v_a \;\; v_d \;\; v_e]^T. \qquad (19)

Using the Newton-Raphson (N-R) method to solve the nonlinear matrix equation (18), \Delta V_{IGBT} is obtained to update V_{IGBT} iteratively:

G_{IGBT}^{(n)} \Delta V_{IGBT}^{(n+1)} = -I_{IGBT}^{(n)}, \qquad (20)

where

I_{IGBT}^{(n)} = G_{IGBT}^{(n)} V_{IGBT}^{(n)} - I_{IGBT\,eq}^{(n)}. \qquad (21)
[Fig. 6: Massive-thread parallel implementation of the physics-based IGBT model.]

Parallel massive-thread mapping: for a system containing n IGBTs, the massive-thread parallel implementation is shown in Fig. 6 (and described in Algorithm 2 of [1]). Among the 12 kernels involved in the module, Kernel0 and Kernel1 check the p-i-n junction and MOSFET junction voltage limitation within successive Newton-Raphson iterations; the intermediate parameters p0, Q, and M are processed in Kernel2, Kernel3, and Kernel4; the six nonlinear capacitors Cgd, Cgs, Cdsj, Cmult, Ceb, and Ccer are updated in Kernel5 and Kernel6; with those capacitances, the equivalent conductance GQ and parallel current source IQeq are computed in the remaining kernels.
The equivalent current vector in (18) expands as

I_{IGBT\,eq} = \begin{bmatrix} i_{c\,eq} \\ i_{g\,eq} \\ i_{a\,eq} \\ i_{d\,eq} \\ i_{e\,eq} \end{bmatrix} = \begin{bmatrix} i_{mult\,eq} + i_{Cmult\,eq} + i_{css\,eq} + i_{Ccer\,eq} + i_{Cgs\,eq} + i_{Cdsj\,eq} + i_{mos\,eq} \\ -i_{Cgs\,eq} + i_{Cdg\,eq} \\ -i_{T\,eq} \\ i_{Cceb\,eq} + i_{bss\,eq} - i_{mos\,eq} - i_{Cdg\,eq} - i_{Cdsj\,eq} - i_{mult\,eq} - i_{Cmult\,eq} \\ i_{css\,eq} - i_{bss\,eq} - i_{Ccer\,eq} - i_{Cceb\,eq} + i_{T\,eq} \end{bmatrix}.
The 5x5 conductance matrix G_{IGBT} is the Jacobian assembled from the linearized device conductances (the MOSFET terms g_{mos}, avalanche-multiplication terms g_{mult}, BJT collector and base terms g_{css}, g_{bss}, anode terms g_{T}, and the capacitor conductances g_{Cgs}, g_{Cdg}, g_{Cdsj}, g_{Cmult}, g_{Ccer}, g_{Ceb}); its full entries are given in (41) of [1].
Discretized and Linearized IGBT Model
[Figure: Discretized and linearized IGBT model: (a) device structure as in the previous figure; (b) companion circuit among gate, cathode, anode, and base nodes in which each nonlinear element is replaced by a conductance in parallel with an equivalent current source: g_Cgs / i_Cgs_eq, g_Cgd / i_Cgd_eq, g_mos_ds and g_mos_gs·vgs / i_mos_eq, g_Cdsj / i_Cdsj_eq, the avalanche terms g_mult_ds, g_mult_gs·vgs, g_mult_ae·vae, g_mult_eb·veb / i_mult_eq, g_Cmult_bc / i_Cmult_eq, g_Ceb / i_Ceb_eq, g_bss_bc·vbc, g_bss_eb / i_bss_eq, g_Ccer_bc·vbc / i_Ccer_eq, g_css_eb·veb, g_css_ae·vae, g_css_bc·vbc / i_css_eq, and g_T_bc·vbc, g_T_ae, g_T_eb·veb / i_T_eq.]
Massive-Thread Parallel Implementation of IGBT Model
[Figure: Massive-thread parallel implementation of the IGBT model. Global memory holds, for each IGBT, the node voltages V, capacitor charges Q, capacitor currents IQ, the collector-emitter redistribution capacitance Ccer, and the time-step size Δt. Kernel0 and Kernel1 limit the BJT and MOSFET junction voltages v_eb, v_bc, v_gs between N-R iterations; Kernel2, Kernel3, and Kernel4 compute the intermediate parameters p0, Q(t), and M(t); Kernel5 and Kernel6 update the nonlinear capacitances (Cgd, Cgs, Cdsj, Ceb, Ccer, Cmult) and their charges; Kernel7 evaluates the device currents iT, ibss, icss, imos, imult; Kernel8 assembles the per-IGBT Jacobian entries (IGBT-AE1 ... IGBT-AEn); and the remaining kernels solve for ΔV(t) and update V(t), iterating until ΔV(t) converges.]
Modular Multi-Level Converter (MMC) Circuit
[Figure: Three-phase MMC circuit between the +Vdc and -Vdc buses: each phase unit has upper and lower arms of n series submodules (SM1 ... SMn) with arm inductors La, Lb, Lc; each half-bridge SM contains switches S1/S2 with antiparallel diodes D1/D2 and capacitor Cm, carrying submodule current iSM; the converter feeds load currents ia, ib, ic through LL into sources vsa, vsb, vsc, with arm currents ia1/ia2, ib1/ib2, ic1/ic2.]
MMC Control System
[Figure: MMC control system: (a) outer control deriving the abc phase reference voltages for modulation from the dq voltages; (b) capacitor voltage averaging and balancing control, in which the averaging signal VA makes the average of all capacitor voltages in each phase track the command value, and the balancing signal VB uses phase-shifted carriers to force each capacitor's dc voltage to its reference.]
Behavioral IGBT Model
[Figure: Behavioral IGBT-based SM model and its massive-thread implementation. Each SM is reduced to a Thévenin equivalent (vSM, rSM): the ideal IGBT and ideal diode with threshold voltages vce0, vd0 and on-resistances rS_IGBT, rS_d form the two switch branches r1, r2, and the discretized capacitor is a history voltage vcap_h behind rcap. Kernel0 (threads TC1 ... TCn) updates each Thévenin circuit from the gate signal g and arm current iarm(t-Δt); node-voltage threads (NV1 ... NVn) then solve for the SM node voltages, and the capacitor history vcap_h(t) and current icap(t) are written back to global memory.]
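A hedged sketch of what the per-SM Thévenin update (Kernel0 above) might look like; the two-state switch logic, names, and the trapezoidal history update are simplifications for illustration, since the actual conduction logic also depends on the arm current direction:

```cuda
// Illustrative per-SM Thevenin update: one thread folds the submodule's two
// switch branches and the discretized capacitor into a Thevenin pair
// (vSM, rSM). rcap = dt/(2*C) for a trapezoidal capacitor companion model.
__global__ void smTheveninKernel(const int* gate, const double* iarm,
                                 double* vcap_h, double* vSM, double* rSM,
                                 int nSM, double rcap,
                                 double ron, double roff) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nSM) return;

    // two-state resistances for the switch branches
    double r1 = gate[i] ? ron : roff;   // S1/D1 branch (capacitor inserted)
    double r2 = gate[i] ? roff : ron;   // S2/D2 branch (capacitor bypassed)

    // capacitor branch: history source vcap_h in series with rcap and r1
    double rc = rcap + r1;
    // Thevenin of the (source + rc) branch in parallel with r2
    rSM[i] = rc * r2 / (rc + r2);
    vSM[i] = vcap_h[i] * r2 / (rc + r2);

    // trapezoidal history update from the current through the capacitor:
    // v_h(t) = v_h(t - dt) + 2 * rcap * icap(t - dt)
    double icap = gate[i] ? iarm[i] : 0.0;  // conducts only when inserted
    vcap_h[i] += 2.0 * rcap * icap;
}
```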
Numerical Solvers

Modules:
Massive-Thread Parallel Newton-Raphson Algorithm.
Novel Relaxation Based Jacobian Matrix Update.
Parallel Partial LU Decomposition.
Predictor-Corrector Variable Time-Stepping Scheme.
Results and Accuracy Comparison
Single-phase five-level physics-based MMC simulation results: comparison between GPU (top) and SaberRD (bottom).
[Fig. 17: Single-phase five-level physics-based MMC simulation results, GPU (top) vs. SaberRD (bottom): (a) Vout comparison; (b) zoomed Vout; (c) Vout FFT — 60 Hz component 502.7 V (GPU) vs. 506.6 V (SaberRD), 2.5 kHz component 39.4 V vs. 45.3 V; (d) Iload comparison; (e) zoomed Iload; (f) Iload FFT — 60 Hz 155.5 A vs. 156.0 A, 2.5 kHz 3.55 A vs. 3.81 A; (g) arm capacitor voltages Vcap1-Vcap4; (h) IGBT gate-on waveforms — tr = 0.19 μs, td(on) = 1.09 μs (GPU) vs. tr = 0.20 μs, td(on) = 1.12 μs (SaberRD); (i) IGBT gate-off waveforms — tf = 0.36 μs, td(off) = 3.1 μs (GPU) vs. tf = 0.35 μs, td(off) = 3.09 μs (SaberRD).]
TABLE II: DEVICE-LEVEL SWITCHING TIME AND POWER DISSIPATION FOR THE IGBT-DIODE UNIT

Switching time (μs)     SaberRD   GPU   |  Power dissipation (W)   SaberRD   GPU
t_d(on)  (IGBT)           1.12    1.09  |  P_on   (IGBT)            112.31   113.02
t_r      (IGBT)           0.20    0.19  |  P_off  (IGBT)             74.01    74.84
t_d(off) (IGBT)           3.09    3.10  |  P_cond (IGBT)            287.52   289.06
t_f      (IGBT)           0.35    0.36  |  P_cond (Diode)             7.48     7.45
t_rr     (Diode)          0.66    0.64  |  P_rr   (Diode)             9.95    10.07
Table II lists the switching times and power dissipation of the lower IGBT-diode unit in the SM, referred to as S2 and D2, respectively, in Fig. 7, during switching and conducting states, calculated using

P_{IGBT} = \frac{\int v_{ce}(t)\, i_c(t) \, dt}{T_s} \qquad (68)

P_{Diode} = \frac{\int v_f(t)\, i_f(t) \, dt}{T_s} \qquad (69)

for the IGBT and diode, respectively, where T_s is the switching period.
The simulation results of the 3-phase 11-level MMC circuit with nonlinear physics-based models, including output voltages and load currents, are shown in Fig. 18(a) and compared with the results of the same MMC circuit with behavior-based models in Fig. 18(b). The waveforms of the physics-based model have much more detail than those from the behavior-based model because of the nonlinear capacitances and dynamic current sources in the former.
Three-phase 11-level physics-based and behavior-based MMC simulation comparison.
[Fig. 18: Three-phase 11-level MMC simulation: 3-phase Vout (kV), 3-phase Iload (A), and upper/lower arm Vcap (V) for (a) the physics-based model and (b) the behavior-based model.]
201-level behavior-based MMC output voltages, load currents, and arm capacitor voltages.
[Fig. 19: 201-level behavior-based MMC: (a) Vout; (b) Iload; (c) upper/lower arm Vcap; (d)-(f) zoomed views of Vout, Iload, and Vcap.]
Execution Time and Speedup Comparison
TABLE IV: SABERRD, CPU, AND GPU EXECUTION TIMES FOR THE ONE-PHASE NONLINEAR PHYSICS-BASED MMC

                 Execution time (s)                      Speedup
N_L   N_SM   SaberRD     CPU    GPU_K   GPU_P       GPU_K   GPU_P
 2      2       53       50.8   143.1    70.9        0.35    0.72
 3      4      100       99.9   162.4    81.6        0.62    1.22
 4      6      149      143.8   175.5    87.4        0.82    1.65
 5      8      202      192.5   184.7    90.2        1.04    2.13
 6     10       -       240.9   195.2    94.1        1.23    2.56
 7     12       -       297.3   212.7   101.3        1.40    2.93
 8     14       -       351.8   227.5   107.5        1.55    3.27
 9     16       -       411.5   237.9   111.9        1.73    3.68
10     18       -       481.5   250.2   115.6        1.92    4.17
11     20       -       557.8   268.5   121.2        2.08    4.60
[Fig. 20: One-phase nonlinear physics-based MMC circuit execution time and speedup comparison.]

Since the run time of the CPU program was close to that of SaberRD for the cases with available numbers, the CPU results were used to calculate speedup in place of the missing SaberRD results for circuits with more than eight SMs. As shown in Fig. 20, although the GPU speedups are less than 1 at the lower MMC levels (up to level 4 for GPU_K and level 2 for GPU_P), they increase steadily with circuit scale, approaching 2.08 for GPU_K and 4.60 for GPU_P. Benefiting from the higher clock and memory data rate listed in Table III, GPU_P doubles the speedup of GPU_K.

Although some speedups are shown in the simulations of single-phase MMC circuits, they are not representative of the full computational capability of GPUs for MMC simulation. In order to feed more data to the GPUs to realize their compute potential, 3-phase physics-based MMC circuits from 2-level to 11-level are simulated for 100 ms with the variable time-stepping method; the execution times are listed in Table V. For three-phase systems, SaberRD has difficulty converging beyond the two-level MMC circuit and is therefore absent from the comparison. In addition, two Pascal GPUs (GTX 1080) are included, since there is sufficient compute load to be processed. When the 3-phase MMC circuit reaches 11 levels with 60 SMs, the acceleration of the GPUs is obvious: 5.61x for GPU_K and 9.58x for GPU_P. Moreover, the 2-GPU_P platform obtains a speedup of up to 14.67x, almost triple the performance of the Kepler GPU, as shown in Fig. 21.
TABLE V: CPU AND GPU EXECUTION TIMES OF THE THREE-PHASE NONLINEAR PHYSICS-BASED MMC CIRCUIT

                 Execution time (s)                         Speedup
N_L   N_SM     CPU    GPU_K   GPU_P   2GPU_P       GPU_K   GPU_P   2GPU_P
 2      6     150.8   172.2    82.4     79.9        0.88    1.83    1.89
 3     12     285.9   185.9    93.4     83.2        1.54    3.06    3.44
 4     18     423.5   206.7   103.2     88.7        2.05    4.10    4.77
 5     24     600.3   227.2   115.9     95.8        2.64    5.18    6.27
 6     30     774.1   251.5   129.2    101.5        3.08    5.99    7.63
 7     36     991.1   286.2   148.4    109.2        3.46    6.68    9.08
 8     42    1249.2   302.3   164.2    117.8        4.13    7.61   10.60
 9     48    1495.4   322.9   182.3    127.2        4.63    8.20   11.76
10     54    1782.4   357.2   202.5    134.4        4.99    8.80   13.26
11     60    2137.3   381.1   223.2    145.7        5.61    9.58   14.67
[Fig. 21: Three-phase nonlinear physics-based MMC circuit execution times and speedup comparison.]
TABLE VI: CPU AND GPU EXECUTION TIMES OF THE THREE-PHASE BEHAVIOR-BASED MMC CIRCUIT

                           Execution time (s)                        Speedup
N_L   N_SM   N_IGBT      CPU    GPU_K   GPU_P   2GPU_P      GPU_K   GPU_P   2GPU_P
 11     60     120       37.2    35.5    14.1    14.3        1.05    2.64    2.60
 21    120     240       67.4    35.9    14.4    14.5        1.88    4.68    4.65
 41    240     480      129.2    36.2    14.5    14.6        3.57    8.91    8.85
 51    300     600      156.0    36.6    14.6    14.7        4.26   10.68   10.61
 81    480     960      245.8    38.2    14.9    14.9        6.43   16.50   16.50
101    600    1200      302.8    40.9    16.4    15.1        7.40   18.46   20.05
126    750    1500      376.2    41.9    16.8    15.4        8.98   22.39   24.43
151    900    1800      453.2    43.1    17.2    15.8       10.52   26.35   28.68
176   1050    2100      526.2    46.2    18.8    16.8       11.39   27.99   31.32
201   1200    2400      595.6    47.3    19.0    17.1       12.59   31.35   34.83
251   1500    3000      755.2    49.7    20.1    17.6       15.20   37.57   42.91
301   1800    3600      894.5    53.2    21.4    18.5       16.81   41.80   48.35
401   2400    4800     1188.2    56.1    23.2    19.9       21.18   51.22   59.71
501   3000    6000     1495.6    62.5    25.5    21.6       23.93   58.65   69.24
The specific structure of the MMC circuit, consisting of upper and lower arms, simplifies the task management and computational load balancing of the 2-GPU_P platform; only the data of one common node need to be exchanged between the two MMC arms.

Since the Jacobian matrix is prone to singularity as the voltage levels increase, the nonlinear physics-based MMC circuits become hard to converge; therefore, MMC circuits larger than 11-level are simulated with the behavior-based model, from 11-level to 501-level, for 1 s with a 10 μs fixed time-step method; the execution times are listed in Table VI.
[Fig. 22: Three-phase behavior-based MMC circuit execution times and speedup comparison.]
Conclusion: Power Electronic Circuit Simulation
System-level and device-level simulations provide different foci in power electronic circuit simulation.
Adopting nonlinear device-level models to construct large-scale converter circuits gives an opportunity to study detailed device interactions while simultaneously addressing the challenge of large-scale system solution.
The cost is a high computational burden due to the complex system model. Sequential implementation results in execution times that grow with the number of converter levels, which makes the simulation impractical.
Massively parallel algorithms and GPU implementation of device-level models and the corresponding numerical solvers provide significant acceleration while improving convergence and accuracy.
Thank you!
IGBT Model Equations
Currents. Steady-state collector current:

i_{css} = \frac{i_T}{1+b} + \frac{4bD_p Q}{(1+b)W^2}. \qquad (22)

The anode current:

i_T = \frac{v_{ae}}{r_b}. \qquad (23)

Base resistance:

r_b = \begin{cases} \dfrac{W}{q\mu_n A N_B} & v_{eb} \le 0 \\[1.5ex] \dfrac{W}{q\mu_{eff} A n_{eff}} & v_{eb} > 0 \end{cases}. \qquad (24)
BJT current:

i_{bss} = \frac{Q}{\tau_{HL}} + \frac{4Q^2 N_{scl}^2 \, i_{sne}}{Q_B^2 \, n_i^2}. \qquad (25)

MOSFET current:

i_{mos} = \begin{cases} 0 & v_{gs} < v_T \\[1ex] K_p (v_{gs} - v_T) v_{ds} - \dfrac{K_p v_{ds}^2}{2} & v_{ds} \le v_{gs} - v_T \\[1.5ex] \dfrac{K_p (v_{gs} - v_T)^2}{2} & v_{ds} > v_{gs} - v_T \end{cases}. \qquad (26)

Avalanche multiplication current:

i_{mult} = (M-1)(i_{mos} + i_{css} + i_{c\,cer}) + M i_{gen}. \qquad (27)

Charges and capacitances. Gate-source charge:

Q_{gs} = C_{gs} v_{gs}. \qquad (28)
Gate-drain capacitance:

C_{gd} = \begin{cases} C_{oxd} & v_{ds} \le v_{gs} - v_{Td} \\[1.5ex] \dfrac{C_{gdj} C_{oxd}}{C_{gdj} + C_{oxd}} & v_{ds} > v_{gs} - v_{Td} \end{cases}. \qquad (29)

Gate-drain overlap depletion capacitance:

C_{gdj} = \frac{A_{gd} \varepsilon_{si}}{W_{gdj}}. \qquad (30)

Gate-drain charge:

Q_{gd} = \begin{cases} C_{oxd} v_{dg} & v_{ds} \le v_{gs} - v_{Td} \\[1.5ex] \dfrac{q N_B \varepsilon_{si} A_{gd}^2}{C_{oxd}} \left[ \dfrac{C_{oxd} W_{gdj}}{\varepsilon_{si} A_{gd}} - \ln\!\left(1 + \dfrac{C_{oxd} W_{gdj}}{\varepsilon_{si} A_{gd}}\right) \right] - C_{oxd} v_{Td} & v_{ds} > v_{gs} - v_{Td} \end{cases}. \qquad (31)
Drain-source depletion capacitance:

C_{dsj} = \frac{(A - A_{gd}) \varepsilon_{si}}{W_{dsj}}. \qquad (32)

Drain-source depletion charge:

Q_{ds} = A_{ds} \sqrt{2\varepsilon_{si} (v_{ds} + 0.6)\, q N_{scl}}. \qquad (33)

The emitter-base capacitance C_{eb} is solved from \partial Q_{eb} / \partial v_{eb} as

C_{eb} = -\frac{q N_B \varepsilon_{si} A^2}{Q - Q_{bi}}. \qquad (34)
Collector-emitter redistribution capacitance:

C_{cer} = \frac{Q C_{bcj}}{3 Q_B}. \qquad (35)

The carrier multiplication charge and capacitance:

Q_{mult} = (M-1) Q_{ce} \quad \text{and} \quad C_{mult} = (M-1) C_{cer}. \qquad (36)

The discretized current updates for the N-R iteration are

i_{mos}^{n+1} = i_{mos\,eq}^{n} + g_{mos\,gs}^{n} v_{gs}^{n+1} + g_{mos\,ds}^{n} v_{ds}^{n+1}, \qquad (37)

i_{T}^{n+1} = i_{T\,eq}^{n} + g_{T\,ae}^{n} v_{ae}^{n+1} + g_{T\,bc}^{n} v_{bc}^{n+1} + g_{T\,eb}^{n} v_{eb}^{n+1}, \qquad (38)

i_{css}^{n+1} = i_{css\,eq}^{n} + g_{css\,bc}^{n} v_{bc}^{n+1} + g_{css\,ae}^{n} v_{ae}^{n+1} + g_{css\,eb}^{n} v_{eb}^{n+1}, \qquad (39)

i_{bss}^{n+1} = i_{bss\,eq}^{n} + g_{bss\,eb}^{n} v_{eb}^{n+1} + g_{bss\,bc}^{n} v_{bc}^{n+1}, \qquad (40)

i_{mult}^{n+1} = i_{mult\,eq}^{n} + g_{mult\,ds}^{n} v_{ds}^{n+1} + g_{mult\,gs}^{n} v_{gs}^{n+1} + g_{mult\,ae}^{n} v_{ae}^{n+1} + g_{mult\,eb}^{n} v_{eb}^{n+1}. \qquad (41)
Parallel N-R
To find the solution of a nonlinear system F(X) of k unknown variables, utilizing the Jacobian matrix J_F(X) for linear approximation gives the following equation:

0 = F(X^n) + J_F(X^n)(X^{n+1} - X^n). \qquad (42)

The Jacobian matrix J_F(X) is the k x k matrix of first-order partial derivatives of F, given as

J_F = \frac{dF}{dX} = \begin{bmatrix} \frac{\partial F_1}{\partial X_1} & \cdots & \frac{\partial F_1}{\partial X_k} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_k}{\partial X_1} & \cdots & \frac{\partial F_k}{\partial X_k} \end{bmatrix}. \qquad (43)

Solving for the root of F(X) is numerically replaced by solving (42) for (X^{n+1} - X^n) and updating X^{n+1} from X^n.
The solution process is repeated until ||X^{n+1} - X^n|| converges. According to KCL,

\Sigma I(V) = 0, \qquad (44)

where I(V) is the sum of currents leaving each node. Applying (42) to (44) results in

J_{V^n}(V^{n+1} - V^n) = -\Sigma I^n. \qquad (45)

The voltage vector for the solution of KCL at each node is given as

V^n = [v_c^{(n)} \;\; v_g^{(n)} \;\; v_a^{(n)} \;\; v_d^{(n)} \;\; v_e^{(n)}]^T. \qquad (46)

Applying the N-R iteration results in

G_{IGBT}^{(n)}(V^{n+1} - V^n) = -I^n, \qquad (47)

where the conductance matrix G_{IGBT}^{(n)} is the Jacobian matrix J_{V^n}, and the right-hand-side vector is given as

I^n = [i_c^{(n)} \;\; i_g^{(n)} \;\; i_a^{(n)} \;\; i_d^{(n)} \;\; i_e^{(n)}]^T. \qquad (48)
Massive-Thread Parallel N-R
[Figure: Massive-thread parallel implementation of the Newton-Raphson method, using four kernels. Kernel0 (current units CU1 ... CUn) updates I^n, the sum of currents leaving each node; Kernel1 (Jacobian matrix units JM1 ... JMn) updates the Jacobian (conductance) matrix G^n. All parameters are copied to shared memory for Kernel2, which solves G^n ΔV^n = -I^n by Gaussian elimination (Gaussian1 ... Gaussiann). Kernel3 (voltage units VU1 ... VUn) then updates V^{n+1} and checks the convergence criterion ΔV^n ≤ ε; the iterations continue until V converges or the maximum number of iterations is reached.]
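A hedged host-side sketch of this four-kernel iteration loop; the kernel bodies are elided, and the NRState container, names, and convergence-flag protocol are illustrative assumptions rather than the authors' API:

```cuda
#include <cuda_runtime.h>

struct NRState { double *V, *I, *G, *dV; int *converged; };

__global__ void currentKernel(NRState s, int nDev)  {}  // Kernel0: CUs build I^n
__global__ void jacobianKernel(NRState s, int nDev) {}  // Kernel1: JMs build G^n
__global__ void gaussianKernel(NRState s, int nDev) {}  // Kernel2: solve G dV = -I
__global__ void voltageKernel(NRState s, int nDev,      // Kernel3: VUs update V^n,
                              double eps)           {}  //   clear flag if |dV| > eps

void newtonRaphsonStep(NRState s, int nDev, double eps, int maxIter) {
    int block = 128, grid = (nDev + block - 1) / block;
    for (int it = 0; it < maxIter; ++it) {
        cudaMemset(s.converged, 1, sizeof(int));        // nonzero = converged
        currentKernel<<<grid, block>>>(s, nDev);
        jacobianKernel<<<grid, block>>>(s, nDev);
        gaussianKernel<<<grid, block>>>(s, nDev);       // per-device solve
        voltageKernel<<<grid, block>>>(s, nDev, eps);   // any thread over eps
                                                        // zeroes the flag
        int done;
        cudaMemcpy(&done, s.converged, sizeof(int), cudaMemcpyDeviceToHost);
        if (done) break;                                // all units met eps
    }
}
```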
Relaxation Based Jacobian Update
Although the sparse matrix G_SM^orig is irregular, it can be reshaped by eliminating two elements, G_{6,11} and G_{11,6}, brought about by the capacitance, using a relaxation algorithm. After the transformation, the linearized equation of an SM,

G_{SM}^{orig(n)} \cdot \Delta V^{n+1} = -I^{orig(n)}, \qquad (51)

is updated to

G_{SM}^{n} \cdot \Delta V^{n+1} = -I^{n}, \qquad (52)

where G_SM is given in (50) below and I^n comes from I^{orig(n)}, whose 6th and 11th elements are adjusted by G_{6,11}Δv_{11}^n and G_{11,6}Δv_6^n using known values from the previous iteration. Although an outer Gauss-Jacobi loop over the N-R iterations is needed to guarantee the convergence of Δv^n, the updated G_SM has a better bordered-block pattern, which benefits the parallel implementation on the GPU.
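Written out for row 6 (row 11 is symmetric), the relaxation simply moves the coupling term to the right-hand side using the previous iterate; this is a restatement of (51)-(52), not an additional result:

```latex
\begin{aligned}
\sum_{j=1}^{6} G_{6,j}\,\Delta v_j^{\,n+1}
  &= -I_6^{\mathrm{orig}(n)} - G_{6,11}\,\Delta v_{11}^{\,n},\\
G_{11,1}\,\Delta v_1^{\,n+1} + \sum_{j=7}^{11} G_{11,j}\,\Delta v_j^{\,n+1}
  &= -I_{11}^{\mathrm{orig}(n)} - G_{11,6}\,\Delta v_{6}^{\,n}.
\end{aligned}
```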
MMC Sub-Module Conductance
Applying the physics-based IGBT and power diode modules described above, each SM has five nodes for the IGBT and one more internal node for the power diode, resulting in the original SM Jacobian matrix (49); the relaxed matrix is (50).
G_{SM}^{orig} = \begin{bmatrix}
G_{1,1} & G_{1,2} & G_{1,3} & G_{1,4} & G_{1,5} & G_{1,6} & G_{1,7} & G_{1,8} & G_{1,9} & G_{1,10} & G_{1,11} \\
G_{2,1} & G_{2,2} & G_{2,3} & G_{2,4} & G_{2,5} & G_{2,6} & & & & & \\
G_{3,1} & G_{3,2} & G_{3,3} & G_{3,4} & G_{3,5} & G_{3,6} & & & & & \\
G_{4,1} & G_{4,2} & G_{4,3} & G_{4,4} & G_{4,5} & G_{4,6} & & & & & \\
G_{5,1} & G_{5,2} & G_{5,3} & G_{5,4} & G_{5,5} & G_{5,6} & & & & & \\
G_{6,1} & G_{6,2} & G_{6,3} & G_{6,4} & G_{6,5} & G_{6,6} & & & & & G_{6,11} \\
G_{7,1} & & & & & & G_{7,7} & G_{7,8} & G_{7,9} & G_{7,10} & G_{7,11} \\
G_{8,1} & & & & & & G_{8,7} & G_{8,8} & G_{8,9} & G_{8,10} & G_{8,11} \\
G_{9,1} & & & & & & G_{9,7} & G_{9,8} & G_{9,9} & G_{9,10} & G_{9,11} \\
G_{10,1} & & & & & & G_{10,7} & G_{10,8} & G_{10,9} & G_{10,10} & G_{10,11} \\
G_{11,1} & & & & & G_{11,6} & G_{11,7} & G_{11,8} & G_{11,9} & G_{11,10} & G_{11,11}
\end{bmatrix} \qquad (49)

G_{SM} is identical except that the two coupling entries G_{6,11} and G_{11,6} are removed:

G_{SM} = \begin{bmatrix}
G_{1,1} & G_{1,2} & G_{1,3} & G_{1,4} & G_{1,5} & G_{1,6} & G_{1,7} & G_{1,8} & G_{1,9} & G_{1,10} & G_{1,11} \\
G_{2,1} & G_{2,2} & G_{2,3} & G_{2,4} & G_{2,5} & G_{2,6} & & & & & \\
G_{3,1} & G_{3,2} & G_{3,3} & G_{3,4} & G_{3,5} & G_{3,6} & & & & & \\
G_{4,1} & G_{4,2} & G_{4,3} & G_{4,4} & G_{4,5} & G_{4,6} & & & & & \\
G_{5,1} & G_{5,2} & G_{5,3} & G_{5,4} & G_{5,5} & G_{5,6} & & & & & \\
G_{6,1} & G_{6,2} & G_{6,3} & G_{6,4} & G_{6,5} & G_{6,6} & & & & & \\
G_{7,1} & & & & & & G_{7,7} & G_{7,8} & G_{7,9} & G_{7,10} & G_{7,11} \\
G_{8,1} & & & & & & G_{8,7} & G_{8,8} & G_{8,9} & G_{8,10} & G_{8,11} \\
G_{9,1} & & & & & & G_{9,7} & G_{9,8} & G_{9,9} & G_{9,10} & G_{9,11} \\
G_{10,1} & & & & & & G_{10,7} & G_{10,8} & G_{10,9} & G_{10,10} & G_{10,11} \\
G_{11,1} & & & & & & G_{11,7} & G_{11,8} & G_{11,9} & G_{11,10} & G_{11,11}
\end{bmatrix} \qquad (50)
Partial LU Decomposition for MMC
For a k-SM MMC circuit, each cascade node connects to two nonlinear IGBT-diode units; the complete structure of the Jacobian matrix G_MMC is shown in Fig. 10(a).

[Fig. 10: Jacobian matrices for the MMC circuit: (a) original G_MMC; (b) updated G*_MMC using relaxation.]

The connections at Node 1 and Node 11 of every G_SM are relaxed to decouple the large Jacobian matrix, giving the updated Jacobian matrix G*_MMC of Fig. 10(b). The Newton-Raphson linearized equation,

G_{MMC}^{n} \cdot \Delta V^{n+1} = -I^{n}, \qquad (53)

is accordingly updated to

G_{MMC}^{*(n)} \cdot \Delta V^{n+1} = -I^{*(n)}, \qquad (54)

where -I^{*(n)} is -I^n adjusted with previous-iteration values. Processing the LU decompositions of all A blocks in G*_MMC, we obtain

l_j \cdot u_j = A_j \quad (j = 1, 2, \cdots, 2k). \qquad (55)

[Fig. 11: Partial LU decomposition for G*_MMC.]

Then the border vectors f_j and g_j in Fig. 11 are found from

f_j \cdot u_j = c_j, \qquad l_j \cdot g_j = d_j. \qquad (56)

Since l_j and u_j are lower and upper triangular matrices, f_j and g_j can be solved with forward and backward substitutions. Finally, the elements at the connecting nodes, h_i (i = 1, 2, \ldots, k) in Fig. 11, are calculated with the e_i in Fig. 10 as

h_i = e_i - f_{2i-1} \cdot g_{2i-1} - f_{2i} \cdot g_{2i}. \qquad (57)

The proposed partial LU decomposition can be computed in parallel due to the decoupled structure of the updated Jacobian matrix G*_MMC: the LU decomposition of each A_j, the calculation of the border vectors f_j, g_j, and the updating of the connecting-node elements h_i all proceed concurrently.
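The block operations in (55)-(57) are simple dense-block routines; below is a serial C++ restatement for one block (shown serial for clarity, whereas the paper runs each j and i as a parallel GPU thread). The Block type, the fixed 11x11 block size, and the absence of pivoting are assumptions of the sketch:

```cpp
#include <cstddef>

constexpr int B = 11; // one SM block (11 nodes), per (49)-(50)

struct Block { double a[B][B]; };

// In-place Doolittle LU (no pivoting): A_j -> l_j (unit lower) and u_j, eq. (55)
void luInPlace(Block& A) {
    for (int k = 0; k < B; ++k)
        for (int i = k + 1; i < B; ++i) {
            A.a[i][k] /= A.a[k][k];                 // multiplier stored in l
            for (int j = k + 1; j < B; ++j)
                A.a[i][j] -= A.a[i][k] * A.a[k][j]; // update trailing entries
        }
}

// Solve the row system f * u = c for the border vector f, eq. (56)
void solveRowBorder(const Block& LU, const double c[B], double f[B]) {
    for (int j = 0; j < B; ++j) {
        double s = c[j];
        for (int k = 0; k < j; ++k) s -= f[k] * LU.a[k][j];
        f[j] = s / LU.a[j][j];                      // u diagonal from LU
    }
}

// Solve the column system l * g = d for the border vector g, eq. (56)
void solveColBorder(const Block& LU, const double d[B], double g[B]) {
    for (int i = 0; i < B; ++i) {
        double s = d[i];
        for (int k = 0; k < i; ++k) s -= LU.a[i][k] * g[k];
        g[i] = s;                                   // unit diagonal of l
    }
}

// Connecting-node update, eq. (57): h_i = e_i - f_{2i-1}.g_{2i-1} - f_{2i}.g_{2i}
double connectNode(double e, const double* f1, const double* g1,
                   const double* f2, const double* g2) {
    double s = e;
    for (int k = 0; k < B; ++k) s -= f1[k] * g1[k] + f2[k] * g2[k];
    return s;
}
```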
Blocked Forward and Backward Substitutions
After obtaining the semi-lower and semi-upper triangular matrices from the partial LU decomposition, blocked forward and backward substitutions are used to obtain the solution of the linear system. For the equation LU · x = b, defining

U \cdot x = y, \qquad (58)

we obtain

L \cdot y = b. \qquad (59)

[Fig. 12: Blocked linear MMC system solver: (a) blocked forward substitution; (b) blocked backward substitution.]

The blocked forward substitution shown in Fig. 12(a) is used to solve for y in (59). All elements of y_{2i-1} and all elements of y_{2i} except the last one can be solved directly; the last element of the previous block, y_{2i-2}, is then obtained from the solved elements of y_{2i-1} and y_{2i}; and so forth, the missing element of y_{2i} is obtained from the solution of the next block. The last block, y_{2k}, can be fully solved since it has no overlapping border vector f. Because of the decoupled structure of L, the blocked forward substitution can be accomplished with GPU-based parallelism. Similarly, the solution of (58) is obtained by the blocked backward substitution shown in Fig. 12(b) using y_j: first, all connecting elements h_i are solved in parallel; second, the border vectors g_j are eliminated by updating the right-hand-side vector; finally, backward substitution is performed on all blocks U_j in parallel.
Predictor-Corrector Variable Time-Stepping Scheme
[Flowchart: Predictor-corrector variable time-stepping scheme. Start by initializing t, Δt, and the LTE. At each step, update t. If the last accepted point t_last was a discontinuity, skip the predictor and apply the corrector N-R scheme with a backward Euler approximation for V_corr; otherwise run the predictor scheme (forward Euler approximation for V_pred) followed by the corrector N-R scheme with a trapezoidal approximation for V_corr. In either case the corrector iterates, updating the error between the latest two iterations, until error < tolerance. If t itself is a discontinuity point, reset Δt and continue. Otherwise update the local truncation error (LTE): if LTE ≥ tolerance, reduce Δt and repeat the step; if LTE < tolerance, update t_last, increase Δt, and proceed until t = t_end.]
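In code form, the loop could look like the following hedged C++ sketch; the helper hooks (forwardEulerPredict, correctorNR, localTruncError, dtMin/dtMax) and the halving/doubling step-size factors are illustrative stand-ins for the scheme in the flowchart:

```cpp
#include <algorithm>

// Placeholder hooks -- in the real simulator these run the GPU kernels.
static void   forwardEulerPredict(double, double)  {}
static void   correctorNR(double, double, double)  {}
static bool   isDiscontinuity(double)              { return false; }
static double localTruncError()                    { return 0.0; }
static double dtMin()                              { return 1e-9; }
static double dtMax()                              { return 1e-6; }

void simulate(double tEnd, double dt, double tol, double lteTol) {
    double t = 0.0, tLast = 0.0;
    bool lastWasDiscontinuity = false;
    while (t < tEnd) {
        t += dt;
        if (lastWasDiscontinuity) {
            correctorNR(t, dt, tol);        // backward Euler, no predictor
        } else {
            forwardEulerPredict(t, dt);     // V_pred
            correctorNR(t, dt, tol);        // trapezoidal V_corr, N-R to tol
        }
        if (isDiscontinuity(t)) {
            dt = dtMin();                   // reset dt after a switching event
            tLast = t;
            lastWasDiscontinuity = true;
            continue;
        }
        double lte = localTruncError();     // predictor-corrector difference
        if (lte >= lteTol) {
            t -= dt;                        // reject the step
            dt *= 0.5;                      // reduce dt and retry
        } else {
            tLast = t;                      // accept: advance and grow dt
            lastWasDiscontinuity = false;
            dt = std::min(dt * 2.0, dtMax());
        }
    }
}
```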