Distribution Power Flow Model
• Forward / Backward Sweep
• Small complex matrix-vector multiplication
Monte Carlo Simulation:
Distribution System Probabilistic Monitoring System
Problems from Power System
• Uncertainties in electric power distribution system:
• Wind power, DG, PHEV, responsive load, etc.
• Lack of real time measurements:
• AMI and smart metering may help, but are still at a preliminary stage.
• Probabilistic Model for Uncertainties:
• Wind forecast model, EV pattern, Load profile.
• A Probabilistic Solution of Distribution System
Advances in High Performance Computing
• Fully utilizing the available computing power is a difficult problem
• No free speedup: parallelism, memory, and special instructions must be exploited explicitly
• Requires knowledge of both the specific domain and computer architecture
• Potential Benefit: Moore’s law!
High Performance Computing Enabled Power System Solution
• Monte Carlo based Probabilistic Solution of Distribution System:
• “Gold standard” for probabilistic power flow
• A problem well suited to modern computing platforms
• Extensible to contingency analysis, steady state time-series…
An Affordable Supercomputing Center for Distribution Substation
Performance Result: High Performance Computing Engine
Monte Carlo Results and Web User Interface
Motivation and Background
A Multi-core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems
Tao Cui and Franz Franchetti Email: [email protected]
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
Methods & System
Latest Results
Conclusions & Future Work
Year 2010: workstation + GPU accelerator (Core i7 + Tesla C1060), 1 Tflop/s peak performance, $5,000 class.
Images: Nvidia, Dell
Program optimization / parallelization:
• Enables fast computation of a large number of power flow solutions.
Performance can be further increased on new platforms:
• Tracking new developments in CPU micro-architecture.
• GPU: smaller, less powerful cores, but many more of them.
Applications of a fast distribution power flow solver:
• Probabilistic monitoring of the distribution system
• Fast time-series solution; statistical analysis…
Active Role of Distribution System in Smart Grid
Figure: Monte Carlo algorithm. Input with randomness (x), processed by a physically deterministic method (load flow), gives an output result with probability features (y).
*Original figure by Andrew Hsu (EESG)
$$I^{abc}_{n} = c\,V^{abc}_{m} + d\,I^{abc}_{m}$$
$$V^{abc}_{m} = A\,V^{abc}_{n} - B\,I^{abc}_{m}$$
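The two branch relations of the forward/backward sweep, I_n = c V_m + d I_m and V_m = A V_n - B I_m, can be sketched in C as below. This is only an illustration: the `cplx` struct, the row-major 9-element matrix layout, and the function names `branch_current` and `branch_voltage` are assumptions of the sketch, and the signs follow the standard distribution line-segment model rather than anything legible on the poster.

```c
#include <assert.h>

/* Complex number as a plain struct (hypothetical helper type). */
typedef struct { double re, im; } cplx;

static cplx c_add(cplx a, cplx b) { cplx r = { a.re + b.re, a.im + b.im }; return r; }
static cplx c_sub(cplx a, cplx b) { cplx r = { a.re - b.re, a.im - b.im }; return r; }
static cplx c_mul(cplx a, cplx b) {
    cplx r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

/* y = M * x for a row-major 3x3 complex matrix and a 3-phase (abc) vector. */
static void matvec3(const cplx M[9], const cplx x[3], cplx y[3]) {
    for (int r = 0; r < 3; r++) {
        cplx acc = { 0.0, 0.0 };
        for (int k = 0; k < 3; k++) acc = c_add(acc, c_mul(M[3 * r + k], x[k]));
        y[r] = acc;
    }
}

/* Current relation: I_n = c * V_m + d * I_m */
void branch_current(const cplx c[9], const cplx d[9],
                    const cplx Vm[3], const cplx Im[3], cplx In[3]) {
    cplx t1[3], t2[3];
    assert(c && d && Vm && Im && In);
    matvec3(c, Vm, t1);
    matvec3(d, Im, t2);
    for (int p = 0; p < 3; p++) In[p] = c_add(t1[p], t2[p]);
}

/* Voltage relation: V_m = A * V_n - B * I_m */
void branch_voltage(const cplx A[9], const cplx B[9],
                    const cplx Vn[3], const cplx Im[3], cplx Vm[3]) {
    cplx t1[3], t2[3];
    assert(A && B && Vn && Im && Vm);
    matvec3(A, Vn, t1);
    matvec3(B, Im, t2);
    for (int p = 0; p < 3; p++) Vm[p] = c_sub(t1[p], t2[p]);
}
```

Each call performs two small complex matrix-vector products, which is why the cost of that kernel dominates the solver.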
Aggressive Code Optimizations on Monte Carlo Solver
• Data structure optimization
• Algorithm level optimization
• SIMD vectorization
• Multithreading
Squeezing Computation Power out of the Computer Architecture.
Push Performance to the Hardware Peak.
Figure: Performance on a 4-core Core2Extreme, 2.66 GHz. Y-axis: speed (Gflop/s), log scale from 0.01 to 10. X-axis: node (bus) numbers, 4 to 256. Legend labels: C++ STL, Speed Optimized C++ STL, Intel Fully Optimized, C Array, C Pattern, SSE, SSE+Pthread. Annotated speedups: ~500x and >40x.
Figure: multithreaded double-buffer schedule within one real-time interval (SCADA interval).
Computing Thd 1 to N: RNG & load flow in Buf A1 to AN, then RNG & load flow in Buf B1 to BN.
Scheduling Thd 0: KDE in all Buf Bs, result out; sync signal out; switch buffer A, B; KDE in all Buf As, result out.
RNG: Random Number Generator
KDE: Kernel Density Estimation
Figure: Performance on Different Machines for the IEEE37 Test Feeder. Y-axis: speed (Gflop/s), 0 to 180. Bars per machine: optimized scalar, SIMD (SSE or AVX), multi-core. Machines: Core2Extreme 2.66 GHz (4-core, SSE); Xeon X5680 3.33 GHz (6-core, SSE); Core i5-2400 3.10 GHz (4-core, AVX); 2 Xeon7560 2.27 GHz (16-core, SSE).
Figure: Monte Carlo pipeline. Generate random samples; solve load flows (load flow solvers running in 2~64 parallel threads); estimate density.
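The three pipeline steps (generate random samples, solve load flows, estimate density) can be sketched as follows. Everything specific here is an assumption made for illustration: the small linear congruential generator, the sum-of-12-uniforms approximation to a normal sample, the one-line `toy_load_flow` stand-in for a real load flow, and the Epanechnikov kernel (the poster does not say which kernel the KDE uses).

```c
#include <assert.h>
#include <stddef.h>

/* Small linear congruential generator; a stand-in for a real RNG. */
static unsigned long long lcg_state = 12345ULL;
static double uniform01(void) {
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(lcg_state >> 11) / 9007199254740992.0;  /* divide by 2^53 */
}

/* Approximately normal sample: a sum of 12 uniforms has mean 6 and variance 1. */
static double approx_normal(double mean, double std) {
    double s = 0.0;
    for (int i = 0; i < 12; i++) s += uniform01();
    return mean + std * (s - 6.0);
}

/* Hypothetical stand-in for a load flow: voltage (p.u.) falls linearly with load (kW). */
static double toy_load_flow(double p_kw) { return 1.0 - 1.0e-4 * p_kw; }

/* Steps 1 and 2: draw n random inputs and solve one "load flow" per sample. */
void monte_carlo(double mean_kw, double std_kw, double *v, size_t n) {
    assert(v && n > 0);
    for (size_t i = 0; i < n; i++)
        v[i] = toy_load_flow(approx_normal(mean_kw, std_kw));
}

/* Step 3: kernel density estimate at x with bandwidth h (Epanechnikov kernel). */
double kde(const double *v, size_t n, double h, double x) {
    assert(v && n > 0 && h > 0.0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double u = (x - v[i]) / h;
        if (u > -1.0 && u < 1.0) sum += 0.75 * (1.0 - u * u);
    }
    return sum / ((double)n * h);
}
```

Because every sample is independent, the loop in `monte_carlo` is the part that splits naturally across threads and SIMD lanes.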
Translate speed (Gflop/s) into run time:
Problem size (IEEE test feeders) | Approx. flops | Approx. time, Core2 Extreme | Approx. time, Core i5 | Baseline: C++, ICC -O3 (~300x faster than Matlab) | Comments
IEEE37: one iteration | 12 K | ~0.3 us | ~0.3 us | - | -
IEEE37: one load flow (5 iterations) | 60 K | ~1.5 us | ~1.5 us | - | 0.01 kVA error
IEEE37: 1 million load flows | 60 G | < 2 s | < 1 s | ~60 s (>5 hrs Matlab) | SCADA interval: 4 seconds
IEEE123: 1 million load flows | 200 G | < 10 s | < 3.5 s | ~200 s (>15 hrs Matlab) | -
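The conversion from speed to run time is a single division: time = flops / (speed in flop/s). A minimal sketch follows; the function name `runtime_seconds` is hypothetical, and the 30 Gflop/s used in the note after the code is an illustrative sustained speed, not a measured figure.

```c
#include <assert.h>

/* Estimated run time in seconds for `flops` floating-point operations
 * at a sustained speed of `gflops` Gflop/s (1 Gflop/s = 1e9 flop/s). */
double runtime_seconds(double flops, double gflops) {
    assert(gflops > 0.0);
    return flops / (gflops * 1e9);
}
```

For example, `runtime_seconds(60e9, 30.0)` gives 2 seconds for 60 Gflop of work, which is inside a 4-second SCADA interval.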
Figure: Monte Carlo results. Histogram panels for 100, 1,000, 10,000, 100,000, 1,000,000, and 10,000,000 runs, with one row each for Phase A, Phase B, and Phase C; x-axis: voltage (p.u.).
Out: voltage on Node 738.
In: active power P ~ (μ = 0, std = 100 kW) on Phase A of Nodes 738, 711, 741; x-axis: active power (kW).
Figure: system components. Simulated meters (using computing nodes); data aggregator; Monte Carlo solver on a multi-core desktop computing server; web server @ ECE CMU; web UIs reached over the campus network and the Internet.
// input[0~2]: vector real part;
// input[3~5]: vector imaginary part;
// constant: parameters (r or c on left).
switch (matrix_pattern){
case real_diagonal_equal_matrix:
output[0] = *constant * input[0];
...
output[5] = *constant * input[5];
break;
case imag_diagonal_equal_matrix:
output[0] = -*constant * input[3];
output[1] = -*constant * input[4];
output[2] = -*constant * input[5];
output[3] = *constant * input[0];
output[4] = *constant * input[1];
output[5] = *constant * input[2];
break;
//cases for other frequent matrix patterns
...
default: //default full 3x3 complex matrix
...
}
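Matrix patterns handled by the pseudo-code for matrix-vector multiplication above (r: real number, i: imaginary number, c: complex number):
Real diagonal, equal entries: [r 0 0; 0 r 0; 0 0 r]
Imaginary diagonal, equal entries: [i 0 0; 0 i 0; 0 0 i]
Full complex: [c11 c12 c13; c21 c22 c23; c31 c32 c33]
...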
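A compilable variant of the pattern-specialized matrix-vector product might look like the sketch below. The enum and function names are hypothetical, and the storage layout chosen for the full 3x3 case (9 real parts followed by 9 imaginary parts, row-major) is an assumption of this sketch, since the poster elides that case.

```c
#include <assert.h>

/* Pattern tags modeled on the pseudo-code; FULL_COMPLEX is a hypothetical
 * name for the default case. */
enum matrix_pattern { REAL_DIAGONAL_EQUAL, IMAG_DIAGONAL_EQUAL, FULL_COMPLEX };

/* in[0..2]: real parts, in[3..5]: imaginary parts of a 3-phase vector.
 * Diagonal patterns: m[0] is the single scalar on the diagonal.
 * FULL_COMPLEX: m holds 9 real parts, then 9 imaginary parts, row-major
 * (assumed layout). */
void matvec_pattern(enum matrix_pattern p, const double *m,
                    const double in[6], double out[6]) {
    assert(m && in && out && in != out);
    switch (p) {
    case REAL_DIAGONAL_EQUAL:      /* (r I) x scales both parts by r */
        for (int k = 0; k < 6; k++) out[k] = m[0] * in[k];
        break;
    case IMAG_DIAGONAL_EQUAL:      /* (j c I) x = -c Im(x) + j c Re(x) */
        for (int k = 0; k < 3; k++) {
            out[k]     = -m[0] * in[k + 3];
            out[k + 3] =  m[0] * in[k];
        }
        break;
    default:                       /* full 3x3 complex matrix */
        for (int r = 0; r < 3; r++) {
            double re = 0.0, im = 0.0;
            for (int k = 0; k < 3; k++) {
                double a = m[3 * r + k], b = m[9 + 3 * r + k];
                re += a * in[k] - b * in[k + 3];
                im += a * in[k + 3] + b * in[k];
            }
            out[r] = re;
            out[r + 3] = im;
        }
        break;
    }
}
```

The diagonal cases need 6 real multiplications instead of the 36 multiplications of the general path, which is the saving the pattern dispatch is after.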
Code Generation (Spiral)
Figure: data structure for the forward/backward sweep.
Example feeder: nodes N0 to N5 (one substation, five loads).
Forward sweep pointer array: pN0, pN1, pN2, pN3, pN4, pN5.
Backward sweep pointer array: pN5, pN4, pN3, pN2, pN1, pN0.
Array access: each node's data (parameters, current, voltage) is stored consecutively in the data array (N5 data, N4 data, ...).
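The idea of keeping each node's parameters, current, and voltage together and walking a pointer array in sweep order can be sketched as below. This is a deliberately simplified illustration: the model is single-phase and real-valued with constant-current loads, the field names are invented, one pointer array is walked in both directions instead of keeping two arrays, and the sweep naming follows the common convention (backward from the leaves to the substation, forward from the substation to the leaves).

```c
#include <assert.h>
#include <stddef.h>

/* Per-node record: parameters, current, and voltage stored together.
 * Field names and the scalar model are simplifications for illustration. */
typedef struct node {
    double r;             /* resistance of the branch feeding this node */
    double i_load;        /* load current drawn at this node */
    double i_branch;      /* current in the feeding branch (backward sweep) */
    double v;             /* node voltage (forward sweep) */
    struct node *parent;  /* upstream node, NULL for the substation */
} node;

/* order[0] is the substation; every node appears after its parent. */
void fbs_iteration(node **order, size_t n, double v_source) {
    assert(order && n > 0 && order[0]->parent == NULL);
    for (size_t k = 0; k < n; k++) order[k]->i_branch = order[k]->i_load;
    /* Backward sweep: leaves to substation, accumulate branch currents. */
    for (size_t k = n; k-- > 1; )
        order[k]->parent->i_branch += order[k]->i_branch;
    /* Forward sweep: substation to leaves, update voltages. */
    order[0]->v = v_source;
    for (size_t k = 1; k < n; k++)
        order[k]->v = order[k]->parent->v - order[k]->r * order[k]->i_branch;
}
```

If the `node` records themselves sit in one array in the same order as the pointers, both sweeps touch memory sequentially, which is cache-friendly.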
Figure labels: sample FBS load flow result; scalar instructions (scalar register: 1 float); SIMD instructions (vector register: 4 floats in SSE).
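With 4 floats per SSE register, one instruction can advance four independent values at once. The sketch below multiplies four complex samples by one complex constant using SSE intrinsics; the split real/imaginary array layout, the function name `cmul_const_x4`, and the idea of placing four Monte Carlo samples in the four lanes are assumptions of this example, and it requires an x86 target with SSE.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Multiply 4 complex samples (split real/imag arrays) by the complex
 * constant (cr + j ci). Each SSE instruction processes all 4 lanes. */
void cmul_const_x4(float cr, float ci,
                   const float xr[4], const float xi[4],
                   float yr[4], float yi[4]) {
    assert(xr && xi && yr && yi);
    __m128 vcr = _mm_set1_ps(cr);   /* broadcast constant to all lanes */
    __m128 vci = _mm_set1_ps(ci);
    __m128 vxr = _mm_loadu_ps(xr);  /* unaligned loads of 4 floats */
    __m128 vxi = _mm_loadu_ps(xi);
    /* real part: cr*xr - ci*xi; imaginary part: cr*xi + ci*xr */
    _mm_storeu_ps(yr, _mm_sub_ps(_mm_mul_ps(vcr, vxr), _mm_mul_ps(vci, vxi)));
    _mm_storeu_ps(yi, _mm_add_ps(_mm_mul_ps(vcr, vxi), _mm_mul_ps(vci, vxr)));
}
```

The scalar equivalent would repeat the same four multiplications and two additions once per sample, so the vector form does the work in roughly a quarter of the arithmetic instructions.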
Acknowledgement & Publications
• This work is supported by NSF through awards 0931978 and 0702386.
• Related Publications:
1. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Distribution Power Flow,” The 43rd North American Power Symposium (NAPS), Boston, USA, Aug 2011.
2. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems,” IEEE PES General Meeting 2012, San Diego, CA, USA.