design and analysis of safety critical systemsseilercontrol/papers/... · • existing design...
TRANSCRIPT
AEROSPACE ENGINEERING AND MECHANICS
Design and Analysis of
Safety Critical Systems
Peter Seiler and Bin HuDepartment of Aerospace Engineering & Mechanics
University of Minnesota
September 30, 2013
AEROSPACE ENGINEERING AND MECHANICS
Uninhabited Aerial Systems (UAS)
2Agricultural Monitoring Emergency Response (NASA/JPL)
Public Safety (AeroVironment)Flight Research (UMN UAV Lab)
http://www.uav.aem.umn.edu/
AEROSPACE ENGINEERING AND MECHANICS
Design Challenges for Low-Cost UAS
3
Modeling/System Identification
Guidance and Controls
Human Factors
Safety Critical Software
Navigation
AEROSPACE ENGINEERING AND MECHANICS
Design Challenges for Low-Cost UAS
4
Systems Design and Reliability
AEROSPACE ENGINEERING AND MECHANICS
Recent Policy Changes
5
Increased reliabilityneeded to integrate UAS into the national airspace
AEROSPACE ENGINEERING AND MECHANICS
6
Outline
• Existing design techniques in commercial aviation
• Analytical redundancy is rarely used
• Certification issues
• Tools for Systems Design and Certification
• Motivation for model-based fault detection and isolation (FDI)
• Extended fault trees
• Stochastic false alarm and missed detection analysis
• Conclusions and future work
AEROSPACE ENGINEERING AND MECHANICS
7
Outline
• Existing design techniques in commercial aviation
• Analytical redundancy is rarely used
• Certification issues
• Tools for Systems Design and Certification
• Motivation for model-based fault detection and isolation (FDI)
• Extended fault trees
• Stochastic false alarm and missed detection analysis
• Conclusions and future work
AEROSPACE ENGINEERING AND MECHANICS
8
Commercial Fly-by-Wire
Boeing 787-8 Dreamliner
• 210-250 seats
• Length=56.7m, Wingspan=60.0m
• Range < 15200km, Speed< M0.89
• First Composite Airliner
• Honeywell Flight Control Electronics
Boeing 777-200
• 301-440 seats
• Length=63.7m, Wingspan=60.9m
• Range < 17370km, Speed< M0.89
• Boeing’s 1st Fly-by-Wire Aircraft
• Ref: Y.C. Yeh, “Triple-triple redundant
777 primary flight computer,” 1996.
AEROSPACE ENGINEERING AND MECHANICS
9
777 Primary Flight Control Surfaces [Yeh, 96]
• Advantages of fly-by-wire:
• Increased performance (e.g. reduced drag with smaller rudder), increased functionality (e.g. “soft” envelope protection), reduced weight, lower recurring costs, and possibility of sidesticks.
• Issues: Strict reliability requirements
• <10-9 catastrophic failures/hr
• No single point of failure
AEROSPACE ENGINEERING AND MECHANICS
10
Classical Feedback Diagram
Sensors
Primary
Flight
Computer
Pilot
Inputs Actuators
Reliable implementation of this classical
feedback loop adds many layers of complexity.
AEROSPACE ENGINEERING AND MECHANICS
11
Triplex Control System Architecture
Sensors
Primary
Flight
Computer
Column
ActuatorControl
Electronics
Pilot
Inputs
Each PFC votes on redundant
sensor/pilot inputs
Each ACE votes on redundant
actuator commands
All data communicated
on redundant data buses
Actuators
AEROSPACE ENGINEERING AND MECHANICS
12
777 Triple-Triple Architecture [Yeh, 96]
Sensors
x3
Databus
x3
Triple-Triple
Primary Flight
Computers
Actuator Electronics
x4
AEROSPACE ENGINEERING AND MECHANICS
13
777 Triple-Triple Architecture [Yeh, 96]
Sensors
x3
Databus
x3Actuator Electronics
x4
Left PFC
INTEL
AMD
MOTOROLA
Triple-Triple
Primary Flight
Computers
AEROSPACE ENGINEERING AND MECHANICS
14
Redundancy Management
• Main Design Requirements:
• < 10-9 catastrophic failures per hour
• No single point of failure
• Must protect against random and common-mode failures
• Basic Design Techniques
• Hardware redundancy to protect against random failures
• Dissimilar hardware / software to protect against common-mode failures
• Voting: To choose between redundant sensor/actuator signals
• Encryption: To prevent data corruption by failed components
• Monitoring: Software/Hardware monitoring testing to detect latent faults
• Operating Modes: Degraded modes to deal with failures
• Equalization to handle unstable / marginally unstable control laws
• Model-based design and implementation for software
AEROSPACE ENGINEERING AND MECHANICS
15
Redundancy Management
• Main Design Requirements:
• < 10-9 catastrophic failures per hour
• No single point of failure
• Must protect against random and common-mode failures
• Basic Design Techniques
• Hardware redundancy to protect against random failures
• Dissimilar hardware / software to protect against common-mode failures
• Voting: To choose between redundant sensor/actuator signals
• Encryption: To prevent data corruption by failed components
• Monitoring: Software/Hardware monitoring testing to detect latent faults
• Operating Modes: Degraded modes to deal with failures
• Equalization to handle unstable / marginally unstable control laws
• Model-based design and implementation for software
AEROSPACE ENGINEERING AND MECHANICS
16
Outline
• Existing design techniques in commercial aviation
• Analytical redundancy is rarely used
• Certification issues
• Tools for Systems Design and Certification
• Motivation for model-based fault detection and isolation (FDI)
• Extended fault trees
• Stochastic false alarm and missed detection analysis
• Conclusions and future work
AEROSPACE ENGINEERING AND MECHANICS
Analytical Redundancy
17
Small UASs cannot support the weight
associated with physical redundancy.
Approach: Use model-based or data-
driven techniques to detect faults.
Parity-equation architecture (Wilsky)
AEROSPACE ENGINEERING AND MECHANICS
Analytical Redundancy
18
Small UASs cannot support the weight
associated with physical redundancy.
Approach: Use model-based or data-
driven techniques to detect faults.
Research Objectives:
• Hardware, models, data
(Freeman, Balas)
• Advanced filter design
• Tools for systems design,
analysis and certification
Parity-equation architecture (Wilsky)
AEROSPACE ENGINEERING AND MECHANICS
Analytical Redundancy
19
Small UASs cannot support the weight
associated with physical redundancy.
Approach: Use model-based or data-
driven techniques to detect faults.
Research Objectives:
• Hardware, models, data
(Freeman, Balas)
• Advanced filter design
• Tools for systems design, analysis and certification
Parity-equation architecture (Wilsky)
AEROSPACE ENGINEERING AND MECHANICS
Tools for Systems Design and Certification
20
Diagram Reference: R. Isermann.
Fault-Diagnosis Systems: An
Introduction from Fault Detection to
Fault Tolerance. Springer-Verlag, 2006.
AEROSPACE ENGINEERING AND MECHANICS
Tools for Systems Design and Certification
21
Why are new tools required?
Example: Fault Tree Analysis
Diagram Reference: R. Isermann.
Fault-Diagnosis Systems: An
Introduction from Fault Detection to
Fault Tolerance. Springer-Verlag, 2006.
AEROSPACE ENGINEERING AND MECHANICS
Fault Tree Analysis
22
AEROSPACE ENGINEERING AND MECHANICS
Fault Tree Analysis
23
Probability of hardware component
failure can be estimated from field data.
AEROSPACE ENGINEERING AND MECHANICS
Fault Tree Analysis
24
Probability of hardware component
failure can be estimated from field data.
Model-based fault detection introduces new failure
models (false alarms, missed detections, etc.)
AEROSPACE ENGINEERING AND MECHANICS
Extended Fault Tree Analysis
25
References
1. Aslund, Biteus, Frisk, Krysander,
and Nielsen. Safety analysis of
autonomous systems by extended
fault tree analysis. IJACSP, 2007.
2. Hu and Seiler, A Probabilistic
Method for Certification of
Analytically Redundant Systems,
SysTol Conference, 2013.
Incorporate failure modes due to false
alarms and missed detections (per hour)(Enumerate time-correlated failures and apply total
law of probability)
AEROSPACE ENGINEERING AND MECHANICS
Example: Dual-Redundant Architecture
Objective: Compute reliability of system assuming
sensors have a mean-time between failure of 1000Hrs.
26
)(ks
Switch
Fault DetectionLogic (FDI)
Primary
Sensor
Back-up
Sensor
)(1 km
)(2
km
)(kd )(ˆ km
AEROSPACE ENGINEERING AND MECHANICS
Failure Modes
27
MissedDetection, MN
FalseAlarm, FN
ProperDetection, DN
Early FalseAlarm, EN
TimeT1
PrimaryFails
T1+N0
MissedDetection
0 N
TimeTS
FalseAlarm
T2+N0
SystemFailure
0 NT2
BackupFails
TimeT1
PrimaryFails
T2+N0
SystemFailure
0 NTS
FailureDetected
T2
BackupFails
TimeT1
PrimaryFails
T2+N0
SystemFailure
0 NTS
FailureDetected
T2
BackupFails
AEROSPACE ENGINEERING AND MECHANICS
System Failure Rate
• Notation:
• Approximate system failure probability:
28
Sensor failure per hour
False alarm per hour
Detection per failure
AEROSPACE ENGINEERING AND MECHANICS
System Failure Rate
• Notation:
• Approximate system failure probability:
29
Primary sensor fails
+ missed detection
False alarm +
Backup sensor fails
Failure detected +
Backup sensor fails
Sensor failure per hour
False alarm per hour
Detection per failure
AEROSPACE ENGINEERING AND MECHANICS
System Failure Rate
• Notation:
• Approximate system failure probability:
30
Primary sensor fails
+ missed detection
False alarm +
Backup sensor fails
Failure detected +
Backup sensor fails
Sensor failure per hour
False alarm per hour
Detection per failure
Question: How canwe compute these probabilities?
AEROSPACE ENGINEERING AND MECHANICS
False Alarm Analysis
31
What is the conditional
probability of an alarm given
that no fault has occurred?
Abstraction: Discrete-
time uncertain linear
system driven by noise.
AEROSPACE ENGINEERING AND MECHANICS
Problem Formulation
32
(Healthy) Dynamics for residual
Simple Thresholding
Objective:Assume nk is a stationary Gaussian process and assume
known dynamic model for residuals.
Compute the probability PN that |rk| > T for some k in {1,…,N}.
AEROSPACE ENGINEERING AND MECHANICS
Problem Formulation
33
(Healthy) Dynamics for residual
Simple ThresholdingReferences
1. Glaz and Johnson. Probability inequalities for
multivariate distributions with dependence
structures. JASA, 1984
2. Hu and Seiler, Probability Bounds for False Alarm
Analysis of Fault Detection Systems, Allerton, 2013.
Theorem:There exist bounds γk (k=1,…,N) such that
1. γk ≥ PN
2. γk are monotonically non-increasing in k
3. γk requires evaluation of k-dim. Gaussian integrals
AEROSPACE ENGINEERING AND MECHANICS
Results: Effects of Correlation
34
False Alarm Probabilities and Bounds for N=360,000
For each (a,T), P1 = 10-11
which gives NP1=3.6 x 10-6
Neglecting correlationsis accurate for small a
…but not for a near 1.
kkkk fnarr ++=+1
≤
=else1
if0 Trd k
k
Residual Generation Decision Logic
AEROSPACE ENGINEERING AND MECHANICS
Worst-case False Alarm Probability
35
Issue:Model depends on unknown (uncertain) parameters, ∆ ϵ ∆∆∆∆.
Objective:Compute the worst-case false alarm probability
Main Result:Robust H2 analysis results can be used to compute worst-
case residual variance. This yields bounds on PN*.
Reference
Hu and Seiler, Worst-Case False Alarm Analysis of
Aerospace Fault Detection Systems, Submitted to
ACC, 2014.
AEROSPACE ENGINEERING AND MECHANICS
36
Conclusions
• Commercial aircraft achieve high levels of reliability.
• Analytical redundancy is rarely used (Certification Issues)
• Model-based fault detection methods are an alternative that
enables size, weight, power, and cost to be reduced.
• Tools for Systems Design and Certification
• Extended fault trees
• Stochastic false alarm and missed detection analysis
• Methods to validate analysis using flight test data (Hu and
Seiler, 2014 AIAA)
AEROSPACE ENGINEERING AND MECHANICS
Acknowledgments
• NASA Langley NRA NNX12AM55A: “Analytical Validation Tools for
Safety Critical Systems Under Loss-of-Control Conditions,”
Technical Monitor: Dr. Christine Belcastro
• Air Force Office of Scientific Research: Grant No. FA9550-12-
0339, "A Merged IQC/SOS Theory for Analysis of Nonlinear
Control Systems,” Technical Monitor: Dr. Fariba Fahroo.
• NSF Cyber-Physical Systems: Grant No. 0931931, “Embedded
Fault Detection for Low-Cost, Safety-Critical Systems,” Program
Manager: Theodore Baker.
37
AEROSPACE ENGINEERING AND MECHANICS
Backup Slides
38
AEROSPACE ENGINEERING AND MECHANICS
Dual-Redundant Architecture
Objective: Efficiently compute the probability PS,N that
the system generates “bad” data for N0 consecutive
steps in an N-step window.
39
)(ks
Switch
Fault DetectionLogic (FDI)
Primary
Sensor
Back-up
Sensor
)(1 km
)(2
km
)(kd )(ˆ km
AEROSPACE ENGINEERING AND MECHANICS
Assumptions
1. Knowledge of probabilistic performance
a. Sensor failures: P[ Ti=k ] where Ti := failure time of sensor i
b. FDI False Alarm: P[ TS≤N | T1=N+1 ]
c. FDI Missed Detection: P[ TS≥k+N0 | T1=k ]
2. Neglect intermittent failures
3. Neglect intermittent switching logic
4. Sensor failures and FDI logic decision are independent
• Sensors have no common failure modes.
40
AEROSPACE ENGINEERING AND MECHANICS
Failure Modes
41
MissedDetection, MN
FalseAlarm, FN
ProperDetection, DN
Early FalseAlarm, EN
TimeT1
PrimaryFails
T1+N0
MissedDetection
0 N
TimeTS
FalseAlarm
T2+N0
SystemFailure
0 NT2
BackupFails
TimeT1
PrimaryFails
T2+N0
SystemFailure
0 NTS
FailureDetected
T2
BackupFails
TimeT1
PrimaryFails
T2+N0
SystemFailure
0 NTS
FailureDetected
T2
BackupFails
AEROSPACE ENGINEERING AND MECHANICS
System Failure Probability
• Apply basic probability theory:
42
AEROSPACE ENGINEERING AND MECHANICS
System Failure Probability
• Apply basic probability theory:
• Knowledge of probabilistic performance
a. Sensor failures: P[ Ti=k ] where Ti := failure time of sensor i
43
AEROSPACE ENGINEERING AND MECHANICS
System Failure Probability
• Apply basic probability theory:
• Knowledge of probabilistic performance
a. Sensor failures: P[ Ti=k ] where Ti := failure time of sensor i
b. FDI False Alarm: P[ TS≤N | T1=N+1 ]
44
AEROSPACE ENGINEERING AND MECHANICS
System Failure Probability
• Apply basic probability theory:
• Knowledge of probabilistic performance
a. Sensor failures: P[ Ti=k ] where Ti := failure time of sensor i
b. FDI False Alarm: P[ TS≤N | T1=N+1 ]
c. FDI Missed Detection: P[ TS≥k+N0 | T1=k ]
45
AEROSPACE ENGINEERING AND MECHANICS
• Sensor Failures: Geometric distribution with parameter q
• Residual-based threshold logic
Example
46
)()()1( kfknkr +=+
≤
=else1
)( if0)(
Tkrkd
Residual Decision Logic
f is an additive fault
n is IID Gaussian noise, variance=σ
Threshold, T)(kd)(kr
)(1
km
)(ky
Fault Detection
Filter
AEROSPACE ENGINEERING AND MECHANICS
Example
• Per-frame false alarm probability can be easily computed
• Approximate per-hour false
alarm probability
47
[ ] ∫−
−===T
T
F drrpdP )(1Fault No 1(k) Pr
( )2
21
σT
F erfP −=
For each k, r(k) is N(0,σ2) :
5 10 15 20 25 300
0.5
1
1.5
2x 10
-3
Time Window, N
PF
A(N
)
PFA
(30) = 0.0019 for σ = 0.25
F
N
FS NPPNTNTP ≈−−=+=≤ )1(1]1|[ 1
Per-frame detection probability PD
can be similarly computed.
AEROSPACE ENGINEERING AND MECHANICS
System Failure Rate
• Notation:
• Approximate system failure probability:
48
Sensor failure per hour
False alarm per hour
Detection per failure
AEROSPACE ENGINEERING AND MECHANICS
System Failure Rate
• Notation:
• Approximate system failure probability:
49
Primary sensor fails
+ missed detection
False alarm +
Backup sensor fails
Failure detected +
Backup sensor fails
Sensor failure per hour
False alarm per hour
Detection per failure
AEROSPACE ENGINEERING AND MECHANICS
System Failure Rate
50
0 5 10 15 20
10-6
10-5
10-4
10-3
T/σ
PS
,N
f/σ=1
f/σ=6
f/σ=10
Sensor mean time between failure = 1000hr
and N=360000 ( = 1 hour at 100Hz rate)
AEROSPACE ENGINEERING AND MECHANICS
• Example analysis assumed IID fault detection logic.
• Many fault-detection algorithms use dynamical models
and filters that introduce correlations in the residuals.
• Question: How can we compute the FDI performance
metrics when the residuals are correlated in time?
• FDI False Alarm: P[ TS≤N | T1=N+1 ]
• FDI Missed Detection: P[ TS≥k+N0 | T1=k ]
Correlated Residuals
51
AEROSPACE ENGINEERING AND MECHANICS
False Alarm Analysis with Correlated Residuals
• Problem: Analyze the per-hour false alarm probability for a simple
first-order fault detection system:
• The N-step false alarm probability PN is the conditional probability
that dk=1 for some 1≤k≤N given the absence of a fault.
52
kkkk fnarr ++=+1
≤
=else1
if0 Trd k
k
Residual Generation (0<a<1) Decision Logic
Residuals are correlated in time due to filtering
f is an additive fault
n is IID Gaussian noise, variance=1
There are N=360000 samples per hour for a 100Hz system
∫ ∫− −
−=T
T
N
T
T
NRN drdrrrpP LL11
),...,(1
AEROSPACE ENGINEERING AND MECHANICS
False Alarm Analysis
• Residuals satisfy the Markov property:
• PN can be expressed as an N-step iteration of 1-
dimensional integrals:
53
∫
∫
∫
−
−
−−−−
−=
=
=
=
T
TN
T
T
T
TNNNNNNN
NN
drrprfP
drrrprfrf
drrrprfrf
rf
11111
2122211
111
)()(1
)()()(
)()()(
1)(
M
kkkk fnarr ++=+1 ( ) ( )kkkk rrprrrp 111 ,, ++ =L
( ) ( ) ( ) ( )111211 ,, rprrprrprrp kkkR ⋅= − LL
∫ ∫− −
−=T
T
N
T
T
NRN drdrrrpP LL11
),...,(1
This has the appearance of a power iteration ANx
AEROSPACE ENGINEERING AND MECHANICS
False Alarm Probability
• Theorem: Let λ1 be the maximum eigenvalue and ψ1
the corresponding eigenfunction of
Then where
• Proof
• This is a generalization of the matrix power iteration
• The convergence proof relies on the Krein-Rutman theorem
which is a generalization of the Perron-Frobenius theorem.
• For a=0.999 and N=360000, the approximation error is 10-156
54
1
1
−≈ N
N cP λ
∫−=T
Tdyxypyx )|()()( 111 ψψλ
Ref: B. Hu and P. Seiler. False Alarm Analysis of Fault Detection Systems with
Correlated Residuals, Submitted to IEEE TAC, 2012.
1,1 ψ=c