1
Integrity Service Excellence
Anomalous Events in Non-Destructive Inspection Data
18 Dec 2012
Jeremy S. Knopp
AFRL/RXCA
Air Force Research Laboratory
2
Disclaimer
• The views expressed in this presentation are those of
the author and do not reflect the official policy or
position of the United States Air Force, Department of
Defense, or the United States Government
3
Outline
• Historical Perspective of Aircraft Structural Integrity
Program (ASIP)
• Probability of Detection (POD)
• Nondestructive Evaluation System Reliability
Assessment Handbook (MIL-HDBK-1823A) Revision
• Research Objectives to Improve State-of-the-Art POD
Evaluation
4
Aircraft Management Strategies

• Safe Life – No Periodic Inspection Required
  – Fly a certain number of hours and retire.
  – Considers the effects of cyclic loading on the airframe with a full-scale fatigue test.
  – For example, testing to 40,000 hours ensures a safe life of 10,000 hours.
  – Used by the US Navy.
• Damage Tolerance Assessment (DTA) – Periodic Inspection to Detect Damage
  – Fly and inspect; reassess the time to the next inspection based on fatigue crack growth analysis, usage, and results of inspection.
  – Assumes imperfections are present in the early stages of aircraft service.
  – REQUIRES RELIABLE AND VALIDATED NDI
  – Used by the US Air Force.
• Condition-Based Maintenance (CBM) – Periodic Inspection and/or Onboard Monitoring to Characterize Damage
  – Perform repairs only when needed.
  – Will minimize maintenance costs.
  – Requires damage characterization, not just detection.
  – Desired by the US Air Force to maximize availability of assets while minimizing sustainment costs.
• Condition-Based Maintenance Plus (CBM+) – Periodic Inspection to Characterize Damage
  – CBM plus prognosis to estimate capability and remaining life for optimal maintenance scheduling.
5
The USAF Aircraft Structural Integrity Program (ASIP)
• Provides the engineering discipline and management framework
  – associated with establishing and maintaining structural safety
  – in the most cost-effective manner
  – through a set of defined inspections, repairs, modifications, and retirement actions
• Based on a preventative maintenance strategy that starts in acquisition and continues until retirement
ASIP Processes involve engineers and managers working together to control the risks of structural failure
6
“Wright” approach to Structural Integrity
• Approach used by the Wright brothers, beginning in 1903.
• Essentially the same approach used
by USAF for over 50 years.
• They performed stress analysis and
conducted static tests far in excess
of the loads expected in flight.
• Safety factor applied to forces that
maintained static equilibrium with
weight.
7
B-47 Experience, 1958
• Air Force Strategic Air Command lost two B-47 bombers on the same day!
• Metal fatigue caused the wings on two aircraft to fail catastrophically in flight.
• Standard static test and abbreviated flight load survey proved the structure would support at least 150% of its design limit load.
• No assurance that the structure would survive smaller cyclic loads in actual flight.
8
ASIP Initiated
• Aircraft Structural Integrity Program (ASIP) initiated on
12 Jun 1958 with 3 primary objectives:
  – Control structural fatigue in the aircraft fleet.
  – Develop methods to accurately predict service life.
  – Establish design and testing methods to avoid structural problems in future aircraft systems.
• Led to the "safe-life" approach.
  – Probabilistic approach to establishing the aircraft service life capability.
  – Safe life established by conducting a full-scale airframe fatigue test and dividing the number of successfully tested simulated flight hours by a scatter factor (usually 4).
9
F-111 Experience, 1969
• Wing separation at ~100 hours (safe-life qualified to 4,000 hours). Crack initiated from a manufacturing defect.
• Two-phase program initiated.
• Phase 1 (allow operations at 80% of designed capability)
  – Material crack growth data collected to develop a flaw growth model.
  – Cold proof test to demonstrate that critical-size flaws were not present in critical forgings.
  – Improved NDI for use in reinspection.
• Phase 2 (allow operations at 100% of designed capability)
  – Incorporated NDI during production.
  – Used fracture mechanics to determine inspection intervals.
10
Damage Tolerance Update, 1974
• In response to the F-111 mishap, ASIP incorporated Damage Tolerance requirements.
  – Objective was to prevent airframe failures resulting from the safe-life approach.
• ASIP provides 3 options to satisfy the damage tolerance requirement:
  – Slow crack growth (most common option)
  – Fail-safe multiple load path
  – Fail-safe crack arrest
• Primary basis for the aircraft structure maintenance program for the last 30+ years.
  – Inspection requirements based on initial flaw assumptions (slow crack growth) and NDI capability.
• Today – inspection burden is increasing due to the age of the fleet!
  – NDE research needed to reduce the future maintenance burden.
11
Evolution of Structural Integrity Approaches
Each change was made to enhance our ability to protect structural integrity (prevent structural failures)
Today, preventing structural failures requires anticipating events that ensure continuing airworthiness, reliability, availability, and cost-effectiveness
[Figure: timeline, 1950–2020, of the ASIP approach to preventing structural failures cost-effectively: Prevent Static Load Failures → Prevent Fatigue Failures → Protect for Potential Damage → Risk Assessment/Management (MIL-STD-1530C).]
12
USAF Structural Reliability
• USAF aircraft losses since 1971:
– 18 due to a structural failure
– 19 due to a structural failure that was caused by maintenance, pilot error, flight control failures, etc.
• Next chart plots overall USAF aircraft loss rate from
1947 – 2002 and structures contribution since 1971
– Overall loss rate calculated for each year (total losses per year / total fleet flight hours per year)
– Loss rate due to structures is plotted cumulatively, since many years had no losses due to structural failure
13
USAF Structural Reliability
USAF Aircraft Loss Rate (Destroyed Aircraft)

[Figure: number of aircraft losses / flight hours (log scale, 1.E-08 to 1.E-03) vs year, 1940–2010. Curves: All Causes; Structures = 37; Structures = 18.]
1 C. Babish, “USAF ASIP: Protecting Safety for 50 Years”, Aircraft Structural Integrity Program Conference (2008)
14
Rare Events
• Nov 2, 2007 – Loss of F-15C
airplane, 0 casualties
• Aircraft operated within limits
• Mishap occurred due to a fatigue failure in a forward fuselage single-load-path structure.
• Hot spot missed during design and testing and aggravated by a rogue flaw.
• NDI can be used to prevent
fracture at this hot spot.
15
Reliability of NDT
• Probability of Detection1
• Given a population of cracks of size ‘a’– geometry, material, orientation, location, …
• Given a defined inspection system
• POD(a) = Probability that selected cracks of size ‘a’
from the population will be detected– POD(a) = Proportion of all size ‘a’ cracks from the population
that would be detected
1 A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17 Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.
16
Reliability of NDT
• POD curve
• Two parameters (μ and σ)
• μ is a50
• σ describes the slope of the curve. A steep curve is ideal.

[Figure: POD vs flaw size (mm), rising from 0 to 1, with a50, a90, and a90/95 (or aNDE) marked.]
17
Inspection Intervals
[Figure: ASIP damage tolerance inspection intervals. Crack size, a, vs equivalent (standard spectrum) or flight hours. A crack growth curve runs from initial size a0 through aNDE and acr-miss to critical size aCR, at times T1, T2, T3, Tf. Inspections occur at 1/2 the time associated with the time it takes for a crack to grow from initial size to failure, e.g., T2 = 0.5*(T3 − T1).]
18
Reliability in NDT
• What is aNDE?
• aNDE is the "reliably" detected crack size for the applied inspection system.
• Traditionally, the reliably detected size has been considered to be the a90 or a90/95 crack size from the estimate of the NDE system POD(a).
• Variations of this can be investigated.

[Figure: POD vs flaw size (mm), rising from 0 to 1, with a50, a90, and a90/95 (or aNDE) marked.]
19
Reliability of NDE
• Development of POD was a very important contribution
to quantifying performance of NDE
• Necessary for effective ASIP program. Damage
Tolerance approach requires validated NDE capability.
• Quantifying largest flaw that can be missed is
important.
• Capability of detecting small flaws is less important.
• First serious investigation
– Packman et al., 1967¹
– Four NDI methods (X-ray, dye penetrant, magnetic particle, and ultrasonics)
1 P.F. Packman et al. The applicability of a fracture mechanics – nondestructive testing design criterion. Technical Report AFML-TR-68-32, Air Force Materials Laboratory, USA, May 1968.
20
Reliability of NDT
• Rummel et al., 1974¹
  – NASA Space Shuttle Program
  – Five NDI methods (X-ray, fluorescent penetrant, eddy current, acoustic emission, and ultrasonics)
• Lewis et al., 1978² (a.k.a. "Have Cracks Will Travel")
  – Major US Air Force program to determine reliability.
  – Perhaps the largest program of this kind in history.
  – Disappointing results concerning NDI capability.
• Both studies inspired more advanced statistical analysis.

1 W.D. Rummel et al., The detection of fatigue cracks by nondestructive testing methods. Technical Report NASA CR 2369, NASA Martin Marietta Aerospace, USA, Feb 1974.
2 W.H. Lewis et al., Reliability of nondestructive inspection – final report. Technical Report SA-ALC/MME 76-6-38-1, San Antonio Air Logistics Center, USA, Dec 1978.
21
Statistical Analysis – POD
• Two types of data collected
– “Hit/Miss” – binary data in terms of whether or not a flaw is found
– “â vs a” – continuous response data has more information
(â = signal magnitude, a = size)
• Statistical rigor introduced in USAF study conducted by
Berens and Hovey in 19811.
– Previous analysis methods grouped “hit/miss” data into bins and used binomial statistics to evaluate POD.
– Berens and Hovey introduced mathematical model based on log-logistic cumulative distribution function to evaluate POD. This is still standard practice.
1 A.P. Berens and P.W. Hovey, “Evaluation of NDE Reliability Characterization,” AFWAL-TR-81-4160, Vol 1, Air Force Wright- Aeronautical Laboratories, Wright-Patterson Air Force Base, Dec 1981.
22
Statistical Analysis – POD
• Hit/Miss analysis
– Sometimes only detection information is available (e.g., penetrant testing). Can also be used if the constant variance assumption is violated.
– Model assumes POD is a function of flaw size:

  POD(a) = f(β₀ + β₁ log(a))

– For the logit model (logistic):

  f(z) = exp(z) / (1 + exp(z))

– For the probit model (lognormal), f(z) = Φ(z), where Φ(·) is the standard normal cumulative distribution function.
– β₀ and β₁ are obtained as maximum likelihood estimates.
1 A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17 Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.
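For concreteness, here is a minimal sketch of this maximum likelihood fit (Python, with made-up hit/miss data; the helper a_at_pod is introduced purely for illustration and is not part of any handbook software):

```python
# Two-parameter hit/miss POD fit with a logit link in log flaw size,
# following the model form above. Data are fabricated for illustration only.
import numpy as np
from scipy.optimize import minimize

a = np.array([0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30])  # flaw size (in)
hits = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])                             # 1 = detected

def neg_log_likelihood(params):
    b0, b1 = params
    pod = 1.0 / (1.0 + np.exp(-(b0 + b1 * np.log(a))))   # logistic link
    pod = np.clip(pod, 1e-12, 1 - 1e-12)                 # guard against log(0)
    return -np.sum(hits * np.log(pod) + (1 - hits) * np.log(1 - pod))

b0, b1 = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead").x

def a_at_pod(p):
    """Invert the link: POD = p  =>  log(a) = (logit(p) - b0) / b1."""
    return np.exp((np.log(p / (1 - p)) - b0) / b1)

print("a50 =", a_at_pod(0.5), " a90 =", a_at_pod(0.9))
```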
23
Statistical Analysis – POD
• Hit/Miss analysis
– Unchanged since Berens and Hovey except for confidence bound calculations.
– Confidence bound calculations are not available in any commercial software package.
– Traditional Wald method for confidence bound calculation is anti-conservative with hit/miss data.
– The likelihood ratio method for confidence bound calculation is used in the revised MIL-HDBK-1823A. This is a very complicated calculation; see Annis and Knopp for details¹.
1 C. Annis and J.S. Knopp, “Comparing the Effectiveness of a90/95 calculations”, Rev. Prog. Quant. Nondestruct. Eval. Vol 26B pp. 1767–1774, 2007
24
Statistical Analysis – POD
• Hit/Miss analysis
– Example (mh1823, EXAMPLE 3 hm.xls): 92 hits out of 134 inspection opportunities; logit link function; fitted parameters 0.1156 and 0.025147.

[Figure: probability of detection, POD | a, vs size, a (inches), 0 to 0.4, with the fitted POD curve and its confidence bound (log-likelihood ratio method, Cheng & Iles approximation): a50 = 0.1156, a90 = 0.1709, a90/95 = 0.1974.]
1 MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).
25
Statistical Analysis – POD
• “â vs a” analysis (â = signal strength, a = flaw size)
– Magnitude of the signal contains information.
– More information results in more statistical confidence, which ultimately reduces sample size requirements.
– Again, the regression model assumes POD is a function of flaw size.
– Censored regression is almost always involved, so a commercial package such as SAS or S-Plus is necessary.
– Model: log(â) = β₀ + β₁ log(a) + ε, with ε ~ N(0, σ_ε²); note that σ_ε² is the regression variance. Then

  POD(a) = Φ((log(a) − μ) / σ)

  where μ = (log(â_threshold) − β₀) / β₁ and σ = σ_ε / β₁.

1 MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).
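A minimal sketch of that calculation (Python, made-up data, no censoring; as noted above, a real analysis almost always needs censored regression):

```python
# Fit log(ahat) = b0 + b1*log(a) + eps by ordinary least squares, then convert the
# regression parameters to POD(a) = Phi((log a - mu)/sigma) as on this slide.
import numpy as np
from scipy.stats import norm

a = np.array([0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0])             # flaw size (mm)
ahat = np.array([0.02, 0.03, 0.05, 0.08, 0.10, 0.17, 0.22])   # signal response (V)
ahat_threshold = 0.04                                         # decision threshold on ahat

X = np.column_stack([np.ones_like(a), np.log(a)])
(b0, b1), *_ = np.linalg.lstsq(X, np.log(ahat), rcond=None)
sigma_eps = np.sqrt(np.sum((np.log(ahat) - X @ np.array([b0, b1]))**2) / (len(a) - 2))

mu = (np.log(ahat_threshold) - b0) / b1   # POD median on the log-size axis
sigma = sigma_eps / b1                    # POD "slope" parameter

print("a50 =", np.exp(mu), " a90 =", np.exp(mu + norm.ppf(0.9) * sigma))
```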
26
Statistical Analysis – POD
• â vs a analysis
• Basically a linear model.
• Wald confidence intervals are sufficient.
• Delta method used to generate confidence intervals on the POD curve.

[Figure: mh1823 example (EXAMPLE 1 â vs a.xls). Left: response, â, vs size, a (mils), on log scales with the fitted regression. Right: POD(a) with confidence bound: a50 = 8.8, a90 = 12.69, a90/95 = 13.68; Pfalse call = 0.11.]
27
MIL-HDBK-1823A
28
MIL-HDBK-1823A Summary
• Completed in 2007; released in 2009
• 132 pages
• All new figures (65)
• Approximately 70% new text
• Based on best-practices for NDE and statistical
analysis
• 100% new software available
  – â vs. a
  – hit/miss
29
MIL-HDBK-1823A Support Website
• Download the Handbook
• Request the mh1823 POD software
http://mh1823.com/mh1823
30
Addressing Deficiencies (1)
• Concern exists about performing a POD calculation on poor data sets.
  – Poor data sets can be defined as:
    • Limited in sample size
    • Data that do not follow typical POD model fits
  – Problem when the wrong model is used for statistical inference.
  – Worst-case scenario: a fictitious a90/95 may be obtained.
• One possible remedy is a '4-parameter model':
  – Proposed by Moore and Spencer in 1999.
  – However, the parameter estimation problem is difficult using classical statistical methods.
  – It is likely that such methods also require large data sets (very little work performed to date).
[Figure: distributed sensor data flow – raw data → signal processing / feature extraction → signal classification → damage decision criteria → call / maintenance action, with feature-vector and damage-state databases.]

The 4-parameter model:

  POD(a) = α + (β − α) / (1 + exp(−π (ln(a) − ln(μ)) / (σ√3)))

  α : false call rate
  β : 1 − random missed flaw rate
  σ : curve steepness
  μ : flaw size median (50% POD)
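A minimal sketch of evaluating this 4-parameter model (Python; the π/√3 scaling follows the logistic form reconstructed above and should be treated as an assumption):

```python
# Four-parameter POD: lower asymptote alpha (false calls), upper asymptote beta.
import numpy as np

def pod_4param(a, alpha, beta, mu, sigma):
    z = np.pi * (np.log(a) - np.log(mu)) / (sigma * np.sqrt(3.0))
    return alpha + (beta - alpha) / (1.0 + np.exp(-z))

a = np.linspace(0.1, 10.0, 200)                       # flaw size (mm)
pod = pod_4param(a, alpha=0.05, beta=0.95, mu=2.0, sigma=0.5)
# Note: if beta < 0.9, the curve never reaches 90% POD and a90 does not exist,
# which is exactly the situation in Difficult Data Set #2 later in this talk.
```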
31
Addressing Deficiencies (2)
• Markov chain Monte Carlo (MCMC) offers a flexible method to use sampling to calculate confidence bounds.
• A Bayesian approach with non-informative priors can be used to select
  – Model function: logit or probit
  – Model form (parameters): 2-, 3-, and 4-parameter models
• Upper asymptote = 1 − P(random missed call) = β
• Lower asymptote = P(false call) = α
[Figure: POD vs crack length, a, from 0 to 1.0, showing the upper asymptote set by PRMC (probability of random missed call), the lower asymptote set by PFC (probability of false call), a50 at POD = 0.5, and the curve skewness.]
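As a sketch of what the MCMC machinery looks like, the following random-walk Metropolis sampler fits the 2-parameter hit/miss model with flat (non-informative) priors; the data are made up, and this illustrates the idea rather than the specific procedure used in this work:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30])
hits = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

def log_likelihood(b0, b1):
    pod = np.clip(1.0 / (1.0 + np.exp(-(b0 + b1 * np.log(a)))), 1e-12, 1 - 1e-12)
    return np.sum(hits * np.log(pod) + (1 - hits) * np.log(1 - pod))

theta = np.array([0.0, 1.0])
ll = log_likelihood(*theta)
samples = []
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.3, size=2)     # random-walk step
    ll_prop = log_likelihood(*proposal)
    if np.log(rng.uniform()) < ll_prop - ll:             # flat priors cancel
        theta, ll = proposal, ll_prop
    samples.append(theta.copy())

draws = np.array(samples[5000:])                         # discard burn-in
a90 = np.exp((np.log(9.0) - draws[:, 0]) / draws[:, 1])  # logit(0.9) = ln 9
print("a90 posterior median:", np.median(a90))
print("95th percentile (a90/95-style bound):", np.percentile(a90, 95))
```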
32
Bayesian Approach
  p(λ | y) = p(y | λ) p(λ) / p(y)

• Posterior p(λ | y): integration of information from the model and experimental data
• Likelihood p(y | λ): forward model and measurement data
• Prior ("belief") p(λ): physics-based model or expert opinion
• Normalizing constant p(y): useful in model selection
• y: data
• λ: parameter(s)
33
Bayes Factors for Model Selection
Compare two models M2 and M1 using the Bayes factor:

  BF21 = P(y | M2) / P(y | M1) = Marginal likelihood (M2) / Marginal likelihood (M1)

  BF       2log(BF)   Strength of evidence
  < 1      < 0        Negative (supports M1)
  1–3      0–2        Barely worth mentioning
  3–20     2–6        Positive
  20–150   6–10       Strong
  > 150    > 10       Very strong

― "Bayes Factors" by Kass and Raftery, 1995

[Figure: candidate models Model 1, Model 2, Model 3 with parameters θ1, θ2, θ3; model comparison via marginal likelihoods P(y | M1), P(y | M2), P(y | M3) and Bayes factors BF21, BF32, followed by parameter estimation.]
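A tiny numeric sketch of using this scale (Python; the log marginal likelihoods are taken from the Difficult Data Set #2 comparison later in this talk):

```python
import math

log_ml_m2 = -185.12   # e.g., 4-parameter probit
log_ml_m1 = -200.16   # e.g., 2-parameter logit

log_bf = log_ml_m2 - log_ml_m1     # log Bayes factor, as tabulated later
two_log_bf = 2.0 * log_bf          # scale used in the Kass & Raftery table
print("BF21 =", math.exp(log_bf), " 2log(BF21) =", two_log_bf)  # > 10: very strong
```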
34
Difficult Data Set #1
• NTIAC A9002(3)L
[Figure: hit/miss detection data for NTIAC data set A9002(3)L, POD 0 to 1 vs flaw size, 0 to 18.]
What’s going on here?
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
35
Difficult Data Set #1
• Example of using the wrong model.
[Figure: NTIAC Capabilities Data Book plot, predicted POD (%) vs actual crack length (inch), with hit/miss data. Data Set: A9002(3)L. Test object: 2219 aluminum, stringer-stiffened panels. Condition: after etch. Method: eddy current, raster scan with tooling aid. Operators: combined, 3 operators. Opportunities = 417; detected = 299; 90% POD = 0.374 in. (9.50 mm); false calls = not documented.]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
36
Difficult Data Set #1
• 2 parameter logit/probit
• Appears to show a90 and a90/95 values
[Figure: 2-parameter logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
37
Difficult Data Set #1
• 3 parameter lower logit/probit
• Again, it appears as if there are a90 and a90/95 values.

[Figure: 3-parameter lower-asymptote logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
38
Difficult Data Set #1
• 3 parameter upper logit/probit
[Figure: 3-parameter upper-asymptote logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
39
Difficult Data Set #1
• Case study 4 parameter probit
[Figure: posterior histograms for the 4-parameter probit model (intercept, slope, lower asymptote, upper asymptote) and the resulting fit, plotted 0 to 1 vs a (mm), 0 to 5.]
40
Difficult Data Set #1
• 4 parameter Logit is most likely
[Figure: 4-parameter logit fit, plotted 0 to 1 vs a (mm), 0 to 5.]
41
Difficult Data Set #1
• Summary of results

  Model                            Intercept   Slope     Lower    Upper    ML         a90       a90/95
  2-parameter logit                -1.6645     1.7257    n/a      n/a      3.70E-97   9.4148    12.555
  2-parameter probit               -0.8476     0.9242    n/a      n/a      9.08E-98   10.0156   13.4993
  3-parameter lower-bound logit    -1.8501     1.7485    0.0898   n/a      7.29E-98
  3-parameter lower-bound probit   -1.0195     0.9616    0.1098   n/a      1.02E-98
  3-parameter upper-bound logit    -5.4408     5.5486    n/a      0.8478   3.27E-93
  3-parameter upper-bound probit   -2.788      2.9377    n/a      0.8443   1.64E-93
  4-parameter logit                -13.7647    12.2874   0.175    0.8307   7.24E-92
  4-parameter probit               -9.8542     8.674     0.1864   0.8282   2.49E-92
42
Difficult Data Set #2
• Example of using the wrong model.
• Note: the mh1823 software produces numerous warnings.

[Figure: NTIAC Capabilities Data Book plot, predicted POD (%) vs actual crack length (inch), with hit/miss data. Data Set: D8001(3)L. Test object: 2219 aluminum, stringer-stiffened panels. Condition: as machined. Method: ultrasonic, hand scan, shear wave. Operators: combined, 3 operators. Opportunities = 417; detected = 330; 90% POD = 0.340 in. (8.63 mm); false calls = not documented.]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
What’s going on here?
43
Difficult Data Set #2
• 2 parameter logit/probit
• It appears that a90 and a90/95 values exist.

[Figure: 2-parameter logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
44
Difficult Data Set #2
• 4 parameter probit
• a90 and a90/95 values do not exist; the fitted upper asymptote falls below 90% POD.

[Figure: 4-parameter probit fit, plotted 0 to 1 vs a (mm), 0 to 10.]
45
Difficult Data Set #2
• Which model is correct?
• Log marginal likelihoods and Bayes factors

  Model                     logit     probit    log BF (logit/probit)   vs 2-param (logit)   vs 2-param (probit)
  2-parameter               -200.16   -201.63   1.47                    ———                  ———
  3-parameter lower bound   -203.86   -203.49   -0.37                   -3.7                 -1.86
  3-parameter upper bound   -189.30   -189.00   -0.30                   10.86                12.63
  4-parameter               -188.89   -185.12   -3.76                   11.27                16.51
46
Small Data Set
• A great example where the last procedure fails
• Small data sets do not cause any warnings with
standard software.
47
Small Data Set
• 4 parameter model
[Figure: 4-parameter model fit to the small data set, plotted 0 to 1 vs a (inches), 0 to 1.]
48
Small Data Set
• Summary for small data set

  Maus:
  Model                            Intercept   Slope     Lower    Upper    ML         a90      a90/95   BF (vs 4-param, same link)
  2-parameter logit                27.1517     12.6708   n/a      n/a      8.87E-04   0.1405   0.1829   4.51E+01
  2-parameter probit               22.6165     10.577    n/a      n/a      2.60E-03   0.1329   0.159    8.17E+02
  3-parameter lower-bound logit    24.0452     11.6731   0.0719   n/a      9.67E-06
  3-parameter lower-bound probit   24.5728     12.1283   0.0705   n/a      9.16E-06
  3-parameter upper-bound logit    27.6386     12.5028   n/a      0.7967   2.29E-04
  3-parameter upper-bound probit   22.0074     9.8899    n/a      0.7917   4.87E-05
  4-parameter logit                24.8792     11.3063   0.0706   0.7926   1.97E-05
  4-parameter probit               23.7871     10.6664   0.0711   0.7781   3.18E-06

  IMTT:
  Model                            Intercept   Slope     Lower    Upper    ML         a90      a90/95   BF (vs 4-param, same link)
  2-parameter logit                25.5467     9.8023    n/a      n/a      3.30E-03   0.0941   0.1248   1.52E+02
  2-parameter probit               19.667      7.4983    n/a      n/a      4.40E-03   0.0873   0.1139   7.99E+03
  3-parameter lower-bound logit    28.5743     11.3208   0.1273   n/a      2.07E-04
  3-parameter lower-bound probit   23.2202     9.2585    0.1391   n/a      1.58E-04
  3-parameter upper-bound logit    24.5688     9.2608    n/a      0.9055   6.34E-06
  3-parameter upper-bound probit   20.405      7.6861    n/a      0.9041   3.10E-05
  4-parameter logit                25.3354     9.9529    0.1263   0.9067   2.17E-05
  4-parameter probit               26.4209     11.153    0.1679   0.8884   5.51E-07
49
Conclusion
• It sometimes appears (and is desirable) that there is a systematic procedure that will automatically determine the best model, but this actually isn't the case.
• Bayes factors provide a useful approach to evaluate the best model.
• However, an example with a small data set showed that even the Bayes factor procedure can lead one to a wrong conclusion.
  – It doesn't tell you to stop and not perform an analysis.
  – Need to look at the data and perform 'diagnostics'.
• Bottom line – procedures don't replace statisticians.
50
Model-Assisted POD

• C-5 Wing Splice Fatigue Crack Specimens:
  – Two-layer specimens are 14" long and 2" wide.
  – 0.156" top layer, 0.100" bottom layer.
  – 90% of the fasteners were titanium, 10% were steel.
  – Fatigue cracks positioned at the 6 and 12 o'clock positions.
  – Crack lengths ranged from 0.027" – 0.169" (2nd layer).
  – Crack location varied: at both the 1st and 2nd layers.
• AFRL/UDRI acquired data (Hughes, Dukate, Martin).

[Figure: eddy current scan images of specimen A1-16C0 (crack indications of 0.110" and 0.107") and schematics of 1st-layer and 2nd-layer corner cracks with dimensions a and b.]
51
MAPOD

[Figure: measurement response (V) vs crack length (in), model vs experiment. A) 1st layer, faying surface, corner cracks: model vs exp. B) 2nd layer, faying surface, corner / through cracks: model-corner, model-through, exp.]
• Perform simulated studies: Compare with experimental results
• Bayesian methods can assist in determining best model.
52
Demonstration of Model-Assisted Probability of Detection (MAPOD)

• Experimental comparison with full model-assisted POD.

[Figure: POD vs crack length (in): experimental POD and full model-assisted POD curves for 2nd layer, faying surface, corner & through cracks; schematic of an eddy current probe over a fastener site with a corner crack.]

Successes:
• First demonstration of MAPOD in the literature for a structural problem.
• Eddy current models were able to simulate eddy current inspection of 2nd-layer fatigue cracks around fastener holes.

Knopp, Aldrin, Lindgren, and Annis, "Investigation of a model-assisted approach to probability of detection evaluation", Review of Progress in Quantitative Nondestructive Evaluation, (2007)
53
Heteroscedasticity
• â vs a analysis
  – Berens, A.P. and P.W. Hovey, "Flaw Detection Reliability Criteria, Volume I – Methods and Results," AFWAL-TR-84-4022, Air Force Wright Aeronautical Laboratories, Wright-Patterson Air Force Base, April 1984. (â vs a analysis is always more advantageous than hit/miss because much more information is available, but hit/miss is used much more in practice)
  – Berens, A.P., NDE Reliability Data Analysis, American Society for Metals Handbook Nondestructive Evaluation and Quality Control, Vol 17, pp. 689-701, ASM International, 1989. (classic reference on the subject, still standard today)
  – MIL-HDBK-1823 (1999) – (guidance for POD studies based on the methods described by Berens and Hovey)
• Box-Cox transformations
  – Kutner, Nachtsheim, Neter, and Li, "Applied Linear Statistical Models", (2005)
54
Heteroscedasticity
• â vs a assumes homoscedasticity, and if that
assumption is violated, one must resort to hit/miss
analysis. This was the case for an early MAPOD
study (Knopp et al. 2007)
• Box-Cox transformation can remedy this problem.
[Figure: signal response â (V) vs crack size (mm), a scatter plot showing non-constant variance.]
55
Heteroscedasticity
• Box-Cox transformation according to Kutner et al.
• Note: not to be used for nonlinear relations.
• Box-Cox identifies transformations from a family of power transformations.
• The form is: â' = â^λ
• Some common transformations:

  λ = 2     →  â' = â²
  λ = 0.5   →  â' = √â
  λ = 0     →  â' = logₑ(â)  (by convention)
  λ = −0.5  →  â' = 1/√â
  λ = −1.0  →  â' = 1/â
56
Heteroscedasticity
• New regression model with power transform:

  â_i^λ = β₀ + β₁ a_i + ε_i

• λ needs to be estimated. Box-Cox uses maximum likelihood.
• I use Excel's Solver to do a numerical search for potential λ values.
• Standardize the observations so that the magnitude of the error sum of squares does not depend on the value of λ:

  g_i = (1 / (λ c^(λ−1))) (â_i^λ − 1),  λ ≠ 0
  g_i = c logₑ(â_i),  λ = 0

• c is the geometric mean of the observations: c = (∏ â_i)^(1/n)
• Next step is to regress g on a for a given λ and calculate SSE.
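A minimal sketch of this search (Python standing in for the Excel Solver step; data are made up for illustration):

```python
# Scan lambda, standardize with the geometric mean as above, regress g on a,
# and keep the lambda that minimizes the error sum of squares (SSE).
import numpy as np

a = np.array([0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0])                 # crack size (mm)
ahat = np.array([0.004, 0.008, 0.012, 0.02, 0.03, 0.05, 0.07, 0.09])   # signal (V)

c = np.exp(np.mean(np.log(ahat)))   # geometric mean of the observations

def sse(lam):
    if abs(lam) < 1e-8:
        g = c * np.log(ahat)                                 # lambda = 0 case
    else:
        g = (ahat**lam - 1.0) / (lam * c**(lam - 1.0))       # standardized transform
    X = np.column_stack([np.ones_like(a), a])
    beta, *_ = np.linalg.lstsq(X, g, rcond=None)             # regress g on a
    return np.sum((g - X @ beta)**2)

lams = np.arange(-1.0, 2.001, 0.05)
print("best lambda ~", min(lams, key=sse))
```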
57
Heteroscedasticity
• The value of λ that minimizes SSE is the best
transformation.
• This procedure is only a guide, and a high level of
precision is not necessary.
• For this data set, λ = 0.45
[Figure: left, â + 0.02 vs crack size (mm), the original data; right, â transformed with λ = 0.45 vs crack size (mm).]
58
Heteroscedasticity
• Box-Cox: POD curve associated with the λ = 0.45 transform.

[Figure: probability of detection, POD | a, vs size, a (mm), 0 to 4, with a50, a90, and a90/95 marked.]
59
Heteroscedasticity
• Box Cox transformation – square root transform
[Figure: left, â transformed with λ = 0.5 vs crack size (mm); right, response, â, vs size, a (mm).]
60
Heteroscedasticity
• POD result for square-root transform
[Figure: probability of detection, POD | a, vs size, a (mm), 0 to 4, with a50, a90, and a90/95 marked.]
61
Summary
• Box-Cox enables â vs a analysis for data sets where the variance is not constant but has some relationship with the independent variable, such as crack size.

  Analysis method    λ      Left censor   Detection threshold   False calls   a90 (mm)   a90/95 (mm)   a90 to a90/95 % difference
  1st order linear   0.45   0.13          0.23                  0             2.176      2.327         6.9%
  1st order linear   0.5    0.14          0.195                 1             2.102      2.257         7.3%
  1st order linear   0.5    0.195         0.195                 1             2.269      2.53          11.5%
  2nd order linear   0.5    0.14          0.195                 1             2.277      2.472         8.5%
  2nd order linear   0.5    0.195         0.195                 1             2.197      2.428         10.5%
  hit/miss           1      n/a           0.187                 1             1.72       2.04          18.6%
  hit/miss           1      n/a           0.162                 11            1.498      1.907         27.3%
62
Physics-Inspired Models
• MAPOD – idea is to use simulation to reduce time and
cost of POD studies.
• Properly integrating simulation and experiment is an
enormous task.
• Intermediate step is to use models to inspire the
functional form of the regression model.
63
Physics-Inspired Models - literature
• R.B. Thompson and W.Q. Meeker, "Assessing the POD of Hard-Alpha Inclusions from Field Data", Review of Progress in QNDE, Vol. 26, AIP, pp. 1759-1766, (2007). (Example where kink regression is used to distinguish between Rayleigh scattering at small flaw sizes and regular scattering at larger sizes)
Figure from http://www.tc.faa.gov/its/worldpac/techrpt/ar0763.pdf
64
Physics-Inspired Models
• Simulation and experiment.
• Visual inspection reveals that a 2nd-order linear model may fit the data better than the standard â vs a analysis.
• Evidence beyond the visual: the p-value for the a² term is 0.001, and the adjusted R-square value increases slightly with inclusion of a².

[Figure: experiment and simulation responses vs crack size, 0 to 5 mm, with a quadratic model fit to the transformed data.]
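A minimal sketch of that check (Python with statsmodels; the data are simulated here purely to show the mechanics):

```python
# Compare 1st-order and 2nd-order linear models for ahat vs a and inspect the
# p-value of the a^2 coefficient and the adjusted R-square, as described above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
a = np.linspace(0.2, 5.0, 30)
ahat = 0.01 + 0.05 * a + 0.004 * a**2 + rng.normal(scale=0.01, size=a.size)

fit1 = sm.OLS(ahat, sm.add_constant(a)).fit()                             # 1st order
fit2 = sm.OLS(ahat, sm.add_constant(np.column_stack([a, a**2]))).fit()    # 2nd order

print("p-value of a^2 term:", fit2.pvalues[2])
print("adj R^2: 1st order =", fit1.rsquared_adj, " 2nd order =", fit2.rsquared_adj)
```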
65
Physics-Inspired Models – parallel work
• Recent unpublished work by Li, Nakagawa, Larson,
and Meeker.
http://www.stat.iastate.edu/preprint/articles/2011-05.pdf
66
Summary
• Physics model hopefully provides functional form of
the response, and this knowledge can be used in the
initial DOE for a POD study.
• Physics-Inspired model concept is a first step in
using physics models for making inference on
reliability.
• Confidence bounds calculation on models more
complicated than â vs a is an open problem,
especially transforming to probability of detection
curve.
67
Bootstrap Methods
• Confidence bound calculations are complicated and
only available for hit/miss and â vs a analysis.
• More complicated models require new method.
• Bootstrap methods are simple and flexible enough to
provide confidence bounds for a wide variety of
models.
68
Bootstrap Methods - literature
• Efron, B., and Tibshirani, R. J., An Introduction to the Bootstrap,
Chapman & Hall, New York, NY, 1993.
• C.C. McCulloch and J. Murphy, “Local Regression Modeling for
Accurate Analysis of Probability of Detection Data”, Mat. Eval.,
Vol. 60, no. 12, pp. 1438-1143, (2002) (A rare example of
bootstrapping used in NDE context)
• Amarchinta, Tarpey, and Grandhi, “Probabilistic Confidence
Bound Framework for Residual Stress Field Predictions”, 12th
AIAA Non-Deterministic Approaches Conference, AIAA-2010-
2519, Orlando, FL, (2010).
69
Bootstrap Methods
• Bootstrap procedure is simply to sample with
replacement and generate a POD curve each time.
• Sort all of the a90 values in ascending order and look
at the value in the 95th percentile to determine a90/95
• Example for the previous transformed data set with λ
= 0.5
  Method              a90        a90/95
  Wald method         2.102 mm   2.257 mm
  Bootstrap 1,000     2.096 mm   2.281 mm
  Bootstrap 10,000    2.099 mm   2.299 mm
  Bootstrap 100,000   2.099 mm   2.297 mm
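A minimal sketch of the resampling loop (Python; made-up data and threshold, no censoring):

```python
# Resample (a, ahat) pairs with replacement, refit the a-hat regression, recompute
# a90 each time, and take the 95th percentile of the a90 draws as the a90/95 bound.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
a = np.array([0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0])
ahat = np.array([0.02, 0.03, 0.05, 0.07, 0.10, 0.12, 0.15, 0.19, 0.24])
ahat_threshold = 0.04

def a90_from(idx):
    x, y = np.log(a[idx]), np.log(ahat[idx])
    X = np.column_stack([np.ones_like(x), x])
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma_eps = np.sqrt(np.sum((y - X @ np.array([b0, b1]))**2) / (len(x) - 2))
    mu, sigma = (np.log(ahat_threshold) - b0) / b1, sigma_eps / b1
    return np.exp(mu + norm.ppf(0.9) * sigma)

n = len(a)
a90s = np.array([a90_from(rng.integers(0, n, size=n)) for _ in range(1000)])
print("a90 =", np.median(a90s), " a90/95 =", np.percentile(a90s, 95))
```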
70
Summary
• Bootstrapping is beautiful.
• 1,000 samples probably sufficient, but 100,000 isn’t that difficult.
• Some interesting formal work could be done to look at the
influence of censoring, which is probably beyond the scope of
this work.
• Results seem to indicate the 2nd order model (which I think is the
best) is the most conservative.
• Further investigation of censoring planned.

  Analysis method    λ      Left censor   Detection threshold   False calls   a90 (mm)   a90/95 (mm)   a90 to a90/95 % difference
  1st order linear   0.45   0.13          0.23                  0             2.176      2.327         6.9%
  1st order linear   0.5    0.14          0.195                 1             2.102      2.257         7.3%
  1st order linear   0.5    0.195         0.195                 1             2.269      2.53          11.5%
  2nd order linear   0.5    0.14          0.195                 1             2.277      2.472         8.5%
  2nd order linear   0.5    0.195         0.195                 1             2.197      2.428         10.5%
  hit/miss           1      n/a           0.187                 1             1.72       2.04          18.6%
  hit/miss           1      n/a           0.162                 11            1.498      1.907         27.3%
71
Summary
• Hit/miss analysis – MCMC
• â vs a analysis – unchanged
• Higher order / complex models – bootstrapping
• Methods presented for putting confidence bounds on POD curves are not elegant by any stretch of the imagination, but they are incredibly robust and useful.
• Much work needs to be done via simulation to move these methods into practice.
• UQ – Progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.
72
Efficient Uncertainty Propagation
• Deterministic simulations are very time consuming.
• NDE problems require stochastic simulation if the
models are to truly impact analysis of inspections.
• Need modern uncertainty quantification methods to
address this problem.
[Figure: stochastic eddy current NDE model, with uncertain inputs ξ₁, ξ₂, …, ξₙ mapped to an uncertain output Z̃.]
73
Efficient Uncertainty Propagation

Uncertainty propagation methods:
• Monte Carlo
• Latin hypercube (sampling methods)
• FORM/SORM
• Full factorial numerical integration
• Univariate dimension reduction
• Karhunen–Loève expansion / ANOVA (high-dimension problems)
• Polynomial chaos expansion (intrusive)
• Probabilistic collocation method (non-intrusive)
74
Uncertainty Propagation
• Motivation: Model evaluations are computationally expensive.
There is a need for more efficient methods than Monte Carlo
• Objective: Efficiently propagate uncertain inputs through “black
box” models and predict output probability density functions.
(Non-intrusive approach)
• Approach: Surrogate models based on Polynomial Chaos
Expansions meet this need.
[Figure: a deterministic eddy current NDE model with uncertain inputs (X₁ ~ Uniform, …, Xₙ ~ Normal) is wrapped as a stochastic model producing an uncertain output Z̃.]

• Input parameters with variation:
  – Probe dimensions (liftoff / tilt)
  – Flaw characteristics (depth, length, shape)
75
Uncertainty Propagation
Uncertainty propagation for parametric NDE characterization problems:
• The Probabilistic Collocation Method (PCM) approximates the model response with a polynomial function of the uncertain parameters:

  Ẑ(x) = Σ_{i=1}^{N} c_i f_i(ξ(x))

• This reduced-form model can then be used with traditional uncertainty analysis approaches, such as Monte Carlo.

Extensions of generalized polynomial chaos (gPC) to high-dimensional (2D, 3D) damage characterization problems:
• Karhunen–Loève expansion
• Analysis of variance (ANOVA)
• Smolyak sparse grids

[Figure: spectrum of problem dimensionality N, from critical flaw size (N = 1), to key damage and measurement states such as crack length and probe liftoff (N > 1), to parameterized flaw localization and sizing (N >> 1), to full 3D damage and material state characterization.]
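A minimal sketch of a non-intrusive PCM/gPC surrogate in one standard-normal parameter ξ (Python; black_box_model is a stand-in for an expensive eddy current solve):

```python
# Evaluate the model at Gauss-Hermite collocation points, fit probabilists' Hermite
# polynomial coefficients c_i by least squares, then Monte Carlo the cheap surrogate.
import numpy as np
from numpy.polynomial.hermite_e import hermevander, hermegauss

def black_box_model(xi):
    return 1.0 + 0.3 * xi + 0.05 * xi**2   # stand-in for the expensive solver

degree = 3
nodes, _ = hermegauss(degree + 1)          # collocation points
Z = black_box_model(nodes)                 # the only "expensive" evaluations

V = hermevander(nodes, degree)             # Hermite polynomial design matrix
c, *_ = np.linalg.lstsq(V, Z, rcond=None)  # surrogate coefficients c_i

xi_samples = np.random.default_rng(3).standard_normal(100_000)
Z_surrogate = hermevander(xi_samples, degree) @ c
print("output mean:", Z_surrogate.mean(), " std:", Z_surrogate.std())
```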
76
Uncertainty Propagation and High Dimensional Model Representation

Approach (1): Karhunen–Loève Expansion
• Address stochastic input variable reduction when the number of random variables (N) is large.
• Apply the Karhunen–Loève expansion to map the random variables ξ₁, …, ξ_N into a lower-dimensional random space (N'):

  σ(x) = σ̄(x) + Σ_{n=1}^{N'} √λₙ φₙ(x) ξₙ

Eddy current example:
• A correlation function (covariance model C(x, x')) defines the random conductivity map over the crystallites (grains), σ = 2.2×10⁶ S/m, with a coil above the surface.
• Set the choice of grid length to
  – achieve model convergence, and
  – eliminate insignificant eigenvalues
  for the reduced-order conductivity map (N random variables → N' random variables).
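A minimal sketch of the discrete version of this reduction (Python, assuming an exponential covariance model; all numbers except the 2.2×10⁶ S/m conductivity are illustrative):

```python
# Eigendecompose the covariance matrix, keep the N' largest eigenvalues, and
# synthesize a reduced-order random conductivity map.
import numpy as np

n_pts, corr_len = 200, 0.1
x = np.linspace(0.0, 1.0, n_pts)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)   # covariance model C(x, x')

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                         # largest eigenvalues first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep enough modes for 95% of the variance (drops insignificant eigenvalues)
n_keep = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.95)) + 1
print("reduced from N =", n_pts, "to N' =", n_keep, "random variables")

rng = np.random.default_rng(4)
xi = rng.standard_normal(n_keep)
fluct = eigvecs[:, :n_keep] @ (np.sqrt(eigvals[:n_keep]) * xi)
sigma_map = 2.2e6 * (1.0 + 0.05 * fluct)                  # S/m; 5% variation assumed
```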
77
Uncertainty Propagation and High Dimensional Model Representation

Approach (2): Analysis of Variance (ANOVA) Expansion
• Provides a surrogate to represent a high-dimensional set of parameters.
• Analogous to ANOVA decomposition in statistics.
• Locally represent the model output Z(ξ) through an expansion at an anchor point in ξ-space.
  – Requires an inverse problem.
  – Replace the random surface with an equivalent 'homogeneous' surface.

[Figure: workflow, with N >> N' >> M: covariance model C(x, x') → conductivity map with N random variables → Karhunen–Loève expansion → reduced-order map with N' random variables → (1) identify unique sources of variance, (2) identify significant factors in the model → M random variables.]
78
Uncertainty Propagation and High Dimensional Model Representation

Approach (2): ANOVA Expansion + Smolyak Sparse Grids
• Significant computational expense for high-dimensional integrals.
• Can leverage sparse grids based on the Smolyak construction [Smolyak, 1963; Xiu, 2010; Gao and Hesthaven, 2010].
  – Provides weighted solutions at specific nodes and adds them to reduce the number of necessary solutions.
  – Sparse grid collocation provides a subset of the full tensor grid for higher-dimensional problems.
  – Approach can also be applied to gPC/PCM.

[Figure: sparse grid and full tensor product grid on [−1, 1]².]
79
All Models Are Wrong
• "All models are wrong, and to suppose that inputs should always be set to their 'true' values when these are 'known' is to invest the model with too much credibility in practice. Treating a model more pragmatically, as having inputs that we can 'tweak' empirically, can increase its value and predictive power" (Kennedy and O'Hagan, 2002)
• Eddy current liftoff is a particularly great example of this.
80
Bayesian Analysis
• Bayesian Model Averaging (BMA) – Used when experts provide competing models for the same system.
• Bayesian calibration is the most promising technical option for integrating experimental data and simulation in a rigorous way that accounts for all sources of uncertainty.
• The Kennedy / O'Hagan 2001 paper inspired many efforts in this direction. Incidentally, it was rejected by the Journal of the American Statistical Association; it is now published in the Journal of the Royal Statistical Society: Series B and has been referenced 620 times. Add that to the references to the unpublished technical report that JASA rejected, and you get a large number.
• Many efforts ongoing in UQ community
81
Bayesian Calibration
• What uncertainty needs to be quantified to go from the simulator to reality?
  – Input
  – Propagation from input to output (hopefully done in the previous section, but notice no uncertainty is actually quantified in this part)
  – Code
  – Discrepancy
82
Bayesian Calibration
• Terminology
  – Model: set of equations that describes some real-world phenomena.
  – Simulator: executes the model with computer code.
  – Calibration parameters: θ
  – Controlled input variables: x
83
Bayesian Calibration
• Simulator: y = f(x, θ)
• Observations: observations = reality(control variables) + ε, where ε is observation error.
• Reality doesn't depend on calibration parameters.
• Typically you see: observations = f(x, θ) + ε
• This is wrong, mainly because it doesn't account for uncertainty in θ, and the ε's are not independent.
• Bayesian methods are used to learn about uncertainty in θ.*

* Paraphrasing discussion with Tony O'Hagan
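For reference, a sketch of the Kennedy and O'Hagan formulation that addresses this; the discrepancy term δ corresponds to the "Discrepancy" item two slides back, and the Gaussian process prior on δ is their modeling choice, not something spelled out on this slide:

```latex
% Kennedy & O'Hagan (2001) calibration model
y_i = f(x_i, \theta) + \delta(x_i) + \varepsilon_i,
\qquad \varepsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)
% reality(x) = f(x, \theta) + \delta(x); the model discrepancy \delta(\cdot) is given
% a Gaussian process prior, which restores independence of the errors \varepsilon_i.
```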
84
Bayesian Calibration - literature
• Kennedy, M. C. and O’Hagan, A., “Bayesian calibration of computer models,” J. R. Statist. Soc. B, Vol. 63,
pp. 425–464, (2001).
• Park, I., Amarachinta, H. K., and Grandhi, R. V., “A Bayesian approach to quantification of model
uncertainty,” Reliability Engineering and System Safety, Vol 95, pp. 777-785, (2010)
85
Summary
• Hit/miss analysis – MCMC
• â vs a analysis – unchanged
• Higher order / complex models – bootstrapping
• Methods presented for putting confidence bounds on POD curves are not elegant by any stretch of the imagination, but they are incredibly robust and useful.
• Much work needs to be done via simulation to move these methods into practice.
• UQ – Progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.