1
Integrity Service Excellence
Anomalous Events in Non-Destructive Inspection Data
18 Dec 2012
Jeremy S. Knopp
AFRL/RXCA
Air Force Research Laboratory
2
Disclaimer
• The views expressed in this presentation are those of
the author and do not reflect the official policy or
position of the United States Air Force, Department of
Defense, or the United States Government
3
Outline
• Historical Perspective of Aircraft Structural Integrity
Program (ASIP)
• Probability of Detection (POD)
• Nondestructive Evaluation System Reliability
Assessment Handbook (MIL-HDBK-1823A) Revision
• Research Objectives to Improve State-of-the-Art POD
Evaluation
4
Aircraft Management Strategies

• Safe Life – No Periodic Inspection Required
  – Fly a certain number of hours and retire.
  – Considers the effects of cyclic loading on the airframe with a full-scale fatigue test.
  – For example, testing to 40,000 hours ensures a safe life of 10,000 hours.
  – Used by the US Navy.
• Damage Tolerance Assessment (DTA) – Periodic Inspection to Detect Damage
  – Fly and inspect; reassess the time to the next inspection based on fatigue crack growth analysis, usage, and results of inspection.
  – Assumes imperfections are present in the early stages of aircraft service.
  – REQUIRES RELIABLE AND VALIDATED NDI
  – Used by the US Air Force.
• Condition-Based Maintenance (CBM) – Periodic Inspection and/or Onboard Monitoring to Characterize Damage
  – Perform repairs only when needed.
  – Will minimize maintenance costs.
  – Requires damage characterization, not just detection.
  – Desired by the US Air Force to maximize availability of assets while minimizing sustainment costs.
• Condition-Based Maintenance Plus (CBM+) – Periodic Inspection to Characterize Damage
  – CBM plus prognosis to estimate capability and remaining life for optimal maintenance scheduling.
5
The USAF Aircraft Structural Integrity Program (ASIP)
• Provides the engineering discipline and management framework
  – associated with establishing and maintaining structural safety
  – in the most cost-effective manner
  – through a set of defined inspections, repairs, modifications, and retirement actions
• Based on a preventative maintenance strategy that starts in acquisition and continues until retirement
ASIP Processes involve engineers and managers working together to control the risks of structural failure
6
“Wright” approach to Structural Integrity
• Approach used by the Wright brothers, beginning in 1903.
• Essentially the same approach used
by USAF for over 50 years.
• They performed stress analysis and
conducted static tests far in excess
of the loads expected in flight.
• Safety factor applied to forces that
maintained static equilibrium with
weight.
7
B-47 Experience, 1958
• Air Force Strategic Air Command lost two B-47 bombers on the same day!
• Metal fatigue caused the wings on two aircraft to fail catastrophically in flight.
• Standard static test and abbreviated flight load survey proved the structure would support at least 150% of its design limit load.
• No assurance that the structure would survive smaller cyclic loads in actual flight.
8
ASIP Initiated
• Aircraft Structural Integrity Program (ASIP) initiated on
12 Jun 1958 with 3 primary objectives:
  – Control structural fatigue in the aircraft fleet.
  – Develop methods to accurately predict service life.
  – Establish design and testing methods to avoid structural problems in future aircraft systems.
• Led to the "safe-life" approach.
  – Probabilistic approach to establishing the aircraft service life capability.
  – Safe life established by conducting a full-scale airframe fatigue test and dividing the number of successfully tested simulated flight hours by a scatter factor (usually 4).
9
F-111 Experience, 1969
• Wing separation at ~100 hours (safe-life qualified to 4,000 hours). Crack initiated from a manufacturing defect.
• Two-phase program initiated.
• Phase 1 (allow operations at 80% of designed capability)
  – Material crack growth data collected to develop a flaw growth model.
  – Cold proof test to demonstrate that critical-size flaws were not present in critical forgings.
  – Improved NDI for use in reinspection.
• Phase 2 (allow operations at 100% of designed capability)
  – Incorporated NDI during production.
  – Used fracture mechanics to determine inspection intervals.
10
Damage Tolerance Update, 1974
• In response to the F-111 mishap, ASIP incorporated Damage Tolerance requirements.
  – Objective was to prevent airframe failures resulting from the safe-life approach.
• ASIP provides 3 options to satisfy the damage tolerance requirement:
  – Slow crack growth (most common option)
  – Fail-safe multiple load path
  – Fail-safe crack arrest
• Primary basis for the aircraft structure maintenance program for the last 30+ years.
  – Inspection requirements based on initial flaw assumptions (slow crack growth) and NDI capability.
• Today – inspection burden is increasing due to the age of the fleet!
  – NDE research needed to reduce the future maintenance burden.
11
Evolution of Structural Integrity Approaches
Each change was made to enhance our ability to protect structural integrity (prevent structural failures)
Today, preventing structural failures requires anticipating events that ensure continuing airworthiness, reliability, availability, and cost-effectiveness
[Figure: timeline, 1950–2020, of the ASIP approach to preventing structural failures cost-effectively: Prevent Static Load Failures → Prevent Fatigue Failures → Protect for Potential Damage → Risk Assessment/Management (MIL-STD-1530C).]
12
USAF Structural Reliability
• USAF aircraft losses since 1971:
– 18 due to a structural failure
– 19 due to a structural failure that was caused by maintenance, pilot error, flight control failures, etc.
• Next chart plots overall USAF aircraft loss rate from
1947 – 2002 and structures contribution since 1971
– Overall loss rate calculated for each year (total losses per year / total fleet flight hours per year)
– Loss rate due to structures is plotted cumulatively, since many years had no losses due to structural failure
13
USAF Structural Reliability
USAF Aircraft Loss Rate (Destroyed Aircraft)

[Figure: number of aircraft losses / flight hours (log scale, 1.E-08 to 1.E-03) vs year, 1940–2010. Curves: All Causes; Structures = 37; Structures = 18.]
1 C. Babish, “USAF ASIP: Protecting Safety for 50 Years”, Aircraft Structural Integrity Program Conference (2008)
14
Rare Events
• Nov 2, 2007 – Loss of F-15C
airplane, 0 casualties
• Aircraft operated within limits
• Mishap occurred due to a fatigue failure in a forward fuselage single-load-path structure.
• Hot spot missed during design and testing and aggravated by a rogue flaw.
• NDI can be used to prevent
fracture at this hot spot.
15
Reliability of NDT
• Probability of Detection1
• Given a population of cracks of size ‘a’– geometry, material, orientation, location, …
• Given a defined inspection system
• POD(a) = Probability that selected cracks of size ‘a’
from the population will be detected– POD(a) = Proportion of all size ‘a’ cracks from the population
that would be detected
1 A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17 Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.
16
Reliability of NDT
• POD curve
• Two parameters (μ and σ)
• μ is a50
• σ describes the slope of the curve. A steep curve is ideal.

[Figure: POD vs flaw size (mm), rising from 0 to 1, with a50, a90, and a90/95 (or aNDE) marked.]
17
Inspection Intervals
[Figure: ASIP damage tolerance inspection intervals. Crack size, a, vs equivalent (standard spectrum) or flight hours. A crack growth curve runs from initial size a0 through aNDE and acr-miss to critical size aCR, at times T1, T2, T3, Tf. Inspections occur at 1/2 the time associated with the time it takes for a crack to grow from initial size to failure, e.g., T2 = 0.5*(T3 − T1).]
18
Reliability in NDT
• What is aNDE?
• aNDE is the "reliably" detected crack size for the applied inspection system.
• Traditionally, the reliably detected size has been considered to be the a90 or a90/95 crack size from the estimate of the NDE system POD(a).
• Variations of this can be investigated.

[Figure: POD vs flaw size (mm), rising from 0 to 1, with a50, a90, and a90/95 (or aNDE) marked.]
19
Reliability of NDE
• Development of POD was a very important contribution
to quantifying performance of NDE
• Necessary for effective ASIP program. Damage
Tolerance approach requires validated NDE capability.
• Quantifying largest flaw that can be missed is
important.
• Capability of detecting small flaws is less important.
• First serious investigation
– Packman et al., 1967¹
– Four NDI methods (X-ray, dye penetrant, magnetic particle, and ultrasonics)
1 P.F. Packman et al. The applicability of a fracture mechanics – nondestructive testing design criterion. Technical Report AFML-TR-68-32, Air Force Materials Laboratory, USA, May 1968.
20
Reliability of NDT
• Rummel et al., 1974¹
  – NASA Space Shuttle Program
  – Five NDI methods (X-ray, fluorescent penetrant, eddy current, acoustic emission, and ultrasonics)
• Lewis et al., 1978² (a.k.a. "Have Cracks Will Travel")
  – Major US Air Force program to determine reliability.
  – Perhaps the largest program of this kind in history.
  – Disappointing results concerning NDI capability.
• Both studies inspired more advanced statistical analysis.

1 W.D. Rummel et al., The detection of fatigue cracks by nondestructive testing methods. Technical Report NASA CR 2369, NASA Martin Marietta Aerospace, USA, Feb 1974.
2 W.H. Lewis et al., Reliability of nondestructive inspection – final report. Technical Report SA-ALC/MME 76-6-38-1, San Antonio Air Logistics Center, USA, Dec 1978.
21
Statistical Analysis – POD
• Two types of data collected
– “Hit/Miss” – binary data in terms of whether or not a flaw is found
– “â vs a” – continuous response data has more information
(â = signal magnitude, a = size)
• Statistical rigor introduced in USAF study conducted by
Berens and Hovey in 19811.
– Previous analysis methods grouped “hit/miss” data into bins and used binomial statistics to evaluate POD.
– Berens and Hovey introduced mathematical model based on log-logistic cumulative distribution function to evaluate POD. This is still standard practice.
1 A.P. Berens and P.W. Hovey, “Evaluation of NDE Reliability Characterization,” AFWAL-TR-81-4160, Vol 1, Air Force Wright- Aeronautical Laboratories, Wright-Patterson Air Force Base, Dec 1981.
22
Statistical Analysis – POD
• Hit/Miss analysis
– Sometimes only detection information is available (e.g., penetrant testing). Can also be used if the constant variance assumption is violated.
– Model assumes POD is a function of flaw size:

  POD(a) = f(β₀ + β₁ log(a))

– For the logit model (logistic):

  f(z) = exp(z) / (1 + exp(z))

– For the probit model (lognormal), f(z) = Φ(z), where Φ(·) is the standard normal cumulative distribution function.
– β₀ and β₁ are obtained as maximum likelihood estimates.
1 A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17 Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.
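For concreteness, here is a minimal sketch of this maximum likelihood fit (Python, with made-up hit/miss data; the helper a_at_pod is introduced purely for illustration and is not part of any handbook software):

```python
# Two-parameter hit/miss POD fit with a logit link in log flaw size,
# following the model form above. Data are fabricated for illustration only.
import numpy as np
from scipy.optimize import minimize

a = np.array([0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30])  # flaw size (in)
hits = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])                             # 1 = detected

def neg_log_likelihood(params):
    b0, b1 = params
    pod = 1.0 / (1.0 + np.exp(-(b0 + b1 * np.log(a))))   # logistic link
    pod = np.clip(pod, 1e-12, 1 - 1e-12)                 # guard against log(0)
    return -np.sum(hits * np.log(pod) + (1 - hits) * np.log(1 - pod))

b0, b1 = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead").x

def a_at_pod(p):
    """Invert the link: POD = p  =>  log(a) = (logit(p) - b0) / b1."""
    return np.exp((np.log(p / (1 - p)) - b0) / b1)

print("a50 =", a_at_pod(0.5), " a90 =", a_at_pod(0.9))
```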
23
Statistical Analysis – POD
• Hit/Miss analysis
– Unchanged since Berens and Hovey except for confidence bound calculations.
– Confidence bound calculations are not available in any commercial software package.
– Traditional Wald method for confidence bound calculation is anti-conservative with hit/miss data.
– The likelihood ratio method for confidence bound calculation is used in the revised MIL-HDBK-1823A. This is a very complicated calculation; see Annis and Knopp for details¹.
1 C. Annis and J.S. Knopp, “Comparing the Effectiveness of a90/95 calculations”, Rev. Prog. Quant. Nondestruct. Eval. Vol 26B pp. 1767–1774, 2007
24
Statistical Analysis – POD
• Hit/Miss analysis
– Example (mh1823, EXAMPLE 3 hm.xls): 92 hits out of 134 inspection opportunities; logit link function; fitted parameters 0.1156 and 0.025147.

[Figure: probability of detection, POD | a, vs size, a (inches), 0 to 0.4, with the fitted POD curve and its confidence bound (log-likelihood ratio method, Cheng & Iles approximation): a50 = 0.1156, a90 = 0.1709, a90/95 = 0.1974.]
1 MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).
25
Statistical Analysis – POD
• “â vs a” analysis (â = signal strength, a = flaw size)
– Magnitude of the signal contains information.
– More information results in more statistical confidence, which ultimately reduces sample size requirements.
– Again, the regression model assumes POD is a function of flaw size.
– Censored regression is almost always involved, so a commercial package such as SAS or S-Plus is necessary.
– Model: log(â) = β₀ + β₁ log(a) + ε, with ε ~ N(0, σ_ε²); note that σ_ε² is the regression variance. Then

  POD(a) = Φ((log(a) − μ) / σ)

  where μ = (log(â_threshold) − β₀) / β₁ and σ = σ_ε / β₁.

1 MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).
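A minimal sketch of that calculation (Python, made-up data, no censoring; as noted above, a real analysis almost always needs censored regression):

```python
# Fit log(ahat) = b0 + b1*log(a) + eps by ordinary least squares, then convert the
# regression parameters to POD(a) = Phi((log a - mu)/sigma) as on this slide.
import numpy as np
from scipy.stats import norm

a = np.array([0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0])             # flaw size (mm)
ahat = np.array([0.02, 0.03, 0.05, 0.08, 0.10, 0.17, 0.22])   # signal response (V)
ahat_threshold = 0.04                                         # decision threshold on ahat

X = np.column_stack([np.ones_like(a), np.log(a)])
(b0, b1), *_ = np.linalg.lstsq(X, np.log(ahat), rcond=None)
sigma_eps = np.sqrt(np.sum((np.log(ahat) - X @ np.array([b0, b1]))**2) / (len(a) - 2))

mu = (np.log(ahat_threshold) - b0) / b1   # POD median on the log-size axis
sigma = sigma_eps / b1                    # POD "slope" parameter

print("a50 =", np.exp(mu), " a90 =", np.exp(mu + norm.ppf(0.9) * sigma))
```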
26
Statistical Analysis – POD
• â vs a analysis
• Basically a linear model.
• Wald confidence intervals are sufficient.
• Delta method used to generate confidence intervals on the POD curve.

[Figure: mh1823 example (EXAMPLE 1 â vs a.xls). Left: response, â, vs size, a (mils), on log scales with the fitted regression. Right: POD(a) with confidence bound: a50 = 8.8, a90 = 12.69, a90/95 = 13.68; Pfalse call = 0.11.]
27
MIL-HDBK-1823A
28
MIL-HDBK-1823A Summary
• Completed in 2007; released in 2009
• 132 pages
• All new figures (65)
• Approximately 70% new text
• Based on best-practices for NDE and statistical
analysis
• 100% new software available
  – â vs. a
  – hit/miss
29
MIL-HDBK-1823A Support Website
• Download the Handbook
• Request the mh1823 POD software
http://mh1823.com/mh1823
30
Addressing Deficiencies (1)
• Concern exists about performing a POD calculation on poor data sets.
  – Poor data sets can be defined as:
    • Limited in sample size
    • Data that do not follow typical POD model fits
  – Problem when the wrong model is used for statistical inference.
  – Worst-case scenario: a fictitious a90/95 may be obtained.
• One possible remedy is a '4-parameter model':
  – Proposed by Moore and Spencer in 1999.
  – However, the parameter estimation problem is difficult using classical statistical methods.
  – It is likely that such methods also require large data sets (very little work performed to date).
[Figure: distributed sensor data flow – raw data → signal processing / feature extraction → signal classification → damage decision criteria → call / maintenance action, with feature-vector and damage-state databases.]

The 4-parameter model:

  POD(a) = α + (β − α) / (1 + exp(−π (ln(a) − ln(μ)) / (σ√3)))

  α : false call rate
  β : 1 − random missed flaw rate
  σ : curve steepness
  μ : flaw size median (50% POD)
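A minimal sketch of evaluating this 4-parameter model (Python; the π/√3 scaling follows the logistic form reconstructed above and should be treated as an assumption):

```python
# Four-parameter POD: lower asymptote alpha (false calls), upper asymptote beta.
import numpy as np

def pod_4param(a, alpha, beta, mu, sigma):
    z = np.pi * (np.log(a) - np.log(mu)) / (sigma * np.sqrt(3.0))
    return alpha + (beta - alpha) / (1.0 + np.exp(-z))

a = np.linspace(0.1, 10.0, 200)                       # flaw size (mm)
pod = pod_4param(a, alpha=0.05, beta=0.95, mu=2.0, sigma=0.5)
# Note: if beta < 0.9, the curve never reaches 90% POD and a90 does not exist,
# which is exactly the situation in Difficult Data Set #2 later in this talk.
```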
31
Addressing Deficiencies (2)
• Markov chain Monte Carlo (MCMC) offers a flexible method to use sampling to calculate confidence bounds.
• A Bayesian approach with non-informative priors can be used to select
  – Model function: logit or probit
  – Model form (parameters): 2-, 3-, and 4-parameter models
• Upper asymptote = 1 − P(random missed call) = β
• Lower asymptote = P(false call) = α
[Figure: POD vs crack length, a, from 0 to 1.0, showing the upper asymptote set by PRMC (probability of random missed call), the lower asymptote set by PFC (probability of false call), a50 at POD = 0.5, and the curve skewness.]
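As a sketch of what the MCMC machinery looks like, the following random-walk Metropolis sampler fits the 2-parameter hit/miss model with flat (non-informative) priors; the data are made up, and this illustrates the idea rather than the specific procedure used in this work:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30])
hits = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

def log_likelihood(b0, b1):
    pod = np.clip(1.0 / (1.0 + np.exp(-(b0 + b1 * np.log(a)))), 1e-12, 1 - 1e-12)
    return np.sum(hits * np.log(pod) + (1 - hits) * np.log(1 - pod))

theta = np.array([0.0, 1.0])
ll = log_likelihood(*theta)
samples = []
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.3, size=2)     # random-walk step
    ll_prop = log_likelihood(*proposal)
    if np.log(rng.uniform()) < ll_prop - ll:             # flat priors cancel
        theta, ll = proposal, ll_prop
    samples.append(theta.copy())

draws = np.array(samples[5000:])                         # discard burn-in
a90 = np.exp((np.log(9.0) - draws[:, 0]) / draws[:, 1])  # logit(0.9) = ln 9
print("a90 posterior median:", np.median(a90))
print("95th percentile (a90/95-style bound):", np.percentile(a90, 95))
```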
32
Bayesian Approach
  p(λ | y) = p(y | λ) p(λ) / p(y)

• Posterior p(λ | y): integration of information from the model and experimental data
• Likelihood p(y | λ): forward model and measurement data
• Prior ("belief") p(λ): physics-based model or expert opinion
• Normalizing constant p(y): useful in model selection
• y: data
• λ: parameter(s)
33
Bayes Factors for Model Selection
Compare two models M2 and M1 using the Bayes factor:

  BF21 = P(y | M2) / P(y | M1) = Marginal likelihood (M2) / Marginal likelihood (M1)

  BF       2log(BF)   Strength of evidence
  < 1      < 0        Negative (supports M1)
  1–3      0–2        Barely worth mentioning
  3–20     2–6        Positive
  20–150   6–10       Strong
  > 150    > 10       Very strong

― "Bayes Factors" by Kass and Raftery, 1995

[Figure: candidate models Model 1, Model 2, Model 3 with parameters θ1, θ2, θ3; model comparison via marginal likelihoods P(y | M1), P(y | M2), P(y | M3) and Bayes factors BF21, BF32, followed by parameter estimation.]
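A tiny numeric sketch of using this scale (Python; the log marginal likelihoods are taken from the Difficult Data Set #2 comparison later in this talk):

```python
import math

log_ml_m2 = -185.12   # e.g., 4-parameter probit
log_ml_m1 = -200.16   # e.g., 2-parameter logit

log_bf = log_ml_m2 - log_ml_m1     # log Bayes factor, as tabulated later
two_log_bf = 2.0 * log_bf          # scale used in the Kass & Raftery table
print("BF21 =", math.exp(log_bf), " 2log(BF21) =", two_log_bf)  # > 10: very strong
```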
34
Difficult Data Set #1
• NTIAC A9002(3)L
[Figure: hit/miss detection data for NTIAC data set A9002(3)L, POD 0 to 1 vs flaw size, 0 to 18.]
What’s going on here?
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
35
Difficult Data Set #1
• Example of using the wrong model.
[Figure: NTIAC Capabilities Data Book plot, predicted POD (%) vs actual crack length (inch), with hit/miss data. Data Set: A9002(3)L. Test object: 2219 aluminum, stringer-stiffened panels. Condition: after etch. Method: eddy current, raster scan with tooling aid. Operators: combined, 3 operators. Opportunities = 417; detected = 299; 90% POD = 0.374 in. (9.50 mm); false calls = not documented.]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
36
Difficult Data Set #1
• 2 parameter logit/probit
• Appears to show a90 and a90/95 values
[Figure: 2-parameter logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
37
Difficult Data Set #1
• 3 parameter lower logit/probit
• Again, it appears as if there are a90 and a90/95 values.

[Figure: 3-parameter lower-asymptote logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
38
Difficult Data Set #1
• 3 parameter upper logit/probit
[Figure: 3-parameter upper-asymptote logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
39
Difficult Data Set #1
• Case study 4 parameter probit
[Figure: posterior histograms for the 4-parameter probit model (intercept, slope, lower asymptote, upper asymptote) and the resulting fit, plotted 0 to 1 vs a (mm), 0 to 5.]
40
Difficult Data Set #1
• 4 parameter Logit is most likely
[Figure: 4-parameter logit fit, plotted 0 to 1 vs a (mm), 0 to 5.]
41
Difficult Data Set #1
• Summary of results

  Model                            Intercept   Slope     Lower    Upper    ML         a90       a90/95
  2-parameter logit                -1.6645     1.7257    n/a      n/a      3.70E-97   9.4148    12.555
  2-parameter probit               -0.8476     0.9242    n/a      n/a      9.08E-98   10.0156   13.4993
  3-parameter lower-bound logit    -1.8501     1.7485    0.0898   n/a      7.29E-98
  3-parameter lower-bound probit   -1.0195     0.9616    0.1098   n/a      1.02E-98
  3-parameter upper-bound logit    -5.4408     5.5486    n/a      0.8478   3.27E-93
  3-parameter upper-bound probit   -2.788      2.9377    n/a      0.8443   1.64E-93
  4-parameter logit                -13.7647    12.2874   0.175    0.8307   7.24E-92
  4-parameter probit               -9.8542     8.674     0.1864   0.8282   2.49E-92
42
Difficult Data Set #2
• Example of using the wrong model.
• Note: the mh1823 software produces numerous warnings.

[Figure: NTIAC Capabilities Data Book plot, predicted POD (%) vs actual crack length (inch), with hit/miss data. Data Set: D8001(3)L. Test object: 2219 aluminum, stringer-stiffened panels. Condition: as machined. Method: ultrasonic, hand scan, shear wave. Operators: combined, 3 operators. Opportunities = 417; detected = 330; 90% POD = 0.340 in. (8.63 mm); false calls = not documented.]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997
What’s going on here?
43
Difficult Data Set #2
• 2 parameter logit/probit
• It appears that a90 and a90/95 values exist.

[Figure: 2-parameter logit and probit fits, plotted 0 to 1 vs a (mm), 0 to 20.]
44
Difficult Data Set #2
• 4 parameter probit
• a90 and a90/95 values do not exist; the fitted upper asymptote falls below 90% POD.

[Figure: 4-parameter probit fit, plotted 0 to 1 vs a (mm), 0 to 10.]
45
Difficult Data Set #2
• Which model is correct?
• Log marginal likelihoods and Bayes factors

  Model                     logit     probit    log BF (logit/probit)   vs 2-param (logit)   vs 2-param (probit)
  2-parameter               -200.16   -201.63   1.47                    ———                  ———
  3-parameter lower bound   -203.86   -203.49   -0.37                   -3.7                 -1.86
  3-parameter upper bound   -189.30   -189.00   -0.30                   10.86                12.63
  4-parameter               -188.89   -185.12   -3.76                   11.27                16.51
46
Small Data Set
• A great example where the last procedure fails
• Small data sets do not cause any warnings with
standard software.
47
Small Data Set
• 4 parameter model
[Figure: 4-parameter model fit to the small data set, plotted 0 to 1 vs a (inches), 0 to 1.]
48
Small Data Set
• Summary for small data set

  Maus:
  Model                            Intercept   Slope     Lower    Upper    ML         a90      a90/95   BF (vs 4-param, same link)
  2-parameter logit                27.1517     12.6708   n/a      n/a      8.87E-04   0.1405   0.1829   4.51E+01
  2-parameter probit               22.6165     10.577    n/a      n/a      2.60E-03   0.1329   0.159    8.17E+02
  3-parameter lower-bound logit    24.0452     11.6731   0.0719   n/a      9.67E-06
  3-parameter lower-bound probit   24.5728     12.1283   0.0705   n/a      9.16E-06
  3-parameter upper-bound logit    27.6386     12.5028   n/a      0.7967   2.29E-04
  3-parameter upper-bound probit   22.0074     9.8899    n/a      0.7917   4.87E-05
  4-parameter logit                24.8792     11.3063   0.0706   0.7926   1.97E-05
  4-parameter probit               23.7871     10.6664   0.0711   0.7781   3.18E-06

  IMTT:
  Model                            Intercept   Slope     Lower    Upper    ML         a90      a90/95   BF (vs 4-param, same link)
  2-parameter logit                25.5467     9.8023    n/a      n/a      3.30E-03   0.0941   0.1248   1.52E+02
  2-parameter probit               19.667      7.4983    n/a      n/a      4.40E-03   0.0873   0.1139   7.99E+03
  3-parameter lower-bound logit    28.5743     11.3208   0.1273   n/a      2.07E-04
  3-parameter lower-bound probit   23.2202     9.2585    0.1391   n/a      1.58E-04
  3-parameter upper-bound logit    24.5688     9.2608    n/a      0.9055   6.34E-06
  3-parameter upper-bound probit   20.405      7.6861    n/a      0.9041   3.10E-05
  4-parameter logit                25.3354     9.9529    0.1263   0.9067   2.17E-05
  4-parameter probit               26.4209     11.153    0.1679   0.8884   5.51E-07
49
Conclusion
• It sometimes appears (and is desirable) that there is a systematic procedure that will automatically determine the best model, but this actually isn't the case.
• Bayes factors provide a useful approach to evaluate the best model.
• However, an example with a small data set showed that even the Bayes factor procedure can lead one to a wrong conclusion.
  – It doesn't tell you to stop and not perform an analysis.
  – Need to look at the data and perform 'diagnostics'.
• Bottom line – procedures don't replace statisticians.
50
Model-Assisted POD

• C-5 Wing Splice Fatigue Crack Specimens:
  – Two-layer specimens are 14" long and 2" wide.
  – 0.156" top layer, 0.100" bottom layer.
  – 90% of the fasteners were titanium, 10% were steel.
  – Fatigue cracks positioned at the 6 and 12 o'clock positions.
  – Crack lengths ranged from 0.027" – 0.169" (2nd layer).
  – Crack location varied: at both the 1st and 2nd layers.
• AFRL/UDRI acquired data (Hughes, Dukate, Martin).

[Figure: eddy current scan images of specimen A1-16C0 (crack indications of 0.110" and 0.107") and schematics of 1st-layer and 2nd-layer corner cracks with dimensions a and b.]
51
MAPOD

[Figure: measurement response (V) vs crack length (in), model vs experiment. A) 1st layer, faying surface, corner cracks: model vs exp. B) 2nd layer, faying surface, corner / through cracks: model-corner, model-through, exp.]
• Perform simulated studies: Compare with experimental results
• Bayesian methods can assist in determining best model.
52
Demonstration of Model-Assisted Probability of Detection (MAPOD)

• Experimental comparison with full model-assisted POD.

[Figure: POD vs crack length (in): experimental POD and full model-assisted POD curves for 2nd layer, faying surface, corner & through cracks; schematic of an eddy current probe over a fastener site with a corner crack.]

Successes:
• First demonstration of MAPOD in the literature for a structural problem.
• Eddy current models were able to simulate eddy current inspection of 2nd-layer fatigue cracks around fastener holes.

Knopp, Aldrin, Lindgren, and Annis, "Investigation of a model-assisted approach to probability of detection evaluation", Review of Progress in Quantitative Nondestructive Evaluation, (2007)
53
Heteroscedasticity
• â vs a analysis
  – Berens, A.P. and P.W. Hovey, "Flaw Detection Reliability Criteria, Volume I – Methods and Results," AFWAL-TR-84-4022, Air Force Wright Aeronautical Laboratories, Wright-Patterson Air Force Base, April 1984. (â vs a analysis is always more advantageous than hit/miss because much more information is available, but hit/miss is used much more in practice)
  – Berens, A.P., NDE Reliability Data Analysis, American Society for Metals Handbook Nondestructive Evaluation and Quality Control, Vol 17, pp. 689-701, ASM International, 1989. (classic reference on the subject, still standard today)
  – MIL-HDBK-1823 (1999) – (guidance for POD studies based on the methods described by Berens and Hovey)
• Box-Cox transformations
  – Kutner, Nachtsheim, Neter, and Li, "Applied Linear Statistical Models", (2005)
54
Heteroscedasticity
• â vs a assumes homoscedasticity, and if that
assumption is violated, one must resort to hit/miss
analysis. This was the case for an early MAPOD
study (Knopp et al. 2007)
• Box-Cox transformation can remedy this problem.
[Figure: signal response â (V) vs crack size (mm), a scatter plot showing non-constant variance.]
55
Heteroscedasticity
• Box-Cox transformation according to Kutner et al.
• Note: not to be used for nonlinear relations.
• Box-Cox identifies transformations from a family of power transformations.
• The form is: â' = â^λ
• Some common transformations:

  λ = 2     →  â' = â²
  λ = 0.5   →  â' = √â
  λ = 0     →  â' = logₑ(â)  (by convention)
  λ = −0.5  →  â' = 1/√â
  λ = −1.0  →  â' = 1/â
56
Heteroscedasticity
• New regression model with power transform:

  â_i^λ = β₀ + β₁ a_i + ε_i

• λ needs to be estimated. Box-Cox uses maximum likelihood.
• I use Excel's Solver to do a numerical search for potential λ values.
• Standardize the observations so that the magnitude of the error sum of squares does not depend on the value of λ:

  g_i = (1 / (λ c^(λ−1))) (â_i^λ − 1),  λ ≠ 0
  g_i = c logₑ(â_i),  λ = 0

• c is the geometric mean of the observations: c = (∏ â_i)^(1/n)
• Next step is to regress g on a for a given λ and calculate SSE.
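A minimal sketch of this search (Python standing in for the Excel Solver step; data are made up for illustration):

```python
# Scan lambda, standardize with the geometric mean as above, regress g on a,
# and keep the lambda that minimizes the error sum of squares (SSE).
import numpy as np

a = np.array([0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0])                 # crack size (mm)
ahat = np.array([0.004, 0.008, 0.012, 0.02, 0.03, 0.05, 0.07, 0.09])   # signal (V)

c = np.exp(np.mean(np.log(ahat)))   # geometric mean of the observations

def sse(lam):
    if abs(lam) < 1e-8:
        g = c * np.log(ahat)                                 # lambda = 0 case
    else:
        g = (ahat**lam - 1.0) / (lam * c**(lam - 1.0))       # standardized transform
    X = np.column_stack([np.ones_like(a), a])
    beta, *_ = np.linalg.lstsq(X, g, rcond=None)             # regress g on a
    return np.sum((g - X @ beta)**2)

lams = np.arange(-1.0, 2.001, 0.05)
print("best lambda ~", min(lams, key=sse))
```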
57
Heteroscedasticity
• The value of λ that minimizes SSE is the best
transformation.
• This procedure is only a guide, and a high level of
precision is not necessary.
• For this data set, λ = 0.45
[Figure: left, â + 0.02 vs crack size (mm), the original data; right, â transformed with λ = 0.45 vs crack size (mm).]
58
Heteroscedasticity
• Box-Cox: POD curve associated with the λ = 0.45 transform.

[Figure: probability of detection, POD | a, vs size, a (mm), 0 to 4, with a50, a90, and a90/95 marked.]
59
Heteroscedasticity
• Box Cox transformation – square root transform
[Figure: left, â transformed with λ = 0.5 vs crack size (mm); right, response, â, vs size, a (mm).]
60
Heteroscedasticity
• POD result for square-root transform
[Figure: probability of detection, POD | a, vs size, a (mm), 0 to 4, with a50, a90, and a90/95 marked.]
61
Summary
• Box-Cox enables â vs a analysis for data sets where the variance is not constant but has some relationship with the independent variable, such as crack size.

  Analysis method    λ      Left censor   Detection threshold   False calls   a90 (mm)   a90/95 (mm)   a90 to a90/95 % difference
  1st order linear   0.45   0.13          0.23                  0             2.176      2.327         6.9%
  1st order linear   0.5    0.14          0.195                 1             2.102      2.257         7.3%
  1st order linear   0.5    0.195         0.195                 1             2.269      2.53          11.5%
  2nd order linear   0.5    0.14          0.195                 1             2.277      2.472         8.5%
  2nd order linear   0.5    0.195         0.195                 1             2.197      2.428         10.5%
  hit/miss           1      n/a           0.187                 1             1.72       2.04          18.6%
  hit/miss           1      n/a           0.162                 11            1.498      1.907         27.3%
62
Physics-Inspired Models
• MAPOD – idea is to use simulation to reduce time and
cost of POD studies.
• Properly integrating simulation and experiment is an
enormous task.
• Intermediate step is to use models to inspire the
functional form of the regression model.
63
Physics-Inspired Models - literature
• R.B. Thompson and W.Q. Meeker, "Assessing the POD of Hard-Alpha Inclusions from Field Data", Review of Progress in QNDE, Vol. 26, AIP, pp. 1759-1766, (2007). (Example where kink regression is used to distinguish between Rayleigh scattering at small flaw sizes and regular scattering at larger sizes)
Figure from http://www.tc.faa.gov/its/worldpac/techrpt/ar0763.pdf
64
Physics-Inspired Models
• Simulation and experiment.
• Visual inspection reveals that a 2nd-order linear model may fit the data better than the standard â vs a analysis.
• Evidence beyond the visual: the p-value for the a² term is 0.001, and the adjusted R-square value increases slightly with inclusion of a².

[Figure: experiment and simulation responses vs crack size, 0 to 5 mm, with a quadratic model fit to the transformed data.]
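A minimal sketch of that check (Python with statsmodels; the data are simulated here purely to show the mechanics):

```python
# Compare 1st-order and 2nd-order linear models for ahat vs a and inspect the
# p-value of the a^2 coefficient and the adjusted R-square, as described above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
a = np.linspace(0.2, 5.0, 30)
ahat = 0.01 + 0.05 * a + 0.004 * a**2 + rng.normal(scale=0.01, size=a.size)

fit1 = sm.OLS(ahat, sm.add_constant(a)).fit()                             # 1st order
fit2 = sm.OLS(ahat, sm.add_constant(np.column_stack([a, a**2]))).fit()    # 2nd order

print("p-value of a^2 term:", fit2.pvalues[2])
print("adj R^2: 1st order =", fit1.rsquared_adj, " 2nd order =", fit2.rsquared_adj)
```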
65
Physics-Inspired Models – parallel work
• Recent unpublished work by Li, Nakagawa, Larson,
and Meeker.
http://www.stat.iastate.edu/preprint/articles/2011-05.pdf
66
Summary
• Physics model hopefully provides functional form of
the response, and this knowledge can be used in the
initial DOE for a POD study.
• Physics-Inspired model concept is a first step in
using physics models for making inference on
reliability.
• Confidence bounds calculation on models more
complicated than â vs a is an open problem,
especially transforming to probability of detection
curve.
67
Bootstrap Methods
• Confidence bound calculations are complicated and
only available for hit/miss and â vs a analysis.
• More complicated models require new method.
• Bootstrap methods are simple and flexible enough to
provide confidence bounds for a wide variety of
models.
68
Bootstrap Methods - literature
• Efron, B., and Tibshirani, R. J., An Introduction to the Bootstrap,
Chapman & Hall, New York, NY, 1993.
• C.C. McCulloch and J. Murphy, “Local Regression Modeling for
Accurate Analysis of Probability of Detection Data”, Mat. Eval.,
Vol. 60, no. 12, pp. 1438-1143, (2002) (A rare example of
bootstrapping used in NDE context)
• Amarchinta, Tarpey, and Grandhi, “Probabilistic Confidence
Bound Framework for Residual Stress Field Predictions”, 12th
AIAA Non-Deterministic Approaches Conference, AIAA-2010-
2519, Orlando, FL, (2010).
69
Bootstrap Methods
• Bootstrap procedure is simply to sample with
replacement and generate a POD curve each time.
• Sort all of the a90 values in ascending order and look
at the value in the 95th percentile to determine a90/95
• Example for the previous transformed data set with λ
= 0.5
  Method              a90        a90/95
  Wald method         2.102 mm   2.257 mm
  Bootstrap 1,000     2.096 mm   2.281 mm
  Bootstrap 10,000    2.099 mm   2.299 mm
  Bootstrap 100,000   2.099 mm   2.297 mm
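A minimal sketch of the resampling loop (Python; made-up data and threshold, no censoring):

```python
# Resample (a, ahat) pairs with replacement, refit the a-hat regression, recompute
# a90 each time, and take the 95th percentile of the a90 draws as the a90/95 bound.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
a = np.array([0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0])
ahat = np.array([0.02, 0.03, 0.05, 0.07, 0.10, 0.12, 0.15, 0.19, 0.24])
ahat_threshold = 0.04

def a90_from(idx):
    x, y = np.log(a[idx]), np.log(ahat[idx])
    X = np.column_stack([np.ones_like(x), x])
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma_eps = np.sqrt(np.sum((y - X @ np.array([b0, b1]))**2) / (len(x) - 2))
    mu, sigma = (np.log(ahat_threshold) - b0) / b1, sigma_eps / b1
    return np.exp(mu + norm.ppf(0.9) * sigma)

n = len(a)
a90s = np.array([a90_from(rng.integers(0, n, size=n)) for _ in range(1000)])
print("a90 =", np.median(a90s), " a90/95 =", np.percentile(a90s, 95))
```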
70
Summary
• Bootstrapping is beautiful.
• 1,000 samples probably sufficient, but 100,000 isn’t that difficult.
• Some interesting formal work could be done to look at the
influence of censoring, which is probably beyond the scope of
this work.
• Results seem to indicate the 2nd order model (which I think is the
best) is the most conservative.
• Further investigation of censoring planned.

  Analysis method    λ      Left censor   Detection threshold   False calls   a90 (mm)   a90/95 (mm)   a90 to a90/95 % difference
  1st order linear   0.45   0.13          0.23                  0             2.176      2.327         6.9%
  1st order linear   0.5    0.14          0.195                 1             2.102      2.257         7.3%
  1st order linear   0.5    0.195         0.195                 1             2.269      2.53          11.5%
  2nd order linear   0.5    0.14          0.195                 1             2.277      2.472         8.5%
  2nd order linear   0.5    0.195         0.195                 1             2.197      2.428         10.5%
  hit/miss           1      n/a           0.187                 1             1.72       2.04          18.6%
  hit/miss           1      n/a           0.162                 11            1.498      1.907         27.3%
71
Summary
• Hit/miss analysis – MCMC
• â vs a analysis – unchanged
• Higher order / complex models – bootstrapping
• Methods presented for putting confidence bounds on POD curves are not elegant by any stretch of the imagination, but they are incredibly robust and useful.
• Much work needs to be done via simulation to move these methods into practice.
• UQ – Progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.
72
Efficient Uncertainty Propagation
• Deterministic simulations are very time consuming.
• NDE problems require stochastic simulation if the
models are to truly impact analysis of inspections.
• Need modern uncertainty quantification methods to
address this problem.
[Figure: stochastic eddy current NDE model, with uncertain inputs ξ₁, ξ₂, …, ξₙ mapped to an uncertain output Z̃.]
73
Efficient Uncertainty Propagation

Uncertainty propagation methods:
• Monte Carlo
• Latin hypercube (sampling methods)
• FORM/SORM
• Full factorial numerical integration
• Univariate dimension reduction
• Karhunen–Loève expansion / ANOVA (high-dimension problems)
• Polynomial chaos expansion (intrusive)
• Probabilistic collocation method (non-intrusive)
74
Uncertainty Propagation
• Motivation: Model evaluations are computationally expensive.
There is a need for more efficient methods than Monte Carlo
• Objective: Efficiently propagate uncertain inputs through “black
box” models and predict output probability density functions.
(Non-intrusive approach)
• Approach: Surrogate models based on Polynomial Chaos
Expansions meet this need.
[Figure: a deterministic eddy current NDE model with uncertain inputs (X₁ ~ Uniform, …, Xₙ ~ Normal) is wrapped as a stochastic model producing an uncertain output Z̃.]

• Input parameters with variation:
  – Probe dimensions (liftoff / tilt)
  – Flaw characteristics (depth, length, shape)
75
Uncertainty Propagation
Uncertainty propagation for parametric NDE characterization problems:
• The Probabilistic Collocation Method (PCM) approximates the model response with a polynomial function of the uncertain parameters:

  Ẑ(x) = Σ_{i=1}^{N} c_i f_i(ξ(x))

• This reduced-form model can then be used with traditional uncertainty analysis approaches, such as Monte Carlo.

Extensions of generalized polynomial chaos (gPC) to high-dimensional (2D, 3D) damage characterization problems:
• Karhunen–Loève expansion
• Analysis of variance (ANOVA)
• Smolyak sparse grids

[Figure: spectrum of problem dimensionality N, from critical flaw size (N = 1), to key damage and measurement states such as crack length and probe liftoff (N > 1), to parameterized flaw localization and sizing (N >> 1), to full 3D damage and material state characterization.]
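A minimal sketch of a non-intrusive PCM/gPC surrogate in one standard-normal parameter ξ (Python; black_box_model is a stand-in for an expensive eddy current solve):

```python
# Evaluate the model at Gauss-Hermite collocation points, fit probabilists' Hermite
# polynomial coefficients c_i by least squares, then Monte Carlo the cheap surrogate.
import numpy as np
from numpy.polynomial.hermite_e import hermevander, hermegauss

def black_box_model(xi):
    return 1.0 + 0.3 * xi + 0.05 * xi**2   # stand-in for the expensive solver

degree = 3
nodes, _ = hermegauss(degree + 1)          # collocation points
Z = black_box_model(nodes)                 # the only "expensive" evaluations

V = hermevander(nodes, degree)             # Hermite polynomial design matrix
c, *_ = np.linalg.lstsq(V, Z, rcond=None)  # surrogate coefficients c_i

xi_samples = np.random.default_rng(3).standard_normal(100_000)
Z_surrogate = hermevander(xi_samples, degree) @ c
print("output mean:", Z_surrogate.mean(), " std:", Z_surrogate.std())
```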
76
Uncertainty Propagation and High Dimensional Model Representation

Approach (1): Karhunen–Loève Expansion
• Address stochastic input variable reduction when the number of random variables (N) is large.
• Apply the Karhunen–Loève expansion to map the random variables ξ₁, …, ξ_N into a lower-dimensional random space (N'):

  σ(x) = σ̄(x) + Σ_{n=1}^{N'} √λₙ φₙ(x) ξₙ

Eddy current example:
• A correlation function (covariance model C(x, x')) defines the random conductivity map over the crystallites (grains), σ = 2.2×10⁶ S/m, with a coil above the surface.
• Set the choice of grid length to
  – achieve model convergence, and
  – eliminate insignificant eigenvalues
  for the reduced-order conductivity map (N random variables → N' random variables).
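A minimal sketch of the discrete version of this reduction (Python, assuming an exponential covariance model; all numbers except the 2.2×10⁶ S/m conductivity are illustrative):

```python
# Eigendecompose the covariance matrix, keep the N' largest eigenvalues, and
# synthesize a reduced-order random conductivity map.
import numpy as np

n_pts, corr_len = 200, 0.1
x = np.linspace(0.0, 1.0, n_pts)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)   # covariance model C(x, x')

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                         # largest eigenvalues first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep enough modes for 95% of the variance (drops insignificant eigenvalues)
n_keep = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.95)) + 1
print("reduced from N =", n_pts, "to N' =", n_keep, "random variables")

rng = np.random.default_rng(4)
xi = rng.standard_normal(n_keep)
fluct = eigvecs[:, :n_keep] @ (np.sqrt(eigvals[:n_keep]) * xi)
sigma_map = 2.2e6 * (1.0 + 0.05 * fluct)                  # S/m; 5% variation assumed
```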
77
Uncertainty Propagation and High Dimensional Model Representation

Approach (2): Analysis of Variance (ANOVA) Expansion
• Provides a surrogate to represent a high-dimensional set of parameters.
• Analogous to ANOVA decomposition in statistics.
• Locally represent the model output Z(ξ) through an expansion at an anchor point in ξ-space.
  – Requires an inverse problem.
  – Replace the random surface with an equivalent 'homogeneous' surface.

[Figure: workflow, with N >> N' >> M: covariance model C(x, x') → conductivity map with N random variables → Karhunen–Loève expansion → reduced-order map with N' random variables → (1) identify unique sources of variance, (2) identify significant factors in the model → M random variables.]
78
Uncertainty Propagation and High Dimensional Model Representation

Approach (2): ANOVA Expansion + Smolyak Sparse Grids
• Significant computational expense for high-dimensional integrals.
• Can leverage sparse grids based on the Smolyak construction [Smolyak, 1963; Xiu, 2010; Gao and Hesthaven, 2010].
  – Provides weighted solutions at specific nodes and adds them to reduce the number of necessary solutions.
  – Sparse grid collocation provides a subset of the full tensor grid for higher-dimensional problems.
  – Approach can also be applied to gPC/PCM.

[Figure: sparse grid and full tensor product grid on [−1, 1]².]
79
All Models Are Wrong
• "All models are wrong, and to suppose that inputs should always be set to their 'true' values when these are 'known' is to invest the model with too much credibility in practice. Treating a model more pragmatically, as having inputs that we can 'tweak' empirically, can increase its value and predictive power" (Kennedy and O'Hagan, 2002)
• Eddy current liftoff is a particularly great example of this.
80
Bayesian Analysis
• Bayesian Model Averaging (BMA) – Used when experts provide competing models for the same system.
• Bayesian calibration is the most promising technical option for integrating experimental data and simulation in a rigorous way that accounts for all sources of uncertainty.
• The Kennedy / O'Hagan 2001 paper inspired many efforts in this direction. Incidentally, it was rejected by the Journal of the American Statistical Association; it is now published in the Journal of the Royal Statistical Society: Series B and has been referenced 620 times. Add that to the references to the unpublished technical report that JASA rejected, and you get a large number.
• Many efforts ongoing in UQ community
81
Bayesian Calibration
• What uncertainty needs to be quantified to go from the simulator to reality?
  – Input
  – Propagation from input to output (hopefully done in the previous section, but notice no uncertainty is actually quantified in this part)
  – Code
  – Discrepancy
82
Bayesian Calibration
• Terminology
  – Model: set of equations that describes some real-world phenomena.
  – Simulator: executes the model with computer code.
  – Calibration parameters: θ
  – Controlled input variables: x
83
Bayesian Calibration
• Simulator: y = f(x, θ)
• Observations: observations = reality(control variables) + ε, where ε is observation error.
• Reality doesn't depend on calibration parameters.
• Typically you see: observations = f(x, θ) + ε
• This is wrong, mainly because it doesn't account for uncertainty in θ, and the ε's are not independent.
• Bayesian methods are used to learn about uncertainty in θ.*

* Paraphrasing discussion with Tony O'Hagan
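For reference, a sketch of the Kennedy and O'Hagan formulation that addresses this; the discrepancy term δ corresponds to the "Discrepancy" item two slides back, and the Gaussian process prior on δ is their modeling choice, not something spelled out on this slide:

```latex
% Kennedy & O'Hagan (2001) calibration model
y_i = f(x_i, \theta) + \delta(x_i) + \varepsilon_i,
\qquad \varepsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)
% reality(x) = f(x, \theta) + \delta(x); the model discrepancy \delta(\cdot) is given
% a Gaussian process prior, which restores independence of the errors \varepsilon_i.
```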
84
Bayesian Calibration - literature
• Kennedy, M. C. and O’Hagan, A., “Bayesian calibration of computer models,” J. R. Statist. Soc. B, Vol. 63,
pp. 425–464, (2001).
• Park, I., Amarachinta, H. K., and Grandhi, R. V., “A Bayesian approach to quantification of model
uncertainty,” Reliability Engineering and System Safety, Vol 95, pp. 777-785, (2010)
85
Summary
• Hit/miss analysis – MCMC
• â vs a analysis – unchanged
• Higher order / complex models – bootstrapping
• Methods presented for putting confidence bounds on POD curves are not elegant by any stretch of the imagination, but they are incredibly robust and useful.
• Much work needs to be done via simulation to move these methods into practice.
• UQ – Progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.