
Some issues in Bayesian industrial statistics

Fabrizio Ruggeri

Istituto di Matematica Applicata e Tecnologie Informatiche
Consiglio Nazionale delle Ricerche
Via Bassini 15, I-20133, Milano, Italia
fabrizio@mi.imati.cnr.it

www.mi.imati.cnr.it/~fabrizio/

1

OUTLINE OF THE TALK

• Bayesian statistics

• Preventive maintenance in dry etching

• Possible models

– Wavelets

– Markov chain

– Poisson process

• Their application in industrial problems

2

BAYESIAN STATISTICS

Bayesian statistics is . . .

• . . . another way to make inference and forecast on population features (practitioner's view)

• . . . a way to learn from experience and improve one's own knowledge (educated layman's view)

• . . . a formal tool to combine prior knowledge and experiments (mathematician's view)

• . . . cheating (hardcore frequentist statistician's view)

• . . .

3

BAYES THEOREM - EXAMPLE 1
Examples partly based on Suhir (2000), International Journal of Microcircuits and Electronic Packaging, 23, 215-223

• The estimated probability that the thermally induced bow of a Printed Circuit Board (PCB), manufactured at a given factory, meets the specification requirement is known

• A series of control tests was carried out to determine whether the given batch of PCBs meets the specification. However, due to the test equipment employed, the test results are not absolutely certain, but the probabilities that a PCB passes the test have been established both for a PCB meeting and for one not meeting the specification

• The given PCB passed the control tests

• What is the probability that the PCB meets the specification?

4

BAYES THEOREM - EXAMPLE 1

• H1 = {the given PCB meets the specification} : P (H1) = 0.96

• H2 = {the given PCB does not meet the specification} : P (H2) = 0.04

• A = {the given PCB passes the control tests}

– If H1 is fulfilled ⇒ P(A|H1) = 0.98

– If H2 is fulfilled ⇒ P(A|H2) = 0.05

• A occurs ⇒ P(H1|A)?

5

BAYES THEOREM - EXAMPLE 1

P(H1|A) = P(H1 ∩ A) / P(A) = P(A|H1)P(H1) / [P(A|H1)P(H1) + P(A|H2)P(H2)] = (.98 · .96) / (.98 · .96 + .05 · .04) = .998

Passing the control test updates the belief that the PCB meets the specification from 96% to 99.8%

Prior opinion updated, through the experiment, into a posterior one (a numerical sketch follows)
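A minimal sketch of this update in plain Python, using only the probabilities stated on the previous slides:

    # Bayes theorem for Example 1 (values taken from the slides)
    p_h1 = 0.96          # prior: PCB meets the specification
    p_h2 = 0.04          # prior: PCB does not meet the specification
    p_a_h1 = 0.98        # P(pass test | meets spec)
    p_a_h2 = 0.05        # P(pass test | does not meet spec)

    p_a = p_a_h1 * p_h1 + p_a_h2 * p_h2   # total probability of passing the test
    p_h1_a = p_a_h1 * p_h1 / p_a          # Bayes theorem
    print(round(p_h1_a, 3))               # 0.998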

6

BAYES THEOREM - EXAMPLE 2

• A bi-material assembly is subject to temperature change and experiences thermally induced loading

• It has been predicted (for instance, on the basis of accelerated testing) that the probabilities that the constituent materials, M1 and M2, will not fail during the lifetime of the assembly are p1 and p2, respectively

• The field report indicated that the assembly failed

• The details were not reported, however, and are not available

• What is the probability that it was the material M1 that failed?

7

BAYES THEOREM - EXAMPLE 2

• 4 possible hypotheses

– H0: none of the materials will fail

– H1: only the first material will fail

– H2: only the second material will fail

– H3: both materials will fail

• their probabilities (assuming independence between failure of M1 and failure of M2)

– P0 = p1p2

– P1 = (1 − p1)p2

– P2 = p1(1 − p2)

– P3 = (1 − p1)(1 − p2)

• Event A = {assembly fails} occurs

• Interest in P (H1|A)

8

BAYES THEOREM - EXAMPLE 2

• P (A|H0) = 0, P (A|H1) = P (A|H2) = P (A|H3) = 1

• Bayes theorem

P(H1|A) = P(A|H1)P(H1) / ∑_{j=0}^{3} P(A|Hj)P(Hj) = (1 − p1)p2 / [(1 − p1)p2 + p1(1 − p2) + (1 − p1)(1 − p2)] = (1 − p1)p2 / (1 − p1p2)

• p1 = 0.99, p2 = 0.495 (i.e. p1 = 2p2) ⇒ P(H1|A) = 0.00971

• p1 = p2 = 0.99 ⇒ P(H1|A) = 0.4975

• (very reliable) p1 = p2 ≈ 1 ⇒ P(H1|A) ≈ 0.5

• (very unreliable) p1 = p2 ≈ 0 ⇒ P(H1|A) ≈ 0

• Results heavily affected by prior opinions

• Importance of reliable experts' assessments (a numerical sketch of the prior sensitivity follows)
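A minimal sketch of the sensitivity to the prior assessments, using the closed form P(H1|A) = (1 − p1)p2/(1 − p1p2) derived above; the near-0 and near-1 values below are illustrative stand-ins for the limiting cases:

    # P(only M1 failed | assembly failed) for several prior assessments
    def p_h1_given_failure(p1, p2):
        return (1 - p1) * p2 / (1 - p1 * p2)

    cases = [(0.99, 0.495),          # p1 = 2 p2 = 0.99  -> 0.0097
             (0.99, 0.99),           # p1 = p2 = 0.99    -> 0.4975
             (0.999999, 0.999999),   # very reliable     -> about 0.5
             (0.001, 0.001)]         # very unreliable   -> about 0
    for p1, p2 in cases:
        print(p1, p2, round(p_h1_given_failure(p1, p2), 4))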

9

ASSESSMENT OF PRIOR PROBABILITIES

S1 = {M1 will fail during the lifetime of the assembly}
S2 = {M2 will fail during the lifetime of the assembly}

P(S1 ∪ S2) = .2, P(S1) = .3, P(S2) = .05, P(S1 ∩ S2) = .1

10

ASSESSMENT OF PRIOR PROBABILITIES

S1 = {M1 will fail during the lifetime of the assembly}
S2 = {M2 will fail during the lifetime of the assembly}

P(S1 ∪ S2) = .2, P(S1) = .3, P(S2) = .05, P(S1 ∩ S2) = .1

• P(S1 ∪ S2) ≥ P(S1) must hold, but .2 < .3

• P(S2) ≥ P(S1 ∩ S2) must hold, but .05 < .1

11

ASSESSMENT OF PRIOR PROBABILITIES

S1 = {M1 will fail during the lifetime of the assembly}
S2 = {M2 will fail during the lifetime of the assembly}

P(S1 ∪ S2) = .3, P(S1) = .2, P(S2) = .2, P(S1 ∩ S2) = .15

12

ASSESSMENT OF PRIOR PROBABILITIES

S1 = {M1 will fail during the lifetime of the assembly}
S2 = {M2 will fail during the lifetime of the assembly}

P(S1 ∪ S2) = .3, P(S1) = .2, P(S2) = .2, P(S1 ∩ S2) = .15

• .3 = P(S1 ∪ S2) ≠ P(S1) + P(S2) − P(S1 ∩ S2) = .25: the assessment is incoherent

• A coherent alternative: P(S1 ∪ S2) = .3, P(S1) = .2, P(S2) = .2, P(S1 ∩ S2) = .1

• Prior elicitation must follow the rules of probability (a small coherence check is sketched below)
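A minimal sketch of such a coherence check in Python; the function name and tolerance are illustrative:

    # Check the probability rules used on the previous slides
    def coherent(p_union, p_s1, p_s2, p_inter, tol=1e-9):
        ok_monotone = p_union >= max(p_s1, p_s2) and p_inter <= min(p_s1, p_s2)
        ok_additive = abs(p_union - (p_s1 + p_s2 - p_inter)) < tol
        return ok_monotone and ok_additive

    print(coherent(0.2, 0.3, 0.05, 0.1))   # False: violates monotonicity
    print(coherent(0.3, 0.2, 0.2, 0.15))   # False: 0.2 + 0.2 - 0.15 = 0.25, not 0.3
    print(coherent(0.3, 0.2, 0.2, 0.1))    # True: the corrected assessment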

13

EXAMPLE 2 MODIFIED

• In Suhir's paper, p1 and p2 (the probabilities that the constituent materials, M1 and M2, will not fail during the lifetime) were assumed known

• Here we suppose them unknown and we want to learn about them through

– experiment

– experts’ opinions

• We assume only two possible events over the useful life of the bi-material assembly

– Failure ⇒ X = 1 s.t. P(X = 1) = θ

– No failure ⇒ X = 0 s.t. P(X = 0) = 1 − θ

• We observe the behaviour of n items

14

BASICS ON FREQUENTIST STATISTICS

Failure of assembly over useful life
⇒ X ∼ Bern(θ) with f(x; θ) = θ^x (1 − θ)^{1−x}, x = 0, 1 and 0 ≤ θ ≤ 1

• Sample X = (X1, . . . , Xn), i.i.d. Bern(θ)

• Likelihood lx(θ) = ∏_{i=1}^{n} f(Xi; θ) = θ^{∑ Xi} (1 − θ)^{n − ∑ Xi}

• MLE: θ̂ = ∑_{i=1}^{n} Xi / n; C.I., UMVUE, consistency, etc.

What about available prior information on the assembly behaviour?
How can we translate it? ⇒ model and parameter

15

BAYESIAN UPDATE

Parameter as a random quantity

• Its physical meaning: θ = E(X) or θ = P(X = 1) (failure probability)

• Assessment of prior beliefs on it, e.g. its most likely value (mode)

• Search of a suitable prior distribution π(θ): θ ∼ Be(α, β)

• Update prior opinion (Bayes Theorem): π(θ|X) = lx(θ)π(θ) / ∫ lx(u)π(u) du

⇒ θ|X ∼ Be(α + ∑_{i=1}^{n} Xi, β + n − ∑_{i=1}^{n} Xi)

• Statistical analysis through posterior distribution π(θ|X)
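A minimal sketch of the conjugate Beta-Bernoulli update in Python; the prior Be(2, 8), the simulated data and the numpy/scipy usage are illustrative assumptions, not taken from the talk:

    import numpy as np
    from scipy import stats

    alpha, beta = 2.0, 8.0                      # hypothetical prior Be(alpha, beta)
    rng = np.random.default_rng(1)
    x = rng.binomial(1, 0.15, size=50)          # hypothetical failure indicators X_1..X_n

    alpha_post = alpha + x.sum()                # Be(alpha + sum X_i, beta + n - sum X_i)
    beta_post = beta + len(x) - x.sum()
    posterior = stats.beta(alpha_post, beta_post)
    print(posterior.mean(), posterior.std())    # summaries of pi(theta | X)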

16

BASICS ON BAYESIAN STATISTICS

The posterior distribution π(θ|X) is all we need to

• estimate the parameter θ (in general, with the posterior mean)

• build credible intervals

• test hypotheses

• forecast future observations

17

PARAMETER ESTIMATION

• Loss function L(θ, a), a ∈ A action space

• Minimise Eπ(θ|X)L(θ, a) = ∫ L(θ, a) π(θ|X) dθ w.r.t. a

⇒ θ̂ Bayesian optimal estimator of θ

– θ̂ posterior median if L(θ, a) = |θ − a|

– θ̂ posterior mean Eπ(θ|X)θ if L(θ, a) = (θ − a)2

Eπ(θ|X)L(θ, a) = ∫ (θ − a)² π(θ|X) dθ = ∫ θ² π(θ|X) dθ − 2a ∫ θ π(θ|X) dθ + a² = ∫ θ² π(θ|X) dθ − 2a Eπ(θ|X)θ + a²

Minimising this quadratic in a (setting the derivative to zero) gives a = Eπ(θ|X)θ, the posterior mean

18

PARAMETER ESTIMATION

• Failure probability: posterior mean θ̂ = (α + ∑_{i=1}^{n} Xi) / (α + β + n) ⇒ compare with

– prior mean α / (α + β)

– MLE ∑_{i=1}^{n} Xi / n

• Posterior belief as mixture of prior belief and experimental evidence

• MAP (Maximum a posteriori) ⇒ θ̂ = (α + ∑_{i=1}^{n} Xi − 1) / (α + β + n − 2)

• Different loss functions lead to different estimators (a numerical comparison is sketched below)
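A minimal sketch comparing the estimators on this slide; the hyperparameters and the data summary (n = 50 observations with s = 7 failures) are illustrative:

    # Prior mean, MLE, posterior mean and MAP for the Beta-Bernoulli model
    alpha, beta = 2.0, 8.0
    n, s = 50, 7                                   # n observations, s = sum of X_i

    prior_mean = alpha / (alpha + beta)
    mle = s / n
    post_mean = (alpha + s) / (alpha + beta + n)
    map_est = (alpha + s - 1) / (alpha + beta + n - 2)
    print(prior_mean, mle, post_mean, map_est)

    # Posterior mean as a mixture of prior mean and MLE
    w = (alpha + beta) / (alpha + beta + n)
    assert abs(post_mean - (w * prior_mean + (1 - w) * mle)) < 1e-12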

19

PARAMETER ESTIMATION

• Prior Be(sα, sβ)

– Prior mean: α / (α + β) for all s > 0

– Prior variance: V = αβ / [(α + β)² (sα + sβ + 1)]

∗ decreasing in s: lim_{s→∞} V = 0 and lim_{s→0} V = αβ / (α + β)²

• Posterior mean θ̂ = (sα + ∑_{i=1}^{n} Xi) / (sα + sβ + n)

– lim_{s→∞} θ̂ = α / (α + β) (prior mean)

– lim_{s→0} θ̂ = ∑_{i=1}^{n} Xi / n (MLE)

• Influence of prior belief

20

CREDIBLE INTERVALS

• P(θ ∈ A|X), credible (and Highest Posterior Density) intervals

• Compare with confidence intervals

• Failure probability:

P(θ ≤ z|X) = ∫_0^z [1 / B(α + ∑ Xi, β + n − ∑ Xi)] θ^{α + ∑ Xi − 1} (1 − θ)^{β + n − ∑ Xi − 1} dθ
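A minimal sketch of an equal-tailed 95% credible interval for the failure probability (not an HPD interval), computed from the Beta posterior with scipy; the prior and data are illustrative:

    from scipy import stats

    alpha, beta, n, s = 2.0, 8.0, 50, 7            # hypothetical prior and data summary
    posterior = stats.beta(alpha + s, beta + n - s)

    lower, upper = posterior.ppf([0.025, 0.975])   # equal-tailed 95% credible interval
    print(lower, upper)
    print(posterior.cdf(0.3))                      # P(theta <= 0.3 | X)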

21

HYPOTHESIS TESTING

• One sided test: H0 : θ ≤ θ0 vs. H1 : θ > θ0

⇒ Reject H0 iff P(θ ≤ θ0|X) ≤ α, α significance level

• Two sided test: H0 : θ = θ0 vs. H1 : θ ≠ θ0

– Do not reject if θ0 ∈ A, A 100(1 − α)% credible interval

– Consider P([θ0 − ε, θ0 + ε]|X)

– Dirac measure: P(θ0) > 0 and consider P(θ0|X)
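A minimal sketch of these tests with an illustrative θ0, significance level and ε; the prior and data are again hypothetical:

    from scipy import stats

    alpha, beta, n, s = 2.0, 8.0, 50, 7
    theta0, level, eps = 0.25, 0.05, 0.01
    posterior = stats.beta(alpha + s, beta + n - s)

    # One-sided: reject H0 : theta <= theta0 if its posterior probability is small
    p_h0 = posterior.cdf(theta0)
    print(p_h0, "reject H0" if p_h0 <= level else "do not reject H0")

    # Two-sided: posterior probability of a small interval around theta0
    print(posterior.cdf(theta0 + eps) - posterior.cdf(theta0 - eps))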

22

PREDICTION

• Prediction: P(Xn+1|X) = ∫ P(Xn+1|θ) π(θ|X) dθ

• Failure probability: Xn+1|θ ∼ Bern(θ), θ|X ∼ Be(α + ∑ Xi, β + n − ∑ Xi)

• fXn+1(1|X) = (α + ∑ Xi) / (α + β + n)

• fXn+1(0|X) = (β + n − ∑ Xi) / (α + β + n)
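A minimal sketch of the posterior predictive probabilities for the next observation, with illustrative prior and data:

    # Posterior predictive for X_{n+1} in the Beta-Bernoulli model
    alpha, beta, n, s = 2.0, 8.0, 50, 7                   # hypothetical prior and data
    p_next_fail = (alpha + s) / (alpha + beta + n)        # f_{X_{n+1}}(1 | X)
    p_next_ok = (beta + n - s) / (alpha + beta + n)       # f_{X_{n+1}}(0 | X)
    print(p_next_fail, p_next_ok, p_next_fail + p_next_ok)   # sums to 1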

23

MODEL SELECTION

Compare M1 = {f1(x|θ1), π(θ1)} and M2 = {f2(x|θ2), π(θ2)}

• Bayes factor

⇒ BF = ∫ f1(x|θ1) π(θ1) dθ1 / ∫ f2(x|θ2) π(θ2) dθ2

• Posterior odds

⇒ P(M1|data) / P(M2|data) = [P(data|M1) / P(data|M2)] · [P(M1) / P(M2)] = BF · P(M1) / P(M2)
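A minimal sketch of a Bayes factor for Bernoulli data when the two models share the likelihood and differ only in the Beta prior; the marginal likelihood is then B(α + s, β + n − s)/B(α, β), and the priors Be(2, 8) vs. Be(1, 1), the data and the equal prior model weights are illustrative:

    import numpy as np
    from scipy.special import betaln

    n, s = 50, 7                                  # n trials, s failures (hypothetical)

    def log_marginal(alpha, beta):
        # log of the integral of theta^s (1-theta)^(n-s) against the Be(alpha, beta) prior
        return betaln(alpha + s, beta + n - s) - betaln(alpha, beta)

    log_bf = log_marginal(2.0, 8.0) - log_marginal(1.0, 1.0)   # M1: Be(2,8), M2: Be(1,1)
    bf = np.exp(log_bf)
    prior_odds = 1.0                              # assumed P(M1)/P(M2) = 1
    print(bf, bf * prior_odds)                    # Bayes factor and posterior odds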

24

PRIOR CHOICE

• θ ∼ Be(α, β) ⇒ θ|X ∼ Be(α + ∑_{i=1}^{n} Xi, β + n − ∑_{i=1}^{n} Xi) (conjugate prior)

• α and β chosen to match, e.g.

– mean α / (α + β) and variance αβ / [(α + β)² (α + β + 1)] (or the mode); a sketch of this matching is given after this list

– an ideal (hypothetical) prior experiment, with α = number of failures and β = number of non-failures

– two quantiles, with a third given for consistency

Does the Beta distribution reflect actual prior knowledge?

What about other priors (e.g. triangular)?

• Markov Chain Monte Carlo

• Bayesian robustness
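A minimal sketch of choosing α and β to match an elicited prior mean and variance; the numerical values m = 0.2 and v = 0.01 are illustrative:

    # Solve for (alpha, beta) from an elicited Beta prior mean m and variance v
    # (requires v < m*(1-m); here v = m(1-m)/(alpha+beta+1))
    m, v = 0.2, 0.01
    s = m * (1 - m) / v - 1        # alpha + beta
    alpha, beta = m * s, (1 - m) * s
    print(alpha, beta)             # Be(3.0, 12.0) for these elicited values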

25

PREDICTIVE MAINTENANCE IN DRY ETCHING
Work in progress with Borgoni, Deldossi, Radaelli and Zappa

• The electrostatic chuck (ESC) is a device that helps cool silicon wafers during the chip manufacturing process: it electrostatically clamps onto a silicon wafer, allowing the temperature to be controlled, at similar values, by the addition of helium gas between the ESC ceramic and the wafer backside

• The ESC is degraded over time by the action of the aggressive plasmas produced inside the etch chamber through the application of radiofrequency power, which consume not only the materials on the wafer but also the parts of the chamber exposed to the plasma

• When certain critical locations of the ESC have been eroded by the plasma, the ESC fails to hold the wafer close to the ESC ceramic, an unusually high flow rate of helium is observed because of the large wafer-ESC gap and, finally, the ESC must be replaced because the supplied helium no longer controls the wafer surface temperature as desired

• Goal: to predict when the end of the ESC lifetime will be reached and when the flow rates will likely exceed the alarm threshold

26

PREDICTIVE MAINTENANCE IN DRY ETCHING

(Modified) Helium flow rate vs Radiofrequency hours

27

PREDICTIVE MAINTENANCE IN DRY ETCHING

Possible approaches

• Signal smoothing

• Markov model

• Poisson process

28

SIGNAL SMOOTHING
Work with Cutillo, Jung and Vidakovic

AFM (atomic force microscopy) measures adhesion strength between two materials at the nanonewton scale

[Plots] Original AFM measurements at Georgia Tech (top), and denoised signals using Bayesian methods for wavelets (middle and bottom)

29

SMOOTHING - WAVELETS
Equispaced points x1, . . . , xN, with N = 2^n

yi = f(xi) + ηi, i = 1, . . . , N

• yi: noisy measurement

• f : unknown signal

• ηi: random noise, i.i.d. N (0, σ2)

Apply (discrete) wavelet transform W

⇒ di = θi + εi, i = 1, . . . , N

[y → d = Wy, f → θ = Wf, η → ε = Wη]

Estimate θi

⇒ θ̂ and f̂ = W−1θ̂

30

SMOOTHING - BAYESIAN MODELING

Model for d = θ + ε

d|θ, σ2 ∼ N (θ, σ2)

Many choices of priors on (θ, σ2)

• σ² ∼ E(µ) ⇒ d|θ ∼ DE(θ, 1/√(2µ))

• θ ∼ DE(0, τ)

• δ(d) = [τ(τ² − 1/(2µ)) d e^{−|d|/τ} + τ² (e^{−|d|√(2µ)} − e^{−|d|/τ})/µ] / [(τ² − 1/(2µ))(τ e^{−|d|/τ} − (1/√(2µ)) e^{−|d|√(2µ)})]

Bayesian estimators are usually shrinkers (and smoothers)

Threshold is possible as well
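A minimal sketch of wavelet denoising in Python using the PyWavelets package; it applies standard universal soft thresholding rather than the Bayesian shrinkage rule δ(d) above, and the test signal, wavelet family and decomposition level are illustrative choices:

    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    N = 2**10
    x = np.linspace(0, 1, N)
    signal = np.sin(4 * np.pi * x) * np.exp(-3 * x)   # hypothetical smooth signal f
    y = signal + rng.normal(0, 0.1, N)                # noisy measurements y_i

    coeffs = pywt.wavedec(y, "db4", level=5)          # discrete wavelet transform: d = Wy
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # robust noise estimate (finest level)
    thr = sigma * np.sqrt(2 * np.log(N))              # universal threshold
    shrunk = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    f_hat = pywt.waverec(shrunk, "db4")               # inverse transform: f_hat = W^{-1} theta_hat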

31

MARKOV MODEL

• Helium flow as multiple of unit ∆h > 0, i.e. k∆h, k ∈ N

• Transitions from integer k to integer m

• ⇒ Markov chain with state space E = N

• Transition matrix P with probabilities pkm of moving from k to m, possibly changing over time

• Replacement of ESC for high values of k and/or high jumps from k to m
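A minimal sketch of a conjugate Dirichlet update for the rows of the transition matrix P; the number of discretised flow levels, the prior and the observed sequence are illustrative:

    import numpy as np

    states = 5                                     # hypothetical number of flow levels k
    alpha0 = np.ones((states, states))             # Dirichlet(1,...,1) prior on each row of P
    chain = [0, 0, 1, 1, 2, 2, 2, 3, 2, 3, 4, 4]   # hypothetical observed sequence of levels

    counts = np.zeros((states, states))
    for k, m in zip(chain[:-1], chain[1:]):
        counts[k, m] += 1                          # n_km = number of observed k -> m moves

    post = alpha0 + counts                         # posterior is again Dirichlet, row by row
    P_hat = post / post.sum(axis=1, keepdims=True) # posterior mean of the transition matrix
    print(np.round(P_hat, 2))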

32

SOFTWARE RELIABILITY

• Bugs in software induce failures

• Fixing current bugs sometimes implies introduction of new bugs

• Lack of knowledge about effects of bugs fixing

• ⇒ Need for models allowing for the possible, unobserved introduction of new bugs in a context aimed at reducing bugs

• Software affects our lives to an ever larger extent, and its malfunctioning can be very harmful

• Goal: Detecting bad fixing of bugs

33

HIDDEN MARKOV MODEL

• Failure times t1 < t2 < . . . < tn in (0, y]

• Yt latent process describing the reliability status of the software at time t (e.g. growing, decreasing and constant)

• Yt changing only after a failure ⇒ Yt = Ym for t ∈ (tm−1, tm], m = 1, . . . , n + 1, with t0 = 0, tn+1 = y and Yt0 = Y0

• {Yn}n∈N Markov chain with

– discrete state space E

– transition matrix P with rows Pi = (Pi1, . . . , Pik), i = 1, . . . , k

34

HIDDEN MARKOV MODEL

• Interarrival times of m-th failure Xm|Ym = i ∼ E(λ(i)), i = 1, . . . , k, m = 1, . . . , n

• Xm's independent given Y ⇒ f(X1, . . . , Xn|Y) = ∏_{m=1}^{n} f(Xm|Y)

• Pi ∼ Dir(αi1, . . . , αik), ∀i ∈ E, i.e. π(Pi) ∝ ∏_{j=1}^{k} Pij^{αij − 1}

• Independent λ(i) ∼ G(a(i), b(i)),∀i ∈ E

• Interest in posterior distribution of Θ = (λ(k), P, Y (n))

– λ(k) = (λ(1), . . . , λ(k))

– Y (n) = (Y1, . . . , Yn)
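A minimal sketch that simulates failure interarrival times from this hidden Markov model with k = 2 latent states; the transition matrix, rates and sample size are illustrative, and posterior inference (e.g. by Gibbs sampling) is not shown:

    import numpy as np

    rng = np.random.default_rng(2)
    P = np.array([[0.9, 0.1],                      # transition matrix of the latent chain Y
                  [0.2, 0.8]])
    lam = np.array([1 / 50.0, 1 / 10.0])           # exponential rates lambda(i) per state

    n, y = 30, 0                                   # number of failures, initial state Y_0
    states, times = [], []
    for _ in range(n):
        y = rng.choice(2, p=P[y])                  # Y_m | Y_{m-1}
        states.append(y)
        times.append(rng.exponential(1 / lam[y]))  # X_m | Y_m = i ~ E(lambda(i))

    print(states[:10])
    print(np.round(np.cumsum(times)[:10], 1))      # failure times t_1 < t_2 < ...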

35

SOFTWARE RELIABILITY - MUSA DATA

[Plots: time series of the failure times and of the posterior probabilities of Y(t) = 1, over periods 0-140]

Longer failure times ⇒ higher Bayes estimate of the probability of the "good" state

36

POISSON PROCESS

• Helium flow as multiple of unit ∆h > 0, i.e. k∆h, k ∈ N

• Transitions from integer k to integer m ≥ k

• ⇒ Poisson process counting the accumulated number of units

• Constant/faster/slower increase in accumulated units expressed by a constant/increasing/decreasing intensity function

• Replacement of ESC for high values of k and/or high jumps from k to m

37

REPAIRABLE SYSTEMS

• Renewal Process (“Good as new”)

– sequence of i.i.d. r.v.’s denoting time between two failures

• Nonhomogeneous Poisson Process (NHPP) (“Bad as old”)

– minimal repair of a small component in a complex system

– instantaneous repair

Nt ≡ N(0, t), t ≥ 0, NHPP with intensity function λ(t):

• λ(t) := lim_{∆→0} P{N(t, t + ∆] ≥ 1} / ∆, ∀t ≥ 0

• Mean value function Λ(y, s) = ∫_y^s λ(t) dt

• P{N(y, s) = k} = [Λ(y, s)^k / k!] e^{−Λ(y,s)}, ∀k ∈ N

• λ(t) ≡ λ ∀t ⇒ Homogeneous Poisson process (HPP)

38

POWER LAW PROCESS
λ(t) = Mβ t^{β−1} and Λ(t) = M t^β, with M, β, t > 0

[Plot: λ(t) for M = 2 and several values of β]

β > 1 ⇒ reliability decay - β < 1 ⇒ reliability growth - β = 1 ⇒ constant reliability

⇒ Can compute P(β < 1 | data), i.e. the posterior probability of reliability growth (a small sketch of the PLP follows)
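A minimal sketch of a Power Law Process with illustrative M and β: it evaluates P{N(y, s) = k} from the mean value function and simulates event times by the time transformation in which Λ(Ti) are the arrivals of a unit-rate HPP:

    import numpy as np
    from scipy import stats

    M, beta = 2.0, 0.7                              # illustrative values; beta < 1: growth

    def Lambda(t):
        return M * t**beta                          # mean value function Lambda(t) = M t^beta

    # Count distribution over a window (y, s]
    y, s = 1.0, 5.0
    mean_count = Lambda(s) - Lambda(y)
    print(mean_count, stats.poisson.pmf(3, mean_count))   # P{N(y, s) = 3}

    # Simulate event times by inversion: T_i = Lambda^{-1}(A_i), A_i unit-rate HPP arrivals
    rng = np.random.default_rng(3)
    arrivals_hpp = np.cumsum(rng.exponential(1.0, size=20))
    event_times = (arrivals_hpp / M) ** (1 / beta)
    print(np.round(event_times[:5], 2))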

39

GAS ESCAPE IN CAST-IRON PIPES

“Old” cast-iron is not aging ⇒ HPP in space and time

• Data: 150 failures in 6 years on a net ≈ 320 Km long

• Relevant covariates: diameter, location and depth (from EDA). Two levels for each of them, so that their combinations ⇒ 8 classes

• 26 experts (4 in designing the network, 8 in locating failures and 14 in repairing pipes) asked about the 8 classes

• Data and pooled opinions ⇒ posterior mean of failure rates

• Replacement plan based upon largest posterior mean of failure rates

40

ESTIMATES’ COMPARISON

• Location: W (under walkway) or T (under traffic)
• Diameter: S (small, < 125 mm) or L (large, ≥ 125 mm)
• Depth: N (not deep, < 0.9 m) or D (deep, ≥ 0.9 m)

Class   MLE    Bayes (LN)   Bayes (G)   Hierarchical
TSN     .177   .217         .231        .170
TSD     .115   .102         .104        .160
TLN     .131   .158         .143        .136
TLD     .178   .092         .094        .142
WSN     .072   .074         .075        .074
WSD     .094   .082         .081        .085
WLN     .066   .069         .066        .066
WLD     .060   .049         .051        .064


• Location is the most relevant covariate
• TLD: 3 failures along 2.8 Km but quite unlikely to fail according to the experts
• LN and G priors ⇒ similar answers

41

GAS ESCAPES IN STEEL PIPES

• Data: 53 failures in 30 years on an expanding net, currently ≈ 380 Km long

• Network split into subnetworks based upon year of installation

• Independent PLP’s for each subnetwork

– different or equal (apart from starting point) λ(t)

– same, different or exchangeable parameters

• Superposition Theorem: the sum of independent NHPPs with intensity functions λi(t) is still a NHPP, with intensity function λ(t) = ∑_i λi(t)

• Prior distribution on the installation dates (when unknown) of failed pipes

42

GAS ESCAPES IN STEEL PIPES
95% credible intervals for reliability measures:

• System reliability over 5 years: P{N(1998,2002) = 0} ⇒ [0.0000964,0.01]

• Expected number of failures in 5 years: EN(1998,2002) ⇒ [4.59,9.25]

• Mean value function (solid) vs. cumulative # failures (points)

!"#$

%"&'(#!$

)*+, )*+- )**, )**-

,),

.,

/,

0000

0

0

0

00000

0

0

0

00

00

000000

00

0

000

00

43

MISCELLANEA IN INDUSTRY AND BUSINESS

• Gas escapes in steel pipes ⇒ Bayesian nonparametrics (Gamma process)

• Train doors' failures ⇒ Marked and Bivariate Poisson Processes

• Optimal number of beds in hospitals ⇒ Queues

• Project management and bidding in auctions for industrial plants ⇒ Dynamic linear models

• Human and organisational factors in accidents in the maritime industry ⇒ Bayesian Belief Networks

• Calls to a call center ⇒ Poisson process and latent variables for calls unallocated to specific campaigns

• Accidents in the construction sector ⇒ Poisson-gamma and dynamic models

• Usability diagnosis of web pages based on time analysis of clickstream data ⇒ Markov chain

44
