
CHAPTER 6

Statistical Techniques for Life Data Analysis

6.1 GENERAL

The experimental data obtained on a large number of nominally identical specimens of insulation show a characteristic scatter, which is very large compared to that of normal or Gaussian data. Such data have been called extremal data. To ascertain the quality (faithfulness) of extremal data is not easy. Another important aspect of data acquisition is the control exercised over extraneous experimental conditions and the maintenance of constancy of the applied stress. However, in the foregoing analysis, it is assumed that the quality of the data is reasonably good.

The failure data can be processed in two different ways:

(i) Graphical Method.

(ii) Analytical Method.

While the graphical method provides an analogue representation of the data on a selected graph paper or probability paper, the analytical procedure enables the computation of the required parametric estimates governing the proposed model. Although the graphical method is simple and affords a pictorial representation of the data, the analytical method has an overriding advantage of being accurate and offers several technical advantages, as shall be described later.

6.1.1 Graphical Method

The relevant graph paper is usually, if not always, graduated such that the independent variable is plotted on the X-axis (abscissa) and the dependent variable on the Y-axis (ordinate). It is desirable, from the statistical standpoint, that all plots are markedly linear so that interpolation and extrapolation of unknown data can be made with certainty. Also, operations with linear plots afford a higher order of reproducibility. In order to obtain such plots, the scales of the metric chosen for the ordinates and the abscissa are to be linearized. By far the most important need for linear plots arises, especially in life prediction, due to the fact that the scales are almost always logarithmic or sometimes double-logarithmic. Hence, non-linearity of the graphs results in gross inaccuracies in estimation. It often happens that the validity of the model in question itself becomes doubtful if a linear transformation of variables cannot be effected.

It is expected that the plotting positions are nearly linear so that inferences from probability plots can be made by an interpolation or an extrapolation procedure. If it is not so, depending


upon the degree of non-linearity, conclusions as to the quality of the data can be made. It is also possible that the ageing data may not conform to the assumed statistical model. In such a situation the experimenter should exercise due care and caution to either rule out or confirm such possibilities. A point of particular importance is the error associated with extrapolation of the plots. This aspect shall be taken up in some detail later.

The parameter estimation by graphical methods is generally inaccurate. The degree of uncertainty involved cannot be quantified using the graphical plots. This means that there is no objective assessment of the data and the plot. Analytical methods have since been developed to nearly completely quantify the degrees of variability, using such procedures as Confidence Bounds (CB) or Confidence Limits (CL) or Confidence Intervals (CI) estimation. It is important to know that data fitted to distributions containing more than two parameters cannot be analyzed using the graphical methods.

6.2 PROBABILITY PAPERS

All probability papers are marked with the cumulative probability of the distribution of interest on the Y-axis and the corresponding random variable on the X-axis. The probability axis is graduated from zero to unity, excluding the terminal values. The Y-axes of all papers are formally similar and are often mistaken unless they are specifically captioned with the cumulative distribution. The abscissa is graduated on the basis of the random variable or a function thereof. When the cumulative distribution is a function in a closed form, the variables can be transformed so that the Y-axis reads the probability both as a percentage (non-linear scale) as well as on a linear scale (Weibull distribution). The position of each point on a probability paper is called a plotting position. Normal and Log-Normal probability papers are graduated identically with respect to the cumulative probability (the probability axis is the same). On the X-axis, the random variable is marked either on a linear scale or on a logarithmic scale.

6.2.1 Order Statistics for Probability Plotting Positions

Calculation of the probability of failure of extremal and Log-Normal data is not straightforward. The general characteristics of life data are the following:

(i) There is no possibility of grouping the data as in the case of Gaussian data.

(ii) Since the data is extremal, simple or weighted averaging cannot be made.

(iii) Often the data is scanty, due to the fact that conducting ageing experiments, in which all specimens are run to failure, takes an inordinately long time.

In view of these, there is a strong need for a different method for obtaining the probabilities. The concept of order statistics is used for this purpose.

A set of observations on a random variable, ranked according to their chronological ascending order of occurrence, is called a set of ordered observations, or an ordered data set. Consider a random variable X representing an ordered set of mutually independent events, X = {xi}, i = 1, 2, ......, n, having the same density function,


F(X) = ∫_{−∞}^{X} f(x) dx (6.1)

We now introduce a new set of continuous random variables, x1′, x2′, x3′, …, such that x1′ < x2′ < x3′ …, and ignore the possibility of any two of these ordered random variables being equal. The mutually exclusive events xi′ in this sense enjoy certain large-sample properties [39]. Further, we may note that (xr′ ≤ X) is the logical sum of the events:

Exactly j out of x1, x2, x3,…., xn are less than or equal to X for j = r, r + 1, r + 2, …, n so that;

pr[xr′ ≤ X] = Σ_{j=r}^{n} nCj [F(x)]^j [1 − F(x)]^(n−j) (6.2)

This summation can be performed and, after simplification, one of the following expressions can be obtained [45, 50]:

pi = i / (n + 1) (6.3)

pi = (i − 1/2) / n (6.4)

where, i stands for j and pi, the cumulative probability of occurrence of the ith random variable, is usually referred to as the plotting position.

Throughout this book, the expression for the order-statistic probability given by Eqn. (6.4) is used. It should, however, be remembered that, although such measures as variance and mean remain invariant, the median and range of order statistics do differ from those of the parent distributions. For a given value of n, the sample size, the plotting position is a function of n only.

6.2.2 Construction of Weibull Probability Paper

Consider the cdf of a 2-p-W, F(t) = 1 − exp[−(t/τ)^β]. F(t) can be linearized by taking double logarithms on both sides, so that,

ln {– ln(1 – F(t))} = β ln(t) – β ln(τ) (6.5)

Now, t is the dependent variable and the probability function is the independent variable, being the calculated value. So, we may write the transformed equation above as;

Y = mX + c (6.6)

where,

Y = ln ln[1/(1 − F(t))], X = ln(t) (6.7)

m = β, and c = − β ln(τ)

It is possible to assign desired values to F(t), less than unity and greater than zero, so that the Y-axis is fully determined. For example, using order statistics and knowing the number of data points, n, F(t) can be identified with pi, the order statistic probability.
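As a numerical illustration (a minimal Python sketch, not part of the original text; the function names are illustrative only), the plotting positions of Eqn. (6.4) and the linearized ordinates of Eqn. (6.7) can be generated as follows:

```python
import math

def plotting_positions(n):
    # Order-statistic plotting positions p_i = (i - 1/2)/n, Eqn. (6.4)
    return [(i - 0.5) / n for i in range(1, n + 1)]

def weibull_ordinate(p):
    # Linearized ordinate Y = ln ln[1/(1 - p)], Eqn. (6.7)
    return math.log(math.log(1.0 / (1.0 - p)))

for i, p in enumerate(plotting_positions(10), start=1):
    print(f"{i:2d}  p_i = {p:4.2f}  Y = {weibull_ordinate(p):+8.4f}")
# For n = 10 this reproduces Table 6.1: Y runs from -2.9702 (i = 1) to +1.0972 (i = 10).
```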


Tables 6.1 and 6.2 give pi for sample sizes of ten and twenty, just for illustration. The value of the linearized ordinate, Y, is also included. Depending upon the sample size, F(t) in Eqn. (6.7) is replaced by the corresponding pi.

TABLE 6.1 Probability Plotting Positions for n = 10

Rank, i    Ordered observations*, ti    pi = F(ti) = (i − 1/2)/n    Y = ln ln[1/(1 − F(ti))]

 1         t1       0.05       − 2.9702
 2         t2       0.15       − 1.8170
 3         t3       0.25       − 1.2459
 4         t4       0.35       − 0.8422
 5         t5       0.45       − 0.5144
 6         t6       0.55       − 0.2250
 7         t7       0.65       + 0.0486
 8         t8       0.75       + 0.3266
 9         t9       0.85       + 0.6403
10         t10      0.95       + 1.0972

The Weibull probability paper, Fig. 6.1, can be constructed following the steps given below:

(i) First of all take an ungraduated Log-linear paper.

(ii) On this paper, the time to failure is set off on the logarithmic scale, the X-axis.

(iii) On the linear scale, the Y-axis is graduated to a convenient scale as shown in the Fig. 6.1.

(iv) Locate the values of Y calculated using the expression (6.7).

(v) For most practical cases, values of Y between about − 4.0 and + 2.0 units (depending on the value of n) are sufficient.

(vi) Corresponding to each value of Y, the value of pi is extracted and marked on a parallel scale (auxiliary scale) as shown in Fig. 6.1 (axis to the right side).

(vii) Note that the origin of the Weibull paper (Y = 0) is located at pi = 0.6321 (sometimes called the virtual origin of the plot); see the sketch below.
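The auxiliary percentage scale of step (vi) follows by inverting Y = ln ln[1/(1 − p)]. A short Python sketch (illustrative only, not from the original text) that generates the graduations used in Fig. 6.1:

```python
import math

# Invert Y = ln ln[1/(1 - p)] to p = 1 - exp(-exp(Y)) for the auxiliary scale.
for y in (-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0):
    p = 1.0 - math.exp(-math.exp(y))
    print(f"Y = {y:+5.2f}  ->  p = {100.0 * p:6.2f} %")
# Y = 0 maps to 63.21 %, the virtual origin noted in step (vii).
```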


TABLE 6.2 Probability Plotting Positions for n = 20

Rank, i    Ordered observations*, ti    pi = F(ti) = (i − 1/2)/n    Y = ln ln[1/(1 − F(ti))]

 1         t1       0.0250     − 3.6762
 2         t2       0.0750     − 2.5515
 3         t3       0.1250     − 2.0134
 4         t4       0.1750     − 1.6483
 5         t5       0.2250     − 1.3669
 6         t6       0.2750     − 1.1345
 7         t7       0.3250     − 0.9338
 8         t8       0.3750     − 0.7550
 9         t9       0.4250     − 0.5917
10         t10      0.4750     − 0.4395
11         t11      0.5250     − 0.2951
12         t12      0.5750     − 0.1559
13         t13      0.6250     − 0.0194
14         t14      0.6750     + 0.1168
15         t15      0.7250     + 0.2554
16         t16      0.7750     + 0.3999
17         t17      0.8250     + 0.5556
18         t18      0.8750     + 0.7321
19         t19      0.9250     + 0.9518
20         t20      0.9750     + 1.3053

* ti+1 > ti for all i = 1, 2, ..., n − 1

It should be added at this point that in a laboratory-controlled electrical stress ageing experiment on identical models, the times to failure at different stresses give rise to a distribution of values for the shape parameter, although only a unique value of β is allowed as per the Weibull statistic. If the degree of acceleration is not very high, the scatter in the computed values can be assumed to be nearly normally distributed. Under such conditions, a weighted mean value thereof is deemed a reasonable estimate.


[Fig. 6.1 Weibull probability paper: X = t on a logarithmic scale (10⁰ to 10⁴); primary linear scale Y = ln ln[1/(1 − F(t))] graduated from − 4.00 to + 2.00; auxiliary Pi% scale running from 1.82 to 99.93, with 63.21% at Y = 0.]

However, the fact still remains that it is mandatory to prove the invariance of β w.r.t. stress, for which a statistical test for validity exists. This aspect shall be covered later under the head, data validation. If this test shows that the shape parameter is indeed significantly different for different stresses, the data is suspect to the extent that the mechanism of ageing is not preserved at every ageing stress-level, including, perhaps, at the operating stress as well.

Situations are encountered in which the data plotted on a Weibull paper shows a pronounced positive or negative curvature. This does not imply that the data does not conform to the fitted distribution. A curvature may be real and may be due to an adjustable offset. Referring to Fig. 6.2, if a correction is given to the Weibull cdf in the form of a third parameter, t0, Eqn. (6.8), called the location parameter, the plot becomes a linear fit.

F(t) = 1 − exp[−((t − t0)/τ)^β] (6.8)

It is to be noted, however, that a corrected graph does not allow for the computation of the location parameter. There are analytical methods by which the third parameter can be expressly estimated. This cdf is called a 3-p-W, meaning that there are three parameters controlling the distribution. The horizontal dotted lines drawn from the actual plots to an expected linear fit are measures of the third Weibull parameter.
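One simple numerical probe for the location parameter (a sketch under the stated assumptions; the helper names and the candidate-scan strategy are illustrative, not from the original text) is to transform the data as X = ln(t − t0) for trial values of t0 and retain the value that makes the plot most nearly linear:

```python
import numpy as np

def weibull_xy(times, t0=0.0):
    # X = ln(t - t0) and Y = ln ln[1/(1 - p_i)], p_i = (i - 1/2)/n; cf. Eqn. (6.8)
    t = np.sort(np.asarray(times, dtype=float))
    n = len(t)
    p = (np.arange(1, n + 1) - 0.5) / n
    return np.log(t - t0), np.log(np.log(1.0 / (1.0 - p)))

def best_t0(times, candidates):
    # Retain the trial t0 (t0 < min(t)) giving the largest linear correlation.
    def linearity(t0):
        x, y = weibull_xy(times, t0)
        return np.corrcoef(x, y)[0, 1]
    return max((t0 for t0 in candidates if t0 < min(times)), key=linearity)
```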


[Fig. 6.2 Plots explaining the existence of the Weibull location parameter: A1 and A2 are actual plots in 2-p-W; B1 and B2 are the corresponding plots in 3-p-W. Axes as in Fig. 6.1.]

6.3 CONSTRUCTION OF OTHER PROBABILITY PAPERS

It is seen from the previous section that when the function F(t) is known in a closed form, any probability paper can be constructed by effecting a suitable transformation of variables. However, there are instances in which this may not always be the case, and other methods may have to be used. As an example, a Gaussian or Log-normal statistic does not possess a cumulative distribution function F(t) in closed form for all values of t, since Eqn. (4.49) cannot be evaluated exactly. In such cases, a Monte Carlo scheme of integration can be used. Alternately, a convergent power series expansion of the integrand can be made and integrated term by term to the required degree of accuracy.

6.3.1 Normal and Log-Normal Papers

The probability axes for both normal and log-normal papers are the same. The random variable in the first case is plotted on a linear scale, and the logarithm of the random variable in the second case. The probability density function f(t) is made parameter-free, for convenience, by making the transformation:

(t − µ)/σ = z (6.9)

so that, F(t) becomes,

F(t) = [1/√(2π)] ∫_{−∞}^{z} e^(−z²/2) dz (6.10)


Normal and Log-normal probability papers are readily available and can be used. Also, these papers are shown in Fig. 6.3 and Fig. 6.4 for ready reference.

The parameter µ for the normal distribution is given by the value of the random variable corresponding to the point of intersection of the fitted line and the horizontal line passing through 50% probability. For the log-normal distribution, the parameter µ is called the log-mean value.

It should be noted, however, that the parameter σ, which is analogous to the shape parameter of the Weibull distribution, is a measure of the dispersion of the data. This parameter cannot be obtained as the slope of the probability plot. It is possible to estimate σ, the logarithmic standard deviation [45], as the difference between the 50th and the 16th percentile of the distribution when the sample size is reasonably large; it can also be obtained as what is called a pooled estimate using the data.

[Fig. 6.3 Normal probability paper (Y-axis: Normal probability in %, graduated from 0.10 to 99.90; X-axis: the random variable).]

6.3.2 Extreme Value Probability Papers

Recall the cdf of extreme value distribution,

F(t) = 1 − exp[−e^((t − λ)/δ)], − ∞ < t < ∞ and λ, δ > 0 (6.11)

Linearizing this expression,

ln ln[1/(1 − F(t))] = t/δ − λ/δ, − ∞ < t < ∞ and λ, δ > 0 (6.12)


[Fig. 6.4 Log-normal probability paper (Y-axis: Normal probability in %, graduated from 0.10 to 99.90; X-axis: logarithm of the random variable).]

This can be written as,

Y = mX + c (6.13)

in which, Y = ln ln[1/(1 − F(t))], X = t, m = 1/δ and c = − λ/δ (6.14)

The F(t) can be got using the expression (6.4), and the Y-axis can be graduated as in the case of the Weibull probability paper. Note that the X-axis is linear in this case, instead of logarithmic as in the Weibull case. Fig. 6.5 shows a sample extreme value paper.

6.3.3 Logistic Probability Paper

The cumulative distribution function of the Logistic distribution is given by,

F(x) = 1 / [1 + e^(−(x − a)/b)] (6.15)

where, – ∞ < x < ∞, a, b > 0

Linearizing, the above equation can be written as,

Y = mX + c (6.16)

where, Y = ln[F(x)/(1 − F(x))], m = 1/b and c = − a/b (6.17)


[Fig. 6.5 Extreme value probability paper: X = t on a linear scale; primary Y scale from − 4.00 to + 2.00, with an auxiliary Pi% scale as on the Weibull paper.]

The following steps may guide the reader in making his own probability paper (a short numeric sketch follows the list):

(i) Compute Y for synthetic values of F(x) from zero to unity (excluding both).

(ii) On a linear graph paper, graduate the Y-axis to an arbitrary scale. This is called the primary scale.

(iii) Open out an auxiliary parallel scale to mark the % probability pi corresponding to Y. Note that the 50% probability corresponds to Y = 0. This point serves as the origin for plotting.

(iv) The slope of the fitted line is found using the primary scale of the Y-axis. It is easy to see that the line fitted to the data on this paper has a slope given by 1/b.

(v) The mean value a can be obtained by taking the abscissa of the point of intersection of the fitted line and the line Y = 0.
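A minimal numeric sketch of steps (i)–(v) (illustrative Python, assuming a slope m and intercept c have already been read off the fitted line):

```python
import math

def logistic_ordinate(p):
    # Primary-scale ordinate Y = ln[F/(1 - F)], Eqn. (6.17)
    return math.log(p / (1.0 - p))

def logistic_params(m, c):
    # Invert m = 1/b and c = -a/b from Eqn. (6.17): b = 1/m and a = -c/m
    b = 1.0 / m
    return -c * b, b          # (a, b); note x = a where Y = 0, the 50% point

print(logistic_ordinate(0.5))  # 0.0 -- confirms the plotting origin of step (iii)
```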

[Fig. 6.6 Logistic probability paper: X = x on a linear scale; primary Y = ln[F(x)/(1 − F(x))] scale with an auxiliary probability (%) scale.]


6.4 ON GRAPHICAL METHOD FOR ESTIMATION

The probability plotting is described in the following steps:

1. Once the number of samples used in an accelerated ageing experiment is decided, the probability axis for the order-statistic based Y-axis is completely determined, using the formula given by Eqn. (6.4).

2. For complete data, the logarithms of the times to failure are plotted on the relevant paper against the corresponding probability.

3. Sets of values of pi and log(ti) form the plotting positions of every point on the graph.

4. While plotting, the per cent probability against the corresponding ti is considered. This means that the auxiliary scale is used in all probability plotting.

5. A line is fitted by eye using a transparent straight edge so that the fitted line occupies an optimum position of nearness to the plotted points. A word of caution here: however carefully a line is fitted, gross inaccuracies would occur if due care is not taken to see that the points farther from the origin are carefully balanced. In fact, this is the most important aspect which determines the effectiveness of the graphical methods.

6. The nominal life corresponds to either the 63.2nd or the 50th percentile, according to the chosen distribution.

7. For calculating the shape parameter wherever possible, two convenient points on the fitted line with coordinates (ln(t1), Y1) and (ln(t2), Y2) are identified, and a graphical estimate of the shape parameter, for example, in the case of 2-p-W, β, can be obtained as,

β = (Y2 − Y1) / (ln t2 − ln t1) (6.18)

8. Note that a graphical estimate for the parameter σ of the Normal and Log-normal distributions cannot be directly obtained from the graph. However, an approximate estimate can be got from the sample variance.

6.5 ADVANTAGES AND DISADVANTAGES OF GRAPHICAL ANALYSIS

When relevant probability papers are available, or when they can be constructed, the graphical analysis of the data is by far the simplest and quickest. Since the graphical plot is a quantified pictorial representation of the data, gross inaccuracies in the acquired data or erroneous modeling can easily be detected. If linearity of the data was expected, a visual examination of the plot either confirms or rejects the contention. The plot reveals the extent of departure from linearity and the existence of outliers. Approximate estimates of some of the parameters of the model under consideration can be obtained, which serve as a first approximation for use in finer analytical techniques.

Almost always, the disadvantages outweigh the credits of graphical analysis of data, particularly when it is not validated using more accurate methods. The important drawbacks of graphical methods are:


• The fact that the data points are fitted by eye makes the resulting linearization highly subjective. Being dependent upon the person analyzing the data, the parameter estimates introduce variabilities not characteristic of the data.

• Often, large errors result in the estimates of the parameters: since the scales are logarithmic, and sometimes double logarithmic, even small displacements in the position of the fitted lines give rise to exponentiated read-outs of the parameters.

• Graphical estimates do not provide the much-needed objective assessment of the parametric estimates: the confidence bounds, or the possible limits within which the parameters lie, cannot be obtained.

• The rounding-off errors in plotting, as well as in read-outs, are almost always unacceptably high.

• Graphical methods are unsuitable for processing censored data.

In view of these serious objections, analytical methods for life data analysis become indispensable. Analytical techniques to solve stochastic problems are usually, if not always, possible. While such methods ensure a high degree of accuracy of both point and interval estimation of the parameters, and many other benefits, admittedly these methods require the expertise of trained personnel capable of statistical data analysis and computation, besides considerable experience in interpretation and decision making.

There are essentially two methods of analytical data processing:

(i) Linear estimation methods like Least Squares Regression (LSR), Best Linear Unbiased Estimation (BLUE) and Best Linear Invariant Estimation (BLIE).

(ii) The Maximum Likelihood (ML) method.

6.6 SIMPLE LINEAR REGRESSION

A real-time experimental data acquisition results in a set of data represented by two variables: an independent variable (which is normally under the control of the experimenter) and a dependent (random) variable with its characteristic uncertainty. Since the acquired data is only a subset of all possible, inexhaustive measurements, the parameters controlling any statistical model fitted to the data will always contain a certain degree of uncertainty. All parametric estimates (and inferences based on them) used in justifying the proposed stochastic or physical models can be derived, to a reasonable degree of certainty, if the equations describing the models can be reformulated as a linear relationship.

The procedure of estimating the model parameters and associated uncertainties from a set of data using linear relationships, or approximations thereof, is called Simple Linear Regression or Least Squares Regression.

Consider n ordered pairs of observations (xi, yi). The conditional expectation of y given x is then a linear function:

E(y | x) = ζ + κx (6.19)

in which ζ and κ are parameters of the model considered, each with its associated variability. The accuracy of their estimation is closely connected with the sample size. We usually regard this equation as stochastic and replace ζ and κ by their close-enough estimates a and b, respectively, so that

y = a + bx (6.20)


The above equation is called the regression of y on x. Similarly, the equation

x = c + d y (6.21)

is the regression of x on y. In general, these two equations (Eqn. (6.20) and Eqn. (6.21)) are not the same. However, the point of intersection of these two lines is (x̄, ȳ), the mean values of x and y respectively.

The following assumptions are necessary for estimating a and b using the LSR technique [39, 51]:

(i) The xi values are controlled, or are observed without error. Also, there are no other extraneous factors influencing the estimation.

(ii) The deviations Di, given by

Di = yi − E(y | xi) (6.22)

are mutually independent.

(iii) The Di's have the same variance (not usually known), independent of xi, and are normally distributed, as shown in Fig. 6.7 and Fig. 6.8.

[Fig. 6.7 The LSR line y = a + bx: the sum Σ_{i=1}^{n} (yi − a − bxi)² is to be minimized.]

[Fig. 6.8 Conditional distribution of y, with equal variances, about the line E(y | x) = ζ + κx.]

�20 � �!�"!�!�#��$���!% � ��!���!&$�&%�'&( �� )*!'� $�

(iv) It must be ensured that the data comes from the population on which inferences are being drawn.

Assumption (iii) is usually quite difficult to comply with but, as long as the departure from normality is not substantial, no serious errors are likely to accrue.

Suppose the yi are scattered about the regression line E(y | x) = a + bx, as shown in Fig. 6.7, and that it is desired to obtain a and b, the estimates of ζ and κ respectively, as closely as possible. A logical way of achieving this is to minimize the total deviation of the corresponding individual values of yi for known xi about the regression line. But, since the deviations can be either positive or negative, the algebraic sum could become zero. To get over this situation, instead of seeking to minimize the sum itself, the squared deviations, D², are sought to be minimized, see Fig. 6.7.

If the deviations are redefined this way, D2 can be expressed as:

D² = Σ_{i=1}^{n} (yi − a − bxi)² (6.23)

where, yi is the observed value of y and (a + bxi) is obtained from the regression line.

Minimization of D² can be done by equating, separately, the partial derivatives w.r.t. a and b to zero and solving the resulting simultaneous equations as follows:

∂D²/∂a = 0, ∂D²/∂b = 0 (6.24)

with the condition that;

∂²D²/∂a² > 0, ∂²D²/∂b² > 0 (6.25)

We then have,

Σ_{i=1}^{n} 2{yi − (a + bxi)}(−1) = 0

Σ_{i=1}^{n} 2{yi − (a + bxi)}(−xi) = 0 (6.26)

Solving these ‘normal’ equations simultaneously,

b = [n Σ_{i=1}^{n} xi yi − (Σ_{i=1}^{n} xi)(Σ_{i=1}^{n} yi)] / [n Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)²] (6.27)

Now, Σ_{i=1}^{n} yi = Σ_{i=1}^{n} (a + bxi) (6.28)

Simplifying, Σ_{i=1}^{n} yi = b Σ_{i=1}^{n} xi + na (6.29)

Dividing both sides by n and rearranging,

a = ȳ − b x̄ (6.30)

where, ȳ and x̄ are the mean values of yi and xi respectively. The equation of the fitted line can also be written as,

E(y | x) = ȳ + b(x − x̄). (6.31)
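Eqns. (6.27) and (6.30) translate directly into code. A small sketch (illustrative Python, not from the original text):

```python
import numpy as np

def lsr(x, y):
    # b from Eqn. (6.27): [n*sum(x*y) - sum(x)*sum(y)] / [n*sum(x^2) - (sum(x))^2]
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    b = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x * x) - x.sum() ** 2)
    a = y.mean() - b * x.mean()          # Eqn. (6.30)
    return a, b
```

The fitted line then follows Eqn. (6.31) with the returned a and b.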

6.7 NONLINEAR REGRESSION

Sometimes the experimental data indicates a strongly non-linear behaviour of a dependent variable y with the independent variable x, in one of the following forms:

(i) y = a1 x^(a2)    (ii) y = b1 e^(b2 x) (6.32)

The exponents a2 and b2 in the above equations can be of either sign. By a logarithmic transformation these can be linearized, and new variables for y and x can be used. After this transformation, the method of LSR described earlier can be used.

6.8 MULTIPLE REGRESSION

In many experimental situations, there may be more than one linearly dependent/independent variable (x, y and z); a linear relationship between these variables can be expressed in the following way:

z = a′ x + b′ y + c′ (6.33)

Suppose z is a variable, dependent on x and y, which are mutually independent. In developing the multiple regression, all assumptions made in connection with LSR are assumed to hold here also. The multiple regression model can be formally written as,

E[z | x, y] = ax + by + c (6.34)

where a, b and c are estimates of the population multiple regression parameters a′, b′ and c′, respectively.

Following the LSR procedure, the least square deviations can be written as,

Σ_{i=1}^{n} {zi − (axi + byi + c)}² = Σ_{i=1}^{n} Di² = D (6.35)


Differentiating D partially with respect to a, b and c and equating them to zero,

∂D/∂a = 0, ∂D/∂b = 0 and ∂D/∂c = 0 (6.36)

The normal equations are obtained as,

Σ zi = a Σ xi + b Σ yi + nc

Σ zi xi = a Σ xi² + b Σ xi yi + c Σ xi (6.37)

Σ zi yi = a Σ xi yi + b Σ yi² + c Σ yi

(all summations running over i = 1 to n)

The set of Eqns. (6.37) can be solved simultaneously for a, b and c, the estimates of the model parameters a′, b′ and c′.
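In matrix form the normal equations (6.37) are a 3 × 3 linear system; a minimal Python sketch (illustrative only, not from the original text) is:

```python
import numpy as np

def multiple_regression(x, y, z):
    # Solve Eqn. (6.37) for the estimates a, b, c in E[z | x, y] = ax + by + c
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    A = np.array([[np.sum(x * x), np.sum(x * y), x.sum()],
                  [np.sum(x * y), np.sum(y * y), y.sum()],
                  [x.sum(),       y.sum(),       len(z)]])
    rhs = np.array([np.sum(z * x), np.sum(z * y), z.sum()])
    return np.linalg.solve(A, rhs)       # (a, b, c)
```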

6.9 POLYNOMIAL REGRESSION

When data do not conform to a transcendental curvilinear model but still show a pronounced nonlinearity, then fitting the data to a polynomial of appropriate degree should be resorted to. Let, with the usual notation, the mth degree polynomial relationship between y and x be of the form,

y = a′0 + a′1 x + a′2 x² + ... + a′i x^i + ... + a′m x^m (6.38)

y = Σ_{i=0}^{m} a′i x^i (6.39)

ŷ = Σ_{i=0}^{m} ai x^i (6.40)

In Eqn. (6.40), the ai are the estimates of the model population parameters a′i. Using the earlier procedure, the sum of the squared deviations about the fitted polynomial curve can be written as,

Σ_{i=1}^{n} {yi − (a0 + a1xi + a2xi² + ... + amxi^m)}² = Σ_{i=1}^{n} Di² = Dsqd (6.41)

Further steps are shown assuming a polynomial of degree three (m = 3). Minimizing the squared deviation about the regression polynomial, the four normal equations can be obtained as follows:


Σ yi = na0 + a1 Σ xi + a2 Σ xi² + a3 Σ xi³

Σ yi xi = a0 Σ xi + a1 Σ xi² + a2 Σ xi³ + a3 Σ xi⁴

Σ yi xi² = a0 Σ xi² + a1 Σ xi³ + a2 Σ xi⁴ + a3 Σ xi⁵ (6.42)

Σ yi xi³ = a0 Σ xi³ + a1 Σ xi⁴ + a2 Σ xi⁵ + a3 Σ xi⁶

(all summations running over i = 1 to n)

Solving Eqns. (6.42) simultaneously, the estimates a0, a1, ..., am of the model parameters can be obtained.

A standard error, s(y | x) (also written as syx), of the estimate for an mth degree polynomial fit is given by,

syx = √[ Σ_{i=1}^{n} {yi − (a0 + a1xi + a2xi² + ...... + amxi^m)}² / {n − (m + 1)} ] (6.43)

where, [n − (m + 1)] is the number of degrees of freedom, ν. In the case of the third degree polynomial, ν = n − 4.
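In practice the normal equations (6.42) need not be assembled by hand; a library least-squares fit gives the same estimates, after which syx follows from Eqn. (6.43). A sketch (illustrative Python, not from the original text):

```python
import numpy as np

def poly_fit_with_error(x, y, m):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, m)             # solves the normal equations (6.42)
    resid = y - np.polyval(coeffs, x)
    nu = len(x) - (m + 1)                    # degrees of freedom, Eqn. (6.43)
    return coeffs, np.sqrt(np.sum(resid ** 2) / nu)
```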

6.10 POOLED ESTIMATION OF THE REGRESSION MODEL

The general theory of linear regression presented in the previous section does not account for the existence of hidden variables in x and y. If the random variables x and y represent the probability of failure and the times to failure, the stress applied during the ageing experiment does not appear in the set of regression equations. It is therefore easy to see that a and b, the estimates of ζ and κ, do not possess unique values, and hence the model itself is not unique.

Experiments conducted on nominally identical specimens with a hidden variable, a variable that does not enter expressly into the statistical model, give sets of raw data which are nearly homogeneous. For example, insulation failure data at different electrical or thermal stresses are distributed according to the cdf already proposed. Depending upon the number of test stresses used, that many regression parameter estimates are obtained. Under the assumption that the physical model sufficiently describes the raw data, the estimated regression parameters can be assumed to be nearly Gaussian. Distinct values of the regression parameter can be obtained by taking a weighted mean of the parameters.

In order to overcome this situation, a more elegant parameter estimation technique, called the pooled estimation method, suggested by Draper and Smith [51] and Nelson [45], can be employed. In this procedure, the failure data acquired at different stresses are homogenized to realize a pooled


data set, weightage being given to the sample size at each stress. By this procedure, it is possible to obtain both point and interval estimates of the parameters.

6.10.1 Pooled Estimation of Electrical Ageing Data

To begin with, pooled least squares regression of extreme value ageing failure data under electric stress is described. In addition to the assumptions mentioned earlier, it is further added that the statistical and physical models describing the long-time failure are the Weibull statistic (2-p-W) and the Inverse Power Law (IPL) respectively.

Let the failure data be acquired on nominally identical specimens at J different stresses, U1, U2, U3, ...., UJ, and let n1, n2, n3, ..., nJ be the sample sizes at these stresses respectively (since the specimens are identical, stresses can be replaced by voltages).

xj and yij are new transformed variables given by;

xj = ln(Uj), yij = ln(tij) (6.44)

where i runs over all times to failure at each test stress, that is, i = 1, 2, 3, ....., nj and j = 1, 2, 3, ...., J.

Based on the Inverse Power Law model, the least squares regression relationship between stress and time can be written as;

yij = γ + η xj + εij (6.45)

where, γ and η are the regression coefficients and εij are the mutually independent random errors of estimation. Now, γ and η can be identified with the endurance coefficient m and the constant K of the IPL. The mean error can be neglected for large sample sizes, but the unknown variance thereof, σe², can be approximately estimated. Again, γ, η and σe are not directly calculable, as described earlier. Hence, they are replaced by their estimates, say, γ̂, η̂ and s. The data processing proceeds as follows:

The logarithmic mean time to failure, ȳj, at the jth test stress and the corresponding standard deviation are calculated as,

ȳj = (y1j + y2j + y3j + ...... + ynj j) / nj (6.46)

sj = √[(y1j² + y2j² + y3j² + ...... + ynj j² − nj ȳj²) / (nj − 1)] (6.47)

Here, nj – 1 = ν is the number of degrees of freedom and sj is not defined for nj = 1.

The grand averages of x's and y's are calculated as;

x̄ = (n1x1 + n2x2 + n3x3 + ...... + nJxJ) / n (6.48)

ȳ = (n1ȳ1 + n2ȳ2 + n3ȳ3 + ...... + nJȳJ) / n (6.49)


where, n = n1 + n2 + … + nJ, the total sample size.

The sample variances syy, sxx and the co-variance sxy are;

syy = Σ_{j=1}^{J} Σ_{i=1}^{nj} yij² − n ȳ² (6.50)

sxx = (n1x1² + n2x2² + n3x3² + ... + nJxJ²) − n x̄² (6.51)

sxy = (n1x1ȳ1 + n2x2ȳ2 + n3x3ȳ3 + ... + nJxJȳJ) − n x̄ ȳ (6.52)

The LSR estimates of γ and η, namely γ̂ and η̂, are,

γ̂ = sxy / sxx, and η̂ = ȳ − γ̂ x̄ (6.53)

The corresponding estimates of the power law parameters, m̂ and K̂, are;

m̂ = − γ̂ and K̂ = e^(η̂ − 0.4501 s) (6.54)

where, s is the pooled estimate of the logarithmic standard deviation given by,

s = √[(ν1s1² + ν2s2² + ν3s3² + ...... + νJsJ²) / (ν1 + ν2 + ... + νJ)] (6.55)

An estimate of the mean life Le (a measure of the parameter τ in the Weibull distribution function) at any electric stress U0 is then expressed as:

Le = K̂ (U0)^(−m̂). (6.56)
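The whole pooled procedure of Eqns. (6.44)–(6.56) is summarized in the Python sketch below (illustrative, not from the original text; the correction constant 0.4501 follows the reconstructed Eqn. (6.54)):

```python
import numpy as np

def pooled_ipl_life(stresses, failure_times, U0):
    # stresses: U_1..U_J; failure_times[j]: list of times to failure at U_j
    x = np.log(np.asarray(stresses, dtype=float))            # Eqn. (6.44)
    ys = [np.log(np.asarray(t, dtype=float)) for t in failure_times]
    nj = np.array([len(y) for y in ys]); n = nj.sum()
    ybar_j = np.array([y.mean() for y in ys])                # Eqn. (6.46)
    xbar = np.sum(nj * x) / n                                # Eqn. (6.48)
    ybar = np.sum(nj * ybar_j) / n                           # Eqn. (6.49)
    sxx = np.sum(nj * x ** 2) - n * xbar ** 2                # Eqn. (6.51)
    sxy = np.sum(nj * x * ybar_j) - n * xbar * ybar          # Eqn. (6.52)
    gamma_hat = sxy / sxx                                    # Eqn. (6.53)
    eta_hat = ybar - gamma_hat * xbar
    nu = nj - 1
    sj2 = np.array([y.var(ddof=1) if len(y) > 1 else 0.0 for y in ys])
    s = np.sqrt(np.sum(nu * sj2) / np.sum(nu))               # Eqn. (6.55)
    m_hat = -gamma_hat                                       # Eqn. (6.54)
    K_hat = np.exp(eta_hat - 0.4501 * s)
    return K_hat * U0 ** (-m_hat)                            # L_e, Eqn. (6.56)
```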

6.10.2 Regression Analysis of Thermal Ageing Data

In the analysis of insulation failure data under thermal stress, it is deemed that the underlying phenomenological model is based on the Arrhenius law of chemical rate-processes (law of first order reaction kinetics) described earlier. The times to random failure of nominally identical specimens at any accelerated thermal stress (the absolute temperature, T, to which the insulation is subjected, shall represent the degree of thermal acceleration) generally conform to a Log-normal distribution.

A very important aspect in thermal ageing data acquisition is the enunciation of the end-of-life criteria (also termed indexation of failure). In electrical ageing, insulation failure was defined as a complete, or a nearly complete, loss of some, or all, of the insulating properties. It is easy to quantitatively assess this fact by making direct electrical measurements which, more often than not, give a definite indication of failure or, at least, of an impending failure. This is not the case under thermal stress because the failure is subjective, in the sense that only indirect measurements can be performed to infer, qualitatively, the extent of damage suffered by the insulation under test. Usually, certain electrical and physico-chemical tests are conducted to get such information. This aspect shall be taken up for discussion in detail later, but it suffices to add here that the same end-point criterion should be applied throughout the series of experimental runs on thermal ageing.

The calculation of model parameters proceeds in an exactly similar way as for the electrical stress, except that the independent variable xj in the earlier treatment is replaced by (1000 is included for convenience),

xj = 1000/Tj (6.57)

If the logarithm of the time to failure of the ith test specimen at the jth test stress is represented as yij, then;

yij = µ(xj) + eij (6.58)

µ(xj) = ς + υ xj (6.59)

(in the above equations, i = 1, 2, 3, ...., nj and j = 1, 2, 3, ....., J)

where, ς and υ are the controlling parameters of the linearized Arrhenius equation. An estimate of µ of the model at a given test temperature is represented by LT, the thermal life.

As in the earlier case, the random error eij has a zero mean value and an unknown variance, and is nearly normally distributed. All assumptions applicable to regression theory are invoked here also.

The logarithmic average time to failure, ȳj, at the jth temperature and the corresponding sample standard deviation, sj, are;

ȳj = (y1j + y2j + y3j + ...... + ynj j) / nj (6.60)

sj = √[{(y1j − ȳj)² + (y2j − ȳj)² + ... + (ynj j − ȳj)²} / (nj − 1)] (6.61)

sj = √[{(y1j² + y2j² + ... + ynj j²) − nj ȳj²} / (nj − 1)] (6.62)

Either of the equations above for calculating sj can be used; however, it is important to carry as many significant decimal places as possible (at least a six-figure accuracy is necessary in the intermediate stages), since the values are usually very small, so that only three or four decimal figures appear in the final results.

The grand averages of the two variables are respectively,

x̄ = (n1x1 + n2x2 + n3x3 + ...... + nJxJ) / n (6.63)

ȳ = (n1ȳ1 + n2ȳ2 + n3ȳ3 + ...... + nJȳJ) / n (6.64)

The equations above are just the sums over the entire sample space divided by the total sample size. As was done before, the variances and co-variance of x and y are computed using the same set of expressions (6.50) to (6.52). The estimates of the model parameters are given by;

υ̂ = sxy/sxx and ς̂ = ȳ − υ̂ x̄ (6.65)

The pooled estimate of the logarithmic standard deviation, s, is;

s = √[(syy − υ̂ sxy) / (n − 2)] (6.66)


The least squares estimate of the log-mean life, µ̂(x0), at any desired temperature, x0 = 1000/T0, is then;

µ̂(x0) = ς̂ + υ̂ x0 (6.67)

The insulation life, LT, at x0 = 1000/T0, in the appropriate units of time, is;

LT = e^µ̂(x0). (6.68)
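A companion sketch for the thermal case (illustrative Python, not from the original text; the y-inputs are the logarithms of the observed times to failure):

```python
import numpy as np

def thermal_life(temps_K, log_times, T0):
    x = 1000.0 / np.asarray(temps_K, dtype=float)            # Eqn. (6.57)
    ys = [np.asarray(y, dtype=float) for y in log_times]
    nj = np.array([len(y) for y in ys]); n = nj.sum()
    ybar_j = np.array([y.mean() for y in ys])                # Eqn. (6.60)
    xbar = np.sum(nj * x) / n                                # Eqn. (6.63)
    ybar = np.sum(nj * ybar_j) / n                           # Eqn. (6.64)
    sxx = np.sum(nj * x ** 2) - n * xbar ** 2
    sxy = np.sum(nj * x * ybar_j) - n * xbar * ybar
    upsilon_hat = sxy / sxx                                  # Eqn. (6.65)
    varsigma_hat = ybar - upsilon_hat * xbar
    mu0 = varsigma_hat + upsilon_hat * (1000.0 / T0)         # Eqn. (6.67)
    return np.exp(mu0)                                       # L_T, Eqn. (6.68)
```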

6.10.3 Goodness of Fit

In connection with the regression analysis of data, a term called 'Goodness of Fit' is used to indicate how good the assumed fit is compared to the actual data. This is expressed in terms of what is called the standard error of the estimate. The standard error of an estimated parameter, s, is the standard deviation of the sampling distribution of that parameter. Since this is not readily known, an approximate computation is made assuming that the sampling distribution is nearly normal, as given below.

s = √[ Σ_{i=1}^{n} (yi − [a + bxi])² / (n − 2) ] (6.69)

Since two parameters, a and b, are being estimated, the number of degrees of freedom is n − 2. It should be noted that a more appropriate terminology is adequacy of fit rather than goodness of fit, since yi and a + bxi are not observed and expected frequencies: both contain some error of observation. A similar approach to estimating the standard error can be followed in any other regression procedure. As an example, the estimation of the value of s under thermal stress is given below.

Accordingly, the general expression for the standard error of estimate of a parameter, say µ, at any test temperature T0, with corresponding x0 = 1000/T0, is written as;

s(LT) = √[ 1/n + (x0 − x̄)²/sxx ] · s (6.70)

In a similar manner, standard errors of all other relevant parameters can be estimated, provided the sampling distribution of the parameter is known.

6.11 CONFIDENCE INTERVALS

The point estimate of a parameter provides only one of the possible approximations to its true value. The degree of uncertainty in the estimation is strongly associated with the precision of the model used to describe the underlying phenomenological processes and the probabilistic nature of the physical process involved therein. Inaccuracies accumulate at every step, due largely to the departures from the assumptions made in developing the models. These are sufficient reasons to


give rise to a scatter in the parameter estimates arrived at using experimental data obtained even under identical test conditions. It, therefore, becomes necessary to add a qualifying statement concerning the estimates, with the form and content as under;

“The estimated value of the parameter, say θ, has a stated (high) probability of being enclosed between a lower bound or limit θL and an upper bound θU”.

It is desired that the range between θL and θU, called the interval, is as narrow as practicable.

It is now possible to formally define the confidence interval (CI) as:

pr {θL ≤ θ ≤ θU} = γ, 0 ≤ γ ≤ 1 (6.71)

whatever be the value or the nature of the parameter θ.

This probability statement, expressed in words, means the following:

Supposing a large number of, say, one hundred experiments are conducted with a view to estimating a parameter θ (describing a model) under identical conditions, then the probability that the interval encloses the value θ is 100 γ %. The quantity γ is called the Confidence Coefficient, and (1 − γ) the significance level.

This interval is called the “two-sided 100 γ % CI for θ”. It may often be sufficient to know only one of the bounds, upper or lower. In such a case, one-sided CI's are used.

The one-sided 100 γ % upper CI on θ is given by:

pr {θ ≤ θU } = γ (6.72)

Actually, the interval covered by θ now extends from (− ∞, θU). A one-sided lower CI can be defined in a similar way. It can be seen that an unreasonably narrow CI means extreme precision, or near determinism. Concomitantly, this further implies that, to realize such narrow intervals, a sample size tending to infinity has been, or needs to be, considered in the particular experimental run.

Construction of CI's is not easy unless one has sufficient reason to believe that the sampling distribution of the parameter θ is nearly normal and that the bias in θ is small. Formulae for the confidence intervals on the mean and variance of the regression parameters in an LSR estimate are provided by Nelson [45]. Fig. 6.9 gives a pictorial view of the confidence intervals.

[Fig. 6.9 Confidence intervals: upper and lower confidence bounds enveloping the LSR line y = a + bx.]


If the statement above is true, then, to a restricted accuracy, the confidence statement can be formulated as follows:

Let the unbiased estimator have a pivotal value, say θ*, and let its unknown variance, which can be approximately calculated from the sample data, be σ²(θ*); the 100 γ % CI's are,

(θU, θL) = θ* ± Kγ σ(θ*) (6.73)

where, Kγ is the [100(1 + γ)/2]th standard normal percentile obtainable from statistical tables. Examples of these calculations are given in the Appendix.
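Eqn. (6.73) is straightforward to evaluate; a minimal sketch (illustrative Python, assuming near-normal sampling of the estimator):

```python
from scipy.stats import norm

def two_sided_ci(theta_star, sigma_star, gamma=0.95):
    # Eqn. (6.73): K_gamma is the [100(1 + gamma)/2]-th standard normal percentile
    K = norm.ppf((1.0 + gamma) / 2.0)
    return theta_star - K * sigma_star, theta_star + K * sigma_star
```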

6.12 SAMPLE SIZE ESTIMATION

In general, as applied to the LSR, the size of the data (the number of experimental observations) required for satisfying a desired confidence bound on the parameters can be approximately estimated using the expression for the population mean. For a normal distribution, it is possible to show that for a desired width of the bound, w (the distance between the regression line and the rim of either confidence interval), the value of n can be calculated using the formula,

Kγ × σ/√n = w (6.74)

In insulation ageing experiments, a certain minimum number of specimens in a sample packet is prescribed for credible estimation of the model parameters. A large sample size ensures precise parametric estimates, but if the numbers involved are too large, the experiments become prohibitively expensive, besides consuming an inordinately long time.

Sound engineering judgment is called for in determining an optimal sample size. A detailed discussion on the economics of running such experiments shall be covered later. One of the methods of working out a reasonable number of specimens in an experimental run is to apply the concepts of confidence intervals mentioned in the previous section. This treatment is covered at length by Johnson and Leone [39]. In insulation ageing data analysis, it is usual to use a simpler, but approximate, formula for the minimum sample size, n, suggested by Nelson [45]:

n = (Kγ/ϖ)² g* (6.75)

in which, ϖ is the specified width of the CI on θ and the term g* is the unknown population variance. Usually, this variance is not the sample variance but can be estimated using similar earlier data, or may be based on previous experience.

More often, the scatter in θ is not normally distributed, nor is its variance known, even approximately. In such cases, instead of the normal percentiles Kγ, the percentiles of a t-statistic are used. Examples of estimation of the different model parameters from actual ageing data are included in the Appendix.
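A sketch of the sample-size calculation of Eqn. (6.75) (illustrative Python; g* must be supplied from earlier data or experience):

```python
import math
from scipy.stats import norm

def min_sample_size(width, g_star, gamma=0.95):
    # Eqn. (6.75): n = (K_gamma / width)^2 * g*
    K = norm.ppf((1.0 + gamma) / 2.0)
    return math.ceil((K / width) ** 2 * g_star)
```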

6.13 THE MAXIMUM LIKELIHOOD METHOD

Historically, the Maximum Likelihood (ML) method is among the earliest purely analytical techniques for parameter estimation of a stochastic model. The particular advantage in this analysis


is that it is not necessary to have any prior information on the data, except that it is derived from a randomized, stable population. The estimators possess large-sample properties, and are fairly unbiased and normally distributed. The variability factor, [(sample size) × (sample variance)], is always equal to, or better than, that of all other estimators possessing asymptotic properties. However, some knowledge of the prior distribution of the parameter θ being estimated is required.

Suppose the observations on a random variable, X, are {x1, x2, ....., xn}, following a distribution with parameters θj, all θj's being equally likely. Then, by Bayes' theorem, the a posteriori distribution of θ is proportional to the conditional probability of X, pr(x1, x2, ......, xn | θj).

The likelihood L of θj, given x1, x2, …, xn, is expressed as,

L = Π_{i=1}^{n} pr(xi | θj) (6.76)

The term pr(x1, x2, …, xn | θj) is always a known probability density function, f(xi | θj), where the xi are independent random variables and the θj, the unknown parameters being estimated from the data, control the position, the shape and the magnitude of the function f(x). The likelihood function, L, can therefore be rewritten as;

L(xi | θj) = Π_{i=1}^{n} f(xi | θj) (6.77)

If θML is the value of θ which maximizes the likelihood function, then θML is called the maximum likelihood estimator of θ. On the assumption of a uniform prior distribution, it is the mode of the posterior distribution of θ. Apart from very rare cases, where the likelihood function is multimodal, θML is a uniquely defined function of the xi's.

The ML estimators enjoy, under fairly unrestricted conditions, the following properties:

• θML is an asymptotically unbiased estimator of θ.

• √n (θML − θ) is asymptotically normally distributed.

• n Var(θML) = [− ⟨∂²{ln L(x | θ)}/∂θ²⟩]⁻¹, for large n.

It is simpler to work on the natural logarithm of L(xi | θj) (called the log likelihood), as this now involves sums instead of products containing θj, so that the partial derivatives required to maximize the likelihood can be obtained very easily.

The function ln(L) is formally written as l(xi | θj), and it is sought to maximize this by partially differentiating it, successively, w.r.t. θj and equating the derivatives to zero thus:

∂{l(xi | θj )}/∂θj = 0 (6.78)

This results in j simultaneous, coupled non-linear equations which need to be solved. Usually, fast-convergence algorithms incorporating Newton–Raphson or successive over-relaxation methods are employed in solving for the θj. In practice, no more than two or three parameters are encountered, in which case j = 1, 2, 3.

In the following, illustrative examples of using ML estimation are provided. To start with, an exponential probability distribution, involving a single parameter, is considered to demonstrate the procedure.


6.14 EXPONENTIAL DISTRIBUTION

The one-parameter (λ) exponential density function, f(x | λ), is given by;

f(x) = λ exp (–λx), x, λ > 0 (6.79)

The parameter λ is called the failure rate, and λ = 1 results in a standard exponential distribution.

L(xi | λ) = Π_{i=1}^{n} λ e^(−λxi) (6.80)

l(xi | λ) = ln[L(xi | λ)] (6.81)

= n ln λ − λ Σ_{i=1}^{n} xi (6.82)

For l(xi | λ) to be maximum, and hence λ = λML, one must have,

∂{l(xi | λ)}/∂λ = 0 (6.83)

from which, λML = n / [Σ_{i=1}^{n} xi] = 1/x̄ (6.84)

The asymptotic variance of λML is given by,

Var(λML) = (λML)² / n (6.85)

The corresponding two-sided 100 γ % confidence intervals on λML are,

λLL = λML / exp[Kγ √Var(λML) / λML] (6.86)

λUL = λML · exp[Kγ √Var(λML) / λML] (6.87)
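The entire exponential ML analysis of Eqns. (6.84)–(6.87) fits in a few lines (illustrative Python sketch, not from the original text):

```python
import math
from scipy.stats import norm

def exponential_ml(x, gamma=0.95):
    n = len(x)
    lam = n / sum(x)                                 # lambda_ML = 1/xbar, Eqn. (6.84)
    var = lam ** 2 / n                               # Eqn. (6.85)
    K = norm.ppf((1.0 + gamma) / 2.0)
    factor = math.exp(K * math.sqrt(var) / lam)      # note sqrt(var)/lam = 1/sqrt(n)
    return lam, lam / factor, lam * factor           # estimate, lower, upper; (6.86)-(6.87)
```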

6.15 WEIBULL DISTRIBUTION (2-p-W)

The probability density function of a 2-p-W in the random variable t, indicating the times to failure, is given by,

f(ti | β, τ) = (β / τ^β) ti^(β−1) e^(−(ti/τ)^β) (6.88)

L(ti | β, τ) = Π_{i=1}^{n} f(ti | β, τ) (6.89)


This equation can be numerically maximized using optimization programs such as Linear Programming (LP) or Dynamic Programming (DP).

L(ti | β, τ) = (β^n / τ^(nβ)) [Π_{i=1}^{n} ti^(β−1)] exp[−Σ_{i=1}^{n} (ti/τ)^β] (6.90)

The log-likelihood equation is given by,

l(ti | β, τ) = n ln β − nβ ln τ + (β − 1) Σ_{i=1}^{n} ln(ti) − τ^(−β) Σ_{i=1}^{n} ti^β (6.91)

Again, differentiating the log-likelihood function above partially w.r.t. β and τ respectively and equating the derivatives to zero gives a pair of coupled, nonlinear equations shown below;

nβ^(−1) + Σ_{i=1}^{n} ln(ti/τ) − Σ_{i=1}^{n} (ti/τ)^β ln(ti/τ) = 0 (6.92)

τ^β = (1/n) Σ_{i=1}^{n} ti^β (6.93)

These two equations can be solved to obtain the ML estimates, τML and βML.
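Numerically, Eqn. (6.93) can be substituted into Eqn. (6.92), leaving a single equation in β that a root finder handles easily. A sketch (illustrative Python; the bracketing interval is an assumption that suits well-behaved life data):

```python
import numpy as np
from scipy.optimize import brentq

def weibull_ml(times):
    t = np.asarray(times, dtype=float)
    logt = np.log(t)

    def profile(beta):
        # Eqn. (6.92) with tau^beta replaced by (1/n) sum(t_i^beta), Eqn. (6.93)
        tb = t ** beta
        return np.sum(tb * logt) / np.sum(tb) - 1.0 / beta - logt.mean()

    beta = brentq(profile, 0.01, 100.0)              # assumed bracket for beta_ML
    tau = np.mean(t ** beta) ** (1.0 / beta)         # Eqn. (6.93)
    return beta, tau
```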

[Fig. 6.10 Likelihood surface of a 2-p-W, plotted against β and τ.]

An example of mapping the likelihood function for chosen values of β and τ (β = 1.21 and τ = 10,000), called the likelihood surface, is given in Figs. 6.10 and 6.11. It may be noted that when the likelihood is maximum, the values of β and τ are equal to the chosen values.


[Fig. 6.11 Contour map of the likelihood surface of a 2-p-W.]

6.15.1 Variance of the Weibull Parameters

The ML estimates of the scale and shape parameters of the model have a scatter which is approximately normally distributed, at least for complete data, a set in which all the specimens are run to failure. The exact computation of the variance of the Weibull parameters is difficult. The asymptotic variance and covariance of τ and β are quite difficult and laborious to calculate. Approximate expressions suggested by Bain [52] are given below. These formulae are sufficiently accurate for application in life estimation.

Var(τML) ≅ (1.1087/n) (τ/β)²

Var(βML) ≅ (0.6079/n) (βML)² (6.94)

Covar(τML, βML) ≅ (0.2570/n) τML

It is encouraging to note that these variances are smaller than those obtained by other methods, such as linear estimation and LSR.

6.15.2 Confidence Intervals

The exact confidence intervals for the Weibull parameters are not available, since their posterior distributions indicate considerable departures from normality. An important, noteworthy point here is that the formulae for parameter variances given above need substantial and considerably involved modifications if the data comes from experiments in which all the specimens in a sample pack are not run to failure and data acquisition is terminated prematurely. Such censored data require a different analytical procedure.

In the following, the expressions for approximate CI's, using the parameter variances for complete samples indicated above, are given. The approximate two-sided 100 γ % CI on τ and β are, respectively,

τ_L = τML / exp[Kγ √Var(τML) / τML] (6.95)

τ_U = τML · exp[Kγ √Var(τML) / τML] (6.96)

and β_L = βML / exp[Kγ √Var(βML) / βML] (6.97)

β_U = βML · exp[Kγ √Var(βML) / βML] (6.98)

where, Kγ is the [100(1 + γ)/2]th percentile of the standard normal distribution. In the above equations, τ_U and τ_L are respectively the upper and lower limits on the parameter τ, and similarly for β. In a similar way, the CI on the other parameters can also be defined.
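The approximate variances (6.94) and the intervals (6.95)–(6.98) combine into one short routine (illustrative Python sketch; K = 1.96 corresponds to γ = 0.95):

```python
import math

def weibull_ci(beta_ml, tau_ml, n, K=1.96):
    var_tau = (1.1087 / n) * (tau_ml / beta_ml) ** 2       # Eqn. (6.94)
    var_beta = (0.6079 / n) * beta_ml ** 2
    f_tau = math.exp(K * math.sqrt(var_tau) / tau_ml)
    f_beta = math.exp(K * math.sqrt(var_beta) / beta_ml)
    return ((tau_ml / f_tau, tau_ml * f_tau),              # Eqns. (6.95)-(6.96)
            (beta_ml / f_beta, beta_ml * f_beta))          # Eqns. (6.97)-(6.98)
```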

6.15.3 Sample Size Considerations

A point of considerable importance is the sample size to be used in an experimental run for a desired bound. The minimum number of specimens in a sample, to ascertain that the estimated τML is enclosed within a bound of a prescribed width ± ϖ of the 63.2nd population percentile with 100 γ % probability, is given by;

n = [Var(τML)] (Kγ/ϖ)² (6.99)

Often it happens that the above requirement is difficult to fulfill, as the estimated numbers, n, are far too large, thus rendering the experiments too expensive in terms of time and cost, and often impracticable. Sound engineering judgment is required to take a decision on whether to accept a broader CI or to use a larger sample size.

6.16 THE LOG-NORMAL DISTRIBUTION (LN)

As mentioned earlier, the Logarithmic-Normal distribution is found relevant in a great many instances where the logarithms of the r.v.'s (the log times to failure, t) are normally distributed and the data can be treated as order statistics. The two parameters of the distribution, µ and σ (as applied to thermal ageing data at constant temperature, T), correspond to the logarithmic mean, or 50% probable, time to failure and the logarithmic standard deviation respectively. The probability density function, f(ti | µ, σ), for complete data is given by;

f(ti | µ, σ) = [1/(σ√(2π))] e^(−(ti − µ)²/(2σ²)) (6.100)

where, 0 < t < ∞ and µ, σ > 0.


The likelihood and Log-likelihood functions respectively are;

L(ti | µ, σ) = (2π)^(−n/2) σ^(−n) exp[−Σ_{i=1}^{n} (ti − µ)² / (2σ²)] (6.101)

$l = \ln L(t_i \mid \mu, \sigma) = -\dfrac{n}{2}\ln(2\pi) - n\ln\sigma - \sum_{i=1}^{n}\ln t_i - \sum_{i=1}^{n}\dfrac{(\ln t_i - \mu)^2}{2\sigma^2}$          (6.102)

Differentiating the above equation partially w.r.t. σ and µ respectively and equating the derivatives to zero yields the maximum likelihood estimates σML and µML of σ and µ respectively, thus:

$\dfrac{\partial l}{\partial \sigma} = -\dfrac{n}{\sigma} + \dfrac{1}{\sigma^3}\sum_{i=1}^{n}(\ln t_i - \mu)^2 = 0$          (6.103)

$\dfrac{\partial l}{\partial \mu} = \dfrac{1}{\sigma^2}\sum_{i=1}^{n}(\ln t_i - \mu) = 0$          (6.104)

from which,

$\sigma_{ML}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(\ln t_i - \mu_{ML})^2$  and  $\mu_{ML} = \dfrac{1}{n}\sum_{i=1}^{n}\ln t_i$          (6.105)

This shows that the ML estimates are just the sample mean and sample variance respectively. While the sample average is a good estimate of the population mean, µ, the variance, σ²ML, needs a modification in view of a possible bias. A small-sample correction for σ² for complete data is given by;

$\sigma^2 = \sigma_{ML}^2\left(\dfrac{n}{n-1}\right), \quad n > 1$          (6.106)
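A minimal computational sketch of Eqns. (6.105) and (6.106) is given below; the function name lognormal_ml is illustrative:

    # ML estimates of the log-normal parameters from complete times to failure.
    import numpy as np

    def lognormal_ml(times):
        x = np.log(np.asarray(times, dtype=float))
        n = x.size
        mu_ml = x.mean()                          # Eqn. (6.105)
        var_ml = np.mean((x - mu_ml) ** 2)        # biased ML estimate of sigma^2
        var_corr = var_ml * n / (n - 1)           # small-sample correction, Eqn. (6.106)
        return mu_ml, np.sqrt(var_corr)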

6.13.1 Asymptotic Variances

The asymptotic variances of µ and σ of the population, assuming that the departures of the true values from their ML estimates are small, are;

Var (µ) = σ²ML/n

Var (σ) = σ²ML/2n          (6.107)

Covar (µ, σ) = 0

6.13.2 Confidence Intervals

In the estimation of CIs on the Log-Normal parameters, it is to be noted that the sampling distribution of σ is χ², with the number of degrees of freedom, ν, equal to (n − 1). On the other hand, the logarithmic mean, µ, has a t-sampling distribution, also with ν = (n − 1). Accordingly, the estimates of the two-sided 100γ% CIs on σ and µ, based on the χ² and t-distributions, are given in the following:

$\underset{\sim}{\sigma} = \sigma_{ML}\left[\dfrac{n-1}{\chi^2\{(1+\gamma)/2;\; n-1\}}\right]^{1/2}$          (6.108)

$\tilde{\sigma} = \sigma_{ML}\left[\dfrac{n-1}{\chi^2\{(1-\gamma)/2;\; n-1\}}\right]^{1/2}$          (6.109)

and

$\underset{\sim}{\mu} = \mu_{ML} - t\{(1+\gamma)/2;\; n-1\}\,\dfrac{\sigma_{ML}}{\sqrt{n-1}}$          (6.110)

$\tilde{\mu} = \mu_{ML} + t\{(1+\gamma)/2;\; n-1\}\,\dfrac{\sigma_{ML}}{\sqrt{n-1}}$          (6.111)

In the expressions above, [(1 + γ)/2; n − 1] means the [100(1 + γ)/2]th percentile of the corresponding distribution with n − 1 (= ν) degrees of freedom; [(1 − γ)/2; n − 1] is defined analogously.

The approximate sample size, n, to be employed in the experiments depends upon the variance of σ and the desired width, ±ϖ, of the confidence interval on the log mean life, µ:

n = Var (σ) (Kγ/ϖ)²          (6.112)

The same comments made earlier regarding the practical acceptability of this value of n in designing the ageing experiments apply here also. Since thermal ageing experiments do not indicate exact terminal or end points, it may happen that (for reasons other than the established CIs) a larger sample size becomes mandatory.

6.14 ANALYSIS OF CENSORED DATA

In the preceding sections, methods of acquisition and analysis of long-time failure or endurance data on electrical insulation for complete samples have been compiled in detail. In many instances, it may happen that all the specimens are not run to failure and the experiments are terminated prematurely due to constraints on time and financial expenditure. Often, one may wish to make a preliminary analysis of an ongoing experiment, to see if the experiments are progressing the right way or need any midcourse correction. For this purpose, the experiments are allowed to continue, but rough estimates of the model parameters are made with the already accrued incomplete (in this case called progressively censored) data.

The exponentially long times involved in ageing experiments are due to the fact that, for realizing complete data, all samples under test must meet a destructive end point. Since failure here is an extremal process, the times to failure of the later half of the specimens are inordinately long. For reasons mentioned earlier (see Chapter 2), it is not permissible to enhance the stresses beyond certain established limits to induce very fast degradation in the specimens. Therefore, it often becomes necessary to censor, truncate or terminate the experiments before all specimens fail. The resulting data are called censored data. The analysis and validation of such data are more involved and less accurate, and require substantial computational skills, besides expertise in mathematical statistics. Much of the subject presented in the following can be traced to the works of Bain & Antle [52, 53], Cohen [54, 55], Lawless [50] and Nelson [45].

There are three kinds of censoring: time censoring (Type-I censoring), number or failure censoring (Type-II censoring), and mixed, multiple or hyper-censoring. There are certain guidelines to be followed before deciding upon truncating the experiments.


In Type-I censoring, the experiments are terminated after a predetermined time, tr, called the censoring time (the time that elapses between the start of the experiment and the instant of truncation), subject to the condition that a certain minimum number of the samples put to test shall fail. Here, 'r' is called the censoring level. As a thumb rule, of the total of n specimens on test, 0.5n to 0.6n specimens in the sample pack should have failed before truncation.

In a similar way, Type-II censoring can be described as a scheme in which the truncation is effected after a certain number of specimens, nr, fail (number or failure censoring). To ensure this, the applied stresses should be so adjusted that a minimum time has elapsed before a decision on censoring may be taken. From experience, about 1% of the expected lifetime, or around 2500 hours of ageing, whichever is higher, is prescribed as a reasonable lower limit on this time.

With these preliminary remarks, the graphical method and the working principles of ML methods for censored data analysis shall be described in the following for Type-I censoring. Data analysis using other kinds of censoring can be performed in a similar way.

6.14.1 Graphical Method

The time-censored data set, with a censoring level r, will have r failure times, which are known, and n − r survival times, which are unknown. The probability plotting of such data is done in exactly the same way as for complete data, except that the n − r probabilities are plotted on the Y-axis at the same time coordinate, ln(tr), as illustrated in Fig. 6.12.

[Fig. 6.12 Probability plotting of Type-I censored data. Ordinate: order-statistic probability, p_i = (i − 0.5)/n; abscissa: ln(t). The r failures define the plot; the (n − r) survivors are stacked at the censoring time, t_r (point P), through which a new regression line is fitted.]

The fitting of the line is not straightforward, because the x-coordinates of the n − r unfailed specimens are unknown. Two possible schemes are available:

(i) If the LSR fit to the r failures is linear and indicates a good adequacy of fit, a possible extrapolation of this segment of the line may be considered for the n − r survivors. This essentially means that the r failure times are taken as a complete data set and the regular LSR process is continued. It can be seen that this is not a correct approach, since the effect of truncation does not enter expressly into the LSR data processing.


(ii) The other alternative could be to use the r failure times as primary data and to fit a new LSR line to the r failure times along with the centre of gravity of the n − r survivor times, corresponding to the point P on the vertical through tr, as shown in Fig. 6.12.

It is seen that both these methods have a tendency to grossly overestimate the later failure times, and hence the graphical method can only serve as a very rough estimation method. Analytical methods can treat this anomaly much more effectively, and hence are adopted despite their computational complexity.
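A rough computational rendering of scheme (ii) is sketched below; the Weibull linearization y = ln[−ln(1 − p)] versus x = ln(t) and the definition of the point P as the mean of the survivors' plotting positions are assumptions of this sketch:

    # Scheme (ii): LSR through the r failure points plus the centre-of-gravity
    # point P of the (n - r) survivors, stacked at the censoring time t_r.
    import numpy as np

    def censored_lsr(fail_times, n, t_r):
        r = len(fail_times)
        p = (np.arange(1, r + 1) - 0.5) / n           # plotting positions, Fig. 6.12
        x = np.log(np.sort(np.asarray(fail_times, dtype=float)))
        y = np.log(-np.log(1.0 - p))
        # centre of gravity of the survivor plotting positions, placed at ln(t_r)
        p_surv = (np.arange(r + 1, n + 1) - 0.5) / n
        x_p = np.log(t_r)
        y_p = np.mean(np.log(-np.log(1.0 - p_surv)))
        slope, intercept = np.polyfit(np.append(x, x_p), np.append(y, y_p), 1)
        # beta ~ slope; tau ~ exp(-intercept/slope) for the Weibull linearization
        return slope, intercept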

6.14.2 Analytical Method

If, in an electrical stress ageing experiment, a total of n samples is initially used, then at the censoring time, tr, there are r known failure times and (n − r) unknown times to failure. Assuming that the data conform to a Weibull probability distribution in two parameters, τ and β, with the usual notations, and avoiding all details, in all censored data computations the log-likelihood equations are expressed in two distinct parts, namely, the survivor part and the failed part.

The log-likelihood function, l, can now be written as;

$l = \sum_{i=1}^{r}\left[\ln\beta + (\beta - 1)\ln t_i - \beta\ln\tau - \left(\dfrac{t_i}{\tau}\right)^{\beta}\right] - \sum_{i=r+1}^{n}\left(\dfrac{t_r}{\tau}\right)^{\beta}$          (6.113)

where the first summation runs over all failure times and the second over the censoring time. The ML estimates are arrived at, as before, by equating the log-likelihood derivatives to zero and solving the ensuing coupled, nonlinear equations for τML and βML, the estimates of the true values, τ and β. For the sake of brevity, only the final steps in the calculations are shown below.

$\dfrac{\partial l}{\partial \tau} = -\dfrac{r\beta}{\tau} + \dfrac{\beta}{\tau}\sum_{i=1}^{r}\left(\dfrac{t_i}{\tau}\right)^{\beta} + \dfrac{\beta}{\tau}\sum_{i=r+1}^{n}\left(\dfrac{t_r}{\tau}\right)^{\beta} = 0$          (6.114)

$\dfrac{\partial l}{\partial \beta} = \dfrac{r}{\beta} + \sum_{i=1}^{r}\ln\left(\dfrac{t_i}{\tau}\right) - \sum_{i=1}^{r}\left(\dfrac{t_i}{\tau}\right)^{\beta}\ln\left(\dfrac{t_i}{\tau}\right) - \sum_{i=r+1}^{n}\left(\dfrac{t_r}{\tau}\right)^{\beta}\ln\left(\dfrac{t_r}{\tau}\right) = 0$          (6.115)

From these equations, τ can be eliminated and an equation in β only can be arrived at, as shown below.

$\dfrac{1}{r}\sum_{i=1}^{r}\ln t_i = \dfrac{\sum_{i=1}^{n} t_i^{\beta}\,\ln t_i}{\sum_{i=1}^{n} t_i^{\beta}} - \dfrac{1}{\beta}$          (6.116)

where, for the (n − r) survivors (i = r + 1, ..., n), ti is set equal to tr.

It may be observed that the sum on the left-hand side runs over all failure times, while those on the RHS run over all n.

The ML estimate of β, βML, is computed from the equation derived above, using appropriate iterative programs.


The ML estimate of τ, τML, is easily obtained using the relation,

$\tau_{ML} = \left[\dfrac{1}{r}\sum_{i=1}^{n} t_i^{\,\beta_{ML}}\right]^{1/\beta_{ML}}$          (6.117)

If the constraints on the censoring level are enforced properly, as described earlier, the ML estimates of τ and β are reasonably accurate.

It should be mentioned that linear analytical methods like BLUE and BLIE can be, and have been, extensively used in processing censored data. Detailed treatment of these applications can be found in the works of Nelson [45], Beck and Arnold [56], as also Lawless [50].

6.14.3 Confidence Intervals

Construction of confidence bounds for the parameter estimates from censored data has always been a difficult issue, for which no satisfactory solution has yet been offered. The uncertainty in the estimation due to the lack of knowledge of the unfailed units undergoing ageing experiments is the major contributing factor in the inaccuracies introduced in the CIs. Many workers have tried to address this aspect over the years and have suggested certain approximate expressions for interval estimation when the level of censoring is not very high. Among others, Cohen [54, 55], Meeker [57], Lawless [50], and Bain and Antle [53] have provided tables for the coefficients appearing in the equations for the CIs.

The general theory of confidence intervals is quite complex and requires the deduction of what is called the Fisher Information Matrix (FIM). The elements of the FIM in turn involve the evaluation of the expectations of the second partial derivatives of the log-likelihood functions in the neighbourhood of the true, or at least asymptotically true, values of the parameters of the corresponding distributions. Obviously, such estimates are not possible to obtain using incomplete data; hence the complications. While referring the reader to the works of the authors mentioned above for a thorough understanding of the subject, a suggestion for constructing approximate CIs using incomplete data is made in the following:

A censored data set with a reasonably low level of censoring (say, 40% or lower) can be imagined as a complete data set with the censoring time as the last failure time and with n = r. In other words, the sample size can be assumed to be the number, r, of the samples which have already failed, ignoring the samples which are still running. Now, this fictitious complete data set is analyzed using ML methods for uncensored or complete data. This procedure is repeated, successively, taking the value of n as n = r − 1 and r − 2. If the parameters of the models arrived at using these values of n in the likelihood equations, with the corresponding density functions, do not differ appreciably, then the estimates may be assumed to be unaffected by truncation. It is now possible to use the expressions for the CIs for complete data. Although this procedure for arriving at the CIs may, at times, be less than satisfactory, it is, nevertheless, the only way of avoiding laborious computations and frequent references to uncommon statistical tables.
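The sketch below renders this device computationally, reusing the weibull_ml_censored routine sketched earlier (calling it with n = k and the last retained failure time as the censoring time makes the data effectively complete); the 5% stability tolerance is an assumption:

    # Refit the fictitious "complete" data with n = r, r - 1, r - 2 and accept
    # the complete-data CIs if the estimates barely move.
    def truncation_check(fail_times, tol=0.05):
        ts = sorted(fail_times)
        fits = [weibull_ml_censored(ts[:k], n=k, t_r=ts[k - 1])
                for k in (len(ts), len(ts) - 1, len(ts) - 2)]
        taus, betas = zip(*fits)
        stable = (max(taus) - min(taus)) / min(taus) < tol \
             and (max(betas) - min(betas)) / min(betas) < tol
        return stable, fits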

6.15 CENSORED DATA ANALYSIS: THE LOG-NORMAL MODEL

The Weibull density function possesses a survivor function [R(t) = 1 − F(t)] in a closed functional form, whereby the likelihood functions thereof can be written exactly. Since the log-normal density is exactly defined only between zero and infinity, and its survivor function has no closed form, the log-likelihood function has to be written as:

$l(t_i;\, \mu, \sigma) = \sum_{i=1}^{r}\ln\left[\dfrac{1}{\sigma t_i}\,\varphi\!\left(\dfrac{\ln t_i - \mu}{\sigma}\right)\right] + \sum_{i=r+1}^{n}\ln\left[1 - \Phi\!\left(\dfrac{\ln t_r - \mu}{\sigma}\right)\right]$          (6.118)

As in the earlier cases, the first sum runs over all failure times and involves the log-normal density function, ϕ; the second sum, taken over the (n − r) running times, involves the survivor function, 1 − Φ = 1 − F(t), evaluated at the predetermined censoring time, tr.

The function, l, can now be maximized by partially differentiating it w.r.t. the parameters, µ and σ, and equating the derivatives to zero, thus:

$\dfrac{\partial l}{\partial \mu} = \dfrac{r}{\sigma^2}\left(\langle t\rangle - \mu\right) + \dfrac{(n-r)}{\sigma}\, H\!\left(\dfrac{\ln t_r - \mu}{\sigma}\right) = 0$          (6.119)

$\dfrac{\partial l}{\partial \sigma} = -\dfrac{r}{\sigma} + \dfrac{r}{\sigma^3}\left[m^2 + \left(\langle t\rangle - \mu\right)^2\right] + \dfrac{(n-r)}{\sigma}\left(\dfrac{\ln t_r - \mu}{\sigma}\right) H\!\left(\dfrac{\ln t_r - \mu}{\sigma}\right) = 0$          (6.120)

in which H(x) = ϕ(x)/[1 − Φ(x)] is the standard normal hazard function, ⟨t⟩ is the mean of the r log-times to failure, and m is the corresponding standard deviation, given by:

$m = \left[\dfrac{1}{r}\sum_{i=1}^{r}\left(\ln t_i - \langle t\rangle\right)^2\right]^{1/2}$          (6.121)

The ML estimates of the parameters are obtained on solving these coupled non-linear equations using appropriate iterative solution techniques.
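A minimal sketch of a numerical solution of Eqns. (6.119) and (6.120) is given below; the use of scipy.optimize.fsolve and the crude starting values are assumptions:

    # ML estimation for Type-I censored log-normal data.
    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import fsolve

    def lognormal_ml_censored(fail_times, n, t_r):
        x = np.log(np.asarray(fail_times, dtype=float))
        r = x.size
        t_bar = x.mean()                         # <t>
        m2 = np.mean((x - t_bar) ** 2)           # m^2, Eqn. (6.121)

        def score(params):
            mu, sig = params
            xi = (np.log(t_r) - mu) / sig
            H = norm.pdf(xi) / norm.sf(xi)       # standard normal hazard, phi/(1 - Phi)
            eq_mu = r * (t_bar - mu) / sig ** 2 + (n - r) * H / sig        # Eqn. (6.119)
            eq_sig = (-r / sig + r * (m2 + (t_bar - mu) ** 2) / sig ** 3
                      + (n - r) * xi * H / sig)                            # Eqn. (6.120)
            return [eq_mu, eq_sig]

        mu_ml, sig_ml = fsolve(score, [t_bar, max(np.sqrt(m2), 1e-3)])
        return mu_ml, sig_ml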

As regards the confidence intervals, one needs to evolve the Fisher Information and the 'Variance-Covariance' matrices, the elements of which involve the mathematical expectations of the second partial derivatives of the censored log-likelihood function, l, w.r.t. the parameters, evaluated at the true values of µ and σ. These calculations are reported in [45] and are generally difficult to perform. Approximate computation of CIs can, however, be made in a manner similar to that indicated for Weibull data.

On certain occasions, inclusion of a third parameter, called the location parameter, in the probability distribution may be warranted. The genesis of three-parameter distributions has been described in Chapter 1, and the estimation of this parameter can be made based on the ML theory. This, however, introduces additional computational and other complexities. The best way of circumventing the difficulty is the following:

If there is any evidence for the existence of the third parameter, appropriate corrections are incorporated into the probability density and distribution functions by a trial-and-error algorithm. In the case of the Weibull distribution, this is done quite effectively by adding a quantity which is a fraction of the scale parameter and checking whether the linearity of the probability plots is preserved. This correction may then be used everywhere.


6.16 CORRELATION ANALYSIS

Experimental data on a large number of nominally identical specimens of electrical insulation, acquired even under nearly ideal conditions, are characterized by a large scatter, or variability. The scatter may either be generic to material behaviour, or be due to varying experimental conditions, or be equipment specific. In any case, it would first of all be required to establish plausible correspondences between the measured properties (dependent variables, yi) and the stress or the duration of time over which the stress is applied (independent variables, xi). It is important to know, for further analysis, the linearity of the dependence between these variables. A measure of the statistical linear dependence between the two variables is effectively quantified by what is called the coefficient of linear correlation (or, simply, the Correlation Coefficient), r. Calculation of r can be made and expressed in several ways.

6.16.1 The Correlation Coefficient

The correlation coefficient, r, can be expressed in terms of the parameters, b and d, of the two regression lines given by Eqns. (6.20) and (6.21) as;

$r = (bd)^{1/2}$          (6.122)

However, this is a non-standard representation of r and is rarely used in data analysis as such. There are other and more explicit ways of calculating the correlation coefficient, as under:

If yi is nearly normally distributed about the line in Eqn. (6.20), the variance of y is given by;

$s_y^2 = \dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \langle y\rangle\right)^2$          (6.123)

The correlation coefficient can also be represented by the variances of x and y thus;

$s_{yx}^2 = s_y^2\,(1 - r^2)$          (6.124)

Using Eqns. (6.122) to (6.124), r is formally written in one of the following standard forms, among many others, as:

$r = \dfrac{s_{xy}}{s_x\, s_y}$          (6.125)

$r = \dfrac{\sum_{i=1}^{n}\left(x_i - \langle x\rangle\right)\left(y_i - \langle y\rangle\right)}{\left[\sum_{i=1}^{n}\left(x_i - \langle x\rangle\right)^2\right]^{1/2}\left[\sum_{i=1}^{n}\left(y_i - \langle y\rangle\right)^2\right]^{1/2}}$          (6.126)


$r = \dfrac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]^{1/2}\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]^{1/2}}$          (6.127)

$r = \dfrac{\langle xy\rangle - \langle x\rangle\langle y\rangle}{\left[\left(\langle x^2\rangle - \langle x\rangle^2\right)\left(\langle y^2\rangle - \langle y\rangle^2\right)\right]^{1/2}}$          (6.128)

The following further remarks may be made to understand the use of correlation analysis in condition monitoring of apparatus insulation in service:

• These equations are a consistent set, and the choice of any of them depends on the computational aids used. It is easy to see that r is also a measure of the 'goodness' of the linear fit for the data.

• If all the uncertainties can be accounted for, then the fit is truly linear, the regression equations are deterministic, or algebraic, and the parameters like a, b are free from stochasticity. In such a case, r = +1 or −1. The case of r = 0 indicates a total lack of fit (an uncorrelated data set), meaning that the variabilities remain entirely unexplained.

• There is no particular significance to r = −1, except that it reveals a drooping line, a line with a negative slope.

• Qualitatively, as specific to insulation statistics, a value of | r | < 0.6 indicates an unsatisfactory linear fit, or a poor correlation. Similarly, | r | ~ 0.8 and | r | > 0.9 are respectively cases of fair and excellent degrees of correlation.
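As an illustration, a direct transcription of Eqn. (6.127) is sketched below; numpy.corrcoef would serve equally well:

    # The correlation coefficient in the form of Eqn. (6.127).
    import numpy as np

    def corr_coeff(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = x.size
        num = n * np.sum(x * y) - x.sum() * y.sum()
        den = np.sqrt(n * np.sum(x ** 2) - x.sum() ** 2) * \
              np.sqrt(n * np.sum(y ** 2) - y.sum() ** 2)
        return num / den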

6.16.2 Spearman's Rank Correlation

Spearman's correlation between two random variables, x and y, also called Rank Correlation [39], is a concept for assessing the strength of the linear relationship between x and y with much reduced computational labour. In working out the degree of linear correspondence directly, it generally happens that the calculations are both laborious and time-consuming. Besides this, the data itself may not be as accurate as one desires it to be, and the bi-variate distribution of x, y may not conform to any known distribution function. In such cases, instead of using the raw data directly, it would be sufficient to see if there exists a linear relationship between the pairs of 'ordered' observations. This means that it is sought to establish a linear correlation between the variables ranked in an ascending or descending order of their magnitudes, ignoring the sequence of their occurrence.

Let (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) be a set of paired random observations which are replaced by their respective ranks, (x′₁, y′₁), (x′₂, y′₂), ..., (x′ₙ, y′ₙ); then:


$\sum_{i=1}^{n} x_i' = \sum_{i=1}^{n} y_i' = 1 + 2 + 3 + \dots + n = \dfrac{n(n+1)}{2}$

$\sum_{i=1}^{n} x_i'^2 = \sum_{i=1}^{n} y_i'^2 = 1^2 + 2^2 + 3^2 + \dots + n^2 = \dfrac{n(n+1)(2n+1)}{6}$          (6.129)

$n\sum_{i=1}^{n} x_i'^2 - \left(\sum_{i=1}^{n} x_i'\right)^2 = n\sum_{i=1}^{n} y_i'^2 - \left(\sum_{i=1}^{n} y_i'\right)^2 = \dfrac{n^2(n^2 - 1)}{12}$

Also, if dᵢ are the deviations, given by the differences between the ranks, then;

$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n}\left(x_i' - y_i'\right)^2 = \sum_{i=1}^{n} x_i'^2 - 2\sum_{i=1}^{n} x_i' y_i' + \sum_{i=1}^{n} y_i'^2$          (6.130)

On rearrangement and simplification, the equation above takes the form;

$\sum_{i=1}^{n} x_i' y_i' = \dfrac{1}{2}\left[\sum_{i=1}^{n} x_i'^2 + \sum_{i=1}^{n} y_i'^2 - \sum_{i=1}^{n} d_i^2\right]$          (6.131)

Substituting Eqns. (6.129), (6.130) and (6.131) in (6.127) and (6.128), the coefficient of rank correlation, r_rank, becomes;

$r_{rank} = 1 - \dfrac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$          (6.132)

It is of interest to note that, even when the data are not particularly homogeneous, the coefficients of correlation and rank correlation are reasonably close to each other, to within ~10%. In calculating r_rank, it may happen that there are 'ties' in the ranking order; that is, one or more of the entries in x, or y, or both may have the same values. In such cases, the mean of their ranks is assigned to each tied entry. As an example, if two values of yⱼ, say the fifth and the sixth ranks, are tied, they are both assigned the {(5 + 6)/2}th, or (5½)th, rank.
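A short sketch of Eqn. (6.132), with mean ranks assigned to ties as described above, follows; scipy.stats.rankdata with its default 'average' method implements exactly this tie rule:

    # Rank correlation per Eqn. (6.132); tied entries receive the mean rank.
    import numpy as np
    from scipy.stats import rankdata

    def rank_corr(x, y):
        rx = rankdata(x)          # 'average' method assigns mean ranks to ties
        ry = rankdata(y)
        d2 = np.sum((rx - ry) ** 2)
        n = len(rx)
        return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))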