Measurement Uncertainties - Physics & Astronomy

1 Data Analysis

(Adapted–stolen, really–from materials created by Amin Jaziri and David M. Harrison)

1.1 Introduction

No measurement can be perfectly certain. No measuring device is infinitely sensitive or infinitely precise. For example, knowledge of length determined with a meter stick or ruler divided into centimeters and millimeters is limited to fractions of a millimeter. Such a measurement is an estimate of the position between millimeter lines on the scale, but even this is less than certain because the reading also depends on exactly how you align the measured object with the scale and your vantage point while reading it.

Repeated measurements that span the true value (however this is known) are said to be accurate. Those that have very little spread among them are said to be precise. High precision makes it more difficult to be accurate. Precision is related to statistical, or random, uncertainties, and accuracy is related to systematic uncertainties. We will discuss both sorts of uncertainty later in this note.

1.2 Reporting measured values

Science, in contrast to almost every other branch of knowledge, attempts to quantify the degree of uncertainty associated with any statement of fact.¹ Stating the uncertainty provides a sense of the range in which the “true” value of a quantity being measured probably lies. A measurement should always be accompanied by the uncertainty of the measurement, both labeled with appropriate units. Thus, scientific measurements include appropriately precise numerical values, uncertainties of matching precision, and units to label what in fact was measured. Results lacking any of these are useless, because they are uninterpretable. As students in this course, you are required to provide all of them every time you state a result.

Note that an uncertainty is not a mistake, which is the result of inattention or other carelessness on the part of the experimenter, but rather an inadvertent and inevitable part of the measurement process. Mistakes (sometimes referred to as “experimenter error,” for that is what they are) are correctable in real time and therefore inexcusable: they are not an acceptable explanation of results. You must be careful, pay attention, and check your results as you go along.

It is a universal convention in scientific work to report a numerical result with the number of significant figures, digits, or decimal places up to, and including, but no more than, the first uncertain one. This immediately tells the reader the approximate level of uncertainty.

¹ Another name for “uncertainty” is “error,” and the two terms are often used interchangeably. This instructor prefers the former to the latter.


To return to our example: in reading a meter stick or ruler on which the smallest scale divisions are millimeters, measurements (in cm) would be reported as, say, 21.76 cm or 5.30 cm or 72.15 cm, etc. If the reading is exactly seven centimeters, it is reported as 7.00 cm and not as 7 cm or 7.00000 cm. If the edges of an object were particularly uneven such that trying to read the ruler to a tenth of a mm is hopeless, and the best estimate can be made only to the nearest whole mm, a measured value would be reported as 48.2 cm or 1.3 cm, etc. This indicates an uncertainty at the level of tenths of a cm (whole mm’s) rather than at the hundredth of a cm level as the scale itself might indicate. Thus, the scale’s precision is only one factor determining the precision of a measurement. What to report is a judgment call, and one of the purposes of this course is to help you develop the capacity to judge.

This appropriate use of significant figures tells only the order of magnitude of the uncertainty (to the tenths, hundredths, or whatever). It indicates that the next-to-last figure is the one in which we can have considerable confidence, while the last one is uncertain, although it represents the best estimate we can make. The size of the uncertainty is not conveyed by this convention, but must be estimated and reported separately.

1.3 Precision of significant figures

When reporting a measurement, you express its relative precision in terms of the number of digits or “significant figures,” in the sense that the fractional uncertainty in the last figure becomes smaller as the number of significant figures increases. The place of the least significant digit gives the absolute precision.

If, for example, the length of a cylinder is reported to be 20.64 cm, the claim is that the length of the cylinder is known to the level of tenths of a millimeter; this is the absolute precision of the measurement. In this case, because there are four significant figures, the relative precision is therefore a few parts in a couple of thousand. Even if you wrote this number in terms of kilometers as 0.0002064 km, it still has the same precision and number of significant figures. The zeros preceding the 2 are used only to indicate the position of the decimal point. The zero between the 2 and 6 is a significant figure, but the other zeros are not. If you reported a length as 0.78 cm, you still claim to know it to within tenths of a millimeter but to just two significant figures. This measurement has been made to the same absolute level of precision, but the relative precision is less. Clearly, it takes better precision to know a larger measurement to a certain level of uncertainty than to know a smaller measurement to the same level: while the fractional uncertainty in the 20.64-cm measurement is a few parts in a couple of thousand, the fractional uncertainty in 0.78 is no better than a few parts in less than a hundred.

Writing numbers in scientific notation helps remove some of the ambiguity of zeros while emphasizing the relative precision of a number. For example, 20.64 cm can be rewritten as 2.064 × 10^1 cm and 0.0002064 km can be written as 2.064 × 10^-4 km. Now, in both cases, it’s easy to see that there are four figures considered significant. Similarly, 0.78 cm can be written as 7.8 × 10^-1 cm, and it becomes immediately obvious that it has two significant figures, and is relatively less precise than the former.

It’s usually a simple matter to determine the greatest possible precision that can be recorded for a measurement (if, for example, your meter stick is ruled in millimeter divisions, then your precision is at best a fraction of a millimeter), but you must take into consideration all aspects involved in the making of the measurement to determine a realistic level of precision.

Furthermore, additional difficulties may arise when measurements are used in calculations, which can produce a large number of figures that might seem significant but really aren’t. Calculators, in particular, often prove a bane to understanding, because they produce all kinds of figures which, unfortunately, tend to be written down without consideration. Calculators don’t cause errors (assuming all the numbers have been correctly entered), but mindlessly recording all the figures of the result gives a physically incorrect answer. No mathematical computation may produce a result whose absolute precision is greater than that of the quantities used; the result can be no more precise than the least precise quantity that went into the calculation. As a general rule, though, it is better when computing to carry too many figures than not enough, and then to round off to the appropriate precision later.

This is more or less straightforward when adding or subtracting, since all the numbers involved have to be the same sort of quantities; that is, they have to have the same dimensions, such as length, time, mass, or volume. Simply enter the experimental values (being sure that they are all in the same units) to the absolute precision known and then round to that of the least precise quantity. For example, if you combined four masses, 806.5 g, 32.03 g, 0.06523 g, and 125.0 g, you would be adding four numbers each with four significant figures. But plugging these values into your calculator leads to something that has eight figures. The worst absolute precision is a tenth of a gram, so you must round to this, giving a total of 963.6 g. In this case, the worst relative precision is around a part in a thousand, and simple rounding produced an answer that roughly matches both senses of precision.
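Since Python is used later in this note for fitting (numpy.polyfit, scipy.optimize.curve_fit), here is a minimal Python sketch of the same bookkeeping for the four masses above; the numbers come from the example, and the variable names are illustrative only.

    # Sketch: summing measured masses, then rounding to the worst absolute precision.
    masses_g = [806.5, 32.03, 0.06523, 125.0]   # measured values, in grams

    raw_total = sum(masses_g)        # full-precision, calculator-style sum
    reported  = round(raw_total, 1)  # the least precise input is known only to 0.1 g

    print(f"raw sum:  {raw_total:.5f} g")  # 963.59523 g -- most of these digits are not significant
    print(f"reported: {reported:.1f} g")   # 963.6 g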

Multiplication and division are not so obvious because they may involve unrelated quantities. The least precise result still determines the precision of the final result, but it’s not so clear what the absolute precision of the final result is, because its units may be different from those entering into the computation. Suppose your measurements for the sides of a rectangle were 38.2 cm and 21.4 cm, respectively. By the convention, the 2 in the first number and the 4 in the second number are the uncertain figures. If we want to determine the area of the rectangle, we multiply the two measurements together. A calculator would give the value of 817.48 and the units would be cm². But the 7, 4, and final 8 of the result each involves an uncertain number (for example, the 8 results from the multiplication of the uncertain 2 of the first side by the uncertain 4 of the second). Any result in which an uncertain number is involved must itself be uncertain. Here, since the 7 is uncertain, any figure following it must be completely meaningless. If you reported 817.48 cm² as the area of the rectangle, you would be misleading the reader by implying that the final 8 was the first uncertain figure and all the others were certain. This would be untrue. The truth of the situation would be conveyed by reporting the result as 817 cm² and dropping the other figures. In this way, the relative precision of the outcome roughly matches that of the least (relatively) precise factor.

Yet, even this is inadequate. We know by looking that the last digit is uncertain, but unless told explicitly the magnitude of this uncertainty, we are still in the dark about the meaning of the result. We discuss ways to clarify this question in what follows.

1.4 Statistical and Systematic Uncertainties

Repeated measurements, even of the same quantity, tend to vary. Variations that distribute indeterminately both in magnitude and sign about a central value, and thus average to zero, are known as “statistical” or “random” uncertainties. A basic tenet of probability and statistics is that if only random uncertainties arise, then the average of more measurements is a better estimate of the actual value than the average of fewer measurements. The quantity given as the statistical uncertainty should indicate the range of results that would be obtained from multiple measurements. Typically, this quantity characterizes the measuring device(s) and process. The most common method for estimating this quantity is simply to repeat the measurement many times.

If measurements are all shifted in both magnitude and direction from the “true” value, the uncertainty is called “systematic.” Repeating measurements, without identifying and rectifying the cause of the shift, will not improve knowledge of an actual value. The difficulty, of course, is that the “true” value is usually unknown (why else measure it?), and so recognizing and either quantifying or rectifying systematic effects can be, and usually are, very challenging.

The most common methods for discovering and dealing with systematic effects are i) rechecking the calibration of the measuring device after the measurement (to see if it changed from the calibration that was done before the measurement, or if it was done wrong), and ii) altering slightly and in carefully controlled ways aspects of the measuring process to see how sensitive the outcome is to each of these aspects.

Almost all measurements suffer from both random and systematic uncertainties. When these can be estimated independently, they should be quoted independently.

1.5 Uncertainty in Individual Measurements

An irreducible source of uncertainty is the precision or resolution of the measuring instrument, whose values are typically indicated on a dial or scale. The coarseness of the divisions on an indicator limits the absolute precision with which a value can be determined. One ends up guessing to a fraction of a division, and the variation of guesses is typically random. This, then, is a source of statistical uncertainty. Common practice assigns an uncertainty due to the instrument of half the smallest division on the indicator. But this is probably just one component of the total uncertainty. Consider the case of a digital stopwatch, which gives readings to 0.01 second. The inherent uncertainty of the watch would ordinarily be estimated to be 0.005 second, but human reaction time associated with starting and stopping the watch is roughly 0.05 second, also a statistical effect. Thus, the uncertainty associated with a stopwatch measurement must be at least 0.05 second, rather than 0.005 second as the scale resolution might suggest. If the watch were fast or slow compared to a standard clock, then a systematic uncertainty would have to be cited, as well. Careful consideration, you should see, is often necessary to identify the dominant sources of uncertainty in a measurement.

1.6 The RMS Deviation

Perhaps the most common way of specifying statistical uncertainty is with the quantity called the RMS deviation, where RMS stands for Root Mean Squared. In words, this quantity measures a kind of average discrepancy around the mean, or average, of the values. Given a set of N values x_i, the average, or mean, value x̄ is defined as

\bar{x} \equiv \frac{1}{N} \sum_{i=1}^{N} x_i.    (1)

The RMS deviation of this distribution, denoted σ, is then defined as

\sigma \equiv \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}.    (2)

If the variation of values happens to be normally distributed (like a so-called bell curve) about the mean, then σ is called the “standard deviation.” If N is small (< 30), then this quantity actually gives too small an estimate of the uncertainty; the factor 1/(N − 1) is used instead of 1/N in Equation 2, and the symbol typically used is s rather than σ:

s \equiv \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}.    (3)

As is readily shown, an expression for σ equivalent to Equation 2 is:

\sigma = \sqrt{\overline{x^2} - \bar{x}^2}    (4)

  = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N} \sum_{i=1}^{N} x_i\right)^2}.


As an example, consider the following two sets of measurements taken with two different devices (arbitrary units):

Set 1: 3.02 4.21 2.38 2.50 2.89

Set 2: 2.85 3.15 3.12 3.00 2.88

The average for both sets is the same (3.00 in arbitrary units). The RMS deviation for the first set is 0.65 (0.73) in arbitrary units, while the RMS deviation for the second is 0.12 (0.14) in arbitrary units [the values in parentheses, more appropriate in this case, use Equation 3 instead of Equation 2]. The second device is said to have greater precision than the first. If the number of measurements N is increased, the values of x̄ and σ tend toward limiting values which are independent of N. The limiting value of x̄ is presumably that of the physical quantity being measured. The values of the uncertainties depend on the measurement technique employed.
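As a quick check, here is a minimal Python sketch that reproduces the numbers quoted above using only the standard library; the set labels are just for printing.

    # Sketch: mean (Eq. 1), RMS deviation (Eq. 2), and the 1/(N-1) version (Eq. 3)
    # for the two example data sets, in the same arbitrary units.
    import statistics

    set1 = [3.02, 4.21, 2.38, 2.50, 2.89]
    set2 = [2.85, 3.15, 3.12, 3.00, 2.88]

    for label, data in (("Set 1", set1), ("Set 2", set2)):
        mean  = statistics.mean(data)     # Eq. 1
        sigma = statistics.pstdev(data)   # 1/N version, Eq. 2
        s     = statistics.stdev(data)    # 1/(N-1) version, Eq. 3
        print(f"{label}: mean = {mean:.2f}, sigma = {sigma:.2f}, s = {s:.2f}")

    # Set 1: mean = 3.00, sigma = 0.65, s = 0.73
    # Set 2: mean = 3.00, sigma = 0.12, s = 0.14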

The implication of this is that the more measurements we make, the more confident we become of the central value and of the uncertainty of an individual measurement: after many measurements, neither the mean nor the RMS will change substantially even if many more measurements are made. Thus, as the number of measurements increases, confidence in the central value increases regardless of the size of the uncertainty of an individual measurement. We indicate the level of confidence by reporting as the statistical uncertainty of the result not the RMS, but rather a quantity called the “RMS (or uncertainty) of the mean,” σ_x̄, given by

\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}},    (5)

where σ_x is the RMS deviation of the individual measurements x, and N is the number of measurements. While σ_x̄ becomes smaller as N grows larger, σ_x will become more stable: its fractional, or relative, uncertainty decreases, as does the uncertainty in the mean:

\frac{\sigma_{\sigma_x}}{\sigma_x} = \frac{1}{\sqrt{2(N-1)}}.    (6)

σ_x̄ is the statistical uncertainty that should be reported when the central value is an average:

(x̄ ± σ_x̄)  in some units

(assuming there is no systematic uncertainty).
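For instance, a minimal sketch (reusing Set 2 from above) of how such a result would be computed and reported:

    # Sketch: statistical uncertainty of the mean (Eq. 5) for Set 2,
    # reported with matching decimal places.
    import math
    import statistics

    set2 = [2.85, 3.15, 3.12, 3.00, 2.88]

    mean      = statistics.mean(set2)
    s         = statistics.stdev(set2)      # RMS deviation of the individual measurements
    sigma_bar = s / math.sqrt(len(set2))    # uncertainty of the mean, Eq. 5

    print(f"result: ({mean:.2f} +/- {sigma_bar:.2f}) arbitrary units")
    # result: (3.00 +/- 0.06) arbitrary units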

Note that the standard way to report results, which include systematic uncertainties, is:


(central value ± statistical uncertainty ± systematic uncertainty) units.

It is essential that the number of decimal places for the central value and for the uncertainty is the same. This must always be the case. One cannot know the uncertainty to more decimal places (absolute precision) than the central value or vice versa.

If the statistical uncertainties are Gaussian, or normally distributed (see below), as they often are, the RMS deviation, or standard deviation, has a very specific, probabilistic interpretation. Given a mean of x̄ and a standard deviation σ_x (standard deviation of the mean σ_x̄), the probability that an individual measurement (another determination of the mean) will differ from the mean decreases as the magnitude of this difference increases. This probability can be related to increments of standard deviations [see Table 1].

Table 1: Probability that a value x_i differs by n standard deviations from the mean x̄.

n (number of            Probability               Probability
standard deviations)    |x_i − x̄| > nσ (%)        |x_i − x̄| < nσ (%)
1                       31.7                      68.3
1.64                    10.0                      90.0
2                       4.55                      95.45
3                       0.27                      99.73
4                       0.0065                    99.9935

It is not statistically likely for a measured value to differ from a “true” value by more than 3-4σ, but not unusual for one to differ by 1-2σ.

So-called expected values are numbers (often dimensioned, i.e., with associated units) considered to be correct but which, in fact, can be changed by experiment.

The importance of knowing the uncertainty of a measurement is starkly revealed when it is to be compared with an expected value: agreement or disagreement depends not on the absolute difference between values, or even the relative, or percent, difference, but on the difference relative to the uncertainty,

\frac{|x_\mathrm{measured} - x_\mathrm{expected}|}{\mathrm{RMS}},

for example. The result of this calculation is the difference, in units of standard deviations, between measured and expected values. If the RMS is believable, then a large relative deviation may indicate a problem with the expectation, or a systematic error in the measurement. Refer to Section 1.9.6 for a discussion of confidence intervals.
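A tiny sketch of this comparison; all of the numbers here are invented purely for illustration:

    # Sketch: expressing the disagreement with an expected value in units of
    # the measurement uncertainty.  The numbers are made up.
    measured    = 9.68   # say, a measured value of g in m/s^2
    uncertainty = 0.05   # its total uncertainty
    expected    = 9.81   # the expected value

    n_sigma = abs(measured - expected) / uncertainty
    print(f"deviation = {n_sigma:.1f} standard deviations")   # 2.6 here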


Consider the results of two investigations of a certain quantity which a theory, for example, predicts [see Figure 1]. First of all, note that experiments A and B disagree with each other as to the central value (as indicated by a dot) of their results, with B’s result closer to the predicted value. But since the uncertainties of the experiments (as indicated by the error bars) overlap, this disagreement is not considered significant. Experiment A, however, deviates from the prediction by less than 2σ, and therefore cannot be said to disagree significantly with the prediction, while experiment B deviates by more than 3.5σ from the prediction, which is a significant disagreement. So, even though B is closer to the prediction than A, and A and B are consistent within uncertainties, B is said to disagree with the prediction, while A does not. Again, what matters is not the absolute difference, but the difference in units of uncertainty.

Figure 1: Data points from experiments A and B. The dots indicate central values and the lines indicate 1σ uncertainties.

1.7 Outliers and Robust Estimators

The mean and the RMS (standard deviation) are not robust estimators. They are sensitive to, and therefore change dramatically in, the presence of even a very small number of extremely divergent data points. It’s extremely dangerous to discard such outliers.

A more robust measure of central tendency, when such data points are present, is the median, the middle value (or, if N is even, the average of the middle two values) of a data set ordered by magnitude. Similarly, a more robust measure of dispersion or spread of the data is the interquartile range (IQR), the difference between the third quartile (the value that is less than 1/4 of the data and greater than 3/4 of the data) and the first quartile (the value that is greater than 1/4 of the data and less than 3/4 of the data). [Note that the median is the second quartile.]

A result using the median and interquartile range is quoted as

\mathrm{median} \pm 1.58 \times \frac{\mathrm{IQR}}{\sqrt{N}}.    (7)

The median and the interquartile range are much less sensitive to rare divergences in the data, but more difficult to interpret. In particular, the interval defined by Equation 7 cannot be interpreted probabilistically, as can one defined around the mean with Equation 5 [see Section 1.9.6].
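A minimal sketch of the robust estimate of Equation 7, using numpy on a made-up data set containing one obvious outlier:

    # Sketch: median and interquartile range as robust estimators (Eq. 7).
    import numpy as np

    data = np.array([3.02, 2.95, 3.10, 2.88, 3.05, 9.99])   # 9.99 is an outlier

    median = np.median(data)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    uncertainty = 1.58 * iqr / np.sqrt(len(data))

    print(f"mean   = {data.mean():.2f}   (dragged upward by the outlier)")
    print(f"median = {median:.2f} +/- {uncertainty:.2f}")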

1.8 Error Propagation

Frequently, the goal of an experiment requires arithmetically combining the results of different sorts of measurements. For example, determining the average velocity may require measuring the displacement and the associated time interval with different instruments and then dividing the former by the latter. We may determine the statistical uncertainty of the displacement and the change of time separately, but what then is the uncertainty of the resulting velocity? To get this, we perform what is known as error propagation.

Let us say that the final result, z, depends on two independent sets of measurements, x and y, according to some functional relationship f:

z = f(x, y). (8)

Knowing the functional relationship f as well as the uncertainties of x and y, σ_x and σ_y, respectively, we determine the uncertainty of z, σ_z, by

\sigma_z = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2},    (9)

where ∂f/∂x is the partial derivative of f with respect to x (that is, as if x were the only variable in f, all other terms in the function being treated as constants), and similarly for y. This expression can be extended to any number of independent measurements included in the relationship.

Suppose, for example, that z = xy². Then ∂z/∂x = y² and ∂z/∂y = 2xy, so

\sigma_z = \sqrt{y^4 \sigma_x^2 + 4 x^2 y^2 \sigma_y^2} = x y^2 \sqrt{\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{2\sigma_y}{y}\right)^2}.
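Here is a minimal numerical sketch of Equation 9 applied to z = xy²; the values of x, y, and their uncertainties are made up:

    # Sketch: propagating uncertainties through z = x * y**2 (Eq. 9).
    import math

    x, sigma_x = 2.0, 0.1
    y, sigma_y = 3.0, 0.2

    z = x * y**2
    # partial derivatives: dz/dx = y**2 and dz/dy = 2*x*y
    sigma_z = math.sqrt((y**2 * sigma_x)**2 + (2 * x * y * sigma_y)**2)

    print(f"z = {z:.1f} +/- {sigma_z:.1f}")   # 18.0 +/- 2.6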

1.9 Probability Density Functions (PDF)

A probability density (distribution) function (PDF) describes the outcomes of a series of otherwise identical measurements. More specifically, it describes the relative likelihood or probability of obtaining a certain value for a given measurement. The description amounts to the shape, most probable value(s), and width of the distribution function.

The choice of applicable PDF, in particular its shape and width, is often not obvious and, in fact, may require a good deal of (educated) guesswork. Experiments are carried out to verify an expected PDF or to discover it. In the former case, an experimental distribution that differs significantly from the expected PDF indicates that either the expectation or the measurements are wrong.

1.9.1 Triangular PDF

A fair die has six faces and, when rolled fairly, each face is equally likely to show. If two fair dice are rolled fairly, 6 × 6 = 36 different combinations can appear, all with equal likelihood. Total values (the sum of the values on the two faces), however, are not equally likely, because some can be reached by more combinations than others. For example, 2 and 12 can occur in only one way each, 1 + 1 = 2 and 6 + 6 = 12, respectively, while 6 and 8 can each show up five different ways, 1 + 5 = 2 + 4 = 3 + 3 = 4 + 2 = 5 + 1 = 6 and 2 + 6 = 3 + 5 = 4 + 4 = 5 + 3 = 6 + 2 = 8, respectively. Therefore, for each fair roll of two dice, one would expect a 2 with only a 1/36 ≈ 3% probability (and similarly for 12), but a 5/36 ≈ 14% chance for 6 and 8, respectively. In Table 2, we summarize the chances of getting specific values from a fair roll of two fair dice.

Table 2: Value probabilities from fair rolls of two fair dice.

Value   Possible Combinations                    Number of Combinations   Probability
1       (none)                                   0                        0
2       1 + 1                                    1                        .03
3       1 + 2, 2 + 1                             2                        .06
4       1 + 3, 2 + 2, 3 + 1                      3                        .08
5       1 + 4, 2 + 3, 3 + 2, 4 + 1               4                        .11
6       1 + 5, 2 + 4, 3 + 3, 4 + 2, 5 + 1        5                        .14
7       1 + 6, 2 + 5, 3 + 4, 4 + 3, 5 + 2, 6 + 1 6                        .17
8       2 + 6, 3 + 5, 4 + 4, 5 + 3, 6 + 2        5                        .14
9       3 + 6, 4 + 5, 5 + 4, 6 + 3               4                        .11
10      4 + 6, 5 + 5, 6 + 4                      3                        .08
11      5 + 6, 6 + 5                             2                        .06
12      6 + 6                                    1                        .03
13      (none)                                   0                        0
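A short sketch (standard library only) that reproduces Table 2 by enumerating all 36 equally likely rolls, and checks the mean (7) and variance (about 6) quoted below:

    # Sketch: reproducing Table 2 by enumerating all 36 equally likely rolls of
    # two fair dice, then computing the mean and variance of the resulting values.
    from collections import Counter
    from itertools import product

    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

    for value in range(2, 13):
        n = counts[value]
        print(f"value {value:2d}: {n} combination(s), probability {n / 36:.2f}")

    mean = sum(v * n for v, n in counts.items()) / 36
    variance = sum((v - mean) ** 2 * n for v, n in counts.items()) / 36
    print(f"mean = {mean}, variance = {variance:.1f}")   # 7.0 and 5.8 (about 6)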

Figure 2 is a histogram of the number of combinations that produce each of the values in Table 2. Note that if two fair dice were rolled fairly 36 times, the expected distribution would be that of Figure 2, as well. Note, too, in the case of rolling dice, that the measured value is exact but the values themselves are distributed: the outcome of any roll can be guessed only probabilistically.

Figure 2: Two-dice combinations that give certain values.

Figure 3 shows the probability distribution of the values from fairly rolling two fair dice. As can be seen, the probability distribution of values of two fair dice, fairly rolled, is a triangle. The apex of the triangle is at the maximum probability of 0.17, which is that for rolling a 7. For a symmetric distribution like this one, the range from the value of the maximum probability to the limits of the distribution (where the probability falls to zero) is known as the half-width, a. In this case, a = 6 (7 − 1 or 13 − 7). Due to its symmetry, it’s easy to see that the average value, or mean, of this distribution is 7. The variance is 6. In general, the variance of a triangular distribution is given by

\sigma^2_\mathrm{triangular} = \frac{a^2}{6},    (10)

and, therefore, the standard deviation by,

\sigma_\mathrm{triangular} = \sqrt{\sigma^2_\mathrm{triangular}} = \frac{a}{\sqrt{6}} \approx 0.41a.    (11)

The triangle of Figure 3 can be described by the function

\mathrm{PDF}_\mathrm{triangular}(\mathrm{value}) =
\begin{cases}
\frac{1}{36}(\mathrm{value} - 1), & 1 \le \mathrm{value} \le 7 \\
\frac{1}{36}(13 - \mathrm{value}), & 7 < \mathrm{value} \le 13 \\
0, & \text{otherwise.}
\end{cases}    (12)

Take note that assigning a half-width is a judgment call, one that requires a compromise: on the one hand, you want to be absolutely sure the true value is included in the range of the full width (accuracy); on the other hand, you want to minimize the uncertainty of your result (precision).


Figure 3: Probability distribution when fairly rolling two fair dice.

You may also note, by adding the probabilities in Table 2 or directly calculating the area of the triangle, that the area under the triangle is equal to 1, as is the area under all probability density functions.

1.9.2 Binomial PDF

As we mentioned, Figure 2 shows not only the number of ways different values can appear in a roll of two dice, it also displays the expected distribution of values from 36 rolls of the dice. Thus, reading off from the histogram (or from Table 2), if we fairly rolled two fair dice 36 times, we’d expect to get the value 3 twice and the value 7 six times.

Of course, if we do the experiment, we rarely get exactly what we expect. We might ask how likely it is to get the expected number. The answer to this question is given by the binomial distribution, which tells us the probability, P(n), of getting the result of interest n times in N trials, when the probability of getting the result on any given trial is p:

\mathrm{PDF}_\mathrm{binomial}(n) = \binom{N}{n} p^n (1-p)^{N-n},    (13)

where

\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!}    (14)

is the so-called binomial coefficient, the number of unique ways n successes can be attained in N trials.

The average number of successes, the standard deviation, and the standard error of the mean are given by

\bar{n} = Np,    (15)

\sigma_n = \sqrt{Np(1-p)},    (16)

\sigma_{\bar{n}} = \sqrt{\frac{p(1-p)}{N}}.    (17)

It is important to note that σ_n ∼ √N and σ_n̄ ∼ 1/√N. This is a general result of any counting experiment: if the counting of integer values fluctuates randomly around a central value, then the uncertainty of the counting is ∼ √N, and the error of the mean count is ∼ 1/√N.

From Equation 13, we can say that the probability of getting 3 twice in 36 rolls is P_3(2) = [36!/(2! 34!)] × 0.06² × 0.94³⁴ ≈ 0.28, while the probabilities of getting 3 once or three times are P_3(1) ≈ 0.24 and P_3(3) ≈ 0.20, respectively: not so different. Obviously, even though the expectation is two 3s in 36 rolls, one would not be surprised to get one or three 3s in 36 rolls. Table 3 gives the probabilities for getting any number of 3s, from 0 to 36, in 36 rolls. Obviously, if 36 rolls produced more than ten 3s, one could justifiably question the fairness of the dice or the rolls, or both.

Table 3: Probability of getting n 3s in 36 fair rolls of two fair dice.

n   P_3(n)    n    P_3(n)       n    P_3(n)       n    P_3(n)
0   0.11      10   3 × 10^-5    20   1 × 10^-15   30   3 × 10^-31
1   0.24      11   5 × 10^-6    21   5 × 10^-17   31   4 × 10^-33
2   0.28      12   6 × 10^-7    22   2 × 10^-18   32   4 × 10^-35
3   0.20      13   7 × 10^-8    23   8 × 10^-20   33   3 × 10^-37
4   0.11      14   8 × 10^-9    24   3 × 10^-21   34   2 × 10^-39
5   0.04      15   7 × 10^-10   25   8 × 10^-23   35   6 × 10^-42
6   0.01      16   6 × 10^-11   26   2 × 10^-24   36   1 × 10^-44
7   0.004     17   4 × 10^-12   27   6 × 10^-26
8   0.0009    18   3 × 10^-13   28   1 × 10^-27
9   0.0002    19   2 × 10^-14   29   2 × 10^-29
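These entries can be reproduced with a few lines of Python; as in the text, p is taken as the rounded single-roll probability of a 3, 2/36 ≈ 0.06:

    # Sketch: binomial probabilities (Eq. 13) for the number of 3s in 36 rolls.
    from math import comb

    N, p = 36, 0.06   # 36 rolls; rounded chance of a total of 3 on any one roll

    def prob(n):
        """Probability of exactly n successes in N trials (Eq. 13)."""
        return comb(N, n) * p**n * (1 - p)**(N - n)

    for n in range(6):
        print(f"P3({n}) = {prob(n):.2f}")
    # These land close to the first column of Table 3 (0.11, 0.24, 0.28, 0.20, ...).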

Obviously, a similar table could be created for the number of 7s in 36 rolls, but a plot of the resulting distribution is informative [see Figure 4]. Notice that while six 7s might be expected in 36 rolls, one shouldn’t be at all surprised to get four, five, seven, or eight 7s, or even two, three, nine, ten, or eleven. Getting none, or fifteen or more, might be suspicious.

Notice, too, how closely the normal distribution (see below) with a mean and standard deviation the same as the binomial distribution matches the binomial distribution. This is a general result, as well: if N is rather large (typically N > 20) and p is not too close to 0 or too close to 1, then a Gaussian PDF becomes a good approximation for a binomial distribution.


Figure 4: Probability distribution of getting n 7s in 36 fair rolls of two fair dice. A Gaussian PDF (see below) with x̄ = 6 and σ = √5 is superimposed.

1.9.3 Rectangular (Uniform) PDF

A measurement (measurand) by a digital instrument is rounded or truncated to the resolution of the instrument, and the true value could lie anywhere between the next highest and next lowest display values. For example, if a certain ohmmeter rounds and displays to the nearest 1/10th of an ohm, say, 99.1 Ω, then the true value of the resistance could lie anywhere between 99.05 Ω and 99.15 Ω. There is no way, with a digital instrument, as there is with an analog instrument, to know if the value is really closer to one end of this range or the other. The probability distribution is flat, or uniform, or rectangular, between the two extremes. Graphed, such a distribution appears to be a rectangle [see Figure 5].

For a uniform distribution with half-width a, the variance

\sigma^2_\mathrm{uniform} = \frac{a^2}{3},    (18)

so

\sigma_\mathrm{uniform} = \frac{a}{\sqrt{3}} \approx 0.6a,    (19)

and the PDF takes the form,


Figure 5: Probability distribution for a digital voltmeter readout (to 0.1 V).

\mathrm{PDF}_\mathrm{uniform}(\mathrm{value}) =
\begin{cases}
\frac{1}{2a}, & \overline{\mathrm{value}} - a < \mathrm{value} < \overline{\mathrm{value}} + a \\
0, & \text{otherwise,}
\end{cases}    (20)

where \overline{\mathrm{value}} is the displayed (central) value.

If this were the only uncertainty for this measurement, then the result would be reported as voltage = 99.10 ± 0.03 V. But this uncertainty represents just the intrinsic precision of the instrument, σ_intrinsic. Digital instruments have a limited accuracy and therefore a finite likelihood of systematic error due to the instrument alone. This accuracy uncertainty is typically given in a specification as a percentage plus some number of least significant figures, for example, ±(0.5% + 1), which says that the accuracy is good to 0.5% plus 1 in the least significant digit. For a digital voltmeter reading of 99.1 V, this gives an accuracy uncertainty of σ_accuracy = 0.6 V.

The two sources of uncertainty are independent of one another, and may be reported separately,

\pm \sigma_\mathrm{intrinsic} \pm \sigma_\mathrm{accuracy},    (21)

or combined into a total uncertainty, formed by adding them in quadrature:

\sigma_\mathrm{total} = \sqrt{\sigma^2_\mathrm{intrinsic} + \sigma^2_\mathrm{accuracy}}.    (22)

In our case, the accuracy uncertainty is 20 times larger than the intrinsic uncertainty, and completely dominates the final result, voltage = 99.1 ± 0.6 V. Note that it’s seldom the intrinsic precision of the instrument, digital or analog, which sets the scale of the uncertainty.
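A minimal sketch of this combination for the 99.1 V reading, using the specification numbers quoted above (the variable names are illustrative):

    # Sketch: total uncertainty of a digital readout, combining the intrinsic
    # (rectangular) term and the accuracy-spec term in quadrature (Eq. 22).
    import math

    reading    = 99.1
    resolution = 0.1                                    # size of the last displayed digit

    sigma_intrinsic = (resolution / 2) / math.sqrt(3)   # half-width a = 0.05, Eq. 19
    sigma_accuracy  = 0.005 * reading + 1 * resolution  # spec: +/-(0.5% + 1 digit)
    sigma_total     = math.hypot(sigma_intrinsic, sigma_accuracy)   # Eq. 22

    print(f"sigma_intrinsic = {sigma_intrinsic:.3f} V")   # about 0.03 V
    print(f"sigma_accuracy  = {sigma_accuracy:.3f} V")    # about 0.6 V
    print(f"sigma_total     = {sigma_total:.1f} V")       # 0.6 V -- accuracy dominates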

1.9.4 Cosine PDF

With an analog measuring instrument, the distance from a scale marker is estimable. The thermometer reading in Figure 6 is clearly closer to 20 °C than 30 °C or even 25 °C. In fact, one might reasonably conclude that the reading is between 21 °C and 23 °C, say, 22 °C. One can be pretty certain that there is no chance that the reading is less than 20 °C or more than 24 °C. Clearly, the probability distribution for reading this thermometer is not uniform (flat), but more triangular, with the apex at, say, again, 22 °C, and probably reaching zero at 20 °C and 24 °C.

Figure 6: Cartoon of a centigrade analog thermometer.

For a measurement with finer scaling (see Figure 7), one can estimate to more significant figures, while still acknowledging a range of decreasing probability both higher and lower than an estimated central value of, say, 22.3 °C.

In both cases, it may very well be appropriate to suppose that the uncertainty associated with the reading is distributed triangularly. But you might reasonably believe that you know the value more precisely than a triangular probability distribution would indicate, in which case you might prefer that the distribution look like Figure 8, a cosine probability density function for the thermometer of Figure 6, assuming T = 22 °C and a = 2 °C.

With a mean value, x̄, the cosine probability density function has the form

\mathrm{PDF}_\mathrm{cosine}(x) =
\begin{cases}
\frac{1}{2a}\left[1 + \cos\left(\pi\,\frac{x - \bar{x}}{a}\right)\right], & \bar{x} - a < x < \bar{x} + a \\
0, & \text{otherwise.}
\end{cases}    (23)

The variance, found by integration, is


Figure 7: Stock photograph of an analog thermometer.

\sigma^2_\mathrm{cosine} = \frac{a^2}{3}\left(1 - \frac{6}{\pi^2}\right),    (24)

so

\sigma_\mathrm{cosine} \approx 0.36a,    (25)

which, to one significant figure, is no different from σ_triangular (see Equation 11).
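As a sanity check, a short numerical sketch confirming that the PDF of Equation 23 has unit area and the width of Equation 25, using the thermometer values x̄ = 22 °C and a = 2 °C (the grid size is arbitrary):

    # Sketch: numerical check of the cosine PDF (Eq. 23) and its width (Eq. 25).
    import numpy as np

    x_bar, a = 22.0, 2.0
    x = np.linspace(x_bar - a, x_bar + a, 100_001)
    pdf = (1.0 / (2 * a)) * (1 + np.cos(np.pi * (x - x_bar) / a))

    dx = x[1] - x[0]
    area  = np.sum(pdf) * dx
    sigma = np.sqrt(np.sum((x - x_bar) ** 2 * pdf) * dx)

    print(f"area under PDF = {area:.3f}")                # 1.000
    print(f"sigma = {sigma:.2f} C = {sigma / a:.2f} a")  # 0.72 C = 0.36 a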

1.9.5 Gaussian (Normal) PDF

Measurements of the same object with the same tool in the same controlled environment tend to reproduce results well within the precision and accuracy of the instrument. However, measurements of the same parameter of different objects or under varying conditions will likely produce results distributed more broadly than the total uncertainty of the instrument alone might lead you to expect. If this scattering or dispersal of results is due to (more or less) random effects, then the results are likely to distribute themselves in a bell or normal shape, known as a Gaussian distribution. The Gaussian PDF takes the form

\mathrm{PDF}_\mathrm{normal}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}.    (26)

Here, x̄ is the mean as estimated from the data, and σ is the standard deviation, also estimated from the data, which indicates the width of the distribution due to random (often referred to as statistical) effects [see Figure 9]. It is, as we wrote above, the uncertainty associated with individual measurements, not the uncertainty of the mean, which, rather, is given by Equation 5.

Notice that the half-width, a, does not show up in this function. This is because the distribution is due to random effects which can, in principle, take on any magnitude. That is, a → ∞.


Figure 8: Cosine probability density function for the first analog thermometer, assuming T = 22 °C and a = 2 °C.

1.9.6 Confidence Intervals

While each PDF discussed is characterized by a width or uncertainty, σ, the meaning (the probability that the true value is within plus or minus σ of the measured value) in each case differs. This so-called one-sigma confidence interval bounds a 58% probability for the uniform distribution, 65% probability for the triangular distribution, and 68% probability for the Gaussian distribution.

It is rare that wider intervals than ±σ are ever invoked for triangular or rectangular (uniform) PDFs. In the case of the Gaussian probability function, however, 95% (±2σ) and 99% (±3σ) intervals may be even more frequently employed than the 68% confidence interval. In some fields of physics, a discovery claim requires 5σ, or 99.99997%, confidence. Refer to Table 1 for more details.
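For a Gaussian, these levels follow from the error function, P(|x − x̄| < nσ) = erf(n/√2); a short sketch reproducing the Gaussian entries of Table 1:

    # Sketch: Gaussian confidence levels via the error function.
    import math

    for n in (1, 1.64, 2, 3, 4):
        inside = math.erf(n / math.sqrt(2))
        print(f"{n:>4} sigma: {100 * inside:8.4f}% inside, {100 * (1 - inside):.4f}% outside")
    # Compare with Table 1: 68.3%, 90.0%, 95.45%, 99.73%, ... inside.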

1.10 Data with Two Variables; Fitting

In experiments where two variables are measured, it is usually the case that the experimenter varies one of them (the independent variable) to see what happens to the other (the dependent variable). The result of the experiment, then, is a collection of ordered pairs of values for the independent and dependent variables.

Figure 9: Normal distributions with different means and standard deviations.

The first step in analyzing such data is to graph the pairs in a scatterplot, with the independent values along the abscissa (x-axis) and the dependent values along the ordinate (y-axis). Error bars in each dimension should be included with each point, assuming uncertainties are non-negligible, which should always be the case with the dependent variable (each point should display error bars parallel to the y-axis, up to the resolution of the graph) [see Figure 10].

To determine the nature and strength of a relationship between the independent and dependent variables, the data should be fit to a curve [see Figure 11]. There are many sorts of curve, and which to use can depend on expectation or an educated guess based on what the data look like. One constraint is that the number of parameters of the fit [a straight line is a two-parameter fit (slope and intercept), as is an exponential (amplitude and coefficient of the exponent), while a parabola is a three-parameter fit] may not be greater than the number of data points. That is, the degrees of freedom (dof) = # of data points − # of fit parameters ≥ 0.

Choosing a reasonable curve when not guided by theory is often problematic. Only linearly correlated (straight-line) data can be identified by eye, and even that can be tricky [see Figure 12].

The use of computers has made the exploration of different possibilities less onerous. The typical fitting package most likely uses a method called least-squares to determine the parameters of the curve that best fit the data.

The generic curve will have the form y = f(x). The data set consists of N ordered pairs of x and y values: (x_1, y_1), (x_2, y_2), . . . , (x_N, y_N). For each value of x, x_i, the function provides a value of y, ŷ_i, that sits on the curve, but which may or may not coincide with the value of y, y_i, in the data.

If one defines the residual, r_i,

r_i \equiv y_i - \hat{y}_i,    (27)

then a good fit would result in ∑ r_i = 0, as with deviations from the mean.


Figure 10: Scatterplot with error bars.

Also as with squared deviations from the mean (recall the calculation of the variance and standard deviation), ∑ r_i² ≠ 0, and so it may be employed as a measure of fit quality. ∑ r_i² is commonly called the sum of squares (or sum of the squared residuals) and designated ssr.

A common fitting technique is referred to as least-squares fitting, because it seeks parameters of the curve function that minimize ssr. This apparently is the method of Excel trend lines and Python’s numpy.polyfit.

A limitation of this sort of least-squares fitting is that it doesn’t take into account uncertainties in the data. This is fine if each data point has the same uncertainty, but if they differ, the more certain points should influence the fit more than the less certain ones. Another limitation is that there’s no objective interpretation of the minimum ssr, other than smaller is better than bigger.

The second limitation can be overcome with the use of the coefficient of determination, R². Defining the total sum of squares, sst ≡ ∑ (y_i − ȳ)² (the squared deviations from the mean),

R^2 \equiv 1 - \frac{ssr}{sst}.    (28)

Obviously, R² ≈ 1 indicates a good fit, while R² ≈ 0 indicates poor agreement between data and curve.
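As an illustration of the unweighted least-squares fit and the R² of Equation 28, here is a minimal sketch using numpy.polyfit; the x and y values are made up:

    # Sketch: unweighted straight-line least-squares fit, plus ssr and R^2.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # made-up data

    slope, intercept = np.polyfit(x, y, deg=1)  # two-parameter (straight-line) fit
    y_curve = slope * x + intercept

    ssr = np.sum((y - y_curve) ** 2)            # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
    r_squared = 1 - ssr / sst                   # Eq. 28

    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")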

Figure 11: Linear fit to made-up data.

Both limitations can be surmounted by weighting each r_i by the inverse of the uncertainty of the respective data point, yielding a quantity known as a χ² (chi-square):

\chi^2 \equiv \sum_{i=1}^{N} \frac{(y_i - \hat{y}_i)^2}{\sigma_{y_i}^2}.    (29)

Minimizing χ² by adjusting the parameters of the curve function determines the best fit. If the agreement between curve and data is good, then the value of χ² should be about the same as the number of degrees of freedom, χ²/dof ≈ 1. A poor fit yields χ²/dof ≫ 1; if χ²/dof ≪ 1, as in Figure 11, the fit is too good: the data have been cooked or the uncertainties over-estimated.

Excel has no built-in function to do the weighted minimization; Python has at least one: scipy.optimize.curve_fit().
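Here is a sketch of such a weighted fit with scipy.optimize.curve_fit on the same made-up data, now with made-up uncertainties; sigma and absolute_sigma are actual curve_fit arguments, and absolute_sigma=True treats the quoted uncertainties as absolute rather than as relative weights:

    # Sketch: weighted (chi-square) straight-line fit with scipy.optimize.curve_fit.
    import numpy as np
    from scipy.optimize import curve_fit

    def line(x, slope, intercept):
        return slope * x + intercept

    x       = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y       = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    sigma_y = np.array([0.2, 0.2, 0.3, 0.2, 0.4])   # made-up uncertainties

    popt, pcov = curve_fit(line, x, y, sigma=sigma_y, absolute_sigma=True)
    perr = np.sqrt(np.diag(pcov))                   # parameter uncertainties

    chi2 = np.sum(((y - line(x, *popt)) / sigma_y) ** 2)   # Eq. 29
    dof  = len(x) - len(popt)

    print(f"slope     = {popt[0]:.2f} +/- {perr[0]:.2f}")
    print(f"intercept = {popt[1]:.2f} +/- {perr[1]:.2f}")
    print(f"chi2/dof  = {chi2 / dof:.2f}")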

If the independent variable has an uncertainty as well, then the total uncertainty in the χ² must be increased to account for the fact that y_i is less well known when x_i is not known perfectly. This is done by adding an effective variance to the y-variance in quadrature:

\sigma_{\mathrm{total},\,y_i} = \sqrt{\sigma_{y_i}^2 + \sigma_\mathrm{eff}^2}.    (30)

Recalling that ŷ_i = f(x_i), where f(x) is the equation of the curve,

\sigma_\mathrm{eff}^2 = \frac{\left[f(x_i + \sigma_{x_i}) - f(x_i - \sigma_{x_i})\right]^2}{4}.    (31)

Figure 12: Various trend lines for positively correlated data.

In this case, Equation 29 is modified by substituting σ_{total, y_i} for σ_{y_i}:

\chi^2 \equiv \sum_{i=1}^{N} \frac{(y_i - \hat{y}_i)^2}{\sigma_{\mathrm{total},\,y_i}^2}.    (32)
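A small sketch of Equations 30 and 31 for a single data point; the linear curve f and the value of σ_x are stand-ins, not taken from the text:

    # Sketch: inflating the y uncertainty with the effective variance (Eq. 31)
    # when x is also uncertain, then combining in quadrature (Eq. 30).
    import math

    def f(x, slope=2.0, intercept=0.0):   # stand-in for the fitted curve
        return slope * x + intercept

    x_i, sigma_x = 3.0, 0.1
    y_i, sigma_y = 6.2, 0.3

    sigma_eff2  = (f(x_i + sigma_x) - f(x_i - sigma_x)) ** 2 / 4   # Eq. 31
    sigma_total = math.sqrt(sigma_y**2 + sigma_eff2)               # Eq. 30

    print(f"sigma_eff   = {math.sqrt(sigma_eff2):.2f}")   # 0.20 for slope 2, sigma_x 0.1
    print(f"sigma_total = {sigma_total:.2f}")             # 0.36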

The best-fit curve will not match the data exactly (see the discussion of χ²/dof ≪ 1, above). The parameters determined by the fit have associated uncertainties [see Figure 11]. Excel does not provide these except for straight-line fits using LINEST or REGRESSION; scipy.optimize.curve_fit() provides a covariance matrix, whose diagonal elements are the variances of the respective parameters, regardless of the fit function.
