Accuracy, Precision, Uncertainty and Error


A measurement system can be accurate but not precise, precise but not accurate, neither, or

both. For example, if an experiment contains a systematic error, then increasing the sample

size generally increases precision but does not improve accuracy. The end result would be a

consistent yet inaccurate string of results from the flawed experiment. Eliminating the systematic

error improves accuracy but does not change precision.

A measurement system is designated valid if it is both accurate and precise. Related terms

include bias (non-random or directed effects caused by a factor or factors unrelated to

the independent variable) and error (random variability).

The terminology is also applied to indirect measurements, that is, values obtained by a computational procedure from observed data.

In addition to accuracy and precision, measurements may also have a measurement resolution,

which is the smallest change in the underlying physical quantity that produces a response in the

measurement.

In the case of full reproducibility, such as when rounding a number to a representable floating

point number, the word precision has a meaning not related to reproducibility. For example, in

the IEEE 754-2008 standard it means the number of bits in the significand, so it is used as a

measure for the relative accuracy with which an arbitrary number can be represented.
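For instance, in Python (whose floats are IEEE 754 binary64 values) the significand width and the resulting relative spacing of representable numbers can be read directly from the standard library. This is a minimal illustrative sketch, not part of the standard itself:

```python
import sys

# Width of the significand of a Python float (IEEE 754 binary64): 53 bits,
# i.e. 52 stored bits plus one implicit leading bit.
mant_bits = sys.float_info.mant_dig

# Relative spacing of representable numbers near 1.0 ("machine epsilon").
eps = sys.float_info.epsilon

print(mant_bits, eps, eps == 2.0 ** (1 - mant_bits))   # 53 2.220446049250313e-16 True
```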

Quantifying accuracy and precision

Ideally a measurement device is both accurate and precise, with measurements all close to and

tightly clustered around the known value. The accuracy and precision of a measurement

process is usually established by repeatedly measuring some traceable reference standard.

Such standards are defined in the International System of Units and maintained by

national standards organizations such as the National Institute of Standards and Technology.

This also applies when measurements are repeated and averaged. In that case, the

term standard error is properly applied: the precision of the average is equal to the known

standard deviation of the process divided by the square root of the number of measurements

averaged. Further, the central limit theorem shows that the probability distribution of the

averaged measurements will be closer to a normal distribution than that of individual

measurements.
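A minimal Python sketch of this relationship (the true value, process standard deviation, and sample size below are invented for illustration):

```python
import math
import random

random.seed(0)

true_value = 10.0   # reference value being measured (invented)
sigma = 0.2         # known standard deviation of the measurement process
n = 25              # number of measurements that are averaged

# Simulate n repeated measurements carrying only random error.
readings = [random.gauss(true_value, sigma) for _ in range(n)]
mean = sum(readings) / n

# Precision of the average: the standard error sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)

print(f"mean = {mean:.3f}, standard error of the mean = {standard_error:.3f}")
```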

With regard to accuracy we can distinguish:


the difference between the mean of the measurements and the reference value, the bias (establishing and correcting for bias is necessary for calibration); and

the combined effect of that bias and precision.

A common convention in science and engineering is to express accuracy and/or precision

implicitly by means of significant figures. Here, when not explicitly stated, the margin of error is

understood to be one-half the value of the last significant place. For instance, a recording of

843.6 m, or 843.0 m, or 800.0 m would imply a margin of 0.05 m (the last significant place is the

tenths place), while a recording of 8,436 m would imply a margin of error of 0.5 m (the last

significant digits are the units).

A reading of 8,000 m, with trailing zeroes and no decimal point, is ambiguous; the trailing zeroes

may or may not be intended as significant figures. To avoid this ambiguity, the number could be

represented in scientific notation: 8.0 × 10³ m indicates that the first zero is significant (hence a margin of 50 m) while 8.000 × 10³ m indicates that all three zeroes are significant, giving a margin of 0.5 m. Similarly, it is possible to use a multiple of the basic measurement unit: 8.0 km is equivalent to 8.0 × 10³ m. In fact, it indicates a margin of 0.05 km (50 m). However, reliance

on this convention can lead to false precision errors when accepting data from sources that do

not obey it.
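The implied margin can be computed mechanically as one-half the value of the last significant place. The following Python sketch is illustrative only; the helper name implied_margin is ours, and ambiguous readings must be supplied in scientific notation:

```python
from decimal import Decimal

def implied_margin(reading: str) -> Decimal:
    """Half the value of the last significant place of a numeric string.

    Trailing zeroes are treated as significant, so an ambiguous reading
    such as "8000" should instead be written in scientific notation.
    """
    last_place = Decimal(reading).as_tuple().exponent   # e.g. -1 for "843.6"
    return Decimal(5) * Decimal(10) ** (last_place - 1)

print(implied_margin("843.6"))      # 0.05
print(implied_margin("8436"))       # 0.5
print(implied_margin("8.0E+3"))     # 50   (only the first zero significant)
print(implied_margin("8.000E+3"))   # 0.5  (all three zeroes significant)
```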

Looking at this in another way, a value of 8 would mean that the measurement has been made

with a precision of 1 (the measuring instrument was able to measure only down to the ones place)

whereas a value of 8.0 (though mathematically equal to 8) would mean that the value at the first

decimal place was measured and was found to be zero. (The measuring instrument was able to

measure the first decimal place.) The second value is more precise. Neither of the measured

values may be accurate (the actual value could be 9.5 but measured inaccurately as 8 in both

instances). Thus, accuracy can be said to be the 'correctness' of a measurement, while

precision could be identified as the ability to resolve smaller differences.

Precision is sometimes stratified into:

Repeatability — the variation arising when all efforts are made to keep conditions constant

by using the same instrument and operator, and repeating during a short time period; and

Reproducibility — the variation arising using the same measurement process among

different instruments and operators, and over longer time periods.
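Repeatability and reproducibility can be estimated from repeated measurements grouped by instrument and operator. The sketch below is deliberately simplified (invented readings, pooled standard deviations only) and is not a full gauge repeatability-and-reproducibility analysis:

```python
import statistics

# Hypothetical readings of the same quantity, grouped by operator/instrument setup.
groups = {
    "setup A": [10.02, 10.01, 10.03, 10.02],
    "setup B": [10.07, 10.08, 10.06, 10.07],
    "setup C": [ 9.98,  9.99,  9.97,  9.98],
}

# Repeatability: spread within a single setup (same instrument and operator).
within = [statistics.stdev(vals) for vals in groups.values()]
repeatability = statistics.mean(within)

# Reproducibility: spread of the whole set across different setups.
all_readings = [v for vals in groups.values() for v in vals]
reproducibility = statistics.stdev(all_readings)

print(f"repeatability ~ {repeatability:.4f}, reproducibility ~ {reproducibility:.4f}")
```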

Accuracy and precision in binary classification

Accuracy is also used as a statistical measure of how well a binary classification test

correctly identifies or excludes a condition.


                        Condition as determined by the gold standard
                        True                  False
Test       Positive     True positive         False positive     → Positive predictive value
outcome    Negative     False negative        True negative      → Negative predictive value
                        ↓ Sensitivity         ↓ Specificity        Accuracy

That is, the accuracy is the proportion of true results (both true positives and true

negatives) in the population. It is a parameter of the test.

On the other hand, precision is defined as the proportion of the true positives against all

the positive results (both true positives and false positives).

An accuracy of 100% means that the measured values are exactly the same as the

given values.

Also see Sensitivity and specificity.

Accuracy may be determined from Sensitivity and Specificity, provided Prevalence is

known, using the equation:

accuracy = (sensitivity)(prevalence) + (specificity)(1 − prevalence)
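These definitions are easy to check numerically. In the Python sketch below the counts are invented for illustration; the last computed quantity verifies the prevalence identity above:

```python
# Hypothetical counts from a binary classification test.
tp, fp, fn, tn = 40, 10, 5, 945
total = tp + fp + fn + tn

accuracy    = (tp + tn) / total   # proportion of true results
precision   = tp / (tp + fp)      # positive predictive value
sensitivity = tp / (tp + fn)      # true positive rate
specificity = tn / (tn + fp)      # true negative rate
prevalence  = (tp + fn) / total

# Accuracy expressed through sensitivity, specificity and prevalence.
accuracy_check = sensitivity * prevalence + specificity * (1 - prevalence)

print(f"accuracy={accuracy:.3f}  check={accuracy_check:.3f}  precision={precision:.3f}")
```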

The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. It may be better to avoid the accuracy metric in favor of other metrics such as precision and recall. In situations where the minority class is more important, the F-measure may be more appropriate, especially with very skewed class imbalance. An alternative performance measure that treats both classes with equal importance is "balanced accuracy", the mean of sensitivity and specificity:

balanced accuracy = (sensitivity + specificity) / 2

Systematic error

Systematic errors are biases in measurement which lead to the situation where the mean of

many separate measurements differs significantly from the actual value of the measured

attribute. All measurements are prone to systematic errors, often of several different types.

Sources of systematic error include imperfect calibration of measurement instruments (zero error), changes in the environment which interfere with the measurement process, and imperfect methods of observation; the resulting error may be either a zero error or a percentage error. For

example, consider an experimenter taking a reading of the time period of a pendulum swinging

past a fiducial mark: If his stop-watch or timer starts with 1 second on the clock then all of his

results will be off by 1 second (zero error). If the experimenter repeats this experiment twenty

times (starting at 1 second each time), then there will be a percentage error in the calculated

average of his results; the final result will be slightly larger than the true

period. Distance measured by radar will be systematically overestimated if the slight slowing

down of the waves in air is not accounted for. Incorrect zeroing of an instrument leading to a

zero error is an example of systematic error in instrumentation.
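A small Python sketch of the stopwatch example (all numbers invented): averaging twenty timings shrinks the random scatter but leaves the 1-second zero error in the mean untouched:

```python
import random
import statistics

random.seed(1)

true_period = 2.0     # true pendulum period in seconds (invented)
zero_error = 1.0      # stopwatch starts at 1 s instead of 0 s
sigma = 0.05          # random timing scatter in seconds

# Twenty repeated timings, each carrying the same zero error.
readings = [true_period + zero_error + random.gauss(0.0, sigma)
            for _ in range(20)]

mean_reading = statistics.mean(readings)
print(f"mean of 20 timings = {mean_reading:.3f} s  (about {zero_error} s too large)")
```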

Systematic errors may also be present in the result of an estimate based on a mathematical

model or physical law. For instance, the estimated oscillation frequency of a pendulum will be

systematically in error if slight movement of the support is not accounted for.

Systematic errors can be either constant, or be related (e.g. proportional or a percentage) to the

actual value of the measured quantity, or even to the value of a different quantity (the reading of

a ruler can be affected by environment temperature). When they are constant, they are simply

due to incorrect zeroing of the instrument. When they are not constant, they can change sign.

For instance, if a thermometer is affected by a proportional systematic error equal to 2% of the

actual temperature, and the actual temperature is 200°, 0°, or −100°, the measured temperature

will be 204° (systematic error = +4°), 0° (null systematic error) or −102° (systematic error = −2°),

respectively. Thus, the temperature will be overestimated when it is above zero and underestimated when it is below zero.
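A minimal Python sketch of that proportional error, using the same three temperatures (the function name measured is ours, purely for illustration):

```python
def measured(actual_temp, proportional_error=0.02):
    """Reading of a thermometer with a 2% proportional systematic error."""
    return actual_temp * (1.0 + proportional_error)

for t in (200.0, 0.0, -100.0):
    reading = measured(t)
    print(f"actual {t:7.1f}°  measured {reading:7.1f}°  "
          f"systematic error {reading - t:+.1f}°")
```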

Constant systematic errors are very difficult to deal with, because their effects are only

observable if they can be removed. Such errors cannot be removed by repeating measurements

or averaging large numbers of results. A common method to remove systematic error is

through calibration of the measurement instrument.

In a statistical context, the term systematic error usually arises where the sizes and directions of

possible errors are unknown.

Systematic versus random error


Measurement errors can be divided into two components: random error and systematic error.[1] Random error is always present in a measurement. It is caused by inherently unpredictable

fluctuations in the readings of a measurement apparatus or in the experimenter's interpretation

of the instrumental reading. Random errors show up as different results for ostensibly the same

repeated measurement. Systematic error cannot be discovered this way because it always

pushes the results in the same direction. If the cause of a systematic error can be identified,

then it can usually be eliminated.

Drift

Systematic errors which change during an experiment (drift) are easier to detect. Measurements

show trends with time rather than varying randomly about a mean.

Drift is evident if a measurement of a constant quantity is repeated several times and the

measurements drift one way during the experiment, for example if each measurement is higher

than the previous measurement which could perhaps occur if an instrument becomes warmer

during the experiment. If the measured quantity is variable, it is possible to detect a drift by

checking the zero reading during the experiment as well as at the start of the experiment

(indeed, the zero reading is a measurement of a constant quantity). If the zero reading is

consistently above or below zero, a systematic error is present. If this cannot be eliminated, for

instance by resetting the instrument immediately before the experiment, it needs to be allowed

for by subtracting its (possibly time-varying) value from the readings, and by taking it into

account in assessing the accuracy of the measurement.
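One simple way to flag drift is to fit a straight line to repeated readings of a constant quantity and check whether the slope differs meaningfully from zero. A rough Python sketch, with an invented warm-up drift of 0.01 units per reading:

```python
import random
import statistics

random.seed(2)

# Repeated readings of a constant quantity; the instrument warms up and
# drifts upward by 0.01 units per reading (hypothetical).
n = 30
readings = [5.0 + 0.01 * i + random.gauss(0.0, 0.02) for i in range(n)]

# Least-squares slope of reading versus measurement index.
xs = range(n)
x_mean, y_mean = statistics.mean(xs), statistics.mean(readings)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, readings))
         / sum((x - x_mean) ** 2 for x in xs))

print(f"estimated drift: {slope:.4f} units per reading")
```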

If no pattern in a series of repeated measurements is evident, the presence of fixed systematic

errors can only be found if the measurements are checked, either by measuring a known

quantity or by comparing the readings with readings made using a different apparatus, known to

be more accurate. For example, suppose the timing of a pendulum using an

accurate stopwatch several times gives readings randomly distributed about the mean. A

systematic error is present if the stopwatch is checked against the 'speaking clock' of the

telephone system and found to be running slow or fast. Clearly, the pendulum timings need to

be corrected according to how fast or slow the stopwatch was found to be running. Measuring

instruments such as ammeters and voltmeters need to be checked periodically against known

standards.

Systematic errors can also be detected by measuring already known quantities. For example, a spectrometer fitted with a diffraction grating may be checked by using it to measure the wavelength of the D-lines of the sodium electromagnetic spectrum, which are at 589.0 nm and 589.6 nm. The measurements may be used to determine the number of lines per millimetre of the diffraction grating, which can then be used to measure the wavelength of any other spectral

line.