Accuracy, precision, uncertainty, and error
TRANSCRIPT
A measurement system can be accurate but not precise, precise but not accurate, neither, or
both. For example, if an experiment contains a systematic error, then increasing the sample
size generally increases precision but does not improve accuracy. The end result would be a
consistent yet inaccurate string of results from the flawed experiment. Eliminating the systematic
error improves accuracy but does not change precision.
A measurement system is designated valid if it is both accurate and precise. Related terms
include bias (non-random or directed effects caused by a factor or factors unrelated to
the independent variable) and error (random variability).
The terminology is also applied to indirect measurements, that is, values obtained by a
computational procedure from observed data.
In addition to accuracy and precision, measurements may also have a measurement resolution,
which is the smallest change in the underlying physical quantity that produces a response in the
measurement.
In the case of full reproducibility, such as when rounding a number to a representable floating
point number, the word precision has a meaning not related to reproducibility. For example, in
the IEEE 754-2008 standard it means the number of bits in the significand, so it is used as a
measure for the relative accuracy with which an arbitrary number can be represented.
Quantifying accuracy and precision
Ideally a measurement device is both accurate and precise, with measurements all close to and
tightly clustered around the known value. The accuracy and precision of a measurement
process is usually established by repeatedly measuring some traceable reference standard.
Such standards are defined in the International System of Units and maintained by
national standards organizations such as the National Institute of Standards and Technology.
This also applies when measurements are repeated and averaged. In that case, the
term standard error is properly applied: the precision of the average is equal to the known
standard deviation of the process divided by the square root of the number of measurements
averaged. Further, the central limit theorem shows that the probability distribution of the
averaged measurements will be closer to a normal distribution than that of individual
measurements.
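The standard-error relationship described above can be sketched numerically; the function and variable names below are illustrative, not from the source:

```python
import math

def standard_error(process_sd, n):
    """Precision of the average: the known standard deviation of the
    process divided by the square root of the number of measurements."""
    return process_sd / math.sqrt(n)

# With a process standard deviation of 0.5 units, averaging 25
# measurements tightens the precision of the mean by a factor of 5.
print(standard_error(0.5, 25))  # 0.1
```

Quadrupling the number of averaged measurements only halves the standard error, which is why averaging alone is an inefficient way to chase precision.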
With regard to accuracy we can distinguish:
the difference between the mean of the measurements and the reference value, the bias
(establishing and correcting for bias is necessary for calibration); and
the combined effect of bias and precision.
A common convention in science and engineering is to express accuracy and/or precision
implicitly by means of significant figures. Here, when not explicitly stated, the margin of error is
understood to be one-half the value of the last significant place. For instance, a recording of
843.6 m, or 843.0 m, or 800.0 m would imply a margin of 0.05 m (the last significant place is the
tenths place), while a recording of 8,436 m would imply a margin of error of 0.5 m (the last
significant digits are the units).
A reading of 8,000 m, with trailing zeroes and no decimal point, is ambiguous; the trailing zeroes
may or may not be intended as significant figures. To avoid this ambiguity, the number could be
represented in scientific notation: 8.0 × 10³ m indicates that the first zero is significant (hence a
margin of 50 m) while 8.000 × 10³ m indicates that all three zeroes are significant, giving a
margin of 0.5 m. Similarly, it is possible to use a multiple of the basic measurement unit: 8.0 km
is equivalent to 8.0 × 10³ m. In fact, it indicates a margin of 0.05 km (50 m). However, reliance
on this convention can lead to false precision errors when accepting data from sources that do
not obey it.
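The half-last-place convention can be sketched with Python's decimal module, which records the position of a number's last significant digit; the function name is illustrative:

```python
from decimal import Decimal

def implied_margin(reading: str) -> Decimal:
    """Margin of error implied by significant figures: one-half the
    value of the last significant place of the written reading."""
    exponent = Decimal(reading).as_tuple().exponent  # place of last digit
    return Decimal(5) * Decimal(10) ** (exponent - 1)

print(implied_margin("843.6"))    # 0.05  (last place is tenths)
print(implied_margin("8436"))     # 0.5   (last place is units)
print(implied_margin("8.0E3"))    # 50    (first zero significant)
print(implied_margin("8.000E3"))  # 0.5   (all zeroes significant)
```

Note that a plain "8000" would parse with exponent 0 and yield 0.5, which is exactly the ambiguity the text describes: the string itself cannot say whether its trailing zeroes are significant.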
Looking at this in another way, a value of 8 would mean that the measurement has been made
with a precision of 1 (the measuring instrument was able to measure only down to 1s place)
whereas a value of 8.0 (though mathematically equal to 8) would mean that the value at the first
decimal place was measured and was found to be zero. (The measuring instrument was able to
measure the first decimal place.) The second value is more precise. Neither of the measured
values may be accurate (the actual value could be 9.5 but measured inaccurately as 8 in both
instances). Thus, accuracy can be said to be the 'correctness' of a measurement, while
precision could be identified as the ability to resolve smaller differences.
Precision is sometimes stratified into:
Repeatability — the variation arising when all efforts are made to keep conditions constant
by using the same instrument and operator, and repeating during a short time period; and
Reproducibility — the variation arising using the same measurement process among
different instruments and operators, and over longer time periods.
Accuracy and precision in binary classification
Accuracy is also used as a statistical measure of how well a binary classification test
correctly identifies or excludes a condition.
                    Condition as determined by gold standard
                       True                False
Test       Positive    True positive       False positive    → Positive predictive value
outcome    Negative    False negative      True negative     → Negative predictive value
                       ↓ Sensitivity       ↓ Specificity     Accuracy
That is, the accuracy is the proportion of true results (both true positives and true
negatives) in the population. It is a parameter of the test.
On the other hand, precision is defined as the proportion of the true positives against all
the positive results (both true positives and false positives).
An accuracy of 100% means that the measured values are exactly the same as the
given values.
Also see Sensitivity and specificity.
Accuracy may be determined from Sensitivity and Specificity, provided Prevalence is
known, using the equation:
accuracy = (sensitivity)(prevalence) + (specificity)(1 − prevalence)
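The identity above can be checked directly against confusion-matrix counts; the counts and function names below are made up for illustration:

```python
def accuracy_from_counts(tp, fp, fn, tn):
    """Proportion of true results (true positives + true negatives)."""
    return (tp + tn) / (tp + fp + fn + tn)

def accuracy_from_rates(sensitivity, specificity, prevalence):
    """accuracy = sensitivity*prevalence + specificity*(1 - prevalence)."""
    return sensitivity * prevalence + specificity * (1 - prevalence)

tp, fp, fn, tn = 90, 30, 10, 870
total = tp + fp + fn + tn
sensitivity = tp / (tp + fn)      # 0.9
specificity = tn / (tn + fp)      # ≈ 0.967
prevalence = (tp + fn) / total    # 0.1

print(accuracy_from_counts(tp, fp, fn, tn))                       # 0.96
print(accuracy_from_rates(sensitivity, specificity, prevalence))  # ≈ 0.96
```

Both routes give the same number, which is the point of the equation: accuracy is a prevalence-weighted mix of sensitivity and specificity.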
The accuracy paradox for predictive analytics states that predictive models with a given
level of accuracy may have greater predictive power than models with higher accuracy. It
may be better to avoid the accuracy metric in favor of other metrics such as precision
and recall.[citation needed] In situations where the minority class is more important, the
F-measure may be more appropriate, especially in situations with very skewed class
imbalance. An alternate performance measure that treats both classes with equal
importance is "balanced accuracy", the mean of sensitivity and specificity:
balanced accuracy = (sensitivity + specificity) / 2
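The accuracy paradox is easy to demonstrate on a heavily skewed class balance; the counts below are made up:

```python
def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity and specificity: weights both classes equally."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# A degenerate "classifier" that always predicts negative,
# evaluated on a problem with 1% prevalence:
tp, fp, fn, tn = 0, 0, 10, 990
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(accuracy)                           # 0.99 -- looks excellent
print(balanced_accuracy(tp, fp, fn, tn))  # 0.5  -- no better than chance
```

The raw accuracy of 99% hides the fact that every positive case is missed; balanced accuracy exposes it.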
Systematic error
Systematic errors are biases in measurement which lead to the situation where the mean of
many separate measurements differs significantly from the actual value of the measured
attribute. All measurements are prone to systematic errors, often of several different types.
Sources of systematic error include imperfect calibration of measurement instruments (zero
error), changes in the environment which interfere with the measurement process, and
imperfect methods of observation; such errors can be either zero errors or percentage errors. For
example, consider an experimenter taking a reading of the time period of a pendulum swinging
past a fiducial mark: If his stop-watch or timer starts with 1 second on the clock then all of his
results will be off by 1 second (zero error). If the experimenter repeats this experiment twenty
times (starting at 1 second each time), then there will be a percentage error in the calculated
average of his results; the final result will be slightly larger than the true
period. Distance measured by radar will be systematically overestimated if the slight slowing
down of the waves in air is not accounted for. Incorrect zeroing of an instrument leading to a
zero error is an example of systematic error in instrumentation.
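The stopwatch example can be sketched as a small simulation; the true period, the noise level, and the 1-second offset are made-up values chosen to match the text:

```python
import random

random.seed(0)
TRUE_PERIOD = 2.0  # seconds (made-up true value)
ZERO_ERROR = 1.0   # stopwatch starts at 1 s instead of 0 (zero error)

# Each of 20 readings carries random noise plus the constant zero error.
readings = [TRUE_PERIOD + ZERO_ERROR + random.gauss(0, 0.05)
            for _ in range(20)]
mean = sum(readings) / len(readings)

# Averaging shrinks the random error, but the +1 s bias stays intact:
print(round(mean, 2))               # ≈ 3.0, not 2.0
print(round(mean - ZERO_ERROR, 2))  # ≈ 2.0 once the zero error is corrected
```

This is the key asymmetry: repetition and averaging attack only the random component; the systematic component must be identified and subtracted.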
Systematic errors may also be present in the result of an estimate based on a mathematical
model or physical law. For instance, the estimated oscillation frequency of a pendulum will be
systematically in error if slight movement of the support is not accounted for.
Systematic errors can be either constant, or be related (e.g. proportional or a percentage) to the
actual value of the measured quantity, or even to the value of a different quantity (the reading of
a ruler can be affected by environment temperature). When they are constant, they are simply
due to incorrect zeroing of the instrument. When they are not constant, they can change sign.
For instance, if a thermometer is affected by a proportional systematic error equal to 2% of the
actual temperature, and the actual temperature is 200°, 0°, or −100°, the measured temperature
will be 204° (systematic error = +4°), 0° (null systematic error) or −102° (systematic error = −2°),
respectively. Thus, the temperature will be overestimated when it is above zero and
underestimated when it is below zero.
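The thermometer numbers above work out as follows; the 2% figure comes from the text, the function name is illustrative:

```python
def measured_temperature(actual, proportional_error=0.02):
    """Reading from a thermometer whose systematic error equals
    2% of the actual temperature (the error changes sign with it)."""
    return actual * (1 + proportional_error)

for actual in (200.0, 0.0, -100.0):
    reading = measured_temperature(actual)
    print(actual, "->", reading, "error:", reading - actual)
# 200.0 -> ≈204.0 (error ≈ +4), 0.0 -> 0.0 (no error),
# -100.0 -> ≈-102.0 (error ≈ -2)
```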
Constant systematic errors are very difficult to deal with, because their effects are only
observable if they can be removed. Such errors cannot be removed by repeating measurements
or averaging large numbers of results. A common method to remove systematic error is
through calibration of the measurement instrument.
In a statistical context, the term systematic error usually arises where the sizes and directions of
possible errors are unknown.
Systematic versus random error
Measurement errors can be divided into two components: random error and systematic error.[1] Random error is always present in a measurement. It is caused by inherently unpredictable
fluctuations in the readings of a measurement apparatus or in the experimenter's interpretation
of the instrumental reading. Random errors show up as different results for ostensibly the same
repeated measurement. Systematic error cannot be discovered this way because it always
pushes the results in the same direction. If the cause of a systematic error can be identified,
then it can usually be eliminated.
Drift
Systematic errors which change during an experiment (drift) are easier to detect. Measurements
show trends with time rather than varying randomly about a mean.
Drift is evident if a measurement of a constant quantity is repeated several times and the
measurements drift one way during the experiment, for example if each measurement is higher
than the previous measurement which could perhaps occur if an instrument becomes warmer
during the experiment. If the measured quantity is variable, it is possible to detect a drift by
checking the zero reading during the experiment as well as at the start of the experiment
(indeed, the zero reading is a measurement of a constant quantity). If the zero reading is
consistently above or below zero, a systematic error is present. If this cannot be eliminated, for
instance by resetting the instrument immediately before the experiment, it needs to be allowed
for by subtracting its (possibly time-varying) value from the readings, and by taking it into
account in assessing the accuracy of the measurement.
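A simple drift check fits a trend line to repeated measurements of a constant quantity; the data and the function name below are illustrative:

```python
def drift_slope(readings):
    """Least-squares slope of readings against measurement index.
    A slope far from zero suggests drift rather than random scatter."""
    n = len(readings)
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

steady = [10.02, 9.98, 10.01, 9.99, 10.00, 10.01]
drifting = [10.00, 10.05, 10.09, 10.16, 10.21, 10.24]  # e.g. warming instrument

print(drift_slope(steady))    # close to zero: random scatter about a mean
print(drift_slope(drifting))  # ≈ 0.05 per reading: a clear upward trend
```

In practice one would also test whether the slope is significant relative to the scatter, but a visibly one-way trend like the second series is the signature of drift described above.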
If no pattern in a series of repeated measurements is evident, the presence of fixed systematic
errors can only be found if the measurements are checked, either by measuring a known
quantity or by comparing the readings with readings made using a different apparatus, known to
be more accurate. For example, suppose the timing of a pendulum using an
accurate stopwatch several times gives readings randomly distributed about the mean. A
systematic error is present if the stopwatch is checked against the 'speaking clock' of the
telephone system and found to be running slow or fast. Clearly, the pendulum timings need to
be corrected according to how fast or slow the stopwatch was found to be running. Measuring
instruments such as ammeters and voltmeters need to be checked periodically against known
standards.
Systematic errors can also be detected by measuring already known quantities. For example,
a spectrometer fitted with a diffraction grating may be checked by using it to measure
the wavelength of the D-lines of the sodium electromagnetic spectrum, which are at 589.0 nm
and 589.6 nm. The measurements may be used to determine the number of lines per millimetre of
the diffraction grating, which can then be used to measure the wavelength of any other spectral
line.
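The grating calibration above rests on the diffraction-grating equation d·sin θ = m·λ; the observed angle below is a made-up value, used only to show the arithmetic:

```python
import math

def lines_per_mm(wavelength_nm, angle_deg, order=1):
    """Grating line density from d*sin(theta) = m*lambda, calibrated
    against a spectral line of known wavelength (e.g. a sodium D-line)."""
    d_nm = order * wavelength_nm / math.sin(math.radians(angle_deg))
    return 1e6 / d_nm  # 1 mm = 1e6 nm

# Calibrate against the 589.6 nm sodium D-line observed at a
# (made-up) first-order diffraction angle of 20.63 degrees:
density = lines_per_mm(589.6, 20.63)
print(round(density))  # ≈ 598 lines/mm
```

Once the line density is fixed this way, the same equation is inverted to read off the wavelength of any other spectral line from its measured angle.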