Quality Control Standards for SEADATANET: Outline of Quality Control Document

Upload: chi

Post on 14-Jan-2016


OUTLINE OF QUALITY CONTROL DOCUMENT

• Introduction

• Why is quality control needed?

• Information to accompany data

• Automatic checks

• “Scientific” quality control

• CTD (temperature and salinity)

• Current meter data (including ADCP)

• Wave data

• Sea level

• Biological data, etc.

• Quality flags

• Documentation

Quality Control Standards for SEADATANET


Data quality control has the following objective:

“To ensure the data consistency within a single data set and within a collection of data sets and to ensure that the quality and errors of the data are apparent to the user who has sufficient information to assess its suitability for a task.” (IOC/CEC Manual, 1993)

Quality control, if done well, brings about a number of key advantages:

• Maintaining standards

• Consistency

• Reliability


For all types of data, information is required about:

• Where the data were collected: location (preferably as latitude and longitude) and depth/height

• When the data were collected (date and time in UTC or clearly specified local time zone)

• How the data were collected (e.g. sampling methods, instrument types, analytical techniques)

• How the data are referenced (e.g. station numbers, cast numbers)

• Who collected the data, including name and institution of the data originator(s) and the principal investigator

• What has been done to the data (e.g. details of processing and calibrations applied, algorithms used to compute derived parameters)

• Comments for other users of the data (e.g. problems encountered and comments on data quality)


Data Collection Details: example 1

Biological Net Tow (Plankton)

• Project, ship, cruise identifier

• Country, organisation

• Date, time, latitude and longitude (for start and end if sampling via a net tow)

• Sounding, maximum and minimum pressure or depth of tow

• Description of operational procedures such as tow orientation (vertical, horizontal or oblique), methods of position fixing (e.g. DGPS, GPS, etc.)

• Weather conditions (including sun and wind)

• Gear type (e.g. net mesh size, net mouth size, single or multi-net, etc.)

• Sample preservation method (e.g. pickling, frozen, etc.)

• Sample analysis/processing or data collection procedures (e.g. filtered size ranges, sub-sampling, etc.)

• Any additional information of use to secondary users which may have affected the data or have a bearing on its subsequent use


Data Collection Details: example 2

Shipboard ADCP

• Project, ship, cruise identifier

• Country, organisation

• Details of the instrument and sensors (e.g. manufacturer, instrument type, model number, serial number and any modifications carried out, number of transducers)

• Description of operational procedures including sampling interval (time between ensembles), pings per ensemble, bin size, number of bins, bottom tracking on/off, pitch and roll on/off, percentage good level, method of position fix (e.g. GPS, DGPS), automated data rejection (e.g. fish rejection algorithms), etc.

• Frequency (kHz), band type (broad, narrow)

• Date and time of the start and end of the profiles for each data file

• Any additional information of use to secondary users which may have affected the data or have a bearing on its subsequent use.


Parameter Details

• Parameters measured (refer to the BODC Parameter Usage Vocabulary if necessary for help with parameter definitions)

Data Processing Details

• Originator's data format

• Description of calibrations

• Description of any data processing that has occurred (manufacturer's and in-house)


AUTOMATIC QUALITY CONTROL CHECKS


Basic automatic checks for all data types

Date and time of an observation have to be valid

• Year 4 digits

• Month between 1 and 12

• Day in range expected for month

• Hour between 0 and 23

• Minute between 0 and 59

Latitude and longitude have to be valid

• Latitude in range -90 to 90

• Longitude in range -180 to 180

Position must not be on land

• Observation latitude and longitude located in ocean

• For example, use 5-minute bathymetry (e.g. ETOPO5)
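The basic checks above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are invented here, and the land test takes a caller-supplied `elevation_at(lat, lon)` lookup (a hypothetical stand-in for a gridded bathymetry such as ETOPO5) rather than reading any real data set.

```python
from datetime import datetime


def valid_datetime(year, month, day, hour=0, minute=0):
    """Return True if the fields describe a real date and time with a 4-digit year."""
    if not 1000 <= year <= 9999:  # year must have four digits
        return False
    try:
        # datetime() rejects month 13, 30 February, hour 24, minute 60, etc.
        datetime(year, month, day, hour, minute)
    except ValueError:
        return False
    return True


def valid_position(lat, lon):
    """Return True if latitude and longitude are within their allowed ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def on_land(lat, lon, elevation_at):
    """Return True if the position is above sea level.

    elevation_at is a hypothetical callable (lat, lon) -> metres, positive
    upwards, that the caller backs with a bathymetry grid of their choice.
    """
    return elevation_at(lat, lon) > 0.0
```

For example, `valid_datetime(2015, 2, 29)` is rejected because 2015 is not a leap year, while `valid_position(45.0, -180.0)` is accepted because the range limits are inclusive.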


Further automatic checks

Impossible speed

• Tests for acceptable speed between stations

Spike

• Tests salinity and temperature data for large differences between adjacent values (other parameters also)

Gradient

• Tests whether the gradient between vertically adjacent salinity and temperature measurements is too steep

Density inversion

• Tests for cases where the calculated density at a higher pressure in a profile is less than the calculated density at an adjacent lower pressure

Pressure increasing

• Tests that pressures in the profile increase monotonically
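Several of these profile tests reduce to short comparisons between neighbouring values. The sketch below is one possible formulation, under stated assumptions: the spike statistic follows the three-point form used in GTSPP/Argo-style real-time tests, thresholds are left to the caller, densities are assumed to be computed elsewhere, and the speed test uses a spherical-Earth haversine distance. All function names are illustrative.

```python
import math


def spike_indices(values, threshold):
    """Indices whose three-point spike statistic exceeds threshold.

    Statistic: |v2 - (v3 + v1)/2| - |(v3 - v1)/2| for neighbours v1, v2, v3.
    """
    flagged = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        test = abs(cur - (nxt + prev) / 2) - abs((nxt - prev) / 2)
        if test > threshold:
            flagged.append(i)
    return flagged


def pressure_increasing(pressures):
    """True if every pressure is strictly greater than the one before it."""
    return all(b > a for a, b in zip(pressures, pressures[1:]))


def density_inversions(densities, tolerance=0.0):
    """Indices where density is lower than at the level above.

    Assumes densities are ordered by increasing pressure.
    """
    return [i for i in range(1, len(densities))
            if densities[i] < densities[i - 1] - tolerance]


def speed_between(lat1, lon1, lat2, lon2, hours):
    """Mean speed in km/h between two stations (haversine, Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a)) / hours
```

A station pair whose implied speed exceeds what the platform could achieve would then be reported for review; the limit itself depends on the ship or float and is not given in the slides.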


Global range

• Tests that observed temperature and salinity values are within the expected extremes encountered in the oceans

Regional range

• Tests that observed temperature and salinity values are within the expected extremes encountered in particular regions

Deepest pressure

• Tests that profile does not contain pressures higher than the highest value expected

Check for duplicates

• Cruises or stations within a cruise using a space-time radius (e.g. for duplicate cruises: 1 mile, 15 min, or 1 day if time is unknown)
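A range test and a space-time duplicate test might look like the following sketch. The limits in `GLOBAL_LIMITS` are placeholders of the kind used in Argo-style real-time tests and should be replaced by the values a data centre actually adopts; the "1 mile" radius is assumed here to be a nautical mile (1.852 km), which the slides do not specify. Regional checks would pass a different limits table to the same function.

```python
import math
from datetime import datetime, timedelta

# Placeholder limits (assumed for illustration, not taken from the slides).
GLOBAL_LIMITS = {"temperature": (-2.5, 40.0), "salinity": (2.0, 41.0)}


def within_limits(parameter, value, limits=GLOBAL_LIMITS):
    """True if value lies inside the (low, high) limits for the parameter."""
    low, high = limits[parameter]
    return low <= value <= high


def distance_km(lat1, lon1, lat2, lon2):
    """Haversine distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def possible_duplicate(station_a, station_b, max_km=1.852,
                       max_dt=timedelta(minutes=15)):
    """True if two (lat, lon, datetime) stations fall within the space-time radius."""
    lat1, lon1, t1 = station_a
    lat2, lon2, t2 = station_b
    close_in_space = distance_km(lat1, lon1, lat2, lon2) <= max_km
    close_in_time = abs(t1 - t2) <= max_dt
    return close_in_space and close_in_time
```

When the observation time is unknown, the same function can be called with `max_dt=timedelta(days=1)` and the date at midnight, matching the looser radius quoted above.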


SCIENTIFIC QUALITY CONTROL CHECKS


Visual inspection of data

• Pressure/depth series (e.g. CTD)

• Property-property plot

• Time series (e.g. current meter, sea level)

• Scatter plot (e.g. current meter)

• Map covering the locations of series

• Ensure that data are free from instrument-generated spikes, gaps, spurious data at the start and end of the record and other irregularities

• Apply quality flags

• Quality flags do not change the data

• Visual inspection can be subjective, dependent on experience


Spike detection

• Can require visual inspection to back up automatic check

Range check

• Comparison with pre-existing climatological statistics

• Compare data collected on same cruise and in same area


Figure: example of a current meter time series with a possible rotor problem. Panels: current speed (m/s), north velocity component (m/s), east velocity component (m/s), current direction (°); temperature (°C) not shown.


Figure (above): example of a ‘good’ scatter plot

Figure (below): example of a record with suspect directions


Common Problems Associated with Current Meters (1)

• Rotor turns but rotations are not registered, because of a breakdown of the magnetic coupling between the rotor and follower or a failure of the reed switch.

• Rotor not turning due to fouling with weed or similar debris. This results in a sudden drop in speed to zero or near zero.

• Directions not being resolved. This could result from a stiff meter suspension or a meter being fouled by its mooring wire.

• Compass sticking. This may occur if the meter is inclined too far from the horizontal plane and can be a problem in fast tidal streams when in-line instruments are used. This is commonly known as ‘mooring knockdown’. It is seen in the data as a frequent recurrence of a single direction value or a narrow range of directions.

• Worn compass. This causes some directions to become repetitive.


Common Problems Associated with Current Meters (2)

• Non-linearity of compass. This is usually picked up from the scatter plot of u and v velocity components.

• Sticking encoder pins. This causes spikes in all parameters and is often manifested by the appearance of the value of the pin(s) in the listing (e.g. 0, 256, 512, 768 or 1023).

• Underrated power supply. This often shows in the compass channel first because of the extra current drain during clamping.

• Electronic failure (e.g. dry joints, broken circuitry). This does not always produce a total loss of data, however.

• Poor quality recording tape. This is indicated by the appearance of suspect data at regular intervals in all parameters.

• Sensor drift. This is a slow change in the response of the sensor.
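Two of the symptoms listed above lend themselves to simple screening before visual inspection: raw counts equal to an encoder pin value, and a direction record dominated by a single value. The sketch below is illustrative only; the pin values are the examples quoted in the text, and what share of repeated directions counts as suspicious is left to the analyst.

```python
from collections import Counter

# Example raw counts quoted in the text for sticking encoder pins.
PIN_VALUES = frozenset({0, 256, 512, 768, 1023})


def pin_value_indices(raw_counts, pin_values=PIN_VALUES):
    """Indices of samples whose raw count equals one of the pin values."""
    return [i for i, count in enumerate(raw_counts) if count in pin_values]


def dominant_direction_share(directions):
    """Return (direction, share) for the most frequent direction value.

    A share close to 1 suggests a sticking compass or mooring knockdown.
    """
    value, count = Counter(directions).most_common(1)[0]
    return value, count / len(directions)
```

Such screens only nominate records for inspection; a genuinely steady current would also produce a high dominant share.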


Scatter plot of wave height against (zero up-crossing or crest) period

Wave heights (maximum in pink; significant in blue) are acceptable for range, distribution (basically normal) and steepness (all < 5%)
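A steepness figure like the one quoted can be reproduced from height and period pairs. The sketch below assumes deep-water linear wave theory, with wavelength L = gT²/(2π) and steepness defined as H/L; the slides do not say which definition or period was used, so both are assumptions. The 5% limit is the one quoted in the caption.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def deep_water_steepness(height_m, period_s):
    """Wave steepness H/L using the deep-water wavelength L = g*T^2 / (2*pi)."""
    wavelength = G * period_s ** 2 / (2 * math.pi)
    return height_m / wavelength


def too_steep(height_m, period_s, limit=0.05):
    """True if the steepness exceeds the limit (default 5%)."""
    return deep_water_steepness(height_m, period_s) > limit
```

A 2 m wave with a 6 s period passes this check, while a 5 m wave with a 5 s period does not.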


1-Dimensional and Directional Wave Spectra

• Check slope of energy density spectrum – should follow a set slope due to transfer of energy from lower to higher frequencies (?)

• Check that energy in the spectrum at frequencies below 0.04 Hz is not more than 5% of the total spectral energy

• Check that energy in the spectrum at frequencies above 0.6 Hz is not more than 5% of the total spectral energy

• Check mean direction at high frequencies, which should correspond to the wind direction (assuming coincident meteorological data).

• For 1D spectra, calculate zeroth spectral moment from spectral variance densities and check that it corresponds to the given value

• For 1D spectra, calculate Te as the first negative spectral moment divided by the zeroth (Te = m₋₁/m₀) and check that it correlates with (peak or zero up-crossing) period
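The moment-based checks can be illustrated with a trapezoidal integration over the spectral bins. This is a sketch under assumptions: the spectrum is given as matching lists of positive frequencies (Hz) and variance densities, the energy period uses the usual definition Te = m₋₁/m₀, and band energies are taken from whole bins lying inside the band, which is only approximate when a cut-off frequency falls between bin edges.

```python
def spectral_moment(freqs, densities, order, f_min=0.0, f_max=float("inf")):
    """Trapezoidal estimate of the integral of f**order * S(f) over [f_min, f_max].

    Only bins lying wholly inside the band are included. Frequencies must be
    positive when order is negative.
    """
    total = 0.0
    for i in range(len(freqs) - 1):
        f0, f1 = freqs[i], freqs[i + 1]
        if f0 < f_min or f1 > f_max:
            continue
        y0 = f0 ** order * densities[i]
        y1 = f1 ** order * densities[i + 1]
        total += 0.5 * (y0 + y1) * (f1 - f0)
    return total


def energy_fraction(freqs, densities, f_min=0.0, f_max=float("inf")):
    """Share of the total spectral energy (m0) found between f_min and f_max."""
    m0 = spectral_moment(freqs, densities, 0)
    return spectral_moment(freqs, densities, 0, f_min, f_max) / m0


def energy_period(freqs, densities):
    """Energy period Te = m(-1) / m(0), in seconds."""
    return (spectral_moment(freqs, densities, -1)
            / spectral_moment(freqs, densities, 0))
```

The two energy checks in the list then become `energy_fraction(f, s, f_max=0.04) <= 0.05` and `energy_fraction(f, s, f_min=0.6) <= 0.05`, and the computed m₀ can be compared with the value supplied alongside the spectrum.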


Sea Level Data

• Harmonic analysis - generate predictions

• Calculate residuals

• Spikes

• Constant values

• Clock malfunctions

• Gap filling

• Reference changes

• Calculation of statistics
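Once tidal predictions exist, several of these steps are simple to express. The sketch below takes the predictions as given (the harmonic analysis itself is outside its scope) and shows residual calculation, a crude residual-size screen for spikes, and detection of constant-value runs. The function names, the residual limit and the minimum run length are assumptions for illustration.

```python
def residuals(observed, predicted):
    """Observed minus predicted sea level, element by element."""
    return [o - p for o, p in zip(observed, predicted)]


def large_residual_indices(observed, predicted, limit):
    """Indices where the absolute residual exceeds limit (candidate spikes)."""
    return [i for i, r in enumerate(residuals(observed, predicted))
            if abs(r) > limit]


def constant_runs(values, min_length=3):
    """Inclusive (start, end) index pairs of runs of identical values.

    Runs shorter than min_length are ignored; long runs may indicate a
    stuck gauge.
    """
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = i
    return runs
```

A residual series that drifts steadily away from zero, rather than spiking, would point towards a clock malfunction or reference change and needs separate treatment.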


Biological data quality control

COPEPOD: A Global Plankton Database (2005)

• Plankton data are variable by nature, influenced by numerous physical and biological events.

• Unlike temperature or salinity values, no tight range of typical values that one can use to easily qualify or disqualify these data.

• Plankton values greatly affected by size of net mesh and depth of tow.

• Very basic value range and statistical techniques to look for anomalous or non-representative data.

• The variety of original units does not allow easy inter-comparison of the data, so a Common Baseunit Value (CBV) was calculated

• Biological Grouping Code (BGC) identifies the plankton taxa’s membership in up to four groupings


Biological data – range checks

• CBV and BGC are used together to perform broad, taxonomic group-based value range checks

• A single range (for the entire world ocean) was used for the major and minor taxonomic groups.

• Future work will divide these ranges into smaller taxonomic sub-groups and individual oceanographic basins or regions, allowing for tighter range checks

• Value ranges very general and encompass the effects of:

  • Different mesh sizes

  • Day versus night sampling

  • Presence of smaller life stages (“number of adults” vs. “number of adults + juveniles”)

• Ranges will be adjusted as new data and better techniques are added to the database

• New ranges, as well as ranges for additional plankton sub-groups, will be available online.


Biological data - statistical checks

• Used to search for questionable values

• Not used to automatically flag values

• For each BGC group, mean and standard deviation calculated based on all observations present in database

• Individual observations >5 standard deviations from mean investigated on a case-by-case basis

• Natural variability may account for many “outliers”, but the method helped identify extreme values caused by misinterpreted units or typographic errors

• In many cases, values off by a factor of 1000

• Readily detected by these simple statistical checks
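The statistical screen described above amounts to a per-group standard-deviation test whose output is a list of values to investigate, not a list of values to flag. A minimal sketch, assuming the observations for one BGC group are already expressed in common units (CBV) and held in a plain list; the function name is illustrative.

```python
from statistics import mean, stdev


def candidates_for_review(values, n_sigma=5.0):
    """Indices of values more than n_sigma sample standard deviations from the mean.

    The result is a list to investigate case by case; nothing is flagged
    automatically.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > n_sigma * sigma]
```

Because an extreme value inflates both the mean and the standard deviation, the test only becomes sensitive once a group holds a few dozen observations; in a group of 100, a single value 1000 times too large lies roughly 10 standard deviations from the mean.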


SeaDataNet quality control flags

Flag  Short description

0     No quality control

1     The value appears to be correct

2     The value appears to be probably good

3     The value appears probably bad

4     The value appears erroneous

5     The value has been changed

6     Below detection limit

7     In excess of quoted value

8     Interpolated value

9     Missing value

A     Incomplete information

Based on IGOSS/UOT/GTSPP & Argo quality flags
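In software, the flag scheme is naturally a lookup table kept alongside the data, so that flagging never alters the measured values. The sketch below uses the flag codes listed above with shortened descriptions; the helper name `attach_flags` is invented for illustration.

```python
SEADATANET_FLAGS = {
    "0": "No quality control",
    "1": "Appears correct",
    "2": "Probably good",
    "3": "Probably bad",
    "4": "Appears erroneous",
    "5": "Value has been changed",
    "6": "Below detection limit",
    "7": "In excess of quoted value",
    "8": "Interpolated value",
    "9": "Missing value",
    "A": "Incomplete information",
}


def attach_flags(values, flags):
    """Pair each value with its flag, leaving the values themselves unchanged."""
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    unknown = set(flags) - set(SEADATANET_FLAGS)
    if unknown:
        raise ValueError(f"unknown flag(s): {sorted(unknown)}")
    return list(zip(values, flags))
```

For example, `attach_flags([12.1, 99.9], ["1", "4"])` keeps the suspect reading 99.9 in the record and marks it as erroneous instead of deleting it.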


Data Documentation

• Comprehensive documentation to accompany the data

• All data sets need to be fully documented to ensure they can be used in the future without ambiguity or uncertainty

• Compiled using:

  • information supplied by the data originator (e.g. data reports, comments on data quality)

  • any further information gained during QC

• Includes: instrument details, mooring details, data quality, calibration and processing carried out by the data originator and data centre processing and quality control


REFERENCES

• NODC procedures (e.g. France, Greece, Italy, Norway, Spain, Sweden, UK)

• EU MEDAR-MEDATLAS procedures and SCOOP software

• EU SIMORC project (Met-ocean data QC)

• EU ESEAS (sea level) and IOC GLOSS documents

• Manual of Quality Control Procedures for Validation of Oceanographic Data, UNESCO, IOC Manuals and Guides No. 26, 1993

• GTSPP QC (IOC Manuals and Guides No. 22)

• Argo Quality Control Manual (Real Time and Delayed Mode)

• GOSUD Real-time quality control

• IODE’s OceanTeacher

• ICES WG Marine Data Management Data Type Guidelines

• JPOTS Manual, 1991

• WOCE manuals

• JGOFS Protocols

• World Ocean Database Quality Control documentation

• TOGA/COARE Handbook of Quality Control Procedures for Surface Meteorology Data

• BODC-WOCE Sea Level Data Assembly Centre Quality Assessment

• AODC Quality Control Cookbook for XBT Data

• Chapman, A. D. 2005. Principles and Methods of Data Cleaning – Primary Species and Species-Occurrence Data, version 1.0.

• Chapman, A. D. 2005. Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen.

• ‘Ocean biodiversity informatics’: a new era in marine biology research and management (Mark J. Costello, Edward Vanden Berghe)

• QARTOD (Quality Assurance of Real-Time Oceanographic Data)