Lecture 23: Data quality and documentation By Austin Troy ------Using GIS-- Introduction to GIS

Post on 21-Dec-2015

Page 1: Lecture 23: Data quality and documentation By Austin Troy ------Using GIS-- Introduction to GIS

Lecture 23: Data quality and documentation

By Austin Troy

------Using GIS--Introduction to GIS

Page 2: Data Quality

Materials by Austin Troy © 2008

• Accuracy + Precision = Quality

• Error = fn(accuracy, precision)

• Cost vs. quality tradeoff

Page 3: Accuracy

• “The degree to which information on a map or in a digital database matches true or accepted values.” From Kenneth E. Foote and Donald J. Huebner, http://www.colorado.edu/geography/gcraft/notes/error/error_f.html

• Reflects how closely a measurement represents the actual quantity measured, and the number and severity of errors in a dataset or map.

Image source: http://oopslist.com/

Page 4: Precision

• The level of exactness of a measurement. The more precise a measurement is, the smaller the unit in which it is recorded

• Hence, a measurement down to a fraction of a cm is more precise than a measurement to the nearest cm

• However, data with a high level of precision can still be inaccurate; this is due to errors

• Each application requires a different level of precision

Page 5: Random and Systematic error

• Error can be systematic or random

• Systematic error can be rectified if discovered, because its source is understood

Image source: http://oopslist.com/

Page 6: Random and Systematic error

• Systematic errors affect accuracy but are usually independent of precision; data can be collected with highly precise methods and still be inaccurate because of systematic error

Accurate and precise: no systematic error, little random error

Inaccurate and precise: little random error but significant systematic error

Accurate and imprecise: no systematic error, but considerable random error

Inaccurate and imprecise: both types of error
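The four cases above can be reproduced with a small simulation. This is only a sketch: the true value, the size of the offset (systematic error) and the amount of scatter (random error) are made-up numbers chosen for illustration, and "precision" is used here in the sense of repeatability.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical true quantity being measured


def simulate(bias, noise_sd, n=1000, seed=0):
    """Return n simulated readings with a fixed offset (systematic error)
    and Gaussian scatter (random error) around TRUE_VALUE."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


# All bias and noise values below are illustrative assumptions.
cases = {
    "accurate, precise": simulate(bias=0.0, noise_sd=0.1),
    "inaccurate, precise": simulate(bias=2.0, noise_sd=0.1),
    "accurate, imprecise": simulate(bias=0.0, noise_sd=2.0),
    "inaccurate, imprecise": simulate(bias=2.0, noise_sd=2.0),
}

for label, readings in cases.items():
    offset = statistics.mean(readings) - TRUE_VALUE  # estimate of systematic error
    spread = statistics.stdev(readings)              # estimate of random error
    print(f"{label:22s} mean offset = {offset:+.2f}  spread = {spread:.2f}")
```

Averaging many readings shrinks the random scatter but leaves the offset untouched, which is why systematic error has to be found and corrected at its source.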

Page 7: Measurement of Accuracy

Positional accuracy is often stated as a confidence interval: e.g. 104.2 cm +/- 0.01 cm means the true value lies between 104.19 cm and 104.21 cm

One of the key measures of positional accuracy is root mean squared error (RMSE): the square root of the mean of the squared differences between the observed and expected values, i.e. RMSE = sqrt( (1/n) * sum over i of (observed_i - expected_i)^2 ) for n observations

This is just a standardized measure of error: how close the predicted measure is to the observed one
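A minimal sketch of the RMSE calculation, using four hypothetical check points (the coordinate values are invented for illustration):

```python
import math


def rmse(observed, expected):
    """Root mean squared error between paired observed and expected values."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical check points along one axis, in metres:
# surveyed ("expected") positions versus positions read from the map ("observed").
expected = [100.0, 250.0, 400.0, 550.0]
observed = [101.0, 248.0, 402.0, 549.0]

print(f"RMSE = {rmse(observed, expected):.3f} m")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.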

Page 8: Positional Accuracy

• Positional accuracy standards specify that acceptable positional error varies with scale

• Data can have a high level of precision but still be positionally inaccurate

• Positional error is inversely related to precision and to the amount of processing

Page 9: Accuracy is tied to scale

Page 10: Positional Error Standards

• Different agencies have different standards for positional error

• Example: USGS horizontal positional requirements state that 90% of all tested points must be within 1/30th of an inch of their true position, measured at map scale, for maps at scales larger than 1:20,000, and within 1/50th of an inch for maps at scales of 1:20,000 or smaller

Page 11: Positional Error Standards

• USGS accuracy standards on the ground:

1:4,800 ± 13.33 feet

1:10,000 ± 27.78 feet

1:12,000 ± 33.33 feet

1:24,000 ± 40.00 feet

1:63,360 ± 105.60 feet

1:100,000 ± 166.67 feet
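The ground distances listed above follow from the map-scale allowances of 1/30 and 1/50 of an inch. The sketch below converts one to the other. The function name is made up for illustration, and a map at exactly 1:20,000 is assumed to fall in the 1/50 inch class.

```python
def ground_tolerance_feet(scale_denominator):
    """Allowable horizontal error on the ground, in feet, for a 1:scale_denominator map."""
    # Allowance on the map sheet, in inches. Assumption: exactly 1:20,000
    # is treated as belonging to the smaller-scale (1/50 inch) class.
    map_inches = 1 / 30 if scale_denominator < 20000 else 1 / 50
    ground_inches = map_inches * scale_denominator
    return ground_inches / 12  # 12 inches per foot


for denominator in (4800, 10000, 12000, 24000, 63360, 100000):
    print(f"1:{denominator:,} ± {ground_tolerance_feet(denominator):.2f} feet")
```

For example, at 1:24,000 the allowance of 1/50 inch covers 24,000 / 50 = 480 inches on the ground, which is 40 feet.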

See image from U. Colorado showing accuracy standards visually

Hence, a point on a map represents the center of a spatial probability distribution of its possible locations

Thanks to Kenneth E. Foote and Donald J. Huebner, The Geographer's Craft Project, Department of Geography, The University of Colorado at Boulder for links

Page 12: Attribute Accuracy

• Attribute accuracy and precision refer to the quality of non-spatial attribute data

• Precision for numeric data means recording more digits

• Example: recording income down to cents, rather than just dollars

• Quantitative measurement errors: e.g. truncation

• A common error is to measure a phenomenon in only one phase of a temporal cycle: bird counts, river flows, average weather metrics, soil moisture
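The temporal-cycle problem can be illustrated with a toy seasonal signal. The flow model below is entirely hypothetical (a cosine curve peaking in April), chosen only to show how sampling one phase of the cycle biases the result.

```python
import math


def monthly_flow(month):
    """Hypothetical river flow (m^3/s) with a seasonal cycle peaking in April."""
    return 50.0 + 30.0 * math.cos(2 * math.pi * (month - 4) / 12)


all_months = range(1, 13)
summer_only = (7, 8, 9)  # field work done only in late summer

annual_mean = sum(monthly_flow(m) for m in all_months) / 12
summer_mean = sum(monthly_flow(m) for m in summer_only) / len(summer_only)

print(f"mean over full year:   {annual_mean:.2f}")
print(f"mean over summer only: {summer_mean:.2f}")
```

The summer-only figure understates the annual mean, and no amount of measurement precision within those three months would reveal it.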

Page 13: Categorical Attributes

• Accuracy refers to the amount of misclassification of categorical data

• The chance of misclassification grows as the number of possible classes increases; accuracy is a function of precision, i.e. the number of classes

• Classifying only as “land” or “water” is not very precise, but it is also not likely to result in an error
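One common way to quantify misclassification is a confusion matrix of reference versus mapped classes. The counts below are invented for illustration. The second half merges two classes to show how a coarser scheme reduces the scope for error.

```python
# Rows = reference (true) class, columns = mapped class. Counts are hypothetical.
classes = ["water", "forest", "urban"]
confusion = [
    [48, 2, 0],   # true water
    [3, 40, 7],   # true forest
    [1, 9, 40],   # true urban
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
overall_accuracy = correct / total

# Collapse forest and urban into a single "land" class (indices 1 and 2).
land = (1, 2)
coarse = [
    [confusion[0][0], sum(confusion[0][j] for j in land)],
    [sum(confusion[i][0] for i in land),
     sum(confusion[i][j] for i in land for j in land)],
]
coarse_accuracy = (coarse[0][0] + coarse[1][1]) / total

print(f"three classes: {overall_accuracy:.3f}")
print(f"land / water:  {coarse_accuracy:.3f}")
```

Most of the errors in this made-up matrix are forest/urban confusion, so they vanish once the two are merged. That gain in accuracy comes at the cost of categorical precision.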

Page 14: Other measures of data quality

• Logical consistency

• Completeness

• Data currency/timeliness

• Accessibility

• These apply to both attribute and positional data

Image source: http://oopslist.com/

Page 15: Example of Currency and Timeliness

Page 16: Some common sources of error

• Numerical processing (math operations, data type, rounding, etc.)

• Geocoding (e.g. rural address matching and street interpolation)

• Topological errors from digitizing (overshoots, dangling nodes, slivers, etc.)

• Automated classification steps, such as unsupervised or supervised land cover classification in remote sensing, can result in processing errors
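As one concrete case of a data-type error, the sketch below stores a projected coordinate in a 32-bit float. The coordinate value is hypothetical. The point is only that single-precision storage cannot hold sub-metre detail at this magnitude.

```python
import struct


def as_float32(value):
    """Round-trip a Python float (64-bit) through 32-bit floating-point storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


northing = 4923456.789  # hypothetical projected coordinate, in metres
stored = as_float32(northing)

print(f"original: {northing:.3f}")
print(f"stored:   {stored:.3f}")
print(f"error:    {abs(stored - northing):.3f} m")
```

For values in the millions, neighbouring 32-bit floats are half a unit apart, so every coordinate is silently snapped to the nearest 0.5 m.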

Page 17: Error propagation and cascading

• Propagation: where one error leads to another

• Cascading: when errors are allowed to propagate unchecked from one layer to the next and on to the final set of products or recommendations

• Cascading error can be managed to a certain extent by conducting “sensitivity analysis”
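One simple form of sensitivity analysis is to perturb an input within its assumed error and count how often the final answer changes. The sketch below does this for a made-up suitability rule (slope below 15%). The rule, the slope values and the error size are all assumptions for illustration.

```python
import random


def suitable(slope_percent, threshold=15.0):
    """Hypothetical decision rule: a site is suitable if its slope is below the threshold."""
    return slope_percent < threshold


def flip_rate(measured_slope, error_sd, runs=10000, seed=1):
    """Fraction of perturbed runs in which the decision differs from the unperturbed one."""
    rng = random.Random(seed)
    baseline = suitable(measured_slope)
    flips = sum(
        suitable(measured_slope + rng.gauss(0.0, error_sd)) != baseline
        for _ in range(runs)
    )
    return flips / runs


for slope in (5.0, 14.0):
    rate = flip_rate(slope, error_sd=2.0)
    print(f"measured slope {slope:4.1f}%: decision changes in {rate:.1%} of runs")
```

A site far from the threshold is robust to the input error, while one near the threshold is not. Running this check at each step shows where error is most likely to cascade into the final product.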

Image source: http://oopslist.com/

Page 18: Conflation

• Used when one layer is better in one way, another is better in another, and you wish to get the best of both

• A way of reconciling the best geometric and attribute features from two layers into a new one

• Very commonly used where one layer has better attribute accuracy or completeness and another has better geometric accuracy or resolution

• Also used where a newer layer is produced for some theme but has lower resolution than the older one

Page 19: Two general types of Conflation

• Attribute conflation: transferring attributes from an attribute-rich layer to features in an attribute-poor layer

• Feature conflation: improvement of features in one layer based on coordinates and shapes in another, often called rubber sheeting. The user either transforms all features or specifies certain features to be kept fixed

Page 20: Attribute conflation

• The more spatially accurate layer is referred to as the base, coordinate or target layer

• The layer with more accurate attribution is referred to as the reference, or non-base, layer

• TIGER line files have good attribution but poor positional accuracy; USGS DLGs are the opposite. Attribute conflation is frequently used by third-party vendors to assign the rich attribute data of TIGER to the positionally accurate DLGs. Nodes are matched by iteratively rubber sheeting the reference layer to the base layer until matching nodes fall within a certain tolerance. Then line features are matched up.
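The node-matching step can be sketched as a nearest-neighbour search within a tolerance. This is a deliberately simplified illustration, not how any particular vendor does it. It skips the iterative rubber sheeting entirely, and the node coordinates, attribute name and tolerances are all made up.

```python
import math

# Base (positionally accurate) nodes: id -> (x, y). Hypothetical values.
base_nodes = {"b1": (100.0, 200.0), "b2": (350.0, 410.0), "b3": (900.0, 900.0)}

# Reference (attribute-rich) nodes with slightly different positions. Hypothetical values.
reference_nodes = [
    {"xy": (103.0, 198.0), "street": "Main St"},
    {"xy": (346.0, 415.0), "street": "Elm St"},
]


def conflate_attributes(base, reference, tolerance):
    """Copy 'street' from the nearest reference node within `tolerance`
    (map units) onto each base node; unmatched base nodes get None."""
    result = {}
    for base_id, (bx, by) in base.items():
        best_attr, best_dist = None, tolerance
        for ref in reference:
            rx, ry = ref["xy"]
            dist = math.hypot(bx - rx, by - ry)
            if dist <= best_dist:
                best_attr, best_dist = ref["street"], dist
        result[base_id] = best_attr
    return result


print(conflate_attributes(base_nodes, reference_nodes, tolerance=10.0))
print(conflate_attributes(base_nodes, reference_nodes, tolerance=5.0))
```

Tightening the tolerance leaves more base nodes unmatched. In a real workflow, the rubber-sheeting passes pull the reference layer closer first so that more nodes fall inside it.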

Page 21: Conflation examples

Source: Stanley Dalal, GIS cafe

Page 22: Documentation and Metadata

• To avoid many of these errors, good documentation of source data is needed

• Metadata is data documentation, or “data about data”

• Ideally, the metadata describes the data according to federally recognized standards of accuracy

• Almost all state, local and federal agencies are required to provide metadata with the geodata they make

Page 23: Documentation and Metadata

• Metadata usually include sections similar to these

Page 24: Documentation and Metadata

• The Federal Geographic Data Committee (FGDC) is a federal entity that developed a “Content Standard for Digital Geospatial Metadata” in 1998, which is a model for all spatial data users to follow

• Its purpose is “to provide a common set of terminology and definitions for the documentation of digital geospatial data.”

• All federal agencies are required to use these standards

Page 25: Documentation and Metadata

• Some roles of metadata

1. Information retrieval, cataloguing, querying and searching for data electronically

2. Describing fitness for use and documenting the usability and quality of data

3. Describing how to transfer, access or process data

4. Documenting all relevant characteristics of data needed to use it

Page 26: Documentation and Metadata

• Critical components usually break down into:

  • Dataset identification, overview

  • Data quality

  • Spatial reference information

  • Data definition

  • Administrative information

  • Meta metadata

Page 27: Documentation and Metadata

• Data identification, overview and administrative info:

  • General info: name and brief ID of dataset and owner organization, geographic domain, general description/summary of content, data model used to represent spatial features, intent of production, language used, reference to more detailed documents, if applicable

  • Constraints on access and use

  • This is usually where info on currency is found

Page 28: Documentation and Metadata

• Data quality should address:

  • Positional accuracy

  • Attribute accuracy

  • Logical consistency

  • Completeness

  • Lineage

  • Processing steps

Page 29: Documentation and Metadata

• Spatial reference should include:

  • Horizontal coordinate system (e.g. State Plane)

  • Includes projection used, scale factors, longitude of central meridian, latitude of projection origin, distance units

  • Geodetic model (e.g. NAD 83), ellipsoid, semi-major axis

Page 30: Documentation and Metadata

• Data definition, also known as “Entity and Attribute Information,” should include:

  • Entity types (e.g. polygon, raster)

  • Information about each attribute, including label, definition, domain of values

  • Sometimes it will include a data dictionary, or description of attribute codes; sometimes it will instead reference a separate document with those codes if they are too long and complex

Page 31: Documentation and Metadata

• Data distribution info usually includes:

  • Name, address, phone, email of contact person and organization

  • Liability information

  • Ordering information, including online and ordering by other media; usually includes fees

Page 32: Documentation and Metadata

• Metadata reference, or meta-metadata

• This is data about the metadata

• Contains information on:

  • When the metadata was updated

  • Who made it

  • What standard was used

  • What constraints apply to the metadata

Page 33: Metadata in ArcGIS

• ArcGIS allows you to display, import and export metadata in a variety of metadata formats

• It defaults to the FGDC ESRI format, which looks like:

Page 34: Metadata in ArcGIS

• XML is the most flexible form because its tag structure allows it to be used in programming; tags can be called as variables or can be created through form interfaces; it allows for compatibility across platforms and programs
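To show what "tags can be called as variables" might look like in practice, the sketch below reads a few elements from a small metadata record with Python's standard XML parser. The record is hand-written for illustration and is not taken from any real dataset. The tag names follow the short element names of the FGDC content standard as I understand them, so treat them as assumptions and check them against an actual metadata file.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical record; tag names assumed to follow FGDC short names.
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Hypothetical County GIS Office</origin>
      <title>Example road centerlines</title>
    </citeinfo></citation>
  </idinfo>
  <dataqual>
    <posacc><horizpa>
      <horizpar>Placeholder horizontal accuracy statement</horizpar>
    </horizpa></posacc>
  </dataqual>
  <metainfo>
    <metstdn>FGDC Content Standard for Digital Geospatial Metadata</metstdn>
  </metainfo>
</metadata>"""

root = ET.fromstring(SAMPLE)

# Pull individual tags out by path, much like reading named variables.
title = root.findtext("idinfo/citation/citeinfo/title")
horizontal_accuracy = root.findtext("dataqual/posacc/horizpa/horizpar")
standard = root.findtext("metainfo/metstdn")

# Report which of the main sections are absent from this record.
SECTIONS = {
    "idinfo": "Identification",
    "dataqual": "Data quality",
    "spref": "Spatial reference",
    "eainfo": "Entity and attribute",
    "distinfo": "Distribution",
    "metainfo": "Metadata reference",
}
missing = [name for tag, name in SECTIONS.items() if root.find(tag) is None]

print("Title:", title)
print("Horizontal accuracy:", horizontal_accuracy)
print("Standard:", standard)
print("Missing sections:", ", ".join(missing))
```

The same approach scales to a folder of metadata files, for example to flag every dataset that lacks a data quality or spatial reference section.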

Page 35: Metadata in ArcGIS

• In the past, complete metadata was only available as text; you had to create most embedded metadata tags yourself. Today many state and nationwide datasets come with complete embedded metadata, including full attribute codes

• E.g. NED, NLCD, all VCGI data

Page 36: Metadata in ArcGIS

• Can import, edit and export metadata in multiple formats, which helps with proper sharing of data

• Can also make templates to save time in repeat documentation of big data sets

• See NPS metadata extension for cool utilities