Data Quality Issues - Chapter 10
GIGO: garbage in, garbage out
Quality Issues
– Terminology
– Sources, propagation, and management
What is Data Quality?
– Overall fitness or suitability of data for a specific purpose
Errors, Accuracy, Precision, & Bias
Errors
– Difference between the real world and the GIS
– Could be one error, or the whole thing is off
Accuracy
– Extent to which an estimated value approaches a true value
– Can never be 100% accurate
Precision
– Recorded level of detail
Errors, Accuracy, Precision, & Bias
Bias
– Consistent error throughout the data set
– Human, equipment
– Difficult to spot
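The difference between these terms can be illustrated with a few numbers. The sketch below is only an illustration: the function name, the readings, and the "true" value are hypothetical, and the spread statistic is a separate measure of scatter, not the slide's sense of precision (recorded level of detail).

```python
import math
import statistics

def error_summary(measured, true_value):
    """Summarize repeated measurements of one quantity whose true value is known."""
    errors = [m - true_value for m in measured]
    return {
        # Bias: the consistent offset shared by all readings.
        "bias": statistics.mean(errors),
        # Spread: how much the readings scatter around their own mean.
        "spread": statistics.pstdev(measured),
        # RMSE: one overall accuracy figure combining bias and spread.
        "rmse": math.sqrt(statistics.mean([e * e for e in errors])),
    }

# Hypothetical readings (metres) of a point whose true easting is 100.0 m.
readings = [102.0, 102.1, 101.9, 102.0]
summary = error_summary(readings, 100.0)
print(summary)

# Precision in the slide's sense is the recorded level of detail.
# Five decimal places is very precise, yet the value is still about 2 m off.
recorded = round(102.048713, 5)
print(recorded, "differs from the true value by", round(recorded - 100.0, 5))
```

The readings agree closely with one another, so the bias of about 2 m would be hard to spot without an independent reference value.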
Resolution
Smallest feature or data that can be displayed
Raster: cell size
Vector: point size, line widths
Generalization
Process of simplifying
Completeness & Consistency
Completeness
– Are all instances of a feature the GIS/map claims to include, in fact, there?
– Simply put, how much data is missing?
Logical Consistency
– Whether contradictory relationships are present in the database
  Some crimes recorded at place of occurrence, others at place where report taken
  Data for one country is for 2000, for another it's for 2001
  Annual data series not taken on same day/month etc. (sometimes called lineage error)
  Data uses different source or estimation technique for different years (again, lineage)
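Both checks on this slide can be roughed out in a few lines. This is a minimal sketch under stated assumptions: the function names, the feature IDs, and the record layout (a "year" key per record) are hypothetical and not part of any GIS package.

```python
def completeness(expected_ids, recorded_ids):
    """Fraction of the features the data set claims to include that are actually present."""
    expected = set(expected_ids)
    if not expected:
        return 1.0  # nothing was promised, so nothing is missing
    return len(expected & set(recorded_ids)) / len(expected)

def mixed_reference_years(records):
    """Return the distinct reference years if the records disagree, else an empty list."""
    years = sorted({record["year"] for record in records})
    return years if len(years) > 1 else []

# Hypothetical hydrant inventory: four hydrants exist, three were captured.
expected = ["H1", "H2", "H3", "H4"]
recorded = ["H1", "H2", "H4"]
print(completeness(expected, recorded))  # 0.75

# Hypothetical country statistics collected for different years.
records = [{"country": "A", "year": 2000}, {"country": "B", "year": 2001}]
print(mixed_reference_years(records))  # [2000, 2001]
```

A completeness figure needs an independent list of what should be there; without one, the missing data cannot be counted.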
Compatibility
– Overlay of maps at different scales
  Cannot be combined
– Combining nominal and ratio
Nominal scales distinguish one item from another, but they do not rank or quantify data.
– Soil Name, City Name, Polygon Identification Number
Ordinal scales identify the relative magnitudes, but they do not quantify exact differences between values.
– Income = (low, medium, or high)
– Slope = (A, B); where A = 0-4%, and B = 5-9%
[Figure labels: Slope, Crop]
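The slope example above can be sketched in code to show what an ordinal scale keeps and what it loses. The class boundaries are an assumption: the slide gives whole-percent ranges (0-4% and 5-9%), so the sketch treats A as 0 up to but not including 5, and B as 5 up to but not including 10. The function names are hypothetical.

```python
# Assumed boundaries: (label, lower bound inclusive, upper bound exclusive), in percent.
SLOPE_CLASSES = [("A", 0, 5), ("B", 5, 10)]

def classify_slope(percent):
    """Convert a measured slope (ratio scale) into an ordinal class label."""
    for label, low, high in SLOPE_CLASSES:
        if low <= percent < high:
            return label
    raise ValueError(f"slope {percent}% is outside the defined classes")

def steeper(class_a, class_b):
    """Ordinal comparison: True if class_a ranks above class_b."""
    labels = [label for label, _, _ in SLOPE_CLASSES]
    return labels.index(class_a) > labels.index(class_b)

print(classify_slope(3))     # A
print(classify_slope(7))     # B
print(steeper("B", "A"))     # True: the ranking survives classification

# The exact difference does not survive: 5% and 9% fall in the same class.
print(classify_slope(5) == classify_slope(9))  # True
```

A nominal attribute such as a soil name supports only the equality test; there is no meaningful equivalent of steeper() for it.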
Applicability
– Suitability of data for commands, operations or analysis
– Using your GIS data collected points for a parcel fabric
Sources of Error in GIS
Survey Data
– Surveyor or instrument error
– Choice of spheroid and datum
– Data encoding and entry
  E.g. keying or digitizing errors
Remotely Sensed Data or Aerial Photography
– Mistakes in classification
– Change in time
Manual Digitizing Errors
Cleaning and editing always required
Vector to Raster or Raster to Vector
Errors in Data Processing and Analysis
Is this data suitable for analysis?
Is it in a suitable format?
– Different datums?
Are the data sets compatible?
– Incompatible units?
– Widely different scales?
Will the output mean anything?
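The questions above amount to a checklist that could be run before any overlay. The following is a minimal sketch, assuming each layer's metadata is available as a plain dictionary. The key names, the function name, and the scale-ratio threshold of 5 are all hypothetical choices for illustration, not an established rule.

```python
def compatibility_issues(layer_a, layer_b, max_scale_ratio=5.0):
    """List the reasons two layers should not be combined without further work."""
    issues = []
    if layer_a["datum"] != layer_b["datum"]:
        issues.append("different datums")
    if layer_a["units"] != layer_b["units"]:
        issues.append("incompatible units")
    # Compare source scales by the ratio of their denominators (1:24000 -> 24000).
    larger = max(layer_a["scale_denominator"], layer_b["scale_denominator"])
    smaller = min(layer_a["scale_denominator"], layer_b["scale_denominator"])
    if larger / smaller > max_scale_ratio:  # assumed threshold
        issues.append("widely different scales")
    return issues

# Hypothetical metadata for two layers.
parcels = {"datum": "NAD83", "units": "metre", "scale_denominator": 2400}
soils = {"datum": "NAD27", "units": "foot", "scale_denominator": 24000}
print(compatibility_issues(parcels, soils))
# ['different datums', 'incompatible units', 'widely different scales']
```

An empty list only means these three checks passed; whether the output will mean anything still has to be judged by the analyst.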
Classification Errors
EVALUATING CURRENT DATA
Most of the information captured in a GIS generally exists somewhere in the office that requires the application. Some additional data may be purchased or obtained by data sharing with other agencies.
The source, accuracy, reliability, condition and scale for each document or record must be evaluated.
SOURCE
The data may be in paper or map form, or it may exist in computer files on another system.
– Where did that information come from?
– What is the source of the source?
– Do you know how the map was compiled?
– Do you know who compiled the map or record?
– Have you spoken with the author to learn as much as possible about the data?
– What are the strong & weak points about the data?
Data Accuracy & Reliability
There are different types of accuracy.
– Absolute positional accuracy refers to the measurement of map location as it relates to a real world location (for example, a GPS coordinate point).
– Relative positional accuracy is a measure of the relationships between the different features on the map. Relative accuracy compares the scaled distance between features measured from the map data with distances measured between the same features on the ground.
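The relative accuracy comparison described above can be worked through with one pair of features. This is a sketch under stated assumptions: the map positions are taken to be measured in metres on the sheet, the map scale and ground distance are made-up values, and the function names are hypothetical.

```python
import math

def scaled_map_distance(p, q, scale_denominator):
    """Distance between two map positions, scaled up to ground units."""
    on_sheet = math.hypot(q[0] - p[0], q[1] - p[1])
    return on_sheet * scale_denominator

def relative_error(p, q, scale_denominator, ground_distance):
    """Scaled map distance minus the distance measured on the ground."""
    return scaled_map_distance(p, q, scale_denominator) - ground_distance

# Hypothetical features on a 1:24000 sheet, positions in metres on the paper.
well = (0.10, 0.20)
barn = (0.13, 0.24)

# The features are 0.05 m apart on the sheet, about 1200 m when scaled.
# A taped ground distance of 1195 m gives a relative error of about 5 m.
error = relative_error(well, barn, 24000, 1195.0)
print(round(error, 3))
```

A map can score well on this test and still have poor absolute accuracy, for example when every feature is shifted by the same amount.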
Data Accuracy & Reliability
The other type of accuracy deals with the content of the information in the GIS database. Are there errors or missing data? A road may have positional accuracy but have the wrong road name associated with the feature. We think of this as Reliability.
Another very important aspect of reliability is how current the data sources are. If the map or record has not been properly maintained, some method of bringing the document up to date must be instituted.
MAINTENANCE OF DATA
Many of the answers needed to ensure proper data maintenance are flushed out in a preliminary needs and data analysis.
Specifically, maintaining data involves knowing
– Frequency of change
– Quantity of change
– Sources of change
It must be reiterated: if data is not going to be maintained, DO NOT PUT IT IN YOUR GIS.
Condition
The condition of the source documents, especially maps, will determine how difficult the conversion will be.
Clear mylar and ink drawings will be easier to digitize (no matter what the method) than maps of poor legibility.