Geographic data validation

Upload: thy

Post on 07-Jan-2016

DESCRIPTION

Geographic data validation. Index: Basic concepts; Why do we need validation?; How to assess geographic data; Initial checks; Intermediate checks; Advanced checks; Some final considerations.

TRANSCRIPT

Page 1: Geographic data validation

Geographic data validation

Page 2: Geographic data validation

Index
• Basic concepts
• Why do we need validation?
• How to assess geographic data
• Initial checks
• Intermediate checks
• Advanced checks
• Some final considerations

Page 3: Geographic data validation

Index
• Basic concepts
• Why do we need validation?
• How to assess geographic data
• Initial checks
• Intermediate checks
• Advanced checks
• Some final considerations

Page 4: Geographic data validation

Basic concepts
• Quality
• Faithful representation of a feature
• The quality of the data determines the quality of the output
• GIGO principle (garbage in, garbage out)
• Data have the potential to be used in ways unforeseen when they were collected
• The value of the data is directly related to their fitness for a variety of uses

Page 5: Geographic data validation

Basic concepts
• Fitness-for-use
• The suitability of a set of data for a specific purpose
• A.K.A. usability
• Should not be confused with quality
• Quality: abstract
• Usability: specific
• A low-quality dataset may still have high usability for a particular purpose

Page 6: Geographic data validation

Basic concepts
• Precision
  o Closeness of repeated measurements to a given value, either correct or not
• Accuracy
  o Closeness of a measurement to the true value

Page 7: Geographic data validation

Precision vs Accuracy

Page 8: Geographic data validation

Basic concepts
• Precision
  o Closeness of repeated measurements to a given value, either correct or not
• Accuracy
  o Closeness of a measurement to the true value
• Precision is an intrinsic property of the measurements
• Accuracy depends on knowing the true value of the variable
• Data validation: assessing the accuracy
• Compare against a reference value
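To make the distinction concrete, the sketch below computes both quantities for repeated readings of one coordinate. The readings and the reference value are made-up illustration values. Using the standard deviation for precision and the error of the mean reading for accuracy is one common choice, assumed here rather than taken from the slides.

```python
from statistics import mean, pstdev

def precision_and_accuracy(readings, reference):
    """Return (spread, error) for repeated measurements of one quantity.

    spread: population standard deviation of the readings (precision);
            it needs no reference value.
    error:  absolute difference between the mean reading and the
            reference value (accuracy); it cannot be computed without one.
    """
    spread = pstdev(readings)
    error = abs(mean(readings) - reference)
    return spread, error

# Hypothetical latitude readings (decimal degrees) of a single site.
tight_but_off = [40.510, 40.511, 40.509, 40.510]   # precise, not accurate
loose_but_centred = [40.40, 40.44, 40.38, 40.46]   # accurate, not precise
reference_latitude = 40.42                          # assumed "true" value

print(precision_and_accuracy(tight_but_off, reference_latitude))
print(precision_and_accuracy(loose_but_centred, reference_latitude))
```

The first set has the smaller spread but the larger error; the second set is the other way round. Only the error term needs the reference value, which is why validation always involves a comparison against something trusted.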

Page 9: Geographic data validation

Index
• Basic concepts
• Why do we need validation?
• How to assess geographic data
• Initial checks
• Intermediate checks
• Advanced checks
• Some final considerations

Page 10: Geographic data validation

Why do we need validation?

Page 11: Geographic data validation

Why do we need validation?

Page 12: Geographic data validation

Why do we need validation?

• This was a striking example, but more subtle issues can (and actually do) happen
• We need to develop techniques and methodologies to explore the data
• In other words, we need to validate the data
• Validating gives a sense of the reliability of the records, and clues on how to improve them

Page 13: Geographic data validation

Index
• Basic concepts
• Why do we need validation?
• How to assess geographic data
• Initial checks
• Intermediate checks
• Advanced checks
• Some final considerations

Page 14: Geographic data validation

How to assess?
• Different techniques apply depending on the aim of the assessment
• Remember that high-quality datasets are more likely to show high fitness-for-use
• Ideally, check for quality
• If we know the purpose, check for fitness for that purpose

Page 15: Geographic data validation

How to assess?
• Work with geographic information à la Darwin Core
• Work with individual records as well as collections of data
• Start with the most basic pieces of information
• Look for coherence with other pieces of information
• If they are not coherent, ask why
• Modify the information to see whether it then fits
• At more advanced levels, make use of available taxonomic or temporal information

Page 16: Geographic data validation

How to assess?
• Tools
• Spreadsheet: Microsoft Excel, LibreOffice Calc…
  o Well-known environment
  o Visually easy
• OpenRefine
  o Spreadsheet-like, but with some enhanced features
• Scripts
  o Database scripts: work directly at the source
  o Other programming languages: enhanced capabilities
• GIS software
  o Often linked with other tools, such as spreadsheets or scripts

Page 17: Geographic data validation
Page 18: Geographic data validation

Visualizations
• Visual exploration of the record set
• Useful for a first-level assessment
• Primary visualization for geographic data: maps
• The next picture has several issues that can be detected using a map…

Page 19: Geographic data validation
Page 20: Geographic data validation

Coordinate transposition

• This happens when latitude is stored in the longitude field and vice versa
• Usually difficult to detect on a one-by-one basis
• But when looking at the whole picture…
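One narrow case can be caught record by record: a "latitude" outside the range -90 to 90 cannot be a latitude at all. The sketch below flags such pairs. The function name and sample pairs are illustrative assumptions. Swapped pairs where both values stay within range pass this test unnoticed, which is why the map view is still needed.

```python
def looks_transposed(latitude, longitude):
    """Return True when the pair is only valid after swapping the fields.

    Latitude must lie in [-90, 90] and longitude in [-180, 180]. A
    "latitude" outside its range whose partner would be a valid latitude
    is a strong hint that the two fields were swapped.
    """
    return 90 < abs(latitude) <= 180 and abs(longitude) <= 90

records = [  # hypothetical (latitude, longitude) pairs
    (55.93, 13.13),   # plausible as stored
    (-122.4, 37.8),   # latitude out of range: probably swapped
    (13.13, 55.93),   # swapped or not? the ranges alone cannot tell
]
for lat, lon in records:
    print(lat, lon, looks_transposed(lat, lon))
```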

Page 21: Geographic data validation

Zero vs Null
• One of the most common issues
• Storing 0 (zero) instead of leaving the field empty
• This happens with some data management systems
• Latitude 0 and longitude 0 are stored to mean “unknown coordinates”
• But a data user cannot know that: it is not what the standard says
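A minimal sketch of this check, assuming records are held as (latitude, longitude) tuples with None for empty fields (that layout is an assumption, not something the slides prescribe):

```python
def flag_zero_coordinates(records):
    """Return the indices of records stored at exactly (0, 0).

    None means the field was left empty; 0 is a real value. A record at
    (0, 0) is only *suspect*: the point exists (in the Gulf of Guinea), so
    a person still has to decide whether it is a placeholder.
    """
    return [
        index
        for index, (latitude, longitude) in enumerate(records)
        if latitude == 0 and longitude == 0
    ]

# Hypothetical records: a placeholder, a true null, and two ordinary points.
records = [(0.0, 0.0), (None, None), (55.93, 13.13), (0.0, 13.13)]
print(flag_zero_coordinates(records))
```

Only the pair with both values equal to zero is reported. A point on the equator with a non-zero longitude is left alone, and so is the genuinely empty record.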

Page 22: Geographic data validation

Negation
• Forgetting or altering the sign (positive/negative) of the coordinates
• Usually forgetting the minus sign
• The most common source: transforming from degrees-minutes-seconds (DMS) to decimal degrees (DD) without taking “W” or “S” into account
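The conversion itself is short, which makes the hemisphere letter easy to drop. A sketch that keeps the letter as a required argument follows; the function name and the sample values are illustrative.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    "S" and "W" yield negative values; ignoring the letter is what
    produces the missing minus sign.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    magnitude = abs(degrees) + minutes / 60 + seconds / 3600
    return -magnitude if hemisphere in ("S", "W") else magnitude

# Example pair: 78 deg 47 min 52 sec S, 35 deg 50 min 31 sec E
latitude = dms_to_dd(78, 47, 52, "S")
longitude = dms_to_dd(35, 50, 31, "E")
print(round(latitude, 6), round(longitude, 6))
```

Raising an error on an unknown letter, rather than defaulting to a positive value, is a deliberate choice in this sketch: a silent default is exactly how the sign gets lost.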

Page 23: Geographic data validation

Check against country
• The easiest way of checking these issues is to check whether the coordinates fall inside the specified country…
• Of course, only if we have a country value to check against
• Two ways
• Use GIS software
• Use web services like GeoNames (we will see this in the OpenRefine session)
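The idea can be sketched without either tool. The example below is a stand-in, not a GIS or GeoNames call: it uses a plain ray-casting point-in-polygon test and a hypothetical rectangular "country" outline, and then tries the sign-flipped and transposed variants of a coordinate pair to see which one, if any, lands inside. All names and values are assumptions for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def fitting_variants(lat, lon, polygon):
    """Return the names of the coordinate variants that fall inside polygon."""
    variants = {
        "as stored": (lat, lon),
        "latitude negated": (-lat, lon),
        "longitude negated": (lat, -lon),
        "both negated": (-lat, -lon),
        "transposed": (lon, lat),
    }
    return [
        name
        for name, (v_lat, v_lon) in variants.items()
        if point_in_polygon(v_lon, v_lat, polygon)
    ]

# Hypothetical rectangular "country" outline, (lon, lat) vertices.
# A real check would use an actual country boundary from a GIS layer.
country = [(-10.0, 36.0), (3.0, 36.0), (3.0, 44.0), (-10.0, 44.0)]

print(fitting_variants(40.4, 3.7, country))   # minus sign lost on longitude
print(fitting_variants(-3.7, 40.4, country))  # fields swapped
```

A variant that fits is a hint about what went wrong, not proof. The result is best recorded as a suggestion for a person to review.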

Page 24: Geographic data validation

Georeferencing
• Intermediate check
• If we have locality information and coordinates, we can check if they match
• Georeferencing is a tough task, and prone to uncertainties, so some level of imprecision is to be expected
• Make good use of the “uncertainty” fields in Darwin Core!
• But still…
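One way to express "they match" numerically is to compare the distance between the stored coordinates and a georeferenced point for the locality against the stated uncertainty, for example the value of the Darwin Core term `coordinateUncertaintyInMeters`. The sketch below uses the haversine formula on a spherical Earth. The function names, the locality point, and the thresholds are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def matches_locality(record_point, locality_point, uncertainty_m):
    """True when the record lies within uncertainty_m of the locality point."""
    return haversine_m(*record_point, *locality_point) <= uncertainty_m

record = (55.932576, 13.132359)   # stored (latitude, longitude)
locality = (55.93, 13.13)         # hypothetical georeferenced locality point

print(round(haversine_m(*record, *locality)))
print(matches_locality(record, locality, uncertainty_m=500))
print(matches_locality(record, locality, uncertainty_m=100))
```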

Page 25: Geographic data validation

55.932576, 13.132359
Anahuac NWR (UTC 049)
Grandville
POINT(-1.3223333 53.44958)
Marine Nature Study Area
78º 47’ 52” S; 35º 50’ 31” E
Stewart Park
POINT(-1.1735004 53.358746)
Backyard
My Habitat
55.932576, 13.132359
Wilderness Park, north of 14th St.
28054
Delaney Conservation Area
57.3, 11.9

Page 26: Geographic data validation

Multi-domain checks
• Using information from different sources to check quality
• In particular, use taxonomic information to improve geospatial data
• Most basic example: check data against a range map
• If the point falls inside the range map of the specified species, OK
• Sometimes, temporal information is useful

Page 27: Geographic data validation

Index
• Basic concepts
• Why do we need validation?
• How to assess geographic data
• Initial checks
• Intermediate checks
• Advanced checks
• Some final considerations

Page 28: Geographic data validation

Considerations
• NEVER modify the original data
• Data cleaning is a human task, and thus it is not error-free
• Information we believe is wrong may be right
• Make an “improved copy” of the data
• Or “flag” the records as inaccurate
• Re-share the improvements
• With the community: so that others don’t have to reinvent the wheel
• With the original owners of the data: so that they can correct the errors at the source
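The "improved copy" and "flag" ideas can be combined. The sketch below deep-copies the records and attaches a list of flag names to each copy, leaving the originals untouched. The record keys follow the Darwin Core terms `decimalLatitude` and `decimalLongitude`; the flag names and the `flags` field are hypothetical.

```python
import copy

def flag_records(records, checks):
    """Return a deep copy of records with a "flags" list added to each.

    records: list of dicts; the originals are left untouched.
    checks:  dict mapping a flag name to a function(record) -> bool that
             returns True when the record looks suspicious.
    """
    flagged = copy.deepcopy(records)
    for record in flagged:
        record["flags"] = [name for name, check in checks.items() if check(record)]
    return flagged

# Hypothetical records keyed by Darwin Core term names.
original = [
    {"decimalLatitude": 0.0, "decimalLongitude": 0.0},
    {"decimalLatitude": 55.93, "decimalLongitude": 13.13},
    {"decimalLatitude": 122.4, "decimalLongitude": 37.8},
]
checks = {
    "zero_coordinates": lambda r: r["decimalLatitude"] == 0 and r["decimalLongitude"] == 0,
    "latitude_out_of_range": lambda r: abs(r["decimalLatitude"]) > 90,
}

improved = flag_records(original, checks)
for record in improved:
    print(record)
```

Because the flags say which check fired instead of overwriting a value, the original owners can review each case and decide for themselves whether the record is actually wrong.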