“How to deal with place name data for OBIS” Two (different) aspects to consider ... Place name conversion => lats/longs or polygons (data custodian end) Place name conversion to search area (portal end) Tony Rees for OBIS TWG, March 2003

Page 1: “How to deal with place name data for OBIS”

“How to deal with place name data for OBIS”

Two (different) aspects to consider ...

• Place name conversion => lats/longs or polygons (data custodian end)

• Place name conversion to search area (portal end)

Tony Rees for OBIS TWG, March 2003

Page 2: “How to deal with place name data for OBIS”

Need to consider both precision and accuracy (not the same!)

• Precision

– “a measure of the ability to distinguish between nearly equal values”

– “the number of decimal places to which a number is computed”

• Accuracy

– “How close to the real value a measurement is”.

Note: a number of dictionaries give accuracy as a synonym for precision, which is incorrect...

Page 3: “How to deal with place name data for OBIS”

Relevance for OBIS …

• What precision is associated with the quoted locality name?

– locality may be large or small

– did the specimen come from at/within the actual locality, or the region around it? (i.e., “nearest named place”)

– how precisely are displaced distances and directions quoted (e.g. “5 miles NE of point xyz”)

• Precision of available lat/long values for the quoted locality

– to nearest degree? minute? second? If decimal degrees, how many decimal places quoted? Is precision in fact over-stated?

• Accuracy of available lat/long values

– if quoted in different sources, do these agree? If checked on a (“reliable”) map, are they in fact correct?

• Need to distinguish between “real” and “apparent” precision (see next slide)

Page 4: “How to deal with place name data for OBIS”

Which is most precise?

• 147.31666666° E (Alexandria Digital Library Gazetteer) - source A

• 147.3167° E (Falling Rain Global Gazetteer) - source B

• 147.317° E (Australian Gazetteer) - source C

• 147°18’59” E (Alexandria Digital Library Gazetteer) - source A

• 147°19’00” E (Falling Rain Global Gazetteer) - source B

• 147°19’ E (Australian Gazetteer) - source C

NB, 1’ of latitude/longitude is approx. 0.02 degrees (actually 0.0167). On the ground, 1’ of latitude is around 1.85 km (1 nautical mile, about 1.15 statute miles); 1’ of longitude shrinks with the cosine of the latitude (about 1.35 km at 43° S).

Questions to consider ...

• What is the true (vs. apparent) precision of the above measurements?

• Which of them is most accurate? (or how would you tell?)
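The conversions behind the figures on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the ground distances assume a spherical Earth on which 1 arc-minute of latitude equals 1 nautical mile (1.852 km).

```python
import math

def dms_to_decimal(degrees, minutes=0, seconds=0.0):
    """Convert unsigned degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600

def arc_minute_km(latitude_deg):
    """Approximate ground length (km) of 1 arc-minute along a meridian
    and along a parallel at the given latitude.

    Assumption: spherical Earth, 1 arc-minute of latitude = 1.852 km.
    """
    along_meridian = 1.852
    along_parallel = 1.852 * math.cos(math.radians(latitude_deg))
    return along_meridian, along_parallel

# Source C quotes 147 deg 19 min E; the decimal form gains digits but no information.
lon_c = dms_to_decimal(147, 19)
print(f"{lon_c:.8f}")

# One arc-minute on the ground near latitude 43 S.
north_south_km, east_west_km = arc_minute_km(-43.05)
print(f"{north_south_km:.2f} km N-S, {east_west_km:.2f} km E-W")
```

Converting 147°19’ to decimals produces as many digits as the formatter is asked for, which is exactly how apparent precision creeps into a gazetteer.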

Page 5: “How to deal with place name data for OBIS”

Tools for Data Capture

• Place names => lats/longs … common museum exercise (“geocoding” or “georeferencing”)

• Requires use of gazetteers or maps

• Some gazetteers/maps available on web e.g.:

– MS Encarta, Expedia.com - http://www.expedia.com/pub/agent.dll?qscr=mmfn

– NGDC/WDC MGG, Boulder Marine Coastline Extractor - http://oas.ngdc.noaa.gov/mgg/plsql/extractor.mapit

– Alexandria Digital Library Gazetteer (4.4 million names) - http://fat-albert.alexandria.ucsb.edu:8827/gazetteer/

– Falling Rain Genomics Global Gazetteer (2.8 million names) - http://www.calle.com/world/

– National, local gazetteers e.g. Australia - http://www.agso.gov.au/map/names/

• May possibly be available as “web services” in future (machine addressable)

Page 6: “How to deal with place name data for OBIS”

Example - search for “Tinderbox, Australia” (small populated place near my home - <500 inhabitants)

Page 7: “How to deal with place name data for OBIS”

MS Encarta, Expedia (max. zoom in)

- includes some quite small places, however not all (inconsistent);

- best for showing other named places in surrounding area

… cf. New Brunswick at same scale:

Page 8: “How to deal with place name data for OBIS”

NGDC/WDC MGG, Boulder - Marine Coastline Extractor (near max. zoom in)

… no placenames, but could use overlay lat/long grid for georeferencing where coastal features are unambiguously recognizable

Page 9: “How to deal with place name data for OBIS”

“Falling Rain” Global Gazetteer result (max. zoom in)

- gives lat, long and locator map/s

- coastline is quite high resolution

- max. zoom level is a bit restricting

Page 10: “How to deal with place name data for OBIS”

Alexandria Dig. Libr. Gazetteer search result (max. zoom in)

Page 11: “How to deal with place name data for OBIS”

“Australian Gazetteer” search result

- a very well-populated gazetteer, couldn’t find anything missing (on a brief look)

Page 12: “How to deal with place name data for OBIS”

“Australian Gazetteer” search result - cont’d - show on map (not zoomable)

Page 13: “How to deal with place name data for OBIS”

Printed 1:50 000 map (detail)

( = indexed in Austr. Gazetteer)

Page 14: “How to deal with place name data for OBIS”

Where actually is “Tinderbox”?

• Alexandria Digital Library Gazetteer: 147.31666666° E (147°18’59”), 43.049999° S (43°2’59”)

• Falling Rain Global Gazetteer: 147.3167° E (147°19’00”), 43.0500° S (43°2’60”)

• Australian Gazetteer (Official): 147.317° E (147°19’), 43.050° S (43°03’)

• NGDC/WDC Coastline Extractor (eye estimate): 147.33° E, 43.05° S

• 1981 1:50 000 map (eye estimate): 147°19’30” E, 43°03’30” S

• maybe need to go down there with a hand-held GPS!!

- all compatible (surprise!) - actually the first two (Alexandria, Falling Rain) are derived from the third (Australian Gazetteer) anyway!!

- map allows additional accuracy to maybe +/- 1/5 min (12 seconds, ~0.003 degrees or 300 meters) - however would need to be careful about the datum actually used at this fine scale (pre-1980s datum slightly different from current)

- at fine scale, difficulty in determining exact centre of “Tinderbox”

- “Official” Australian coordinates are stated as only accurate to the nearest 1 minute (approx. 1.8 km) - so really we are “snapping to a grid” (potentially misleading precision when converted to decimals) --- see next slide ...

Page 15: “How to deal with place name data for OBIS”

Printed 1:50 000 map (enlarged detail - grid squares are 1 x 1 km)

[Map detail; grid labels: 147°19’, 147°20’, 147°21’ E and 43°03’, 43°04’ S]

Page 16: “How to deal with place name data for OBIS”

Revisit questions posed earlier ...

• 147.31666666° E (Alexandria Digital Library Gazetteer) - source A

• 147.3167° E (Falling Rain Global Gazetteer) - source B

• 147.317° E (Australian Gazetteer) - source C

• 147°18’59” E (Alexandria Digital Library Gazetteer) - source A

• 147°19’00” E (Falling Rain Global Gazetteer) - source B

• 147°19’ E (Australian Gazetteer) - source C

• What is the true (vs. apparent) precision of the above measurements?

– true precision is given by source C (which the others have simply copied to their own systems) as +/- 30” (0.008°) - thus the value should really be quoted as 147.32°, and seconds are meaningless

– Sources A and B are wildly overstating the true precision, in both decimal places and seconds

– Even source C is overstating the true precision when expressed in decimal degrees!

• Which of them is most accurate?

– all equally inaccurate - by around 0.8 km (0.5 miles) - true position determined from map is 147°19’30” E, = 147.325° (+/- 0.003 approx.)
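The point about overstated precision can be made mechanical. The sketch below uses one possible convention, which is an assumption and not something the slides prescribe: keep only as many decimal places as have a place value at least as large as the stated uncertainty. The function names are hypothetical.

```python
import math

def meaningful_decimals(uncertainty_deg):
    """Largest number of decimal places d such that 10**-d >= uncertainty.

    Assumed convention for this example; other rounding rules exist.
    """
    if uncertainty_deg <= 0:
        raise ValueError("uncertainty must be positive")
    return max(0, math.floor(-math.log10(uncertainty_deg)))

def quote_coordinate(value_deg, uncertainty_deg):
    """Format a decimal-degree value without overstating its precision."""
    decimals = meaningful_decimals(uncertainty_deg)
    return f"{value_deg:.{decimals}f}"

# Source C is good to the nearest minute, i.e. +/- 30 arc-seconds.
uncertainty = 30 / 3600
print(quote_coordinate(147.31666666, uncertainty))
```

Under this convention the eight-decimal value from source A collapses to the two-decimal value suggested on the slide.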

Page 17: “How to deal with place name data for OBIS”

Potential traps when georeferencing from gazetteers/maps

• May be multiple places with the same name (including some maybe not in Gazetteer)

• May be variant/misspelled/historic names for the same place

• A feature extent can be much larger than designated lat/long reference would imply (e.g. River, Bay, Island, Channel, Strait, Sea…) - where within (or adjacent to) a larger “polygon” is the real locality? (Centre - as typically quoted - may well be an incorrect assignation)

• Precision of coordinates from any source needs to be known and not unintentionally misrepresented when converted to decimals

• Map, Gazetteer source used should be recorded, in case it contains errors which can be detected/corrected retrospectively if needed

(continued…)

Page 18: “How to deal with place name data for OBIS”

Potential traps when georeferencing from gazetteers/maps (continued)

• Map can be misleading if older datum used or if otherwise erroneous - however often still best source to see “real” feature extents, minor feature names, and detailed coastal topography

• “Distance from”, “Direction from …” may be approximations only and introduce their own errors/uncertainty; also, may be unclear where actually measured from … (e.g., centre or edge of named locality?)

• Numeric values obtained should be believable - e.g. marine locations not on land, species not too far from expected range (otherwise more checks needed - maybe ID is wrong, locality mis-reported, etc.)

• Precision needs to be reported in a consistent manner - e.g. see next slide.
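For “distance from / direction from” localities, the displaced position can be estimated with the standard great-circle destination formula. This is a minimal sketch assuming a spherical Earth of mean radius 6371 km; the function name and the starting point are illustrative only, and the result inherits every uncertainty in the quoted distance, bearing, and reference point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius

def displace(lat_deg, lon_deg, distance_km, bearing_deg):
    """Return (lat, lon) reached by travelling distance_km from the start
    point along the initial bearing (degrees clockwise from north)."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    d = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(bearing))
    lon2 = lon1 + math.atan2(math.sin(bearing) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    # Normalise longitude to the range [-180, 180).
    lon2_deg = (math.degrees(lon2) + 180) % 360 - 180
    return math.degrees(lat2), lon2_deg

# "5 miles NE of" a gazetteer point: 5 statute miles is about 8.05 km, NE is bearing 45.
start = (-43.05, 147.0 + 19 / 60)
print(displace(start[0], start[1], 8.05, 45.0))
```

Running the same call with bearings of 22.5 and 67.5 degrees gives a feel for how much a loosely quoted “NE” can move the point.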

Page 19: “How to deal with place name data for OBIS”

Precision for OBIS ...

My suggestion would be to use a scale e.g. 1-n (1=best), e.g.:

• 1 = estimated precision better than 100 m / 0.001 degrees

• 2 = estimated precision better than 500 m / 0.005 degrees

• 3 = estimated precision better than 1 km / 0.01 degrees

• 4 = estimated precision better than 5km / 0.05 degrees

• 5 = estimated precision better than 10 km / 0.1 degrees

• 6 = estimated precision better than 50 km / 0.5 degrees

• 7 = estimated precision better than 100 km / 1 degree

• 8 = estimated precision better than 500 km / 5 degrees

• 9 = estimated precision better than 1000 km / 10 degrees

• 0 = estimated precision > 1000 km (unmappable)

… could then select points of relevant precision when mapping at different scales (e.g. precisions 1-6 acceptable for map at 0.5 deg. resolution, precisions 1-8 acceptable for map at 5 deg. resolution)

Note: Darwin Core already has a field “Coordinate Precision” - to hold a value in meters (although such values may be too precise!!!)
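The suggested scale is easy to express as a lookup. The sketch below takes the thresholds from the list above; the function names are hypothetical, and the handling of a value of exactly 1000 km (treated here as code 0) is an assumption, since the scale does not say.

```python
# Upper bounds in metres for codes 1-9, taken from the scale above.
PRECISION_BOUNDS_M = [
    (1, 100), (2, 500), (3, 1_000), (4, 5_000), (5, 10_000),
    (6, 50_000), (7, 100_000), (8, 500_000), (9, 1_000_000),
]

def precision_code(estimated_precision_m):
    """Map an estimated precision in metres to the 1-9 scale (0 = unmappable)."""
    for code, bound_m in PRECISION_BOUNDS_M:
        if estimated_precision_m < bound_m:  # "better than" read as strictly less
            return code
    return 0

def acceptable_for_map(code, coarsest_code):
    """True if a record's code is usable on a map that tolerates codes 1..coarsest_code."""
    return 1 <= code <= coarsest_code

# A position good to the nearest minute (about 1.8 km) versus a map at 0.5 degree resolution.
code = precision_code(1_800)
print(code, acceptable_for_map(code, 6))
```

A portal could store the code alongside each record and filter on it when the user changes map scale.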

Page 20: “How to deal with place name data for OBIS”


NB: could only improve on this if (1) locality accurately known (e.g. by named small coastal feature or “X” on map), and (2) coordinates quoted to higher precision.

1 km (precision “3”) and 5 km (precision “4”) radii from 147°19’ E, 43°03’ S (official quoted position for “Tinderbox”)

- note, stated locality may simply be “nearest named place”, not actual location
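A “point and radius” test like the one drawn on this slide can be sketched with the haversine formula. The code assumes a spherical Earth of radius 6371 km, and the helper names are made up for the example; the two positions are the official gazetteer value and the eye estimate from the 1:50 000 map quoted earlier.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def within_radius(centre, point, radius_km):
    """True if point lies within radius_km of centre; both are (lat, lon) tuples."""
    return haversine_km(centre[0], centre[1], point[0], point[1]) <= radius_km

official = (-(43 + 3 / 60), 147 + 19 / 60)                         # 43 03' S, 147 19' E
from_map = (-(43 + 3 / 60 + 30 / 3600), 147 + 19 / 60 + 30 / 3600)  # map eye estimate

d = haversine_km(official[0], official[1], from_map[0], from_map[1])
print(f"{d:.2f} km apart; inside 5 km circle: {within_radius(official, from_map, 5.0)}")
```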

Page 21: “How to deal with place name data for OBIS”

Is it worth representing localities by polygons?

• Could be done, but may be too difficult for OBIS to query at this time

• Could represent improved precision and accuracy, cf. “point and radius” treatment - but a tool would be needed for data input, unless standard lookup table available (such a tool has been described - see Proctor/Blum/Chaplin, 2001 *), also polygon boundaries would need to be stored in a standard notation (e.g. ISO metadata format)

• Value might be questionable, considering the likely precision associated with quoted [marine] localities - how useful is a polygon for “Gulf of xyz” if the exact locality is not known? (Would be different if the data actually were polygon, rather than point data).

* “A Software Tool for Retrospectively Georeferencing Specimen Localities using ArcView” - Elizabeth J. Proctor, Stanley D. Blum and George Chaplin (available on the web at http://www.calacademy.org/research/informatics/georef/Main_Pages/2_Background.html)

Page 22: “How to deal with place name data for OBIS”

Some implications for OBIS data storage ...

• 1. Store original (“verbatim”) locality information as well as designated lat/long -- may contain important information lost during “translation” (and may also be more precise). OBIS may wish to display it for additional user information if available ??

– ?= Darwin Core “Locality” (or locality + other fields?) -- see below

• 2. Designated lat/long and assigned precision would be fundamental parameters for OBIS to query.

• 3. Need to consider value of storing/accessing polygons if available (or not bother?)

• NB comparison with Darwin Core v2:

– Darwin Core has the following “optional” fields available (in addition to lat, long)…

• “Continent Ocean”

• “Country” (from ISO list)

• “State Province”

• “County”

• “Locality” (place name + optional displacement from…)

• “Coordinate Precision” (= radius of circle in meters)

– How much of this is relevant to OBIS usage (marine, cf. terrestrial specimens)?
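One way to picture the storage implications is a record type that keeps the verbatim locality next to the designated coordinates. This is a sketch only: the class and attribute names are invented for the example (loosely following the Darwin Core field names listed above), and the sample values are illustrative, not taken from any real record.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LocalityRecord:
    """Hypothetical record layout; attribute names are assumptions."""
    verbatim_locality: str                           # original text, always kept
    latitude: Optional[float] = None                 # designated decimal degrees
    longitude: Optional[float] = None
    coordinate_precision_m: Optional[float] = None   # radius of circle in metres
    continent_ocean: Optional[str] = None
    country: Optional[str] = None
    state_province: Optional[str] = None
    county: Optional[str] = None
    georeference_source: Optional[str] = None        # gazetteer or map used

    def is_queryable(self):
        """A record can be mapped and queried only if it has coordinates and a precision."""
        return (self.latitude is not None
                and self.longitude is not None
                and self.coordinate_precision_m is not None)

record = LocalityRecord(
    verbatim_locality="Tinderbox",
    latitude=-43.05,
    longitude=147.32,
    coordinate_precision_m=1800.0,   # illustrative: nearest arc-minute
    country="Australia",
    georeference_source="Australian Gazetteer",
)
print(record.is_queryable())
```

Recording the source used makes it possible to revisit coordinates later if that gazetteer or map turns out to contain errors.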

Page 23: “How to deal with place name data for OBIS”

2: Searching the portal by placename

• Most gazetteers hold only centre point (lat, long) for any place

– Some entries in Alexandria D.L. Gazetteer have bounding box

– My “MarLIN” system (+ others in Australia) has bounding box for ~120 pre-defined areas including named ocean/seas, etc., e.g. ...

Page 24: “How to deal with place name data for OBIS”

• Then can implement as part of a search interface, e.g.:

Page 25: “How to deal with place name data for OBIS”

• Available alternatives to region representation by bounding box:

– Every data point pre-assigned to an item on a controlled list of named regions (in hierarchy) - then could do text/index search (similar to e.g. ASFA indexing terms)

– More realistic geographic “footprint” stored for each defined region - either:

• Centre point and radius (probably not ideal)

• True polygon (requires GIS back end for the searching)

• C-squares representation (or similar) - multiple small rectangles per region

– At present, my own system uses bounding boxes to represent regions; most users seem happy with the approximations required

• Suggest OBIS implements rectangle representation as first step, investigate possible improvements/refinements later
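The rectangle approach suggested above amounts to a lookup from place name to bounding box, followed by a simple containment test. The sketch below is an assumption-laden illustration: the region names and box coordinates are invented, not taken from MarLIN or any gazetteer, and the only refinement is support for boxes that cross the 180° meridian.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

    def contains(self, lat, lon):
        """True if the point lies inside the box (edges included)."""
        if not (self.south <= lat <= self.north):
            return False
        if self.west <= self.east:
            return self.west <= lon <= self.east
        # Box crosses the 180 degree meridian: west edge is numerically larger.
        return lon >= self.west or lon <= self.east

# Hypothetical search terms with made-up rectangles.
REGIONS = {
    "Example Sea": BBox(west=140.0, south=-45.0, east=150.0, north=-40.0),
    "Dateline Area": BBox(west=170.0, south=-10.0, east=-170.0, north=10.0),
}

def search_by_placename(name, points):
    """Return the (lat, lon) points that fall inside the rectangle stored for name."""
    box = REGIONS[name]
    return [p for p in points if box.contains(p[0], p[1])]

points = [(-43.05, 147.32), (10.0, 20.0), (0.0, -175.0)]
print(search_by_placename("Example Sea", points))
```

A rectangle will usually include some area outside the named region, which is the approximation the slide says most users accept.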

Page 26: “How to deal with place name data for OBIS”

• Requirement then is to construct a list of geographic search terms to be made available - and to define/store the relevant rectangle for each.

• Q1: How big should the list be? (hundred/s, thousand/s …)

• Q2: What types of locality should be listed - e.g.

– ocean/sea/gulf/bay name

– river/estuary name

– island/cape/point names (coastal features)

– seafloor feature names (seamounts, canyons, reefs)

– political/administrative entities e.g. country, state names

– coastal city names … (list soon gets long!)

• Comment: could eventually envisage more complex queries with full GIS capability, e.g. “within x miles of named region y” (although probably not at this time)

• OBIS would need to expend some resources to build such a list, unless available from a public/commercial source (e.g. as vector data from which rectangles could easily be calculated).
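If region outlines were available as vector data, deriving the search rectangle is a matter of taking the extremes of the vertices. The sketch below assumes the outline is a plain list of (lon, lat) pairs that does not cross the 180° meridian; the function name and the sample outline are invented for the example.

```python
def bounding_rectangle(vertices):
    """Return (west, south, east, north) for an outline given as (lon, lat) pairs.

    Assumption: the outline does not cross the 180 degree meridian.
    """
    vertices = list(vertices)
    if not vertices:
        raise ValueError("outline has no vertices")
    lons = [lon for lon, _ in vertices]
    lats = [lat for _, lat in vertices]
    return min(lons), min(lats), max(lons), max(lats)

# Made-up outline of a small bay.
outline = [(147.30, -43.06), (147.35, -43.02), (147.33, -43.08)]
print(bounding_rectangle(outline))
```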

Page 27: “How to deal with place name data for OBIS”

• Reactions? Discussion? Experiences?