Introduction to Geographic Information Systems
Fall 2013 (INF 385T-28620)
Dr. David Arctur
Research Fellow, Adjunct Faculty
University of Texas at Austin
Lecture 8
October 17, 2013
Geocoding
Outline
  Geocoding overview
  Polygon geocoding
  Linear (street) geocoding
  Problems and solutions
  Geocoding layer sources
  Geocoding in ArcGIS
INF385T(28620) – Fall 2013 – Lecture 8
Overview
  Process of creating geometric representations for locations (such as points) from descriptions of locations (such as street addresses)
  Uses a computer program called a geocoding engine that employs code tables and rules to standardize address components
Examples
  City's economic development department
    Maps technology businesses by street address to determine technology-rich areas in a city
  Hospital
    Maps patients to determine where to open a satellite clinic
  Emergency dispatch
    Maps callers' addresses to determine who should respond to an emergency
  Retail store chain
    Maps store and customer locations, and compares to mapped competitor locations
  Others?
Tabular data
  Text file or database
  Street addresses
  ZIP Codes
Geocoding reference layers
  Street centerlines
  ZIP Code polygons
POLYGON GEOCODING
Lecture 8
ZIP Code geocoding
  Method to map data whose geocode is for a polygon
    Assign each record to its polygon
    Count the records for each polygon
    Join the table to the corresponding polygon layer
    Symbolize using a choropleth map or graduated point symbols
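The assign, count, and join steps for ZIP Code geocoding can be sketched in plain Python. This is only an illustration: the attendee records, the polygon attribute table, and the field names (zip, polygon_id, count) are made up, and a GIS would perform the join against the real polygon layer.

```python
from collections import Counter

# Hypothetical tabular records; each one carries a ZIP Code as its geocode.
attendees = [
    {"name": "A", "zip": "15213"},
    {"name": "B", "zip": "15213"},
    {"name": "C", "zip": "15217"},
    {"name": "D", "zip": "99999"},  # no polygon has this ZIP Code
]

# Hypothetical attribute table of the ZIP Code polygon layer.
zip_polygons = [
    {"polygon_id": 1, "zip": "15213"},
    {"polygon_id": 2, "zip": "15217"},
    {"polygon_id": 3, "zip": "15232"},
]


def count_by_zip(records):
    """Assign each record to its polygon key and count records per key."""
    return Counter(record["zip"] for record in records)


def join_counts(polygons, counts):
    """Join the counts to the polygon table; polygons without records get 0."""
    return [dict(polygon, count=counts.get(polygon["zip"], 0)) for polygon in polygons]


counts = count_by_zip(attendees)
joined = join_counts(zip_polygons, counts)

# Records whose ZIP Code has no polygon cannot be mapped and need review.
unmatched = sorted(set(counts) - {polygon["zip"] for polygon in zip_polygons})
```

The count field in the joined table is the value that a choropleth map or graduated point symbols would display.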
ZIP Code geocoding
  Points created at ZIP Code centroids
ZIP Code geocoding
  Points (attendees) spatially joined to ZIP Code polygons
ZIP Code geocoding
  Choropleth map created
LINEAR (STREET) GEOCODING
Lecture 8
Linear geocoding (streets)
  TIGER (Census Bureau) street maps
    Four street address numbers, low to high for each side of a street segment
  [Figure: Oak Street segment with address range 100 to 198 on one side and 101 to 199 on the other]
Address components
  Component          Part        Example
  Number             125         125 Oak St E, Apt. 2, Pittsburgh, PA 15213
  Street name        Oak         125 Oak St E, Apt. 2, Pittsburgh, PA 15213
  Street type        St          125 Oak St E, Apt. 2, Pittsburgh, PA 15213
  Direction, suffix  E           125 Oak St E, Apt. 2, Pittsburgh, PA 15213
  Direction, prefix  E           125 E Oak St, Apt. 2, Pittsburgh, PA 15213
  Unit number        Apt. 2      125 Oak St E, Apt. 2, Pittsburgh, PA 15213
  Zone, city         Pittsburgh  125 Oak St E, Apt. 2, Pittsburgh, PA 15213
  Zone, ZIP Code     15213       125 Oak St E, Apt. 2, Pittsburgh, PA 15213
  Items for single-number street address:
    Address       Unit    City        ZIP Code
    125 Oak St E  Apt. 2  Pittsburgh  15213
Street intersections
  Put intersections in address field
    Forbes AV & Craig ST
    Grant ST & 5th AVE
    North Star RD & Duncan AV
  Do not include street numbers
    Incorrect: 3999 Forbes Ave & 100 Craig ST
  Connectors
    Any unusual character (e.g., &, @, |)
    Just be consistent
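A minimal sketch of how an address field could be checked for the intersection conventions above. The function names are hypothetical, and the connector set is just the three characters listed on the slide; a real geocoding engine has its own connector settings.

```python
import re

# Connector characters taken from the slide; surrounding spaces are optional.
CONNECTOR_PATTERN = re.compile(r"\s*[&@|]\s*")


def split_intersection(address):
    """Return (street_1, street_2) for an intersection, or None otherwise."""
    parts = CONNECTOR_PATTERN.split(address.strip())
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def has_street_number(street):
    """True if the text starts with a house number such as '3999 '."""
    return re.match(r"\d+\s", street) is not None


streets = split_intersection("Forbes AV & Craig ST")
bad = split_intersection("3999 Forbes Ave & 100 Craig ST")
bad_has_numbers = any(has_street_number(street) for street in bad)
```

Note that "5th AVE" is not flagged as having a street number, because the digits are followed by letters rather than a space.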
Geocoding flowchart
  1. Input address
  2. Parse address
  3. Generate Soundex key
  4. Find candidates: number range and Soundex key
  5. Matches? If no, output no match
  6. Score matches
  7. Best match >= 90? If yes, output address; if no, output no match
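The two decision points of the flowchart (are there any candidates, and is the best score at least 90) can be sketched as a small function. The function name and the (score, candidate) input shape are assumptions made for this example; only the threshold of 90 comes from the slide.

```python
def geocode_decision(scored_candidates, min_score=90):
    """Return the best candidate, or None when the output is 'no match'.

    scored_candidates is a list of (score, candidate) pairs.
    """
    if not scored_candidates:
        return None  # "Matches?" -> No
    best_score, best_candidate = max(scored_candidates, key=lambda pair: pair[0])
    if best_score >= min_score:
        return best_candidate  # "Best match >= 90?" -> Yes
    return None  # best candidate scored too low


matched = geocode_decision([(97, "segment 4357"), (92, "segment 4346")])
rejected = geocode_decision([(85, "segment 4344")])
```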
Geocoding steps
  Original address: 125 East Oak Street 15213
  Address parsed: |125|East|Oak|Street|15213
  Abbreviations standardized: |125|E|Oak|St|15213
  Elements assigned to match keys: [HN]:125 [SN]:Oak [ST]:St [SD]:E [ZP]:15213
  Index values calculated: [HN]:125 [SN]:Oak (Soundex #) [ST]:St [SD]:E [ZP]:15213 (Index #)
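The parse, standardize, and match-key steps can be sketched as follows for the slide's example address. The two code tables here are tiny hypothetical stand-ins for a geocoding engine's real tables, and the assignment rules are deliberately simple (they assume a leading house number, a trailing 5-digit ZIP Code, and a direction prefix).

```python
# Hypothetical, very small code tables; real engines ship much larger ones.
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W"}
STREET_TYPES = {"street": "St", "avenue": "Ave", "road": "Rd", "boulevard": "Blvd"}


def parse(address):
    """Split the address into tokens."""
    return address.split()


def standardize(tokens):
    """Replace spelled-out directions and street types with abbreviations."""
    result = []
    for token in tokens:
        key = token.lower().rstrip(".")
        result.append(DIRECTIONS.get(key) or STREET_TYPES.get(key) or token)
    return result


def assign_match_keys(tokens):
    """Assign standardized tokens to the match keys HN, SD, SN, ST, ZP."""
    keys = {}
    rest = list(tokens)
    if rest and rest[0].isdigit():
        keys["HN"] = rest.pop(0)  # house number
    if rest and len(rest[-1]) == 5 and rest[-1].isdigit():
        keys["ZP"] = rest.pop()  # ZIP Code
    if rest and rest[-1] in STREET_TYPES.values():
        keys["ST"] = rest.pop()  # street type
    if len(rest) > 1 and rest[0] in DIRECTIONS.values():
        keys["SD"] = rest.pop(0)  # street direction
    keys["SN"] = " ".join(rest)  # whatever remains is the street name
    return keys


tokens = standardize(parse("125 East Oak Street 15213"))
match_keys = assign_match_keys(tokens)
```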
Soundex index
  Matches names based on how they sound (if indices match)
  Translates names to a 4-character index of 1 letter and 3 numbers
    First character of name remains unchanged
    Adjacent letters in the name which have the same Soundex key are assigned a single digit
    If the end of the name is reached before filling 3 digits, use zeros to complete the code
  Key  Letters
  1    b f p v
  2    c g j k q s x z
  3    d t
  4    l
  5    m n
  6    r
  Disregard: a e h i o u y w
  Examples
    Oake = O-200, Oak = O-200
    Smith = S-530, Smythe = S-530
    Paine = P-500, Payne = P-500
    Callahan = C-450, Calahan = C-450
    Beadles = B-342, Beattles = B-342
    Schultz = S-243, Shults = S-432
  http://www.sconsig.com/sastips/soundex-01.htm
  http://www.archives.gov/research/census/soundex.html
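A minimal sketch of the Soundex rules as stated on the slide, using the slide's key table. Two simplifications are assumptions of this sketch: the first letter's key is not compared with the letter that follows it, and h and w are treated like vowels. That reproduces the slide's example codes, including Schultz = S-243; stricter Soundex variants handle those two cases differently and can code such names differently.

```python
# Key table from the slide: digit -> letters that share that key.
KEY_LETTERS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
LETTER_KEY = {letter: key for key, letters in KEY_LETTERS.items() for letter in letters}


def soundex(name):
    """Return a code such as 'O-200': first letter plus three digits."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    previous = ""  # key of the previous letter; first letter is not compared here
    for ch in letters[1:]:
        key = LETTER_KEY.get(ch, "")  # "" for disregarded letters (a e h i o u y w)
        if key and key != previous:
            digits.append(key)  # adjacent letters with the same key give one digit
        previous = key
    # Keep three digits, padding with zeros when the name runs out.
    return letters[0].upper() + "-" + "".join(digits)[:3].ljust(3, "0")


codes = {name: soundex(name) for name in ["Oak", "Oake", "Smith", "Smythe"]}
```

Names whose codes are equal, such as Smith and Smythe, would be treated as candidate matches even though their spellings differ.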
Scoring candidates
  Use a rule base to score source and reference matches
    Start with score of 100
    Subtract points for each mismatch
  Examples from rule base
    Soundex indices match but street names do not (-2)
    Street type missing in source (-1)
    Street types do not match (-2)
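The three example rules can be expressed as a small scoring function. This is a sketch, not the rule base of any particular geocoding engine: the record layout (SN, ST, and a precomputed soundex field) is assumed for the example, and a real rule base contains many more rules.

```python
def score_candidate(source, reference):
    """Score a source address against a reference street, starting at 100."""
    score = 100
    names_match = source["SN"].lower() == reference["SN"].lower()
    if source["soundex"] == reference["soundex"] and not names_match:
        score -= 2  # Soundex indices match but street names do not
    if not source.get("ST"):
        score -= 1  # street type missing in source
    elif source["ST"].lower() != reference["ST"].lower():
        score -= 2  # street types do not match
    return score


reference = {"SN": "Oak", "ST": "St", "soundex": "O-200"}
exact = score_candidate({"SN": "Oak", "ST": "St", "soundex": "O-200"}, reference)
misspelled = score_candidate({"SN": "Oake", "ST": "St", "soundex": "O-200"}, reference)
no_type = score_candidate({"SN": "Oak", "ST": "", "soundex": "O-200"}, reference)
```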
Candidate streets
  Candidates identified: 125 East Oak Street 15213
    From  To   Street  Type  Side  Parity  Direction  Street_
    2     98   Oak     St    R     E       W          4344
    1     99   Oak     St    L     O       W          4345
    100   198  Oak     St    R     E       E          4346
    101   199  Oak     St    L     O       E          4357
  Candidates scored and filtered:
    From  To   Street  Type  Side  Parity  Direction  Street_
    100   198  Oak     St    R     E       E          4346
    101   199  Oak     St    L     O       E          4357
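The narrowing of candidates for 125 East Oak Street can be sketched with the four segments from the table. The order of the checks (direction first, then address range and parity) is an assumption that mirrors the slides, and the field name id stands in for the last table column, whose header is cut off in the source.

```python
# Segments from the candidate table; "id" is a stand-in name for the last column.
CANDIDATES = [
    {"from": 2, "to": 98, "side": "R", "parity": "E", "direction": "W", "id": 4344},
    {"from": 1, "to": 99, "side": "L", "parity": "O", "direction": "W", "id": 4345},
    {"from": 100, "to": 198, "side": "R", "parity": "E", "direction": "E", "id": 4346},
    {"from": 101, "to": 199, "side": "L", "parity": "O", "direction": "E", "id": 4357},
]


def filter_by_direction(candidates, direction):
    """Keep only segments whose street direction matches the address."""
    return [c for c in candidates if c["direction"] == direction]


def pick_segment(candidates, house_number):
    """Return the segment whose range and parity contain the house number."""
    parity = "E" if house_number % 2 == 0 else "O"
    for candidate in candidates:
        in_range = candidate["from"] <= house_number <= candidate["to"]
        if in_range and candidate["parity"] == parity:
            return candidate
    return None


east_segments = filter_by_direction(CANDIDATES, "E")
best = pick_segment(east_segments, 125)
```

House number 125 is odd and falls between 101 and 199, so the left-side segment is selected.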
Address matched as point
  Best candidate matched
    From  To   Street  Type  Side  Parity  Direction  Street_
    101   199  Oak     St    L     O       E          4357
  [Figure: Oak St crossing Pine Ave, with ranges 2 to 98 and 1 to 99 on one block and 100 to 198 and 101 to 199 on the next; address 125 plotted as a point on the 101 to 199 side]
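One common way to turn a matched address range into a point is linear interpolation along the street segment. The sketch below assumes that approach and uses made-up endpoint coordinates; it ignores the side-of-street offset that a geocoding engine may also apply.

```python
def range_fraction(house_number, low, high):
    """Fraction of the way along the address range (0 = low end, 1 = high end)."""
    if high == low:
        return 0.5  # single-number range: place the point mid-segment
    return (house_number - low) / (high - low)


def interpolate_point(house_number, low, high, start, end):
    """Place the address proportionally between the segment endpoints."""
    fraction = range_fraction(house_number, low, high)
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return x, y


# Hypothetical coordinates for the two ends of the 101 to 199 segment.
segment_start = (0.0, 0.0)
segment_end = (98.0, 0.0)

fraction = range_fraction(125, 101, 199)
point = interpolate_point(125, 101, 199, segment_start, segment_end)
```

For address 125 on the 101 to 199 range, the fraction is 24/98, so the point lands about a quarter of the way along the segment.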
PROBLEMS AND SOLUTIONS
Lecture 8
Possible problems
  Variations in street names
    Fifth Avenue, Fifth Ave., 5th AV
    Saw Mill Run Blvd, Route 51
  Data entry errors
    Fidth Avenue
    Sawmill Run
  Place names
    White House, Heinz Field, Empire State Building
  Intersections
    Fifth Avenue and Craig Street
Possible problems
  Zones
    100 Main ST 15101, 100 Main ST 16202
  P.O. boxes
    P.O. Box 125
  Missing street data
Solutions
  Clean data before geocoding
  Purchase or build high-quality maps (field verification)
  Use postal address standards
  Assign house numbers in rural areas
  Use alias tables
    Alias                  Address
    White House            1600 Pennsylvania Avenue
    Heinz Field            100 Art Rooney Avenue
    Empire State Building  350 5th Ave
Alias table
  Alias                       Address
  CMU                         5000 Forbes Av
  Carnegie Mellon             5000 Forbes Av
  Carnegie Mellon U           5000 Forbes Av
  Carnegie Mellon Univ        5000 Forbes Av
  Carnegie Mellon University  5000 Forbes Av
  Etc.
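An alias table is essentially a lookup applied before geocoding. A minimal sketch, assuming a plain dictionary and a hypothetical resolve_alias helper; the entries are the ones shown on the slides.

```python
# Alias entries from the slides, keyed in lower case for matching.
ALIASES = {
    "white house": "1600 Pennsylvania Avenue",
    "heinz field": "100 Art Rooney Avenue",
    "empire state building": "350 5th Ave",
    "cmu": "5000 Forbes Av",
    "carnegie mellon": "5000 Forbes Av",
    "carnegie mellon u": "5000 Forbes Av",
    "carnegie mellon univ": "5000 Forbes Av",
    "carnegie mellon university": "5000 Forbes Av",
}


def resolve_alias(text, aliases=ALIASES):
    """Return the street address for a known place name, else the text unchanged."""
    key = " ".join(text.lower().split())  # ignore case and extra spaces
    return aliases.get(key, text)


resolved = resolve_alias("Carnegie Mellon University")
passthrough = resolve_alias("125 Oak St")
```

Inputs that are not in the table pass through unchanged, so ordinary street addresses still go to the geocoder as entered.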
GEOCODING LAYER SOURCES
Lecture 8
US Census TIGER files
  Digitized from 1:100,000 scale maps
  Pros:
    Free and easy to download
    Uniform across jurisdictional lines (nationally)
    Street address formatting works well with standard GIS geocoding capacities
  Cons:
    Incomplete data
    Placement of address point is approximate
TIGER line attribute table
  Census street centerlines extracted from lines that make up census boundaries
    tl_2009_04013_edges.shp
    "FEATCAT" = 'S'
MAF/TIGER
  Master Address File / Topologically Integrated Geographic Encoding and Referencing
  MAF is a complete inventory of housing units and businesses in the United States and its territories
  TIGER is a collection of lines as we know it
  MAF is used to produce mail-out census forms and ACS random samples
  MAF/TIGER is used to produce maps for on-the-ground census takers
  MAF is confidential
  TIGER 2009 and newer have much improved positional accuracy
US Census ZIP Codes
  ZIP Code Tabulation Areas (ZCTAs)
    Approximations for census purposes
    Do not reflect actual ZIP Code areas and are not kept up to date
Local jurisdictions
  Parcel address points
  Pros:
    Accurate placement of residential location (parcel positional data is often very good; e.g., +/- 5 meters or less)
  Cons:
    May need to contact individuals within agencies to get most up-to-date data
    May not be available, or may cost a substantial amount of money
    Data ends at jurisdictional boundaries
    Data files tend to be very large
Local jurisdictions
  Street centerlines
  Pros:
    Potential to be more up to date (often yearly updates, sometimes quarterly)
    Often accuracy adequate to meet city infrastructure needs (typically +/- 10 meters or less)
  Cons:
    May need to contact individuals within agencies to get most up-to-date data
    Data ends at jurisdictional boundaries
Private vendors
  StreetMap USA
    National dataset (US and Canada)
    Address locators prebuilt, can geocode across the United States
  GDT Dynamap/2000 US street data
    Small fee for individual ZIP Code layers
    Map layers are the highest quality street map layers in terms of appearance, completeness, and accuracy
    More than one million changes every quarter
    Maps include more than 14 million US street segments and include postal boundaries, landmarks, water features, and other features
Online geocoding
  ArcGIS.com, Google, GeoCommons, Maptive, etc.
  Pros:
    Fast and easy to access
    Free or inexpensive
  Cons:
    Loss of privacy/confidentiality
    Accuracy
    Usability in desktop GIS
GEOCODING IN ARCGIS
Lecture 8
Create address locator
  ArcCatalog
Choose address locator style
  Skeleton of the address locator
  Based on data tables and reference layer
Address locator styles
  Style: US Address—Dual Ranges
    Reference dataset geometry: Lines
    Reference dataset representation: Address range for both sides of street segment
    Address search parameters: All address elements in a single field
    Examples: 320 Madison St.; N2W1700 County Rd.; 105-30 Union St.
    Applications: Finding a house on a specific side of the street
  Style: US Address—Single House
    Reference dataset geometry: Points or polygons
    Reference dataset representation: Each feature represents an address
    Address search parameters: All address elements in a single field
    Examples: 71 Cherry Ln.; W1700 Rock Rd.; 38-76 Carson Rd.
    Applications: Finding parcels, buildings, or address points
  Note: there are other styles…
Other styles… (build custom locators)
  Queens, NY
  Salt Lake City, UT
  Regions of Illinois & Wisconsin
  Germany
  … and many others!
Choose reference layer
  Streets, ZIP Codes
ArcGIS locator parameters
Geocode in ArcMap
  Add tabular data and streets layer
  Add address locator
  Geocode addresses
  View geocoding results
  Interactively rematch addresses
Address rematching
  Investigate unmatched addresses
  Generally requires expertise and knowledge of local streets
  Compare a street name in the attributes of the streets table and the address table
Prepare log file
  Log file includes reasons why addresses did not get geocoded
  Useful for future work on cleaning addresses or repairing street maps
    Incorrect address    Possible reason/solution
    490 Penn Avenue      Missing ZIP Code
    111 Hawksworth       Spelled incorrectly
    900 Smallman Street  TIGER street missing
    900 Lib Ave          Spelled incorrectly
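A log of unmatched addresses can be kept as a simple CSV file. The sketch below writes to an in-memory buffer so it runs anywhere; the column names and the helper function are assumptions for this example, and the rows are the four addresses from the slide.

```python
import csv
import io
from collections import Counter

# Unmatched addresses and reasons, taken from the slide's example table.
UNMATCHED = [
    ("490 Penn Avenue", "Missing ZIP Code"),
    ("111 Hawksworth", "Spelled incorrectly"),
    ("900 Smallman Street", "TIGER street missing"),
    ("900 Lib Ave", "Spelled incorrectly"),
]


def write_unmatched_log(rows, stream):
    """Write one CSV line per unmatched address with its reason."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["address", "reason"])
    writer.writerows(rows)


buffer = io.StringIO()  # replace with open("unmatched_log.csv", "w", newline="") for a file
write_unmatched_log(UNMATCHED, buffer)
log_lines = buffer.getvalue().splitlines()

# Tallying reasons shows whether to clean addresses or repair the street map first.
reason_counts = Counter(reason for _, reason in UNMATCHED)
```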
Summary
  Geocoding overview
  Polygon geocoding
  Linear (street) geocoding
  Problems and solutions
  Geocoding layer sources
  Geocoding in ArcGIS
Next week: Tutorial chapter 9, and discussion of term projects; see iSchool syllabus links: http://courses.ischool.utexas.edu/Arctur_David/2013/fall/385T/schedule.php