2016 FOSS4G track: Developing and Implementing Spatial ETL Processes with Open Source Tools by...

Post on 07-Jan-2017


Developing and Implementing Spatial ETL Processes with Open Source Tools

Matthew Baker, Denver Public Schools

Dep't of Planning and Analysis

Quick FOSS4G Update

● Replaced ArcSDE with PostGIS
– Dev, QA (Prod TBA...)
● 'Non-GIS' users using QGIS
● Editable PostGIS Views!
● Data-driven cartography
– OSM data with styles saved in PostGIS
● Manager now using QGIS

Database Structure

[Diagram labels: SQL Server; Dev, QA, Prod; PostGIS; ArcGIS; Enterprise; maps.dpsk12.org]

ETL: Extract, Transform, Load

ETL Process Needs

● PostGIS read/write
● SQL Server Spatial read/write
● Daily Updating of Tables
● Daily Building of Datasets
● Daily Delivery to DPS Enterprise

Goals of ETL Development

● Break dependency on GUI-based tools
● Overcome 'other' FOSS ETL Tools
– Geokettle
– GDAL
● Avoid commercial ETL tools
– SSIS
– FME

Creating New Tables

● Import Shapefiles to Dev
– QGIS DB Manager
● Import Non-Spatial Tables
– CSVKit - Python via command line
– Read CSV Schema: generate SQL ‘Create Table’
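The "read the CSV schema, generate a Create Table statement" step can be sketched with the standard library alone. csvkit's csvsql command is the tool for real use; the function names, the simplified type-inference rules, and the sample columns below are illustrative assumptions, not taken from the slides.

```python
import csv
import io


def infer_type(values):
    """Guess a PostgreSQL column type from string values (deliberately simplified)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the narrowest type first; fall through to text if nothing fits.
    for caster, sql_type in ((int, "integer"), (float, "double precision")):
        try:
            for v in non_empty:
                caster(v)
            return sql_type
        except ValueError:
            continue
    return "text"


def create_table_sql(csv_text, table_name):
    """Build a CREATE TABLE statement from a CSV header row and its data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        sql_type = infer_type([row[i] for row in body])
        columns.append('"{}" {}'.format(name, sql_type))
    return 'CREATE TABLE "{}" ({});'.format(table_name, ", ".join(columns))


# Hypothetical sample data, only to exercise the sketch
sample = (
    "schnum,school_name,enrollment\n"
    "101,Example Elementary,412.5\n"
    "102,Sample Middle,380\n"
)
ddl = create_table_sql(sample, "schools_import")
print(ddl)
```

The generated statement can then be reviewed and run against the Dev database before loading the rows.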

PostGIS and Spatial “Text”

select geom from dpsdata."Schools_Current"

select cast(geom as varchar) from dpsdata."Schools_Current"

select ST_AsText(geom) from dpsdata."Schools_Current"
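For context on what these queries return: casting a geometry to varchar yields hex-encoded EWKB, while ST_AsText yields WKT. The sketch below encodes and decodes a single 2D point to show how the two text forms relate. The helper names and the coordinates are hypothetical, and only little-endian or big-endian 2D points are handled.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB flag: an SRID follows the type word
ZM_FLAGS = 0xC0000000   # EWKB flags for Z and M dimensions


def point_to_hex_ewkb(x, y, srid):
    """Encode a 2D point as little-endian hex EWKB (the form a text cast returns)."""
    raw = struct.pack("<BIIdd", 1, 1 | SRID_FLAG, srid, x, y)
    return raw.hex().upper()


def hex_ewkb_point_to_wkt(hex_ewkb):
    """Decode a hex EWKB 2D point into (srid, WKT); other geometries are rejected."""
    raw = bytes.fromhex(hex_ewkb)
    order = "<" if raw[0] == 1 else ">"  # first byte is the byte-order marker
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    offset = 5
    srid = None
    if geom_type & SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", raw, offset)
        offset += 4
    if geom_type & 0xFF != 1 or geom_type & ZM_FLAGS:
        raise ValueError("only 2D points are handled in this sketch")
    x, y = struct.unpack_from(order + "dd", raw, offset)
    return srid, "POINT({} {})".format(x, y)


# Hypothetical coordinates, only to exercise the sketch
hex_text = point_to_hex_ewkb(-104.99, 39.74, 4326)
print(hex_text)
print(hex_ewkb_point_to_wkt(hex_text))
```

Because the hex form carries the SRID and is accepted back by PostGIS as geometry input, it is a convenient text representation for moving geometries between two PostGIS databases.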

PostGIS Spatial Transformation

select ST_Transform(geom, 2877) from dpsdata."Schools_Current"

select ST_AsText(ST_Transform(geom, 2877)) from dpsdata."Schools_Current"

Python for Databases

● Pypyodbc
– Pure Python implementation of pyodbc
– Connect to databases using ODBC (MS SQL Server)
● Psycopg2
– PostgreSQL adapter (libraries) for Python

SQL Inserts

insert into tablename (column1, column2)
values ('value1', 'value2')

SQL Inserts with Parameters

INSERT INTO "Schools_Current"
(school_name, abbreviation, elem, mid, high, schnum, geom, classification)
VALUES (?, ?, ?, ?, ?, ?, ?, ?);*

* PostgreSQL syntax. The ? placeholders are the ODBC (pypyodbc) style; psycopg2 expects %s instead.
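Since the placeholder marker depends on the driver, a small helper can generate the parameterized statement for either one. This is a sketch only: build_insert is a hypothetical name, the style names follow the Python DB-API (PEP 249) "qmark" and "format" conventions, and the column list assumes geom and classification are separate columns.

```python
def build_insert(table, columns, paramstyle="format"):
    """Return a parameterized INSERT with one placeholder per column.

    paramstyle "qmark" emits ? (ODBC drivers such as pypyodbc);
    paramstyle "format" emits %s (accepted by psycopg2).
    """
    marker = {"qmark": "?", "format": "%s"}[paramstyle]
    column_list = ", ".join(columns)
    markers = ", ".join([marker] * len(columns))
    return 'INSERT INTO "{}" ({}) VALUES ({});'.format(table, column_list, markers)


columns = ["school_name", "abbreviation", "elem", "mid", "high",
           "schnum", "geom", "classification"]
odbc_sql = build_insert("Schools_Current", columns, paramstyle="qmark")
pg_sql = build_insert("Schools_Current", columns, paramstyle="format")
print(odbc_sql)
print(pg_sql)
```

Only the statement text is built by string formatting here; the row values themselves are still passed separately to the cursor's execute call so the driver handles quoting.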

Python ETL Pattern

● Connect to databases
– Source
– Destination
● Set Up Cursors
● Select from Source
– Use SQL Expression (with spatial function)
– Assign data to rows (in memory)
● Truncate Destination
● Insert into Destination
– Create insert statement with parameters
– Iterate through rows (data)
– Assign row values to variables
– Commit data with Insert
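The pattern can be sketched as one function over any two Python DB-API connections. To keep the example runnable anywhere, in-memory sqlite3 databases stand in for the real source and destination, so the placeholders are ? and the destination is emptied with DELETE rather than TRUNCATE. The function name, table, and sample rows are hypothetical, and the sketch assumes the destination is emptied before the load.

```python
import sqlite3


def copy_table(conn_source, conn_dest, select_sql, clear_sql, insert_sql):
    """Select from source, empty the destination, insert row by row, commit once."""
    cur_source = conn_source.cursor()
    cur_dest = conn_dest.cursor()

    cur_source.execute(select_sql)
    rows = cur_source.fetchall()      # all source rows are held in memory

    cur_dest.execute(clear_sql)       # remove the previous load
    for row in rows:
        cur_dest.execute(insert_sql, row)
    conn_dest.commit()                # clear and inserts commit together
    return len(rows)


# Stand-in databases (sqlite3) with hypothetical data
source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
source.execute("create table schools (schnum integer, geom_wkt text)")
source.executemany("insert into schools values (?, ?)",
                   [(101, "POINT(1 2)"), (102, "POINT(3 4)")])
source.commit()
dest.execute("create table schools (schnum integer, geom_wkt text)")
dest.execute("insert into schools values (999, 'POINT(0 0)')")  # stale row
dest.commit()

copied = copy_table(
    source, dest,
    "select schnum, geom_wkt from schools",
    "delete from schools",
    "insert into schools (schnum, geom_wkt) values (?, ?)",
)
result = dest.execute("select schnum, geom_wkt from schools order by schnum").fetchall()
print(copied, result)
```

Committing once at the end means a failure partway through leaves the destination's previous contents in place when the connection is rolled back or closed.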

Example: PostGIS to PostGIS

import psycopg2

# Connect to the source and destination databases
connSource = psycopg2.connect("host=arcgisdev01 dbname=dpspgisdev user=dpsdata password=***")
curSource = connSource.cursor()
connDest = psycopg2.connect("host=FOSS4GLin01 dbname=dpspgisqa user=dpsdata password=***")
curDest = connDest.cursor()

# Select from the source, casting the geometry to text
curSource.execute('''select addressid, cast(geom as varchar) from public."Address_Master"''')

# Insert statement with psycopg2 parameters
sql = '''insert into dqmt.Address_Master (addressid, geom) values (%s, %s)'''

rows = curSource.fetchall()

# Insert each row, then commit once
for row in rows:
    data = [row[0], row[1]]
    curDest.execute(sql, data)

connDest.commit()

connSource.close()
connDest.close()

Deployed Processes

● Daily Active Students
– Extract from MSSQL View joining geometry to students
– Deliver to PostGIS and MSSQL
● Refresh Boundaries
– PostGIS Materialized Views
● Geocoding
● Enterprise Delivery
– Schools and Boundaries
– Shared Enrollment Zone Info
– Current Addresses and Boundary Information (spatial join)

Deployment

● Microsoft Windows Server
– Task Scheduler
– (still doesn't run FME / ArcPy scripts)

Ubuntu Server Deployment

● Cron Task Scheduler

0 3 * * * python /home/dpspgisqa/scripts/SchoolBoundaries_All.py
0 3 * * * python /home/dpspgisqa/scripts/SchoolBoundaries...
0 3 * * * python /home/dpspgisqa/scripts/Schools_Current.py
0 3 * * * python /home/dpspgisqa/scripts/Schools_Projected.py

* * * * * /folder/runThisFile.py
| | | | |
| | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
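To make the field order concrete, the following sketch splits a crontab line into its five schedule fields and the command. The function and field names are hypothetical, and nothing is validated beyond the field count.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line):
    """Split a crontab line into a dict of schedule fields and the command string."""
    parts = line.split(None, 5)  # five schedule fields, then the rest is the command
    if len(parts) < 6:
        raise ValueError("expected five schedule fields and a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


schedule, command = split_cron_line(
    "0 3 * * * python /home/dpspgisqa/scripts/Schools_Current.py"
)
print(schedule["hour"], schedule["minute"], command)
```

Read this way, an entry beginning 0 3 * * * has minute 0 and hour 3 with every other field unrestricted, so it runs once a day at 03:00.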

Other Python Tricks

● Error Handling
– On script fail: send email, insert message to database
● Run single SQL Script
– Within 1 database
● Bulk Inserts
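Two of these tricks can be sketched together: wrapping a job so a failure is recorded in a log table and passed to a notifier, and loading rows in bulk with the DB-API executemany call. An in-memory sqlite3 database stands in for the real server. The etl_log table, the function names, and the notify callback (where an email sender would go) are all hypothetical.

```python
import sqlite3
import traceback


def bulk_insert(conn, insert_sql, rows):
    """Insert many rows with a single executemany call, then commit."""
    rows = list(rows)
    conn.cursor().executemany(insert_sql, rows)
    conn.commit()
    return len(rows)


def run_logged(conn_log, job_name, job, notify=None):
    """Run job(); on failure, log the traceback to etl_log and call notify."""
    try:
        job()
        return "ok"
    except Exception:
        message = traceback.format_exc()
        conn_log.execute(
            "insert into etl_log (job, message) values (?, ?)",
            (job_name, message),
        )
        conn_log.commit()
        if notify is not None:
            notify(job_name, message)  # e.g. a function that sends an email
        return "failed"


# Stand-in database (sqlite3) with hypothetical tables and jobs
db = sqlite3.connect(":memory:")
db.execute("create table etl_log (job text, message text)")
db.execute("create table addresses (addressid integer, geom_wkt text)")
alerts = []


def load_addresses():
    bulk_insert(db, "insert into addresses values (?, ?)",
                [(1, "POINT(1 1)"), (2, "POINT(2 2)")])


def broken_job():
    raise RuntimeError("source view unavailable")


first = run_logged(db, "load_addresses", load_addresses)
second = run_logged(db, "broken_job", broken_job,
                    notify=lambda job, msg: alerts.append(job))
print(first, second, alerts)
```

Passing the notifier in as a callback keeps the mail settings out of the ETL logic and lets the wrapper be exercised without a mail server.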

Next Steps

● Implement PostGIS Prod server
– CentOS (new IT staff!!!)
● Document Internally
● Share Externally
– github.com/DPSSpatial
● Web maps
– Internal
– External

THANK YOU!

planning@dpsk12.org

github.com/dpsspatial
