Post on 22-Dec-2015
Building a Global Invasive Species Information Network with a TAPIR Protocol
Jim Graham, Annie Simpson, Michael Browne, Bob Morris, Tom Stohlgren, Greg Newman, …
Research vs. Production
Attribute        | Research                    | Production
Quality          | Accurate                    | Robust
Number of Users  | Few                         | Lots
Technology       | Latest and greatest         | What works
Learning Curve   | Typically long              | Short
Support          | None to Informal            | Must
Documentation    | Minimal and techy is ok     | Must be complete and easy to understand
Bottom Line      | If it’s cool they will come | If it doesn’t work, they will go elsewhere
The Tire Swing
[Figure: tire swing cartoon. Panels: What the customer needed; What was designed; What marketing suggested; What management approved; What was delivered]
Alan Chapman, http://www.businessballs.com/treeswing.htm
Questions to Answer
• Who is the customer?
  – Invasive species data providers
  – Invasive species data consumers
  – Stakeholders
• What are we selling/giving them?
  – A network to allow the exchange of information on invasive species
• What do we need to do to get them to want to buy/use it?
Technology Adoption Lifecycle
[Figure: technology adoption lifecycle; horizontal axis: Time]
Bohlen, Joe M. & George M. Beal (May 1957), "The Diffusion Process", Special Report No. 18 1: 56-77
Survey & Interview Highlights
• At least 3 languages/frameworks important
• Time commitment: 1 hour to “as long as it takes”
• Minimal web service expertise
• Various installation scenarios
• DiGIR did not meet all needs
  – Complex queries not needed
  – Database problems
History
• National Biological Information Infrastructure (NBII)
• Global Invasive Species Information Network (GISIN)
• NISBase: Brian Steves and Shawn Dalton
• GIS standards (WMS)
• Common web services
• Invasive Alien Species Profile Schema (IAS-PS)
Situation
• Need:
  – Toolkits in 3 languages
  – Documentation
  – Support
  – Registry/Directory
  – Portal
  – Provider test bed
• Have:
  – Existing:
    • Protocols
    • Schemas/Data Models
    • Toolkits
    • Portals
    • Registries
    • Databases
  – Minimal funding for development
  – No funding for support?
Complexity
• Complexity is a multiplier on:
  – Development: more to implement
  – Testing: more to test
  – Support: more to document, train, and upgrade
  – Performance: larger data transfers, longer parsing time
• Simpler means we can get tools of higher quality, with better support, that run faster and cost less
Architecture
[Architecture diagram. Components: GISIN Data Providers, GISIN Consumers, GISIN Portals, TCS Database, Other Consumers, Other Providers, End-Users, Other Web Sites, GBIF Registry, GISIN Registry/Directory. Legend: Web Services; Web Browser Communication]
Protocol Design
• Approaches:
  – TAPIR-Light
    • Key Value Pair Only
  – Flat data models
    • Performance: 1 million records in 14 minutes
  – Controlled vocabulary wherever possible
Required Data Models
• BioStatus: Indigenous, Harmful, etc.
• Occurrences: X, Y (DarwinCore)
• ProfileURLs: Language, URL
• ImpactStatus: Human, Agriculture, etc.
• ManagementStatus: Activity, etc.
• DistributionStatus: Growing, Stable, etc.
All have: Scientific Name, Location
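A flat model with a controlled vocabulary could be sketched roughly as follows. This is only an illustration, not the GISIN schema: the class names, the field names, the location value in the usage note, and the two-term vocabulary (the only BioStatus terms named on the slide) are assumptions.

```python
from dataclasses import dataclass, asdict

# Assumed vocabulary: only "Indigenous" and "Harmful" are named on the slide.
BIO_STATUS_TERMS = {"Indigenous", "Harmful"}


@dataclass
class FlatRecord:
    """Fields that every model shares: scientific name and location."""
    scientific_name: str
    location: str


@dataclass
class BioStatus(FlatRecord):
    status: str

    def __post_init__(self):
        # Reject values outside the controlled vocabulary.
        if self.status not in BIO_STATUS_TERMS:
            raise ValueError(f"unknown BioStatus term: {self.status!r}")


@dataclass
class Occurrence(FlatRecord):
    x: float  # X coordinate, as listed for Occurrences
    y: float  # Y coordinate


@dataclass
class ProfileURL(FlatRecord):
    language: str
    url: str
```

Because every record is flat, `asdict(BioStatus("Tamarix chinensis", "Colorado", "Harmful"))` yields a single-level dictionary that maps directly onto key-value pairs.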
Implementation Requirements
1. Automatic Installation
   – Installer and DiGIR-like admin pages
2. Adapt toolkit to database, web server, security
3. Roll toolkit to another language (Perl, C++)
4. Do it themselves
   – Just the documentation
• Existing toolkits/protocols are too complex and lack the development documentation to do 2 through 4 quickly
Protocol Transaction Diagram
Database tables: Locations, Observations, Organisms

SQL Query:
SELECT Latitude, …
FROM Locations
JOIN Observation …
JOIN Organisms …
WHERE Genus = 'Tamarix'

Result:
Latitude | Longitude | Date      | Scientific Name
40       | -105      | 10/2/2007 | Tamarix aphylla
35       | -110      | 2/10/1999 | Tamarix chinensis

Request:
http://provider.org/GISIN.php?Op=Inventory&Model=Occurrences&Count=true&Genus=Tamarix&Concept=Latitude&Concept=Longitude&Concept=Date&Concept=ScientificName

Response:
<response>
  <inventory>
    <records>
      <record>
        <Latitude>40</Latitude>
        <Longitude>-105</Longitude>
        <Date>10/12/2000</Date>
        <ScientificName>Tamarix aphylla</ScientificName>
      </record>
      …
    </records>
  </inventory>
</response>
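From the consumer side, the transaction above might look something like the sketch below. It builds the key-value-pair request shown on the slide and flattens the XML response into dictionaries. The function names are hypothetical, and the sample response is a hand-written, well-formed stand-in with illustrative values, not output from a real provider.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_inventory_url(base_url, model, concepts, **filters):
    """Build a key-value-pair inventory request like the one on the slide."""
    params = [("Op", "Inventory"), ("Model", model), ("Count", "true")]
    params += list(filters.items())              # e.g. Genus=Tamarix
    params += [("Concept", c) for c in concepts]  # one Concept key per field
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Turn each <record> element into a flat dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in record}
        for record in root.iter("record")
    ]


# Illustrative response in the shape shown on the slide.
SAMPLE_RESPONSE = """
<response><inventory><records>
  <record>
    <Latitude>40</Latitude>
    <Longitude>-105</Longitude>
    <Date>10/12/2000</Date>
    <ScientificName> Tamarix aphylla </ScientificName>
  </record>
</records></inventory></response>
"""

url = build_inventory_url(
    "http://provider.org/GISIN.php",
    "Occurrences",
    ["Latitude", "Longitude", "Date", "ScientificName"],
    Genus="Tamarix",
)
records = parse_records(SAMPLE_RESPONSE)
```

Because the request is plain key-value pairs and the response is flat, the standard library is enough on the consumer side; no SOAP or schema-aware tooling is assumed.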
Toolkit Design: Data Flow
[Data flow diagram. Components: Provider Web Service, Database Connection, Provider Database, Metadata.xml, Capabilities.xml, GISIN Protocol, Internet, Web, Date, Utilities, Admin Web Site, Query Builder, Service Manager, Provider.xml, Configuration Files]
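The Query Builder step in this data flow could, under some assumptions, be sketched as below: it turns an incoming key-value-pair query string into a parameterized SQL statement. This is not the toolkit's code. The table name, the column names, and the two mapping dictionaries are hypothetical; a real provider would presumably take such mappings from its configuration files.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from protocol concepts to database columns.
CONCEPT_COLUMNS = {
    "Latitude": "latitude",
    "Longitude": "longitude",
    "Date": "obs_date",
    "ScientificName": "scientific_name",
}
# Hypothetical mapping from filter keys to database columns.
FILTER_COLUMNS = {"Genus": "genus"}


def build_query(query_string):
    """Translate a KVP request into (sql, values) for a parameterized query."""
    params = parse_qs(query_string)
    concepts = params.get("Concept", [])
    unknown = [c for c in concepts if c not in CONCEPT_COLUMNS]
    if not concepts or unknown:
        raise ValueError(f"missing or unknown concepts: {unknown}")

    # Only whitelisted column names are ever placed into the SQL text.
    columns = ", ".join(f'{CONCEPT_COLUMNS[c]} AS "{c}"' for c in concepts)
    clauses, values = [], []
    for key, column in FILTER_COLUMNS.items():
        if key in params:
            clauses.append(f"{column} = ?")
            values.append(params[key][0])  # user input stays a bound value

    sql = f"SELECT {columns} FROM occurrences"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values
```

Restricting requests to key-value pairs keeps this translation small, which is one way to read the deck's argument that a simpler protocol is cheaper to implement, test, and port.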
Performance by Time Per Record
[Chart: Time Per Record. Vertical axis: fetch time per record in milliseconds (0 to 0.9). Horizontal axis: records in database (10 to 1,000,000). Series: 7. Fetch rows with 1 field; 8. Fetch rows with 10 fields; 10. Fetch blocks of rows with limit query]
1 million records: 14 hours -> 14 minutes
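Fetching rows in blocks rather than one at a time can be sketched as follows. The slide does not show the actual query, so this is an assumption-laden illustration: it uses an in-memory SQLite table with a hypothetical schema, and it pages by primary key (keyset paging) with LIMIT rather than by offset, which may differ from what the toolkit does.

```python
import sqlite3


def make_demo_db(n_rows):
    """Create an in-memory table with n_rows hypothetical occurrence rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE occurrences (id INTEGER PRIMARY KEY, scientific_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO occurrences (scientific_name) VALUES (?)",
        [("Tamarix chinensis",)] * n_rows,
    )
    return conn


def fetch_in_blocks(conn, block_size=1000):
    """Yield lists of rows, one LIMIT query per block."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, scientific_name FROM occurrences "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, block_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # resume after the last row already returned
```

Each block costs one round trip to the database, so the per-record overhead shrinks as the block size grows.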
Products Mapped to Customer Needs
[Diagram labels: Consumers, Providers; KVP, XML; Complex Queries, RDF; More Sophisticated Users; Invasive Species Databases, Other Databases; GISIN, TAPIR/DarwinCore…]
Adapted from Peter Fox, Debra McGuiness (personal communication)
Next Steps
• Resolve Issues
• Toolkit Development:
  – Complete the design
  – Roll to Java and ASP
  – User’s Guide
• Testing:
  – 2-4 more databases connected
  – Automated tests
  – Defect tracking
• Portal
  – Incremental improvements
• Provider Meeting in November
Current Web Site
• GISIN Organization Site: www.GISINetwork.org
• GISIN Directory: www.niiss.org/GISIN
  – Until end of September: www.niiss.org/GISS
  – Browse Directory
  – Search for data: BioStatus, Occurrences, ProfileURLs
• GISIN Technical Site
  – Documentation
  – For providers:
    • Get Toolkit
    • Sample Provider (based on the toolkit)
    • Manual exercising of TAPIR-GISIN web services
    • Automated tests are coming!
Acknowledgements
• Funded by NSF, NBII (USGS), GBIF, TDWG
• Thanks to: Renato de Giovanni, Roger Hyam, Donald Hobern, Markus Döring, Hannu Saarenmaa, Kevin Richards, Peter Fox, Debra McGuiness, Brian Steves, Pam Fuller, John Pickering, Shawn Dalton, Greg Ruiz, and the other members of GISIN
• Review: www.niiss.org/GISIN (or GISS)
• Contact: [email protected]