A Quality Assurance workflow Authoring Tool for citizen science and crowd-sourced data.
Didier Leibovici, Julian Rosser, Mike Jackson and the COBWEB project
Nottingham Geospatial Institute, University of Nottingham, UK
• The aim is to bring together two approaches: the precise, structured, top-down, formal standards-based institutional approach, and the low-cost, relevant, rich and timely citizen-focused approach of the crowd, which has shortcomings in completeness, precision and interoperability and often minimal direction.
• This is not straightforward: the two perspectives on what constitutes useful, QA’d, fit-for-use data are very different.
Research Objective - to integrate (with QA) authoritative and crowd-sourced data
Crowd-sourced data vs authoritative government data
• ‘Non-systematic’, incomplete coverage vs systematic and comprehensive coverage
• Near ‘real-time’ and ongoing data collection allowing trend analysis vs ‘historic’ and ‘snap-shot’ map data
• Free, ‘un-calibrated’ data, but often high-resolution and up-to-the-minute vs quality-assured, ‘expensive’ data
• ‘Unstructured’, mass-consumer-driven metadata and mash-ups vs ‘structured’ and defined metadata, but often in rigid ontologies
• Unconstrained capture and distribution from ‘ubiquitous’ mobile devices vs ‘controlled’ licensing, access policies and digital rights
• ‘Simple’ consumer-driven web services for data collection and processing vs ‘complex’ institutional survey and GIS applications
A clash of paradigms and Market Dynamics:
Jackson, M. J., Rahemtulla, H. + Morley, J. (2010). “The Synergistic Use of Authenticated + Crowd-Sourced Data for Emergency Response”, Proc, 2nd Int Workshop on Validation of Geo-Information Products for Crisis Management (VALgEO), 11-13/10/10, Ispra, Italy, pp 91-99. http://globesec.jrc.ec.europa.eu/workshops/valgeo-2010/proceedings
mobile data capture & Quality Assurance / conflation
http://cobwebproject.eu
When considering the use of crowd-sourced GI data, we need to quality assure it from:
1. A spatial (geometric) perspective
2. A thematic (domain attribution) perspective
3. A temporal (time-related attribution) perspective
And in terms of data quality “elements” we have to consider:
• Completeness: by area, by class
• Consistency: e.g. topological, semantic, temporal
• Accuracy: relative, absolute
• Usability: fitness for purpose for a particular application or requirement
Aspects of Quality
Solution adopted (i)
• “Internal” quality metrics <Completeness, positional accuracy, consistency, etc.> defined by ISO 19157
• “External” consumer quality <fitness for purpose> metrics based on GeoViQua [www.geoviqua.org]
• Stakeholder model QA <data collector’s judgement, trust, reliability> [Meek et al 2014]
Metadata on Data Quality: three models
• ISO 19157 (producer model), where DQ_Scope will be “feature”
  DQ_Usability
  DQ_Completeness: DQ_CompletenessCommission, DQ_CompletenessOmission
  DQ_ThematicAccuracy: DQ_ThematicClassificationCorrectness, DQ_NonQuantitativeAttributeAccuracy, DQ_QuantitativeAttributeAccuracy
  DQ_LogicalConsistency: DQ_ConceptualConsistency, DQ_DomainConsistency, DQ_FormatConsistency, DQ_TopologicalConsistency
  DQ_TemporalAccuracy: DQ_AccuracyOfATimeMeasurement, DQ_TemporalConsistency, DQ_TemporalValidity
  DQ_PositionalAccuracy: DQ_AbsoluteExternalPositionalAccuracy, DQ_GriddedDataPositionalAccuracy, DQ_RelativeInternalPositionalAccuracy
• Simplified GeoViQua model (consumer model), where DQ_Scope will be “external data”
  GVQ_PositiveFeedback, GVQ_NegativeFeedback
• COBWEB Stakeholder Quality Model, where DQ_Scope will be “volunteer”
  CSQ_Vagueness, CSQ_Ambiguity, CSQ_Judgement, CSQ_Reliability, CSQ_Validity, CSQ_Trust, CSQ_NoContribution
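As a rough illustration of how the three metadata models sit side by side, the sketch below maps an element name to its model and DQ_Scope using only the name prefixes (DQ, GVQ, CSQ) and scope values listed above. The names `QualityModel`, `MODEL_BY_PREFIX` and `classify_element` are hypothetical and are not taken from the COBWEB code base.

```python
from enum import Enum
from typing import Tuple


class QualityModel(Enum):
    """The three quality metadata models named on the slide."""
    PRODUCER = "ISO 19157"
    CONSUMER = "Simplified GeoViQua"
    STAKEHOLDER = "COBWEB Stakeholder Quality Model"


# Element-name prefix -> (model, DQ_Scope value), as listed on the slide.
MODEL_BY_PREFIX = {
    "DQ": (QualityModel.PRODUCER, "feature"),
    "GVQ": (QualityModel.CONSUMER, "external data"),
    "CSQ": (QualityModel.STAKEHOLDER, "volunteer"),
}


def classify_element(element_name: str) -> Tuple[QualityModel, str]:
    """Return the (model, scope) pair for a name such as 'CSQ_Trust'."""
    prefix, _, rest = element_name.partition("_")
    if not rest or prefix not in MODEL_BY_PREFIX:
        raise ValueError(f"Unrecognised quality element: {element_name!r}")
    return MODEL_BY_PREFIX[prefix]


for name in ("DQ_TopologicalConsistency", "GVQ_PositiveFeedback", "CSQ_Trust"):
    model, scope = classify_element(name)
    print(f"{name}: {model.value} (scope: {scope})")
```

Keying on the prefix keeps the lookup small; a fuller version would presumably enumerate every element so that misspelt names are rejected as well.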
Solution adopted (ii)
• OGC WPS standard which allows access to a repository of processes and services from compliant clients
• A key aspect of the standard is the provision to chain disparate processes and services to form a reusable workflow
• Use of BPMN rather than BPEL for the workflow engine: BPMN excels at modelling processes visually, allowing non-domain experts to communicate and mutually understand their models.
• Configurable workflows: stakeholders are able to design a solution to fit their use case from a generic set of WPS processes
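The chaining idea can be sketched in a few lines. The example below is neither WPS nor BPMN: plain Python functions stand in for WPS processes, and an ordered list stands in for the workflow model. The two quality controls, their thresholds and the observation fields (`gps_error_m`, `species`) are hypothetical; only the quality element names come from the ISO 19157 list in this deck.

```python
from functools import partial, reduce
from typing import Any, Callable, Dict, List

Observation = Dict[str, Any]
QualityControl = Callable[[Observation], Observation]


def with_quality(obs: Observation, element: str, value: Any) -> Observation:
    """Return a copy of the observation with one quality element recorded."""
    quality = dict(obs.get("quality", {}))
    quality[element] = value
    return {**obs, "quality": quality}


def qc_gps_accuracy(obs: Observation, max_error_m: float) -> Observation:
    # Hypothetical QC: is the reported GPS error within the threshold?
    ok = obs["gps_error_m"] <= max_error_m
    return with_quality(obs, "DQ_AbsoluteExternalPositionalAccuracy", ok)


def qc_species_in_list(obs: Observation, allowed: set) -> Observation:
    # Hypothetical QC: is the reported species in the survey's allowed list?
    ok = obs["species"] in allowed
    return with_quality(obs, "DQ_DomainConsistency", ok)


def run_workflow(obs: Observation, steps: List[QualityControl]) -> Observation:
    """Apply each QC in order, feeding the output of one into the next."""
    return reduce(lambda current, step: step(current), steps, obs)


# A stakeholder "configures" the workflow by choosing steps and parameters.
workflow = [
    partial(qc_gps_accuracy, max_error_m=20.0),
    partial(qc_species_in_list, allowed={"Japanese knotweed"}),
]

result = run_workflow({"species": "Japanese knotweed", "gps_error_m": 35.0}, workflow)
print(result["quality"])
```

Because every step has the same signature, steps can be reordered, removed or re-parameterised without touching the others, which is the property the configurable-workflow bullet relies on.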
Solution adopted (iii)
• GitHub used as the code repository and for open source evolution of the solution
• Built on open source components: the WPS implementation and client libraries are from 52°North, with WPS running on Apache Tomcat; the BPMN implementation is jBPM, maintained by JBoss and deployed on JBoss WildFly
• Full details in “A BPMN solution for chaining OGC services to quality assure location-based crowd-sourced data”, Meek, Jackson, Leibovici (2015) submitted to: Computers and Geosciences
Mike Jackson, 4-5 Nov., 2015, China
The COBWEB QAQC: the 7+ pillars of Quality Controls (QC)
7 pillars of QC and the 7+ cross-pillar QC
• QAwAT: workflow authoring tool (BPMN encoding)
• QAwOnt: composition support (SKOS encoding)
• QAwWPS: repository of QCs (as WPS)
QAQC: workflow of QC as WPS
Example Workflow from EU COBWEB Project https://cobwebproject.eu
Qualifying the Observations, the Volunteers and the Authoritative data
Quality elements generated & evolving
QC examples: Example of a QA workflow
Design and composition using a graphical tool
QC examples: QAQC workflow Authoring Tool (QAwAT)
Design and composition in Eclipse
Design and composition JBPM web editor
Some results on the Japanese knotweed co-design
Before QA
Some results (ground truth from photo)
After QA
Rosser J, Pourabdollah A, Brackin R, Jackson MJ, Leibovici DG (2016) Full Meta Objects for Flexible Geoprocessing Workflows: profiling WPS or BPMN? 19th AGILE Conference, 14-17 June 2016, Helsinki, Finland
Leibovici DG, Williams J, Rosser JF, Hodges C, Scott D, Chapman C, Higgins C and Jackson MJ (2016) The COBWEB Quality Assurance System in Practice: Example for an Invasive Species Study. ECSA conference, 19-21 May 2016, Berlin, Germany
Meek S, Jackson MJ, Leibovici DG (2016) A BPMN solution for chaining OGC services to quality assure location-based crowdsourced data. Computers & Geosciences, 87: 76–83
Leibovici DG, Meek S, Rosser J and Jackson MJ (2015) DQ in the citizen science project COBWEB: extending the standards. Data Quality DWG, OGC/TC Nottingham, September 2015, UK
Leibovici DG, Evans B, Hodges C, Wiemann S, Meek S, Rosser J and Jackson MJ (2015) On Data Quality Assurance and Conflation Entanglement in Crowdsourcing for Environmental Studies. ISSDQ 2015 - The 9th International Symposium on Spatial Data Quality, 29-30 September, La Grande Motte, France
Meek S, Jackson MJ, Leibovici DG (2014) A flexible framework for assessing the quality of crowdsourced data. AGILE conference, 3-6 June 2014, Castellon, Spain
Leibovici DG and Jackson MJ (2013) Copula metadata est. AGILE conference, 14-17 May 2013, Leuven, Belgium
Leibovici DG, Pourabdollah A and Jackson MJ (2013) Which Spatial Data Quality can be meta-propagated? Journal of Spatial Sciences, 58(1): 3-14
Leibovici DG, Pourabdollah A and Jackson M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. EGU 2011, European Geosciences Union, General Assembly, Vienna, Austria, April 2011
Pawlowicz S, Leibovici DG, Haines-Young R, Saull R and Jackson M (2011) Dynamical Surveying Adjustments for Crowd-sourced Data Observations. EnviroInfo 2011, Ispra, Italy
Leibovici DG and Pourabdollah A (2010) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, 20-24 September 2010, Toulouse, France
Jackson M, Rahemtulla H, Morley J (2010) The synergistic use of authenticated and crowd-sourced data for emergency response. International Workshop on Validation of Geo-Information Products for Crisis Management (VALgEO), Ispra, Italy, pp 91-99
Quality Assurance workflow Authoring Tool (QAwAT)
Didier G. Leibovici, Julian Rosser, Mike Jackson and the COBWEB project
Nottingham Geospatial Institute, University of Nottingham, UK
Email: [email protected]
Thank you!