Semantic Water Quality Portal, Jin Guang Zheng and Ping Wang, Tetherless World Constellation
![Page 1: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/1.jpg)
Semantic Water Quality Portal
Jin Guang Zheng and Ping Wang
Tetherless World Constellation
![Page 2: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/2.jpg)
Outline
• Introduction
• Methods
  – System Architecture
  – Ontology
  – Provenance
  – Visualization
• Demo
• Claims
• Conclusion
  – Improvements
  – Future Work
  – Contributions
![Page 3: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/3.jpg)
Introduction
• Semantic Water Quality Project
  – Continuing the SWQP project from last semester’s Semantic eScience class
  – Goal: Help citizens identify polluted water sources and potential pollution sources, thereby alleviating or controlling adverse health effects.
  – Credits: Evan, Theodora, Ping, Jin
![Page 4: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/4.jpg)
Motivation Use Case
• Use Case:
  – Children start getting sick: vomiting
  – Residents request that the authority perform checks on the water supply.
  – Authority collects data from various sources: EPA, USGS, state regulations, etc.
  – Authority analyzes the data
  – Authority reports the analyzed result
  – And more …
![Page 5: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/5.jpg)
SWQP
• Semantic Water Quality Portal can ease the process:
  – Integrate data from various sources
  – Perform automatic analysis (reasoning) on polluted water sources and possible sources of pollutants: facilities that violate regulations
  – Present analyzed results in a user-friendly interface.
![Page 6: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/6.jpg)
Research Question
How can we use semantic web technology to solve environment-related problems?
![Page 7: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/7.jpg)
System Architecture
![Page 8: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/8.jpg)
Ontology
• Two types of ontology:
  – Core Ontology
    • Encodes the main inference and reasoning rules
  – Regulation Ontologies
    • Encode regulations from different states
• Reasoning Example:
  – “Any water source that has a measurement over a certain threshold is a polluted water source” (core ontology)
  – “Any measurement that has a value of 0.01 mg/l of Arsenic is a threshold” (regulation ontology)
  – “Any water source that contains 0.01 mg/l of Arsenic is a polluted water source.” (inferred from the above rules)
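The reasoning example above can be sketched in plain Python. This is only an illustration of how a core rule and a regulation rule combine, not the portal's ontology or reasoner: the `Measurement` class, the `REGULATION_LIMITS` table, and the source names are hypothetical, and only the Arsenic figure of 0.01 mg/l comes from the slide. The comparison is inclusive because the slide's inferred rule treats a reading of exactly 0.01 mg/l as polluted; that reading of the slide is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    water_source: str  # identifier of the measured water source
    element: str       # contaminant name, e.g. "Arsenic"
    value: float       # measured concentration
    unit: str          # unit of the value, e.g. "mg/l"


# Stand-in for a regulation ontology: element -> (limit, unit).
# Only the Arsenic figure comes from the slide.
REGULATION_LIMITS = {"Arsenic": (0.01, "mg/l")}


def violates(measurement, limits):
    """Regulation rule: does this reading reach the limit for its element?"""
    if measurement.element not in limits:
        return False
    limit, unit = limits[measurement.element]
    # Inclusive comparison, following the slide's inferred rule (assumption).
    return measurement.unit == unit and measurement.value >= limit


def polluted_sources(measurements, limits):
    """Core rule: a source with at least one violating reading is polluted."""
    return {m.water_source for m in measurements if violates(m, limits)}


# Hypothetical readings for two made-up sources.
readings = [
    Measurement("well-A", "Arsenic", 0.01, "mg/l"),
    Measurement("well-B", "Arsenic", 0.002, "mg/l"),
]
print(polluted_sources(readings, REGULATION_LIMITS))
```

Keeping the limits in a separate table mirrors the split between the core ontology and the per-state regulation ontologies: swapping in another state's limits changes the result without touching the core rule.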
![Page 9: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/9.jpg)
Provenance
• Data Level Provenance
  – Where are the original data?
  – Provide provenance-based query.
• Application Level Provenance
  – What data did we use in the analysis and reasoning step?
  – Provide an explanation to the user when a water source is marked as a polluted water source
![Page 10: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/10.jpg)
Visualization
• Map Visualization:
  – Presents analyzed results with Google Map
    • Polluted Water Source, Polluting Facility
  – Presents an explanation of why a water source is marked as polluted
  – Uses a “Facet” type filter to select the type of data
• Trend Visualization:
  – Presents data in a trend visualization for the user to explore and analyze the data.
![Page 11: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/11.jpg)
Demo Time
![Page 12: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/12.jpg)
Claim I - Problem
• Problem:
  – Data are collected from various sources:
    • EPA, USGS, etc.
  – Heterogeneous Data:
    • Difficult to perform query
    • Data are stored using different schemas, and the semantics of the terms in different schemas can be very different from each other
![Page 13: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/13.jpg)
Claim I
Semantic Data Integration helps SWQP integrate data from various sources, eases the process of future data integration, and makes it easier to use existing reasoners to perform reasoning.
![Page 14: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/14.jpg)
Claim I Example
• Various Data Sources:
  – Convert into RDF, and load into a triple store.
  – Use SPARQL to query data
• Use EPA ontology as central schema to encode converted data
• Easier for future data integration:
  – Easier to accommodate schema changes: add equivalent statements, new properties, new classes, etc.
• Easier to use existing reasoners:
  – Jena, Pellet, etc.
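The integration idea above can be sketched with the standard library alone. This is a stand-in, not the project's pipeline: the real system converts data to RDF, loads it into a triple store, and queries it with SPARQL. Here tuples play the role of triples and `match` plays the role of a very small query. All field names, predicate names, and record values are hypothetical.

```python
# Hypothetical mappings from each source's field names to shared predicates.
FIELD_MAPPINGS = {
    "EPA": {"site_id": "hasSite", "chemical": "hasElement", "result": "hasValue"},
    "USGS": {"station": "hasSite", "param_name": "hasElement", "measured": "hasValue"},
}


def to_triples(source, record_id, record):
    """Convert one source record into (subject, predicate, object) triples."""
    mapping = FIELD_MAPPINGS[source]
    subject = f"{source}:{record_id}"
    triples = [(subject, "fromSource", source)]
    for field, value in record.items():
        if field in mapping:  # fields without a mapping are skipped
            triples.append((subject, mapping[field], value))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Two made-up records that use different schemas for the same kind of fact.
store = (
    to_triples("EPA", 1, {"site_id": "NY-001", "chemical": "Arsenic", "result": 0.02})
    + to_triples("USGS", 7, {"station": "01358000", "param_name": "Arsenic", "measured": 0.004})
)

# One query now covers both sources.
arsenic_subjects = sorted(s for s, _, _ in match(store, p="hasElement", o="Arsenic"))
print(arsenic_subjects)
```

Supporting a new source or a renamed field means adding an entry to the mapping table, which is the plain-Python analogue of adding equivalence statements or new properties to the central schema.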
![Page 15: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/15.jpg)
Claim II - Problem
• Problem:
  – The analysis process of identifying whether a water source is polluted can be complex and time consuming.
• Example:
  – 10 contaminants in a water source.
  – Each contaminant has been measured 10 times.
  – There are 50 regulation limits.
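A back-of-the-envelope count gives a feel for the scale of this example. It assumes, purely for illustration, that every reading is naively compared against every regulation limit; the slides do not say how the comparisons are actually organized.

```latex
\[
\underbrace{10}_{\text{contaminants}} \times \underbrace{10}_{\text{measurements each}} = 100 \ \text{readings},
\qquad
100 \ \text{readings} \times 50 \ \text{limits} = 5000 \ \text{comparisons}
\]
```

That figure is for a single water source; under the same assumption it grows linearly with the number of sources.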
![Page 16: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/16.jpg)
Claim II
Automatic inference and reasoning supported by semantic web technologies help SWQP perform automatic analysis of water quality, etc.
![Page 17: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/17.jpg)
Claim II Example
• Reasoning and Inference:
  – Identify that the measured object is a water source
  – Find all measurements for the water source
  – Validate that the measurement is measuring water contaminants.
  – Perform reasoning on whether the measurement exceeds the threshold
    • What element? What unit? What value?
  – Identify the type of water source: polluted?
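The steps above can be walked through in a short Python sketch. It is a procedural approximation of what a reasoner would infer, not the Jena/Pellet backend itself. The object types, record layout, and sample data are hypothetical, and the inclusive comparison against the limit is an assumption.

```python
# Hypothetical set of object types that count as water sources.
WATER_SOURCE_TYPES = {"river", "lake", "well"}

# Limits keyed by (element, unit), so a reading in another unit never matches.
LIMITS = {("Arsenic", "mg/l"): 0.01}
KNOWN_CONTAMINANTS = {element for element, _unit in LIMITS}


def classify(objects, measurements):
    """Return {water_source_id: is_polluted} following the five steps."""
    result = {}
    for obj_id, obj_type in objects.items():
        # Step 1: keep only measured objects that are water sources.
        if obj_type not in WATER_SOURCE_TYPES:
            continue
        # Step 2: find all measurements for this water source.
        own = [m for m in measurements if m["object"] == obj_id]
        # Step 3: keep only readings of known water contaminants.
        contaminant_readings = [m for m in own if m["element"] in KNOWN_CONTAMINANTS]
        # Step 4: check element, unit, and value against the limit.
        violations = [
            m for m in contaminant_readings
            if (m["element"], m["unit"]) in LIMITS
            and m["value"] >= LIMITS[(m["element"], m["unit"])]
        ]
        # Step 5: decide the type of the water source.
        result[obj_id] = bool(violations)
    return result


# Made-up objects and readings.
objects = {"river-1": "river", "lake-2": "lake", "plant-3": "facility"}
measurements = [
    {"object": "river-1", "element": "Arsenic", "unit": "mg/l", "value": 0.03},
    {"object": "lake-2", "element": "Arsenic", "unit": "mg/l", "value": 0.001},
    {"object": "lake-2", "element": "Temperature", "unit": "C", "value": 40.0},
    {"object": "plant-3", "element": "Arsenic", "unit": "mg/l", "value": 9.0},
]
print(classify(objects, measurements))
```

The facility's reading is ignored in step 1 and the temperature reading in step 3, which is why each step needs an answer to "what element, what unit, what value" before any comparison is made.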
![Page 18: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/18.jpg)
Claim III - Problem
• Problem:
  – User may not trust the analyzed result presented by SWQP.
    • I don’t think the Hudson River has been polluted.
  – User may trust data from certain sources only.
    • I don’t trust the data collected by a student for his class project.
![Page 19: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/19.jpg)
Claim III
Provenance information encoded in semantic web technology helps SWQP solve trust-related problems.
![Page 20: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/20.jpg)
Claim III Example
• Data Source Based Query:
  – User can select what data to be analyzed.
  – Data Source Provenance
• Explanation on polluted water source:
  – Pop-up window to show the regulation used and the measured value
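Both ideas above, restricting the analysis to sources the user trusts and explaining each result, can be sketched as follows. The record layout, the source labels, and the wording of the explanation are hypothetical. The portal itself shows this information in a pop-up window rather than as text lines.

```python
def explain_pollution(measurements, limits, trusted_sources):
    """Explain each violation, using only data from trusted sources."""
    explanations = []
    for m in measurements:
        # Data-source-based query: skip readings the user does not trust.
        if m["source"] not in trusted_sources:
            continue
        limit = limits.get(m["element"])
        # Inclusive comparison against the limit (assumption).
        if limit is not None and m["value"] >= limit:
            explanations.append(
                f"{m['water_source']}: {m['element']} measured at "
                f"{m['value']} mg/l (source: {m['source']}), "
                f"regulation limit {limit} mg/l"
            )
    return explanations


# Made-up readings, each carrying its data-level provenance.
SAMPLE = [
    {"water_source": "river-1", "element": "Arsenic", "value": 0.02, "source": "EPA"},
    {"water_source": "pond-9", "element": "Arsenic", "value": 0.05, "source": "class-project"},
]

for line in explain_pollution(SAMPLE, {"Arsenic": 0.01}, {"EPA"}):
    print(line)
```

Because every reading keeps its source, the same data can be re-analyzed under a different trust setting without being reloaded.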
![Page 21: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/21.jpg)
Conclusion
![Page 22: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/22.jpg)
Improvements
• More data:
  – Regulation data from CA, NY, MASS, EPA
  – EPA, USGS data for multiple states
  – Provenance data are captured for both regulation data and EPA, USGS data.
• More Features:
  – Provenance-based data query and analysis
  – Trend visualization
• Speed:
  – ~ 15 – 30 seconds.
  – Main drawback now is real-time inference and reasoning and the large size of the data
![Page 23: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/23.jpg)
Future Work
• Provenance:
  – Support building, linking, and displaying proof traces that track how the answers are derived from source data.
• Health Related Reasoning:
  – Model the effects of drinking from a polluted water source.
    • Identify which polluted water sources cause people to vomit more quickly.
• Flood Reasoning:
  – Model Flood
    • Identify which water sources will flood with high probability
    • Identify possible effects of flood w.r.t. water quality
• Other Work:
  – Pollutant based query: e.g. interested in Arsenic
![Page 24: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/24.jpg)
Contributions
• Ping:
  – Use Tim’s converter to convert EPA and USGS data.
  – Preprocess regulation data to CSV format
  – Implement data visualization part of the project
  – Write part of this final class write-up, and present the visualization part of the demo.
• Jin:
  – Write script to convert data to RDF format encoded using the ontology
  – Design ontology to support automatic reasoning and inference
  – Re-implement Jena-Pellet based backend reasoner.
  – Class related work: since this project is Ping’s out-of-class project, I am responsible for most of the project related write-up, presentation, etc.
![Page 25: Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation](https://reader036.vdocuments.net/reader036/viewer/2022062317/5a4d1b8a7f8b9ab0599be444/html5/thumbnails/25.jpg)
Questions
Thank you for your attention!