Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications
DESCRIPTION
Presentation at the AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013.
Additional related material at: http://wiki.knoesis.org/index.php/Smart_Data
Related paper at: http://www.knoesis.org/library/resource.php?id=1903
Abstract: We discuss the nature of Big Data and address the role of semantics in analyzing and processing Big Data that arises in the context of Physical-Cyber-Social Systems. We organize our research around the five Vs of Big Data, where four of the Vs are harnessed to produce the fifth V: Value. To handle the challenge of Volume, we advocate semantic perception that can convert low-level observational data to higher-level abstractions more suitable for decision-making. To handle the challenge of Variety, we resort to semantic models and annotations of data so that much of the intelligent processing can be done at a level independent of the heterogeneity of data formats and media. To handle the challenge of Velocity, we seek to use a continuous semantics capability to dynamically create event- or situation-specific models and recognize new concepts, entities and facts. To handle Veracity, we explore the formalization of trust models and approaches to glean trustworthiness. These four Vs of Big Data are harnessed by semantics-empowered analytics to derive Value for supporting practical applications transcending the physical-cyber-social continuum.
TRANSCRIPT
![Page 1: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/1.jpg)
Semantics-empowered Big Data Processing for PCS Applications
Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435
![Page 2: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/2.jpg)
Prasad 2
Outline
• 5 V’s of Big Data Research
• Semantic Perception for Scalability
• Lightweight semantics to manage heterogeneity
  – Cost-benefit trade-off and continuum
• Hybrid Knowledge Representation and Reasoning
  – Anomaly, Correlation, Causation
11/15/2013
![Page 3: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/3.jpg)
5V’s of Big Data Research
Volume
Velocity
Variety
Veracity
Value
Big Data => Smart Data
![Page 4: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/4.jpg)
Volume : Assorted Examples
• 25+ billion sensors have been deployed.
• About 250 TB of sensor data are generated for a NY-LA flight on a Boeing 737.
• A Parkinson's disease dataset that tracked 16 patients for 8 weeks, using 7 mobile-phone sensors, is 12 GB.
Check engine light analogy
![Page 5: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/5.jpg)
Volume : Semantic Perception
• Abstracting machine-sensed data
  – E.g., fine-grained to coarse-grained
  – E.g., average, peak, rate of change
• Extracting human-comprehensible features/entities
• Machine perception
  – Derive conclusions using domain models and hybrid abductive/deductive reasoning
Goal: Human-accessible situational awareness and actionable intelligence for decision making
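The abstraction step (average, peak, rate of change) might look like the following minimal Python sketch. The function name `abstract_window` and the `(timestamp, value)` reading format are assumptions made for illustration, not part of the presented system.

```python
def abstract_window(readings):
    """Summarize a window of (timestamp_seconds, value) sensor readings.

    Hypothetical helper: reduces fine-grained readings to three
    coarse-grained abstractions named on the slide.
    """
    values = [value for _, value in readings]
    (t_first, v_first), (t_last, v_last) = readings[0], readings[-1]
    return {
        "average": sum(values) / len(values),
        "peak": max(values),
        # Rate of change over the whole window, in value units per second.
        "rate_of_change": (v_last - v_first) / (t_last - t_first),
    }


# Illustrative readings: heart rate sampled once a minute.
summary = abstract_window([(0, 70.0), (60, 76.0), (120, 82.0)])
```

A window of many raw readings is thus replaced by a handful of values that a domain model can reason over.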
![Page 6: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/6.jpg)
Weather Use Case
• Machine-sensed phenomena
  – temperature, precipitation, humidity, wind speed, etc.
• Human-perceived features
  – blizzard, flurry, rain storm, clear, etc.
  – categories of hurricanes (SSHWS)
• Machine perception
  – Using domain models from NOAA
• Ultimately, generate weather alerts …
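As one concrete instance of mapping a machine-sensed quantity to a human-perceived feature, the sketch below assigns a Saffir-Simpson (SSHWS) category to a sustained wind speed in mph. The function name is an assumption; the thresholds are the published SSHWS wind-speed bands.

```python
def sshws_category(sustained_wind_mph):
    """Return the SSHWS hurricane category (1-5), or 0 below hurricane strength."""
    # (minimum sustained wind in mph, category), checked from strongest down.
    thresholds = [(157, 5), (130, 4), (111, 3), (96, 2), (74, 1)]
    for minimum, category in thresholds:
        if sustained_wind_mph >= minimum:
            return category
    return 0


category = sshws_category(100)  # falls in the 96-110 mph band
```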
![Page 7: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/7.jpg)
Parkinson’s Disease Use Case
• Data from machine sensors
  – accelerometer, GPS, compass, microphone, etc.
• Human-perceived features
  – tremors, walking style, balance, slurred speech, etc.
• Machine perception
  – Using domain models to be created to diagnose and monitor disease progression
• Ultimately, recommend options to control chronic conditions …
![Page 8: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/8.jpg)
Heart Failure Use Case
• Machine-sensed data
  – weight, heart rate, blood pressure, oxygen level, etc.
• Human-perceived features
  – Risk level for hospital readmission of CHF/ADHR patient
• Machine perception
  – Using domain models to be created to monitor the heart condition of a cardiac patient after hospital discharge
• Ultimately, recommend treatments to reduce preventable hospital readmissions …
![Page 9: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/9.jpg)
Asthma Use Case
• Data from machine sensors
  – Environmental sensors, physiological sensors, etc.
• Human-perceived features
  – Asthma severity gleaned from frequency of asthma attacks, wheezing, coughing, sleeplessness, etc.
• Machine perception
  – Using domain models to be created to monitor asthma patients and their surroundings
• Ultimately, recommend prevention and control options …
![Page 10: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/10.jpg)
Traffic Use Case
• Data from machine sensors, social media streams, and planned event schedules
  – Traffic flow sensors: link speed, link volume; event-specific tweets, etc.
• Human-perceived features
  – traffic delays and congestion, etc.
• Machine perception
  – Using domain models to be created to understand traffic patterns in response to events
• Ultimately, recommend traffic management options …
![Page 11: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/11.jpg)
Heterogeneity in a Physical-Cyber-Social System
[Figure: traffic monitoring example with 511.org data: slow moving traffic, link description, scheduled events, and schedule information]
![Page 12: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/12.jpg)
Volume with a Twist
Resource-constrained reasoning on mobile devices
Goal: Boolean encodings to ensure feasibility, efficiency, and economy
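To give a feel for what a Boolean encoding buys on a constrained device, the sketch below packs property sets into integer bit masks so that a subset check becomes a single bitwise operation. The property names, features, and function names are hypothetical, and this is not the authors' actual encoding.

```python
# Hypothetical vocabulary of observable properties, one bit each.
PROPERTIES = ["freezing_temperature", "snowfall", "high_wind", "rainfall"]
BIT = {name: 1 << index for index, name in enumerate(PROPERTIES)}


def encode(properties):
    """Pack a collection of property names into one integer bit mask."""
    mask = 0
    for name in properties:
        mask |= BIT[name]
    return mask


# Hypothetical domain model: feature -> mask of properties it can explain.
FEATURES = {
    "flurry": encode(["freezing_temperature", "snowfall"]),
    "blizzard": encode(["freezing_temperature", "snowfall", "high_wind"]),
    "rain_storm": encode(["rainfall", "high_wind"]),
}


def explain(observed_mask):
    """Features whose property mask covers every observed bit."""
    return [name for name, mask in FEATURES.items()
            if (observed_mask & ~mask) == 0]


candidates = explain(encode(["snowfall"]))
```

Each candidate check costs a few machine instructions regardless of how many properties are in the vocabulary, which is the kind of economy the slide's goal points at.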
![Page 13: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/13.jpg)
Perception Cycle* that exploits background knowledge / domain models
[Figure: cycle between "Observe Property" and "Perceive Feature", informed by Prior Knowledge]
1. Explanation: abstracting raw data for human comprehension
2. Discrimination: focus generation for disambiguation and action (incl. human in the loop)
* based on Neisser’s cognitive model of perception
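The two steps of the perception cycle can be sketched over plain Python sets. The knowledge base `KB` and both function names are assumptions for illustration. Explanation keeps the features consistent with what has been observed, and discrimination proposes which further properties are worth observing because they would separate the remaining candidates.

```python
# Hypothetical prior knowledge: feature -> properties it can explain.
KB = {
    "flurry": {"freezing_temperature", "snowfall"},
    "blizzard": {"freezing_temperature", "snowfall", "high_wind"},
    "rain_storm": {"rainfall", "high_wind"},
}


def explain(observed):
    """Explanation: features that account for every observed property."""
    return {feature for feature, props in KB.items() if observed <= props}


def discriminate(observed, candidates):
    """Discrimination: unobserved properties expected by some, but not all, candidates."""
    expected = [KB[feature] - observed for feature in candidates]
    if not expected:
        return set()
    return set().union(*expected) - set.intersection(*expected)


observed = {"freezing_temperature", "snowfall"}
candidates = explain(observed)                  # flurry or blizzard
to_observe = discriminate(observed, candidates)  # what to sense next
```

Observing the suggested property next and calling `explain` again closes the loop.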
![Page 14: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/14.jpg)
Virtues of Our Approach to Semantic Perception
Blends simplicity, effectiveness, and scalability.
• Declarative specification of explanation and discrimination;
• With applications (e.g., to healthcare) that are of contemporary relevance and interdisciplinary;
• Using encodings/algorithms that are significant (asymptotic order of magnitude gain) and necessary (“tractable” due to time/memory reduction for typical problem sizes); and
• Prototyped using extant PCs and mobile devices.
![Page 15: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/15.jpg)
Evaluation on a mobile device
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial, O(n^3) < x < O(n^4), to linear, O(n)
![Page 16: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/16.jpg)
Volume and Velocity
• Lightweight semantics-based Adaptive/Continuous Filtering
  – E.g., track the evolution of crowd-sourced and verified Wikipedia event pages for relevance ranking of Twitter hashtags in a disaster response use case
• Building domain models dynamically
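One very simple way to picture hashtag relevance ranking against an evolving event page is shown below. It is a hedged sketch, not the presented system: the scoring rule (term overlap between a tweet and the current page vocabulary), the function name, and the sample data are all assumptions.

```python
import re
from collections import Counter


def rank_hashtags(tweets, page_terms):
    """Rank hashtags by how many event-page terms co-occur with them in tweets."""
    scores = Counter()
    for tweet in tweets:
        tokens = set(re.findall(r"[a-z0-9#]+", tweet.lower()))
        hashtags = {token for token in tokens if token.startswith("#")}
        overlap = len((tokens - hashtags) & page_terms)
        for tag in hashtags:
            scores[tag] += overlap
    return [tag for tag, _ in scores.most_common()]


# Hypothetical terms extracted from the current revision of an event page.
page_terms = {"flooding", "flood", "evacuees", "shelter", "roads", "river"}
tweets = [
    "Flooding closes roads near the river #flood2013",
    "Shelter open for flood evacuees #flood2013 #relief",
    "Great pizza tonight #dinner",
]
ranking = rank_hashtags(tweets, page_terms)
```

Re-running the ranking with terms from a newer page revision is what would make the filter adaptive.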
![Page 17: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/17.jpg)
Continuous Semantics
Dynamic Model Creation
E.g., Heliopolis is a suburb of Cairo.
![Page 18: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/18.jpg)
Variety
Syntactic and semantic heterogeneity
• in textual and sensor data,
• in (legacy) materials data,
• in (long tail) geosciences data
Idea: Semantics-empowered integration
![Page 19: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/19.jpg)
Variety (What?): Materials/Geosciences Use Case
• Structured Data (e.g., relational)
• Semi-structured, Heterogeneous Documents (e.g., Publications and technical specs, which usually include text, numerics, maps and images)
• Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)
![Page 20: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/20.jpg)
Variety (How?/Why?): Granularity of Semantics & Applications
• Lightweight semantics: File and document-level annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and extraction for semantic search and summarization
• Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data
Cost-benefit trade-off and continuum
![Page 21: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/21.jpg)
Challenges Associated with Typical Spreadsheet/Table
• Meant for human consumption
• Irregular
  – Not a simple rectangular grid
• Heterogeneous
  – Not all rows are interpreted similarly
• Complex
  – Meaning of each row and each column is context dependent
• Footnotes modify the meaning of entries (esp. in materials and process specifications)
![Page 22: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/22.jpg)
![Page 23: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/23.jpg)
Practical Semi-Automatic Content Extraction
• DESIGN: Develop regular data structures that can be used to formalize tabular information.
  – Provide a natural expression of data
  – Provide semantics to data, thereby removing potential ambiguities
  – Enable automatic translation
• USE: Manual population of regular tables and automatic translation into LOD
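The "automatic translation into LOD" step can be illustrated with a small sketch that turns one row of a regular table into N-Triples lines. The namespace, column names, and values are hypothetical, and a real pipeline would use typed literals and an agreed vocabulary rather than plain strings.

```python
BASE = "http://example.org/materials/"  # hypothetical namespace


def row_to_ntriples(row, key):
    """Translate one regular-table row (a dict) into N-Triples lines.

    The column named by `key` identifies the subject; every other
    column becomes a predicate with a plain string literal as object.
    """
    subject = f"<{BASE}{row[key]}>"
    triples = []
    for column, value in row.items():
        if column == key:
            continue
        predicate = f"<{BASE}prop/{column}>"
        # Escape backslashes and double quotes inside the literal.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} {predicate} "{text}" .')
    return triples


# Illustrative row with made-up values.
lines = row_to_ntriples({"alloy": "alloy-A", "density_g_cm3": 4.4}, key="alloy")
```

Because each regular table has a fixed interpretation for its columns, the translation needs no per-row human judgment once the table is populated.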
![Page 24: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/24.jpg)
Variety (What?) : Sensor Data Use Case
Develop/learn domain models to exploit complementary and corroborative information
• To relate patterns in multimodal data to “situation”
• To integrate machine sensed and human sensed data
![Page 25: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/25.jpg)
Variety: Hybrid KRR
Blending data-driven models with declarative knowledge
  – Data-driven: Bottom-up, correlation-based, statistical
  – Declarative: Top-down, causal/taxonomical, logical
  – Refine structure to better estimate parameters
E.g., Traffic Analytics using PGMs + KBs
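A toy illustration of the blend: declarative knowledge fixes which dependency to model, and data supplies the parameters. The edge list, field names, and records below are hypothetical, and a real probabilistic graphical model would involve many variables and a proper learning library.

```python
from collections import Counter

# Declarative (top-down) knowledge, assumed here: events can cause congestion.
KB_EDGES = [("scheduled_event", "congestion")]


def estimate_cpt(records, parent, child):
    """Data-driven (bottom-up) step: estimate P(child=True | parent) by counting."""
    counts = Counter((record[parent], record[child]) for record in records)
    cpt = {}
    for parent_value in (True, False):
        total = counts[(parent_value, True)] + counts[(parent_value, False)]
        if total:
            cpt[parent_value] = counts[(parent_value, True)] / total
    return cpt


# Made-up observations: 4 time slots with an event, 4 without.
records = (
    [{"scheduled_event": True, "congestion": True}] * 3
    + [{"scheduled_event": True, "congestion": False}]
    + [{"scheduled_event": False, "congestion": True}]
    + [{"scheduled_event": False, "congestion": False}] * 3
)
cpts = {edge: estimate_cpt(records, *edge) for edge in KB_EDGES}
```

Restricting estimation to edges the knowledge base sanctions keeps the number of parameters small, which is one reading of "refine structure to better estimate parameters".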
![Page 26: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/26.jpg)
Variety (Why?): Hybrid KRR
Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.
-- David Brooks of The New York Times
However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.
![Page 27: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/27.jpg)
Correlations vs Causation vs Anomalies
• Correlations due to common cause or origin
• Coincidental due to data skew or misrepresentation
• Coincidental new discovery
• Strong correlation vs causation
• Anomalous and accidental
• Correlation turning into causation
![Page 28: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/28.jpg)
Correlations vs Causation vs Anomalies
• Correlations due to common cause or origin
  – E.g., Planets: Copernicus > Kepler > Newton > Einstein
• Coincidental due to data skew or misrepresentation
  – E.g., Tall policy claims made by politicians!
• Coincidental new discovery
  – E.g., Hurricanes and Strawberry Pop-Tarts sales
• Strong correlation vs causation
  – E.g., Spicy foods vs Helicobacter pylori: stomach ulcers
• Anomalous and accidental
  – E.g., CO2 levels and obesity
• Correlation turning into causation
  – E.g., Pavlovian learning: conditional reflex
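The first case, correlation due to a common cause, can be reproduced with a small simulation. This is an illustrative sketch on synthetic data: a hidden variable `z` drives both `x` and `y`, which never influence each other, and removing the contribution of `z` (possible here only because the simulation defines it with coefficient 1) makes the correlation vanish.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(2000)]   # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]       # depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]       # depends on z only

r_xy = pearson(x, y)  # clearly positive despite no direct link
# Subtract the known contribution of z from both variables.
r_given_z = pearson([xi - zi for xi, zi in zip(x, z)],
                    [yi - zi for yi, zi in zip(y, z)])  # close to zero
```

A purely data-driven analysis sees only `r_xy`; knowing about `z` is what declarative domain knowledge contributes.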
![Page 29: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/29.jpg)
Correlations vs Causation vs Anomalies
Paradoxes: The Seeds of Progress
![Page 30: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/30.jpg)
Veracity
A lot of existing work on trust ontologies, metrics and models, and on provenance tracking
• Homogeneous data: Statistical techniques
• Heterogeneous data: Semantic models
Open Problem: Develop semantics of trust using expressive frameworks that are both declarative and computational
• To make explicit all aspects that go into trust formation, to inspire confidence in inferences
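As an example of a statistical technique for homogeneous data, the sketch below scores a source with the mean of a Beta distribution over its past positive and negative outcomes. This is one well-known choice picked for illustration. The slide does not say which techniques the authors have in mind, and the function name is an assumption.

```python
def beta_trust(positive, negative):
    """Expected trustworthiness under a Beta(positive + 1, negative + 1) model.

    With no evidence the score is 0.5; it moves toward 1 or 0 as
    positive or negative experiences accumulate.
    """
    return (positive + 1) / (positive + negative + 2)


no_history = beta_trust(0, 0)
mostly_reliable = beta_trust(8, 2)
```

A score like this is computational but not declarative: it does not record why an interaction counted as positive, which is part of the open problem stated above.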
![Page 31: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/31.jpg)
Veracity
Machine sensing: objective, quantitative, but prone to environmental effects, battery life, …
Human sensing: subjective, qualitative, but prone to bias, perceptual errors, rumors, …
Open problem: Improving trustworthiness by combining machine sensing and human sensing
  – E.g., 2002 Überlingen mid-air collision: a pilot incorrectly followed traffic controller advice over the electronic TCAS system recommendation
![Page 32: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/32.jpg)
(More on) Value
Learning domain models from “big data” for prediction
E.g., Harnessing Twitter "Big Data" for Automatic Emotion Identification
Idea: Exploit “emotion-hashtagged” tweets as training dataset
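The idea of using emotion-hashtagged tweets as training data can be sketched as a labelling function: a trailing emotion hashtag becomes the label and is removed from the text. The hashtag-to-emotion table and the function name are assumptions for illustration, and the actual study's tag list and filtering rules may differ.

```python
import re

# Hypothetical mapping from emotion hashtags to class labels.
EMOTION_TAGS = {"#happy": "joy", "#sad": "sadness", "#angry": "anger"}


def label_tweet(text):
    """Return (text_without_tag, emotion) if the tweet ends with a known emotion hashtag."""
    match = re.search(r"(#\w+)\s*$", text)
    if not match:
        return None
    emotion = EMOTION_TAGS.get(match.group(1).lower())
    if emotion is None:
        return None
    # Drop the hashtag so a classifier cannot simply memorize it.
    return text[:match.start()].strip(), emotion


example = label_tweet("Finally got the job! #happy")
```

Labels obtained this way are self-reported by the tweet authors, so no manual annotation is needed to build a large training set.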
![Page 33: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/33.jpg)
(More on) Value
Discovering gaps and enriching domain models using data
E.g., Data-driven knowledge acquisition method for domain knowledge enrichment in healthcare
Idea: Use associations between diseases, symptoms and medications in EMR documents
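A first step toward using such associations might be simple co-occurrence counting, sketched below. The record layout (`diseases` and `medications` lists per document) and the sample records are hypothetical. Frequent pairs missing from an existing domain model would be candidates for review by a domain expert, not facts to add automatically.

```python
from collections import Counter
from itertools import product


def cooccurrence(documents):
    """Count how often each (disease, medication) pair appears in the same document."""
    counts = Counter()
    for document in documents:
        for pair in product(document["diseases"], document["medications"]):
            counts[pair] += 1
    return counts


# Made-up, already-annotated EMR documents.
documents = [
    {"diseases": ["asthma"], "medications": ["albuterol"]},
    {"diseases": ["asthma", "hypertension"], "medications": ["albuterol"]},
]
pairs = cooccurrence(documents)
```

The second document shows why vetting matters: hypertension co-occurs with albuterol here only because the patient also has asthma.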
![Page 34: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/34.jpg)
Conclusions
• Glimpse of our research organized around the 5 V’s of Big Data
• Discussed role in harnessing Value
  – Semantic Perception (Volume)
  – Continuum of Semantic models to manage Heterogeneity (Variety)
  – Hybrid KRR: Probabilistic + Logical (Variety)
  – Continuous Semantics (Velocity)
  – Trust Models (Veracity)
![Page 35: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.vdocuments.net/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/35.jpg)
Thank you, and please visit us at
http://knoesis.org/
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Special Thanks to: Pramod Anantharam and Cory Henson