© Copyright 2016 Pivotal. All rights reserved.
Data Science-Powered Apps for the Internet of Things
Chris Rawles (1) and Jarrod Vawdrey (2)
1. Sr. Data Scientist in New York, New York
2. Sr. Data Scientist in Atlanta, Georgia
Today’s talk
1. A real-time data science app
A. The app: a live demonstration
B. How can a data scientist build a data science application?
C. Revisiting the app
2. Generalizing the framework: Solving new data science challenges
A. Internet of Things – Creating a smart app to prevent oil spill disasters
B. Financial data – How can retail banks influence their cardholders’ behavior?
Data science workflow: Movement classification
[Architecture diagram: Sensor app, Dashboard app, Training app (model training as a service), Scoring app (model scoring as a service), connected by API calls]
1. Sensor + Dashboard
2. Redis
3. Training app
4. Scoring app
here is my source code
run it on the cloud for me
i do not care how
- Onsi Fakhouri, @onsijoe
cf push
• CF determines app type (Java, Python, Ruby, …)
• Installs necessary environment
• Provisions and binds data services
• Creates domain, routing, and load balancing
• Continual app health checks and restarts
Data ingestion: Accelerometric data
• Accelerometric data streamed from a mobile phone at 15 Hz (15 samples per second)
• Other sensor data: gyroscopic data, magnetometer data, longitude/latitude, etc.
[Figure: accelerometer axes]
Data ingestion
For real-time applications, low-latency data ingestion into the data store is essential.
WebSocket protocol (socket.io):
– Mobile phone → Webserver
– Webserver → Dashboard
[Diagram: Sensor app, socket.io, Redis, Training app]
Data storage
We are using a Redis store for:
– Storing training data
– Model persistence
– Storing a micro-batch of scoring data
Other storage systems include GemFire, HAWQ/Hadoop, Greenplum Database, PostgreSQL, …
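The three storage roles above can be sketched as follows. This is an illustration only: `InMemoryStore` is a hypothetical in-process stand-in for a Redis client (it does not reproduce any Redis API), and the key names, JSON encoding, and use of `pickle` for model persistence are all assumptions rather than details from the talk.

```python
import json
import pickle
from collections import defaultdict

WINDOW_SAMPLES = 30  # 2 seconds at 15 Hz, the window size quoted in the slides


class InMemoryStore:
    """Hypothetical stand-in for a key-value store with simple values and lists."""

    def __init__(self):
        self._values = {}
        self._lists = defaultdict(list)

    def append(self, key, item):
        self._lists[key].append(item)

    def keep_last(self, key, n):
        # Drop everything except the n most recent items.
        self._lists[key] = self._lists[key][-n:]

    def read_list(self, key):
        return list(self._lists[key])

    def set(self, key, value):
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)


def store_training_sample(store, channel, label, sample):
    """Role 1: accumulate labelled training data per channel and activity."""
    store.append(f"training:{channel}:{label}", json.dumps(sample))


def store_scoring_sample(store, channel, sample):
    """Role 3: keep only the latest micro-batch of samples for scoring."""
    key = f"scoring:{channel}"
    store.append(key, json.dumps(sample))
    store.keep_last(key, WINDOW_SAMPLES)


def read_scoring_batch(store, channel):
    return [json.loads(item) for item in store.read_list(f"scoring:{channel}")]


def save_model(store, channel, model):
    """Role 2: persist a trained model as an opaque blob."""
    store.set(f"model:{channel}", pickle.dumps(model))


def load_model(store, channel):
    blob = store.get(f"model:{channel}")
    return None if blob is None else pickle.loads(blob)
```

Capping the scoring list at one window keeps the scoring path small and fast, while training data is allowed to accumulate.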
Modeling
Scalable machine learning applications in Pivotal Cloud Foundry
1. Training app
2. Scoring app
Modeling – Training app
Goal: build a data-driven model that learns the accelerometric motions associated with each activity
Feature Engineering
• Time-domain transformations
• Fast Fourier Transform analysis
Machine Learning Classification Model
• Random Forest model using 2-second time windows (30 samples)
…
[Diagram: training data, trained model]
Model building
• 20 seconds per training activity
• Two-second moving window on training data
• Features: time-domain summary statistics and Fourier transform coefficients
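A minimal sketch of this windowing and feature step, for a single accelerometer axis. The slides do not say which summary statistics or how many Fourier coefficients were used, so the choices below (mean, standard deviation, min, max, and the first five non-DC magnitudes) are assumptions, and a naive discrete Fourier transform from the standard library stands in for an FFT routine.

```python
import cmath
import math
import statistics

SAMPLE_RATE_HZ = 15
WINDOW_SECONDS = 2
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 30 samples per window


def sliding_windows(samples, size=WINDOW_SAMPLES, step=1):
    """Yield overlapping windows of `size` samples, advancing by `step`."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def dft_magnitudes(window, n_coeffs=5):
    """Normalized magnitudes of DFT bins 1..n_coeffs (bin 0, the mean, is skipped)."""
    n = len(window)
    magnitudes = []
    for k in range(1, n_coeffs + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(window)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def window_features(window):
    """Time-domain summary statistics followed by frequency-domain magnitudes."""
    time_domain = [
        statistics.mean(window),
        statistics.pstdev(window),
        min(window),
        max(window),
    ]
    return time_domain + dft_magnitudes(window)
```

With 20 seconds of data per activity (300 samples) and a step of one sample, this yields 271 windows per activity; in a three-axis setup the per-axis feature vectors would simply be concatenated.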
Model training approaches
1. Near-real-time model training
– Use small batches to train the model
2. Real-time model training
– Online machine learning algorithm: continually update the model using each new data point
3. Offline model training
– Build a model offline using batches
– Useful for models requiring finer model tuning and calibration
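To make the second approach concrete, here is a small sketch of an online learner that updates with every new data point. It is a nearest-centroid classifier with running means, chosen because it fits in a few lines; it is not the Random Forest model used in the demo, and the class name and interface are hypothetical.

```python
import math
from collections import defaultdict


class OnlineCentroidClassifier:
    """Keeps one running-mean feature vector (centroid) per label."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.centroids = {}

    def update(self, features, label):
        """Fold one labelled sample into the model without storing it."""
        self.counts[label] += 1
        n = self.counts[label]
        if label not in self.centroids:
            self.centroids[label] = list(features)
            return
        centroid = self.centroids[label]
        for i, x in enumerate(features):
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            centroid[i] += (x - centroid[i]) / n

    def predict(self, features):
        """Return the label whose centroid is closest in Euclidean distance."""
        if not self.centroids:
            raise ValueError("model has not seen any training data")
        return min(
            self.centroids,
            key=lambda label: math.dist(features, self.centroids[label]),
        )
```

Because each update touches only one centroid, the model is usable for prediction immediately after every sample, which is the defining property of the real-time approach. The near-real-time and offline approaches would instead refit on a batch.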
PCF App: Scoring app
• Real-time model scoring
• The dashboard initiates a request via an API call and receives a model prediction
Feature Engineering
• Time-domain transformations
• Fast Fourier Transform analysis
Machine Learning Classification Model
• Random Forest model using 2-second time windows (30 samples)
[Diagram: streaming input window, trained model, model prediction returned via API call]
{ "channel": "1234", "label": "walking", "label_value": 0.746 }
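The response shape shown above could be produced along these lines. This sketch assumes that `label_value` is the predicted probability of the winning class, which the slide does not state, and the function names and input format are hypothetical.

```python
import json


def score_window(class_probabilities, channel):
    """Pick the most likely label and build the response payload.

    class_probabilities: dict mapping label -> probability, as a
    (hypothetical) trained classifier might return for one input window.
    """
    if not class_probabilities:
        raise ValueError("no class probabilities to score")
    label = max(class_probabilities, key=class_probabilities.get)
    return {
        "channel": channel,
        "label": label,
        "label_value": round(class_probabilities[label], 3),
    }


def handle_scoring_request(channel, class_probabilities):
    """Serialize the prediction as the JSON body returned to the dashboard."""
    return json.dumps(score_window(class_probabilities, channel))
```

Keeping the payload construction separate from serialization lets other applications reuse the same scoring logic behind a different transport.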
Operationalizing scalable data science applications
Model scoring as a service: why?
1. Application auto-scaling
– As the data grows, the model scales
2. Application autonomy
– The model application is independent of other applications, which allows faster development iterations
– Faster development means a rapid feedback loop
3. Multiple applications can access the model scoring app
In all industries, billions of data points represent opportunities for the Internet of Things
• Gene Sequencing: the cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
• Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
• Stock Market
• Social Media: Facebook uploads 250 million photos each day
• Oil Exploration: oil rigs generate 25000 data points per second
• Video Surveillance
• Medical Imaging
• Mobile Sensors
How can we use data to help prevent accidents like the Macondo disaster?
…by creating a Smart Application
Data science workflow: Creating a smart app to prevent oil spill disasters
[Architecture diagram: Sensor app, Dashboard app, Training app (model training as a service), Scoring app (model scoring as a service), connected by API calls]
• Alert operator
• Send signal to control system to change operating parameters
• Replace old machinery
• Shut down plant
Data science workflow: How can retail banks influence their cardholders’ behavior?
[Architecture diagram: Sensor app, Dashboard app, Training app (model training as a service), Scoring app (model scoring as a service), connected by API calls]
• Provide customized services and promotions
• Next best offer
• Characterize and improve customer satisfaction