Data Science-Powered Apps for the Internet of Things


Post on 13-Apr-2017


Data Science-Powered Apps for the Internet of Things

Chris Rawles¹ and Jarrod Vawdrey²

1. Sr. Data Scientist in New York, New York
2. Sr. Data Scientist in Atlanta, Georgia

© Copyright 2016 Pivotal. All rights reserved.


Today’s talk

1. A real-time data science app
   A. The app: a live demonstration
   B. How can a data scientist build a data science application?
   C. Revisiting the app

2. Generalizing the framework: Solving new data science challenges
   A. Internet of Things – Creating a smart app to prevent oil spill disasters
   B. Financial data – How can retail banks influence their cardholders’ behavior?


App


Data science workflow: Movement classification

[Diagram: Sensor app, Dashboard app, Training app (model training as a service), and Scoring app (model scoring as a service), connected by API calls]

1. Sensor + Dashboard
2. Redis
3. Training app
4. Scoring app


here is my source code
run it on the cloud for me
i do not care how

- Onsi Fakhouri, @onsijoe


cf push

• CF determines app type (Java, Python, Ruby, …)
• Installs necessary environment
• Provisions and binds data services
• Creates domain, routing, and load balancing
• Continual app health checks and restarts


Data ingestion: Accelerometric data

Accelerometric data streamed from mobile phone at 15 Hz (15x / second)

Other sensor data: gyroscopic data, magnetometer data, lon/lat, etc.

Accelerometer axes


Data ingestion

For real-time applications, low-latency data ingestion into the data store is essential

WebSocket protocol - socket.io
– Mobile phone → Webserver
– Webserver → Dashboard

[Diagram: Sensor app, socket.io, redis, Training app]


Data storage

We are using a Redis store for:
– Storing training data
– Model persistence
– Storing a micro-batch of scoring data

Other storage systems include GemFire, HAWQ/Hadoop, Greenplum Database, PostgreSQL, …
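The "micro-batch of scoring data" idea can be sketched as a capped, per-channel buffer. The snippet below is only an illustration under assumptions: it uses an in-memory stand-in so it runs without a server, and the class name, method names, and the 30-sample cap are hypothetical choices, not details from the talk. With a real Redis list, a similar effect is commonly obtained by appending with RPUSH and trimming with LTRIM.

```python
from collections import defaultdict, deque

WINDOW_SAMPLES = 30  # assumption: 2 seconds at 15 Hz


class MicroBatchStore:
    """In-memory stand-in for one Redis list per channel (hypothetical helper)."""

    def __init__(self, max_samples=WINDOW_SAMPLES):
        # deque(maxlen=...) drops the oldest sample automatically,
        # mimicking an append followed by a trim on a Redis list.
        self._batches = defaultdict(lambda: deque(maxlen=max_samples))

    def push(self, channel, sample):
        """Append one (x, y, z) accelerometer reading for a channel."""
        self._batches[channel].append(tuple(sample))

    def latest_window(self, channel):
        """Return the current micro-batch, oldest reading first."""
        return list(self._batches[channel])


store = MicroBatchStore()
for i in range(45):  # 3 seconds of made-up readings
    store.push("1234", (0.1 * i, 0.0, 9.8))
window = store.latest_window("1234")
```

Only the most recent readings are kept, so the scoring side always reads a fixed-size window regardless of how long the phone has been streaming.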


Modeling

Scalable machine learning applications in Pivotal Cloud Foundry

1. Training app
2. Scoring app


Modeling – Training app

Goal: build a data-driven model that learns accelerometric motions associated with each activity

Feature Engineering

• Time-domain transformations

• Fast Fourier Transform analysis

Machine Learning Classification Model

• Random Forest Model using 2 second time windows (30 samples)

Training data

Trained model


Model building

• 20 seconds per training activity
• Two second moving window on training data
• Features: time-domain summary statistics and Fourier transform coefficients
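A rough sketch of the windowing and feature step, under stated assumptions: the slides do not give the window stride, the exact statistics, or the number of Fourier coefficients, so the step of 1 sample, the four summary statistics, and the five coefficients below are illustrative choices. For simplicity it handles a single axis and uses a naive discrete Fourier transform from the standard library rather than an FFT routine. At 15 Hz, a 2 second window holds 15 × 2 = 30 samples, and 20 seconds of training data holds 300.

```python
import cmath
import math
import statistics

SAMPLE_RATE_HZ = 15
WINDOW_SECONDS = 2
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 30 samples


def sliding_windows(signal, size=WINDOW_SAMPLES, step=1):
    """Yield overlapping windows of `size` samples, advancing by `step`."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def dft_magnitudes(window, n_coeffs=5):
    """Normalized magnitudes of the first `n_coeffs` non-DC DFT coefficients."""
    n = len(window)
    mags = []
    for k in range(1, n_coeffs + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        mags.append(abs(coeff) / n)
    return mags


def extract_features(window):
    """Time-domain summary statistics followed by frequency-domain magnitudes."""
    time_feats = [statistics.mean(window), statistics.pstdev(window),
                  min(window), max(window)]
    return time_feats + dft_magnitudes(window)


# 20 seconds of a synthetic 1.5 Hz oscillation on one axis (made-up data).
signal = [math.sin(2 * math.pi * 1.5 * t / SAMPLE_RATE_HZ)
          for t in range(20 * SAMPLE_RATE_HZ)]
feature_rows = [extract_features(w) for w in sliding_windows(signal)]
```

Each row in `feature_rows` could then be paired with its activity label and handed to a classifier such as the Random Forest named in the talk.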


Model training approaches

1. Near-real-time model training
– Use small batches to train model

2. Real-time model training
– Online machine learning algorithm: continually update model using each new data point

3. Offline model training
– Build a model offline using batches
– Useful for models requiring finer model tuning and calibration
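To make the "real-time" approach concrete, here is a minimal sketch of an online learner that updates with each new data point. It is not the Random Forest used in the demo: it swaps in a nearest-centroid classifier with incrementally updated class means, chosen only because the update rule is short. The class name, feature values, and the "standing" label are hypothetical.

```python
import math


class OnlineCentroidClassifier:
    """Nearest-centroid classifier whose class means are updated one point at a time."""

    def __init__(self):
        self.centroids = {}  # label -> running mean feature vector
        self.counts = {}     # label -> number of points seen

    def partial_fit(self, features, label):
        """Update the running mean for `label` with a single new data point."""
        if label not in self.centroids:
            self.centroids[label] = list(features)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        centroid = self.centroids[label]
        for i, value in enumerate(features):
            # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
            centroid[i] += (value - centroid[i]) / n

    def predict(self, features):
        """Return the label whose centroid is closest in Euclidean distance."""
        return min(self.centroids,
                   key=lambda label: math.dist(features, self.centroids[label]))


model = OnlineCentroidClassifier()
stream = [([0.1, 0.2], "standing"), ([1.9, 2.1], "walking"),
          ([0.3, 0.0], "standing"), ([2.1, 1.9], "walking")]
for features, label in stream:
    model.partial_fit(features, label)  # the model changes after every point
prediction = model.predict([1.8, 2.0])
```

The near-real-time and offline approaches differ mainly in when fitting happens: on each small batch, or once on stored data before deployment.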


Feature Engineering

• Time-domain transformations

• Fast Fourier Transform analysis

Machine Learning Classification Model

• Random Forest Model using 2 second time windows (30 samples)

Trained model

[Diagram: streaming input window, trained model, model prediction returned via API call]

PCF App: Scoring app
• Real-time model scoring
• The dashboard initiates a request via an API call and receives a model prediction

{ "channel": "1234", "label": "walking", "label_value": 0.746 }
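One way a scoring app might assemble a response of this shape is sketched below. The slide does not define `label_value`; treating it as the probability of the predicted class is an assumption, as are the function name and the example probabilities.

```python
import json


def build_prediction_payload(channel, class_probabilities):
    """Pick the most probable label and format it like the response on the slide."""
    label, value = max(class_probabilities.items(), key=lambda item: item[1])
    return {"channel": channel, "label": label, "label_value": round(value, 3)}


# Hypothetical classifier output for one streaming window.
probabilities = {"walking": 0.746, "running": 0.154, "standing": 0.100}
payload = build_prediction_payload("1234", probabilities)
response_body = json.dumps(payload)  # JSON text returned to the dashboard
```

Keeping the payload small and flat makes it easy for the dashboard, or any other client, to consume the prediction.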


Operationalizing scalable data science applications

Model scoring as a service: Why?

1. Application auto-scaling
– As the data grows, the model scales

2. Application autonomy
– The model application is independent of other applications = faster development iterations
– Faster development = rapid feedback loop

3. Multiple applications can access model scoring app


App


In all industries billions of data points represent opportunities for the Internet of Things

• Gene Sequencing: cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
• Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
• Stock Market
• Social Media: Facebook uploads 250 million photos each day
• Oil Exploration: oil rigs generate 25000 data points per second
• Video Surveillance
• Medical Imaging
• Mobile Sensors


How can we use data to help prevent accidents like the Macondo Disaster?


…by creating a Smart Application


Data science workflow: Creating a smart app to prevent oil spill disasters

[Diagram: Sensor app, Dashboard app, Training app (model training as a service), and Scoring app (model scoring as a service), connected by API calls]

• Alert operator
• Send signal to control system to change operating parameters
• Replace old machinery
• Shut down plant


Data science workflow: How can retail banks influence their cardholders’ behavior?

[Diagram: Sensor app, Dashboard app, Training app (model training as a service), and Scoring app (model scoring as a service), connected by API calls]

• Provide customized services and promotions
• Next best offer
• Characterize and improve customer satisfaction


Thank you

Questions and comments

[email protected]
