apachecon big data 2015 - stock prediction.key

Post on 13-Feb-2017

TRANSCRIPT

1 Pivotal Confidential–Internal Use Only

Implementing a Highly Scalable Stock Prediction System with Apache Geode (incubating), Spring XD and Spark MLlib

William Markito (@william_markito)

Fred Melo (@fredmelo_br)

About us

Fred Melo

Technical Director for Data

fmelo@pivotal.io

@fredmelo_br

William Markito

Enterprise Architect for GemFire

wmarkito@pivotal.io

@william_markito

A Simple Example

Data Sources

Look for patterns

Forecast

"Smart System"

Applicability

Smart System

Historical: learns from HISTORICAL TRENDS

Real-Time: evaluates LIVE DATA

Live data becomes historical over time

What do we want to build?

Trading Data

"According to historical trends, there's an 80% chance this stock's price will go down within the next few minutes"

"What were the technical indicator readings when the latest price drops happened?"

The Machine Learning Pipeline data flow

Input: Live Data

Data Temperature: Hot, Warm, Cold

Components: Spring XD, Apache Geode / GemFire, Apache HAWQ, Machine Learning model

1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

4 - "Hot" data ages, becoming part of the historical dataset

5 - Re-training triggered, ML model updated.
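The five-step data flow can be sketched as a small in-memory loop. This is only an illustration in plain Python, not the Geode, Spring XD or Spark code from the talk: `Pipeline`, `MeanDeltaModel`, `hot_capacity` and `retrain_every` are hypothetical names, a deque stands in for the grid, a list stands in for the historical store, and the "model" is a deliberately naive stand-in that adds the average past price change to the latest price.

```python
from collections import deque
from statistics import mean


class MeanDeltaModel:
    """Naive stand-in model: predicts latest price + average historical change."""

    def __init__(self):
        self.avg_delta = 0.0

    def train(self, history):
        # Learn the average tick-to-tick change from historical prices.
        deltas = [b - a for a, b in zip(history, history[1:])]
        self.avg_delta = mean(deltas) if deltas else 0.0

    def predict(self, price):
        return price + self.avg_delta


class Pipeline:
    def __init__(self, model, hot_capacity=3, retrain_every=2):
        self.model = model
        self.hot = deque()        # stands in for the in-memory grid
        self.historical = []      # stands in for the historical store
        self.hot_capacity = hot_capacity
        self.retrain_every = retrain_every
        self.subscribers = []     # callbacks standing in for deployed applications
        self._aged_since_training = 0

    def ingest(self, price):
        self.hot.append(price)                        # 1 - ingest live data
        prediction = self.model.predict(price)        # 2 - evaluate with the trained model
        for notify in self.subscribers:               # 3 - push results to applications
            notify(price, prediction)
        while len(self.hot) > self.hot_capacity:      # 4 - hot data ages into history
            self.historical.append(self.hot.popleft())
            self._aged_since_training += 1
        if self._aged_since_training >= self.retrain_every:
            self.model.train(self.historical)         # 5 - re-train, model updated
            self._aged_since_training = 0
        return prediction


pipeline = Pipeline(MeanDeltaModel())
received = []
pipeline.subscribers.append(lambda price, pred: received.append((price, pred)))
for tick in [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]:   # made-up sample prices
    pipeline.ingest(tick)
```

In this sketch the prediction for a tick is made before ageing and re-training run, so a freshly re-trained model only affects the next tick.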

Simplified Model

[Figure: simplified model. Labels: Real data, Simulator, Spring XD stream (Transform, Enrich, Filter, Split, Sink), Machine Learning, Indicators, Predict, Dashboard; steps numbered 1 to 3; regions /Stocks, /TechIndicators, /Predictions.]

SpringXD: Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

Too complex?? Eating it in small bites…

SpringXD GemFire

/Stocks

/TechIndicators

/Predictions

Apache Geode Concepts

• Cache
  • Configurable through XML, Java
• Region
  • Distributed java.util.Map on steroids
  • Highly available, redundant
• Member
  • Locator, Server, Client
• Callbacks
  • Listener, Writer, AsyncEventListener, Parallel/Serial
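To make the cache, region and callback ideas concrete, here is a toy single-process sketch in plain Python. It is not the Geode API: `Cache`, `Region`, `add_listener` and the callback signature are hypothetical names chosen for illustration, and the sketch leaves out distribution, redundancy, members and writers entirely. The symbol "XYZ" and its prices are made-up sample data.

```python
class Region:
    """Toy stand-in for a region: a named key/value map that notifies listeners."""

    def __init__(self, name):
        self.name = name
        self._entries = {}
        self._listeners = []

    def add_listener(self, callback):
        # Listeners are called after an entry is created or updated.
        self._listeners.append(callback)

    def put(self, key, value):
        old_value = self._entries.get(key)
        self._entries[key] = value
        for callback in self._listeners:
            callback(self.name, key, old_value, value)

    def get(self, key):
        return self._entries.get(key)


class Cache:
    """Toy container that creates regions on first use and returns them by name."""

    def __init__(self):
        self._regions = {}

    def region(self, name):
        if name not in self._regions:
            self._regions[name] = Region(name)
        return self._regions[name]


cache = Cache()
stocks = cache.region("/Stocks")
events = []
stocks.add_listener(lambda region, key, old, new: events.append((region, key, old, new)))
stocks.put("XYZ", 100.0)
stocks.put("XYZ", 101.5)
```

A listener of this kind is where an application could react to new data as soon as it lands in a region, which is the role callbacks play in the pipeline.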

Apache Geode HA and Fault-Tolerance

SpringXD

[Figure: stream stages Transform, Enrich, Filter, Split, Predict and Sink; steps numbered 1 to 3.]

Streams, Pipelines, Sources, Sinks, Filters, Taps
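The stream vocabulary above (source, filter, transform, tap, sink) can be illustrated with plain Python generators. This is an analogy only, not Spring XD's DSL or runtime; all function names and the sample ticks are hypothetical.

```python
def source(ticks):
    # A source emits items into the stream.
    for tick in ticks:
        yield tick


def filter_stage(stream, predicate):
    # A filter drops items that do not satisfy the predicate.
    return (item for item in stream if predicate(item))


def transform(stream, fn):
    # A transform maps each item to a new item.
    return (fn(item) for item in stream)


def tap(stream, observer):
    # A tap observes items without changing what flows downstream.
    for item in stream:
        observer(item)
        yield item


def sink(stream, store):
    # A sink consumes the stream and writes each item to a destination.
    for item in stream:
        store.append(item)
    return store


# Made-up sample ticks; the negative price stands for a bad reading.
ticks = [
    {"symbol": "XYZ", "price": 10.0},
    {"symbol": "XYZ", "price": -1.0},
    {"symbol": "XYZ", "price": 12.5},
]
tapped = []
valid = filter_stage(source(ticks), lambda t: t["price"] > 0)
observed = tap(valid, tapped.append)
enriched = transform(observed, lambda t: {**t, "price_cents": round(t["price"] * 100)})
stored = sink(enriched, [])
```

Because each stage only consumes the previous one, stages can be added or reordered without touching the others, which is the property a stream pipeline relies on.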

Machine Learning Model (e.g. Linear Regression)

Features: price(x), medium avg (x), relative strength (x)

Label: medium avg (x+1)
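A minimal sketch of this features-and-label setup, in plain Python rather than Spark MLlib. It assumes that "medium avg" means a simple moving average and that "relative strength" is a simple-average RSI over the last `window` price changes; both readings, the `window` parameter and all function names are assumptions, not the talk's implementation. The regression is ordinary least squares solved through the normal equations.

```python
def moving_average(prices, window):
    """Simple moving average for each price index from window-1 onward."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]


def relative_strength_index(prices, window):
    """Simple-average RSI for each price index from `window` onward."""
    out = []
    for i in range(window, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - window + 1, i + 1)]
        gains = sum(c for c in changes if c > 0) / window
        losses = sum(-c for c in changes if c < 0) / window
        # With no losses in the window the index is taken as 100 (a convention).
        out.append(100.0 if losses == 0 else 100.0 - 100.0 / (1.0 + gains / losses))
    return out


def build_dataset(prices, window):
    """Features [price(x), avg(x), rsi(x)] paired with label avg(x+1)."""
    avg = moving_average(prices, window)            # avg[k] belongs to index k+window-1
    rsi = relative_strength_index(prices, window)   # rsi[k] belongs to index k+window
    rows, labels = [], []
    for x in range(window, len(prices) - 1):
        rows.append([prices[x], avg[x - window + 1], rsi[x - window]])
        labels.append(avg[x - window + 2])
    return rows, labels


def fit_linear(rows, labels):
    """Ordinary least squares; returns [intercept, w1, ..., wn]."""
    design = [[1.0] + list(r) for r in rows]
    n = len(design[0])
    # Augmented normal equations (A^T A | A^T y).
    aug = [[sum(d[i] * d[j] for d in design) for j in range(n)]
           + [sum(d[i] * y for d, y in zip(design, labels))]
           for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(weights, features):
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))
```

Given a price series, `build_dataset(prices, window)` yields the training rows and labels, `fit_linear` returns the weights, and `predict(weights, latest_features)` estimates the next moving average. With very short or flat series the features can become linearly dependent, in which case `fit_linear` raises instead of returning unstable weights.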

Demo Time

Error

Source code and detailed instructions available at: https://github.com/Pivotal-Open-Source-Hub/StockInference-Spark

Follow us on Twitter!

William Markito (@william_markito)

Fred Melo (@fredmelo_br)
