
Post on 25-May-2020

Dean Wampler, Ph.D. dean@lightbend.com polyglotprogramming.com/talks

Trends Affecting System Architectures: Mobile Engagement, Streaming Data, and Machine Learning

© Dean Wampler & Lightbend, 2007-2019 @deanwampler

lbnd.io/fast-data-ebook

Data Streaming Architectures

lightbend.com/lightbend-platform @deanwampler

But first, introductions…

What we’re up to at Lightbend… lightbend.com/lightbend-pipelines-demo

What We’ll Discuss

• Forces driving us towards stream processing
• Streaming architectures
  • Requirements
  • An example
• Mixing data science and data engineering
• Other production concerns

Forces driving us towards stream processing

Energy, Medical, Finance, Telecom, Mobile … and IoT

Mobile: state of the art phone!

Information value has a half life; it decays with time

Historically, software developers have downplayed the importance of data. We can’t anymore.

Streaming Architectures: Requirements

• Reliability - fault and “surprise” tolerant
• Availability - “always on”
• Low latency - for some definition of “low”
• Scalability - up and down
• Adaptability - ideally without restarts

Reminds me of microservices

[Diagram: REST microservices behind an API Gateway: Browse, Account, Orders, Shopping Cart, Inventory]

Streaming Architectures: An Example

[Architecture diagram. Components: devices; in the data center, a Kafka cluster (broker), Spark (mini-batch, batch), low-latency microservices (Kafka Streams, Akka Streams), device session microservices, and persistence. Row labels: Sessions, Streams, Storage.]

Numbered flows in the diagram:
1. Telemetry
2. → Model Training
3. ← New models
4. ← Model Storage
5. → Boot up, historical data
6. → Data Pipeline
7. → Model Serving
8. ← Anomalies
9. Ingest Scores
10. Corrective Action

Detect anomalies in IoT devices

Three groups of functionality:
• Session management, REST microservices
• Model scoring
• Model training

Ingest device telemetry to Kafka
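To make the ingestion step concrete, here is a minimal sketch in plain Python. It is not the Kafka API: `InMemoryLog` is a toy stand-in for a broker, and the `Telemetry` fields and the topic name `"telemetry"` are assumptions made for illustration. It shows the one property the rest of the example relies on: records are appended to a named, ordered log, and each reader picks its own starting offset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Telemetry:
    """One device reading. Field names are hypothetical."""
    device_id: str
    timestamp: float
    value: float


class InMemoryLog:
    """Toy stand-in for a broker: named topics, each an append-only list."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[Any]] = defaultdict(list)

    def append(self, topic: str, record: Any) -> int:
        """Append a record to a topic and return its offset."""
        log = self._topics[topic]
        log.append(record)
        return len(log) - 1

    def read(self, topic: str, from_offset: int = 0) -> List[Tuple[int, Any]]:
        """Return (offset, record) pairs starting at from_offset."""
        if from_offset < 0:
            raise ValueError("from_offset must be non-negative")
        return list(enumerate(self._topics[topic]))[from_offset:]
```

Because readers track their own offsets, a slow batch trainer and a fast scoring service can both consume the same telemetry without coordinating with each other.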

Analogs (at rest vs. streaming):
• Storage: HDFS; Kafka cluster (broker)
• Batch: Spark, MapReduce
• Mini-batch: Spark, Flink, …
• Microservices: Kafka Streams, Akka Streams

Read telemetry into a periodic Spark job for model training to detect anomalies

Large data volume, long latency (seconds to days)
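The talk does not say which model is trained, so the following is only a sketch under a stated assumption: a simple z-score model, where a reading counts as anomalous if it is more than `threshold` standard deviations from the historical mean. Plain Python stands in for the Spark job; `ModelParams` and `train` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelParams:
    """Parameters produced by one training run (assumed z-score model)."""
    version: int
    mean: float
    stddev: float
    threshold: float  # flag readings more than this many stddevs from the mean


def train(history: Sequence[float], version: int, threshold: float = 3.0) -> ModelParams:
    """Fit the model on a batch of historical readings."""
    if len(history) < 2:
        raise ValueError("need at least two readings to estimate spread")
    return ModelParams(
        version=version,
        mean=statistics.fmean(history),
        stddev=statistics.stdev(history),  # sample standard deviation
        threshold=threshold,
    )
```

Each periodic run would call `train` on the accumulated telemetry with a higher `version`, which is what lets the downstream services tell a new model from an old one.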

Updated model parameters are written back to Kafka in a new topic

Updated model parameters are also written to longer-term storage (for system restarts, auditing, …)

Ingest the telemetry data to score it, looking for anomalies

Small data volume, low latency (milliseconds-…)

Ingest model parameters to update the model in the low-latency microservices for model serving
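One way to picture model serving with live updates is a scorer that reads a single merged stream of model updates and telemetry, swapping in new parameters as they arrive. The sketch below is plain Python, not Kafka Streams or Akka Streams code, and it assumes a z-score model; `ModelParams`, `Reading`, `Score`, and `serve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Union


@dataclass(frozen=True)
class ModelParams:
    version: int
    mean: float
    stddev: float
    threshold: float


@dataclass(frozen=True)
class Reading:
    device_id: str
    value: float


@dataclass(frozen=True)
class Score:
    device_id: str
    value: float
    z: float
    is_anomaly: bool
    model_version: int  # records which model produced this score


def z_score(value: float, model: ModelParams) -> float:
    """Distance from the mean in standard deviations, guarding zero spread."""
    if model.stddev == 0:
        return 0.0 if value == model.mean else float("inf")
    return abs(value - model.mean) / model.stddev


def serve(events: Iterable[Union[ModelParams, Reading]]) -> Iterator[Score]:
    """Score readings with the most recent model seen so far."""
    model: Optional[ModelParams] = None
    for event in events:
        if isinstance(event, ModelParams):
            model = event  # swap in new parameters without restarting
        elif model is not None:
            z = z_score(event.value, model)
            yield Score(event.device_id, event.value, z, z > model.threshold, model.version)
        # Readings that arrive before any model are skipped in this sketch.
```

Treating model updates as just another kind of event is what gives the "adaptability without restarts" property from the requirements list, and carrying `model_version` on every score keeps the result traceable.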

Write the detected anomalies back to Kafka in a new topic

Read anomaly information into microservices that manage the devices remotely

Take corrective action, e.g., upload more information, download patches, control-alt-delete, …

Streaming Architectures: An Example

Variation: Model Serving as a Service

From this… [architecture diagram as above]

[Diagram: variation of the architecture. Components: devices, Kafka cluster (broker), Spark, low-latency Kafka Streams and Akka Streams, device session microservices, persistence, and a separate service (TensorFlow, …) reached via REST, gRPC, …]

Numbered flows in the diagram:
1. Telemetry
2. → Model Training
3. ← Model Storage
4. → Boot up, historical data
5. → Data Pipeline
6. Model Serving
7. ← Anomalies
8. Ingest Scores
9. Corrective Action

Both training and serving are done by a separate service, like TensorFlow

To this…

Streaming Architectures: An Example

Variation: Model Serving on the Edge

From this… [architecture diagram as above]

[Diagram: variation of the architecture without the low-latency scoring microservices. Components: devices, Kafka cluster (broker), Spark (mini-batch, batch), device session microservices, persistence.]

Numbered flows in the diagram:
1. Telemetry
2. → Model Training
3. ← New models
4. ← Model Storage
5. → Boot up, historical data
6. Model Parameters
7. Model Parameters
8. Corrective Action

Push model updates to the device and score there. Eliminates round-trip overhead for scoring.

To this…
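As a rough illustration of scoring on the edge, the sketch below keeps the model on the device itself. Everything here is an assumption made for illustration: the z-score model, the `EdgeDevice` class, and the rule that a device ignores updates older than the model it already holds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelParams:
    version: int
    mean: float
    stddev: float
    threshold: float


class EdgeDevice:
    """A device that holds the current model and scores its own readings."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.model: Optional[ModelParams] = None
        self.anomalies: List[float] = []  # readings flagged locally

    def update_model(self, params: ModelParams) -> bool:
        """Accept pushed parameters only if they are newer than the current ones."""
        if self.model is not None and params.version <= self.model.version:
            return False
        self.model = params
        return True

    def observe(self, value: float) -> Optional[bool]:
        """Score a reading locally. Returns None until a model has been pushed."""
        if self.model is None:
            return None
        if self.model.stddev == 0:
            z = 0.0 if value == self.model.mean else float("inf")
        else:
            z = abs(value - self.model.mean) / self.model.stddev
        is_anomaly = z > self.model.threshold
        if is_anomaly:
            self.anomalies.append(value)
        return is_anomaly
```

No network call sits between a reading and its score, which is the round-trip saving the slide refers to. The cost is that the data center now has to track which model version each device is running.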

Mixing data science and data engineering

Tools

• Software Engineering toolbox
• Data Science toolbox

Process

Data Science:
• Less process oriented
• Iterative, experimental
• … but still within the Scientific Method

Data Engineering:
• Process oriented
• Agile Manifesto
• … which does not mention data!

https://derwen.ai/s/6fqt

Philosophy

Data Science:
• Comfortable with uncertainty
• Probabilities and Statistics

Data Engineering:
• Uncomfortable with uncertainty
• Prefer determinism
• … while admitting that distributed systems aren’t deterministic

Other Production Concerns

Models Are Data

Auditing
• Kind of model
• Parameters and hyperparameters
• When trained
• Data used for training
• When deployed, undeployed, etc.
• …
• Quality metrics
• Serving metrics (how many records, scoring times, …)
• Provenance of decision to retrain
• The metrics gathered above that were used to decide when to retrain
• … and maybe most important: what model was used to score a particular record?
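Treating models as data suggests storing an audit record per model and a link from every scored record back to the model that scored it. The sketch below is one possible shape for that, not a prescribed schema: `ModelAudit`, `AuditTrail`, and all field names are hypothetical, chosen to mirror the audit items listed in the slides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelAudit:
    """Metadata kept for each model (fields mirror the auditing list)."""
    model_id: str
    kind: str
    parameters: Dict[str, float]
    hyperparameters: Dict[str, float]
    trained_at: datetime
    training_data_ref: str  # pointer to the data used for training
    deployed_at: Optional[datetime] = None
    undeployed_at: Optional[datetime] = None


class AuditTrail:
    """Answers: which model was used to score a particular record?"""

    def __init__(self) -> None:
        self._models: Dict[str, ModelAudit] = {}
        self._scored_by: Dict[str, str] = {}  # record id -> model id

    def register(self, audit: ModelAudit) -> None:
        self._models[audit.model_id] = audit

    def record_score(self, record_id: str, model_id: str) -> None:
        """Link a scored record to its model; unknown models are rejected."""
        if model_id not in self._models:
            raise KeyError(f"unknown model: {model_id}")
        self._scored_by[record_id] = model_id

    def model_for(self, record_id: str) -> ModelAudit:
        return self._models[self._scored_by[record_id]]
```

Rejecting scores from unregistered models is a deliberate choice in this sketch: it keeps the trail from ever containing a record whose model cannot be looked up later.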

Model Updates

Concept Drift

Models have a half life, too
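A simple way to notice concept drift is to compare recent data with the data the model was trained on. The sketch below uses one of many possible criteria, chosen only for illustration: it signals drift when the mean of a sliding window of recent readings moves more than `tolerance` training standard deviations away from the training mean. `DriftMonitor` and its parameters are hypothetical.

```python
from collections import deque
from statistics import fmean
from typing import Deque


class DriftMonitor:
    """Signals when recent readings have shifted away from the training data."""

    def __init__(self, trained_mean: float, trained_stddev: float,
                 window: int = 50, tolerance: float = 1.0) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.trained_mean = trained_mean
        self.trained_stddev = trained_stddev
        self.tolerance = tolerance
        self._recent: Deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Add a reading; return True if the window suggests retraining."""
        self._recent.append(value)
        if len(self._recent) < self._recent.maxlen:
            return False  # not enough recent data to judge yet
        shift = abs(fmean(self._recent) - self.trained_mean)
        return shift > self.tolerance * self.trained_stddev
```

Whatever criterion is used, the values that triggered a retrain are worth storing, since the auditing list asks for the provenance of that decision.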

Recap

Information value has a half life; it decays with time

Streaming systems have requirements similar to those of microservices

Integrating data science and data engineering is hard

Models also have a half life; they have to be updated periodically

[Photo: Mars last summer, dusty Milky Way]

References
• General Information about Stream Processing
• My O'Reilly Report on Architectures
• Streaming Systems Book
• Stream Processing with Apache Spark
• Designing Data-Intensive Apps book

Buy my stuff!!

Dean Wampler, Ph.D. dean@lightbend.com

@deanwampler lightbend.com/lightbend-platform

polyglotprogramming.com/talks

Questions?
