Data Pipeline Monitoring
Michiel Kalkman
Mental model of a pipeline
Figure 1: Actually a duct (Source: Wikimedia)
Map of a real pipeline
Figure 2: Typical pipeline (Source: Wikimedia)
Notes
Pipelines:
▶ are systems
▶ cross multiple political zones
▶ cross multiple technical zones
▶ have multiple inputs (providers, sources)
▶ have multiple outputs (consumers, sinks)
▶ carry payloads in multiple stages (refinements)
Break it down
By administrative zones
Defines supportability, frames arguments over responsibility
Observability
Pillars of Observability
               Logs   Metrics   Tracing
Accounting      X                  X
Reporting              X           X
Alerting               X           X
Testing         X      X           X
Diagnostics     X      X           X
Verification    X                  X
Auditing        X
What gets measured gets managed
Figure 3: Component observability (Metrics, Logs and Tracing from a pipeline component feed Reporting, Monitoring, Diagnostics, Auditing, Alerting and Accounting for the pipeline's data products)
What to monitor
▶ Data flowing across platform boundaries
▶ Cycles in the pipeline
▶ Data flow pressure points
▶ Baseline operation separate from service operation
▶ Infrastructure separate from service operation
▶ Quality control gateways for change
Figure 4: Observability - Metrics (Metrics from a pipeline component feed Reporting, Monitoring, Diagnostics and Alerting across feed, products, observability data and assets)
Metrics focus
There is a wide variety of metrics out there, and it is easy to get lost. Define high-level metrics that can be compared consistently across the entire landscape. Focus on two distinct areas, different sides of the same coin:
Utilization, Saturation, Errors (USE)
These are resource-focused and provide technical information
▶ “Which servers are overloaded?”
Rate, Errors, Duration (RED)
These are service-focused and provide business information
▶ “Am I meeting my SLA targets?”
Four Golden Signals (Google SRE)
1. Latency
2. Traffic
3. Errors
4. Saturation
Throughput - components
Figure 5: Component metrics (count bytes and count events on the input from upstream and on the output to downstream of a forwarder)
Throughput
Counter         t1    t2
input bytes     100   200
output bytes    150   270
input events    20    30
output events   30    55
▶ Throughput rate is the counter delta over the interval: Rate = (Counter(t2) − Counter(t1)) / (t2 − t1)
▶ Average event size:
▶ In = Rate(BytesIn) / Rate(EventsIn)
▶ Out = Rate(BytesOut) / Rate(EventsOut)
▶ Internal buffer pressure:
▶ Rate(EventsIn) − Rate(EventsOut)
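These derived metrics can be computed directly from the counter table above. A minimal sketch, assuming the sample interval between t1 and t2 is one unit of time (the counter names are illustrative):

```python
# Counter values at t1 and t2, taken from the throughput table.
counters = {
    "bytes_in":   (100, 200),
    "bytes_out":  (150, 270),
    "events_in":  (20, 30),
    "events_out": (30, 55),
}

def rate(t1_value, t2_value, interval=1):
    """Throughput rate: counter delta over the sample interval."""
    return (t2_value - t1_value) / interval

rates = {name: rate(*samples) for name, samples in counters.items()}

# Average event size on each side of the component.
avg_event_size_in = rates["bytes_in"] / rates["events_in"]     # 100 / 10 = 10.0
avg_event_size_out = rates["bytes_out"] / rates["events_out"]  # 120 / 25 = 4.8

# Internal buffer pressure: positive means the buffer is filling up,
# negative means the component is draining faster than it is fed.
pressure = rates["events_in"] - rates["events_out"]            # 10 - 25 = -15
```

Here the negative pressure says the component emitted more events than it received in the interval, i.e. it was draining an internal buffer.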
Tracing
Feed
Products
Data
Pipeline
Repor�ng Monitoring Diagnos�cs Aler�ng Accoun�ng
Tracing
Pipeline component
Figure 6: Observability - Tracing
Three Ts
                 Inputs   Outputs   Transformation
Transaction      1        1         New data
Transportation   1        1+        Enrichment
Transformation   1+       1+        New data, enrichment
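The three Ts differ in how trace identity flows through them. A minimal sketch with a hypothetical event shape (the field names are assumptions, not from the source): a transaction starts a trace, transportation preserves the trace id while enriching the event, and a transformation mints a new trace id linked to its parents.

```python
import uuid

def transaction(payload):
    """Transaction: 1 input -> 1 output, new data, starts a trace."""
    return {"trace_id": str(uuid.uuid4()), "hops": [], "payload": payload}

def transport(event, hop_name):
    """Transportation: enrichment only, the trace id is preserved."""
    return dict(event, hops=event["hops"] + [hop_name])

def transform(events):
    """Transformation: 1+ inputs -> new data, linked to parent traces."""
    return {
        "trace_id": str(uuid.uuid4()),
        "parents": [e["trace_id"] for e in events],
        "hops": [],
        "payload": [e["payload"] for e in events],
    }

a = transaction({"v": 1})
b = transport(a, "forwarder-1")          # same trace, one hop recorded
merged = transform([b, transaction({"v": 2})])  # new trace, two parents
```

Keeping the parent trace ids on a transformation is what lets a downstream consumer walk back through a fan-in to its original sources.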
Transportation tracing
Figure 7: Transportation (Upstream → Forwarder → Downstream)
Transaction tracing
Figure 8: Distributed transaction (User → Component A → Component B)
Transformation tracing
Figure 9: Transformation (Source A and Source B → Transformer → Upstream Target)
Monitoring
Plan for failure
Figure 10: Hopefully not this bad (Source: Wikimedia)
Key monitoring points
▶ Integrity
▶ Packet/event/record drops
▶ Timeouts, queue expiries
▶ Data loss scenarios
▶ Capacity
▶ Backpressure signaling
▶ Backlog processing
▶ Peak hour spikes
Heartbeats
▶ Add a dummy input channel to each input
▶ Continuously generate fixed data at fixed rate
▶ Monitor dummy channel on each boundary
▶ Alert on dummy channel rate at each boundary
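The heartbeat idea can be sketched in a few lines. The channel name, expected rate and tolerance below are illustrative assumptions; the point is that the synthetic channel has a known rate, so any boundary where the observed rate drops is flagged:

```python
EXPECTED_RATE = 10  # heartbeats generated per measurement interval
TOLERANCE = 0.2     # alert when more than 20% of heartbeats go missing

def heartbeat_events(n):
    """Fixed payload, fixed count per interval, on a dedicated channel."""
    return [{"channel": "heartbeat", "seq": i} for i in range(n)]

def boundary_healthy(observed_count, expected=EXPECTED_RATE, tolerance=TOLERANCE):
    """True when the boundary delivered enough heartbeats this interval."""
    return observed_count >= expected * (1 - tolerance)

healthy = boundary_healthy(len(heartbeat_events(10)))  # all arrived
degraded = boundary_healthy(7)                         # 3 of 10 missing
```

Because the heartbeat rate is fixed by construction, this check works identically at every boundary regardless of how bursty the real traffic is.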
Buffers, Backlogs and Backpressure
MQ pipeline with push - dataflow
Figure 11: MQ pipeline with push - dataflow (Producer → MQ → Handler 1 → Handler 2 → Handler 3 → Consumer; Topics A through D with a Transform between each)
MQ pipeline with push - sequence

Figure 12: MQ pipeline with push - sequence (the Producer PUSHes Topic A into the MQ, the MQ PUSHes it to Handler 1, and each handler processes and PUSHes the next topic onward until the Consumer receives Topic D; every PUSH delivery is a pressure point)
MQ pipeline notes
▶ This design is active here, sends data as it comes in▶ Server-push model for moving data
▶ Yes, you can also poll a queue▶ Complex programming model
▶ MQ-specific protocol▶ Requires registration of callback▶ Handler process might be unavailable
Model
def next(records_in, buffer_size, output_capacity):
    # New records land in the buffer first
    buffer_size = buffer_size + records_in
    if (buffer_size - output_capacity) >= 0:
        # Backlog: drain at full output capacity
        records_out = output_capacity
        buffer_size = buffer_size - output_capacity
    else:
        # Everything buffered fits within capacity this interval
        records_out = buffer_size
        buffer_size = 0
    plot(records_in, buffer_size, records_out)
    return buffer_size
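The step model can be exercised directly. A self-contained sketch (the plotting call is dropped here, and the step logic is restated so the example runs on its own):

```python
def step(records_in, buffer_size, output_capacity):
    """One interval of the buffer model: returns (new buffer, records out)."""
    buffer_size += records_in
    if buffer_size >= output_capacity:
        records_out = output_capacity
        buffer_size -= output_capacity
    else:
        records_out = buffer_size
        buffer_size = 0
    return buffer_size, records_out

# Input rate 7 eps against output capacity 5 eps: the buffer grows by
# 2 records every interval -- a backlog in the making.
buffer = 0
history = []
for _ in range(4):
    buffer, out = step(7, buffer, 5)
    history.append(buffer)
# history == [2, 4, 6, 8]
```

With input rate at or below capacity the buffer stays at zero; the moment input exceeds capacity, the buffer grows linearly, which is exactly what Figures 13 and 14 illustrate.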
Input rate <= output capacity
Figure 13: Output capacity = 15 eps
Backlog processing
Figure 14: Output capacity = 5 eps
Backlog processing with finite buffer
Figure 15: Limit reached with no backpressure means data loss
Observing buffer change rate
                         t1    t2   t3   t4
Counter(In)              5     12   19   26
Counter(Out)             5     10   15   20
RateIn(t)                N/A   7    7    7
RateOut(t)               N/A   5    5    5
RateIn(t) − RateOut(t)   N/A   2    2    2
Rate(n) = Counter(n) − Counter(n − 1)
Buffer(n) = RateIn(n) − RateOut(n)
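The table above falls out of two monotonic counters. A minimal sketch, assuming one time unit between samples:

```python
# Counter samples at t1..t4, from the table.
counter_in = [5, 12, 19, 26]
counter_out = [5, 10, 15, 20]

def rates(samples):
    """Rate(n) = Counter(n) - Counter(n - 1), one entry per interval."""
    return [b - a for a, b in zip(samples, samples[1:])]

rate_in = rates(counter_in)    # [7, 7, 7]
rate_out = rates(counter_out)  # [5, 5, 5]

# Buffer change rate: a persistently positive value means the internal
# buffer keeps growing and will eventually hit its limit.
buffer_change = [i - o for i, o in zip(rate_in, rate_out)]  # [2, 2, 2]
```

Only the deltas matter, so this works even when the counters themselves reset, as long as resets are detected and the affected interval is discarded.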
Buffer change rate
Figure 16: Long term average of the red line should approach zero
Kafka pipeline - Dataflow - by asset
Figure 17: Kafka pipeline dataflow (Producer → Kafka → Spark 1 → Spark 2 → Spark 3 → Consumer; Schemas/Topics A through D with a Transform between each stage)
Kafka pipeline - connection initiation - by asset
Figure 18: Kafka pipeline sequence (the Producer PUSHes Topic A into Kafka; each Spark stage PULLs its input topic, processes, and PUSHes the next topic; the Consumer PULLs Topic D)
Kafka pipeline - Dataflow - by service
Figure 19: Kafka pipeline dataflow (Producer → Kafka Topic A → Spark 1 → Kafka Topic B → Spark 2 → Kafka Topic C → Spark 3 → Kafka Topic D → Consumer, with a Transform at each Spark stage)
Kafka pipeline - connection initiation - by service

Figure 20: Kafka pipeline sequence (each producing side PUSHes into its Kafka topic, and each Spark stage and the Consumer PULL from theirs; Kafka holds the data between PUSH and PULL)
Kafka pipeline notes
▶ This design is passive: it does not send data unless asked
▶ Client-pull model for moving data
▶ All persistence is done on Kafka
▶ Very simple programming model
▶ Well-understood wire protocol (HTTP)
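The client-pull model can be shown in miniature. A sketch with an in-memory deque standing in for a topic (an assumption, not the Kafka API): the broker side is passive, and the consumer decides when to pull and how much, which is why backpressure comes for free in this design.

```python
from collections import deque

topic = deque()  # stands in for a persisted Kafka topic

def push(records):
    """Producer side: append records; the broker never initiates sends."""
    topic.extend(records)

def pull(max_records):
    """Consumer side: take at most max_records, at the consumer's own pace."""
    batch = []
    while topic and len(batch) < max_records:
        batch.append(topic.popleft())
    return batch

push(range(10))
first = pull(4)   # consumer controls the batch size
second = pull(4)
# first == [0, 1, 2, 3]; second == [4, 5, 6, 7]; 2 records remain queued
```

A slow consumer simply pulls smaller or less frequent batches; the unread records stay queued on the broker instead of overwhelming a push handler.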