![Page 1: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/1.jpg)
Designing Agile Data Pipelines
Ashish Singh | Software Engineer, Cloudera
![Page 2: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/2.jpg)
2 ©2014 Cloudera, Inc. All rights reserved.
About Me
• Software Engineer @ Cloudera
• Contributed to Kafka, Hive, Parquet and Sentry
• Used to work in HPC
• @singhasdev
![Page 3: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/3.jpg)
“Big Data” is stuck at The Lab.
![Page 4: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/4.jpg)
4
We want to move to The Factory
![Page 5: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/5.jpg)
5 Click to enter confidentiality information
![Page 6: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/6.jpg)
6
What does it mean to “Systemize”?
• Ability to easily add new data sources
• Easily improve and expand analytics
• Ease data access by standardizing metadata and storage
• Ability to discover mistakes and to recover from them
• Ability to safely experiment with new approaches
![Page 7: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/7.jpg)
7
We will not discuss:
• Actual decision making
• Data Science
• Machine learning
• Algorithms
We will discuss:
• Architectures
• Patterns
• Ingest
• Storage
• Schemas
• Metadata
• Streaming
• Experimenting
• Recovery
![Page 8: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/8.jpg)
8
So how do we build real data architectures?
![Page 9: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/9.jpg)
9
The Data Bus
![Page 10: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/10.jpg)
10
Data Pipelines Start like this.
[Figure: boxes labeled Client and Backend]
![Page 11: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/11.jpg)
11
Then we reuse them
[Figure: four Client boxes and one Backend box]
![Page 12: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/12.jpg)
12
Then we add multiple backends
[Figure: four Client boxes, a Backend box, and a box labeled Another Backend]
![Page 13: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/13.jpg)
13
Then it starts to look like this
[Figure: four Client boxes, a Backend box, and three boxes labeled Another Backend]
![Page 14: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/14.jpg)
14
With maybe some of this
[Figure: four Client boxes, a Backend box, and three boxes labeled Another Backend]
![Page 15: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/15.jpg)
15
Adding applications should be easier
We need:
• Shared infrastructure for sending records
• Infrastructure must scale
• Set of agreed-upon record schemas
![Page 16: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/16.jpg)
16
Kafka Based Ingest Architecture
[Figure: four Source Systems act as Producers writing to Kafka Brokers; Hadoop, Security Systems, Real-time monitoring, and a Data Warehouse act as Consumers]
Kafka decouples Data Pipelines
![Page 17: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/17.jpg)
17
Retain All Data
![Page 18: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/18.jpg)
18
Data Pipeline – Traditional View
[Figure: pipeline stages Raw data, Clean data, Enriched data, and Aggregated data, with the labels Input, Output, and “Waste of disk space”]
![Page 19: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/19.jpg)
19
It is all valuable data
[Figure: the stages Raw data, Clean data, Enriched data, Aggregated data, and Filtered data, with the consumers Dashboard, Report, Data scientist, and Alerts, plus the label “OMG”]
![Page 20: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/20.jpg)
20
Hadoop Based ETL – The FileSystem is the DB
/user/…
/user/gshapira/testdata/orders
/data/<database>/<table>/<partition>
/data/<biz unit>/<app>/<dataset>/partition
/data/pharmacy/fraud/orders/date=20131101
/etl/<biz unit>/<app>/<dataset>/<stage>
/etl/pharmacy/fraud/orders/validated
![Page 21: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/21.jpg)
21
Store intermediate data
/etl/<biz unit>/<app>/<dataset>/<stage>/<dataset_id>
/etl/pharmacy/fraud/orders/raw/date=20131101
/etl/pharmacy/fraud/orders/deduped/date=20131101
/etl/pharmacy/fraud/orders/validated/date=20131101
/etl/pharmacy/fraud/orders_labs/merged/date=20131101
/etl/pharmacy/fraud/orders_labs/aggregated/date=20131101
/etl/pharmacy/fraud/orders_labs/ranked/date=20131101
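A minimal sketch of the directory convention, assuming dates are passed as `yyyyMMdd` strings. `EtlPaths`, `Dataset`, `stagePath`, and `pipelinePaths` are hypothetical names used only for illustration; they are not from the talk.

```scala
object EtlPaths {
  // Hypothetical description of a dataset: business unit, application, dataset name
  final case class Dataset(bizUnit: String, app: String, name: String)

  // Builds /etl/<biz unit>/<app>/<dataset>/<stage>/date=<yyyyMMdd>
  def stagePath(ds: Dataset, stage: String, date: String): String =
    s"/etl/${ds.bizUnit}/${ds.app}/${ds.name}/$stage/date=$date"

  // One directory per stage, so any stage can be re-run from the previous stage's output
  def pipelinePaths(ds: Dataset, stages: Seq[String], date: String): Seq[String] =
    stages.map(stage => stagePath(ds, stage, date))
}
```

For example, `EtlPaths.pipelinePaths(EtlPaths.Dataset("pharmacy", "fraud", "orders"), Seq("raw", "deduped", "validated"), "20131101")` yields the three `orders` paths for that date.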
![Page 22: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/22.jpg)
22
Batch ETL is old news
![Page 23: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/23.jpg)
23
Small Problem!
• HDFS is optimized for large chunks of data
• Don’t write individual events or micro-batches
• Think 100 MB to 2 GB batches
• What do we do with small events?
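One way to picture the problem: small events have to be grouped into large batches before they are written. The sketch below does that in plain Scala, under the assumption that event size is simply the UTF-8 byte count. `EventBatcher` and `batchBySize` are hypothetical names, and a real pipeline would use a far larger threshold than the one in the usage note.

```scala
object EventBatcher {
  // Groups events into batches whose total size reaches at least `minBytes`.
  // The last batch may be smaller, because it holds whatever was left over.
  def batchBySize(events: Seq[String], minBytes: Int): Vector[Vector[String]] = {
    val start = (Vector.empty[Vector[String]], Vector.empty[String], 0)
    val (done, pending, _) = events.foldLeft(start) {
      case ((batches, current, size), event) =>
        val newSize = size + event.getBytes("UTF-8").length
        if (newSize >= minBytes) (batches :+ (current :+ event), Vector.empty[String], 0)
        else (batches, current :+ event, newSize)
    }
    if (pending.isEmpty) done else done :+ pending
  }
}
```

With a toy threshold of 4 bytes, `EventBatcher.batchBySize(Seq("aa", "bb", "cc", "dd", "e"), 4)` groups the events as `(aa, bb)`, `(cc, dd)`, `(e)`.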
![Page 24: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/24.jpg)
24
Well, we have this data bus…
[Figure: a topic split into three partitions (Partition 1 with offsets 0 to 13, Partition 2 with offsets 0 to 11, Partition 3 with offsets 0 to 13); offsets run from Old to New, and Writes are appended at the new end]
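A toy model of a partitioned, append-only log may help here. This is plain Scala, not the Kafka client API, and the hash-based `partitionFor` is only an illustrative assumption about how a key could be mapped to a partition. All names (`PartitionedLog`, `Topic`, `append`, `readFrom`) are hypothetical.

```scala
object PartitionedLog {
  // Each partition is an append-only sequence; a record's offset is its position in its partition
  final case class Topic(partitions: Vector[Vector[String]]) {
    // Appends to the chosen partition and returns the updated topic plus the record's offset
    def append(partition: Int, record: String): (Topic, Long) = {
      val offset = partitions(partition).length.toLong
      (Topic(partitions.updated(partition, partitions(partition) :+ record)), offset)
    }

    // Reads everything in a partition from `offset` onward; reading does not remove records
    def readFrom(partition: Int, offset: Long): Vector[String] =
      partitions(partition).drop(offset.toInt)
  }

  def empty(numPartitions: Int): Topic =
    Topic(Vector.fill(numPartitions)(Vector.empty[String]))

  // Illustrative only: records with the same key always land in the same partition
  def partitionFor(key: String, numPartitions: Int): Int =
    java.lang.Math.floorMod(key.hashCode, numPartitions)
}
```

Because reads never delete anything, a consumer can come back later and read again from any earlier offset.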
![Page 25: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/25.jpg)
25
Kafka has topics
How about?
<biz unit>.<app>.<dataset>.<stage>
pharmacy.fraud.orders.raw
pharmacy.fraud.orders.deduped
pharmacy.fraud.orders.validated
pharmacy.fraud.orders_labs.merged
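The naming convention can be enforced with a small helper. This is a sketch that assumes the four parts never contain a dot; `TopicNames`, `topicName`, and `parse` are hypothetical names, not part of any Kafka API.

```scala
object TopicNames {
  private val Separator = "."

  // Builds <biz unit>.<app>.<dataset>.<stage>; rejects empty parts and parts containing the separator
  def topicName(bizUnit: String, app: String, dataset: String, stage: String): String = {
    val parts = Seq(bizUnit, app, dataset, stage)
    require(parts.forall(p => p.nonEmpty && !p.contains(Separator)),
      "parts must be non-empty and must not contain '.'")
    parts.mkString(Separator)
  }

  // Splits a topic name back into its four parts, or None if it does not follow the convention
  def parse(topic: String): Option[(String, String, String, String)] =
    topic.split('.') match {
      case Array(b, a, d, s) if Seq(b, a, d, s).forall(_.nonEmpty) => Some((b, a, d, s))
      case _ => None
    }
}
```

Parsing the name back lets tooling find every stage of a dataset, for example all topics whose first three parts are `pharmacy`, `fraud`, `orders`.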
![Page 26: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/26.jpg)
26
It’s (almost) all topics
[Figure: the stages Raw data, Clean data, Enriched Data, Aggregated data, and Filtered data, with the consumers Dashboard, Report, Data scientist, and Alerts, plus the label “OMG”]
![Page 27: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/27.jpg)
27
Benefits
• Recover from accidents
• Debug suspicious results
• Fix algorithm errors
• Experiment with new algorithms
![Page 28: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/28.jpg)
28
Kinda Lambda
![Page 29: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/29.jpg)
29
Lambda Architecture
• Immutable events
• Store intermediate stages
• Combine Batches and Streams
• Reprocessing
![Page 30: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/30.jpg)
30
What we don’t like
Maintaining two applications, often in two languages, that do the same thing
![Page 31: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/31.jpg)
31
Pain Avoidance #1 – Use Spark + Spark Streaming
• Spark is awesome for batch, so why not?
– The New Kid that isn’t that New Anymore
– Easily 10x less code
– Extremely Easy and Powerful API
– Very good for machine learning
– Scala, Java, and Python
– RDDs
– DAG Engine
![Page 32: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/32.jpg)
32
Spark Streaming
• Calling Spark in a Loop
• Extends RDDs with DStream
• Very few code changes from ETL to Streaming
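The “Spark in a loop” idea can be sketched without Spark at all. In the block below, plain Scala collections stand in for RDDs and DStreams (an assumption made to keep the example self-contained), and the same `countErrors` logic serves both modes. `MicroBatch`, `runBatch`, and `runMicroBatches` are hypothetical names.

```scala
object MicroBatch {
  // The business logic is written once, against a plain collection of log lines
  def countErrors(lines: Seq[String]): Int = lines.count(_.contains("ERROR"))

  // "Batch" mode: run the logic once over everything
  def runBatch(allLines: Seq[String]): Int = countErrors(allLines)

  // "Streaming" mode: cut the input into micro-batches and run the same logic on each one
  def runMicroBatches(allLines: Seq[String], batchSize: Int): Vector[Int] =
    allLines.grouped(batchSize).map(batch => countErrors(batch)).toVector
}
```

The per-batch counts add up to the batch result, which is why moving from ETL to streaming needs so few code changes.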
![Page 33: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/33.jpg)
33
Spark Streaming
[Figure: three stages labeled Pre-first Batch, First Batch, and Second Batch. In each, a Receiver turns data from the Source into RDDs; a batch's RDD then runs through Filter, Count, Print in a Single Pass]
![Page 34: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/34.jpg)
34
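Small Example

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// args(0) = Spark master, args(1) = host, args(2) = port
val sparkConf = new SparkConf().setMaster(args(0)).setAppName(this.getClass.getCanonicalName)
// One RDD is produced per 10-second batch
val ssc = new StreamingContext(sparkConf, Seconds(10))
// Create the DStream from data sent over the network
val dStream = ssc.socketTextStream(args(1), args(2).toInt, StorageLevel.MEMORY_AND_DISK_SER)
// Count the errors in each RDD in the stream (ErrorCount.countErrors is defined elsewhere)
val errCountStream = dStream.transform(rdd => ErrorCount.countErrors(rdd))
// Running total per key (updateFunc is defined elsewhere; updateStateByKey needs a checkpoint directory set on ssc)
val stateStream = errCountStream.updateStateByKey[Int](updateFunc)
// take(1) rather than first(), which throws on an empty RDD
errCountStream.foreachRDD(rdd => rdd.take(1).foreach(kv => println("Errors this batch: %d".format(kv._2))))
ssc.start()
ssc.awaitTermination()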
![Page 35: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/35.jpg)
35
Pain Avoidance #2 – Split the Stream
Why do we even need stream + batch?
• Batch efficiencies
• Re-process to fix errors
• Re-process after delayed arrival
What if we could re-play data?
![Page 36: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/36.jpg)
36
Kafka + Stream Processing
[Figure: a log with offsets 0 to 13; Streaming App v1 reads the log and produces Result set 1, which is used by the App]
![Page 37: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/37.jpg)
37
Let’s Re-Process with new algorithm
[Figure: a log with offsets 0 to 13; Streaming App v1 produces Result set 1 and Streaming App v2 produces Result set 2 from the same log; the App uses a result set]
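Re-processing amounts to running a new version of the application over the retained log from an earlier offset and writing to a separate result set. A minimal sketch in plain Scala follows; the two “algorithms” `v1` and `v2` are made-up stand-ins, and `Replay` and `process` are hypothetical names.

```scala
object Replay {
  // Retained events, indexed by offset (position in the log)
  type Log = Vector[Int]

  // Two versions of the "algorithm"; both are hypothetical stand-ins
  val v1: Int => Int = x => x * 2
  val v2: Int => Int = x => x * 2 + 1

  // Runs one app version over the log from `fromOffset`, producing its own result set
  def process(log: Log, fromOffset: Int, app: Int => Int): Vector[Int] =
    log.drop(fromOffset).map(app)
}
```

Since the log itself is never modified, result set 1 stays available while result set 2 is being built, and clients can switch once the new version has caught up.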
![Page 38: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/38.jpg)
38
Let’s Re-Process with new algorithm
[Figure: a log with offsets 0 to 13; Streaming App v1 with Result set 1, Streaming App v2 with Result set 2, and the App]
![Page 39: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/39.jpg)
39
Oh no, we just got a bunch of data for yesterday!
[Figure: a log with offsets 0 to 13; one Streaming App instance is labeled Today and a second Streaming App instance is labeled Yesterday]
![Page 40: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/40.jpg)
40
Note:
No need to choose between the approaches. There are good reasons to do both.
![Page 41: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/41.jpg)
41
Prediction:
Batch vs. Streaming distinction is going away.
![Page 42: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/42.jpg)
42
Yes, you really need a Schema
![Page 43: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/43.jpg)
43
Schema is a MUST HAVE for data integration
![Page 44: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/44.jpg)
44
[Figure: four Client boxes, a Backend box, and three boxes labeled Another Backend]
![Page 45: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/45.jpg)
45
Remember that we want this?
[Figure: four Source Systems act as Producers writing to Kafka Brokers; Hadoop, Security Systems, Real-time monitoring, and a Data Warehouse act as Consumers]
![Page 46: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/46.jpg)
46
This means we need this:
[Figure: four Source Systems, Kafka, and the downstream systems Hadoop, Security Systems, Real-time monitoring, and Data Warehouse, with a Schema Repository alongside Kafka]
![Page 47: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/47.jpg)
47
We can do it in a few ways
• People go around asking each other: “So, what does the 5th field of the messages in topic Blah contain?”
• There’s utility code for reading/writing messages that everyone reuses
• Schema embedded in the message
• A centralized repository for schemas
– Each message has Schema ID
– Each topic has Schema ID
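The “each message has a Schema ID” option can be sketched as an in-memory lookup table. This is an illustration only, assuming schemas are just ordered field names and values are strings. `SchemaRepo`, `Repository`, `register`, and `decode` are hypothetical names and do not correspond to any particular schema repository product.

```scala
object SchemaRepo {
  final case class Schema(fields: Vector[String])
  // A message carries only the schema ID plus the raw values, in schema order
  final case class Message(schemaId: Int, values: Vector[String])

  // Hypothetical in-memory repository: schema ID -> schema
  final case class Repository(schemas: Map[Int, Schema]) {
    // Registers a schema, reusing the existing ID if the same schema is already known
    def register(schema: Schema): (Repository, Int) =
      schemas.collectFirst { case (id, s) if s == schema => id } match {
        case Some(id) => (this, id)
        case None =>
          val id = schemas.size + 1
          (Repository(schemas + (id -> schema)), id)
      }

    // Decodes a message into field -> value using the schema its ID points to
    def decode(msg: Message): Option[Map[String, String]] =
      schemas.get(msg.schemaId)
        .filter(_.fields.length == msg.values.length)
        .map(s => s.fields.zip(msg.values).toMap)
  }

  val empty: Repository = Repository(Map.empty)
}
```

A reader that has never seen a producer's code can still answer “what does the 5th field contain?” by looking the ID up, which is the point of centralizing schemas.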
![Page 48: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/48.jpg)
48
I ♥ Avro
• Define Schema
• Generate code for objects
• Serialize / Deserialize into Bytes or JSON
• Embed schema in files / records… or not
• Support for our favorite languages… Except Go.
• Schema Evolution
– Add and remove fields without breaking anything
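To make “add and remove fields without breaking anything” concrete, the sketch below imitates the resolution idea in plain Scala. It is not the Avro API: records are simple string maps, and `SchemaEvolution`, `Field`, and `resolve` are hypothetical names. The rules it models are that fields unknown to the reader are ignored and fields missing from the written record fall back to the reader's default.

```scala
object SchemaEvolution {
  // A field in the reader's schema; `default` is used when the writer did not supply the field
  final case class Field(name: String, default: Option[String])

  // Resolves a written record against the reader's schema:
  //  - fields the reader does not know are dropped
  //  - missing fields fall back to the reader's default
  //  - a missing field without a default is an error
  def resolve(written: Map[String, String],
              readerSchema: Seq[Field]): Either[String, Map[String, String]] =
    readerSchema.foldLeft[Either[String, Map[String, String]]](Right(Map.empty)) { (acc, field) =>
      acc.flatMap { out =>
        written.get(field.name).orElse(field.default) match {
          case Some(value) => Right(out + (field.name -> value))
          case None        => Left(s"missing field without default: ${field.name}")
        }
      }
    }
}
```

With this shape, a reader can add a field with a default before any writer sends it, and a writer can drop a field that readers have a default for, so the two sides can change independently.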
![Page 49: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/49.jpg)
49
Schemas are Agile
• Schemas allow adding readers and writers easily
• Schemas allow modifying readers and writers independently
• Schemas can evolve as the system grows
![Page 50: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/50.jpg)
50
![Page 51: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/51.jpg)
51
Woah, that was lots of stuff!
![Page 52: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/52.jpg)
52
Recap – if you remember nothing else…
• After the POC, it’s time for production
• Goal: Evolve fast without breaking things
For this you need:
• Keep all data
• Design pipeline for error recovery – batch or stream
• Integrate with a data bus
• And Schemas
![Page 53: Designing Agile Data Pipelines - BI Consultingbiconsulting.hu/letoltes/...singhashish_designing.pdf · So how do we build real data architectures? Click to enter confidentiality information](https://reader035.vdocuments.net/reader035/viewer/2022062919/5edfacb5ad6a402d666b013c/html5/thumbnails/53.jpg)
Thank you