data platform evolution
TRANSCRIPT
![Page 1: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/1.jpg)
Data Platform Evolution
![Page 2: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/2.jpg)
About One by Aol.
![Page 3: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/3.jpg)
About Our Team
![Page 4: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/4.jpg)
About Our Data
Video Tracking
Ad Tracking
User Tracking
![Page 5: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/5.jpg)
LEGACY PLATFORM
![Page 6: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/6.jpg)
Legacy System
DWH Cluster
SSIS Manager
External Data Providers
Event Collector
Caching
Reporting
DWH
Application Servers
![Page 7: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/7.jpg)
Legacy Scale
500 TB Storage
40K Events Processed per Second
3.5B Events Processed Daily
Daily Processing
20 GB Data Daily
![Page 8: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/8.jpg)
The Need To Change
Cost
Processing Time
Scale
Development ROI
Testability
Accessibility
![Page 9: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/9.jpg)
NEXT STEPS
![Page 10: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/10.jpg)
Next Steps
3 Stages
Outcome
Component Description
Examples
![Page 11: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/11.jpg)
Legacy System
DWH Cluster
SSIS Manager
External Data Providers
Event Collector
Caching
Reporting
DWH
Application Servers
![Page 12: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/12.jpg)
First Stage
Architecture diagram (each component shown with its own servers)
Components: External Data Providers, Event Collector, Data Collection, Data warehouse, Data Distribution, DWH API, Legacy DWH, Analytics, Reporting, Monitoring
File transfer: sFTP, FTP
![Page 13: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/13.jpg)
First Stage Summary
Full Redundancy
Comparison Legacy vs. Batch
Linear Scale
Partial Test Coverage
Raw Level Data Access
CD
![Page 14: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/14.jpg)
First Stage
Architecture diagram (each component shown with its own servers)
Components: External Data Providers, Event Collector, Data Collection, Data warehouse, Data Distribution, DWH API, Legacy DWH, Analytics, Reporting, Monitoring
File transfer: sFTP, FTP
![Page 15: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/15.jpg)
Second Stage
Architecture diagram (each component shown with its own servers)
Components: External Data Providers, Event Collector, Data Collection, Data warehouse, Data Distribution, DWH API, Real Time DWH, Scheduling, Analytics, Reporting, Monitoring
File transfer and storage: S3, Azure, sFTP, FTP
![Page 16: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/16.jpg)
Second Stage Summary
Near Real-Time Processing
Comparison Batch vs. Real Time
Full Monitoring
Full Test Coverage
“Product” Event/Report Definition
DevOps Automation
![Page 17: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/17.jpg)
MORE DETAILS
![Page 18: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/18.jpg)
Batch Event Processing
Hadoop Cluster
Hadoop Monitoring
Aggregated data exporter
Processed data aggregator
Error Processing
Data Archiver
Data Collection Cluster
Raw data processing (Map-Reduce)
Raw data files pushed to Hadoop (WebHDFS)
Vertica
External/Internal DWH Clusters
Data flow direction
Monitoring data
Raw data processing
1. Cleaning/Transformation/Enrichment/Validation of data from main data sources with Map-Reduce
2. Month history
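The slides do not show the actual Map-Reduce jobs. As a rough illustration of the cleaning, validation, enrichment, and reduce steps named above, here is a minimal single-process sketch. The tab-separated record layout, the `CAMPAIGN_NAMES` lookup, and all function names are assumptions made for this example, not the team's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical dimension table used for enrichment (not from the slides).
CAMPAIGN_NAMES = {"c1": "Spring Launch", "c2": "Autumn Sale"}

Key = Tuple[str, str]


def map_event(line: str) -> Optional[Tuple[Key, int]]:
    """Map step: clean, validate, and enrich one raw line.

    Assumed layout: timestamp <TAB> event_type <TAB> campaign_id.
    Returns ((event_type, campaign_name), 1), or None for invalid lines.
    """
    parts = [p.strip() for p in line.split("\t")]  # cleaning
    if len(parts) != 3:  # validation: wrong field count
        return None
    timestamp, event_type, campaign_id = parts
    if not timestamp.isdigit() or not event_type:  # validation: bad values
        return None
    campaign = CAMPAIGN_NAMES.get(campaign_id, "unknown")  # enrichment
    return ((event_type.lower(), campaign), 1)  # transformation


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the values emitted for each key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(lines: Iterable[str]) -> Dict[Key, int]:
    """Run map then reduce over raw lines, dropping invalid records."""
    mapped = (map_event(line) for line in lines)
    return reduce_counts(pair for pair in mapped if pair is not None)
```

In a real Hadoop job the map and reduce functions would run on different nodes with a shuffle between them; the sketch only shows the shape of the logic.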
Aggregator Process
1. DSL for defining new kinds of aggregation
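The aggregation DSL itself is not shown in the slides. One way to picture the idea is a small declarative spec that names the grouping fields and the fields to sum, plus a generic function that executes any such spec. The `Aggregation` class, its field names, and the sample rows below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Tuple


@dataclass(frozen=True)
class Aggregation:
    """Declarative description of one aggregation (hypothetical shape)."""
    name: str
    group_by: Tuple[str, ...]    # fields that form the grouping key
    sum_fields: Tuple[str, ...]  # numeric fields to total per group


def aggregate(spec: Aggregation,
              rows: Iterable[Mapping[str, object]]) -> Dict[tuple, Dict[str, int]]:
    """Execute a spec over rows and return per-group totals."""
    totals: Dict[tuple, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[field] for field in spec.group_by)
        for field in spec.sum_fields:
            totals[key][field] += row[field]
    return {key: dict(fields) for key, fields in totals.items()}


# Defining a new kind of aggregation is then a data change, not a code change.
VIEWS_BY_COUNTRY = Aggregation(
    name="views_by_country",
    group_by=("country",),
    sum_fields=("views", "clicks"),
)
```

The design point is that new aggregations are added by writing a new spec rather than a new job.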
Data exporter
1. Export aggregated data
2. Export processed data
Processed/Aggregated data
Logging Framework Elasticsearch
Logs will be exposed through Kibana to monitor data flow
Monitoring
Monitoring of data flow inside and outside of Event Processing Cluster
Hadoop monitoring data
Error Processing
1. Automatic error re-processing with time window
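The slide does not say how the time window is applied. One plausible reading is that a failed item is retried automatically only while its first failure is recent enough, and is set aside once the window has passed. The sketch below follows that assumption; `FailedItem`, `reprocess`, and the window length are illustrative names and values.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FailedItem:
    payload: str
    failed_at: float  # seconds since epoch when the item first failed


def reprocess(
    failed: List[FailedItem],
    process: Callable[[str], None],
    now: float,
    window_seconds: float,
) -> Tuple[List[FailedItem], List[FailedItem], List[FailedItem]]:
    """Retry failed items that are still inside the time window.

    Returns (recovered, still_failed, expired). Expired items are older
    than the window and are not retried (assumed policy).
    """
    recovered, still_failed, expired = [], [], []
    for item in failed:
        if now - item.failed_at > window_seconds:
            expired.append(item)
            continue
        try:
            process(item.payload)
            recovered.append(item)
        except Exception:
            # Keep the item for the next automatic pass.
            still_failed.append(item)
    return recovered, still_failed, expired
```

A scheduler would call `reprocess` periodically with the `still_failed` list from the previous pass.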
S3
![Page 19: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/19.jpg)
Examples Event Processing
![Page 20: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/20.jpg)
Examples Event Processing
![Page 21: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/21.jpg)
Examples Event Processing
![Page 22: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/22.jpg)
Examples Event Processing
![Page 23: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/23.jpg)
Data Collection
Data Collection Cluster
Servers
Servers
Servers
Video Tracking
Ad Tracking
User Tracking
3rd Party Ad Tracking
SQL Server
CSV data received every hour via FTP. Raw Events and Dimensions.
Text files received every five minutes from Public and Private Cloud. Raw Events.
Logging Framework Elasticsearch
Hadoop Processing Cluster
Data about received files/events reported with logging framework
Raw data files pushed to Hadoop (WebHDFS)
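WebHDFS is Hadoop's REST interface, and a file upload starts with an HTTP `PUT` to a `/webhdfs/v1/<path>?op=CREATE` URL on the NameNode, which redirects the client to a DataNode for the data itself. The sketch below only builds that first URL. The host name, port, user, and path are placeholders, and the slides do not show how the collection servers actually issue the request.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host: str, port: int, hdfs_path: str,
                       user: str, overwrite: bool = False) -> str:
    """Build the NameNode URL for a WebHDFS CREATE request.

    The caller would send an HTTP PUT to this URL, follow the redirect
    to a DataNode, and PUT the file body there.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": str(overwrite).lower(),
    })
    # quote() keeps "/" and escapes characters such as spaces.
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Placeholder values for illustration only.
url = webhdfs_create_url("namenode.example", 50070,
                         "/raw/video/events 01.txt", "etl")
```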
Dimension tables
Servers to acquire
Stage 1: A .NET application will pull from FTP and the SQL DWH server for loggers, and use SQL Replication for dimension data
Stage 2: Consider moving to a more appropriate technology such as Akka
Data flow direction
Logs will be exposed through Kibana to monitor data flow
Monitoring data
Monitoring
Monitoring of data flow inside and outside of Data Collection Cluster
MongoDB
![Page 24: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/24.jpg)
Data Distribution
Data Distribution Cluster
Hive
Vertica
MongoDB
Report Distributor
Logging Framework Elasticsearch
Reporting Platform
Data flow direction
Logs will be exposed through Kibana to monitor data flow
Monitoring data
Monitoring
Monitoring of data flow inside and outside of Data Distribution Cluster
Report S3 Storage
![Page 25: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/25.jpg)
Examples Data Collection & Distribution
![Page 26: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/26.jpg)
Examples Data Collection & Distribution
![Page 27: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/27.jpg)
Examples Data Collection & Distribution
![Page 28: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/28.jpg)
Examples Data Collection & Distribution
![Page 29: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/29.jpg)
Reporting Platform
Vertica
Hive
SQL Server
1. Distributed
2. Encapsulate Repository
3. Versioning
4. Smart query execution
5. Testable
MongoDB
Reporting Platform
Report Designer
Report Provider
Report Distributor
Reporting API
Statistics Provider
S3 Report Storage
Data sources of the Reporting platform are in Private and Public Cloud
Application Servers
![Page 30: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/30.jpg)
Examples Applications
![Page 31: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/31.jpg)
Examples Applications
![Page 32: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/32.jpg)
Examples Applications
![Page 33: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/33.jpg)
Examples Applications
![Page 34: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/34.jpg)
Monitoring
Monitoring Cluster
Cloudera Manager
Elasticsearch Cluster
Vertica Management
Kibana
Zabbix
Applications
Vertica
Hadoop
MongoDB
![Page 35: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/35.jpg)
Examples Monitoring & Alerting
![Page 36: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/36.jpg)
Examples Monitoring & Alerting
![Page 37: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/37.jpg)
Examples Monitoring & Alerting
![Page 38: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/38.jpg)
Examples Monitoring & Alerting
![Page 39: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/39.jpg)
Examples Monitoring & Alerting
![Page 40: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/40.jpg)
Examples Monitoring & Alerting
![Page 41: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/41.jpg)
Migration Outcome
15% Cost Reduction
Linear Scale
90% Unit Test Coverage
x280 Processing Time
x50 Development ROI
![Page 42: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/42.jpg)
Current Scale
86B Events Processed Daily
120 TB Data Daily
1M Events Processed per Second
Near Real-Time Processing
Minimum Interval: 5 min
15+ Event Sources
4.5 PB Hadoop
70 TB Vertica
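A five-minute minimum interval suggests that events are grouped into fixed five-minute windows before processing. The slides do not describe the mechanism, so the following is only a sketch of that kind of bucketing; the function names and the use of UTC timestamps are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

INTERVAL_SECONDS = 5 * 60  # minimum processing interval from the slide


def window_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to its five-minute window start (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % INTERVAL_SECONDS, tz=timezone.utc)


def count_per_window(timestamps: Iterable[datetime]) -> Counter:
    """Count events per five-minute window."""
    return Counter(window_start(ts) for ts in timestamps)
```

Each window can then be handed to the processing job as soon as it closes, which is what bounds the end-to-end delay to roughly one interval plus processing time.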
![Page 43: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/43.jpg)
Scale Growth
x15 Events Processed Daily
x6000 Daily Processed Data
x25 Events Processed per Second
x280 Processing Time
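The growth factors can be cross-checked against the headline numbers on the Legacy Scale and Current Scale slides. The short calculation below assumes decimal units (1 TB = 1000 GB) and uses only figures quoted in the deck. The per-second and daily-data ratios match the slide (x25 and x6000). The daily-events ratio comes out near x25 rather than the x15 shown, so that figure may rest on a baseline the transcript does not include.

```python
# Figures quoted on the Legacy Scale and Current Scale slides.
legacy = {
    "events_per_second": 40e3,   # 40K
    "events_per_day": 3.5e9,     # 3.5B
    "data_per_day_bytes": 20e9,  # 20 GB, decimal units assumed
}
current = {
    "events_per_second": 1e6,      # 1M
    "events_per_day": 86e9,        # 86B
    "data_per_day_bytes": 120e12,  # 120 TB, decimal units assumed
}

# Growth factor per metric: current value divided by legacy value.
growth = {name: current[name] / legacy[name] for name in legacy}

# Consistency check: events per day should be close to events per second
# multiplied by the 86,400 seconds in a day.
SECONDS_PER_DAY = 24 * 60 * 60
implied_daily = {
    "legacy": legacy["events_per_second"] * SECONDS_PER_DAY,
    "current": current["events_per_second"] * SECONDS_PER_DAY,
}

for name, factor in growth.items():
    print(f"{name}: x{factor:.1f}")
```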
![Page 44: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/44.jpg)
Second Stage
Architecture diagram (each component shown with its own servers)
Components: External Data Providers, Event Collector, Data Collection, Data warehouse, Data Distribution, DWH API, Real Time DWH, Scheduling, Analytics, Reporting, Monitoring
File transfer and storage: S3, Azure, sFTP, FTP
![Page 45: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/45.jpg)
Third Stage
Architecture diagram (each component shown with its own servers)
Components: External Data Providers, Event Collector, Data Collection, Data warehouse, Data Distribution, DWH API, Real Time DWH, Scheduling, Analytics, Reporting, Monitoring
File transfer and storage: S3, Azure, sFTP, FTP
![Page 46: Data platform evolution](https://reader031.vdocuments.net/reader031/viewer/2022022412/58f9a91f760da3da068b6b3a/html5/thumbnails/46.jpg)
THANK YOU