The OLX data theory of everything
![Page 1: The OLX data theory of everything - aws-de-media.s3 ...aws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June7/Apollo/The... · © 2018, Amazon Web Services, Inc. or its affiliates](https://reader030.vdocuments.net/reader030/viewer/2022041203/5d5170dc88c993da708b9e2d/html5/thumbnails/1.jpg)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Caspar Schönau
Head of Global BI
Jakub Orłowski
Data engineering manager
The OLX data theory of everything
‘The biggest internet company that you have
never heard of’
Founded 1915, South Africa
Market cap: $100B
43 countries
+350M MAU
+5,000 employees
35 offices
+4B events / day
The OLX challenge:
‘Give everybody the data that he or she
needs’ (but also not much more)
Hallelujah! We are a data-driven company!
What decisions are you really taking on a daily basis? And how does data play a role?
Do I need to move to the next valley to survive the winter?
• Kg of food in the valley
• # of apes in the valley
• Kg of food / ape needed per day
• # of days left this winter
Can I win the javelin throw competition at the next Olympics?
• WR javelin throw in m
• PB javelin throw in m
• Time between date of PB and next Olympics
Can I buy a higher desk, so I don’t destroy my back while working?
• $ in the bank
• Price of a decent desk in $
The same goes for an organization like OLX. Which data points are really influencing your decisions?
Urgency: from months to seconds
CEO: Do I launch a new car portal in Mexico?
• Size of the prize of the online cars market in Mexico
• Cost of success
• Chance of success
• Available war chest
GM: Shall I invest more in online or in offline marketing?
• ROI of a typical offline marketing campaign
• ROI of continuous online marketing
• Expected reach of both offline and online marketing
CS Manager: Should I fire CS agent #253?
• Average # of ads moderated by agent #6 in the last month
• Average # of ads moderated by the team in the last month
• Error rate of agent #6
• Error rate of the team
Infra engineer: Is the platform still online?
• Requests per second
• Posts per second
Business analyst: Can I predict which listings have the highest probability to sell?
• Detailed properties of all listings (# of pictures, attributes, length of the title, etc.)
• All individual replies, including related buyer and seller data
Disc 42 is broken
The OLX data iceberg model
OLX data lake
The old school way of providing data…
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; data warehouses; Global BI; global reporting, local reporting, data self-service tool, operational dashboards; top management, business owner, data analyst]
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; Global BI; global reporting, local reporting, data self-service tool, operational dashboards; top management, business owner]
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; Global BI; global reporting, data self-service tool, operational dashboards; top management]
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; Global BI; data self-service tool, operational dashboards]
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; designated reservoirs; data warehouses; Global BI; global reporting, local reporting, data self-service tool, operational dashboards; top management, business owner, data analyst, operations / data scientist]
4 “V”s of Big Data
top challenges of data processing
Volume
data at rest
Amazon S3
Amazon Redshift
Amazon EMR
Velocity
data in flight
Amazon Kinesis
Amazon SQS
Variety
data in many shapes
AWS Glue
Veracity
data quality
Amazon Athena
Volume
Data at rest
Terabytes of existing historical data that need to be stored for an extended period
Velocity
Data in flight
Streaming, near real-time data sources; short reaction and computation times required
Variety
Data types
Platform databases, user behaviour, infrastructure monitoring, pictures
Veracity
Data quality
Tracking coverage, downtime, implementation errors, schema changes
Data democratization
Data understanding
Data access
Volume Velocity Variety Veracity
producers
compliance
consumers
Data democratization at OLX
architecture overview
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; data warehouses; Global BI; global reporting, local reporting, data self-service tool, operational dashboards; top management, business owner, data analyst]
producers
compliance
consumers
The OLX challenge:
‘Give everybody the data that he or she
needs’ (but also not much more)
Data lake
Catalog
Reservoir
Reservoir
Reservoir
Reservoir
Reservoir
Reservoir
European BI
Global BI
Marketing
Trust and Safety
Personalization
you name it
Reservoir structure - S3 bucket
/in Incoming raw files in JSON format. Files appear right after they are saved in the data lake.
/out Outgoing raw files in JSON format. Files are copied to the data lake in real time.
/parquet Incoming files in Parquet format. Files are partitioned hourly.
/tmp Folder for temporary files; can be used by higher-level data processing apps.
Reservoirs have individual data retention policies attached.
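A minimal sketch of how object keys inside such a reservoir bucket could be built. The four folder names come from the slide; the Hive-style `year=/month=/day=/hour=` sub-path under `/parquet` and the function name `reservoir_key` are assumptions for illustration, since the slide only says the Parquet files are partitioned hourly.

```python
from datetime import datetime, timezone
from typing import Optional

# The four top-level folders named on the slide.
RESERVOIR_FOLDERS = ("in", "out", "parquet", "tmp")


def reservoir_key(folder: str, filename: str, when: Optional[datetime] = None) -> str:
    """Build an object key inside a reservoir bucket.

    The hourly sub-path under /parquet is a hypothetical layout;
    the slide does not show the actual partition naming.
    """
    if folder not in RESERVOIR_FOLDERS:
        raise ValueError(f"unknown reservoir folder: {folder!r}")
    if folder != "parquet":
        return f"{folder}/{filename}"
    if when is None:
        raise ValueError("parquet keys need a timestamp for the hourly partition")
    # Normalise to UTC so that every producer lands in the same hourly partition.
    when = when.astimezone(timezone.utc)
    partition = f"year={when.year}/month={when.month}/day={when.day}/hour={when.hour}"
    return f"parquet/{partition}/{filename}"


if __name__ == "__main__":
    print(reservoir_key("in", "events.json"))
    print(reservoir_key("parquet", "part-0.parquet",
                        datetime(2018, 6, 7, 9, 30, tzinfo=timezone.utc)))
```

Keeping the hour in the key lets a consumer read a single hour of data by prefix instead of scanning the whole folder.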
Architecture overview
data lake
crawler
catalog.api
catalog.frontend
users
olxgroup-reservoir-???
log of new files
data pump
parker
IAM role
log of new files
Data Pump – raw data pre-processor
● Technology: Scala, Spark Streaming, EMR
● Type: CPU intensive
● Cluster:
  ○ Master: c4.xlarge
  ○ Core: 15 * c4.xlarge
  ○ Spot
● Throughput:
  ○ 220K files / hour
  ○ 280GB (compressed) / hour
● Price: $1500 / month
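As a rough back-of-the-envelope reading of the figures above, the sketch below derives per-second and per-terabyte numbers. It assumes a 30-day month of continuous operation, decimal units (1 GB = 1000 MB), and that the quoted price covers the whole cluster; none of these assumptions are stated on the slide.

```python
# Figures quoted on the slide.
FILES_PER_HOUR = 220_000
GB_PER_HOUR = 280            # compressed
PRICE_PER_MONTH_USD = 1500

# Assumption: the cluster runs continuously for a 30-day month.
HOURS_PER_MONTH = 24 * 30

files_per_second = FILES_PER_HOUR / 3600
avg_file_mb = GB_PER_HOUR * 1000 / FILES_PER_HOUR
tb_per_month = GB_PER_HOUR * HOURS_PER_MONTH / 1000
usd_per_tb = PRICE_PER_MONTH_USD / tb_per_month

print(f"{files_per_second:.0f} files per second")       # 61 files per second
print(f"{avg_file_mb:.2f} MB per file on average")      # 1.27 MB per file on average
print(f"{tb_per_month:.1f} TB per month")               # 201.6 TB per month
print(f"${usd_per_tb:.2f} per TB processed")            # $7.44 per TB processed
```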
Parker – raw-to-parquet converter
● Technology: Python3, PySpark, EMR
● Type: Memory intensive
● Cluster:
  ○ Master: r4.xlarge
  ○ Core: 10 * r4.xlarge
  ○ Auto scaling
  ○ Spot
● Price: $1100 / month
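The slides do not show Parker's code, so the following is only a sketch of what a raw-to-Parquet step could look like in PySpark. The timestamp field `event_time`, the partition column names, the bucket name `example-reservoir` and both function names are hypothetical; only the `/in` and `/parquet` folders and the hourly partitioning come from the slides.

```python
PARTITION_COLUMNS = ("year", "month", "day", "hour")


def reservoir_paths(bucket: str) -> tuple:
    """Return the (raw JSON input, Parquet output) locations of a reservoir bucket."""
    return (f"s3://{bucket}/in/", f"s3://{bucket}/parquet/")


def convert_raw_to_parquet(spark, in_path: str, out_path: str,
                           ts_column: str = "event_time") -> None:
    """Read raw JSON files and append them as hourly-partitioned Parquet.

    `ts_column` is a hypothetical field name; the slides do not show the schema.
    """
    # Imported lazily so that reservoir_paths() works without PySpark installed.
    from pyspark.sql import functions as F

    ts = F.to_timestamp(F.col(ts_column))
    df = (
        spark.read.json(in_path)
        .withColumn("year", F.year(ts))
        .withColumn("month", F.month(ts))
        .withColumn("day", F.dayofmonth(ts))
        .withColumn("hour", F.hour(ts))
    )
    # One directory per hour, e.g. .../year=2018/month=6/day=7/hour=9/
    df.write.mode("append").partitionBy(*PARTITION_COLUMNS).parquet(out_path)


if __name__ == "__main__":
    raw_path, parquet_path = reservoir_paths("example-reservoir")
    print(raw_path, parquet_path)
```

On a cluster with Spot core nodes, a job like this would need to tolerate lost executors and re-run safely, which is one reason to keep each run scoped to a bounded set of input files.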
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; data warehouses; Global BI; global reporting, local reporting, data self-service tool, operational dashboards; top management, business owner, data analyst]
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; Global BI; global reporting, local reporting, data self-service tool, operational dashboards; top management, business owner]
Usage examples – Business intelligence
Reservoir
Amazon Redshift
Usage examples – Personalization & Relevance
Reservoir
Amazon Redshift
Usage examples – User communication
Reservoir
Amazon EMR
Amazon Redshift
Usage examples – Exploration and monitoring
Amazon Athena
Reservoir
[Diagram: Platform A, Platform B, any other stack; data reservoirs A to E; Global BI; global reporting, local reporting, data self-service tool, operational dashboards; top management, business owner]
The (data) Theory of Everything:
~ ‘Everything’ is over-rated, nobody needs everything
Key takeaways
• Not everybody needs all data
• Different stakeholders need different data solutions
• When it comes to user data, go with privacy by design and default
• Make sure to follow the AWS Well-Architected Framework
• Use spot instances and auto scaling where possible – it will help you focus on fault tolerance, and you will save money in return
twitter: @olxtechberlin
blog: tech.olx.com
open roles: olxgroup.com