Operating Samza at Skyscanner
Agenda
• Introduction
• Unified log in Skyscanner
• Basics of Samza
• Use cases for stream processing in Skyscanner
• Deployment & Local development environment
• Monitoring
• Future
Introduction
• Skyscanner is a travel search company with over 50m unique monthly visitors (UMVs) and over 700 employees globally.
• Joseph Francis, Senior Software Engineer in Skyscanner
• Some use cases in Skyscanner
• Make Samza jobs easily deployable and operable in a multi-tenant cluster
Past
• One (big) monolithic SQL database for reporting and monitoring
• Central team to deliver data needs for the organization
• Had not yet jumped on the bandwagon of large-scale batch processing
Key Points
• Samza consumes one message at a time with an at-least-once delivery guarantee
• Single thread of execution
• API offers init(), process() and window() methods
• State management with embedded key-value store
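The processing model in these points can be sketched in a few lines. The following is a minimal Python simulation, not the Samza API (which is Java): the `PageViewCounter` class, the `run` helper and the dict-backed store are hypothetical stand-ins assumed for illustration. It shows a single-threaded loop calling init(), process() and window(), and how at-least-once delivery can replay a message after a restart from the last checkpoint.

```python
class PageViewCounter:
    """Hypothetical task mirroring the init/process/window shape."""

    def init(self, store):
        # A plain dict stands in for the embedded key-value store.
        self.store = store
        self.emitted = []

    def process(self, message):
        # Called once per message, never concurrently.
        key = message["page"]
        self.store[key] = self.store.get(key, 0) + 1

    def window(self):
        # Called periodically: record a snapshot of the current counts.
        self.emitted.append(dict(self.store))


def run(task, messages, start_offset=0, window_every=2):
    """Single-threaded loop delivering one message at a time."""
    for offset in range(start_offset, len(messages)):
        task.process(messages[offset])
        if (offset + 1) % window_every == 0:
            task.window()


messages = [{"page": "home"}, {"page": "search"}, {"page": "home"}]

task = PageViewCounter()
task.init(store={})
run(task, messages)
counts = dict(task.store)

# At-least-once (assumed scenario): the job processed all three messages,
# but the last checkpoint was taken at offset 2. After a restart the
# message at offset 2 is delivered a second time.
replayed = PageViewCounter()
replayed.init(store=dict(counts))  # state survived the restart
run(replayed, messages, start_offset=2)
```

In this sketch the replayed message is counted twice, which is why jobs that need exact results usually make their updates idempotent, for example by tracking a message ID in the store.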
Use Cases
• Building a user timeline
• Data enrichment downstream
• Stream join and windowed aggregations
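As a rough illustration of the enrichment, join and windowed-aggregation use cases, the sketch below is plain Python rather than Samza code. The event shapes, the `enrich` and `tumbling_counts` helpers and the sample data are all hypothetical. It joins a click stream against profile events held in a local table, then counts the enriched events per fixed time window.

```python
from collections import Counter, defaultdict


def enrich(events):
    """Join each click with the latest profile seen for the same user."""
    profiles = {}  # local table, playing the role of the embedded store
    for event in events:
        if event["type"] == "profile":
            profiles[event["user"]] = event["country"]
        elif event["type"] == "click":
            country = profiles.get(event["user"], "unknown")
            yield {**event, "country": country}


def tumbling_counts(events, window_seconds):
    """Count events per country in fixed, non-overlapping windows."""
    windows = defaultdict(Counter)
    for event in events:
        window_start = event["ts"] - event["ts"] % window_seconds
        windows[window_start][event["country"]] += 1
    return {start: dict(counter) for start, counter in windows.items()}


events = [
    {"type": "profile", "user": "u1", "country": "GB"},
    {"type": "click", "user": "u1", "ts": 5},
    {"type": "click", "user": "u2", "ts": 7},
    {"type": "click", "user": "u1", "ts": 12},
]

enriched = list(enrich(events))
result = tumbling_counts(enriched, window_seconds=10)
```

Keeping the profile table local to the task is the design choice the embedded store enables: each click is enriched with a local lookup instead of a remote call per message.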
Current Deployment
• No centralised configuration
• Restrictive source folder structure
• Ansible deployment scripts were embedded with the Samza job
Application Logs
• Application logs forwarded to Elasticsearch through Logstash
• Requires a shared format for logging (log4j.xml)
• YARN UI is not the most intuitive!
Future
• More generic jobs
• Developers should only worry about writing code
• Fully automated production deployment
• Cross the boundaries of Batch vs Streaming?