Scalable Eventing Over Apache Mesos

© 2015 Autodesk Scalable Eventing Over Mesos Olivier Paugam SW Architect / Autodesk Cloud Big Data Montreal

Upload: olivier-paugam

Post on 11-Jan-2017

TRANSCRIPT

Page 1: Scalable Eventing Over Apache Mesos

© 2015 Autodesk

Scalable Eventing Over Mesos

Olivier Paugam, SW Architect / Autodesk Cloud

Big Data Montreal

Page 2: Scalable Eventing Over Apache Mesos

Goals & Challenges

Page 3: Scalable Eventing Over Apache Mesos

The Mission

General purpose, high-volume eventing system.
Batch oriented I/O.
Target audience: 20+ teams within Autodesk.
Must be active/active across multiple data-centers.
Must be able to scale at any time.
Must be able to absorb traffic spikes.
Must be accessible via a single API.
Must be secure (transport + data at rest).
Must not be tied to a specific provider.

Page 4: Scalable Eventing Over Apache Mesos

A Few Use Cases

Application log pre-aggregation transport.
Metering updates from our Platform API.
Analytics transport prior to indexing.
Event transport for Search, Activity & other services.
Identity updates down to our IT systems.
Editing increments for large 3D model collaboration.

Page 5: Scalable Eventing Over Apache Mesos

Our 5 Technical Commandments

Must use Docker.
Must run on Apache Mesos + Marathon.
Must leverage Apache Kafka.
Must be as autonomous & low-maintenance as possible.
No automation scripting allowed (Chef, Salt, Ansible…).
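
To make "Docker on Mesos + Marathon" concrete, here is a minimal sketch of a Marathon app definition built in Python. The app id, image name, resource figures and the `CLUSTER` variable are hypothetical placeholders, not values from the talk; only the overall JSON shape (`id`, `instances`, `cpus`, `mem`, `container`, `env`) follows Marathon's app format.

```python
import json


def marathon_app(app_id, image, instances, env):
    """Build a Marathon app definition that runs a Docker image."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": 0.5,   # placeholder resource request
        "mem": 512,    # placeholder, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
        # Settings reach the container as environment variables.
        "env": dict(env),
    }


# Hypothetical app id and image, for illustration only.
app = marathon_app(
    "/eventing/kafka", "example/kafka-broker:latest", 3, {"CLUSTER": "kafka"}
)
print(json.dumps(app, indent=2))
```

A document like this would typically be submitted to Marathon's `/v2/apps` REST endpoint, which is what lets the whole deployment run without Chef, Salt or Ansible scripts.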

Page 6: Scalable Eventing Over Apache Mesos

Introducing Ochopod

Page 7: Scalable Eventing Over Apache Mesos

Ochopod

100% open source!
Novel container-centric orchestration model.
Mix between a discovery & an init system.
No need for dedicated frameworks.
Direct peer-to-peer HTTP I/O.
Can run on Mesos, K8S, etc.
Relies on ZooKeeper (ZK).

Page 8: Scalable Eventing Over Apache Mesos

The Stack

Page 9: Scalable Eventing Over Apache Mesos

How Does It Work?

Source of truth: ZooKeeper.
Each container belongs to a “cluster”.
A “leader” is picked per cluster.
Leaders manage their peers via HTTP I/O.
Settings passed via environment vars.
Eventually consistent.
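
The leader/peer idea above can be sketched in a few lines of Python. This is not Ochopod's code: it assumes the common ZooKeeper recipe in which each container registers an ephemeral sequential znode and the lowest sequence number wins. The znode names and the `POD_CLUSTER` / `POD_PORT` variables are hypothetical.

```python
import os


def pick_leader(znodes):
    """Return the znode with the lowest sequence suffix, or None if empty."""
    if not znodes:
        return None
    return min(znodes, key=lambda name: int(name.rsplit("-", 1)[1]))


def read_settings(environ=os.environ):
    """Read pod settings from environment variables (hypothetical names)."""
    return {
        "cluster": environ.get("POD_CLUSTER", "default"),
        "port": int(environ.get("POD_PORT", "8080")),
    }


# Simulated children of one cluster's znode (no ZooKeeper connection here).
peers = ["pod-0000000007", "pod-0000000003", "pod-0000000012"]
leader = pick_leader(peers)
print(leader)  # pod-0000000003

# When the leader's container dies its ephemeral znode disappears,
# and the next lowest sequence number takes over.
survivors = [p for p in peers if p != leader]
new_leader = pick_leader(survivors)
print(new_leader)  # pod-0000000007

settings = read_settings({"POD_CLUSTER": "kafka", "POD_PORT": "9000"})
```

Because every container re-evaluates the same rule against the same znode list, all peers converge on one leader without talking to each other first, which fits the "eventually consistent" point.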

Page 10: Scalable Eventing Over Apache Mesos

A Quick DIY Mini-PaaS

Proxy approach.
100% Mesos + Ochopod.
Used for CI/CD as well.
Proxy running on an edge node.
Could easily factor OAuth2 in.
Access via direct HTTPS or using a CLI.
Toolkit to deploy, list, query, kill & update containers.

Page 11: Scalable Eventing Over Apache Mesos

Building verticals at scale

Page 12: Scalable Eventing Over Apache Mesos

Architecture

Page 13: Scalable Eventing Over Apache Mesos

Phone Switch & State Machines

Page 14: Scalable Eventing Over Apache Mesos

Going Global

Page 15: Scalable Eventing Over Apache Mesos

Shooting For Higher Scales

Unit of scale == 1 Kafka topic.
Keep the pressure on each broker constant.
Every sub-system can be scaled independently.
API protocol designed to account for nodes shutting down.
Mix of horizontal scaling & sharding via RabbitMQ.
Checkpoints + idempotency + state-machines.
Ochopod is critical to enable scaling.
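
One way to picture "checkpoints + idempotency" is a consumer that remembers the highest offset it has applied and ignores anything at or below it. The sketch below is an illustration only, not the talk's implementation; it assumes each event in a stream carries a monotonically increasing offset, and the class and method names are made up.

```python
class CheckpointedConsumer:
    """Applies each event at most once, using an offset checkpoint."""

    def __init__(self):
        self.checkpoint = -1  # highest offset applied so far
        self.applied = []     # stands in for the real side effect

    def handle(self, offset, payload):
        """Apply the event once; return False for an already-seen offset."""
        if offset <= self.checkpoint:
            return False  # duplicate delivery, safe to ignore
        self.applied.append(payload)
        self.checkpoint = offset
        return True


consumer = CheckpointedConsumer()
batch = [(0, "a"), (1, "b"), (2, "c")]
for offset, payload in batch:
    consumer.handle(offset, payload)

# A client that lost its node mid-batch resends the last two events.
retried = [consumer.handle(offset, payload) for offset, payload in batch[1:]]

print(consumer.applied)     # ['a', 'b', 'c']
print(consumer.checkpoint)  # 2
print(retried)              # [False, False]
```

With this property a client can blindly retry a batch after a node shuts down, which is what makes an API protocol tolerant of nodes going away.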

Page 16: Scalable Eventing Over Apache Mesos

Conclusion

Page 17: Scalable Eventing Over Apache Mesos

6 man-month effort.
6 open-source third-party components (Kafka, ZooKeeper, RabbitMQ...).
3 deployments over 2 data-centers, using DCOS.
36+ c3.2xlarge CoreOS slaves on AWS/EC2 + VPC.
~20 Kafka brokers, ~40 Play! nodes.
~150 live containers.
~500 live streaming sessions at any time.
~30M events / ~65M API hits a day.
< 5 minor incidents, no major incident to date.
1 single dev/op (!).
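
For a sense of scale, the daily figures above can be turned into average per-second rates. This is back-of-the-envelope arithmetic on the rounded numbers from the slide; it assumes traffic spread evenly over 24 hours, so real peaks would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 30_000_000
api_hits_per_day = 65_000_000
brokers = 20
containers = 150
slaves = 36

events_per_sec = events_per_day / SECONDS_PER_DAY
hits_per_sec = api_hits_per_day / SECONDS_PER_DAY
events_per_broker_sec = events_per_sec / brokers
containers_per_slave = containers / slaves

print(f"{events_per_sec:.0f} events/s on average")           # 347
print(f"{hits_per_sec:.0f} API hits/s on average")           # 752
print(f"{events_per_broker_sec:.1f} events/s per broker")    # 17.4
print(f"{containers_per_slave:.1f} containers per slave")    # 4.2
```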

Page 18: Scalable Eventing Over Apache Mesos

Issues & Next Steps

What does one do if a slave goes offline?
Need for better placement constraints.
Need for better storage schemes.
The K8S “pod” concept is cool after all...
We could invest in a dedicated Mesos framework.
What about Spot instances?

Page 19: Scalable Eventing Over Apache Mesos

https://github.com/autodesk-cloud/ochopod

Page 20: Scalable Eventing Over Apache Mesos

Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
