Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy
TRANSCRIPT
Ticketmaster - CoreOS Tectonic Summit 2016
COREOS TECTONIC SUMMIT
DECEMBER 12, 2016
JUSTIN DEAN
● SVP, Platform & Technical Operations
● ~1.75 Years at Ticketmaster
● Passionate about building high-performance organizations
● Nerdy about automating my beer & BBQ pipeline (see PitmasterPi on GitHub)
OUR STORY
● About Ticketmaster
● Our Journey
● Large Enterprise Challenges & Lessons Learned
● Why Kubernetes
● CoreOS Partnership
● Up Next
ABOUT US
● Publicly Traded Company (LYV)
● $7.6B Revenue
● $25B in GTV (Gross Transaction Value)
● Top 5 eCommerce site
HISTORY
● 1976 - Founded at Arizona State University
● 1996 - Ticketmaster.com launched
● 2010 - Live Nation and Ticketmaster join forces to power live experiences
We power unforgettable moments of joy!
Concerts, Sports, Arts & Theater, Small Venues & Clubs
TECH COMPLEXITY
PRE-MODERN TECHNOLOGY (Tech Museum)
● Every era of software, many not ready for containers and cloud
● 1970s: Custom VMS OS on Emulated VAX (The Host)
● 2000s: Xen Cloud, Big-Iron Filers, NFS, custom-built infrastructure
TECH SCALE
● 21 Ticketing Systems and over 250 unique products
● 1,400+ people in Product & Tech org
● Custom Private Cloud with over 22,000 VMs across 7 global data centers
● 15,000+ network endpoints across the world (Venues, Arenas, Kiosks, etc.)
● Over 60% VM growth in last year
1 BILLION MACHINES!!*
*Not really :)
BIG SCALE, BIG CHALLENGES
Onsales = Black Friday every day!
● Huge spikes / demand for tickets
● Global company = across time zones
● Limited inventory (Beyonce Tickets!)
● Multiple sales channels
0 to 150M transactions in minutes! That’s a spike of >8 GBps!!!!!
Self-Inflicted DDoS-as-a-Business
COMPETITION
COMPETITIVE LANDSCAPE
● Market leader with huge surface area
● Competitors of every size and shape
● Speed and agility are absolutely key
● Scale and complexity of a 40-year-old business make rapid changes very hard
TO RECAP...
Public company / market pressure / highly competitive landscape
Legacy tech, not ready for containers
Tech debt with high interest rates
Huge scale and complexity
Black Friday every day
MUST GET FASTER!
SIMPLIFY OUR PLATFORM
More Revenue and Market Share
Better Products & Features
Deliver Products Faster
Autonomous Product Teams
Simplify Our Platform
OUR JOURNEY
Self-disruption → Lean Transformation → Autonomous Delivery Teams → Public Cloud → Kubernetes
2013 2016 2017
WE ARE HERE
SELF-DISRUPTION
LEAN TRANSFORMATION
● Laser focused on highest priorities
● Created 65+ cross-functional delivery teams
● Eventually all roads led to “blocked by ops”
● Got faster at developing; did not get faster at delivering
AUTONOMOUS DELIVERY TEAMS
● Moved application support teams out of TechOps and into the product teams directly
● Embedded Systems Engineers into product delivery teams (closer to truly “cross-functional”)
● Self-Service Tools: Surge towards getting teams out of the ops business
● Self-Sufficient businesses (build it, run it, own it, optimize it, monetize it)
Microbusiness
TRANSFORMATION INSIGHTS
Realized our ability to innovate is dampened by our overly complex software factory:
● 30-50% of development time spent moving code around ($60M-$90M problem)
● 150 custom-built ways to release products (often manually)
● ~50% of incidents were preventable; mostly self-inflicted
PUBLIC CLOUD
Vehicle for deep introspection of every product
Immediate access to infrastructure as APIs
Forcing function to modernize all products to cloud native standard (all the *-ilities)
Public Cloud = Huge carbon filter
CLOUD ENABLEMENT TEAM
● Small team of experts dedicated to developing:
  ▪ Future state architecture
  ▪ Path to Public Cloud
  ▪ Cloud Native Solution Patterns
  ▪ A cure for our on-prem addiction (NFS, Always scaled, HW reliance, SW trees, etc.)
● Provide Self-Service tooling and documentation for those solutions
● Enable teams to:
  ▪ Raise their tech maturity
  ▪ Containerize and retool their app
  ▪ Migrate themselves to the cloud
CLOUD ENABLEMENT METHOD
7 “Simple” Steps:
1. Containerize your app; use CoreOS
2. Terraform your infrastructure
3. Instrument everything, rich telemetry - no SSH or RDP!
4. Use synthetic monitoring to understand the health of your product
5. Security, security, security
6. Design shared-nothing architecture (no NFS)
7. Build for availability - no single points of failure
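Step 4 of the method above (synthetic monitoring) could look roughly like the sketch below. It is an illustration only, not Ticketmaster's tooling: `ProbeResult`, `run_probe`, the injected `fetch` callable, and the 1-second latency budget are all hypothetical names and values chosen so the check can run without a network.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    healthy: bool
    status: int
    latency_s: float
    reason: str


def run_probe(fetch: Callable[[], int],
              max_latency_s: float = 1.0,
              clock: Callable[[], float] = time.monotonic) -> ProbeResult:
    """Run one synthetic check: exercise the product the way a user would
    and judge health from the status code and the elapsed time."""
    start = clock()
    try:
        status = fetch()  # hypothetical: returns an HTTP-style status code
    except Exception as exc:  # any failure to respond counts as unhealthy
        return ProbeResult(False, 0, clock() - start, f"error: {exc}")
    latency = clock() - start
    if status != 200:
        return ProbeResult(False, status, latency, "bad status")
    if latency > max_latency_s:
        return ProbeResult(False, status, latency, "too slow")
    return ProbeResult(True, status, latency, "ok")
```

In practice `fetch` would wrap a real request against a user-facing flow, and the results would feed the telemetry from step 3 rather than being inspected by hand.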
READY TO ROLL
● Highly skilled team
● Modern new stack architecture
● Comprehensive DIY toolkit/software
● 1,000+ pages of detailed documentation and solution patterns
“Everybody has a plan until they get punched in the face.” - Mike Tyson
LEARNINGS
[Diagram labels: Public Cloud ($ scattered throughout); Rich set of Primitives and APIs; 100s of Devs, different tech stacks]
Learn the APIs/Primitives, Learn to build Infrastructure, Learn to code it in Terraform
Programmatic Checkout Page
65,000 permutations on how to use AWS service offerings = 64,999 ways to get it wrong
LEARNINGS SUMMARY
● Huge learning curve
● Hard to manage distributed systems at scale
● Wrong people to build & optimize infrastructure (across 100+ teams)
● Baking purchasing decisions into distributed Terraform code is BAD
...Spending too much time writing software to deploy software instead of writing software to make money
SOLUTION: CONTAINER ORCHESTRATION
● Abstract complexities of infrastructure from development teams, including how to:
  ▪ Design
  ▪ Deploy
  ▪ Purchase
  ▪ Optimize
● Allows us to easily manage distributed systems at scale
WE CHOSE KUBERNETES
● Kubernetes started organically appearing all over our company
● Ahead of other container management platforms and rapidly improving
● Amazing community with hockey-stick velocity
● Kubernetes APIs and primitives are sweet!
  ▪ Iteration time is seconds vs. minutes
  ▪ Automated rollbacks
  ▪ Scaling and self-healing are much faster than ASGs
● Kubernetes gets us much better utilization of our EC2 instances
● Successfully used it to solve a major stability issue
OPENTSDB ON KUBERNETES
● Critical system for application monitoring
  ▪ 500k metrics per second
● Large queries during ticketing sales were DDoS’ing OpenTSDB services
● Kubernetes pod health checks detect this and restart the failed containers
● Kubernetes primitives turned a service that required hand-holding into something that manages itself
● Learning Moment! A reboot from an automated OS upgrade required manual intervention
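The health-check-and-restart behavior described on this slide can be sketched as a small simulation. This is a simplified model, not the kubelet's actual implementation: `simulate_liveness` is a hypothetical helper, and only its `failure_threshold` parameter mirrors a real Kubernetes probe setting (`failureThreshold`). Probe periods, timeouts, initial delays, and restart back-off are deliberately left out.

```python
from typing import Iterable, List


def simulate_liveness(probe_results: Iterable[bool],
                      failure_threshold: int = 3) -> List[int]:
    """Return the probe indices at which a restart would be triggered.

    A restart fires once `failure_threshold` consecutive probes fail.
    The failure counter then resets, as it would for a fresh container.
    """
    restarts: List[int] = []
    consecutive_failures = 0
    for index, ok in enumerate(probe_results):
        if ok:
            consecutive_failures = 0  # any success clears the streak
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(index)
            consecutive_failures = 0
    return restarts
```

For example, a service that briefly stalls under a large query (two failed probes, then a success) is left alone, while one that stays unresponsive for three probes in a row is restarted without anyone being paged.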
SIMPLIFICATION WITH KUBERNETES
[Diagram labels, first panel: Public Cloud ($ scattered throughout)]
[Diagram labels, second panel: Public Cloud; Kubernetes cluster optimized by Cluster Ops team; Kubernetes APIs / abstraction; Homogenized deployments via Kubernetes; fewer $ symbols]
KUBERNETES PROJECT
GOAL: Deploy a Ticketmaster product into a production-grade Kubernetes cluster and equip the team with the skills required to support its operation.
● Fully-remote team of 6
● Tons of work!
  ▪ How many clusters to build?
  ▪ Which architecture is right for us?
  ▪ How should we deploy and test the cluster?
  ▪ Which networking option to use inside of AWS?
QUESTIONS
● Kubernetes @scale best practices and pitfalls
  ▪ Kubernetes @Ticketmaster Roadmap:
    − Documented Reference Architecture specific to Ticketmaster, based on all the below, that includes answers to any of the questions below. We need a documented roadmap for the team to start building, based on Apprenda experience/reference architecture.
  ▪ Guidance on what goes in K8S and what should not (if anything)
  ▪ What have we missed? What didn’t we ask?
  ▪ Best practices around secrets; how do companies manage this at scale? Risks, alternatives, etc.?
  ▪ Kubernetes upgrades, possible w/o downtime?
  ▪ Insight on cloud primitives that are not K8S managed (Lambda, S3, SQS, KMS, RDS, etc.). What are other companies doing here? Are some of these on the K8S roadmap to orchestrate? Are these resources managed by “clusterops”, or do delivery teams self-build outside the K8S workflow? This is called the K8S service catalog
  ▪ What do they recommend for configuring containers within Kubernetes?
  ▪ How do they recommend granting IAM roles to containers?
● Kubernetes cross-domain (AWS/on-prem/other cloud) insight
  ▪ Good idea? Possible pitfalls?
  ▪ How to front-end AWS and on-prem so we can dynamically run HOT, expensive AWS stuff on-prem behind the scenes
  ▪ Cross AWS region?
  ▪ If we run Kubernetes in Equinix, how do they recommend logging into ECR with Kubernetes?
● Cluster Networking
  ▪ What do they recommend for load balancer services in AWS?
  ▪ Overlay networking
  ▪ Software-defined firewall
  ▪ Best ecosystem components (Calico vs. x, etc.)
● Team / Operations
  ▪ How do engineering teams interact with the cluster, kubectl on their laptops? Probably not
  ▪ How long do they see it taking to build enough knowledge for production support of K8S?
  ▪ Insight on other companies’ K8S support models (what does ops do, what does devops do, what are the governance models)
  ▪ Understanding of implications on chargeback in AWS. How much effort goes into tagging and reporting on ephemeral resources (containers) that move around on AWS primitives (EC2 instances)?
● CET (cloud enablement team)
  ▪ How to marry it into our CET strategy, specifically Terraform
  ▪ Help on rollout strategy. Start working in context with enthusiastic early-adopter teams ASAP OR wait until we have it more ‘operationally mature’. Both tactics have merit; help us think through the strategy here.
● Persistent storage, period.
  ▪ Torus, Ceph, EFS, NFS, Gluster, Portworx; pros / cons
  ▪ Databases (large/shared) on K8S?
  ▪ Other persistent workloads: elastic, cache, message bus, etc.
● Ongoing Apprenda Engagement
  ▪ Information regarding their consulting offerings / prices / models of engagement. On-prem team? Support team? Customized Kubernetes solutions and maintenance.
  ▪ Connect us to peer group in Kubernetes space
● Should we just leverage Tectonic?
● Archtics (massive legacy Windows/PowerBuilder/Sybase/RDP over internet to sports teams) Help
● Prometheus help
[Callouts: overlay networking? Calico? Flannel? VPC networking? Canal? cluster ops team? Linkerd? auth? how many etcd nodes? Terraform vs Kube API? Prometheus? 24/7 support?]
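One of the open questions on this slide, "how many etcd nodes?", largely comes down to quorum arithmetic. etcd relies on Raft consensus, which needs a majority of members to agree before a write commits. The sketch below is not from the talk, and the function names are illustrative. It shows why odd cluster sizes are the usual choice: adding a fourth member raises the quorum without tolerating any more failures than three members do.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print quorum size and tolerated failures for small cluster sizes.
    for size in range(1, 8):
        print(size, quorum(size), fault_tolerance(size))
```

Under this arithmetic, 3 members tolerate 1 failure, 5 tolerate 2, and 7 tolerate 3, while 4 and 6 tolerate no more than 3 and 5 respectively.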
COMMUNITY ENGAGEMENT
● Spent time with CoreOS, Kelsey Hightower, Apprenda
● Attended conferences
● Hosted Meetups
● Joined SIGs
● Joined
MILESTONES
1. Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
* Expand!
WORK BEGINS...BUT
● Continued to identify new questions
● Had not figured out operational support
● Needed enterprise-level features (auth)
● Needed answers based on experience; not theory
● Needed to accelerate implementation
STRATEGIC PARTNERSHIP
MILESTONES
1. ✔ Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
* Expand!
WHY TECTONIC
● Vanilla upstream Kubernetes - No lock-in
● Immediate enterprise-level confidence
● Supported reference architecture (instead of DIY)
● Recommendations on operational practices, service provider integration, third-party add-ons, etc.
● Production Go-Live Support
● Automatic OS Updates!
*Bummer, no more fun upgrade projects!
COREOS PARTNERSHIP
● Providing input on Tectonic roadmap
● Influence the roadmap for things that REALLY matter to Enterprises
● Jointly solve Enterprise + Web Scale challenges
● Help foster the Enterprise Kubernetes community
NEW TICKETMASTER WEB PLATFORM ON K8S
Before:
● Semi-manual stack creation, bespoke CloudFormation + Python boto scripts = 20+ mins to deploy
● Low Confidence
Now:
● K8S + Tectonic, fully automated = 60-second app updates
● High Confidence
● Unlocked Daily Delivery Culture
LET THE MAKERS MAKE
● We have an amazing company of Makers, Creators, Visionaries
● We must create the space for them to innovate and deliver great solutions to the market
RECAP
● Use Kubernetes to abstract infrastructure complexities
● Have a cluster ops team do the optimization voodoo; not everyone else
● Stop wasting effort writing software to deploy software
● Let the Makers Make! Give time and mindshare back to your most valuable asset (your people) to do what they do best: Make Things!
TICKETMASTER KUBERNAUTS
Stop by and say hi during the break!
&
Join us at the Sysdig/CoreOS/Ticketmaster party tonight!
Food, drinks, LIVE BAND!!
Justin Dean, Kraig Amador, Abe Ingersoll, Bindi Belanger, Jean-François Nadeau
[email protected]
@justinmdean