Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

Ticketmaster - CoreOS Tectonic Summit 2016 COREOS TECTONIC SUMMIT DECEMBER 12, 2016

Upload: coreos

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

Ticketmaster - CoreOS Tectonic Summit 2016

COREOS TECTONIC SUMMIT DECEMBER 12, 2016

Page 2: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

JUSTIN DEAN
● SVP, Platform & Technical Operations
● ~1.75 Years at Ticketmaster
● Passionate about building high performance organizations
● Nerdy about automating my beer & BBQ pipeline (see PitmasterPi on github)

Page 3: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

OUR STORY
● About Ticketmaster
● Our Journey
● Large Enterprise Challenges & Lessons Learned
● Why Kubernetes
● CoreOS Partnership
● Up Next

Page 4: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

ABOUT US
● Publicly Traded Company (LYV)
● $7.6B Revenue
● $25B in GTV (Gross Transaction Value)
● Top 5 eCommerce site

HISTORY
● 1976 - Founded at Arizona State University
● 1996 - Ticketmaster.com launched
● 2010 - Live Nation and Ticketmaster join forces to power live experiences

Page 5: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

We power unforgettable moments of joy!

Page 6: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

Concerts, Sports, Arts & Theater, Small Venues & Clubs

Page 7: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

Page 8: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

TECH COMPLEXITY

Page 9: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

PRE-MODERN TECHNOLOGY
Tech Museum
● Every era of software, many not ready for containers and cloud
● 1970s: Custom VMS OS on Emulated VAX (The Host)
● 2000s: Xen Cloud, Big-Iron Filers, NFS, custom built infrastructure

Page 10: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

TECH SCALE
● 21 Ticketing Systems and over 250 unique products
● 1,400+ people in Product & Tech org
● Custom Private Cloud with over 22,000 VMs across 7 global data centers
● 15,000+ network endpoints across the world (Venues, Arenas, Kiosks, etc.)
● Over 60% VM growth in last year

1 BILLION MACHINES!!*
*Not really :)

Page 11: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

BIG SCALE, BIG CHALLENGES
Onsales = Black Friday every day!
● Huge spikes / demand for tickets
● Global company = across time zones
● Limited inventory (Beyonce Tickets!)
● Multiple sales channels

0 to 150M transactions in minutes! That’s a spike of >8 GBps!!!!!

Self Inflicted DDOS-as-a-Business

Page 12: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

COMPETITION

Page 13: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

COMPETITIVE LANDSCAPE
● Market leader with huge surface area
● Competitors of every size and shape
● Speed and agility are absolutely key
● Scale and complexity of a 40-year-old business make rapid changes very hard

Page 14: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

TO RECAP...

Public company / market pressure / highly competitive landscape

Legacy tech, not ready for containers

Tech debt with high interest rates

Huge scale and complexity

Black Friday every day

Page 15: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

MUST GET FASTER!

Page 16: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

SIMPLIFY OUR PLATFORM
More Revenue and Market Share
Better Products & Features
Deliver Products Faster
Autonomous Product Teams
Simplify Our Platform

Page 17: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

OUR JOURNEY

Page 18: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

OUR JOURNEY
[Timeline graphic: Self-disruption, Lean Transformation, Autonomous Delivery Teams, Public Cloud, Kubernetes; axis 2013, 2016, 2017, with a WE ARE HERE marker]

Page 19: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

SELF-DISRUPTION

Page 20: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

LEAN TRANSFORMATION
● Laser focused on highest priorities
● Created 65+ cross-functional delivery teams
● Eventually all roads led to “blocked by ops”
● Got faster at developing; did not get faster at delivering

Page 21: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

AUTONOMOUS DELIVERY TEAMS
● Moved application support teams out of TechOps and into the product teams directly
● Embedded Systems Engineers into product delivery teams (closer to truly “cross-functional”)
● Self-Service Tools: Surge towards getting teams out of the ops business
● Self-Sufficient businesses (build it, run it, own it, optimize it, monetize it)

Microbusiness

Page 22: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

TRANSFORMATION INSIGHTS
Realized our ability to innovate is dampened by our overly complex software factory:
● 30-50% of development time spent moving code around ($60M-$90M problem)
● 150 custom-built ways to release products (often manually)
● ~50% of incidents were preventable; mostly self-inflicted

Page 23: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

PUBLIC CLOUD

Vehicle for deep introspection of every product

Immediate access to infrastructure as APIs

Forcing function to modernize all products to cloud native standard (all the *.-ilities)

Public Cloud = Huge carbon filter

Page 24: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

CLOUD ENABLEMENT TEAM
● Small team of experts dedicated to developing:
  ▪ Future state architecture
  ▪ Path to Public Cloud
  ▪ Cloud Native Solution Patterns
  ▪ Cure us of our on-prem addiction (NFS, Always scaled, HW reliance, SW trees, etc)
● Provide Self-Service tooling and documentation for those solutions
● Enable teams to:
  ▪ Raise their tech maturity
  ▪ Containerize and retool their app
  ▪ Migrate themselves to the cloud

Page 25: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

CLOUD ENABLEMENT METHOD
7 “Simple” Steps:
1. Containerize your app; use CoreOS
2. Terraform your infrastructure
3. Instrument everything, rich telemetry - no SSH or RDP!
4. Use synthetic monitoring to understand the health of your product
5. Security, security, security
6. Design shared-nothing architecture (no NFS)
7. Build for availability - no single points of failure
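
Step 4 (synthetic monitoring) can be pictured as a small script that requests an endpoint the way a user would and records status and latency. The following is a minimal sketch under stated assumptions, not Ticketmaster's tooling: `probe`, `CheckResult`, the latency threshold, and any URL passed in are hypothetical names and values chosen for illustration. It uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    url: str
    healthy: bool
    status: Optional[int]   # None when no HTTP response was received
    latency_s: float


def probe(url: str, timeout_s: float = 5.0, max_latency_s: float = 2.0) -> CheckResult:
    """Issue one GET request and classify the result as healthy or not."""
    start = time.monotonic()
    status = None
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code          # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None              # connection refused, DNS failure, timeout, ...
    latency = time.monotonic() - start
    # Healthy means: a 2xx response that arrived within the latency budget.
    healthy = status is not None and 200 <= status < 300 and latency <= max_latency_s
    return CheckResult(url, healthy, status, latency)
```

A scheduler (cron, a sidecar, or a monitoring agent) would call `probe` periodically and ship the `CheckResult` to a metrics system, so product health is judged from the outside rather than by logging into hosts.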

Page 26: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

READY TO ROLL
● Highly skilled team
● Modern new stack architecture
● Comprehensive DIY toolkit/software
● 1,000+ pages of detailed documentation and solution patterns

Page 27: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

Everybody has a plan until they get punched in the face.
- Mike Tyson

Page 28: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

LEARNINGS
[Diagram: "Public Cloud" with scattered $ symbols; labels follow]
● 100's of Devs, different tech stacks
● Learn the APIs/Primitives, Learn to build Infrastructure, Learn to code it in Terraform
● Rich set of Primitives and APIs
● Programmatic Checkout Page
● 65,000 permutations on how to use AWS service offerings = 64,999 ways to get it wrong

Page 29: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

LEARNINGS SUMMARY
● Huge learning curve
● Hard to manage distributed systems at scale
● Wrong people to build & optimize infrastructure (across 100+ teams)
● Baking purchasing decisions into distributed Terraform code is BAD

...Spending too much time writing software to deploy software instead of writing software to make money

Page 30: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

SOLUTION: CONTAINER ORCHESTRATION
● Abstract complexities of infrastructure from development teams, including how to:
  ▪ Design
  ▪ Deploy
  ▪ Purchase
  ▪ Optimize
● Allows us to easily manage distributed systems at scale

Page 31: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

WE CHOSE KUBERNETES
● Kubernetes started organically appearing all over our company
● Ahead of other container management platforms and rapidly improving
● Amazing community with hockey-stick velocity
● Kubernetes APIs and primitives are sweet!
  ▪ Iteration time is seconds vs. minutes
  ▪ Automated rollbacks
  ▪ Scaling and self-healing are much faster than ASGs
● Kubernetes gets us much better utilization of our EC2 instances
● Successfully used it to solve a major stability issue
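
The "scaling and self-healing" point rests on a simple idea: a controller keeps comparing the desired state with the observed state and acts on the difference. The toy sketch below illustrates that idea only. It is not Kubernetes code and does not use a Kubernetes client; `ToyCluster` and its fields are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCluster:
    """In-memory stand-in for a cluster; not a Kubernetes client."""
    desired_replicas: int
    running: List[str] = field(default_factory=list)
    next_id: int = 0

    def reconcile(self) -> List[str]:
        """Move observed state toward desired state; return the actions taken."""
        actions = []
        # Too few replicas (first start, or after a crash): start more.
        while len(self.running) < self.desired_replicas:
            name = f"pod-{self.next_id}"
            self.next_id += 1
            self.running.append(name)
            actions.append(f"start {name}")
        # Too many replicas (after a scale-down): stop the newest ones.
        while len(self.running) > self.desired_replicas:
            name = self.running.pop()
            actions.append(f"stop {name}")
        return actions


cluster = ToyCluster(desired_replicas=3)
cluster.reconcile()               # starts pod-0, pod-1, pod-2
cluster.running.remove("pod-1")   # simulate a crashed container
print(cluster.reconcile())        # ['start pod-3']
```

The same loop handles both scaling (change `desired_replicas`) and healing (a replica disappears), which is why the two are mentioned together on the slide.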

Page 32: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

OPENTSDB ON KUBERNETES
● Critical system for application monitoring
  ▪ 500k metrics per second
● Large queries during ticketing sales were DDOS’ing OpenTSDB services
● Kubernetes pod health checks detect this and restart the failed containers
● Kubernetes primitives took a service that required hand holding to something that manages itself
● Learning Moment! A reboot from an automated OS upgrade required manual intervention
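
Pod health checks of the kind described above are commonly HTTP liveness probes: Kubernetes requests a path on the container and restarts it after repeated failures. The sketch below shows what such an endpoint might look like on the application side. It is an illustration under assumptions, not how OpenTSDB was actually configured: the `/healthz` path, the port, and the `HealthState.overloaded` flag are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthState:
    """Hypothetical flag the application sets when it can no longer serve queries."""
    overloaded = False


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        # 200 tells the probe the process is fine; 503 signals a failure.
        code = 503 if HealthState.overloaded else 200
        body = b"unhealthy\n" if code == 503 else b"ok\n"
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep probe traffic out of the application log


def make_server(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    """Build the health server; the caller runs serve_forever(), usually in a thread."""
    return HTTPServer((host, port), HealthHandler)
```

An application would run `make_server().serve_forever()` in a background thread and point the pod's liveness probe at the chosen path and port, so that a wedged process is restarted without a human in the loop.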

Page 33: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

SIMPLIFICATION WITH KUBERNETES
[Diagram: "Public Cloud" with scattered $ symbols; labels follow]
● Public Cloud
● Kubernetes cluster optimized by Cluster Ops team
● Kubernetes APIs / abstraction
● Homogenized deployments via Kubernetes

Page 34: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

KUBERNETES PROJECT
GOAL: Deploy a Ticketmaster product into a production-grade Kubernetes cluster and equip team with the skills required to support its operation.
● Fully-remote team of 6
● Tons of work!
  ▪ How many clusters to build?
  ▪ Which architecture is right for us?
  ▪ How should we deploy and test the cluster?
  ▪ Which networking option to use inside of AWS?

Page 35: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

QUESTIONS
● Kubernetes @scale best practices and pitfalls
  ▪ Kubernetes @Ticketmaster Roadmap:
    − Documented Reference Architecture specific to Ticketmaster based on all the below that includes answers to any below questions. We need a documented roadmap for the team to start building based on Apprenda Experience/Reference architecture.
  ▪ Guidance on what goes in K8S and what should not (if anything)
  ▪ What have we missed? What didn’t we ask?
  ▪ Best practices around secrets; how do companies manage this at scale? Risks, alternatives, etc.?
  ▪ Kubernetes upgrades, possible w/o downtime?
  ▪ Insight on cloud primitives that are not K8S managed (Lambda, S3, SQS, KMS, RDS, etc.). What are other companies doing here? Are some of these on the K8S roadmap to orchestrate? Are these resources managed by “clusterops”, or do delivery teams self-build outside the k8s workflow? This is called the K8s service catalog
  ▪ What do they recommend for configuring containers within Kubernetes
  ▪ How do they recommend granting IAM roles to containers
● Kubernetes cross-domain (AWS/onprem/other cloud) insight
  ▪ Good idea? Possible pitfalls?
  ▪ How to front end AWS and Onprem so we can dynamically run HOT AWS expensive stuff on onprem behind the scenes
  ▪ Cross AWS region?
  ▪ If we run Kubernetes in Equinix, how do they recommend logging into ECR with Kubernetes
● Cluster Networking
  ▪ What do they recommend for loadbalancer services in AWS
  ▪ Overlay networking
  ▪ Software defined firewall
  ▪ Best ecosystem components (calico vs x, etc)
● Team / Operations
  ▪ How do engineering teams interact with the cluster, kubectl on their laptops? Probably not
  ▪ How long do they see it taking to build enough knowledge for production support of k8s
  ▪ Insight on other companies' K8S support models (what does ops do, what does devops do, what are the governance models)
  ▪ Understanding of implications on chargeback in AWS. How much effort goes into tagging and reporting on ephemeral resources (containers) that move around on AWS primitives (EC2 instances)
● CET (cloud enablement team)
  ▪ How to marry it into our CET strategy, specifically Terraform
  ▪ Help on rollout strategy. Start working in context with early adopter enthusiastic teams asap OR wait until we have it more ‘operationally mature’. Both tactics have merit, help us think through the strategy here.
● Persistent storage, period.
  ▪ Torus, Ceph, EFS, NFS, Gluster, portworx; pros / cons
  ▪ Databases (large/shared) on k8s?
  ▪ Other persistent workloads: elastic, cache, message bus, etc.
● Ongoing Apprenda Engagement
  ▪ Information regarding their consulting offerings / prices / models of engagement. On prem team? Support team? Customized Kubernetes solutions and maintenance.
  ▪ Connect us to peer group in Kubernetes space
● Should we just leverage Tectonic?
● Archtics (massive legacy windows/powerbuilder/sybase/rdp over internet to sports teams) Help
● Prometheus help

[Overlaid callouts: overlay networking? Calico? Flannel? VPC networking? Canal? cluster ops team? Linkerd? auth? how many etcd nodes? Terraform vs Kube API? Prometheus? 24/7 support?]
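
The chargeback question in the list above (containers moving around on shared EC2 instances) is easier to reason about with a concrete rule. One possible rule, shown here purely as an assumption and not as anything the deck states Ticketmaster adopted, is to split each instance's cost across teams in proportion to the CPU their containers request. The function name, the team labels, and the numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def apportion_cost(instance_cost: float, pods: List[Tuple[str, float]]) -> Dict[str, float]:
    """Split one instance's cost across teams by requested CPU.

    pods is a list of (team_label, cpu_requested) pairs; both are
    hypothetical inputs that a reporting job would have to collect.
    """
    total_cpu = sum(cpu for _, cpu in pods)
    if total_cpu == 0:
        return {}  # nothing scheduled, nothing to charge back
    shares: Dict[str, float] = defaultdict(float)
    for team, cpu in pods:
        shares[team] += instance_cost * cpu / total_cpu
    return dict(shares)


# One instance costing 1.00 per hour, running three containers from two teams.
print(apportion_cost(1.00, [("checkout", 2.0), ("search", 1.0), ("checkout", 1.0)]))
# {'checkout': 0.75, 'search': 0.25}
```

Note the design choice: under this rule idle capacity on the instance is paid for by whoever happens to be scheduled there, which is one reason the slide treats chargeback as an open question rather than a solved one.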

Page 36: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

COMMUNITY ENGAGEMENT
● Spent time with CoreOS, Kelsey Hightower, Apprenda
● Attended conferences
● Hosted Meetups
● Joined SIGs
● Joined

Page 37: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

MILESTONES
1. Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
*  Expand!

Page 38: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

WORK BEGINS...BUT
● Continued to identify new questions
● Had not figured out operational support
● Needed enterprise-level features (auth)
● Needed answers based on experience; not theory
● Needed to accelerate implementation

Page 39: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

STRATEGIC PARTNERSHIP

Page 40: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

MILESTONES
1. ✔ Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
*  Expand!

Page 41: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

WHY TECTONIC
● Vanilla upstream Kubernetes - No lock in
● Immediate enterprise level confidence
● Supported reference architecture (instead of DIY)
● Recommendations on operational practices, service provider integration, third party add-ons, etc.
● Production Go-Live Support
● Automatic OS Updates!
*Bummer, no more fun upgrade projects!

Page 42: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

COREOS PARTNERSHIP
● Providing input on Tectonic roadmap
● Influence the roadmap for things that REALLY matter to Enterprises
● Jointly solve Enterprise + Web Scale challenges
● Help foster the Enterprise Kubernetes community

Page 43: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

NEW TICKETMASTER WEB PLATFORM ON K8S
Before:
● Semi-manual stack creation, bespoke CloudFormation + Python boto scripts = 20+ mins to deploy
● Low Confidence

Now:
● K8S + Tectonic, fully automated = 60 second app updates
● High Confidence
● Unlocked Daily Delivery Culture

Page 44: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

LET THE MAKERS MAKE

● We have an amazing company of Makers, Creators, Visionaries

● We must create the space for them to innovate and deliver great solutions to the market

Page 45: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

RECAP
● Use Kubernetes to abstract infrastructure complexities
● Have a cluster ops team do the optimization voodoo; not everyone else
● Stop wasting effort writing software to deploy software
● Let the Makers Make! Give time and mindshare back to your most valuable asset (your people) to do what they do best: Make Things!

Page 46: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

TICKETMASTER KUBERNAUTS
Stop by and say hi during the break!
&
Join us at the Sysdig/CoreOS/Ticketmaster party tonight! Food, drinks, LIVE BAND!!

Justin Dean, Kraig Amador, Abe Ingersoll, Bindi Belanger, Jean-François Nadeau

Page 47: Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

[email protected]
@justinmdean